INTRODUCTION – Configuration management and monitoring
This module explores cloud services, providing a detailed overview of various models like SaaS, PaaS, and IaaS, and their functionalities. You’ll learn about scaling in the cloud, including both horizontal and vertical scaling, as well as automatic and manual scaling techniques. The module also helps you assess the level of control each cloud model offers, aiding in the selection of the best option for your business needs.
Next, you’ll dive into migration strategies for moving your business to the cloud, with a focus on the lift-and-shift method. The following lesson walks you through instance management in the cloud, covering key elements such as region selection, machine types, and customization options for virtual machines, all of which support scalable deployments. The module also covers cloud deployment automation, including the use of load balancers to distribute requests efficiently, autoscaling for optimal resource use, and the differences between orchestration and automation. Lastly, you’ll explore Infrastructure as Code (IaC), emphasizing the importance of machine-readable configuration files in automating the configuration management process.
Learning Objectives
- Understand and define SaaS, PaaS, and IaaS
- Understand the concept of scaling in the cloud and the different types of scaling
- Explain the lift-and-shift approach for cloud migration
- Learn how to deploy instances in the cloud
- Differentiate between auto scaling and load balancing
- Understand the distinction between orchestration and automation
- Define Infrastructure as Code (IaC) and its role in automation
PRACTICE QUIZ: AUTOMATION AT SCALE
1. What is IaC (Infrastructure as Code)?
- Writing a program from the outside in
- Programs for industrial use
- Hardware-based programming with FPGAs
- Using a version control system to deploy and manage node configurations (CORRECT)
IaC goes hand in hand with continuous delivery.
2. What is the principle called when you think of networked machines as interchangeable resources instead of individual machines?
- “Flexible deployment”
- The “Borg” principle
- Treating computers like “cattle instead of pets” (CORRECT)
- The principle of “group operation”
This means no node is irreplaceable and configuration is automated.
3. What benefits can we gain by using automation to manage our configuration? (Check all that apply)
- Consistency (CORRECT)
- Simplicity
- Reliability (CORRECT)
- Scalability (CORRECT)
Exactly! Automation eliminates the need for human intervention, ensuring that processes are executed consistently and reliably each time, as long as the conditions remain unchanged. This consistency is essential for building trust in systems and processes.
Absolutely! Consistency is one of the key benefits of automation. Once a process is automated and proven to work, you can count on it to perform the same way every time, minimizing the risk of errors and improving efficiency.
That’s right! A scalable system is designed to grow and adapt to changing demands. It can seamlessly handle additional tasks or resources, providing flexibility and efficiency without compromising performance.
4. Puppet is a commonly used configuration management system. Which of the following applications are also used for configuration management?
- Valgrind
- Chef (CORRECT)
- Ansible (CORRECT)
- CFEngine (CORRECT)
Chef: Chef is a configuration management system that treats infrastructure as code. It uses Ruby-based DSL (domain-specific language) to define system configurations and manage deployments, making it particularly well-suited for large-scale systems. Chef automates the provisioning and management of servers, allowing for infrastructure automation, scalability, and consistency.
Ansible: Ansible is an open-source tool that simplifies IT configuration management, deployment, and orchestration. Unlike Chef, Ansible uses YAML-based playbooks, which are easy to read and write. It allows users to automate tasks such as application deployment, configuration management, and cloud provisioning, making it a great choice for handling a variety of automation challenges with minimal overhead.
CFEngine: CFEngine is an open-source configuration management program that focuses on automating the configuration, maintenance, and compliance of large-scale computing networks. It supports a wide range of devices, including cloud environments, desktops, consumer electronics, and embedded systems. CFEngine’s approach emphasizes security, scalability, and high performance, making it suitable for complex infrastructure setups.
5. A network administrator is accustomed to manually configuring the 5 nodes on the network he manages. However, the company he works for is quickly growing, and there are plans to expand the network to 200 nodes. What is the most reasonable course of action for the network administrator?
- Prepare to manually configure 200 nodes
- Hire more network techs
- Ask for a reduction in planned nodes to simplify configuration
- Prepare scripts or download software for automated configuration (CORRECT)
We have the option to either write our own automation scripts or use configuration management software to enhance the scalability of our network by pushing changes from a centralized control server.
PRACTICE QUIZ: INTRODUCTION TO PUPPET
1. A Puppet agent inspects /etc/conf.d, determines the OS to be Gentoo Linux, then activates the Portage package manager. What is the provider in this scenario?
- /etc/conf.d
- Portage (CORRECT)
- Gentoo Linux
- The Puppet agent
The Portage package manager used by Gentoo Linux is the provider called by the Puppet agent.
2. Which of the following examples show proper Puppet syntax?
class AutoConfig {
  package { 'Executable':
    ensure => latest,
  }
  file { 'executable.cfg':
    source => 'puppet:///modules/executable/Autoconfig/executable.cfg',
    replace => true,
  }
  service { 'executable.exe':
    enable => true,
    ensure => running,
  }
}
(CORRECT)

class AutoConfig :
  package 'Executable':
    ensure => latest,
  file 'executable.cfg':
    source => 'puppet:///modules/executable/Autoconfig/executable.cfg'
    replace => true,
  service 'executable.exe':
    enable => true,
    ensure => running,

class AutoConfig {
  package { 'Executable':
    ensure == latest,
  }
  file { 'executable.cfg':
    source == 'puppet:///modules/executable/Autoconfig/executable.cfg'
    replace == yes,
  }
  service { 'executable.exe':
    enable == yes,
    ensure == true,
  }
}

class AutoConfig {
  package { 'Executable':
    assure => latest,
  }
  file { 'executable.cfg':
    origin => 'puppet:///modules/executable/Autoconfig/executable.cfg'
    substitute => true,
  }
  program { 'executable.exe':
    activate => true,
    assure => running,
  }
}
3. What is the benefit of grouping resources into classes when using Puppet?
- Providers can be specified
- Configuration management is simplified (CORRECT)
- The title is changeable
- Packages are not required
Grouping related resources into a single class simplifies configuration management by enabling us to apply a single class to each host, rather than specifying individual resources for each host separately, which could lead to missing some resources.
4. What defines which provider will be used for a particular resource?
- Puppet assigns providers based on the resource type and data collected from the system. (CORRECT)
- A menu allows you to choose providers on a case-by-case basis.
- The user is required to define providers in a config file.
- Puppet uses an internet database to decide which provider to use.
Great! Puppet assigns providers based on predefined rules for the resource type, as well as data collected from the system, such as the operating system family.
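As a rough illustration of the idea, here is a minimal Python sketch of choosing a package provider from gathered facts. The lookup table and the function name `select_package_provider` are assumptions made up for this example; Puppet's real selection logic is more involved than a single dictionary lookup.

```python
# Hypothetical lookup table: maps an OS family fact to a package provider name.
# The names mirror common package managers, but the selection logic is
# deliberately simplified for illustration.
PACKAGE_PROVIDERS = {
    "Debian": "apt",
    "RedHat": "yum",
    "Gentoo": "portage",
}


def select_package_provider(facts):
    """Pick a package provider from the facts gathered on the node."""
    os_family = facts["os"]["family"]
    try:
        return PACKAGE_PROVIDERS[os_family]
    except KeyError:
        raise ValueError(f"no known package provider for OS family {os_family!r}")


gentoo_facts = {"os": {"family": "Gentoo"}}
print(select_package_provider(gentoo_facts))  # prints "portage"
```

The point is only that the person writing the rule never names the provider; it falls out of the resource type plus what the system reports about itself.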
5. In Puppet’s file resource type, which attribute overwrites content that already exists?
- Purge
- Overwrite
- Replace (CORRECT)
- Save
Setting the replace attribute to true tells Puppet to overwrite files or symlinks that already exist on the local system but whose content doesn’t match what the source or content attribute specifies.
6. What is the most basic unit for modeling in Puppet?
- package
- title
- resource (CORRECT)
- file
The most basic unit in Puppet is a resource, such as a user, group, file, service, or package.
7. What is the advantage of grouping related resources into a single class?
- To ensure efficiency and convenience for future changes (CORRECT)
- It is required by Puppet
- To keep computer clocks synchronized
- To prevent errors
By grouping related resources together, we can simplify the configuration, making it easier to understand and modify in the future.
PRACTICE QUIZ: THE BUILDING BLOCKS OF CONFIGURATION MANAGEMENT
1. How is a declarative language different from a procedural language?
- A declarative language defines the goal; a procedural language defines the steps to achieve a goal. (CORRECT)
- Declarative languages are object-based; procedural languages aren’t.
- Declarative languages aren’t stateless; procedural languages are stateless.
- A declarative language defines each step required to reach the goal state.
Exactly! In a declarative language, the focus is on defining the desired end state, rather than specifying the exact steps to reach that state.
2. Puppet facts are stored in hashes. If we wanted to use a conditional statement to perform a specific action based on a fact value, what symbol must precede the facts variable for the Puppet DSL to recognize it?
- @
- #
- $ (CORRECT)
- &
Nice job! All variable names are preceded by a dollar sign in Puppet’s DSL.
3. What does it mean when we say that Puppet is stateless?
- Puppet retains information between uses.
- An action can be performed repeatedly without changing the system after the first run.
- There is no record of previous interactions. (CORRECT)
- Actions are taken only when they are necessary to achieve a goal.
Awesome! Each interaction request has to be handled based entirely on information that comes with it.
4. What does the “test and repair” paradigm mean in practice?
- There is no state being kept between runs of the agent.
- We should plan to repeatedly fix issues.
- We need to test before and after implementing a fix.
- We should only take actions when testing determines they need to be done to reach the requested state. (CORRECT)
Thanks! By checking if a resource needs modification before making changes, we can save valuable time and avoid unnecessary updates.
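The test and repair idea can be sketched in a few lines of Python. This is not how Puppet is implemented; the helper name `ensure_file_content` and the temporary `motd` file are assumptions chosen for the example.

```python
import os
import tempfile


def ensure_file_content(path, desired):
    """Write `desired` to `path` only if the current content differs.

    Returns True when a repair was made and False when nothing was needed.
    """
    current = None
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            current = handle.read()
    if current == desired:
        # Test passed: the file is already in the requested state.
        return False
    # Test failed: repair by writing the desired content.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(desired)
    return True


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "motd")
    first_run = ensure_file_content(target, "Welcome\n")
    second_run = ensure_file_content(target, "Welcome\n")
    print(first_run, second_run)  # prints "True False"
```

The first call has to create the file, so it reports a change. The second call finds the file already correct and does nothing.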
5. Where, in Puppet syntax, are the attributes of a resource found?
- Inside the curly braces after the resource type (CORRECT)
- In brackets after the if statement
- After ensure =>
- After the dollar sign ($)
Woohoo! We specify a resource’s attributes inside the curly braces, after the resource type and title.
6. What is a fact in Puppet?
- A variable representing characteristics of a system (CORRECT)
- A type of parameter
- A type of resource
- A variable representing packages
Thank you! Facts are variables, stored in a hash, that hold information about the specifics of a particular system, such as its operating system, IP addresses, or hardware details.
7. What does idempotent mean?
- There is no state being kept between runs of the agent
- We declare the state we want to achieve before running
- An action is performed a new way each time
- An action can be performed repeatedly without changing the system after the first run (CORRECT)
Way to go! Some actions are not idempotent on their own, such as an exec resource that moves a file. We can use an attribute like onlyif so the command runs only if the file still exists.
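To make the definition concrete, here is a small Python sketch contrasting an action that is not idempotent with one that is. The function names and the sample settings are assumptions invented for the example.

```python
def append_line(lines, line):
    """Not idempotent: every call grows the list."""
    return lines + [line]


def ensure_line(lines, line):
    """Idempotent: adds the line only when it is missing."""
    return lines if line in lines else lines + [line]


config = ["PermitRootLogin no"]
setting = "PasswordAuthentication no"

appended_twice = append_line(append_line(config, setting), setting)
ensured_twice = ensure_line(ensure_line(config, setting), setting)

print(len(appended_twice))  # prints 3: the setting was added twice
print(len(ensured_twice))   # prints 2: the second call changed nothing
```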
PRACTICE QUIZ: DEPLOYING PUPPET LOCALLY
1. Puppet evaluates all functions, conditionals, and variables for each individual system, and generates a list of rules for that specific system. What are these individual lists of rules called?
- Manifests
- Dictionaries
- Catalogs (CORRECT)
- Modules
Exactly! The catalog is a list of rules for each system, generated after the server evaluates all variables, conditionals, and functions in the manifest, and compares them with the system’s facts.
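One way to picture catalog compilation is as a function from facts to a flat list of rules. The Python sketch below is only an analogy: `compile_catalog`, the fact names `os_family` and `is_virtual`, and the NTP rules are all assumptions for illustration.

```python
def compile_catalog(facts):
    """Evaluate conditionals against a node's facts and return its rule list."""
    catalog = [("package", "ntp", {"ensure": "latest"})]
    # In this hypothetical manifest the service name depends on the OS family.
    service_name = "ntp" if facts["os_family"] == "Debian" else "ntpd"
    catalog.append(("service", service_name, {"ensure": "running", "enable": True}))
    # Only virtual machines get the extra configuration file in this example.
    if facts["is_virtual"]:
        catalog.append(("file", "/etc/ntp.conf", {"source": "ntp-virtual.conf"}))
    return catalog


laptop = compile_catalog({"os_family": "Debian", "is_virtual": False})
cloud_vm = compile_catalog({"os_family": "RedHat", "is_virtual": True})
print(len(laptop), len(cloud_vm))  # prints "2 3"
```

The same manifest produces a different catalog for each machine, because the conditionals are resolved against that machine's facts.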
2. After we install new modules that were made and shared by others, which folder in the module’s directory will contain the new functions and facts?
- files
- manifests
- lib (CORRECT)
- templates
Thanks! New functions added after installing a new module can typically be found in the lib folder within the directory of the newly installed module.
3. What file extension do manifest files use?
- .cfg
- .exe
- .pp (CORRECT)
- .man
Excellent! Manifest files for Puppet will end in the extension .pp.
4. What is contained in the metadata.json file of a Puppet module?
- Manifest files
- Additional data about the module (CORRECT)
- Configuration information
- Pre-processed data
Thank you! Metadata is essentially data about data, and in this context, it often includes installation and compatibility details for the module.
5. What does Puppet syntax dictate we do when referring to another resource attribute?
- Enter the package title before curly braces
- Follow the attribute with a semicolon
- Capitalize the attribute (CORRECT)
- Type the attribute in lowercase
Great work! When defining resource types, we write them in lowercase, then capitalize them when referring to them from another resource attribute.
6. Which of the following file extensions does the manifest file need to end with in Puppet?
- .cfg
- .pp (CORRECT)
- .db
- .mf
Awesome! Manifest files are where we store the rules to be applied.
7. When we declare a resource type, how do we differentiate between the original resource type and the name of a resource relationship being referenced in another resource?
- Use “==” in place of “=>”.
- Assign it a variable name.
- Use $ before the resource type.
- Use lowercase for the original, and capitalize the resource name when referencing a relationship. (CORRECT)
Thanks! When declaring resources, we use lowercase for the resource type. However, when referencing a resource relationship from another resource, we capitalize the type of the resource being referenced.
8. What do we call a collection of manifests, and folders containing associated data?
- Libraries
- Module (CORRECT)
- Template
- Metadata
Great work! A module is an easy way to organize our configuration management tools.
PRACTICE QUIZ: DEPLOYING PUPPET TO CLIENTS
1. When defining nodes, how do we identify a specific node that we want to set rules for?
- By using the machine’s MAC address
- By specifying the node’s Fully Qualified Domain Name (FQDN) (CORRECT)
- User-defined names
- Using XML tags
Right on! An FQDN is a complete domain name for a specific machine that contains both the hostname and the domain name.
2. When a Puppet agent evaluates the state of each component in the manifest, it uses gathered facts about the system to decide which rules to apply. What tool can these facts be “plugged into” in order to simplify management of the content of our Puppet configuration files?
- Node definitions
- Certificates
- Templates (CORRECT)
- Modules
Nice job! Templates are documents that combine code, system facts, and text to render a configuration output fitting predefined rules.
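The sketch below shows the general idea of plugging facts into a template, using Python's standard `string.Template` as a stand-in. Puppet has its own template formats, so treat the placeholder syntax, the `ntp_server` value, and the `render_config` helper as assumptions for this example only.

```python
from string import Template

# A tiny configuration template with two placeholders filled from facts.
NTP_TEMPLATE = Template(
    "# Managed by configuration management on $fqdn\n"
    "server $ntp_server iburst\n"
)


def render_config(facts):
    """Render the configuration text for one node from its facts."""
    return NTP_TEMPLATE.substitute(
        fqdn=facts["fqdn"],
        ntp_server=facts["ntp_server"],
    )


rendered = render_config(
    {"fqdn": "web01.example.com", "ntp_server": "time.example.com"}
)
print(rendered)
```

One template can then serve every node, with only the fact values changing from machine to machine.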
3. What is the first thing that happens after a node connects to the Puppet master for the first time?
- The node identifies an open port.
- The Puppet master requests third-party authentication.
- The node requests a certificate. (CORRECT)
- The user can immediately add modules.
Awesome! After receiving a certificate, the node will reuse it for subsequent logins.
4. What does FQDN stand for, and what is it?
- Feedback Query Download Noise, which is extraneous data in feedback queries
- Far Quantum Data Node, which is a server node utilizing quantum entanglement
- Fairly Quantized Directory Network, which is a network consisting of equitable counted folders
- Fully Qualified Domain Name, which is the full address for a node (CORRECT)
Thank you! A fully qualified domain name (FQDN) is the complete, unambiguous name for a specific computer or server. It consists of two main parts: the hostname and the domain name.
5. What type of cryptographic security framework does Puppet use to authenticate individual nodes?
- Single Sign On (SSO)
- Public Key Infrastructure (PKI) (CORRECT)
- Fully Qualified Domain Name (FQDN)
- Token authentication
Way to go! Puppet uses a Secure Sockets Layer (SSL) Public Key Infrastructure to authenticate both nodes and masters.
6. In Puppet, what can we use to categorize in order to apply different rules to different systems?
- Node definitions (CORRECT)
- Manifest file
- Array configuration
- Template
Thanks! Different types of nodes are defined, enabling the application of specific sets of rule catalogs to different kinds of machines.
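As an analogy, node definitions behave like a lookup from FQDN to the set of classes that node should receive, with a default for everything else. In the Python sketch below, the host names and class names are assumptions made up for illustration.

```python
# Hypothetical node definitions keyed by fully qualified domain name.
NODE_DEFINITIONS = {
    "webserver.example.com": ["sudo", "apache"],
    "db.example.com": ["sudo", "postgresql"],
}

# Classes applied to any node without its own definition.
DEFAULT_CLASSES = ["sudo"]


def classes_for(fqdn):
    """Return the classes that apply to the node with this FQDN."""
    return NODE_DEFINITIONS.get(fqdn, DEFAULT_CLASSES)


print(classes_for("webserver.example.com"))  # prints ['sudo', 'apache']
print(classes_for("laptop42.example.com"))   # prints ['sudo']
```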
7. What is the purpose of the Certificate Authority (CA)?
- To test rules in the manifest
- To manage templates
- To validate the identity of each machine (CORRECT)
- To handle push/pull requests
Thank you! The Certificate Authority (CA) either queues a certificate request for manual validation or uses pre-shared data for verification before issuing the certificate to the agent.
8. What kind of security encryption is used when the Puppet Certificate Authority validates the identity of a node?
- Secure Sockets Layer (SSL) (CORRECT)
- Secure Shell (SSH)
- Pretty Good Privacy (PGP)
- Transport Layer Security (TLS)
Thanks! The agent generates an SSL key pair and sends a certificate request, which the Certificate Authority validates and signs.
PRACTICE QUIZ: UPDATING DEPLOYMENTS
1. What is a production environment in Puppet?
- The software used for software development such as IDEs.
- The parts of the infrastructure where a service is executed, and served to its users. (CORRECT)
- A cloud service for commercial production.
- A Virtual Machine reserved for beta software.
Thank you! In Puppet, environments are used to isolate software in development from the software that is being served to end users, ensuring that changes are tested before deployment.
2. What is the --noop parameter used for?
- Passing a variable called noop to Puppet
- Adding conditional rules to manifests
- Defining what operations not to perform in a manifest
- Simulating manifest evaluation without taking any actions (CORRECT)
Nice job! No Operations mode makes Puppet simulate what it would do without actually doing it.
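A dry run can be sketched in Python as a function that always computes the changes but only applies them when asked. This is a toy model, not Puppet's implementation; `apply_state` and the state dictionaries are assumptions for the example.

```python
def apply_state(current, desired, noop=False):
    """Return the changes needed to reach `desired`; apply them unless noop."""
    changes = [
        (key, current.get(key), value)
        for key, value in desired.items()
        if current.get(key) != value
    ]
    if not noop:
        for key, _old, new in changes:
            current[key] = new
    return changes


node_state = {"ntp": "stopped"}
wanted = {"ntp": "running", "sudo": "installed"}

planned = apply_state(node_state, wanted, noop=True)
print(planned)     # the changes that would be made
print(node_state)  # still {'ntp': 'stopped'}: nothing was applied
```

Running the same call without `noop=True` would apply the planned changes, after which a further run would report nothing left to do.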
3. What do rspec tests do?
- Check that nodes can connect to the puppet master correctly
- Check the specification of the current node
- Check the manifests for specific content (CORRECT)
- Check that the node is running the correct operating system
Thanks! We can automatically test our manifests using rspec tests, where we can verify that resources exist and ensure their attributes are set to the correct values.
4. How are canary environments used in testing?
- To store unused code
- As a test environment to detect problems before they reach the production environment (CORRECT)
- As a repository for alternative coding methods for a particular problem
- As a test environment for final software versions
Woohoo! If we can identify a problem before it reaches all the machines in the production environment, we’ll be able to keep the problem isolated.
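A simple way to picture a canary rollout is to split the fleet into a small early group and everyone else. The Python sketch below assumes a 10% canary fraction and made-up host names; real deployments choose canaries according to their own policies.

```python
def split_canary(nodes, fraction=0.1):
    """Split nodes into a small canary group and the remaining fleet."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    ordered = sorted(nodes)
    # Always pick at least one canary so a small fleet still gets tested.
    canary_count = max(1, int(len(ordered) * fraction))
    return ordered[:canary_count], ordered[canary_count:]


fleet = [f"web{n:02d}.example.com" for n in range(1, 21)]
canary, rest = split_canary(fleet)
print(canary)     # the first machines to receive the change
print(len(rest))  # machines that wait until the canaries look healthy
```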
5. What are efficient ways to check the syntax of the manifest? (Check all that apply)
- Run full No Operations simulations (CORRECT)
- Run rspec tests (CORRECT)
- Test manually
- puppet parser validate (CORRECT)
For No Operations simulations, we use the --noop parameter when running the rules to simulate changes without applying them.
To test automatically, we run rspec tests and fix any errors in the manifest until the tests pass.
Using the puppet parser validate command is indeed the simplest way to check that the syntax of the manifest is correct before applying it.
6. What does the puppet parser validate command do?
- Checks the syntax of the manifest. (CORRECT)
- Runs full No Operations simulations.
- Tests automatically using facts we set to evaluate the resulting catalog.
- Forcibly applies manifests locally.
Great work! The puppet parser validate command checks the syntax of the manifest to make sure it’s correct.
7. What is the purpose of using multiple environments?
- To fully isolate the configurations that agents see. (CORRECT)
- To automate testing.
- To add variety.
- To detect potential issues before they reach the other computers.
Exactly! By creating separate directories for different purposes, like testing and production, we can isolate changes and ensure that updates or modifications don’t impact end users.
PRACTICE QUIZ: MONITORING & ALERTING
1. What is a Service Level Agreement?
- An agreement between the user and developer.
- A strict commitment between a provider and a client. (CORRECT)
- An agreement between service providers.
- A guarantee of service quality.
Thanks! A service-level agreement (SLA) is a formal arrangement between two or more parties, typically between a client and service providers, that outlines the expected level of service and performance.
2. What is the most important aspect of an alert?
- It must be actionable. (CORRECT)
- It must require a human to be notified.
- It must require immediate action.
- It must precisely describe the cause of the issue.
Right on! If an alert notification is not actionable, it should not be an alert at all.
3. Which part of an HTTP message from a web server is useful for tracking the overall status of the response and can be monitored and logged?
- A triggered alert
- The data pushed back to the client
- Metrics sent from the server
- The response code in the server’s message (CORRECT)
Nice job! We can log and monitor these response codes, and even use them to set alert conditions.
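For instance, a monitoring script might group logged response codes by class and track the share of server errors. The Python sketch below uses a hard-coded list of codes as a stand-in for real log data, and the helper names are assumptions for the example.

```python
from collections import Counter


def summarize_status_codes(codes):
    """Count responses per status class (2xx, 3xx, 4xx, 5xx)."""
    return Counter(f"{code // 100}xx" for code in codes)


def server_error_rate(codes):
    """Fraction of responses that are 5xx server errors."""
    if not codes:
        return 0.0
    return sum(1 for code in codes if 500 <= code < 600) / len(codes)


observed = [200, 200, 404, 500, 200, 503, 301, 200]
summary = summarize_status_codes(observed)
print(summary["2xx"], summary["5xx"])  # prints "4 2"
print(server_error_rate(observed))     # prints 0.25
```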
4. To set up a new alert, we have to configure the _____ that triggers the alert.
- Condition (CORRECT)
- Metric
- Incident
- Service Level Objective (SLO)
Excellent! We must define what occurrence or metric threshold will serve as a conditional trigger for our alert.
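An alert condition can be as simple as "the metric stayed above a threshold for several samples in a row". The Python sketch below is one possible shape for such a check; the function name, the 0.9 threshold, and the three-sample window are assumptions chosen for illustration.

```python
def should_alert(samples, threshold, min_consecutive):
    """True when the last `min_consecutive` samples all exceed `threshold`."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    if len(samples) < min_consecutive:
        return False
    return all(value > threshold for value in samples[-min_consecutive:])


disk_usage = [0.50, 0.92, 0.95, 0.97]
print(should_alert(disk_usage, threshold=0.9, min_consecutive=3))  # prints True
```

Requiring several consecutive samples is a common way to avoid firing on a single brief spike.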
5. When we collect metrics from inside a system, this is known as ______ monitoring.
- White-box (CORRECT)
- Black-box
- Network
- Log
Great work! A white-box monitoring system is one that collects metrics internally, from within the system being monitored.
6. Which of the following monitoring models is being used if our monitoring system requires our service to actively send metrics?
- Push model (CORRECT)
- Pull model
- Error monitoring
- Resource monitoring
Awesome! When push monitoring is used, the service being monitored actively sends metrics to the monitoring system.
7. What do we call an alert that requires immediate attention?
- Ticket
- Page (CORRECT)
- Cron job
- Bug report
Nice job! Pages are alerts that need immediate human attention, and are often in the form of SMS or email.
8. If our service has a Service Level Objective (SLO) of four-nines, what is our error budget measured in downtime percentage?
- .001%
- 1%
- .1%
- .01% (CORRECT)
Nice job! If we have an SLO of 99.99%, that gives us an error budget of .01%.
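The arithmetic can be checked with a short Python sketch that also converts the budget into minutes of downtime. The 30-day period is an assumption for the example; the SLO itself does not say which window to measure over.

```python
def error_budget(slo_percent, period_days=30):
    """Return (budget in percent, allowed downtime in minutes) for an SLO."""
    budget_percent = 100.0 - slo_percent
    period_minutes = period_days * 24 * 60
    downtime_minutes = period_minutes * budget_percent / 100.0
    return budget_percent, downtime_minutes


budget, minutes = error_budget(99.99)
print(f"{budget:.2f}% error budget")           # prints "0.01% error budget"
print(f"{minutes:.2f} minutes over 30 days")   # prints "4.32 minutes over 30 days"
```

So four nines over a 30-day window leaves a little over four minutes of total downtime.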
9. What type of policy requires us to set up a condition which notifies us when it’s triggered?
- Login Policy
- Alerting Policy (CORRECT)
- Security Policy
- Bug Reporting Policy
Thank you! An Alerting Policy defines the conditions that trigger alerts and outlines the actions to be taken, such as sending a notification to an email address when those conditions are met.
PRACTICE QUIZ: TROUBLESHOOTING & DEBUGGING
1. Which of the following are valid strategies for recovery after encountering service failure? (Select all that apply.)
- Switching to a secondary instance. (CORRECT)
- Setting up monitoring and alerts.
- Restoring from backup. (CORRECT)
- Performing a rollback to a previous version. (CORRECT)
Having a secondary instance of the VM running your service allows for quick recovery by switching over if needed.
Keeping frequent backups ensures that restoring a previous VM image can help you quickly get back on track.
If the issue is due to recent changes or updates, rolling back to a previous working version of the service or supporting software gives you the time to investigate and resolve the issue.
2. Which of the following concepts provide redundancy? (Select all that apply.)
- Having a secondary instance of a VM. (CORRECT)
- Having a secondary Cloud vendor. (CORRECT)
- Having automatic backups configured.
- Performing a rollback.
If your primary VM instance fails, having a secondary instance running in the background ready to take over ensures instant failover and minimizes downtime.
Keeping a secondary Cloud service provider with your data ready can offer redundancy, providing an additional layer of protection in case of large-scale outages with your primary provider.
3. If you operate a service that stores any kind of data, what are some critical steps to ensure disaster recovery? (Select all that apply)
- Implement automated backups (CORRECT)
- Use redundant systems wherever possible
- Test backups by restoring (CORRECT)
- Never delete old backups
As long as we have viable backup images, we can restore the VM running our service quickly in case of failure.
It’s crucial to ensure that our backup process is working correctly. In a recovery situation, having no backups would be a major issue, so regular testing and verification of backups is essential.
4. What is the correct term for packaged applications that are shipped with all needed libraries and dependencies, and allows the application to run in isolation?
- Rollback
- Secondary instance
- Containers (CORRECT)
- Disk Image
Great job! Containerization ensures that our software runs the same way every time.
5. Using a large variety of containerized applications can get complicated and messy. What are some important tips for solving problems when using containers? (Select all that apply)
- Use extensive logging in all parts (CORRECT)
- Reduce the number of containers
- Reuse container configurations
- Use test instances (CORRECT)
By ensuring we have the right logs in the right places, we can quickly identify where problems are occurring and address them efficiently.
Testing and retesting our configuration at every opportunity is key to making sure everything is working properly and preventing future issues.
6. Which of the following is a valid method of troubleshooting a cloud service? (Select all that apply)
- Physically inspect the machine’s connections.
- Power cycle the hardware
- Run a test VM in a test environment (CORRECT)
- Call the service provider (CORRECT)
Testing through software is indeed the best approach in the cloud, as it allows for automation and consistency in verifying that everything works as expected.
One of the key benefits of running services in the Cloud is that you’re not responsible for everything! Cloud providers offer various levels of support, allowing you to focus on your core operations while they handle much of the infrastructure and maintenance.
7. When troubleshooting, what is it called when an error or failure occurs, and the service is downgraded to a previous working version?
- Reinstall
- Rollback (CORRECT)
- Restore
- Redo
Great work! Rollback is the process of restoring a database or program to a previously defined state, usually to recover from an error.
8. Which of the following are important aspects of disaster recovery? (Select all that apply)
- Having multiple points of redundancy (CORRECT)
- Having a well-documented disaster recovery plan (CORRECT)
- Having automatic backups (CORRECT)
- Eliminating failure in the first place
Implementing multiple forms of redundancy and failover minimizes the impact of failures, ensuring continued operation even in challenging situations.
To get things up and running quickly, having a detailed plan in place is essential for swift and efficient recovery.
Automatic backups simplify the recovery process, making it easier to restore services without delays or manual intervention.
CONCLUSION – Configuration management and monitoring
This module has provided a thorough understanding of cloud services and their diverse functionalities, including SaaS, PaaS, and IaaS. You’ve learned about scaling techniques, such as horizontal vs. vertical scaling, as well as automatic vs. manual scaling, helping you choose the best approach for your specific needs. You’ve gained insight into the varying levels of control provided by different cloud service models, which will guide your decision-making for business requirements.
Additionally, you now have a solid understanding of migration strategies, including lift-and-shift approaches, for transitioning businesses to the cloud. The module has also equipped you with practical skills for managing cloud instances, such as selecting regions and customizing virtual machines for scalable deployments. You’ve explored the automation of cloud deployments, covering critical concepts like load balancing, autoscaling, orchestration, and infrastructure as code (IaC).
With the knowledge and skills you’ve gained from this module, you are now well-prepared to leverage cloud services to enhance scalability, efficiency, and automation in your business operations.