Course 4 – System Administration and IT Infrastructure Services

Week 5: Data Recovery & Backups

In the fifth week of this course, we’ll focus on data recovery and backup strategies. Understanding how to back up and recover data is crucial in any tech role, especially for system administrators. Additionally, we’ll explore common corporate practices such as creating a disaster recovery plan and writing post-mortem documentation. By the end of this module, you will grasp the trade-offs between on-site and off-site backups, recognize the significance of backup and recovery testing, explore various data backup options along with their associated risks, and understand the purpose and structure of a disaster recovery plan.

Learning Objectives:

  • Understand the trade-offs between on-site vs. off-site backups
  • Learn what characteristics to evaluate when designing a backup system
  • Understand the value and importance of backup and recovery testing
  • Explore different data backup options and the risks they protect against
  • Comprehend the purpose and contents of a disaster recovery plan

PRACTICE QUIZ: PLANNING FOR DATA RECOVERY

1. How can you recover from an unexpected data loss event? Check all that apply.

  • Write a post-mortem report.
  • Recover data from damaged devices. (CORRECT)
  • Restore data from backups. (CORRECT)
  • Design a disaster recovery plan.

Nice job! If a hard drive or device becomes damaged or fails, you can use specialized software to attempt data recovery. If data becomes corrupted or is accidentally deleted, you can restore it from a backup.

2. Where is it best to store backups, physically?

  • On-site
  • Off-site
  • In a safe
  • Across multiple locations (CORRECT)

Exactly! Ideally, backups should be stored in multiple physical locations to minimize the risk of a single catastrophe causing loss of both data and backups. This typically involves local backups that are then replicated and stored off-site for additional protection.

3. Which of these should be included in your backups? Check all that apply.

  • Firewall configurations (CORRECT)
  • A downloads folder
  • Family vacation photos
  • Sales databases (CORRECT)

Great work! Critical data for an organization, like firewall configs and relevant databases, should be included in the backup plans.

4. What’s magnetic tape backup media best suited for?

  • Long-term archival data  (CORRECT)
  • Low-latency cached data
  • Cheap backup systems
  • Quick and efficient backups

Exactly! Magnetic tape media is an affordable option for storing large amounts of information. However, its slow recovery speeds and inconvenience make it best suited for archiving old information that doesn’t need frequent access.

5. Why is it important to test backups and restoration procedures? Check all that apply.

  • To reduce the size of backup data
  • To speed up the backup-and-restore process
  • To ensure backups work and data can be restored from them (CORRECT)
  • To ensure that relevant data is included in the backups (CORRECT)

Exactly! Testing backups and restore procedures is crucial to verify that everything works as expected. It ensures data integrity and confirms that you can successfully restore critical information in case of a disaster. Regular disaster recovery testing helps identify potential gaps in the backup strategy and strengthens the overall resilience of the system without the risk of actual data loss.
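One way to picture a restore test is to compare checksums of the original files against a restored copy. The sketch below is a minimal illustration, not a complete testing procedure; the function names `sha256_of` and `verify_restore` are made up for this example, and it assumes the restored data has already been written to a separate directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return relative paths that are missing or differ in the restored copy.

    Hypothetical helper: an empty list means every source file was
    found in the restored directory with identical contents.
    """
    source_dir = Path(source_dir)
    restored_dir = Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(rel.as_posix())
    return problems
```

An empty result only shows that the files checked match; it says nothing about whether the right files were included in the backup in the first place, which is the second reason the quiz gives for testing.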

6. Which of the following backup types are most space-efficient?

  • Full backups
  • Differential backups
  • Incremental backups (CORRECT)

Exactly! Incremental backups are a great way to save time and storage space. After the initial full backup, they only capture the changes made since the last backup, making them faster and more efficient compared to full backups every time. This helps in reducing the storage overhead while ensuring that recent changes are preserved.

7. True or false: You can use a RAID array and use rsync to copy critical data to it for backups.

  • True
  • False (CORRECT)

That’s right! RAID (Redundant Array of Independent Disks) provides fault tolerance and improves performance by using multiple disks, but it does not protect against data corruption, accidental deletion, or disasters. It’s important to have a separate backup strategy to ensure your data is recoverable in such cases. RAID is useful for maintaining system uptime and data availability, but it shouldn’t be relied upon as a sole solution for data protection.

PRACTICE QUIZ: DISASTER RECOVERY PLANS

1. What elements should a disaster recovery plan cover? Check all that apply.

  • Detection measures (CORRECT)
  • Preventative measures (CORRECT)
  • Recovery measures (CORRECT)
  • Drastic measures

Absolutely! A comprehensive disaster recovery plan should indeed encompass detection, prevention, and recovery. By incorporating detection measures, such as continuous monitoring systems and incident alerts, you ensure quick awareness of any issues. Preventative measures, such as regular security updates, access control, and redundancy in critical systems, help mitigate the risk of disasters. These steps reduce the likelihood of incidents and help maintain operational resilience, enabling a faster and more effective recovery when issues do arise.

2. Why are detection measures included in a disaster recovery plan?

  • They aren’t
  • Because it’s important to know when a disaster occurs (CORRECT)
  • Because they prevent data loss
  • Because they make recovering from data loss easier

Awesome! The sooner you can be alerted that an incident is going on, the quicker you can take measures to stop it.

3. What are good preventative measures to incorporate in your organization? Check all that apply.

  • Monitoring and alerting systems
  • Redundant systems (CORRECT)
  • Regular backups (CORRECT)
  • Accessible and up-to-date documentation (CORRECT)

Exactly! Well-documented procedures are crucial for preventing confusion and mistakes during critical situations. Clear and detailed documentation ensures that team members know exactly what to do during an outage or failure, minimizing downtime and reducing the risk of errors. It also helps with training, onboarding new staff, and ensuring consistency in response across the team. Good documentation can also assist in post-incident analysis, making it easier to identify what went wrong and how to prevent similar issues in the future.

4. What are good detection measures to incorporate in your organization? Check all that apply.

  • Environmental monitoring (CORRECT)
  • Backing up firewall rules
  • Redundant power supplies
  • System performance monitoring (CORRECT)

You’re absolutely right! Monitoring environmental conditions, such as temperature, humidity, airflow, and power supply, in a server room is vital for maintaining hardware longevity and preventing issues like overheating, which can lead to equipment failures. Additionally, system performance monitoring plays a crucial role in identifying potential threats or performance bottlenecks before they become critical. By tracking metrics like CPU usage, memory, disk activity, and network traffic, you can detect anomalies or unusual spikes in traffic, which could indicate anything from malicious attacks to simple overloading. Early detection in both cases allows for prompt corrective action and helps ensure system reliability.
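As a rough illustration of system performance monitoring, the sketch below checks disk usage against a threshold and returns an alert message when it is exceeded. It is a simplified example using only the Python standard library; the function names and the 90% default threshold are assumptions chosen for illustration, and a real deployment would normally rely on a dedicated monitoring system.

```python
import shutil


def usage_percent(used, total):
    """Convert used/total byte counts into a percentage."""
    return 100.0 * used / total


def check_disk(path="/", warn_at=90.0):
    """Return an alert string if the disk holding `path` is too full.

    `warn_at` is a hypothetical threshold in percent; returns None
    when usage is below it.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    percent = usage_percent(usage.used, usage.total)
    if percent >= warn_at:
        return f"ALERT: {path} is {percent:.1f}% full"
    return None
```

The same pattern (measure, compare to a threshold, alert) applies to the other metrics mentioned above, such as CPU, memory, and network traffic.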

5. What are good recovery measures to incorporate in your organization? Check all that apply.

  • Restoring server configs from backup (CORRECT)
  • Following detailed recovery plan documentation (CORRECT)
  • Monitoring for internet line outages
  • Maintaining redundant servers

Correct! Maintaining backups of server configs will make restoring a damaged system much quicker and easier. Detailed documentation on how exactly to do this will also speed up this process.

PRACTICE QUIZ: POST-MORTEMS

1. What’s the intent behind writing a post-mortem?

  • To assign blame for mistakes 
  • To assign legal liability
  • To learn from mistakes and improve in the future (CORRECT)
  • To scare people into avoiding risky behavior

Yep! A post-mortem is meant to analyze what happened around an incident to identify what went wrong so it can be avoided in the future.

2. What should the timeline in a post-mortem include? Check all that apply.

  • Actions taken before, during, and after the event (CORRECT)
  • A detailed analysis of the incident, including root cause and scope
  • Detailed dates and times (CORRECT)
  • A summary of the incident and how long it lasted

Including timestamps for each action helps with accuracy and provides insight into how quickly issues were detected and addressed, which is essential for improving future responses and overall incident management.

QUIZ: DATA RECOVERY & BACKUPS

1. What’s the optimal recommended backup storage strategy?

  • On-site backups
  • Off-site backups
  • A combination of on- and off-site backups (CORRECT)
  • Tape backups

Yep! Ideally, backups would be stored both on- and off-site to reduce the chances of all your backups being wiped out in a disaster.

2. How can you verify that your disaster recovery plan will be effective? Check all that apply.

  • Through thorough testing (CORRECT)
  • By waiting for a disaster
  • Through disaster simulations (CORRECT)

Great! Testing different elements of your recovery plan will verify that the documented procedures function as intended. This can be done by simulating disaster scenarios and having teams follow the established procedures.

3. What’s the purpose of a post-mortem report?

  • To learn from mistakes (CORRECT)
  • To test systems
  • To identify those in the wrong
  • To assign legal liability

Yep! A post-mortem report is designed to analyze an event in order to learn from mistakes.

4. What is the single most important part of data recovery?

  • Creating power redundancies
  • Effectively backing up data (CORRECT)
  • Port forwarding
  • Stocking replacement drives

Exactly! One of the key techniques you’ll master is how to back up your data effectively. Disaster recovery plans should include routine backups of all critical data essential for your business operations. Without the data to recover, recovery becomes impossible!

5. What are advantages of on-site backups? Check all that apply.

  • There is less bandwidth usage. (CORRECT)
  • There is quicker data access. (CORRECT)
  • Data is more secure because of less outbound traffic. (CORRECT)
  • Data is safe in case of disaster.

Thank you! By avoiding off-site backups, you reduce the amount of outbound bandwidth used.

Great point! One benefit of on-site backup solutions is the physical proximity of the data, which allows for faster access when needed.

Exactly! Increased outbound traffic, even for backups, raises the risk of data being intercepted. On-site backups help mitigate this security vulnerability.

6. Which of these are common backup tools you might consider when designing a backup solution? Check all that apply.

  • Regedit
  • Backup and Restore (CORRECT)
  • Time Machine (CORRECT)
  • Rsync (CORRECT)
 

Thank you! Microsoft offers a first-party solution called Backup and Restore, which supports multiple backup modes, including complete, incremental, and differential backups.

Great! Apple’s first-party backup software, Time Machine, primarily supports incremental backups to efficiently capture changes over time.

Exactly! Rsync is a popular backup tool known for automating backups. It supports compression and SSH, and its main function is transferring files between computers, making it a versatile choice for various backup needs.
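To make the rsync description concrete, here is a small sketch that assembles an rsync command line from Python. The options used (`-a` for archive mode, `-z` for compression, `-e ssh` to use SSH as the transport) are standard rsync options; the helper names, the host `backup.example.com`, and the paths in the usage note are hypothetical.

```python
import subprocess


def build_rsync_command(source, destination, use_ssh=True, compress=True):
    """Build an rsync argument list without running it."""
    command = ["rsync", "-a"]  # -a: archive mode (recursive, keeps metadata)
    if compress:
        command.append("-z")  # -z: compress data during transfer
    if use_ssh:
        command += ["-e", "ssh"]  # -e ssh: use SSH as the remote shell
    command += [source, destination]
    return command


def run_backup(source, destination):
    """Run the transfer; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source, destination), check=True)
```

For example, `build_rsync_command("/srv/data/", "backup@backup.example.com:/backups/data/")` produces the arguments for a compressed transfer over SSH. Scheduling a call like `run_backup` (for instance from cron) is one way the automation mentioned above might be set up.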

7. Cloud services are the ideal backup option for user files. Which of these is not one of today’s popular cloud storage platforms?

  • Rsync (CORRECT)
  • Dropbox
  • Apple iCloud
  • Google Drive

You got it! Rsync is a command line tool used to compress and transfer files from one computer to another. While it is often used for backup, it’s not a cloud service.

8. Which of these are components of a post-mortem report? Check all that apply.

  • Detailed timeline of key events (CORRECT)
  • All log data
  • Explanation of resolution and recovery efforts (CORRECT)
  • Brief summary (CORRECT)

Great! The report should capture all details of the event, including when it began, when the involved parties were notified or became aware, and every action taken to address the situation. Be sure to include the time and date of each entry for accuracy.

Nice! The report should outline the recovery steps, the reasoning behind each action, and the results of those actions. Providing context helps readers understand how the event unfolded and the decision-making process.

Exactly! Start with a concise summary of the incident, including its nature, duration, impact, and how it was resolved. This helps provide a clear overview of the event before diving into the details.

9. What types of backup schemes are available? Check all that apply.

  • Full backup (CORRECT)
  • Differential backup (CORRECT)
  • Incremental backup (CORRECT)
  • Partial backup

Woohoo! A full backup makes a copy of all files on each run. A differential backup starts with a full backup and only backs up changed files on each subsequent run. An incremental backup also begins with a full backup, but only backs up the portions of files that have changed on future runs.
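The difference between the schemes can be sketched as a file-selection rule. The example below is a simplification that works at whole-file granularity (it does not track changed portions within a file) and assumes the common convention that a differential run compares against the last full backup while an incremental run compares against the most recent backup of any kind. The function name and the timestamps are hypothetical.

```python
def files_to_back_up(mtimes, scheme, last_full, last_any):
    """Pick which files a backup run would copy.

    mtimes:    dict mapping file path -> last-modified timestamp
    last_full: timestamp of the most recent full backup
    last_any:  timestamp of the most recent backup of any type
    """
    if scheme == "full":
        return sorted(mtimes)  # copy everything, changed or not
    if scheme == "differential":
        cutoff = last_full  # everything changed since the last full backup
    elif scheme == "incremental":
        cutoff = last_any  # only what changed since the previous run
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return sorted(path for path, mtime in mtimes.items() if mtime > cutoff)


# Illustrative timestamps: full backup at t=200, last incremental at t=300.
mtimes = {"a.txt": 100, "b.txt": 250, "c.txt": 400}
print(files_to_back_up(mtimes, "full", 200, 300))
print(files_to_back_up(mtimes, "differential", 200, 300))
print(files_to_back_up(mtimes, "incremental", 200, 300))
```

In this example the full run copies all three files, the differential run copies `b.txt` and `c.txt`, and the incremental run copies only `c.txt`, which is why incremental backups tend to need the least storage per run.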

10. What are some ways you can make your backups more space-efficient? Check all that apply.

  • Use full backups only
  • Use compression (CORRECT)
  • Use encryption
  • Use incremental backups (CORRECT)

Exactly! Compression reduces the storage space needed for backup data, making it more efficient. Incremental backups further enhance efficiency by only storing the changes made since the last backup, rather than backing up all data or files each time. This approach minimizes both storage requirements and backup time.
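As a small illustration of compression, the sketch below packs a directory into a gzip-compressed tar archive using Python's standard `tarfile` module and reports the archive size. The function name is an assumption made for this example; how much space is saved depends heavily on the data, since text and logs compress well while already-compressed media barely shrinks.

```python
import tarfile
from pathlib import Path


def create_compressed_archive(source_dir, archive_path):
    """Write source_dir into a .tar.gz archive and return its size in bytes."""
    source_dir = Path(source_dir)
    # "w:gz" opens the archive for writing with gzip compression.
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps only the directory name, not the full absolute path.
        archive.add(source_dir, arcname=source_dir.name)
    return Path(archive_path).stat().st_size
```

The archive should be written outside `source_dir`, otherwise the archive file itself may be picked up while the directory is being added.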

11. Which backup type only backs up files that have changed since the last run?

  • Full backup
  • Differential backup (CORRECT)
  • Incremental backup
  • Partial backup

You nailed it! A differential backup only saves files that have changed or been created since the last full backup.

12. Which is an advantage of off-site backups?

  • Data is safe in case of disaster. (CORRECT)
  • There is quicker data access.
  • Data is more secure because of less outbound traffic.
  • There is less bandwidth usage.

Thank you! You’re absolutely right—on-site backups are at risk of being lost in the event of a disaster. Off-site backups provide physical redundancy, ensuring your data remains safe and accessible even if something happens to your on-site systems.

13. Which type of backup only saves copies of files that have been changed or created since the last backup?

  • RAID array
  • Complete backup
  • Differential backups (CORRECT)
  • Incremental backup

Exactly! The main advantage of a differential backup is its efficiency. It avoids duplicating unchanging data by only backing up files that have changed or been created since the last full backup. This helps save storage space while ensuring that only the necessary updates are captured.

14. Which backup method saves copies of all important files and data at each backup?

  • RAID array
  • Complete backup (CORRECT)
  • Incremental backup
  • Differential backups

Thank you! A complete backup involves copying all information, including the full unmodified contents of every file, regardless of whether they changed. While this ensures a complete backup, it can be inefficient if done often, especially with large volumes of unchanging files.

15. A disaster recovery plan is a collection of documented procedures and plans on how to react and handle an emergency or disaster scenario from an operational perspective. What are important elements of a disaster recovery plan? Check all that apply.

  • Disciplinary measures
  • Corrective or recovery measures (CORRECT)
  • Detection measures (CORRECT)
  • Preventative measures (CORRECT)
 

Exactly! Corrective or recovery measures are actions taken after a disaster to restore systems and data, such as restoring lost data from backups or rebuilding damaged systems.

Absolutely! Detection measures are designed to alert your team when a disaster has occurred, allowing for a quick response to minimize its impact on operations.

Spot on! Preventative measures are proactive steps taken before a disaster strikes, such as regular backups and redundant systems. These measures aim to reduce the overall downtime and minimize the impact of a potential disaster.

16. What’s a commonly overlooked part of a post-mortem report?

  • The summary
  • The timeline
  • What went poorly
  • What went well (CORRECT)

Right on! It’s just as important to document the systems that worked correctly and helped to mitigate the disaster!

17. You are performing a network risk assessment to develop your disaster recovery plan. Which of these are examples of corrective or recovery measures? Check all that apply.

  • Redundancy solutions
  • Hardware repair and replacement (CORRECT)
  • Restoring data from backup (CORRECT)
  • Rebuilding and reconfiguring services (CORRECT)

Nice job! In the case of physical damage during a disaster, nonfunctioning parts or devices may need to be replaced.

Nice job! Restoring lost data is a critical part of restoring operations.

Nice job! To restore operations, servers need to be rebuilt and reset, and services need to be restored.

18. When planning a backup strategy, ideally one needs to prioritize important data and only back up what is absolutely necessary for operations. Assuming storage limitations, which of these is LEAST important to back up?

  • Emails
  • User downloads (CORRECT)
  • Databases
  • Financial spreadsheets

Thank you! You’re absolutely right: unnecessary files like photos, games, and other downloads can quickly consume valuable storage space and time. It’s important to focus on backing up only critical data to keep the process efficient and prevent unnecessary strain on your resources.

19. You are performing a network risk assessment to develop your disaster recovery plan. Which of these are examples of detection measures? Check all that apply.

  • Monitoring system testing (CORRECT)
  • Testing your own (and users’) knowledge and readiness for disaster (CORRECT)
  • Using an alert system for outages (CORRECT)
  • Conducting regular, automated backups

Thank you! It’s crucial to test your monitoring and alert systems by simulating conditions that they are designed to catch. This ensures that the detection thresholds trigger alerts as intended.

Exactly! Alerts and monitoring are only valuable when everyone knows how to act on them. Regular disaster tests ensure that your team is prepared to respond swiftly and effectively.

Right on! If uptime and availability are critical to your organization, having a thorough system in place that can quickly detect and alert you to service outages is essential to maintaining operational continuity.

20. The unthinkable happens and disaster strikes, crippling your network. You implement your disaster plan, but it doesn’t go smoothly. You decide to investigate. What is a common term in the IT community for this investigation?

  • Post-mortem (CORRECT)
  • Post-disaster inquiry
  • After-outage analysis
  • Recovery inspection probe

Exactly! A post-mortem is a valuable process for documenting any issues that arose during the disaster recovery process, identifying what went wrong, and outlining steps to prevent similar problems in the future. It’s a key part of improving your disaster plan over time.

21. What is the standard medium for long-term archival backup data storage?

  • USB drives
  • Optical disks
  • Magnetic tapes (CORRECT)
  • Floppy disks

Right on! Tape storage is slow but cheap, and has become the standard medium for archival backups.

22. You are performing a network risk assessment to develop your disaster recovery plan. Which of these are examples of preventative measures? Check all that apply.

  • Operational documentation (CORRECT)
  • Regular, automated backups (CORRECT)
  • Alert system for outages
  • Redundancy solutions (CORRECT)
 

Thank you! It’s essential to document every important operational procedure and make sure that all critical steps are accessible, so nothing is overlooked in the event of a disaster.

Exactly! Regular, automated backups are ideal for ensuring that systems are backed up both on-site and off-site, providing robust protection for your data.

Spot on! If something is critical for operations, having a redundant spare is a wise practice, ensuring that you can quickly recover in case of failure.

23. Which of these are part of the five primary elements that make up a post-mortem report? Check all that apply.

  • A summary (CORRECT)
  • Backup procedures
  • A timeline (CORRECT)
  • Resolution and recovery steps (CORRECT)
  • A root cause description (CORRECT)
  • Recommended future action items (CORRECT)

Excellent! The report would include (1) a summary; (2) a detailed timeline of events; (3) an analysis of the root cause; (4) an explanation of steps taken for resolution and recovery; and (5) recommendations to prevent a similar event from occurring again.
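The five elements listed above can be pictured as a simple record with one field per section. The sketch below is only an illustrative template; the class name, field names, and plain-text layout are assumptions for this example, not a standard format.

```python
from dataclasses import dataclass, field


@dataclass
class PostMortem:
    """Hypothetical container for the five elements of a post-mortem."""
    summary: str
    timeline: list = field(default_factory=list)  # (timestamp, event) pairs
    root_cause: str = ""
    resolution: str = ""
    action_items: list = field(default_factory=list)

    def render(self):
        """Lay the report out as plain text, one section per element."""
        lines = ["Summary", self.summary, "", "Timeline"]
        lines += [f"{when}  {what}" for when, what in self.timeline]
        lines += ["", "Root cause", self.root_cause]
        lines += ["", "Resolution and recovery", self.resolution]
        lines += ["", "Future action items"]
        lines += [f"- {item}" for item in self.action_items]
        return "\n".join(lines)
```

Keeping the timeline as timestamped pairs reflects the earlier point that detailed dates and times belong in every timeline entry.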

24. Budget constraints aside, what is the ideal backup solution?

  • Both on-site and off-site backup (CORRECT)
  • No backup
  • On-site backup
  • Off-site (cloud) backup

Woohoo! It’s often recommended to have both on-site and off-site backups if it’s within your organization’s budget.

25. What are good reasons to do yearly disaster recovery testing? Check all that apply.

  • To create downtime
  • To identify additional vulnerabilities (CORRECT)
  • To allow others with the right access to restore operations (CORRECT)
  • To be prepared for all possible snags or problems (CORRECT)

Testing for Data Loss: Regularly test backup and recovery procedures to ensure that no data is lost during the recovery process. This includes verifying backup integrity and conducting dry-run recoveries to simulate real-life scenarios.

Documentation: Make sure the restoration procedures are well-documented, easily accessible, and regularly updated. This documentation should cover all critical processes and steps for restoring operations. If you’re unavailable, anyone with the proper access and training can step in and restore service.

Regular Testing and Review: Conduct yearly recovery scenario testing to identify any gaps in the plan. This helps ensure that backup files are valid, the recovery process is clear, and all stakeholders are familiar with their roles in case of an emergency. The more thoroughly you test, the better prepared you’ll be.

26. What options are available for storing backups, physically?

  • On-site only
  • Off-site only
  • Both on-site and off-site (CORRECT)
  • There’s no need to store physical backups anymore

By having both on-site and off-site backups, you ensure faster recovery times with local backups, while also protecting against site-wide disasters with remote backups. This dual approach strengthens your disaster recovery strategy.

27. What are some of the purposes of a post-mortem? Check all that apply.

  • To understand the cause of mistakes and how to prevent them (CORRECT)
  • To foster a culture where it’s OK to make mistakes (CORRECT)
  • To learn from and adapt processes for higher efficiency (CORRECT)
  • To shame and punish mistakes

Great work! The purpose of a post-mortem is to understand the root cause of why mistakes occurred and how to prevent them from happening again.

Great work! Sharing post-mortems with other teams within an organization helps foster a culture of learning from mistakes.

Great work! The content in the post-mortem report might spark insights for other teams, helping them realize they have a similar problem in their infrastructure. You may also uncover areas for improvement that fall under the responsibility of teams not directly involved in the incident.

28. Which type of backup only saves the parts of data that have changed within files since the last backup took place?

  • Differential backups
  • Incremental backup (CORRECT)
  • RAID array
  • Complete backup

Correct: Woohoo! An incremental backup only backs up the data that has changed since the last backup, regardless of whether it was a full or incremental backup. This method is more efficient in terms of both disk space and the time required compared to differential backups.

29. Common concerns with backups are disk failure, lack of redundancy, and the necessity of future growth of disk capacity. Which backup method addresses these concerns and has multiple levels of use depending on how you want to prioritize features like performance, capacity, or reliability?

  • Incremental backup
  • Differential backups
  • Complete backup
  • RAID array (CORRECT)

Great work! RAID stands for Redundant Array of Independent Disks. It is a method of using multiple physical disks and combining them into a single virtual disk. There are various RAID configurations, known as levels, each offering different balances of performance, redundancy, and storage capacity.
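The capacity trade-off between RAID levels can be shown with a little arithmetic. The sketch below computes usable capacity for a few common levels, assuming all disks are the same size; the function name is made up for this example, and real arrays lose some additional space to metadata and formatting.

```python
def usable_capacity(level, disks, disk_size_tb):
    """Return usable capacity in TB for an array of equal-sized disks."""
    minimum = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 0:
        return disks * disk_size_tb  # striping: all space, no redundancy
    if level == 1:
        return disk_size_tb  # mirroring: every disk holds the same data
    if level == 5:
        return (disks - 1) * disk_size_tb  # one disk's worth of parity
    if level == 6:
        return (disks - 2) * disk_size_tb  # two disks' worth of parity
    if disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return (disks // 2) * disk_size_tb  # striped mirrors: half the raw space


print(usable_capacity(5, 4, 2.0))  # four 2 TB disks in RAID 5 -> 6.0
```

With four 2 TB disks, RAID 0 offers 8 TB with no redundancy, RAID 5 offers 6 TB and tolerates one disk failure, and RAID 6 or RAID 10 offer 4 TB with stronger fault tolerance.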

30. What are the measures included in a disaster recovery plan?

  • Preventative (CORRECT)
  • Detection (CORRECT)
  • Testing
  • Recovery (CORRECT)

Correct: Great job! A disaster recovery plan should incorporate preventative, detection, and recovery measures.
