Week 4: Verify and Report on your Cleaning Results
VERIFY AND REPORT ON YOUR CLEANING RESULTS INTRODUCTION
This part of Google's Data Analytics Professional Certificate program on Coursera covers verifying and reporting on your data-cleaning efforts. You will learn how to verify and report the results of your data-cleaning processes correctly, and why doing so is necessary. Through hands-on activities with real-life datasets, you will practice a range of data-cleaning techniques and document what you are doing and why you are doing it.
Writing a clear, concise summary is another part of mastering these skills, so that others understand which data-cleaning steps were taken and the rationale behind them. Verification and reporting make the analysis reliable and trustworthy for future users of the data.
Learning Objectives:
Understand the process of verifying data-cleaning results
Learn the various steps required to manually clean data
Discover what a data-cleaning report should include
Identify the benefits of documenting the data-cleaning processes.
TEST YOUR KNOWLEDGE ON MANUAL DATA CLEANING
1. Making sure data is properly verified is an important part of the data-cleaning process. Which of the following tasks are involved in this verification? Select all that apply.
Considering whether the data is credible and appropriate for the project. (Correct)
Manually fixing any errors found in the data. (Correct)
Rechecking the data-cleaning effort. (Correct)
Asking stakeholders to check and confirm the data is clean.
Correct: Verification confirms that data cleaning was done well and that the resulting data is accurate and reliable. To verify data, analysts recheck the changes made in their earlier cleaning steps, manually correct any remaining errors, and consider whether the data is credible and appropriate for the project. This ensures the data can be trusted for analysis and reporting.
2. Fill in the blank: To count the total number of spreadsheet values within a specified range, a data analyst uses the _____ function.
COUNTA (Correct)
SUM
WHOLE
TOTAL
Correct: COUNTA counts the total number of values within a specified range. It counts every non-empty cell, whether it holds a number, text, or another type of data, which makes it handy for finding out how many entries a dataset contains. To count only the numeric values in a range, use COUNT instead.
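To make the COUNTA/COUNT distinction concrete, here is a small Python sketch that mimics both functions on a list of cell values. It is not how spreadsheets implement them: it assumes blank cells are represented as None, and the function names counta and count and the sample product IDs are hypothetical.

```python
from numbers import Number

def counta(cells):
    """Count every non-empty cell, whatever its type (like COUNTA)."""
    return sum(1 for c in cells if c is not None)

def count(cells):
    """Count only numeric cells (like COUNT); text and blanks are skipped."""
    # In Python, bool is a subclass of int, so exclude it explicitly
    # to avoid counting True/False as numbers.
    return sum(1 for c in cells
               if isinstance(c, Number) and not isinstance(c, bool))

# Hypothetical product IDs: some numeric, some text, one blank cell.
product_ids = [42, "CAD-425", None, 17, "CAD-101"]
print(counta(product_ids))  # 4
print(count(product_ids))   # 2
```

Because the IDs mix numbers and text, COUNT would miss the text IDs entirely, while COUNTA counts every filled cell.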
3. A data analyst is cleaning a dataset with inconsistent formats and repeated cases. They use the TRIM function to remove extra spaces from string variables. What other tools can they use for data cleaning? Select all that apply.
Import data
Remove duplicates (Correct)
Protect sheet
Find and replace (Correct)
Correct: Besides the TRIM function, the analyst can clean the data with the Remove duplicates and Find and replace tools.
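The Remove duplicates and Find and replace tools can be approximated in Python as a rough sketch. The rows, the misspelling "Chicgo", and the helper names are hypothetical, and the real spreadsheet tools offer more options (for example, choosing which columns to compare, or matching case or entire cell contents).

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def find_and_replace(rows, find, replace):
    """Replace every occurrence of `find` inside each text cell."""
    return [
        tuple(cell.replace(find, replace) if isinstance(cell, str) else cell
              for cell in row)
        for row in rows
    ]

# Hypothetical rows: one exact duplicate and one misspelled city.
rows = [
    ("Smith", "Chicgo"),
    ("Lee", "Boston"),
    ("Smith", "Chicgo"),
]
cleaned = find_and_replace(remove_duplicates(rows), "Chicgo", "Chicago")
print(cleaned)  # [('Smith', 'Chicago'), ('Lee', 'Boston')]
```

Rows are tuples so they can be stored in a set for the duplicate check.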
4. To correct a typo in a database column, where should you insert a CASE statement in a query?
As an ORDER BY clause
As a GROUP BY clause
As a SELECT clause (Correct)
As a FROM clause
Correct: The CASE statement belongs in the SELECT clause. It checks one or more conditions and returns a value as soon as a condition is met. Here, the misspelled value is the condition, and the correct spelling is the value returned when that condition is true.
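A CASE statement inside SELECT can be tried out with Python's built-in sqlite3 module, used here only as a convenient stand-in for whatever SQL database you work with. The customers table and the misspelling "USAA" are hypothetical.

```python
import sqlite3

# Throwaway in-memory table with one misspelled country value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "USA"), (2, "USAA"), (3, "Canada")],
)

# The CASE expression sits in the SELECT clause: the typo is the WHEN
# condition, the corrected spelling is the THEN value, and ELSE passes
# every other value through unchanged.
query = """
SELECT
    customer_id,
    CASE
        WHEN country = 'USAA' THEN 'USA'
        ELSE country
    END AS cleaned_country
FROM customers
ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)  # [(1, 'USA'), (2, 'USA'), (3, 'Canada')]
```

This corrects the value in the query's result set only; the stored table is left unchanged.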
TEST YOUR KNOWLEDGE ON DOCUMENTING THE CLEANING PROCESS
1. Why is it important for a data analyst to document the evolution of a dataset? Select all that apply.
To determine the quality of the data (Correct)
To identify best practices in the collection of data
To inform other users of changes (Correct)
To recover data-cleaning errors (Correct)
Correct: Tracking the history of a dataset is very important in recovering data-cleaning errors, informing its users about the changes made, and evaluating the quality of the dataset.
2. Fill in the blank: While cleaning data, documentation is used to track _____. Select all that apply.
deletions (Correct)
errors (Correct)
bias
changes (Correct)
Correct: During data cleaning, documentation tracks the changes, deletions, and errors involved in the process.
3. Documenting data-cleaning makes it possible to achieve what goals? Select all that apply.
Demonstrate to project stakeholders that you are accountable (Correct)
Visualize the results of your data analysis
Be transparent about your process (Correct)
Keep team members on the same page (Correct)
Correct: Documenting your data-cleaning process makes it transparent, keeps team members on the same page, and shows project stakeholders that you are accountable by letting them see how you did the work.
PROCESS DATA FROM DIRTY TO CLEAN WEEKLY CHALLENGE 4
1. The data collected for an analysis project has just been cleaned. What are the next steps for a data analyst? Select all that apply.
Certification
Reporting (Correct)
Verification (Correct)
Validation
Correct: Once the data has been cleaned, the next step for a data analyst is verification and then reporting.
2. What is the first step in the verification process?
Compare cleaned data with the original, uncleaned dataset and compare it to what is there now (Correct)
Create a chronological list of modifications made to the data
Determine the quality of the data
Inform others of your data-cleaning effort
Correct: To begin the verification process, a comparison is initially made between the cleaned data and the original unclean data, followed by an evaluation of the changes that occurred.
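Comparing the cleaned data with the original can be as simple as diffing two snapshots of the same records. This Python sketch assumes each record has a stable ID to match on; the sample records and the compare helper are hypothetical.

```python
# Hypothetical snapshots of the same records, keyed by record ID.
original = {1: "  Ana Diaz", 2: "Bo Chen", 3: "bo chen", 4: "Cy Ito "}
cleaned = {1: "Ana Diaz", 2: "Bo Chen", 4: "Cy Ito"}

def compare(before, after):
    """Return (changed, removed, added) record IDs between two snapshots."""
    changed = sorted(k for k in before.keys() & after.keys()
                     if before[k] != after[k])
    removed = sorted(before.keys() - after.keys())
    added = sorted(after.keys() - before.keys())
    return changed, removed, added

changed, removed, added = compare(original, cleaned)
for key in changed:
    print(f"record {key}: {original[key]!r} -> {cleaned[key]!r}")
print("removed:", removed, "added:", added)
```

Each changed, removed, or added ID is then something to evaluate: was the change intended, and is the result correct?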
3. Fill in the blank: TRIM is a function that removes _____ spaces in data. Select all that apply.
Trailing (Correct)
Leading (Correct)
repeated (Correct)
inner
Correct: TRIM removes leading and trailing spaces, and it reduces repeated spaces between words to a single space.
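Here is a Python sketch of what TRIM does to a text value. It assumes, as a simplification, that only the ordinary space character counts; tabs and non-breaking spaces are left untouched, so check your spreadsheet's documentation for exactly which characters it removes. The trim helper is hypothetical.

```python
import re

def trim(text):
    """Drop leading/trailing spaces and collapse runs of spaces between
    words into one space (only the plain space character is handled)."""
    return re.sub(r" {2,}", " ", text).strip(" ")

print(repr(trim("  Data   Analytics  ")))  # 'Data Analytics'
```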
4. While verifying cleaned data, a data analyst encounters a misspelled name. Which function can they use to determine if the error is repeated throughout the dataset?
CHECK
COUNTA (Correct)
COUNT
CASE
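Correct: COUNTA counts non-empty cells, so applying it to the name column (for example, as the summary in a pivot table) shows how many times the misspelled name appears and whether the error is repeated throughout the dataset.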
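The pivot-table approach (the name column as rows, summarized with COUNTA) amounts to grouping identical values and counting each group. A rough Python equivalent uses collections.Counter; the name column and the pivot_counta helper are hypothetical.

```python
from collections import Counter

# Hypothetical name column; "Marai" is a misspelling of "Maria",
# and None stands in for a blank cell.
names = ["Maria", "Marai", "Li", "Maria", "Marai", None, "Li"]

def pivot_counta(values):
    """Count each distinct non-empty value, like a pivot table that uses
    the column as Rows and summarizes it with COUNTA."""
    return Counter(v for v in values if v is not None)

counts = pivot_counta(names)
for name, n in sorted(counts.items()):
    print(name, n)
# Li 2
# Marai 2
# Maria 2
```

Because "Marai" has a count of 2, the misspelling is repeated rather than a one-off typo.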
5. A WHEN statement considers one or more conditions and returns a value as soon as that condition is met.
True
False (Correct)
Correct: It is a CASE statement, not a WHEN statement, that checks one or more conditions and returns a value as soon as a condition is met; WHEN is the keyword that introduces each condition inside CASE.
6. Fill in the blank: Documentation is the process of tracking _____ during data cleaning. Select all that apply.
inactivity
deletions (Correct)
changes (Correct)
additions (Correct)
Correct: Documentation involves tracking changes, additions, deletions, and errors during data cleaning.
7. Fill in the blank: While cleaning data, a data analyst can use a changelog to keep a chronological list of changes they make. They can refer to it during the _____ period if there are errors or questions.
verification (Correct)
visualization
presenting
documentation
Correct: A data analyst can maintain a chronological list of the changes they have made using a changelog while cleaning the data. This can serve as a reference during the verification in case any faults or questions arise later.
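A changelog only needs a few fields per entry to support verification later. The sketch below is one hypothetical way to model it in Python; the class names, fields, and sample entries are illustrative assumptions, not a format the course prescribes.

```python
from dataclasses import dataclass, field
from datetime import datetime

# The four kinds of modifications this guide lists for documentation.
ALLOWED_KINDS = {"change", "addition", "deletion", "error"}

@dataclass
class ChangelogEntry:
    timestamp: datetime
    kind: str          # one of ALLOWED_KINDS
    description: str   # what was modified
    reason: str        # why it was modified

@dataclass
class Changelog:
    entries: list = field(default_factory=list)

    def record(self, kind, description, reason, when=None):
        """Append one entry, rejecting kinds outside ALLOWED_KINDS."""
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        timestamp = when if when is not None else datetime.now()
        self.entries.append(ChangelogEntry(timestamp, kind, description, reason))

    def chronological(self):
        """Return the entries sorted by timestamp, oldest first."""
        return sorted(self.entries, key=lambda entry: entry.timestamp)

# Hypothetical entries, recorded out of order on purpose.
log = Changelog()
log.record("deletion", "Removed duplicate customer rows",
           "Rows were imported twice", when=datetime(2024, 1, 2, 9, 0))
log.record("change", "Trimmed spaces in the name column",
           "Leading spaces broke lookups", when=datetime(2024, 1, 1, 15, 30))

for entry in log.chronological():
    print(entry.timestamp.isoformat(), entry.kind, "-", entry.description)
```

Storing the reason next to each modification is what lets you answer questions during verification.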
8. Reviewing version history is an effective way to view a changelog in SQL.
True
False (Correct)
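Correct: Reviewing version history is an effective way to view a changelog in spreadsheets. In SQL, how you create and view a changelog depends on the software you use; analysts often record changes as comments in their queries.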
9. Fill in the blank: Once data is clean, a data analyst moves on to _____ and verification.
processing
publishing
reporting (Correct)
confirming
Correct: After data cleaning, a data analyst moves on to verification and reporting.
10. A data analyst is in the verification step. They consider the business problem, the goal, and the data involved in their analytics project. What scenario does this describe?
Visualizing the data
Seeing the big picture (Correct)
Reporting on the data
Considering the stakeholders
Correct: Seeing the big picture means considering the business problem, the goal, and the data while verifying data cleaning.
11. Which of the following functions automatically remove extra spaces when cleaning data?
SNIP
REMOVE
CLEAR
TRIM (Correct)
Correct: TRIM clears extra spaces while cleaning data – leading, trailing, and even redundant spaces.
12. While verifying cleaned data, a data analyst encounters a misspelled name. Which function can they use to determine if the error is repeated throughout the dataset?
COUNTA (Correct)
COUNT
CHECK
CASE
Correct: COUNTA counts all non-empty cells. Applied to the name column, for example as the summary in a pivot table, it shows how many times the misspelled name occurs, which tells the analyst whether the error is repeated. Outside the listed options, COUNTIF can also count the cells that match a specific value.
13. A data analyst uses a changelog while cleaning data. What process does a changelog support?
Documentation (Correct)
Illumination
Disclosure
Examination
Correct: A changelog supports documentation.
14. Verification and reporting come directly before the data-cleaning process.
True
False (Correct)
Correct: Verification and reporting follow data cleaning activities.
15. Which function removes leading, trailing, and repeated spaces in data?
TRIM (Correct)
CROP
TIDY
CUT
Correct: TRIM is the function that removes leading spaces, trailing spaces, and extra spaces between words.
16. Which SQL tool considers one or more conditions, then returns a value as soon as a condition is met?
CASE (Correct)
WHEN
THEN
ELSE
Correct: A CASE statement checks one or more conditions and returns a value as soon as a condition is met.
17. Fill in the blank: A changelog contains a _____ list of modifications made to a project.
approximate
random
synchronized
chronological (Correct)
Correct: A changelog is a file containing a chronologically ordered list of the modifications made to a project, giving the data analyst a record of what changed and when.
18. A data analyst makes changes to SQL queries and uses comments to create a changelog. This involves specifying the changes they made and why they made them.
True (Correct)
False
Correct: Using comments in SQL queries as a changelog means recording each change made to the query along with the reason for it.
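Here is one possible shape for a SQL changelog kept in comments, run through Python's sqlite3 module as a stand-in database. The table, query, and dated comment entries are hypothetical. Note that SQLite's TRIM only strips the ends of a string; unlike the spreadsheet TRIM function, it does not collapse repeated spaces between words.

```python
import sqlite3

# The comment lines at the top of the query double as its changelog:
# each entry records what changed and why (entries are hypothetical).
CLEANING_QUERY = """
-- 2024-01-01: Wrapped name in TRIM() because imported names had stray spaces.
-- 2024-01-02: Added DISTINCT to drop rows duplicated by a double import.
SELECT DISTINCT
    id,
    TRIM(name) AS name
FROM customers
ORDER BY id
"""

# Pull the changelog back out of the query text.
changelog = [line[2:].strip() for line in CLEANING_QUERY.splitlines()
             if line.startswith("--")]

# Run the query against a throwaway in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "  Ana "), (1, "  Ana "), (2, "Bo")])
rows = conn.execute(CLEANING_QUERY).fetchall()
conn.close()

print(changelog)
print(rows)  # [(1, 'Ana'), (2, 'Bo')]
```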
19. What is involved in seeing the big picture when verifying data cleaning? Select all that apply.
Consider the business problem (Correct)
Consider the data (Correct)
Consider the goal (Correct)
Consider the reporting
Correct: Seeing the big picture when verifying data cleaning means considering the business problem, the goal, and the data. When these are aligned, you can be confident that the cleaned data matches the project objectives and will deliver meaningful insights.
20. Fill in the blank: TRIM is a function that removes _____ spaces in data. Select all that apply.
Leading (Correct)
Repeated (Correct)
inner
trailing (Correct)
Correct: TRIM is a function that removes leading, trailing, and repeated spaces in data.
21. What is the process of tracking changes, additions, deletions, and errors during data cleaning?
Documentation (Correct)
Cataloging
Recording
Observation
Correct: Documentation is the process of tracking changes, additions, deletions, and errors during data cleaning.
22. At what point during the analysis process does a data analyst use a changelog?
While cleaning the data (Correct)
While visualizing the data
While gathering the data
While reporting the data
Correct: A data analyst uses a changelog while cleaning data.
23. A data analyst is starting a large scale project. The project will be crucial to business success and the data analyst needs to keep the big picture at the forefront when verifying their data cleaning. What is the first step in the verification process?
Determine the quality of the data
Compare cleaned data with the original, uncleaned dataset and compare it to what is there now (CORRECT)
Create a chronological list of modifications made to the data
Inform others of the data-cleaning effort
24. During the verification process, you find that you missed a few leading spaces during data cleaning. What function can you use to eliminate these spaces?
TIDY
TRIM (CORRECT)
CROP
CUT
25. What tool can a data analyst use to figure out how many identical errors occur in a dataset?
CONFIRM
CASE
COUNT
COUNTA (CORRECT)
26. You find a few misspellings in your datatable and need to correct them when running a query. What function can you use when your set condition is met?
CASE (CORRECT)
THEN
WHEN
ELSE
27. A data analyst uses a changelog while cleaning their data. What data modifications should they track in the changelog?
Changes, resolutions, and deletions
Errors, deletions, and notes
Errors, additions, and deletions (CORRECT)
Additions, changes, and queries
28. Fill in the blank: A process to confirm that a data-cleaning effort was well-executed and the resulting data is accurate and reliable is known as _____.
manipulation
publishing
verification (CORRECT)
processing
29. What is the first step in the verification process?
Inform others of your data-cleaning effort
Compare cleaned data with the original, uncleaned dataset and compare it to what is there now (CORRECT)
Create a chronological list of modifications made to the data
Determine the quality of the data
30. During data cleaning, you find an error in a username where the ID number was accidentally joined to the user’s last name. You need to figure out if this username has been entered incorrectly more than once in your dataset. If you use a pivot table, what function can you use to determine the number of times this error occurs in your dataset?
COUNT
CASE
CHECK
COUNTA (CORRECT)
31. Fill in the blank: A data analyst uses the CASE statement to consider one or more _____, then return a value.
changes
fields
identifications
conditions (CORRECT)
32. Fill in the blank: While cleaning data, a data analyst can use a changelog to keep a chronological list of changes they make. They can refer to it during the _____ period if there are errors or questions.
documentation
presenting
verification (CORRECT)
visualization
33. A data analyst is reviewing modifications made to a SQL table and a spreadsheet. The data analyst will get similar results when using the changelogs for both data sources.
True (CORRECT)
False
34. Fill in the blank: A data analyst finishes cleaning their data. The next step in the process is reporting and ____.
verification (CORRECT)
manipulation
replacing
processing
35. A data analyst is starting a large scale project that is crucial to business success. The data analyst needs to remember the big picture when verifying their data cleaning. What is involved when focusing on the big picture-view of the project? Select all that apply.
Consider the stakeholders
Consider the reporting
Consider the business problem (CORRECT)
Consider the goal (CORRECT)
36. Your manager points out an error in a product ID number in your dataset. The Product IDs can be numbers like 42 or text like “CAD-425”. Using a pivot table, what function can you use to find how many times this error occurs in the dataset?
CASE
CHECK
COUNT
COUNTA (CORRECT)
37. A data analyst is in the verification process and needs to verify the modifications that they have made to the data. What could the analyst reference to find the changes they made throughout data cleaning?
Changelog (CORRECT)
Metadata
Spreadsheet
Notepad
38. A data analyst uses the COUNTA function to count which of the following?
The total number of values within a specified range (CORRECT)
The total number of headers in a specific range
The specific numbers in a dataset
The total number of entries in a changelog
39. You’re working with a dataset that contains categorical variables. You notice that some of the strings are misspelled or are not capitalized. What function can you use to fix these errors when a condition is met?
CASE (CORRECT)
THEN
WHEN
ELSE
40. Fill in the blank: Documentation is the process of tracking _____ during data cleaning. Select all that apply.
inactivity
changes (CORRECT)
additions (CORRECT)
deletions (CORRECT)
41. Fill in the blank: As a data analyst, you should always create a _____ to track your additions, deletions, errors, and changes to a query.
notepad
spreadsheet
changelog (CORRECT)
database
42. In what step of the data-cleaning process do you find mistakes before you begin analyzing the data?
Publishing
Processing
Confirming
Verifying (CORRECT)
43. As a data analyst, you will need to keep the big picture in mind throughout any project when verifying data cleaning. What must the analyst do to take a big picture view of the project? Select all that apply.
Consider the reporting
Consider the goal (CORRECT)
Consider the business problem (CORRECT)
Consider the data (CORRECT)
VERIFY AND REPORT ON YOUR CLEANING RESULTS CONCLUSION
In short, data cleaning is a critical step in the data analysis process, and verifying and reporting on the cleaning are equally important so that the data is ready for the next step. This course on Coursera teaches the processes of verification and reporting in data cleaning and the benefits they offer. Start learning today!