Course 4 – Process Data from Dirty to Clean Quiz Answers

Week 2: Sparkling-Clean Data

Sparkling-Clean Data introduction

Data is the lifeblood of any analysis; without it, no investigation can succeed. In the Google Data Analytics Certificate on Coursera, you will learn how to identify and clean dirty data and prepare it to yield worthwhile insights for decision-making. You will also be guided through techniques for cleaning data in spreadsheets and other tools, and through ways to detect possible problems such as data-entry errors and missing values.

You will gain valuable skills for turning dirty data into useful information and for recognizing when data is clean, which makes you more productive as an analyst. So, if you are ready to take your analytics career forward, enroll in the Google Data Analytics Certificate program on Coursera!

Learning objectives:

  • Differentiate between clean and dirty data
  • Identify the specific elements that make data dirty
  • Describe data-cleaning techniques for identifying errors, redundancy, and compatibility issues, and explain why continuous monitoring is important
  • Identify common mistakes made when cleaning data
  • Show how basic spreadsheet tools can help clean data

Test your knowledge on Clean versus dirty data

1. Describe the difference between a null and a zero in a dataset.

  • A null indicates that a value does not exist. A zero is a numerical response. (Correct)       
  • A null signifies invalid data. A zero is missing data.
  • A null represents a value of zero. A zero represents an empty cell.
  • A null represents a number with no significance. A zero represents the number zero.

Correct: A null indicates that a value does not exist, for example a cell left empty because nothing was recorded. A zero is an actual numerical response. The two should not be treated as interchangeable when cleaning or analyzing data.
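The null-versus-zero distinction in question 1 can be sketched in Python. This is only an illustration: it assumes `None` stands in for a null (empty) cell, and the `responses` list is hypothetical.

```python
# Hypothetical survey column: None marks a null (no value recorded),
# while 0 is a genuine numerical response.
responses = [3, 0, None, 5, None, 0]

# Nulls and zeros are counted separately because they mean different things.
null_count = sum(1 for value in responses if value is None)
zero_count = sum(1 for value in responses if value == 0)

# Average only over the values that actually exist.
present = [value for value in responses if value is not None]
average = sum(present) / len(present)

print(null_count, zero_count, average)
```

Treating the nulls as zeros here would lower the average, which is one reason the two need to be kept apart during cleaning.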

2. What are the most common processes and procedures handled by data engineers? Select all that apply.

  • Developing, maintaining, and testing databases and related systems. (Correct)
  • Transforming data into a useful format for analysis. (Correct)
  • Verifying results of data analysis
  • Giving data a reliable infrastructure. (Correct)

Correct: Data engineers transform data into a format that is usable for analysis and give it a reliable infrastructure. They also develop, maintain, and test databases and related systems.

3. What are the most common processes and procedures handled by data warehousing specialists? Select all that apply.

  • Ensuring data is backed up to prevent loss. (Correct)
  • Ensuring data is available. (Correct)
  • Ensuring data is secure. (Correct)
  • Ensuring data is properly cleaned

Correct: Data warehousing specialists make sure that data is available, secure, and backed up to prevent loss.

4. A data analyst is cleaning a dataset. They want to confirm that users entered five-digit zip codes correctly by checking the data in a certain spreadsheet column. What would be most helpful as the next step?

  • Using the MAX function to determine the maximum value in the cells in the column
  • Using the field length tool to specify the number of characters in each cell in the column. (Correct)
  • Formatting the cells in the column as number
  • Changing the column width to fit only five digits

Correct: The most helpful next step is to use the field length tool to specify the number of characters allowed in each cell of the column.
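As a rough sketch, the same field-length check can be written in Python. The helper name `has_field_length` and the sample values are hypothetical, and the extra digits-only test is an assumption added here; the spreadsheet tool itself only constrains the number of characters.

```python
def has_field_length(value: str, length: int = 5) -> bool:
    """Return True when the value is exactly `length` characters, all digits."""
    return len(value) == length and value.isdigit()


# Hypothetical zip code column with a few bad entries.
zip_codes = ["33511", "3351", "335110", "33 51"]

# Collect the entries that fail the five-digit check.
invalid = [code for code in zip_codes if not has_field_length(code)]

print(invalid)
```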

5. Review the final product of the spreadsheet you cleaned during this activity. Which of the following is the rightmost column?

  • Column AA
  • Column Z
  • Column AZ
  • Column AS (Correct)

Correct: In the cleaned spreadsheet, the rightmost column is Column AS, which can be identified only after the data has been properly transposed. The cleaning and transposing skills practiced in this activity will help you handle data in the future.

Test your knowledge on data-cleaning techniques

1. Fill in the blank: Every database has its own formatting, which can cause the data to seem inconsistent. Data analysts use the _____ tool to create a clean and consistent visual appearance for their spreadsheets.

  • clear formats (Correct)
  • autocorrect
  • conditional formatting
  • spellcheck

Correct: Data analysts use the clear formats tool to remove visible inconsistencies and give their spreadsheets a clean, consistent appearance.

2. What is the process of combining two or more datasets into a single dataset?

  • Data transferring
  • Data merging (Correct)
  • Data composition
  • Data validation

Correct: Data merging is the process of combining two or more datasets into a single dataset.

3. Fill in the blank: In data analytics, _____ describes how well two or more datasets are able to work together.

  • suitability
  • alignment
  • compatibility (Correct)
  • agreement

Correct: Compatibility describes how well two or more datasets are able to work together.

4. Which of the following functions divides text around a specified character or string and puts each fragment of text into a separate cell in the row?

  • The TRIM function
  • The COUNTIF function
  • The CONCATENATE function
  • The SPLIT function (Correct)

Correct: The SPLIT function divides text around a specified character or string and puts each fragment into a separate cell in the same row. Spreadsheet functions are among the many tools used for data cleaning, and a data analyst should know them well.
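For readers who think in code, here is a minimal Python sketch of what SPLIT does. The helper `split_cell` and the sample text are hypothetical, and the sketch ignores the optional arguments that a spreadsheet's SPLIT function may accept.

```python
def split_cell(text: str, delimiter: str) -> list[str]:
    """Divide text around the delimiter; each fragment would fill its own cell."""
    return text.split(delimiter)


# One cell of comma-separated text becomes three cells in the same row.
row = split_cell("apple,banana,cherry", ",")

print(row)
```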

Test your knowledge on Cleaning data in spreadsheets

1. Describe the relationship between a text string and a substring.

  • A text string is a group of characters within a cell. A substring is a smaller subset of that text string. (Correct)
  • A text string is the list of attributes at the top of columns within a table. A substring is a single attribute within that list.
  • A text string is a column of data within a table. A substring is one cell within that column.
  • A text string is a row of data within a table. A substring is one cell within that row.

Correct: A text string is a group of characters within a cell, and a substring is a smaller subset of that text string.

2. A data analyst uses the COUNTIF function to count the number of times a value less than 5 occurs between spreadsheet cells A2 through A100. What is the correct syntax?

  • =COUNTIF(A2:A100,"<5") (Correct)
  • =COUNTIF(A2:A100,">5")
  • =COUNTIF(A2:A100,>5)
  • =COUNTIF(A2:A100,<5)

Correct: The correct syntax is =COUNTIF(A2:A100,"<5"). The COUNTIF function returns the number of cells in the range A2:A100 that satisfy the condition "<5". The condition must be enclosed in quotation marks.
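The behavior of COUNTIF with a quoted comparison can be approximated in Python as follows. This is a sketch under stated assumptions: the `countif` helper is hypothetical, it handles only the four comparison operators used in these quizzes, and it assumes every cell holds a number.

```python
import operator

# Map the comparison symbols used in criteria strings to Python operators.
OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}


def countif(values, criterion: str) -> int:
    """Count the values that satisfy a criterion string such as "<5"."""
    # Check two-character operators first so "<=" is not read as "<".
    for symbol in ("<=", ">=", "<", ">"):
        if criterion.startswith(symbol):
            threshold = float(criterion[len(symbol):])
            compare = OPS[symbol]
            return sum(1 for value in values if compare(value, threshold))
    raise ValueError(f"unsupported criterion: {criterion!r}")


# Hypothetical contents of a small range.
cells = [1, 4, 5, 7, 2]
below_five = countif(cells, "<5")

print(below_five)
```

Passing the criterion as a string mirrors the spreadsheet syntax, where the condition is written inside quotation marks.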

3. Fill in the blank: To remove leading, trailing, and repeated spaces in data, analysts use the ____ function.

  • RIGHT
  • TRIM (Correct)
  • LEFT
  • MID

Correct: TRIM is a function that removes leading, trailing, and repeated spaces in data.
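A rough Python equivalent of TRIM is shown below. The helper name `trim` and the sample text are hypothetical, and the sketch treats only the ordinary space character as whitespace.

```python
def trim(text: str) -> str:
    """Remove leading, trailing, and repeated spaces from text."""
    # Splitting on a single space yields empty fragments wherever spaces
    # repeat; dropping them and rejoining leaves one space between words.
    return " ".join(part for part in text.split(" ") if part)


cleaned = trim("  Data   cleaning  ")

print(repr(cleaned))
```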

Process Data from Dirty to Clean Weekly Challenge 2

1. Conditional formatting is a spreadsheet tool that changes how cells appear when values meet a specific condition. Data analysts can use conditional formatting to do which of the following tasks? Select all that apply.

  • To make cells stand out for more efficient analysis. (Correct)
  • To sort data in series of cells into a meaningful order
  • To identify blank cells or missing information (Correct)
  • To calculate mathematical equations

Correct: Data analysts can use conditional formatting to highlight blank cells, which helps identify missing information, and to make specific cells stand out for more efficient analysis.
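Conditional formatting is a visual tool, but the rule behind "highlight blank cells" can be sketched in Python. The table contents and the helper `blank_cells` are hypothetical; the function returns the (row, column) positions a formatting rule would highlight.

```python
# Hypothetical table: empty strings and None both represent blank cells.
table = [
    ["Ana", "Marketing", "2019"],
    ["Ben", "", "2014"],
    ["Cho", "Strategy", None],
]


def blank_cells(rows):
    """Return 1-based (row, column) positions of cells with no content."""
    flagged = []
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            if value is None or str(value).strip() == "":
                flagged.append((r, c))
    return flagged


print(blank_cells(table))
```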

2. A data analyst uses the SPLIT function to divide a text string around a specified character and put each fragment into a new, separate cell. What is the specified character separating each item called?

  • Delimiter (Correct)
  • Partition
  • Unit
  • Substring

Correct: When using the SPLIT function, the character that separates each item is called a delimiter.

3. For a function to work properly, data analysts must follow each function’s predetermined structure. What is this structure called?

  • Summary
  • Algorithm
  • Validation
  • Syntax (Correct)

Correct: This structure is called syntax. Syntax is the predetermined structure of a function, including all required information and its proper placement.

4. You are working with the following selection of a spreadsheet:

[Image: Course_4_Weekly_Challenge_2.1]

In order to extract the five-digit postal code from Brandon, FL, what is the correct function?

  • =RIGHT(5,B4)
  • =LEFT(5,B4)
  • =LEFT(B4,5)
  • =RIGHT(B4,5) (Correct)

Correct: The correct function is =RIGHT(B4,5). The RIGHT function returns a given number of characters from the right side of a text string. Here, B4 is the cell that contains the text, and 5 is the number of characters to return.
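The LEFT and RIGHT functions correspond to simple string slicing, as this Python sketch shows. The helpers `left` and `right` are hypothetical, and the cell content is an assumed example, since the original spreadsheet image is not reproduced here.

```python
def left(text: str, count: int) -> str:
    """Return the first `count` characters of text."""
    return text[:count]


def right(text: str, count: int) -> str:
    """Return the last `count` characters of text."""
    # Guard against count == 0, because text[-0:] would return everything.
    return text[-count:] if count > 0 else ""


# Assumed cell content: a city, state, and five-digit postal code.
cell_b4 = "Brandon, FL 33511"
postal_code = right(cell_b4, 5)
city = left(cell_b4, 7)

print(postal_code, city)
```

Note the argument order: the text comes first and the number of characters second, which is why =RIGHT(5,B4) is not valid.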

5. A data analyst in a human resources department is working with the following selection of a spreadsheet:

[Image: Course_4_Weekly_Challenge_2.2]

They want to create employee identification numbers (IDs) in column D. The IDs should include the year hired plus the last four digits of the employee’s Social Security Number (SS#). What function will create the ID 20201939 for the employee in row 4?

  • =CONCATENATE(A4*B4)
  • =CONCATENATE(A4!B4)
  • =CONCATENATE(A4,B4) (Correct)
  • =CONCATENATE (A4+B4)

Correct: To create the employee ID 20201939 for the employee in row 4, use =CONCATENATE(A4,B4). The CONCATENATE function joins two or more text strings; here, A4 and B4 are the cells containing the strings to join.
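The difference between joining and adding the two cells can be illustrated in Python. The helper `concatenate` is a hypothetical stand-in for the spreadsheet function; the values 2020 and 1939 come from the question.

```python
def concatenate(*values) -> str:
    """Join the values end to end as text, as CONCATENATE does."""
    return "".join(str(value) for value in values)


year_hired = 2020   # cell A4
last_four = 1939    # cell B4

# Joining the values as text produces the employee ID.
employee_id = concatenate(year_hired, last_four)

# Adding them treats the cells as numbers and gives a sum, not an ID.
added = year_hired + last_four

print(employee_id, added)
```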

6. A data analyst at an e-commerce company is working with a spreadsheet containing last month’s sales. The most expensive product their company sells costs $49.99, so they want to quickly confirm that all of the data in the Sales column is $49.99 or less. What function can they use?

  • SUMIF
  • COUNT
  • SUM
  • COUNTIF (Correct)

Correct: The COUNTIF function returns the number of cells that meet a specified condition. Counting the cells in the Sales column with a value greater than 49.99, for example, should return zero.

7. The V in VLOOKUP stands for what?

  • Visual
  • Vertical (Correct)
  • Variable
  • Virtual

Correct: The “V” in VLOOKUP stands for vertical. Thus, VLOOKUP can be defined as a spreadsheet function that searches for a particular value in the first column of a specified range, then returns the corresponding value from another column in the same row.
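A minimal Python sketch of the lookup described for VLOOKUP follows. It assumes exact matching only, and the `vlookup` helper, product IDs, and prices are hypothetical.

```python
def vlookup(search_key, table, index):
    """Find search_key in the first column and return the value in the
    `index`-th column (1-based) of the same row, or None if not found."""
    for row in table:
        if row[0] == search_key:
            return row[index - 1]
    return None


# Hypothetical product table: ID, name, price.
products = [
    ["P-100", "Notebook", 4.99],
    ["P-200", "Backpack", 49.99],
]

price = vlookup("P-200", products, 3)

print(price)
```

The search runs down the first column, one row at a time, which is the "vertical" in the function's name.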

8. Data and business objectives might not align for a number of reasons. Which of the following issues can prevent alignment? Select all that apply.

  • True (Correct)
  • False

Correct: The "V" in VLOOKUP stands for "vertical". VLOOKUP is a spreadsheet function that searches for a specified value in the first column of a range and returns a value from another column in the same row.

9. An analyst is cleaning a new dataset containing 500 rows. They want to make sure the data contained from cell B2 through cell B300 does not contain a number greater than 50. Which of the following COUNTIF function syntaxes could be used to answer this question? Select all that apply.

  • =COUNTIF(B2:B300,>50)
  • =COUNTIF(B2:B300,<=50)
  • =COUNTIF(B2:B300,">50") (CORRECT)
  • =COUNTIF(B2:B300,"<=50") (CORRECT)

Correct: One option is =COUNTIF(B2:B300,">50"), which counts how many cells contain values greater than 50; a result of zero confirms that no value exceeds 50. Alternatively, =COUNTIF(B2:B300,"<=50") counts the cells containing values less than or equal to 50; if that count equals the number of cells with data in the range, no value exceeds 50. Either formula can be used to verify the data.
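The reasoning behind accepting both formulas can be checked with a short Python sketch. The column values are hypothetical, and the sketch assumes every cell in the range holds a number.

```python
# Hypothetical values from the range being checked.
column_b = [12, 50, 7, 33, 50]

# First approach: count values above 50 and expect zero.
over_50 = sum(1 for value in column_b if value > 50)
no_value_above_50 = over_50 == 0

# Second approach: count values at or below 50 and expect every cell.
at_most_50 = sum(1 for value in column_b if value <= 50)
same_conclusion = at_most_50 == len(column_b)

print(no_value_above_50, same_conclusion)
```

The two counts always add up to the number of numeric cells, so either one is enough to answer the question.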

10. A delimiter is a character that indicates the beginning or end of a data item. The split text to columns tool uses a delimiter to accomplish what task?

  • To change the format of a column of text 
  • To split one column into two
  • To specify where to split a text string (CORRECT)
  • To split duplicate substrings

Correct: The “Split Text to Columns” option uses a delimiter to determine where the text string is divided. The delimiter can be a character such as a comma, space, semicolon, or other symbols that separate different pieces of the text. The specified delimiter allows breaking a single text entry apart into multiple columns.

11. Fill in the blank: When describing a SUM function, the _____ is =SUM(value 1 through value 2).

  • Standard 
  • Structure 
  • script
  • syntax (CORRECT)

Correct: When describing a SUM function, the syntax is =SUM(value 1 through value 2).

12. VLOOKUP searches for a value in a row in order to return a corresponding piece of information.

  • True
  • False (CORRECT)

Correct: VLOOKUP searches for a value in the first column of a range or table, not in a row, and returns the corresponding value from another column in the same row. You specify which column to return by providing a column index number.

13.  A data analyst needs to combine two datasets. Each dataset comes from a different system, and the systems store data in different ways. What can the data analyst do to ensure the data is compatible?

  • Merge the data
  • Use a data visualization
  • Map the data (CORRECT)
  • Apply a data structure

Correct: With data mapping, data analysts identify and document how fields in different data sources relate to one another. Mapping accounts for discrepancies in format, structure, or definitions between the datasets so that they are compatible and can be combined for analysis. It also lets analysts transform and align datasets accurately, which supports consistent and reliable conclusions.
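One simple way to picture data mapping is as a lookup from one system's field names to another's. The Python sketch below assumes two hypothetical systems; the field names, the `FIELD_MAP` dictionary, and the `map_record` helper are all invented for illustration.

```python
# Hypothetical mapping from system B's field names to system A's.
FIELD_MAP = {
    "cust_id": "customer_id",
    "fname": "first_name",
    "zip": "postal_code",
}


def map_record(record: dict, field_map: dict) -> dict:
    """Rename a record's fields according to the mapping.
    Fields without a mapping keep their original names."""
    return {field_map.get(key, key): value for key, value in record.items()}


record_from_b = {"cust_id": 17, "fname": "Ana", "zip": "33511"}
mapped = map_record(record_from_b, FIELD_MAP)

print(mapped)
```

Once both sources use the same field names, the records can be merged or compared directly.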

14. A data analyst wants to search for a certain value in a column, then return a corresponding piece of information. Which function should they use?

  • MATCH
  • FIND
  • VALUE
  • VLOOKUP (CORRECT)

Correct: The VLOOKUP function finds a specific value in the first column of a table or range and returns a related piece of information from another column in the same row. It is a quick and simple way to retrieve data that corresponds to a lookup value. For example, you can retrieve a product's price by looking up its product ID.

15.  Fill in the blank: Data mapping is the process of _____ fields from one data source to another.

  • extracting
  • linking
  • matching (CORRECT)
  • merging

Correct: Data mapping is the process of matching fields from one data source to another so that data can be transferred or integrated correctly between systems. It describes how data elements in one source relate to those in another, which makes data movement, transformation, and interoperability across platforms or databases possible.

16.  An analyst is working on a project involving customers from Bogota, Colombia. They receive a spreadsheet with 5,000 rows of customer information. What function can they use to confirm that the column for City contains the word Bogota exactly 5,000 times? 

  • COUNT
  • SUMIF
  • SUM
  • COUNTIF (CORRECT)

Correct: The COUNTIF function returns the number of cells that match a given condition. It is well suited to counting the occurrences of a particular item, number, or text in a specified range of cells.

17. Fill in the blank: Conditional formatting is a spreadsheet tool that changes how _____ appear when values meet a specific condition. 

  • queries
  • charts
  • filters
  • cells (CORRECT)

Correct: Conditional formatting changes the appearance of cells depending on specific conditions or criteria. For example, you might apply colors, data bars, or icons to cells that meet a certain threshold, making it easier to interpret patterns or trends in the data.

18. A data analyst suspects that there are many blank cells in their spreadsheet corresponding to missing information. What spreadsheet tool can they use to identify only those cells containing the null values?

  • Conditional ranking
  • Conditional formatting (CORRECT)
  • Cell filtering
  • Cell querying

19. A data analyst is working on a spreadsheet in which one of the columns is name data. This data is formatted as lastname, firstname. The analyst chooses to divide this data into two new columns, one containing the firstname data and the other containing the lastname data. What spreadsheet tool would they use to do this?

  • The MID function
  • Substring formatting
  • The SPLIT function (CORRECT)
  • Conditional formatting

20. A data analyst is using a function in a spreadsheet. When they input the function, they follow a predetermined structure that includes all required information for the function and its proper placement. What aspect of a function does this describe?

  • The specified value of the function
  • The length of the function
  • The number of characters in the function
  • The syntax of the function (CORRECT)

21. As part of the data-cleaning process, a data analyst creates a rule to highlight any empty cells in a bright blue color. This is an example of data visualization.

  • True
  • False (CORRECT)

22. A data analyst is working on a spreadsheet in which one of the columns contains name data. This data is formatted as lastname_firstname. The analyst splits this data at the underscore so that each piece—firstname and lastname—are contained in their own columns.

In this context, what is the underscore acting as?

  • MID function
  • Partition
  • Delimiter (CORRECT)
  • Substring

23. In a spreadsheet, what is the correct function for extracting the first two characters of the string located in cell A7?

  • =LEFT(A7,2) (CORRECT)
  • =RIGHT(2,A7)
  • =RIGHT(A7,2)
  • =LEFT(2,A7)

24. A data analyst in a human resources department is working with the following selection of a spreadsheet:

N/A | A | B | C | D
1 | Year Hired | Last 4 of SS# | Department | Employee ID
2 | 2019 | 1192 | Marketing |
3 | 2014 | 2683 | Operations |
4 | 2020 | 1939 | Strategy |
5 | 2009 | 3208 | Graphics |

They want to create employee identification numbers (IDs) in column D. The IDs should include the last four digits of the employee’s Social Security Number(SS#) plus the year hired. What function will create the ID 26832014 for the employee in row 3?

  • =CONCATENATE(B3,A3) (CORRECT)
  • =CONCATENATE(B3+A3)
  • =CONCATENATE(A3!B3)
  • =CONCATENATE(A3+B3)

25. An analyst is cleaning a new dataset. They want to make sure the data contained from cell C4 through cell C350 contains only numbers below 40. Choose the statements that include the correct syntax for this COUNTIF function. Select all that apply.

  • =COUNTIF(C4:C350, <=40)
  • =COUNTIF(C4:C350, >40)
  • =COUNTIF(C4:C350,"<40") (CORRECT)
  • =COUNTIF(C4:C350,">=40") (CORRECT)

26. Before analyzing a dataset, an analyst maps the data. What is the reason for doing this?

  • The dataset has no visualizations.
  • The analyst thinks the dataset might have some null values.
  • The dataset contains data from different sources. (CORRECT)
  • The analyst wants to know what attributes the data has.

27. Fill in the blank: In order to make your spreadsheet easier to analyze, you choose to alter the way cells appear if their values meet certain conditions. The spreadsheet tool that you use to do this is called _____.

  • conditional ranking
  • cell querying
  • cell filtering
  • conditional formatting (CORRECT)

28. Fill in the blank: A _____ is a specified text that the SPLIT function uses to determine where a text string is to be divided.

  • partition
  • unit
  • substring
  • delimiter (CORRECT)

29. An analyst is working on a project involving customers from Bogota, Colombia. They receive a spreadsheet with 5,000 rows of customer information. What function can they use to confirm that the column for City contains the word Bogota exactly 5,000 times?

  • SUMIF
  • COUNTIF (CORRECT)
  • COUNT
  • SUM

30. Fill in the blank: The function _____ is used to return information in a column that contains a specified value.

  • MATCH
  • FIND
  • VLOOKUP (CORRECT)
  • VALUE

31. An analyst is cleaning a new dataset. They want to make sure the data contained from cell B2 through cell B100 does not contain a number smaller than 10. Which COUNTIF function syntax can be used to answer this question?

  • =COUNTIF(B2:B100,"<9")
  • =COUNTIF(B2:B100,">=10") (CORRECT)
  • =COUNTIF(B2:B100,>50)
  • =COUNTIF(B2:B200,"<=50")

32. A data analyst is using a function in a spreadsheet. For the function to work correctly, they follow the function’s syntax. What does this entail?

  • It is how the function can be used in a program.
  • It is the purpose of the function and its use.
  • It is the function’s name and placement.
  • It is the function’s required information and its proper placement. (CORRECT)

33.  A data analyst needs to combine two datasets. Each dataset comes from a different system, and the systems store data in different ways. What can the data analyst do to ensure the data is compatible prior to analyzing the data?

  • Map the data (CORRECT)
  • Use a data visualization
  • Apply a data structure
  • Spot check for null values

Sparkling-Clean Data conclusion

Spreadsheets are a constant presence in the work of data analysts. They offer robust features and hundreds of functions that help with investigating, cleaning, and organizing data. This part of the course defined clean and dirty data and covered how to clean data manually with spreadsheets and other tools.

Mastering data cleaning means that you can carry out accurate, trustworthy analyses. If data analytics interests you, why not continue this learning journey on Coursera? Thank you for taking this course!
