Week 3: Databases: Where Data Lives
DATABASES: WHERE DATA LIVES – INTRODUCTION
Databases make data analysis possible. The Google Data Analytics certificate on Coursera teaches you how to access databases and work with the data already stored in them. You will learn to extract, filter, and sort data, and you will learn more about metadata. These skills help you uncover the trends and key points hidden in a data source.
Databases are among the most valuable tools in an analyst’s toolkit. Through the Google Data Analytics certificate on Coursera, you will learn practical techniques for working with databases so you can make sense of the vast amount of data available today.
Test your knowledge on working with databases
1. Fill in the blank: A _____ is an identifier that references a database column in which each value is unique.
- foreign key
- relation
- field
- primary key (Correct)
Correct: A primary key uniquely identifies each record in a table, so no two rows can have the same value in that column. A foreign key is a column in one table that refers to the primary key column of another table, which defines a relationship between the two tables.
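A minimal sketch of both kinds of key, using SQLite through Python's built-in sqlite3 module as a stand-in for the databases used in the course. The customers and orders tables and their rows are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

# In-memory SQLite database with a hypothetical customers/orders schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- primary key: every value must be unique
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
                    REFERENCES customers(customer_id)  -- foreign key to customers
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # accepted: customer 1 exists


def try_insert(sql):
    """Run one INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Reusing customer_id 1 breaks the primary key's uniqueness rule.
duplicate_pk = try_insert("INSERT INTO customers VALUES (1, 'Ben')")
# Pointing an order at a customer that does not exist breaks the foreign key.
orphan_fk = try_insert("INSERT INTO orders VALUES (101, 99)")

print(duplicate_pk, orphan_fk)  # rejected rejected
```

Both inserts fail, which is exactly what the keys are for: the primary key keeps each customer row unique, and the foreign key keeps every order tied to a real customer.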
2. Fill in the blank: A relational database contains a series of _____ that can be connected to form relationships.
- cells
- spreadsheets
- fields
- tables (Correct)
Correct: A relational database contains a series of tables that can be connected through relationships, so data in one table can be linked to data in another.
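To see what "connected" means in practice, here is a sketch of a JOIN that follows a key relationship between two tables. It runs on SQLite via Python's sqlite3 module; the tables and rows are hypothetical.

```python
import sqlite3

# Hypothetical two-table schema: each order stores the id of the customer who placed it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# The JOIN follows the relationship orders.customer_id -> customers.customer_id,
# so each order can be reported together with the customer's name.
rows = conn.execute("""
    SELECT o.order_id, c.name, o.amount
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
# (10, 'Ana', 25.0)
# (11, 'Ana', 40.0)
# (12, 'Ben', 15.0)
```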
3. A key benefit of working with normalized databases is that they help lower data redundancy. Which of the following is an example of redundancy?
- The same piece of data being stored in two different places (Correct)
- A database that forms two or more relationships
- Team members in different office locations working with the same data
- A database containing two foreign keys
Correct: Redundancy occurs when the same item of information exists in two disparate locations.
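A small illustration of redundancy and how normalization reduces it, in plain Python with hypothetical order data. In the flat version, the same customer's city is stored once per order; after normalizing, it is stored once per customer and each order keeps only a reference.

```python
# Hypothetical denormalized rows: the customer's city is repeated on every order.
flat_orders = [
    {"order_id": 10, "customer_id": 1, "city": "Berlin"},
    {"order_id": 11, "customer_id": 1, "city": "Berlin"},  # redundant copy of customer 1's city
    {"order_id": 12, "customer_id": 2, "city": "Paris"},
]

# Normalize: store each customer's city once, keyed by customer_id.
# (If the flat rows disagreed about a city, the last one seen would win here.)
customers = {}
for row in flat_orders:
    customers[row["customer_id"]] = row["city"]

# Each order now keeps only the reference to its customer.
orders = [{"order_id": r["order_id"], "customer_id": r["customer_id"]} for r in flat_orders]

city_copies_before = len(flat_orders)  # one city value per order row
city_copies_after = len(customers)     # one city value per customer

print(customers)                                    # {1: 'Berlin', 2: 'Paris'}
print(city_copies_before, "->", city_copies_after)  # 3 -> 2
```

With the city stored in one place, changing a customer's city means updating a single value instead of hunting down every copy.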
Test your knowledge on metadata
1. The date and time a photo was taken is an example of which kind of metadata?
- Structural
- Administrative (Correct)
- Descriptive
- Representative
Correct: Administrative metadata indicates the technical source of a digital asset. The date and time a photo was taken is administrative metadata, as are properties such as the file type and creation date.
2. Which kind of metadata indicates how a piece of data is organized and whether it is part of one or more than one data collection?
- Structural (Correct)
- Representative
- Descriptive
- Administrative
Correct: Structural metadata describes how data is organized, including the relationships among data elements and whether a piece of data belongs to one or more collections. It gives insight into the structure and arrangement of the data.
3. A large metropolitan high school gives each of its students an ID number to differentiate them in its database. What kind of metadata are the ID numbers?
- Administrative
- Representative
- Structural
- Descriptive (Correct)
Correct: The ID numbers are descriptive metadata. Descriptive metadata describes a piece of data and can be used to identify it later, which makes each student easy to reference and look up when needed.
4. A company needs to merge third-party data with its own data. Which of the following actions will help make this process successful? Select all that apply.
- Replace the incoming data’s metadata with its own company metadata.
- Use the metadata to standardize the data. (Correct)
- Alter the company’s metadata to more closely reflect the incoming metadata.
- Use the metadata to evaluate the third-party data’s quality and credibility. (Correct)
Correct: Metadata helps the company standardize the combined data and evaluate the quality and credibility of the third-party data.
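A sketch of how metadata can drive both steps, written in plain Python. Everything here is hypothetical: the field names, the date formats, and the idea of describing each source with a small metadata dictionary. Note that neither company's metadata is replaced or altered; the incoming rows are translated to the company's layout, and rows that do not match their own documented format are flagged for review.

```python
from datetime import datetime

# Hypothetical metadata describing how each source names and formats its fields.
company_metadata = {"id_field": "customer_id", "date_field": "signup_date", "date_format": "%Y-%m-%d"}
vendor_metadata = {"id_field": "CustID", "date_field": "Joined", "date_format": "%d/%m/%Y"}

# Hypothetical third-party rows, still in the vendor's own layout.
vendor_rows = [
    {"CustID": "A17", "Joined": "03/02/2021"},
    {"CustID": "B42", "Joined": "28/11/2020"},
    {"CustID": "C05", "Joined": "2021-13-40"},  # does not match the documented format
]


def matches_metadata(row, meta):
    """Quality check: does the row's date follow the format its metadata documents?"""
    try:
        datetime.strptime(row.get(meta["date_field"], ""), meta["date_format"])
        return True
    except ValueError:
        return False


def standardize(row, source_meta, target_meta):
    """Rename fields and reformat the date so the row matches the target layout."""
    parsed = datetime.strptime(row[source_meta["date_field"]], source_meta["date_format"])
    return {
        target_meta["id_field"]: row[source_meta["id_field"]],
        target_meta["date_field"]: parsed.strftime(target_meta["date_format"]),
    }


flagged = [r for r in vendor_rows if not matches_metadata(r, vendor_metadata)]
clean = [standardize(r, vendor_metadata, company_metadata)
         for r in vendor_rows if matches_metadata(r, vendor_metadata)]

print(clean)    # [{'customer_id': 'A17', 'signup_date': '2021-02-03'}, {'customer_id': 'B42', 'signup_date': '2020-11-28'}]
print(flagged)  # [{'CustID': 'C05', 'Joined': '2021-13-40'}]
```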
Test your knowledge on accessing data sources
1. A CSV file saves data in a table format. What does CSV stand for?
- Compatible scientific variables
- Calculated spreadsheet values
- Comma-separated values (Correct)
- Cell-structured variables
Correct: CSV stands for comma-separated values.
2. A data analyst wants to bring data from a CSV file into a spreadsheet. This is an example of what process?
- Importing data (Correct)
- Filing data
- Editing data
- Normalizing data
Correct: Bringing data from a CSV file into a spreadsheet is an example of importing data.
3. A CSV file makes it easier for data analysts to complete which tasks? Select all that apply.
- Distinguish values from one another (Correct)
- Import data to a new spreadsheet (Correct)
- Manage multiple tabs within a worksheet
- Examine a small subset of a large dataset (Correct)
Correct: A CSV file makes it easy to distinguish values from one another, import data into a new spreadsheet, and examine a small subset of a large dataset.
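A sketch of why CSV files are convenient, using Python's standard csv module on hypothetical text: commas separate the values (quotes protect a comma that belongs inside a value), and a reader can pull just the first few rows of a large file.

```python
import csv
import io
from itertools import islice

# Hypothetical CSV text; a real file would be opened with open(path, newline="").
csv_text = """name,city,visits
Ana,Berlin,3
Ben,"Paris, France",5
Cy,Madrid,2
"""

# Commas separate the values; quotes keep "Paris, France" together as one value.
reader = csv.DictReader(io.StringIO(csv_text))

# Examine only the first two data rows instead of loading the whole file.
first_two = list(islice(reader, 2))
for row in first_two:
    print(row["name"], "|", row["city"], "|", row["visits"])
# Ana | Berlin | 3
# Ben | Paris, France | 5
```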
4. What is the process of showing only the data that meets specified criteria while hiding the rest?
- Filtering (Correct)
- Inspecting
- Sorting
- Converting
Correct: Filtering shows only the data that meets the specified criteria and hides the rest. It is one of a data analyst’s fundamental tools, especially during data cleaning.
Test your knowledge on sorting and filtering
1. What is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize?
- Reframing
- Sorting (Correct)
- Filtering
- Prioritizing
Correct: Sorting is a process of organizing data in a meaningful order to make it easier to understand, analyze and visualize.
2. A data analyst is reviewing a national database of real estate sales. They are only interested in sales of condominiums. How can the analyst narrow their scope?
- Sort by condominium sales
- Filter out condominium sales
- Sort by non-condominium sales
- Filter out non-condominium sales (Correct)
Correct: Filtering out non-condominium sales leaves only condominium sales in view, which narrows the scope. Sorting only reorders the data; it does not remove the sales the analyst is not interested in.
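The difference between sorting and filtering is easy to see in SQL. This sketch uses SQLite through Python's sqlite3 module; the sales table, its property_type values, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER, property_type TEXT, price INTEGER);
    INSERT INTO sales VALUES
        (1, 'condominium',   310000),
        (2, 'single-family', 450000),
        (3, 'condominium',   275000),
        (4, 'townhouse',     390000);
""")

# Sorting only reorders: every row, condominium or not, is still returned.
sorted_rows = conn.execute("SELECT sale_id FROM sales ORDER BY property_type").fetchall()

# Filtering out non-condominium sales narrows the result to condominiums only.
condo_rows = conn.execute(
    "SELECT sale_id FROM sales WHERE property_type = 'condominium' ORDER BY sale_id"
).fetchall()

print(len(sorted_rows), len(condo_rows))  # 4 2
print(condo_rows)                         # [(1,), (3,)]
```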
3. A data analyst works for a rental car company. They have a spreadsheet that lists car ID numbers and the dates cars were returned. How can they sort the spreadsheet to find the most recently returned cars?
- By return date, in descending order (Correct)
- By car numerical ID, in descending order
- By return date, in ascending order
- By car numerical ID, in ascending order
Correct: Sorting by return date in descending order places the most recently returned cars at the top of the spreadsheet.
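The same sort expressed in SQL, as a sketch on SQLite via Python's sqlite3 module. The returns table and its rows are hypothetical, and the dates are assumed to be stored as ISO-8601 text (YYYY-MM-DD), which sorts in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE returns (car_id INTEGER, return_date TEXT);
    INSERT INTO returns VALUES
        (501, '2024-03-02'),
        (117, '2024-03-09'),
        (342, '2024-02-27');
""")

# ISO-8601 date strings sort chronologically, so DESC puts the most recent return first.
latest_first = conn.execute(
    "SELECT car_id, return_date FROM returns ORDER BY return_date DESC"
).fetchall()

print(latest_first[0])  # (117, '2024-03-09')
```

Sorting by car_id instead would order the rows by an arbitrary identifier and say nothing about when each car came back.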
4. Fill in the blank: To keep a header row at the top of a spreadsheet, highlight the row and select _____ from the View menu.
- Pin
- Lock
- Set
- Freeze (Correct)
Correct: To keep a header row at the top of a spreadsheet while you scroll, highlight the row and select Freeze from the View menu.
Working with large datasets in SQL
1. Run another query on your table:
SELECT
end_station_name
FROM
`bigquery-public-data.london_bicycles.cycle_hire`
WHERE
rental_id = 57635395;
At what station did the bike trip with rental_id 57635395 end?
- Southwark Street, Bankside
- Tower Gardens, Tower
- Notting Hill Gate Station, Notting Hill
- East Village, Queen Elizabeth Olympic Park (Correct)
Correct: Row 1 of your output shows East Village, Queen Elizabeth Olympic Park in the end_station_name column. As you keep practicing SELECT, FROM, and WHERE, you will become more comfortable building the more complex queries you will need when you analyze data in the future.
Create a custom table in BigQuery
1. After running the query on your new table, what was the third most popular baby name for boys in 2014?
- Jacob
- William
- Mason (Correct)
- Noah
Correct: Querying your custom table shows that Mason was the third most popular baby name for boys in 2014. Now that you can upload your own datasets to BigQuery, you can practice writing SQL queries against a variety of data sources, a skill every data analyst needs.
Test your knowledge on using SQL with large datasets
1. In MySQL, what is acceptable syntax for the SELECT keyword? Select all that apply.
- select (Correct)
- “SELECT”
- SELECT (Correct)
- ‘SELECT’
Correct: MySQL accepts both SELECT and select because SQL keywords are not case sensitive. By convention, however, SQL keywords are written in uppercase.
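A quick way to check this behavior, sketched with SQLite through Python's sqlite3 module rather than MySQL. The keyword works in either case, but once it is wrapped in quotes it is no longer a keyword, so the statement fails with a syntax error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def runs(sql):
    """Return True if SQLite accepts the statement, False on a syntax error."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.OperationalError:
        return False


results = {
    "SELECT 1": runs("SELECT 1"),
    "select 1": runs("select 1"),        # keywords are case insensitive
    "'SELECT' 1": runs("'SELECT' 1"),    # single quotes make a string value, not a keyword
    '"SELECT" 1': runs('"SELECT" 1'),    # double quotes make an identifier, not a keyword
}
print(results)
```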
2. A database table is named blueFlowers. What type of case is this?
- Lowercase
- Snake case
- Camel case (Correct)
- Sentence case
Correct: blueFlowers is in camel case.
3. In BigQuery, what optional syntax can be removed from the following FROM clause without stopping the query from running?
FROM `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`
- Dashes
- Underscores
- Backticks (Correct)
- Dots
Correct: The backticks are optional here. They make the table reference easier to read, but the query still runs without them as long as the name contains no spaces or other special characters that need to be escaped.
4. In the following FROM clause, what is the table name in the SQL query?
FROM
bigquery-public-data.sunroof_solar.solar_potential_by_postal_code
- public-data.sunroof
- solar.solar
- sunroof_solar
- solar_potential_by_postal_code (Correct)
Correct: The table name is solar_potential_by_postal_code. It belongs to the sunroof_solar dataset in the bigquery-public-data project, which is publicly available in BigQuery.
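The three parts of a BigQuery table reference are separated by dots: project, then dataset, then table. The helper below is a hypothetical illustration in plain Python, not part of any BigQuery library. It assumes the reference has exactly three dot-separated parts and treats the surrounding backticks as optional.

```python
def split_table_path(path):
    """Split a `project.dataset.table` reference into its three parts.

    Surrounding backticks are optional, so they are stripped first.
    Assumes exactly three dot-separated parts (raises ValueError otherwise).
    """
    project, dataset, table = path.strip("`").split(".")
    return {"project": project, "dataset": dataset, "table": table}


parts = split_table_path("`bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`")
print(parts["project"])  # bigquery-public-data
print(parts["dataset"])  # sunroof_solar
print(parts["table"])    # solar_potential_by_postal_code
```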
Prepare Data for Exploration Weekly Challenge 3
1. Primary and foreign keys are two connected identifiers within separate tables. These tables exist in what kind of database?
- Normalized
- Metadata
- Relational (Correct)
- Primary
Correct: Primary and foreign keys connect records across tables in a relational database. A primary key uniquely identifies each record in a table, while a foreign key refers to the primary key of another table, establishing a relationship between the two tables.
2. When working with data from an external source, what can metadata help data analysts do? Select all that apply.
- Ensure data is clean and reliable (Correct)
- Choose which analyses to run
- Understand the contents of a database (Correct)
- Combine data from more than one source (Correct)
Correct: Metadata helps data analysts understand the structure and contents of a database, confirm that the data is clean and reliable, and combine data from more than one source.
3. Think about data as a student at a high school. In this metaphor, which of the following are examples of metadata? Select all that apply.
- Classes the student is enrolled in (Correct)
- Student’s ID number (Correct)
- Student’s enrollment date (Correct)
- Grades the student earns
Correct: The student’s ID number, enrollment date, and the classes the student is enrolled in are descriptive metadata because they identify and describe the student. The grades the student earns reflect the student’s performance rather than describing the student, so they are not counted as metadata here. Structural metadata, by contrast, describes how data is organized, for example its layout in tables or files.
4. Fill in the blank: Data governance is the process of ensuring that a company’s _____ are managed in a formal manner.
- business tasks
- data assets (Correct)
- data engineers
- business strategies
Correct: Data governance is the process of ensuring that a company’s data assets are managed in a formal manner.
5. What are some key benefits of using external data? Select all that apply.
- External data has broad reach. (Correct)
- External data can provide industry-level perspectives. (Correct)
- External data is always reliable.
- External data is free to use.
Correct: External data has broad reach and can provide industry-level perspectives that internal data sources often cannot. It is not always reliable, however, and it is not always free to use.
6. A data analyst reviews a database of Wisconsin car sales to find the last car models sold in Milwaukee in 2019. How can they sort and filter the data to return the last five cars sold at the top of their list? Select all that apply.
- Sort by sale date in ascending order
- Filter out sales not in 2019 (Correct)
- Sort by sale date in descending order (Correct)
- Filter out sales outside of Milwaukee (Correct)
Correct: The analyst can filter out sales not in 2019 and sales outside of Milwaukee, then sort the remaining data by sale date in descending order so the most recent sales, including the last five, appear at the top.
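The two filters and the descending sort combine into a single SQL query. This sketch runs on SQLite through Python's sqlite3 module; the car_sales table, its columns, and the rows (including the placeholder model names) are hypothetical, and dates are assumed to be ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car_sales (model TEXT, city TEXT, sale_date TEXT);
    INSERT INTO car_sales VALUES
        ('Model A', 'Milwaukee', '2019-12-30'),
        ('Model B', 'Milwaukee', '2019-12-28'),
        ('Model C', 'Madison',   '2019-12-31'),
        ('Model D', 'Milwaukee', '2020-01-02'),
        ('Model E', 'Milwaukee', '2019-11-15'),
        ('Model F', 'Milwaukee', '2019-10-01'),
        ('Model G', 'Milwaukee', '2019-06-20'),
        ('Model H', 'Milwaukee', '2019-03-08');
""")

last_five = conn.execute("""
    SELECT model, sale_date
    FROM car_sales
    WHERE city = 'Milwaukee'            -- filter out sales outside of Milwaukee
      AND sale_date >= '2019-01-01'
      AND sale_date <  '2020-01-01'     -- filter out sales not in 2019
    ORDER BY sale_date DESC             -- most recent sales first
    LIMIT 5                             -- keep only the last five sold
""").fetchall()

for model, sale_date in last_five:
    print(model, sale_date)
```

Model C (wrong city) and Model D (wrong year) are removed by the filters, and Model H drops off because it is the sixth most recent Milwaukee sale.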
7. When writing a query, the name of the dataset can either be inside two backticks, or not, and the query will still run properly.
- True (Correct)
- False
Correct: The dataset name can be enclosed in backticks or written without them. Either way, the query runs correctly as long as the name contains no spaces or other special characters that need to be escaped.
8. You are working with a database table that contains customer data. The first_name column lists the first name of each customer. You are only interested in customers with the first name Mark.
You write the SQL query below. Add a WHERE clause that will return only customers named Mark.
SELECT
*
FROM
customer
9. How many customers are named Mark?
- 5
- 1
- 2 (Correct)
- 3
Correct: The clause WHERE first_name = 'Mark' filters the results to return only customers whose first name is Mark. The complete query, SELECT * FROM customer WHERE first_name = 'Mark', returns all columns for those customers. Text values such as 'Mark' must be placed in quotes. Two customers have the first name Mark, so the query returns two records.
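The same WHERE clause can be tried on a small local table. This sketch uses SQLite through Python's sqlite3 module; the customer table and its rows are hypothetical, so the count below describes this sample, not the course dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, first_name TEXT, city TEXT);
    INSERT INTO customer VALUES
        (1, 'Mark', 'Lyon'),
        (2, 'Lena', 'Porto'),
        (3, 'Mark', 'Lisbon'),
        (4, 'Ravi', 'Oslo');
""")

# The WHERE clause keeps only rows whose first_name equals the text value 'Mark'.
# The text value is written in single quotes, as in the query above.
marks = conn.execute("SELECT * FROM customer WHERE first_name = 'Mark'").fetchall()

print(len(marks))  # 2 in this sample table
```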
10. Relational databases contain a series of tables connected to form relationships. Which two types of fields exist in two connected tables?
- Primary and foreign keys (Correct)
- Internal and external data
- Descriptive and structural metadata
- Star and snowflake schemas
Correct: Connected tables in a relational database share primary and foreign keys. A primary key uniquely identifies each record in a table, and a foreign key is a field in one table that refers to the primary key of another table. Together they link data across tables and help maintain data integrity.
11. Data analysts use metadata for what tasks? Select all that apply.
- To perform data analyses
- To evaluate the quality of data (Correct)
- To interpret the contents of a database (Correct)
- To combine data from more than one source (Correct)
Correct: Data analysts use metadata to combine data, evaluate data, and interpret a database.
12. Structural metadata indicates how a piece of data is organized and whether it’s part of one or more than one data collection.
- True (Correct)
- False
Correct: Structural metadata describes how data is organized, such as the relationships among data elements, tables, or datasets, and whether a piece of data belongs to one or more collections. It helps analysts understand how to access, combine, and interpret data within a system.
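A database can report metadata about how a table is organized. This sketch uses SQLite's PRAGMA table_info through Python's sqlite3 module to list a hypothetical customer table's columns, declared types, and primary key; BigQuery exposes similar column information through its INFORMATION_SCHEMA views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, city TEXT)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(customer)").fetchall()

for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "primary key" if pk else "")
# customer_id INTEGER primary key
# first_name TEXT
# city TEXT
```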
13. What is the process that data analysts use to ensure the formal management of their company’s data assets?
- Data mapping
- Data governance (Correct)
- Data aggregation
- Data integrity
Correct: Data governance is the formal management, oversight, and accountability of a company’s data assets. It covers creating and enforcing the rules for how data is created, maintained, and used, including data quality, security, privacy, compliance, and distribution across the organization. Good data governance helps ensure that data is accurate, available, and used responsibly.
14. A data analyst chooses not to use external data because it represents diverse perspectives. This is an appropriate decision when working with external data.
- True
- False (Correct)
Correct: Diverse perspectives are a reason to use external data, not to avoid it, because they offer insights that internal sources may not reveal. A valid reason to set external data aside is that its reliability cannot be confirmed, so verify the quality, accuracy, and credibility of external data before adding it to an analysis.
15. A data analyst reviews a database of Wisconsin car sales to find the last car models sold in Milwaukee in 2019. How can they sort and filter the data to return the last five cars sold at the top of their list? Select all that apply.
- Sort by sale date in ascending order
- Sort by sale date in descending order (Correct)
- Filter out sales outside of Milwaukee (Correct)
- Filter out sales not in 2019 (Correct)
Correct: The analyst can filter out sales outside of Milwaukee in 2019 and sort by date in descending order.
16. Think about data as driving a taxi cab. In this metaphor, which of the following are examples of metadata? Select all that apply.
- Passengers the taxi picks up
- Make and model of the taxi cab (Correct)
- License plate number (Correct)
- Company that owns the taxi (Correct)
Correct: The make and model of the cab, its license plate number, and the company that owns it are metadata because they describe and identify the taxi. The passengers it picks up are part of the driving itself, which is the data in this metaphor.
17. What are some key benefits of using external data? Select all that apply.
- External data is free to use.
- External data is always reliable.
- External data can provide industry-level perspectives. (Correct)
- External data has broad reach. (Correct)
Correct: Broad reach and industry-level perspectives are key benefits of external data. External data is not always free to use, and it is not always reliable.
18. You are working with a database table that contains customer data. The city column lists the city where each customer is located. You want to find out which customers are located in Berlin.
You write the SQL query below. Add a WHERE clause that will return only customers located in Berlin.
SELECT
*
FROM
customer
How many customers are located in Berlin?
- 9
- 12
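- 2 (Correct)
- 7
Correct: Adding WHERE city = 'Berlin' returns only customers located in Berlin. The complete query, SELECT * FROM customer WHERE city = 'Berlin', returns two customers.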
19. A data analyst reviews a national database of movie theater showings. They want to find the first movies shown in San Francisco in 2001. How can they organize the data to return the first 10 movies shown at the top of their list? Select all that apply.
- Sort by date in descending order
- Sort by date in ascending order (Correct)
- Filter out showings outside of San Francisco (Correct)
- Filter out showings not in 2001 (Correct)
Correct: The analyst filters out showings outside of San Francisco and showings not in 2001, then sorts the remaining results by date in ascending order so the earliest showings appear first.
20. A nonprofit maintains a list of how many laptops they provide to each school in the county. In the table, there is a column called number_of_laptops. A data analyst wants to determine which schools were given the fewest laptops. How should they sort the data to return these schools first?
- Sort numerically in descending order
- Sort alphabetically in ascending order
- Sort numerically in ascending order (Correct)
- Sort alphabetically in descending order
Correct: Sorting the number_of_laptops column numerically in ascending order puts the schools that received the fewest laptops first.
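Why "numerically" matters: if counts are sorted as text, they are compared character by character. A sketch in plain Python with hypothetical school names and laptop counts.

```python
# Hypothetical laptop counts per school.
schools = [("North High", 12), ("East Middle", 9), ("West Elementary", 100), ("South High", 25)]

# Numeric ascending sort: fewest laptops first.
numeric = sorted(schools, key=lambda s: s[1])

# Alphabetical sort of the same counts treated as text compares character by character,
# so "100" lands before "12" and "9" lands last.
alphabetical = sorted(schools, key=lambda s: str(s[1]))

print([s[0] for s in numeric])       # ['East Middle', 'North High', 'South High', 'West Elementary']
print([s[1] for s in alphabetical])  # [100, 12, 25, 9]
```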
21. When writing a query, you must remove the two backticks around the name of the dataset in order for the query to run properly.
- True
- False (Correct)
Correct: The dataset name can be written with or without the surrounding backticks, and the query still runs correctly as long as the name contains no spaces or other special characters that need to be escaped.
22. Fill in the blank: Data _____ is the process of ensuring the formal management of a company’s data assets.
- aggregation
- governance (Correct)
- mapping
- integrity
Correct: Data governance refers to the process of managing and controlling all data assets of a company.
23. In what circumstance might a data analyst choose not to use external data in their analysis?
- The data cannot be confirmed to be reliable (Correct)
- The data is free for anyone to access
- The data represents diverse perspectives
- The data is too thorough
Correct: If a data analyst cannot confirm that external data is reliable, they may decide not to use it in their analysis.
DATABASES: WHERE DATA LIVES – CONCLUSION
Many people find databases intimidating, but with the right tools and training, anyone can get a great deal out of them. The Google Data Analytics certificate on Coursera teaches the skills needed to work with databases and extract the raw data relevant to your work. It also covers how to filter and sort data so you can identify trends and draw meaningful insights. Together, these skills let you access and make use of the wealth of information stored in databases. Join Coursera today and take your first step toward becoming a certified data analyst.