Week 1: Data Types and Structures
Optional: familiar with data analytics? – Take our Diagnostic Quiz
The most important part of analytics is data collection: gathering data in both structured and unstructured forms and preparing it for analysis. Data types are the categories of data that an analyst collects and analyzes, while data structures define how that data is organized. For example, the Google Data Analytics course on Coursera shows how organizing data makes it easier to extract meaningful insights.
Data types include categorical variables such as gender, numerical variables such as age, dates and times, strings, and complex formats such as images and audio files. File formats such as CSV store raw data in an organized form so that it stays easy to access for further analysis.
Learning Outcomes:
- Understand how data is generated through everyday activities and what forms that data takes.
- Identify important things to consider when making a data collection decision.
- Distinguish between structured and unstructured data.
- Clarify the difference between data and data types.
- Explore the connection between data types and fields and values.
- Discuss wide and long data formats, including their structure and purpose.
Optional: familiar with data analytics? – take our diagnostic quiz
1. A data analyst at a construction company is working on a report for a quickly approaching deadline. Why might they choose to analyze only historical data?
- They enjoy historical references.
- The project has a very short time frame. (Correct)
- The data is constantly changing.
- The data is difficult to predict.
Correct: The most likely reason to analyze only historical data is that the project’s time frame is too short to gather and analyze real-time or future data. Historical data already exists, so it is faster to access and easier to process within a tight deadline. Analyzing past data lets the analyst draw immediate insights so the project can continue without waiting for new data to be collected or processed.
2. What are the benefits of data modeling? Select all that apply.
- Secure data for future use
- Keep data consistent (Correct)
- Provide a map of how data is organized (Correct)
- Make data easier to understand (Correct)
Correct: Data modeling gives data an organized structure and keeps it in a consistent format. It also makes the data easier to understand by mapping how data elements are structured and how they relate to each other.
3. A group of high school students take a survey that asks, “Are you on an athletic team? Please reply yes or no.” What kind of data is being collected?
- Boolean (Correct)
- String
- Visual
- Number
Correct: Boolean data has only two possible values, such as true or false, or yes or no.
4. A data analyst is evaluating data to determine whether it is good or bad. Which qualities characterize good data? Select all that apply.
- Cited (Correct)
- Consequential
- Comprehensive (Correct)
- Current (Correct)
Correct: Good data is comprehensive, current, and cited.
5. Imagine that a company uses your personal data as part of a financial transaction. Before it occurs, you are not made aware of the nature and scale of this transaction. What concept of data ethics does this violate?
- Transaction transparency
- Openness
- Consent
- Currency (Correct)
Correct: This violates the principle of currency in data ethics. Currency means that individuals should be aware of financial transactions resulting from the use of their personal data, and of the scale of those transactions.
6. Which of the following are protections afforded by data privacy? Select all that apply.
- Providing users the right to inspect, update, or correct their own data (Correct)
- Providing users the right to free access, usage, and sharing of data
- Preserving a data subject’s information and activity for all data transactions (Correct)
- Applying standards of right and wrong to the management and usage of data
Correct: Data privacy means preserving a data subject’s information and activity in all data transactions. It also gives users the right to inspect, update, or correct their own data.
7. Which of the following are uses of relational databases? Select all that apply.
- Organize numerical data based on relative scale
- Keep data consistent regardless of where it’s accessed (Correct)
- Contain and describe a series of tables that can be connected to form relationships (Correct)
- Present the same information to each collaborator (Correct)
Correct: Relational databases contain and describe a series of tables that can be connected to form relationships. Because the tables are linked, the data stays consistent and every collaborator sees the same information regardless of where they access it.
8. Which statements define primary keys and foreign keys and describe their relationship? Select all that apply.
- A primary key is an identifier that references a column in which each value is unique. (Correct)
- A foreign key is a field within a table that’s a primary key in another table. (Correct)
- Primary and foreign keys are two connected identifiers within separate tables in a relational database. (Correct)
- A primary key is a table containing observational data, and a foreign key is a table that contains the results of the primary key’s analysis.
Correct: A primary key acts as a unique identifier that refers to some column with values unique for that column. A foreign key is a field in one table that corresponds to the primary key in another table. Primary and foreign keys are two related identifiers found in different tables within a relational database.
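As a rough illustration of how the two keys connect tables, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns, and the sample rows are hypothetical and are not part of the quiz.

```python
import sqlite3

# In-memory database; all table names, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# customer_id is the primary key: every value in the column is unique.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
# orders.customer_id is a foreign key: it refers to the primary key of customers.
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id),"
    " item TEXT)"
)

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "hammer"), (11, 1, "nails"), (12, 2, "saw")],
)

# The foreign key lets each order be joined back to exactly one customer.
joined = conn.execute(
    "SELECT o.order_id, c.name, o.item"
    " FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id"
    " ORDER BY o.order_id"
).fetchall()
print(joined)

# An order that points at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 'drill')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The two tables stay separate, yet the shared customer_id column is what relates them.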
9. What tasks can data analysts accomplish using metadata? Select all that apply.
- Combine data from more than one source (Correct)
- Perform data analyses
- Evaluate the quality of data (Correct)
- Interpret the contents of a database (Correct)
Correct: Metadata is data about data. Analysts use it to combine data from more than one source, evaluate the quality of data, and interpret the contents of a database.
10. A data analyst reviews a spreadsheet of boat auction sales to find the last five sailboats sold in Kentucky. What steps would they take in order to narrow the scope? Select all that apply.
- Sort by date in ascending order
- Sort by date in descending order (Correct)
- Filter out sales in Kentucky
- Filter out sales outside of Kentucky (Correct)
Correct: The analyst filters out the sales outside of Kentucky and sorts the remaining sales by date in descending order, so the most recent ones appear first.
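The same two steps, filter and then sort in descending order, can be sketched in a few lines of Python. The records below are made-up sample data, and last_sold is a hypothetical helper name, not something from the course.

```python
from datetime import date

# Hypothetical auction records: (sale_date, state, boat_type).
sales = [
    (date(2023, 1, 14), "Kentucky", "sailboat"),
    (date(2023, 6, 2), "Ohio", "sailboat"),
    (date(2023, 3, 9), "Kentucky", "sailboat"),
    (date(2023, 7, 21), "Kentucky", "motorboat"),
    (date(2023, 5, 30), "Kentucky", "sailboat"),
    (date(2023, 2, 11), "Kentucky", "sailboat"),
    (date(2023, 8, 5), "Kentucky", "sailboat"),
    (date(2023, 4, 17), "Kentucky", "sailboat"),
]


def last_sold(records, state, boat_type, count=5):
    """Keep only matching sales, then return the most recent `count` of them."""
    # Filter: drop every sale outside the state or of another boat type.
    matching = [r for r in records if r[1] == state and r[2] == boat_type]
    # Sort by date in descending order so the newest sales come first.
    matching.sort(key=lambda r: r[0], reverse=True)
    return matching[:count]


latest = last_sold(sales, "Kentucky", "sailboat")
for sale in latest:
    print(sale[0].isoformat(), sale[1], sale[2])
```

Sorting in ascending order instead would put the oldest sales at the top, which is why that option is wrong for finding the last five sold.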
11. You are writing a SQL query to filter data from a database that describes trees in Omaha, Nebraska. You want to only display entries for trees that have a diameter of 30 inches. The name of the table you’re using is Nebraska_trees and the name of the column that shows the diameters of the trees is trunk_diameter. What is the correct query syntax that will retrieve and filter data from this table?
- SELECT Nebraska_trees WHERE trunk_diameter = 30
- SELECT * FROM trunk_diameter WHERE Nebraska_trees = 30
- SELECT trunk_diameter = 30 FROM Nebraska_trees
- SELECT * FROM Nebraska_trees WHERE trunk_diameter = 30 (Correct)
Correct: The correct query is SELECT * FROM Nebraska_trees WHERE trunk_diameter = 30.
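To see the query run, here is a small sketch that uses Python's built-in sqlite3 module. The table name Nebraska_trees and the column trunk_diameter come from the question; the tree_id and species columns and all of the rows are invented for illustration.

```python
import sqlite3

# In-memory database with a few made-up trees.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Nebraska_trees ("
    " tree_id INTEGER PRIMARY KEY,"   # hypothetical column
    " species TEXT,"                  # hypothetical column
    " trunk_diameter INTEGER)"        # column named in the question
)
conn.executemany(
    "INSERT INTO Nebraska_trees VALUES (?, ?, ?)",
    [(1, "oak", 30), (2, "maple", 12), (3, "elm", 30), (4, "pine", 45)],
)

# SELECT * returns every column, FROM names the table,
# and WHERE keeps only the rows whose diameter equals 30.
rows = conn.execute(
    "SELECT * FROM Nebraska_trees WHERE trunk_diameter = 30"
).fetchall()
print(rows)
```

The other answer options fail because they either omit FROM or swap the table name and the column name.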
12. Consistent naming conventions describe which properties of a file? Select all that apply.
- Version (Correct)
- Content (Correct)
- Creation date (Correct)
- File location
Correct: Consistent naming conventions describe a file’s content, creation date, and version.
Test Your Knowledge on Collecting Data
1. Which method of data-collection is most commonly used by scientists?
- Interviews
- Observations (Correct)
- Questionnaires
- Surveys
Correct: Observation is the method of data-collection most often used by scientists.
2. Organizations such as the U.S. Centers for Disease Control (CDC) often use data collected from hospitals. What kind of data is the CDC using if it is collected by hospitals, then sold to the CDC for its own analysis?
- Multiple-party data
- Second-party data (Correct)
- First-party data
- Third-party data
Correct: Second-party data is collected directly by one organization (here, the hospitals) and then sold or shared with another organization (here, the CDC) for its own analysis.
3. Fill in the blank: In data analytics, a _____ refers to all possible data values in a certain dataset.
- representation
- population (Correct)
- sample
- source
Correct: In data analytics, the population is the complete set of possible data values in a dataset.
Test Your Knowledge on Data Formats and Structures
1. Fill in the blank: The running time of a movie is an example of _____ data.
- nominal
- qualitative
- discrete
- continuous (Correct)
Correct: The running time of a movie is continuous data because it is measured and can take any numeric value within a range.
2. What are the characteristics of unstructured data? Select all that apply.
- Has a clearly identifiable structure
- Is not organized (Correct)
- May have an internal structure (Correct)
- Fits neatly into rows and columns
Correct: Unstructured data is not organized in a clearly identifiable way, although it may still have some internal structure.
3. Structured data enables data to be grouped together to form relations. This makes it easier for analysts to do what with the data? Select all that apply.
- Rewrite
- Store (Correct)
- Search (Correct)
- Analyze (Correct)
Correct: Because structured data can be grouped together to form relations, analysts can store, search, and analyze it more easily.
4. Which of the following is an example of unstructured data?
- Rating of a local favorite restaurant
- GPS location
- Contact saved on a phone
- Email message (Correct)
Correct: An email message is an example of unstructured data. Its body is free-form text that does not fit neatly into rows and columns.
5. How would you write a function to calculate February’s entertainment expenses for Cable TV, Video Streaming, and Movies in the example spreadsheet?
- =SUM(B2:C4)
- SUM(C2:C6)
- SUM(B2:C6)
- =SUM(C2:C4) (Correct)
Correct: The function that totals February’s entertainment expenses for Cable TV, Video Streaming, and Movies is =SUM(C2:C4). It selects the relevant range of cells and uses the proper syntax for the SUM function, including the leading equals sign. Knowing how to apply functions like this helps you build dynamic spreadsheets for future tasks.
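For readers who think more easily in code, the sketch below mimics what =SUM(C2:C4) does. The example spreadsheet is not reproduced in this guide, so the layout (column B for January, column C for February, rows 2 to 4 for entertainment, rows 5 and 6 for other expenses) and every amount are assumptions; sum_column_range is a hypothetical helper.

```python
# Hypothetical spreadsheet contents keyed by cell reference.
# The real figures from the quiz spreadsheet are not shown here.
cells = {
    "A2": "Cable TV", "B2": 50.0, "C2": 50.0,
    "A3": "Video Streaming", "B3": 15.0, "C3": 15.0,
    "A4": "Movies", "B4": 24.0, "C4": 12.0,
    "A5": "Groceries", "B5": 300.0, "C5": 280.0,
    "A6": "Utilities", "B6": 90.0, "C6": 95.0,
}


def sum_column_range(sheet, column, first_row, last_row):
    """Add up one column from first_row to last_row inclusive, like =SUM(C2:C4)."""
    return sum(sheet[f"{column}{row}"] for row in range(first_row, last_row + 1))


# Only column C (February) and only rows 2 to 4 (the entertainment rows).
february_entertainment = sum_column_range(cells, "C", 2, 4)
print(february_entertainment)  # 77.0
```

Under these assumptions, widening the range to C2:C6 would also add the non-entertainment rows, and starting at column B would mix in January.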
6. Which statements are true about the two penguin datasets in the Dive into dplyr (tutorial #1) notebook? Select all that apply.
- In penguins_lter.csv, the column Individual ID cannot be sorted.
- penguins_size.csv has 7 columns. (Correct)
- In penguins_lter.csv, the highest value in the column Sample Number is 152. (Correct)
- In both datasets, the number of columns is the same.
Correct: The penguins_size.csv file has 7 columns, and the highest “Sample Number” in the penguins_lter.csv file is 152. You can check both with the interactive notebook’s data viewing function, which is useful for exploring and describing data in future projects.
Test Your Knowledge on Data Types, Fields, and Values
1. Fill in the blank: Internet search engines are an everyday example of how Boolean operators are used. The Boolean operator _____ expands the number of results when used in a keyword search.
- OR (Correct)
- AND
- WITH
- NOT
Correct: The Boolean operator OR expands the number of results of a keyword search by including any result that matches at least one of the search terms.
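One way to picture this is with sets of search results. The tiny index below is invented for illustration; real search engines are far more involved.

```python
# Hypothetical mini-index: keyword -> ids of pages that contain it.
index = {
    "sailboat": {1, 2, 3},
    "kentucky": {3, 4},
}

a = index["sailboat"]
b = index["kentucky"]

or_results = a | b    # pages matching at least one term (expands the results)
and_results = a & b   # pages matching both terms (narrows the results)
not_results = a - b   # pages matching the first term but not the second

print(sorted(or_results))   # [1, 2, 3, 4]
print(sorted(and_results))  # [3]
print(sorted(not_results))  # [1, 2]
```

OR can never return fewer results than either term alone, while AND and NOT can only keep the same number or fewer.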
2. Which of the following statements accurately describes a key difference between wide and long data?
- Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes. (Correct)
- Wide data subjects can have multiple rows that hold the values of subject attributes. Long data subjects can have data in multiple columns.
- Every wide data subject has multiple columns. Every long data subject has data in a single column.
- Every wide data subject has a single column that holds the values of subject attributes. Every long data subject has multiple columns.
Correct: In wide data, each subject occupies a single row and its attributes are spread across multiple columns. In long data, a subject can have multiple rows, and each row holds one value of one of the subject’s attributes.
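The difference is easier to see when the same data is reshaped. Below is a minimal sketch in plain Python; the sample table, the column names "variable" and "value", and the wide_to_long helper are all hypothetical choices made for this example.

```python
# Hypothetical wide table: one row per subject, one column per attribute.
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_field):
    """Turn each non-id column of a wide row into its own long row."""
    long_rows = []
    for row in rows:
        for field, value in row.items():
            if field == id_field:
                continue  # the id identifies the subject; it is not an attribute
            long_rows.append(
                {id_field: row[id_field], "variable": field, "value": value}
            )
    return long_rows


long_rows = wide_to_long(wide, "country")
for r in long_rows:
    print(r)
```

The two wide rows become four long rows: each subject now appears once per attribute, with one column naming the attribute and another holding its value.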
3. What does data transformation enable data analysts to accomplish?
- Inspect the data for accuracy
- Change the structure of the data (Correct)
- Restore the data after it has been lost
- Retrieve the data faster
Correct: Data transformation lets data analysts change the structure, format, or values of data so that it is better suited for analysis or reporting.
Prepare Data for Exploration Weekly Challenge 1
1. A data analyst is working on an urgent traffic study. As a result of the short time frame, which type of data are they most likely to use?
- Unclean
- Theoretical
- Personal
- Historical (Correct)
Correct: With such a short time frame, historical data is the most likely choice because it already exists and can be analyzed right away.
2. Which of the following is an example of continuous data?
- Movie budget.
- Movie run time. (Correct)
- Leading actors in movie.
- Box office returns.
Correct: Movie run time is continuous data: it is a measurement of time that can take any value within a range.
3. Nominal qualitative data has a set order or scale.
- True
- False (Correct)
Correct: Nominal qualitative data has no set order or scale. It only sorts data into separate categories or labels, with no hierarchy among them.
4. Which of the following is a benefit of internal data?
- Internal data is less likely to need cleaning.
- Internal data is less vulnerable to biased collection.
- Internal data is the only data relevant to the problem.
- Internal data is more reliable and easier to collect. (Correct)
Correct: Internal data comes from within the organization and typically reflects its own operations and processes, so it is generally more reliable and easier to collect than external data.
5. Structured data is likely to be found in which of the following formats? Select all that apply.
- Audio file
- Digital photo
- Spreadsheet (Correct)
- Table (Correct)
Correct: Structured data is usually organized in a table or spreadsheet with defined rows and columns, which makes it easy to store, search, and analyze.
6. Which of the following values are examples of a Boolean data type? Select all that apply.
- Yes, no, or unsure
- Yes or no (Correct)
- One, two, or three
- True or false (Correct)
Correct: A Boolean data type can take only two values, such as true or false, or yes or no.
7. The following is a selection from a spreadsheet:
What kind of data format does it contain?

- Narrow
- Wide (Correct)
- Long
- Short
Correct: The selection is in wide format: each subject has one row, and each attribute of the subject has its own column.
8. Data transformation can change the structure of the data. An example of this is taking data stored in one format and converting it to another.
- True (Correct)
- False
Correct: Data transformation can change the structure of data. Examples include converting data from one format to another, such as CSV to JSON, or rearranging it from wide format to long format.
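As one possible illustration of a format conversion, the sketch below turns a small CSV text into JSON with Python's standard csv and json modules. The sample rows and the csv_to_json helper name are made up for this example.

```python
import csv
import io
import json

# Hypothetical CSV content; a real workflow would read it from a file.
csv_text = "name,age\nAgnes,44\nHenry,36\n"


def csv_to_json(text):
    """Parse CSV text and return the same records as a JSON string."""
    reader = csv.DictReader(io.StringIO(text))  # first line supplies the field names
    records = list(reader)                      # one dict per data row
    # csv reads every value as text, so ages stay strings such as "44".
    return json.dumps(records, indent=2)


json_text = csv_to_json(csv_text)
print(json_text)
```

The information is the same before and after; only the format that holds it has changed, which is what makes this a transformation rather than an analysis.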
9. Which of the following questions collect nominal qualitative data? Select all that apply.
10. A social media post is an example of structured data.
- True
- False (Correct)
Correct: A social media post is an example of unstructured data.
11. A Boolean data type must have a numeric value.
- True
- False (Correct)
Correct: A Boolean data type has only two possible values, typically written as true/false or yes/no, which represent binary conditions or states. They do not have to be numeric.
12. In long data, separate columns contain the values and the context for the values, respectively. What does each column contain in wide data?
- A specific data type
- A unique data variable (Correct)
- A specific constraint
- A unique format
Correct: In wide data, each column contains a unique data variable, and each row represents a specific observation or subject. In long data, one column holds the values and another holds their context, so each row represents a single value for a subject.
13. A data analyst is working in a spreadsheet application. They use Save As to change the file type from .XLS to .CSV. This is an example of a data transformation.
- True (Correct)
- False
Correct: Using Save As to change the file type from .XLS to .CSV is a data transformation. It changes the data’s format so that it can be used with different tools or systems for analysis.
14. If you have a short time frame for data collection and need an answer immediately, you likely will have to use historical data.
- True (CORRECT)
- False
Correct: If you have a short space of time in which to collect your data and need an immediate answer, you are most likely going to turn to historical data, because it is already available and can be quickly analyzed without the need for further data collection.
15. Continuous data is measured and has a limited number of values.
- True
- False (CORRECT)
16. Internal data is more reliable because it’s clean.
- True
- False (CORRECT)
Correct: Internal data is more reliable because it lives within a company’s own systems.
17. A social media post is an example of structured data.
- True
- False (CORRECT)
Correct: Social media posts are unstructured data. They usually contain text, images, or videos without a predefined format or arrangement, which makes them harder to analyze.
18. A data analyst at a book publisher is working on an urgent report for executives. They are using only historical data. What is the most likely reason for choosing to analyze only historical data?
- The data is constantly changing
- There is plenty of time to research historical data
- The project has a very short time frame (CORRECT)
- The data is unknown
Correct: The most likely reason is that the project has a very short time frame. Historical data already exists and can be analyzed quickly, while new data would take too long to collect.
19. Which of the following is an example of continuous data?
- Box office returns
- Movie run time (CORRECT)
- Movie budget
- Leading actors in movie
Correct: Movie run time is an example of continuous data.
20. Why is internal data considered more reliable and easier to collect than external data?
- Internal data circumvents privacy restrictions.
- Internal data has much larger sample sizes.
- Internal data lives within a company’s own systems. (CORRECT)
- Internal data comes from people you know.
Correct: Internal data is generally considered more reliable and easier to collect than external data because it lives within a company’s own systems, which allows more immediate access and usually reflects the company’s operations more accurately.
21. Which of the following is an example of structured data?
- Digital photo
- Relational database (CORRECT)
- Audio file
- Video file
Correct: A relational database is an example of structured data.
22. In long data, separate columns contain the values and the context for the values, respectively. What does each column contain in wide data?
- A specific data type
- A unique format
- A unique data variable (CORRECT)
- A specific constraint
Correct: In wide data, each row represents an observation and each column contains a unique data variable. In long data, one column holds the values and another holds their context (the variable or category name), so each row represents a single value and its associated context.
23. Which of the following questions collects nominal qualitative data?
- On a scale of 1-10, how would you rate your service today?
- Is this your first time dining at this restaurant? (CORRECT)
- How many times have you dined at this restaurant?
- How many people do you usually dine with?
Correct: The question “Is this your first time dining at this restaurant?” collects nominal qualitative data because the answers fall into categories, such as yes or no, that have no set order.
24. Nominal qualitative data has a set order or scale.
- True
- False (CORRECT)
Correct: Nominal qualitative data does not have a set order or scale.
25. Structured data is likely to be found in which of the following formats? Select all that apply.
- Audio file
- Digital photo
- Table (CORRECT)
- Spreadsheet (CORRECT)
Correct: Structured data is likely to be found in a table or spreadsheet.
26. Which of the following are examples of discrete data? Select all that apply.
- Movie running time
- Number of actors in movie (CORRECT)
- Box office returns (CORRECT)
- Movie budget (CORRECT)
Correct: The number of actors in a movie, box office returns, and the movie budget are examples of discrete data.
27. Fill in the blank: Data transformation enables data analysts to change the _____ of the data.
- value
- structure (CORRECT)
- accuracy
- meaning
Correct: Data transformation enables data analysts to change the structure of the data.
28. Why is internal data considered more reliable and easier to collect than external data?
- Internal data has much larger sample sizes.
- Internal data lives within a company’s own systems. (CORRECT)
- Internal data comes from people you know.
- Internal data circumvents privacy restrictions.
Correct: Internal data is often considered more reliable and easier to collect than external data because it lives within the company’s own systems.
29. Fill in the blank: A Boolean data type can have _____ possible values.
- 10
- three
- two (CORRECT)
- infinite
Correct: A Boolean data type can have two possible values.
30. The following is a selection from a spreadsheet:
Name | Age | Occupation |
Agnes Shipton | 44 | Entrepreneur |
Ronaldo Vincent | 23 | Accountant |
Henry Sing | 36 | Editor |
Krishna Bowling | 62 | Graphic designer |
What kind of data format does it contain?
- Long
- Wide (CORRECT)
- Short
- Narrow
Correct!
31. Data transformation can change the structure of the data. An example of this is taking data stored in one format and converting it to another.
- True (CORRECT)
- False
Correct: Converting data from one format to another is a data transformation. It can also change how structured the data is, for example from structured to semi-structured.
Data Types and Structures Conclusion
In conclusion, analytics depends on data collection for its inputs. That data can be structured or unstructured, and it needs to be properly prepared before analysis. Data types describe the kinds of data analysts collect and examine, while data structures describe how that data is organized.
The Google Data Analytics course on Coursera covers these topics in more depth. For more information on data collection and analysis, you could also check out one of the related Coursera courses.