Module 1: Introduction to data science concepts
An introduction to the Google Advanced Data Analytics Certificate program: its main objectives and components, the evolution of data science, and how data science is applied to solve modern-day problems.
Learning Objectives
- Become familiar with the program structure, goals, and expectations.
- Explore the strong career and employment prospects for data professionals.
- Become familiar with the core themes and learning outcomes the program will address.
PRACTICE QUIZ: ASSESS YOUR READINESS FOR THE ADVANCED DATA ANALYTICS CERTIFICATE
1. What is the key difference between qualitative and quantitative data?
- Qualitative data is subjective; quantitative data is specific.
- Qualitative data measures qualities and characteristics; quantitative data measures numerical facts. (CORRECT)
- Qualitative data is about the quality of a product or service; quantitative data is about how much of that product or service is available in the marketplace.
- Qualitative data describes the kind of data being analyzed; quantitative data describes how much data is being analyzed.
Correct: Qualitative data captures qualities and characteristics and contains descriptions rather than numbers. Quantitative data consists of numerical, measurable facts and values.
2. Which of the following statements accurately describe wide and long data? Select all that apply.
- Long data subjects can have data in multiple columns.
- Wide data subjects can have multiple rows that hold the values of subject attributes.
- Wide data subjects can have data in multiple columns. (CORRECT)
- Long data subjects can have multiple rows that hold the values of subject attributes. (CORRECT)
Correct: In wide data, each subject has a single row, and each of its attributes is stored in a separate column. In long data, each subject can have multiple rows, and each row holds the value of one attribute for that subject.
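A minimal Python sketch of the difference, using only the standard library and made-up store sales (the names `wide_rows` and `wide_to_long` are hypothetical): it turns one wide row per subject into one long row per attribute value.

```python
# Wide format: one row per subject, one column per attribute (made-up data).
wide_rows = [
    {"subject": "Store A", "jan_sales": 120, "feb_sales": 135},
    {"subject": "Store B", "jan_sales": 98, "feb_sales": 110},
]


def wide_to_long(rows, id_column):
    """Turn each non-id column of each row into its own (id, attribute, value) row."""
    long_rows = []
    for row in rows:
        for attribute, value in row.items():
            if attribute == id_column:
                continue  # the id is repeated on every long row instead
            long_rows.append(
                {id_column: row[id_column], "attribute": attribute, "value": value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, "subject")
for r in long_rows:
    print(r)  # four rows: two per store, one per month
```

In practice, pandas provides `pandas.melt` for this kind of reshaping.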
3. Structured data is likely to be found in which of the following formats? Select all that apply.
- Audio file
- Digital photo
- Spreadsheet (CORRECT)
- Database table (CORRECT)
Correct: Structured data is organized in a predefined format, such as rows and columns, which makes it easy to store and analyze in tables and spreadsheets. To expand your knowledge about structured data, check course three of the Google Data Analytics Certificate.
4. Fill in the blank: A Boolean data type can have _____ possible value(s).
- three
- infinite
- one
- two (CORRECT)
Correct: A Boolean data type can have two possible values.
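In Python, for example, the built-in `bool` type has exactly two values, `True` and `False`, and comparisons evaluate to one of them. The variable names below are hypothetical.

```python
# Python's bool type has exactly two values: True and False.
print(type(True), type(False))  # <class 'bool'> <class 'bool'>

# A comparison always evaluates to one of those two values.
units_sold = 42
met_target = units_sold >= 40
print(met_target)  # True

# bool() maps any value to one of the two Booleans.
print(bool(0), bool(1), bool(""), bool("text"))  # False True False True
```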
5. What is the term for the individuals who have invested time and resources in a project and are interested in its outcome?
- Executives
- Subject-matter experts
- Stakeholders (CORRECT)
- Project sponsors
Correct: Stakeholders are the individuals or groups who have invested time and resources in a project and are interested in its outcome.
6. When collecting data for study, what are some reasons to consider sample size? Select all that apply.
- To eliminate certain segments of a population
- To include as many participants as possible in the study
- To make sure a few unusual responses don’t skew results (CORRECT)
- To collect data that represents a diverse set of perspectives (CORRECT)
Correct: An appropriate sample size helps ensure that the data represents a diverse set of perspectives and that a few unusual responses don’t skew the results, which makes conclusions more accurate.
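The skew effect can be sketched with made-up satisfaction scores: a single unusual response of 1 pulls the average of a five-response sample much further than the average of a 41-response sample. The sketch uses Python's standard `statistics.mean`; the data is hypothetical.

```python
from statistics import mean

# Hypothetical satisfaction scores on a 1-10 scale.
typical = [8, 7, 9, 8]
outlier = [1]  # one unusual response

small_sample = typical + outlier       # 5 responses
large_sample = typical * 10 + outlier  # 41 responses

print(mean(typical))                   # 8
print(mean(small_sample))              # 6.6  (the outlier dominates)
print(round(mean(large_sample), 2))    # 7.83 (the outlier barely matters)
```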
7. The SMART methodology can be used to ask a question that promotes change. What type of SMART question leads to change?
- Action-oriented (CORRECT)
- Motivational
- Results-focused
- Transformational
Correct: A SMART question that promotes change is action-oriented. (SMART stands for specific, measurable, action-oriented, relevant, and time-bound.)
8. Which of the following inquiries are leading questions? Select all that apply.
- How did you learn about our company?
- What do you enjoy most about our service? (CORRECT)
- How satisfied were you with our customer representative? (CORRECT)
- In what ways did our product meet your needs? (CORRECT)
Correct: Leading questions include “What do you enjoy most about our service?”, “How satisfied were you with our customer representative?”, and “In what ways did our product meet your needs?” Their wording suggests an implied or expected response, which steers respondents toward a particular answer.
9. What are the key characteristics of a metric? Select all that apply.
- Metrics are unorganized collections of facts.
- Metrics are quantifiable. (CORRECT)
- Metrics can be used to evaluate performance. (CORRECT)
- Metrics are used for measurement. (CORRECT)
Correct: A metric is a single, quantifiable type of data used for measurement. Metrics are used to evaluate performance, track progress, and assess results against a defined objective.
10. Which type of bias is the tendency to construe ambiguous situations in a positive or negative way?
- Confirmation bias
- Cultural bias
- Interpretation bias (CORRECT)
- Observer bias
Correct: Interpretation bias is the tendency to interpret ambiguous situations or information in a positive or negative way, based on a person’s feelings, beliefs, or expectations. It can distort how the situation is perceived.
11. Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What concept does this describe?
- Privacy
- Transaction transparency
- Openness
- Consent (CORRECT)
Correct: This describes consent, a data ethics concept holding that individuals have the right to know how and why their personal data will be used before they agree to provide it.
12. Which spreadsheet tool changes how cells appear when values meet a specific condition?
- Alternating colors
- Protected ranges
- Conditional formatting (CORRECT)
- Data validation
Correct: Conditional formatting changes how cells appear when their values meet a specific condition. It is useful for drawing attention to important data and making interpretation and analysis easier.
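Conditional formatting is built into spreadsheet tools, but the rule logic behind it can be sketched in Python. The thresholds, colors, cell names, and the `format_for` helper below are hypothetical: each cell's value is checked against the conditions, and only cells that meet one get a different appearance.

```python
def format_for(value):
    """Return the display style a cell would get under these hypothetical rules."""
    if value < 60:
        return "red"      # condition 1: low scores
    if value >= 90:
        return "green"    # condition 2: high scores
    return "default"      # no condition met, so the cell keeps its normal look


scores = {"A2": 45, "A3": 72, "A4": 95}
styles = {cell: format_for(v) for cell, v in scores.items()}
print(styles)  # {'A2': 'red', 'A3': 'default', 'A4': 'green'}
```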
13. Fill in the blank: In a spreadsheet, the SPLIT function divides a text string around a ___, then puts each fragment into a new, separate cell.
- delimiter (CORRECT)
- substring
- indicator
- mark
Correct: The SPLIT function divides a text string around a specified delimiter and puts each resulting fragment into its own cell. It is useful when several pieces of data are combined in one cell and need to be separated.
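A rough Python analogue of SPLIT uses the built-in `str.split` method. This is not the spreadsheet function itself, and the sample text and column headers are made up.

```python
# Several values combined in one "cell", separated by commas.
cell = "Jane,Doe,Chicago,IL"

# str.split divides the string around the delimiter and returns the fragments.
fragments = cell.split(",")
print(fragments)  # ['Jane', 'Doe', 'Chicago', 'IL']

# Each fragment would occupy its own column, like the cells SPLIT fills.
columns = ["first_name", "last_name", "city", "state"]  # hypothetical headers
print(dict(zip(columns, fragments)))
```

One difference to keep in mind: `str.split` treats the whole delimiter string as a single separator and keeps empty fragments (for example, `"a,,b".split(",")` gives `['a', '', 'b']`), so its output may differ from SPLIT's default behavior.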
14. Fill in the blank: A programming language is a system of words and symbols used to ___ for computers.
- detect malware
- repair infrastructure
- install hardware
- write instructions (CORRECT)
Correct: A programming language is a system of words and symbols, governed by a set of rules, that is used to write instructions a computer can understand and execute.
15. What are the main benefits of using a programming language to work with data? Select all that apply.
- Automate decision-making
- Easily reproduce and share work (CORRECT)
- Clarify the steps of analysis (CORRECT)
- Save time (CORRECT)
Correct: Using a programming language to work with data makes it easier to reproduce and share work, saves time, and clarifies the steps of the analysis.
16. In order for code to work properly, it’s necessary to follow the predetermined structure of the coding language. This includes all required words and symbols, as well as their proper placement. What is this structure called?
- Syntax (CORRECT)
- Standard
- Script
- Symbol
Correct: Syntax is the predetermined structure of a programming language, including all required words and symbols and their proper placement. Code must follow the language’s syntax to work correctly.
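To see syntax in action, Python's standard `ast.parse` function checks whether a string of code follows Python's syntax and raises `SyntaxError` when it does not. The `follows_syntax` helper and the sample strings are hypothetical names for this sketch.

```python
import ast

valid_code = "total = sum([1, 2, 3])"
invalid_code = "total = sum([1, 2, 3]"  # missing closing parenthesis


def follows_syntax(source):
    """Return True if `source` parses as valid Python, False otherwise."""
    try:
        ast.parse(source)
        return True
    except SyntaxError as err:
        # The exact message text varies between Python versions.
        print(f"SyntaxError: {err.msg}")
        return False


print(follows_syntax(valid_code))    # True
print(follows_syntax(invalid_code))  # prints the error message, then False
```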
17. What is the term for programming code that is freely available and may be modified and shared by the people who use it?
- Open-source (CORRECT)
- Common-design
- Non-dependent
- One-access
Correct: Open-source code is freely available to the public and can be modified and shared by the people who use it.
18. Data professionals use programming languages to enable which of the following? Select all that apply.
- Data governance
- Data transformation (CORRECT)
- Data cleaning (CORRECT)
- Data visualization (CORRECT)
Correct: Data professionals use programming languages to transform, clean, and visualize data.
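A small sketch of cleaning and transformation in plain Python, with made-up records and a hypothetical `clean` function: it trims spaces, standardizes name capitalization, drops incomplete rows and duplicates, and converts ages from text to integers. Visualization is left out to keep the example dependency-free.

```python
# Hypothetical raw records, "name,age", as they might arrive from a form export.
raw = [" Alice ,34", "bob,", "CAROL , 29 ", "alice,34"]


def clean(records):
    """Return a list of unique (name, age) tuples with consistent formatting."""
    seen = set()
    cleaned = []
    for record in records:
        name, age = (part.strip() for part in record.split(","))
        if not age:                      # cleaning: skip rows with a missing age
            continue
        row = (name.title(), int(age))   # transformation: consistent case and types
        if row in seen:                  # cleaning: skip duplicate rows
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


print(clean(raw))  # [('Alice', 34), ('Carol', 29)]
```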
19. What type of data visualization should be used to demonstrate how often data values fall into certain ranges?
- Bar chart
- Correlation chart
- Histogram (CORRECT)
- Tree map
Correct: A histogram is a representation of how often data values fall into certain ranges.
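The counting behind a histogram can be sketched in Python: each value is assigned to a fixed-width range (a bin), and the bins are tallied. The ages and the `histogram_counts` helper are hypothetical, and the text bars stand in for a real chart.

```python
def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each range [low, low + bin_width).

    Only bins that receive at least one value appear in the result.
    """
    counts = {}
    for v in values:
        # Lower edge of the bin this value belongs to.
        low = start + ((v - start) // bin_width) * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))


ages = [23, 27, 31, 35, 36, 38, 42, 44, 51, 58]  # hypothetical survey ages
width = 10
for low, count in histogram_counts(ages, bin_width=width, start=20).items():
    # Labels assume whole-number data, e.g. "20-29".
    print(f"{low}-{low + width - 1}: {'#' * count}")
# 20-29: ##
# 30-39: ####
# 40-49: ##
# 50-59: ##
```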
20. Why is it more effective to label a data visualization instead of using a legend? Select all that apply.
- Labels help keep people’s attention on relevant data by redirecting their focus away from outliers.
- Labels can be placed near the data, whereas legends are typically positioned away from the data. (CORRECT)
- Labels make the data visualization more accessible because they don’t rely on the ability to interpret color. (CORRECT)
- Labels allow for text explanations to be placed directly on the visualization. (CORRECT)
Correct: Labels are often more effective than a legend because they can be placed near the data, they make the visualization more accessible by not relying on the ability to interpret color, and they allow text explanations to be placed directly on the visualization.
21. Which of the following are appropriate uses for filters in data visualization tools? Select all that apply.
- Hiding outliers that do not support the hypothesis
- Limiting the number of rows or columns in view (CORRECT)
- Highlighting individual data points (CORRECT)
- Providing data to different users based on their particular needs (CORRECT)
Correct: Filters can be used to highlight individual data points, limit the number of rows or columns in view, and provide different users with the data that fits their particular needs.
22. What is data science?
- The collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making
- A process used to solve complex problems in a user-centric way
- A field of study that uses raw data to create new ways of modeling and understanding the unknown (CORRECT)
- A tool for organizing data elements and how they relate to one another.
Correct: Data science is a field of study that uses raw data to create new ways of modeling and understanding the unknown.
23. A dashboard is designed to share insights about the housing market in a city. What type of data visualization would be most effective at demonstrating how the city’s annual home sales have risen over time?
- Line chart (CORRECT)
- Scatter plot
- Pie chart
- Area chart
Correct: A line chart is the most effective way to show how the city’s annual home sales have risen over time, because it shows how values change across a time period.
24. What type of visualization enables the data in a presentation to update automatically and change over time?
- Customized
- Static
- Discrete
- Dynamic (CORRECT)
Correct: Dynamic visualizations update automatically, so the data in a presentation can change over time.
25. A data visualization reveals two variables in the data that rise and fall at the same time. When variables are related in this way, what is likely happening?
- Correlation (CORRECT)
- Causation
- Divergence
- Polarity
Correct: Correlation is the degree to which changes in one variable are associated with changes in another. Two variables that rise and fall together are positively correlated. Correlation alone does not show that one variable causes the other.
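Correlation is commonly measured with the Pearson correlation coefficient r, which ranges from -1 to +1. Below is a minimal standard-library sketch; the temperature and sales figures are made up, and `pearson_r` is a hypothetical helper name. A value near +1 means the variables rise and fall together, but it still does not show that one causes the other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style sum and each variable's spread around its mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly data that rise and fall together.
temperature = [15, 18, 22, 27, 30, 26, 20]
sales = [120, 135, 160, 210, 240, 200, 150]
print(round(pearson_r(temperature, sales), 3))  # close to +1: strong positive correlation
```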
QUIZ: MODULE 1 CHALLENGE
1. To gain insights about projects and processes, organizations acquire, organize, and interpret data. What type of business professionals help complete these tasks?
- Data professionals (CORRECT)
- Clients
- Information technology professionals
- Stakeholders
Correct!
2. Fill in the blank: Machine learning differs from automation in that it enables users to express how to perform a task by using _______ instead of explicit instructions.
- mapping
- schemas
- data (CORRECT)
- sampling
Correct!
3. A company evaluates its data using metrics in order to achieve what goals? Select all that apply.
- To generate more data
- To create predictive models (CORRECT)
- To identify trends (CORRECT)
- To inform best practices (CORRECT)
Correct!
4. What are some key advantages of the Python programming language? Select all that apply.
- It was created within the data community.
- It is one of the easiest programming languages to learn. (CORRECT)
- Its formatting is visually uncluttered. (CORRECT)
- It can be used to deploy data-driven applications. (CORRECT)
Correct!
5. Fill in the blank: Jupyter Notebook is a web-based computing platform that enables data professionals to _____ in real-time.
- iterate on a business process
- run code (CORRECT)
- query databases
- visualize data
Correct!
6. A data professional prepares to give a presentation to their colleagues. They want to communicate the story told by the data using charts and graphs made with Tableau. This helps them simplify highly technical information for non-technical stakeholders. Which of the following communication practices does this scenario describe? Select all that apply.
- Creating a statistical model with code
- Enriching data insights with visual elements (CORRECT)
- Sharing complex data (CORRECT)
- Explaining data using a graphical interface (CORRECT)
Correct!
7. Fill in the blank: _______ is a way of distributing computational tasks over a bunch of nearby processors that is good for speed and resilience and does not depend on a single source of computational power.
- Edge computing (CORRECT)
- Virtual reality
- Quantum computing
- Artificial intelligence
Correct!
8. Which of the following statements accurately describe machine learning? Select all that apply.
- Professionals use machine learning to express how to perform a task by using explicit instructions.
- Professionals use machine learning to express how to perform a task by using data. (CORRECT)
- Machine learning requires iteration to achieve desired outputs. (CORRECT)
- Machine learning involves training a model. (CORRECT)
Correct!
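A minimal sketch of the idea in plain Python: instead of writing the rule explicitly, the program estimates it from example data by training a model over many iterations. The data, learning rate, epoch count, and `train_linear_model` function are illustrative assumptions; this is simple linear regression fitted with gradient descent, chosen only as a small example of learning from data.

```python
# Hypothetical training data generated from y = 5x + 50; the program is
# never told this rule and must estimate it from the examples.
xs = [1, 2, 3, 4, 5, 6]
ys = [5 * x + 50 for x in xs]


def train_linear_model(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y ≈ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):  # each epoch is one iteration over the data
        preds = [w * x + b for x in xs]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = (2 / n) * sum((p - y) * x for p, x, y in zip(preds, xs, ys))
        grad_b = (2 / n) * sum(p - y for p, y in zip(preds, ys))
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train_linear_model(xs, ys)
print(f"learned w={w:.2f}, b={b:.2f}")  # close to w=5, b=50
print(round(w * 7 + b))                 # 85, a prediction for an unseen input
```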
9. What is the Jupyter Notebook?
- A web-based computing platform for running code in real time (CORRECT)
- A file containing a chronologically ordered list of modifications made to a project
- A computer programming language used to communicate with a database
- A range of values that conveys how likely it is that a statistical estimate reflects the population
Correct!
10. A data professional uses Tableau to create data visualizations that will help people understand their analysis results. They communicate the data insights using the visualizations, which helps non-technical stakeholders gain important insights. Which of the following communication practices does this scenario describe? Select all that apply.
- Creating a statistical model with code
- Enriching data stories with visual elements (CORRECT)
- Simplifying data using a graphical interface (CORRECT)
- Sharing complex data (CORRECT)
Correct!
11. Fill in the blank: Edge computing is a way of distributing ____ over a bunch of nearby processors that is good for speed and resilience and does not depend on a single source of computational power.
- computational tasks (CORRECT)
- coding libraries
- models
- data sources
Correct!
12. To gain insights, businesses rely on _____ to acquire, organize, and interpret the data that informs internal projects and processes.
- stakeholders
- data professionals (CORRECT)
- clients
- information technology professionals
Correct!
13. What process enables users to express how to perform a task by using data instead of explicit instructions?
- Statistics
- Machine learning (CORRECT)
- Data science
- Visualization
Correct!
14. Fill in the blank: Before creating predictive models to identify trends and inform best practices, a company must _____ using metrics.
- iterate on its processes
- encode its data
- evaluate its data (CORRECT)
- present findings to stakeholders
Correct!
15. A data professional wants to strengthen their communication skills. They study methods for simplifying highly technical information and telling compelling data stories. They also practice using Tableau to design compelling charts and graphs. Which of the following communication practices does this scenario describe? Select all that apply.
- Creating a statistical model with code
- Sharing complex data (CORRECT)
- Enriching data insights with visual elements (CORRECT)
- Explaining data using a graphical interface (CORRECT)
Correct!
16. What web-based computing platform can be used by data professionals when interacting with Python?
- SQL
- HTML
- R Markdown
- Jupyter Notebook (CORRECT)
Correct!
17. Fill in the blank: Data professionals use _____ to work efficiently with large datasets.
- programming languages (CORRECT)
- schemas
- data visualizations
- spreadsheets
Correct: Data professionals use programming languages to work efficiently with large datasets.
18. Before creating predictive models to identify trends and inform best practices, a company must evaluate its data using what type of measurement?
- SMART methodology
- Metrics (CORRECT)
- Attributes
- Best practices
Correct!
19. What are some key advantages of the Python programming language? Select all that apply.
- It has an enormous online community and other helpful resources. (CORRECT)
- It is very flexible. (CORRECT)
- It is one of the easiest programming languages to write. (CORRECT)
- It was created within the data community.
Correct!
20. Fill in the blank: Edge computing is a way of distributing computational tasks over a bunch of nearby processors that is good for _______ and resilience and does not depend on a single source of computational power.
- speed (CORRECT)
- augmented reality
- algorithms
- artificial intelligence
Correct!
21. What is the term for someone who explores, cleans, analyzes, and visualizes data?
- Information technology professional
- Client
- Data professional (CORRECT)
- Stakeholder
Correct!