Week 2: Bias, Credibility, Privacy, and Access
Bias, Credibility, Privacy, and Access INTRODUCTION
Welcome to the Google Data Analytics Professional Certificate in collaboration with Coursera! This is the section of the course that will teach you how to identify bias in data, ensure its credibility, and find open data resources. You will also learn about the interconnections between data ethics, privacy, and access.
The ability to assess data for bias is a core analytic skill. Knowing the different types of bias, such as selection bias and confirmation bias, helps you judge the reliability of a data source before you base a decision on it. Finding credible sources of data for analysis is equally important. Analysts who understand how the various forms of bias distort the truth are better placed to judge both sources and data.
Learning Objectives:
- Know the process of reviewing data for bias.
- Differentiate biased data from unbiased data.
- Understand various types of bias, including confirmation, interpretation, and observer bias.
- Identify characteristics of credible data sources, even when handling untidy data.
- Understand open data and its contribution to current debates in data analytics.
- Understand key concepts in data ethics and data privacy.
- Explain how data ethics and data privacy affect each other.
- Understand the importance and advantages of anonymizing data.
- Develop awareness of accessibility challenges with open data.
Test your knowledge on unbiased and objective data
1. Which of the following are examples of sampling bias? Select all that apply.
- A clinical study includes three times more men than women. (Correct)
- A national election poll only interviews people with college degrees. (Correct)
- A survey of high-school-age students does not include homeschooled students. (Correct)
- An online marketing analytics firm stores data in a spreadsheet.
Correct: A survey of high-school-age students that excludes homeschooled students, a national election poll that interviews only people with college degrees, and a clinical study with three times more men than women are all examples of sampling bias.
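One rough way to spot this kind of sampling bias is to compare each group's share of the sample with its share of the population. The sketch below is illustrative only: the function names, the 10-point tolerance, and the 50/50 population split are assumptions, not part of the course material.

```python
from collections import Counter


def group_shares(values):
    """Return each group's share of the total as a fraction between 0 and 1."""
    counts = Counter(values)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def flag_skewed_groups(sample, expected_shares, tolerance=0.10):
    """Return groups whose sample share differs from the expected share by more than tolerance."""
    observed = group_shares(sample)
    flagged = {}
    for group, expected in expected_shares.items():
        actual = observed.get(group, 0.0)
        if abs(actual - expected) > tolerance:
            flagged[group] = (actual, expected)
    return flagged


# Hypothetical clinical study with three times more men than women.
participants = ["man"] * 75 + ["woman"] * 25

# Assumed population split, for illustration only.
population = {"man": 0.5, "woman": 0.5}

skewed = flag_skewed_groups(participants, population)
for group, (actual, expected) in skewed.items():
    print(f"{group}: {actual:.0%} of sample, {expected:.0%} expected")
```

A check like this only catches imbalance in attributes that were actually recorded. It cannot reveal groups that were left out entirely, such as the homeschooled students in the survey example.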
2. Fill in the blank: The tendency to search for or interpret information in a way that validates pre-existing beliefs is _____ bias.
- observer
- confirmation (Correct)
- sampling
- interpretation
Correct: Confirmation bias is the tendency to search for or interpret information in a way that validates pre-existing beliefs.
3. Which of the following terms are also ways of describing observer bias? Select all that apply.
- Research bias (Correct)
- Experimenter bias (Correct)
- Perception bias
- Spectator bias
Correct: When a researcher’s impressions or desires change how they interpret the data from a study, this is known as observer bias, experimenter bias, or research bias.
Test your knowledge on data credibility
1. Which of the following are usually good data sources? Select all that apply.
- Governmental agency data (Correct)
- Social media sites
- Vetted public datasets (Correct)
- Academic papers (Correct)
Correct: Good data sources usually include vetted public datasets, academic papers, and governmental agency data.
2. To determine if a data source is cited, you should ask which of the following questions? Select all that apply.
- Is the data relevant to the problem I’m trying to solve?
- Has this dataset been properly cleaned?
- Who created this dataset? (Correct)
- Is this dataset from a credible organization? (Correct)
Correct: Questions like “Is this dataset from a credible organization?” and “Who created this dataset?” help you judge whether a data source is cited and legitimate.
3. A data analyst is analyzing sales data for the newest version of a product. They use third-party data about an older version of the product. For what reasons is this inappropriate for their analysis? Select all that apply.
- The data is not original (Correct)
- The data is biased
- The data is not current (Correct)
- The data is not accurate
Correct: Third-party data about an older version of the product is inappropriate for this analysis because it is neither original nor current.
Test your knowledge on data ethics and privacy
1. Fill in the blank: _____ states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.
- Transaction transparency (Correct)
- Currency
- Privacy
- Openness
Correct: Transaction transparency states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.
2. A data analyst removes personally identifying information from a dataset. What task are they performing?
- Data collection
- Data anonymization (Correct)
- Data sorting
- Data visualization
Correct: This is data anonymization: removing personally identifying information from a dataset so that the data cannot be tied to any individual’s identity.
3. Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What is this concept called?
- Currency
- Privacy
- Discretion
- Consent (Correct)
Correct: This is consent. It is a basic principle of data ethics: a person must know how and why their data will be used before agreeing to provide it.
Test your knowledge on open data
1. What aspect of data ethics promotes the free access, usage, and sharing of data?
- Transaction transparency
- Consent
- Openness (Correct)
- Privacy
Correct: Openness is the aspect of data ethics that promotes the free access, usage, and sharing of data.
2. What are the main benefits of open data? Select all that apply.
- Open data combines data from different fields of knowledge. (Correct)
- Open data makes good data more widely available. (Correct)
- Open data increases the amount of data available for purchase.
- Open data restricts data access to certain groups of people.
Correct: Open data makes good data more widely available and makes it possible to combine data from different fields of knowledge.
3. Universal participation is a standard of open data. What are the key aspects of universal participation? Select all that apply.
- All corporations are allowed to sell open data.
- Certain groups of people must share their private data.
- No one can place restrictions on data to discriminate against a person or group. (Correct)
- Everyone must be able to use, re-use, and redistribute open data. (Correct)
Correct: Universal participation means that everyone must be able to use, re-use, and redistribute open data. In addition, no one can place restrictions on the data that discriminate against a person or group.
Prepare Data for Exploration Weekly Challenge 2
1. Which of the following situations are examples of bias? Select all that apply.
- A researcher who surveys a sample group that is representative of the population
- A scholar who only reads sources that support their argument (Correct)
- A dancing competition judge who is a close friend of the dancer who wins the competition (Correct)
- A daycare that won’t hire men for childcare positions (Correct)
Correct: A scholar who only reads sources that support their argument, a daycare that won’t hire men for childcare positions, and a dancing competition judge who is a close friend of the dancer who wins the competition are all examples of bias.
2. Which type of bias is the tendency to always construe ambiguous situations in a positive or negative way?
- Sampling
- Interpretation (Correct)
- Observer
- Confirmation
Correct: Interpretation bias refers to an instance where ambiguous situations are interpreted consistently in terms of either a positive or negative perspective.
3. Which of the following are qualities of unreliable data? Select all that apply.
- Inaccurate (Correct)
- Biased (Correct)
- Vetted
- Incomplete (Correct)
Correct: Unreliable data is inaccurate, incomplete, or biased.
4. If a company uses your personal data as part of a financial transaction, you should be made aware of the nature and scale of the transaction. What concept of data ethics does this refer to?
- Privacy
- Consent
- Ownership
- Currency (Correct)
Correct: Currency means that individuals should be made aware of financial transactions resulting from the use of their personal data, and of the scale of those transactions.
5. Ownership is a key issue in data ethics. Who owns data?
- The individual who originally generates the data (Correct)
- The law enforcement agencies that enforce data protection laws
- The organization that invests time and money collecting, processing, and analyzing the data
- The government that passes data-protection legislation
Correct: The individual who originally generates the data owns it and has the right to control how it is used, processed, and shared.
6. The right to inspect, update, or correct your own data is part of which aspect of data ethics?
- Data consent
- Data openness
- Data privacy (Correct)
- Data ownership
Correct: The right to inspect, update, or correct your own data is part of data privacy.
7. Data anonymization applies to both text and images.
- True (Correct)
- False
Correct: Data anonymization relates to all personally identifiable information in text and image form.
8. A key aspect of open data is free access to people’s personal information.
- True
- False (Correct)
Correct: Open data does not offer free access to people’s personal information. It means making data publicly available to use, share, and redistribute without exposing private or sensitive details.
9. A university surveys its student-athletes about their experience in college sports. The survey only includes student-athletes with scholarships. What type of bias is this an example of?
- Sampling bias (Correct)
- Observer bias
- Confirmation bias
- Interpretation bias
Correct: This is sampling bias, which occurs when a sample is not representative of the population as a whole.
10. Fill in the blank: Data _____ refers to well-founded standards of right and wrong that dictate how data is collected, shared, and used.
- privacy
- anonymization
- ethics (Correct)
- credibility
Correct: Data ethics refers to well-founded standards of right and wrong that dictate how data is collected, shared, and used.
11. An individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data. This is called ownership.
- True
- False (Correct)
Correct: An individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data. This is called transaction transparency, not ownership.
12. An employer accesses an employee’s credit report without their consent. This is not a violation of the employee’s privacy because they work at the company.
- True
- False (Correct)
Correct: The employer cannot get the employee’s credit report without his or her consent, since it would infringe on the employee’s data privacy rights.
13. Which of the following are commonly used methods for anonymizing data? Select all that apply.
- Hashing (Correct)
- Blanking (Correct)
- Deleting
- Masking (Correct)
Correct: Common techniques in the process of anonymizing data entail blanking, hashing, and masking.
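The three techniques can be sketched in a few lines of Python. This is a minimal illustration rather than a production approach: the record, the field names, the helper names, and the salt value are all hypothetical, and only the standard-library `hashlib` module is used.

```python
import hashlib


def blank(value):
    """Blanking: replace the value with an empty string."""
    return ""


def mask(value, visible=4, fill="*"):
    """Masking: hide everything except the last `visible` characters."""
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


def hash_value(value, salt):
    """Hashing: replace the value with a salted SHA-256 hex digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


# Hypothetical record containing personally identifiable information.
record = {
    "name": "Jane Doe",
    "phone": "555-0100",
    "email": "jane@example.com",
    "city": "Springfield",
}

anonymized = {
    "name": blank(record["name"]),
    "phone": mask(record["phone"]),
    "email": hash_value(record["email"], salt="example-salt"),
    "city": record["city"],  # not treated as identifying in this sketch
}

print(anonymized)
```

Hashing keeps a stable stand-in for each value, so records can still be matched with one another, while blanking removes the value altogether. Note that hashing short or guessable values can be reversed by trying likely inputs, so it is often combined with the other techniques.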
14. The government of a large city collects data on the quality of the city’s infrastructure. Any business, nonprofit organization, or person can access the government’s databases and re-use or redistribute the data. Is this an example of open data?
- Yes (Correct)
- No
Correct: An example of open data is when anyone has the right to use, reuse, and redistribute the data.
15. Fill in the blank: A preference in favor of or against a person, group of people, or thing is called _____. It is an error in data analytics that can systematically skew results in a certain direction.
- data bias (Correct)
- data collection
- data anonymization
- data interoperability
Correct: Data bias is a type of error that systematically skews results in a certain direction, which can lead to wrong conclusions.
16. Which of the following are types of data bias often encountered in data analytics? Select all that apply.
- Interpretation bias (Correct)
- Observer bias (Correct)
- Educational bias
- Confirmation bias (Correct)
Correct: In data analytics, there are different biases that can be encountered: observer bias, interpretation bias, and confirmation bias. All these kinds of biases can affect how data is input or interpreted.
17. Which of the following “C’s” describe qualities of good data? Select all that apply.
- Comprehensive (Correct)
- Cited (Correct)
- Current (Correct)
- Consequential
Correct: Good data is comprehensive, current, and cited.
18. In data ethics, consent gives an individual the right to know the answers to which of the following questions? Select all that apply.
- How will my data be used? (Correct)
- Why is my data being collected? (Correct)
- How long will my data be stored? (Correct)
- Why am I being forced to share my data?
Correct: In data ethics, consent gives individuals the right to know why their data is being collected, how it will be used, and how long it will be stored.
19. A clinic surveys a group of male and female patients about their experience with physical therapy. The survey does not include people with disabilities. Is the survey data biased?
- Yes (Correct)
- No
Correct: The survey data is biased because the sample is not comprehensive. Leaving out people with disabilities means the sample does not represent the whole patient population.
20. What is data privacy?
- Providing free access, usage, and sharing of data
- Applying well-founded standards of right and wrong that dictate how data is collected, shared, and used
- Searching for or interpreting supporting information
- Preserving a data subject’s information and activity for all data transactions (Correct)
21. An individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data. This concept refers to which aspect of data ethics?
- Currency
- Transaction transparency (Correct)
- Consent
- Ownership
Correct: This refers to transaction transparency: an individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data.
22. Interoperability is key to open data’s success. Which of the following is an example of interoperability?
- An analyst removes all personally identifiable information from a database
- A company restricts the use of a database to its own employees
- Different databases use common formats and terminology (Correct)
- A website charges a fee to access a database
Correct: Interoperability means that different databases use common formats and terminology, so that systems can work together and exchange data easily.
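As a small illustration of common formats and terminology, the sketch below converts rows from two sources into one shared layout so they can be combined. Both sources, their field names, and the shared layout are invented for this example.

```python
from datetime import datetime


def from_source_a(row):
    """Hypothetical source A: 'road_id' key and day/month/year dates."""
    inspected = datetime.strptime(row["inspection_date"], "%d/%m/%Y").date()
    return {
        "asset_id": row["road_id"],
        "inspected_on": inspected.isoformat(),
        "condition": row["condition"].lower(),
    }


def from_source_b(row):
    """Hypothetical source B: 'id' key, ISO dates, and a 'status' field."""
    return {
        "asset_id": row["id"],
        "inspected_on": row["checked"],
        "condition": row["status"].lower(),
    }


a_rows = [{"road_id": "R-1", "inspection_date": "03/02/2024", "condition": "Good"}]
b_rows = [{"id": "B-7", "checked": "2024-02-05", "status": "POOR"}]

# After conversion, every row uses the same field names, date format, and vocabulary.
combined = [from_source_a(r) for r in a_rows] + [from_source_b(r) for r in b_rows]
for row in combined:
    print(row)
```

When publishers agree on a shared layout up front, conversion steps like these become unnecessary, which is part of what makes open datasets easy to combine.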
23. In general, the usefulness of data decreases as time passes.
- True (Correct)
- False
Correct: In general, the usefulness of data decreases as time passes. The best data sources are current and relevant.
24. Which of the following are types of data bias often encountered in data analytics? Select all that apply.
- Educational bias
- Confirmation bias (Correct)
- Observer bias (Correct)
- Interpretation bias (Correct)
Correct: Observer bias, interpretation bias, and confirmation bias are types of bias often encountered in data analytics.
25. A university surveys its student-athletes about their experience in college sports. The survey only includes student-athletes with scholarships. What type of bias is this an example of?
- Sampling bias (Correct)
- Observer bias
- Confirmation bias
- Interpretation bias
Correct: It is an example of sampling bias when a sample does not accurately reflect the relevant population.
26. What is the process of protecting people’s private or sensitive data by eliminating identifying information?
- Data design
- Data anonymization (Correct)
- Data ethics
- Data governance
Correct: Data anonymization is the process of protecting people’s private or sensitive data by eliminating identifying information, such as home addresses, phone numbers, credit card details, and medical records, from datasets.
27. Which of the following situations are examples of bias? Select all that apply.
- A researcher who surveys a sample group that is representative of the population
- A scholar who only reads sources that support their argument (Correct)
- A daycare that won’t hire men for childcare positions (Correct)
- A dancing competition judge who is a close friend of the dancer who wins the competition (Correct)
Correct: A scholar who only reads sources that support their argument, a daycare that won’t hire men for childcare positions, and a dancing competition judge who is a close friend of the dancer who wins the competition are examples of bias.
28. The government of a large city collects data on the quality of the city’s infrastructure. Any business, nonprofit organization, or person can access the government’s databases and re-use or redistribute the data. Is this an example of open data?
- Yes (Correct)
- No
Correct: This is an example of open data, because anyone can access, re-use, and redistribute the data.
Bias, Credibility, Privacy, and Access CONCLUSION
Checking data for bias and credibility is an essential analyst skill. In this section, you learned about the different types of data bias and some ways to make sure data is credible.
In addition, you were introduced to open data and to the relationship between data ethics and privacy. These competencies matter a great deal for your growth as a data analyst. You can continue these studies by taking the course on Coursera.