Module 2: Transforming Data in Power BI

INTRODUCTION – Transforming data in Power BI

In this module, you will focus on extensive practice in cleaning and transforming data—an essential foundation for preparing datasets for advanced data modeling techniques in future courses.

Through hands-on exercises and practical examples, you will address common data quality challenges, such as missing values, inconsistencies, and outliers. You will also apply effective transformation techniques to ensure data is properly formatted for analysis. By mastering these skills, you will be prepared to handle the complexities of data modeling with confidence and precision.

Learning Objectives:

  • Identify and perform data cleaning tasks using Power Query.
  • Simplify and transform data effectively using Power Query.
  • Utilize the Applied Steps list to undo or re-order steps.
  • Merge multiple data sources in Power BI.

SELF-REVIEW: PREPARING A DATASET

1. Several columns in your worksheet contain missing or null values. Which of the following options must you type in the Value to Find field to locate these values?

  • 0
  • Null (CORRECT)
  • Missing

That’s correct. You can locate missing, or null, values by typing null in the Value to Find field of the Replace Values feature.
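
As a rough illustration outside Power BI, the same find-and-replace idea can be sketched in plain Python. The rows, column names and the replacement value "Unknown" are hypothetical; in Power Query this is done through the Replace Values dialog rather than code.

```python
# Hypothetical rows; None plays the role of Power Query's null.
rows = [
    {"Product": "Helmet", "Color": "Red"},
    {"Product": "Gloves", "Color": None},
    {"Product": "Jersey", "Color": None},
]


def replace_value(rows, column, value_to_find, replace_with):
    """Return new rows where `value_to_find` in `column` is swapped for `replace_with`."""
    return [
        {**row, column: replace_with} if row[column] == value_to_find else dict(row)
        for row in rows
    ]


cleaned = replace_value(rows, "Color", None, "Unknown")
print(cleaned)
```

The original rows are left untouched, which loosely mirrors how each Power Query step produces a new result instead of editing the source.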

2. What steps did you take to address inconsistencies in date columns in your worksheet? Select all that apply.

  • You changed the data type of the column to Text by clicking on the Data Type icon next to the column name.
  • You dropped duplicate rows by selecting Remove Duplicates inside the Remove Rows menu.
  • You changed the data type of the column to Date by clicking on the Data Type icon next to the column name. (CORRECT)
  • You replaced any empty values with a default date using the Replace Values tool. (CORRECT)

That’s correct! After you set the right data type for the date column, the appearance of the values within the column changed significantly. 

That’s correct! You set a default date for the empty values in the date field by using the Replace Values tool. This replaced the non-date empty values with a valid date, so you could then change the data type of the column to Date.
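
The order of the two steps matters, and a small Python sketch can make that concrete. The sample values and the default date of 1900-01-01 are assumptions chosen for illustration, not values from the exercise.

```python
from datetime import datetime

# Hypothetical raw column: text dates with one empty entry.
raw_order_dates = ["2023-01-15", "", "2023-03-02"]
DEFAULT_DATE = "1900-01-01"  # assumed placeholder, not a value from the exercise

# Step 1: replace empty values with the default date (still text at this point).
filled = [value if value else DEFAULT_DATE for value in raw_order_dates]

# Step 2: change the column's type from text to date.
order_dates = [datetime.strptime(value, "%Y-%m-%d").date() for value in filled]
print(order_dates)
```

If step 2 ran first, the empty string would fail to parse as a date, which is why the empty values are replaced before the type change.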

3. True or false: Once you completed all the data cleaning steps, you clicked Apply or Close & Apply to apply all the transformations you made. 

  • False
  • True (CORRECT)

Correct! You clicked Apply or Close & Apply to apply all the transformations. If you want to continue working in Power Query, click Apply. Only click Close & Apply when you are finished working in Power Query.

KNOWLEDGE CHECK: INTRODUCTION TO TRANSFORMING DATA IN POWER BI

1. Which of the following operations are steps in the data transformation process? Select all that apply.

  • Creating insights from data
  • Cleaning data (CORRECT)
  • Shaping data (CORRECT)
  • Removing data (CORRECT)

That’s correct! Cleaning data is a fundamental step in the data transformation process. It involves identifying and addressing data inaccuracies, such as anomalies, empty fields, duplicate entries, improperly formatted data, and misaligned or irrelevant columns within your dataset.

That’s correct! Shaping data is another critical step in the data transformation process in Power BI. This step includes tasks like pivoting, unpivoting, merging data from multiple sources, and applying various transformations to structure the data appropriately for analysis and visualization.

That’s correct! Removing data is a basic yet essential operation in data transformation. For instance, a table might contain hundreds of fields, making report design cumbersome. By selecting only the necessary columns and removing irrelevant ones, you can streamline the dataset for report creation.

2. Which of the following data types are part of the number type group? Select all that apply.

  • Text
  • Whole number (CORRECT)
  • Binary
  • Fixed decimal number  (CORRECT)

That’s correct! A Whole Number is an integer type that does not include any digits to the right of the decimal point.

Correct! The Fixed Decimal Number type always has four digits to the right of the decimal point and allows for up to 19 digits of significance. It is also referred to as the Currency type or Decimal (19,4).
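
To get a feel for what "four digits to the right of the decimal point" means, here is a minimal Python sketch using the standard decimal module. The helper name to_fixed_decimal and the half-up rounding mode are assumptions for illustration; this is not how Power BI stores the type internally.

```python
from decimal import Decimal, ROUND_HALF_UP

FOUR_PLACES = Decimal("0.0001")


def to_fixed_decimal(value: str) -> Decimal:
    """Round a numeric string to exactly four digits after the decimal point."""
    # The rounding mode is an assumption made for this sketch.
    return Decimal(value).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)


print(to_fixed_decimal("19.99"))       # 19.9900
print(to_fixed_decimal("3.14159265"))  # 3.1416
```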

3. Which one of the following features is used to track, re-order or delete the steps completed in Power Query?

  • Applied Steps (CORRECT)
  • Properties
  • New Source
  • Queries

That’s correct! Applied Steps is used to track, re-order or delete the steps completed.

4. Which of the following options can be used for Power Query Optimization? Select all that apply.

  • Filter rows in the queries. (CORRECT)
  • Choose only the columns that you will use in the data model. (CORRECT)
  • Choose the right data types for columns. (CORRECT)

That’s correct! Filtering and limiting the number of rows has a significant effect on performance, so it is considered one of the basic optimization options.

That’s correct! One of the fastest ways of performing optimization is limiting the number of columns to only the ones you will use in the data model.

That’s correct! Power BI detects the data type of a column by evaluating the rows of the data source in the initial load. However, extra work might be needed to set the right data types for some of the columns. If the right data type is not set for a specific column then type conversion may be needed to handle operations, and this can cause performance losses. So, setting the right data types is important for optimization.

5. You have added a new source that displays NaN (Not a Number) values in Power Query. Which of the following issues could occur if these are not resolved? Select all that apply.

  • Skewed statistical results (CORRECT)
  • Incorrect calculations (CORRECT)
  • Misleading insights (CORRECT)
  • Normal distribution in data

That’s correct! Skewness is a measure used to understand the distribution of a dataset and can be used to identify anomalies or outliers in a dataset. NaN values cause this type of imbalance in the dataset.

That’s correct! NaN values lead to errors in calculations due to type conversion problems. 

That’s correct! Misleading statistics is a term used to describe the misuse of numerical data, either intentionally or by error. Such misuse results in information that can be deceptive, creating false narratives around a particular topic. Therefore, NaN values cause misleading insights.
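
One general way NaN values distort results can be seen in ordinary numeric code. The sketch below uses hypothetical sales figures and plain Python floats; it illustrates NaN propagation in general and is not a description of Power BI's own calculation engine.

```python
import math

# Hypothetical sales figures; one entry is NaN.
sales = [120.0, 80.0, math.nan, 100.0]

# NaN propagates through arithmetic, so the total itself becomes NaN.
total_with_nan = sum(sales)

# Filtering the NaN entries out first gives a usable result.
valid = [x for x in sales if not math.isnan(x)]
mean_without_nan = sum(valid) / len(valid)

print(math.isnan(total_with_nan))  # True
print(mean_without_nan)            # 100.0
```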

6. Which of the following statements describes the pivot operation in Power BI? Select all that apply.

  • The pivot operation converts data from a narrow format to a wide format by reorganizing the data structure. (CORRECT)
  • The pivot operation is used for data aggregation and summarization by converting rows into columns. (CORRECT)
  • The pivot operation supports data normalization by converting column headers into row values.
  • The pivot operation involves transforming data from a wide format to a narrow format.

That’s correct! The pivot operation in Power BI transforms data from a narrow format (with fewer columns) to a wide format (with multiple columns) by reorganizing the data structure.

That’s correct! The pivot operation is often used to summarize and aggregate data and cross-tabulations by converting rows into columns based on specific criteria or values.
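
The narrow-to-wide reshaping with aggregation can be sketched in a few lines of Python. The table contents and the pivot_sum helper are hypothetical, and summing is just one possible aggregation.

```python
from collections import defaultdict

# Hypothetical narrow table: (Region, Year, Sales) rows, with one repeated pair.
narrow = [
    ("North", "2022", 100),
    ("North", "2023", 150),
    ("South", "2022", 80),
    ("South", "2023", 90),
    ("South", "2023", 10),
]


def pivot_sum(rows):
    """Pivot (row_key, column_key, value) triples into {row_key: {column_key: sum}}."""
    wide = defaultdict(lambda: defaultdict(int))
    for row_key, column_key, value in rows:
        wide[row_key][column_key] += value
    return {row_key: dict(columns) for row_key, columns in wide.items()}


wide = pivot_sum(narrow)
print(wide)
```

Five narrow rows become two wide rows, with the Year values turned into columns and the repeated South/2023 entries summed.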

SELF-REVIEW: APPENDING TWO TABLES

1. Which query did you select as the first table when appending the queries?

  • AdventureWorksSales (CORRECT)
  • Consolidated Sales
  • OtherSales 

That’s correct! You selected AdventureWorksSales as the first table when appending the queries.

2. How many columns are there in the new query ConsolidatedSales, after you combined the AdventureWorksSales and OtherSales queries?

  • 16
  • 24
  • 8 (CORRECT)

That’s correct! There are 8 columns in the new query ConsolidatedSales, which can be verified at the bottom left in the status bar.

3. True or False: To append all respective columns between two queries successfully, you first need to make sure that the queries have an equal number of columns, the same column names, and the same or convertible data types.

  • False
  • True (CORRECT)

That’s correct! You first need to select an equal number of columns from the data sets and update the data sets with the same column names and same or convertible data types respectively. Then you will be ready for a new consolidated query.

4. How many rows are there in the new query Consolidated Sales, after you combined AdventureWorksSales and OtherSales queries?

  • 124 (CORRECT)
  • 100
  • 24

That’s correct! There are 124 rows in the new query Consolidated Sales, which can be verified at the bottom left in the status bar.

5. Data transformation is the process of preparing data for analysis.

  • True (CORRECT)
  • False

That’s correct. Data from different sources can be untidy, incomplete, and inconsistent, making it difficult to draw meaningful insights. That’s why data transformation is a crucial step. It helps you prepare data for analysis.

6. What is the primary purpose of Power Query in Power BI?

  • To create insightful visualizations and reports.
  • To predict future trends and patterns in the data.
  • To facilitate seamless data preparation for analysis and visualization. (CORRECT)
  • To automate the process of sharing reports and dashboards.

That’s correct! Power Query’s user-friendly interface and tools make it easier to connect to different data sources, perform various data transformations, and create data models.

7. What is the purpose of the Applied Steps section in the Power Query Editor?

  • To preview the data after the applied transformations.
  • To display a list of all the queries in your Power BI project.
  • To show the sequence of transformations applied to the selected query. (CORRECT)
  • To provide a graphical user interface for designing and managing queries.

That’s correct! The Applied Steps section in the Power Query Editor shows the sequence of data transformations applied to the selected query. It allows users to track and understand the order of data manipulation. Additionally, it can be used to modify, delete, or reorder steps as needed to refine the transformation process.

8. How does removing unnecessary columns from a dataset benefit the data analysis process?

  • It reduces the dataset size, making it easier to manipulate and process. (CORRECT)
  • It makes the dataset look more visually appealing.
  • It changes the structure of the dataset entirely.
  • It creates new columns with more relevant data.

That’s correct! By removing unnecessary or repetitive columns, you can focus on the most relevant data for your analysis. This reduces the dataset size, simplifies the data structure, and enhances efficiency, allowing for easier manipulation and faster processing.

9. Which of these issues in Power Query within Power BI is related to the presence of empty cells in your dataset?

  • Inconsistent data types
  • Duplicate rows
  • Missing or null values (CORRECT)
  • Data entry errors

Correct! Missing or null values refer to cells in a dataset that lack data. These gaps can arise due to various reasons, such as omissions during data entry, loss of data during extraction or transformation processes, or intentional exclusion of data for confidentiality purposes.

10. Which one of the following describes the reason for Adio’s request to combine two different sales datasets together?

  • Enriching data
  • Creating relationships
  • Consolidating information (CORRECT)
  • Enhancing analysis

That’s correct! Consolidating information means bringing information from various sources or tables together into a single table to provide a unified view of the data.

11. Which of the following is the operation of putting two or more tables or queries in one master table together?

  • Appending (CORRECT)
  • Merging
  • Combining

That’s correct! Appending is adding rows of data to another table or query.

SELF-REVIEW: MERGING TWO DATA SOURCES

1. How many columns are there in the final Consolidated Sales dataset after you merged the (Sales) and (Product) queries, added (Product.Product), and removed the (Reseller), (Employee) and (Sales Territory Key) columns?

  • 7 (CORRECT)
  • 10

That’s correct! There are 7 columns in the Sales query, which can be verified at the bottom left in the status bar.

2. True or false: After importing the two data sources to combine, you observed that there are no NULL values in the ProductKey column of the Sales table. You applied a Left outer join to merge the tables. If you had used an Inner join, the number of rows would not have changed because there are no null values in the product column.

  • True (CORRECT)
  • False

That’s correct! The inner join operation includes only the matching records from both the Sales table and the Product table. If the Product column in the Sales table contains no null values, all rows will match, resulting in the same number of rows as the left outer join query.
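
The row-count comparison can be checked with a small Python sketch. The tables, the merge helper and the join_kind strings are hypothetical, and the sketch assumes each ProductKey appears only once in the Product table.

```python
# Hypothetical tables keyed on ProductKey.
sales = [
    {"SalesOrderNumber": "SO1", "ProductKey": 1},
    {"SalesOrderNumber": "SO2", "ProductKey": 2},
    {"SalesOrderNumber": "SO3", "ProductKey": 1},
]
products = {1: "Helmet", 2: "Gloves"}  # ProductKey -> Product


def merge(left_rows, lookup, key, join_kind):
    """Merge rows with a lookup table using an inner or left outer join."""
    merged = []
    for row in left_rows:
        match = lookup.get(row[key])
        if match is None and join_kind == "inner":
            continue  # inner join drops rows without a match
        merged.append({**row, "Product": match})
    return merged


inner = merge(sales, products, "ProductKey", "inner")
left_outer = merge(sales, products, "ProductKey", "left_outer")
print(len(inner), len(left_outer))  # 3 3
```

Note that the counts only agree when every key finds a match. A non-null key with no counterpart in the Product table would still be dropped by the inner join.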

3. Exactly how many rows are there in the Sales query, after you merged the Sales and Product queries?  

  • 999
  • 57851 (CORRECT)
  • 47655 

That’s correct! There are 57851 rows in Sales query after merge.

4. Which of the following options can be considered as the purposes of join operation? Select all that apply.

  • Integrating data (CORRECT)
  • Ensuring consistency (CORRECT)
  • Exploring relationships (CORRECT)
  • Creating insights from data

That’s correct! Joining tables during a merge operation enables the integration of data from different sources or systems.

That’s correct! Ensuring consistency validates the data and ensures that the appended tables align properly.

That’s correct! Joining tables in a merge operation allows you to explore relationships and connections between various entities.

5. Which type of JOIN operation includes only the matching records from both joined tables?

Select the correct option.

  • INNER JOIN (CORRECT)
  • LEFT OUTER JOIN
  • FULL OUTER JOIN

Correct! An INNER JOIN only includes the matching records from both joined tables.

6. Which of the following pairs can be considered a foreign key relationship between a master table and a predefined table?

Select all that apply.

  • Employee – Department (CORRECT)
  • Customer – City (CORRECT)
  • Order – Status (CORRECT)
  • Customer – Surname

That’s correct! Department information will be stored in a separate table and used by a specific employee when needed.

That’s correct! City information will be stored in a separate table and used by a specific customer when needed.

That’s correct! Status information will be stored in a separate table and used by a specific order when needed.

KNOWLEDGE CHECK: TRANSFORMATIONS BY COMBINING DATA

1. Which feature allows you to combine related data between differently structured data sources in Power Query?

  • Grouping
  • Merging (CORRECT)
  • Appending

That’s correct! Merging allows you to match related data between data sources.

2. Which of the following can be considered as a purpose of merging data with joins? Select all that apply:

  • Integrating Data (CORRECT)
  • Expanding Data
  • Exploring Relationships (CORRECT)
  • Matching Related Data (CORRECT)

That’s correct! Joining tables during a merge operation allows you to integrate data from different sources.

That’s correct! Joining tables in a merge operation helps you to explore relationships and connections between different entities.

That’s correct! The join condition determines how the rows from the two tables are matched and combined.

3. True or False. The full outer join is useful when you want to retrieve all the records from both tables, regardless of whether they have matching values in the join condition.

  • True (CORRECT)
  • False

That’s correct! Full outer join retrieves all the records from both tables without matching the column values for the data sources.
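
A full outer join can be sketched in Python as a union of the keys from both sides. The two small tables and the full_outer_join helper are hypothetical and assume unique keys on each side.

```python
# Hypothetical tables keyed on an id; each side has one unmatched key.
left = {1: "Alice", 2: "Bob"}         # id -> name
right = {2: "Sales", 3: "Marketing"}  # id -> department


def full_outer_join(left, right):
    """Return one row per key found in either table, with None where a side has no match."""
    keys = sorted(set(left) | set(right))
    return [(key, left.get(key), right.get(key)) for key in keys]


joined = full_outer_join(left, right)
print(joined)
# [(1, 'Alice', None), (2, 'Bob', 'Sales'), (3, None, 'Marketing')]
```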

4. You import 4 Microsoft Excel tables named Sales, Product, Reseller and Employee into Power Query. 

Sales contains the following columns:

  • SalesOrderNumber
  • OrderDate
  • ProductKey
  • ResellerKey
  • EmployeeKey
  • SalesTerritoryKey
  • Quantity
  • Unit Price
  • Sales
  • Cost

Your manager asked you to list Sales data with the descriptive information from the Product, Reseller and Employee tables for the columns which have the suffix “Key”. What should you do to accomplish this task? Select all that apply:

  • Join Sales and Reseller  tables based on the EmployeeKey column.
  • Check the column types of (ProductKey), (ResellerKey) and (EmployeeKey) in the Sales, Product, Reseller and Employee tables. (CORRECT)
  • Merge the Sales table with the Product, Reseller and Employee tables respectively. (CORRECT)
  • Join Sales and Product tables based on the ProductKey column. (CORRECT)

That’s correct! Checking the column types before performing a join ensures compatibility and avoids errors. Merging the Sales table with the Product, Reseller and Employee tables on their common key columns, such as ProductKey, lets you accurately retrieve and combine the descriptive data from the different tables.

5. You import two Microsoft Excel tables named Product and Categories into Power Query. There are 319 rows in the Product table. Nine of the total rows in the Product table do not have Categories data, so the CategoryKey of these rows has NULL values. Your manager asked you to list Product data by showing their category names, including the rows which have NULL values in the CategoryKey column. What should you do to accomplish this task?

  • Merge Product and Categories tables based on CategoryKey column by choosing Inner Join in the join kind dropdown.
  • Merge Product and Categories tables based on ResellerKey column.
  • Merge Product and Categories tables based on CategoryKey column by choosing Left Outer Join in the join kind dropdown. (CORRECT)

That’s correct! CategoryKey is the common column between the Product and Categories tables, so it is the right column to merge on. To show all the products, with or without categories, you have to select Left Outer Join in the join kind dropdown.
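
The scenario's row counts can be reproduced with made-up data in Python. Only the figures 319 and 9 come from the question; the category names and key values are hypothetical.

```python
# Rebuild the scenario's shape with made-up data: 319 products, 9 without a CategoryKey.
categories = {1: "Bikes", 2: "Clothing"}  # hypothetical CategoryKey -> name
products = [{"ProductKey": i, "CategoryKey": 1 + i % 2} for i in range(310)]
products += [{"ProductKey": 310 + i, "CategoryKey": None} for i in range(9)]

# Left outer join: every product is kept, with None where no category matches.
left_outer = [
    {**p, "CategoryName": categories.get(p["CategoryKey"])} for p in products
]

# Inner join: only products with a matching category survive.
inner = [row for row in left_outer if row["CategoryName"] is not None]

print(len(left_outer), len(inner))  # 319 310
```

The nine products without a category would disappear under an inner join, which is why the left outer join is the right choice here.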

MODULE QUIZ: TRANSFORMING DATA IN POWER BI

1. True or False: A join is a method for combining columns from two or more tables based on a related column.

  • True 
  • False  (CORRECT)

2. True or False: You can use Append Queries to combine customer data from Adventure Works database and potential customers data from a separate Excel file.

  • True (CORRECT)
  • False

Correct! Append Queries adds rows of one table or query to another table or query.

3. True or False: A join key is a column that exists in only one table being joined. 

  • True 
  • False (CORRECT)

That is correct! A join key is a column that exists in both tables being joined, allowing for the connection between the tables.

4. Which join type returns all the records from the left table and the matching records from the right table?

  • Left outer join (CORRECT)
  • Right outer join 
  • Full outer join

That is correct! A left outer join returns all the records from the left table and the matching records from the right table.

5. True or False: The merge operation in Power BI allows you to combine tables based on related columns.

  • True (CORRECT)
  • False

That is correct! The merge operation is used to combine tables based on related columns.

6. True or False: Joining two tables requires that the primary key from the first table matches a corresponding key in the second table. 

  • True (CORRECT)
  • False

Exactly! For a join to work properly, the key from one table (usually the primary key) must match the related key in another table (often a foreign key). This ensures that the data from both tables aligns correctly, allowing for accurate and meaningful results when they are combined.

7. What is the purpose of transforming data in Power BI?

  • To create visualizations and reports.
  • To share dashboards and reports.
  • To clean, filter, and manipulate data for analysis. (CORRECT)

That is correct! Transforming data in Power BI involves cleaning, filtering, and manipulating the data to make it suitable for data analysis.

8. Data transformation involves ________ and ________ data to fulfill analysis requirements.

  • cleaning and visualizing
  • modifying and enhancing (CORRECT)
  • collecting and storing

That is correct! In Power BI, data transformation involves modifying and enhancing data to fulfill analysis requirements.

9. Your company, Adventure Works, has salesperson data in a database with a SalesID column. The target sales amount values of each salesperson are being stored in a separate Excel file, also containing the SalesID column. Your manager asks you to create a list which displays the names of their salespeople and their annual target sales amount values in the same report. How would you complete this task?

  • Pivot the Target sales SalesID column and combine the two queries.
  • Merge the tables with left outer join using the common SalesID column from the two data sources. (CORRECT)
  • Append rows of the Salesperson and Target sales amount tables.

That is correct! You have to match the 2 tables on a common column and then merge them.

10. A join is a method to combine ________ from two or more tables based on ________.

  • “columns” and “a common value” 
  • “datasets” and “a specific attribute” 
  • “rows” and “a related column”  (CORRECT)

That is correct! A join is a method to combine rows from two or more tables based on a related column.

11. True or False: In Power BI, the left join type includes all rows from both tables, including unmatched rows.

  • True 
  • False  (CORRECT)

That is correct! The left join type in Power BI includes all rows from the left table and matching rows from the right table, but it does not include unmatched rows from both tables.

12. Which of the following options can be considered as the purpose of a join operation?

  • Ensuring consistency (CORRECT)
  • Adding new relationships 
  • Creating insights from data

13. True or False: In Power BI, data transformation is only used for creating reports and visualizing data.

  • True
  • False (CORRECT)

That is correct! Data transformation in Power BI includes tasks such as cleaning, shaping and combining data from different sources before analysis and visualization.

14. What is the purpose of cleaning and formatting data in Power BI?

  • Creating reports and dashboards.
  • Removing inconsistencies and errors in the data. (CORRECT)
  • Creating visualizations and reports.

That is correct! Cleaning and formatting data is done to remove inconsistencies, errors and improve data quality.

15. A dataset contains two tables with related SalesID columns. The second table includes date information for the sales. What should you do to visualize the column containing date data in the second table?

  • To merge columns
  • To format data
  • To join two tables (CORRECT)

That is correct! A join is a method to combine rows from two or more tables based on a related column.

16. What is a join in the context of Power BI and data analysis?

  • A method to combine rows from two or more tables based on a related column. (CORRECT)
  • A method to filter data based on specific criteria.
  • A method to organize data in columns and rows. 

That is correct! A join is a method to combine rows from two or more tables based on a related column. 

17. In Power BI, the ________ join type returns only the matching rows from both tables, excluding unmatched rows.

  • Inner  (CORRECT)
  • Left 
  • Full 

That is correct! In Power BI, the Inner join type returns only the matching rows from both tables, excluding unmatched rows.

18. Before appending, a join can be performed to check for any ________ in the common columns or keys.

  • Correlations
  • Inconsistencies (CORRECT)
  • Dependencies

That is correct. Before appending, a join can be performed to check for any discrepancies or inconsistencies in the common columns or keys.

19. When merging two tables which of the following conditions are required to match specific columns? Select all that apply:

  • Matching columns should have a numeric data type.
  • Matching columns should have convertible data types. (CORRECT)
  • Matching columns should have the same data type. (CORRECT)

That is correct! Matching columns should have the same or convertible data types.
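
Why the data types of matching columns matter can be illustrated with a Python sketch in which one table stores the key as text and the other as a whole number. The names and values are hypothetical, and how Power Query itself treats mismatched key types is not shown here.

```python
# Hypothetical keys: one table stores SalesID as text, the other as whole numbers.
targets = {"101": 50000, "102": 75000}  # SalesID (text) -> target amount
salespeople = [
    {"SalesID": 101, "Name": "Ana"},    # SalesID (int)
    {"SalesID": 102, "Name": "Ben"},
]

# Without conversion no key matches, because "101" != 101.
unmatched = [p for p in salespeople if p["SalesID"] not in targets]

# Convert the text keys to int first, then match.
targets_by_int = {int(key): value for key, value in targets.items()}
matched = [
    {**p, "Target": targets_by_int[p["SalesID"]]}
    for p in salespeople
    if p["SalesID"] in targets_by_int
]

print(len(unmatched), len(matched))  # 2 2
```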

20. True or False: In Power BI, data transformation involves modifying the structure and format of data to meet analysis requirements.

  • True (CORRECT)
  • False

That is correct! In Power BI, data transformation allows modifying the data structure and format to prepare data for analysis and visualization.

21. You import two Microsoft Excel tables named (ContactInfo) and (Address) into Power Query. 

Address contains the following columns:

  • CustomerID
  • CustomerName
  • Phone
  • Address

ContactInfo contains the following columns:

  • ContactID
  • ContactName
  • ZipCode
  • Phone
  • Address

What happens if you append these two tables without changing the column names?

  • Shared columns with or without values are added to the result set.
  • Shared columns with values are added to the result set. 
  • All the columns are added to the result set with values from its owning table and null values from the other. (CORRECT)

Correct! Append Queries adds rows of one table or query to another table or query. If there are different numbers of columns or the names of the columns vary, the extra columns are added to the right, and the row values of the columns will be NULL if that column does not exist in the original table.
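
The behavior described above can be sketched in Python using one-row versions of the two tables. The column names come from the question, while the row values and the append helper are hypothetical.

```python
# Hypothetical one-row versions of the two tables from the question.
address = [
    {"CustomerID": 1, "CustomerName": "Ana", "Phone": "555-0100", "Address": "1 Main St"},
]
contact_info = [
    {"ContactID": 9, "ContactName": "Ben", "ZipCode": "98052",
     "Phone": "555-0199", "Address": "2 Oak Ave"},
]


def append(first, second):
    """Stack the rows of two tables; columns missing from a row are filled with None."""
    columns = []
    for row in first + second:
        for name in row:
            if name not in columns:
                columns.append(name)  # keep first-seen column order
    return [{name: row.get(name) for name in columns} for row in first + second]


combined = append(address, contact_info)
print(list(combined[0]))
```

Only Phone and Address are shared, so the result has seven columns, and each row holds None in the columns that belong to the other table.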

22. The merge operation is used to ________ tables based on ________.

  • filter, specific values
  • combine, related columns (CORRECT)
  • sort, ascending order

That is correct! The merge operation is used to combine tables based on related columns.

23. What is a join key in the context of combining tables with merge?

A join key is a column that exists in both tables being joined, allowing for the connection between the tables.

24. True or False: Ensuring consistency, one of the purposes of appending tables, helps validate the data and ensure that the appended tables align correctly.

  • True (CORRECT)
  • False

That is correct! Ensuring consistency helps validate the data and ensure that the appended tables align correctly.

25. Which of the following join operations expands the existing dataset by adding new rows of data?

  • Appending with join (CORRECT)
  • Merging with left outer join.
  • Merging with inner join

That is correct! An append operation allows you to expand the existing dataset by adding new rows of data.

26. In Power BI, data transformation involves ____________ to make it convenient for data analysis.

  • cleaning, filtering, and manipulating data (CORRECT)
  • importing data from external sources
  • creating visualizations and reports

That is correct! Data transformation in Power BI involves cleaning, filtering, and manipulating data to prepare it for analysis.

27. How can you combine the outcomes of two queries in Power BI, ensuring that the results of one query are stacked directly beneath the other?

  • Append the query results. (CORRECT)
  • Merge the query results.
  • Join the query results.

That is correct! You can combine the results of two queries into one single table by appending the results of one query to the other.

28. A join key is typically a ____ ___ that exists in both tables being joined. 

  • “calculated value” 
  • “common column”  (CORRECT)
  • “primary key” 

That is correct! A join key is typically a common column that exists in both tables being joined.

CONCLUSION – Transforming data in Power BI

Data cleaning and transformation are crucial skills in the data preparation process, as they lay the groundwork for any subsequent analysis or modeling tasks. By addressing data quality issues and applying the right transformations, you ensure that your datasets are reliable and accurate, and you prepare yourself to work with more advanced data modeling techniques. This foundational knowledge will be invaluable as you progress to more complex data tasks in future courses.
