INTRODUCTION – Preparing data
In this module, you will explore the core concepts of preparing and loading data in Power BI as part of your preparation for the PL-300 exam. This includes the primary principles and advanced techniques needed for efficient data management in Power BI, so that you are fully prepared across the different approaches to preparing and loading data. By focusing on these important areas, this module helps solidify your comprehension, boosts your ability to apply these concepts in real-life scenarios, and ultimately prepares you to succeed in the PL-300 exam.
Learning objectives:
- Identify the main knowledge and skills assessed on the PL-300 exam and learn how to best allocate your study time.
- Prepare yourself for the PL-300 exam by exploring the processes of preparing data in Power BI.
SELF-REVIEW: WHAT DID YOU LEARN? 1
1. When a connector in Microsoft Power BI does not support DirectQuery mode, which of the following statements is true?
- All connectors support DirectQuery mode.
- The connector will default to Import mode. (CORRECT)
- The connector will default to Dual mode.
That’s correct! If DirectQuery is not supported, Power BI defaults to Import mode: it copies the data into Power BI’s in-memory engine. This lets Power BI work on the data locally, but it can produce larger data models and consume more memory, depending on how much data is imported.
2. At the beginning of the data analysis process, which of the following activities are important to ensuring a successful analysis outcome? Select all that apply.
- Identifying the required data for the analysis (CORRECT)
- Inspecting the data to ensure it meets requirements (CORRECT)
- Connecting to the data sources (CORRECT)
- Identifying which data sources can provide the required data (CORRECT)
That’s correct! The desired outcome of the analysis determines which data is needed.
That’s right! Inspecting the data before it is loaded to confirm it meets requirements helps enormously in finding the right insights.
That’s correct! Connecting to your data sources in Microsoft Power BI allows you to model this data correctly later during the analysis.
That’s right! Identifying which data sources can provide the required data ensures your analysis can proceed.
3. Which of the following storage modes will copy data to Microsoft Power BI? Select all that apply.
- Dual mode (CORRECT)
- DirectQuery mode
- Import mode (CORRECT)
That’s correct! In Dual mode, Power BI automatically caches query results locally and refreshes them on demand, giving report and dashboard consumers the best balance of performance and flexibility.
That’s right! In Import Mode, Power BI pulls all data from the data source into its in-memory engine for fast query performance and offline access.
4. Which of the following are the benefits of using a shared dataset? Select all that apply.
- The dataset is accessible at high speeds.
- The dataset acts as a single source of truth. (CORRECT)
- The dataset can scale as the organization grows. (CORRECT)
- The dataset is accessible from both office and remote locations. (CORRECT)
That’s correct! A shared dataset provides a centralized place to store data. Sharing it across the organization makes it a single source of truth, aiding collaboration and improving efficiency.
That’s correct! The Power BI Service uses the elasticity of the cloud to scale datasets as the organization grows and more data is needed.
That’s right! With appropriately shared datasets in the Power BI Service, users can access the shared data remotely and collaborate conveniently on data-based decisions.
5. True or False: A shared dataset is more secure than a local dataset.
- True
- False (CORRECT)
6. Which of the following categories are assessed by the Exam PL-300? Select all that apply.
- Prepare and model the data (CORRECT)
- Import and visualize the data using dataflows
- Visualize and analyze the data (CORRECT)
- Deploy and maintain assets (CORRECT)
That’s correct! The Prepare the data and Model the data sections will each comprise 25-30% of the questions you will face in the PL-300 exam.
That’s right! The Visualize and analyze the data section also accounts for 25-30% of the exam questions in the PL-300 test.
That’s correct! The Deploy and maintain assets section will take up about 15-20% of the questions you may see in the PL-300 exam.
7. Which of the following is considered a key responsibility of a data analyst? Select all that apply.
- Identify which data is required (CORRECT)
- Create the required data
- Connect to the data sources (CORRECT)
- Identify which data sources can provide the data (CORRECT)
Yes, that’s right! Identifying which data is required is one of the most crucial steps in addressing business problems.
Yes, indeed! Connecting to the various data sources that hold the required data is an essential activity before the analysis itself can begin.
That’s right! Identifying data sources appropriate for your analysis ensures you can obtain the relevant information needed during the analysis process.
8. Which of the following storage modes will store data in Microsoft Power BI? Select all that apply.
- DirectQuery mode
- Import mode (CORRECT)
- Dual mode (CORRECT)
That’s correct! Import mode fetches the entire data set from the data source and stores it in memory in Power BI for very quick query performance.
That is correct! Dual mode caches query results locally to improve the responsiveness of visualizations and reports, striking a balance between the Import and DirectQuery modes.
SELF-REVIEW: WHAT DID YOU LEARN? 2
1. Which of the following data profiling tools allows you to inspect the percentage of valid, error, and empty data in a column?
- Column profile
- Column quality (CORRECT)
- Column distribution
That’s correct! In the Power Query Editor, enable Column Quality to view percentages for each column’s valid, error, and empty values. This will help you determine how clean your data is and what problems the data might have.
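The Column quality metrics can be sketched in plain Python. This is a hypothetical simulation, not Power BI's actual implementation: assume each loaded cell is either a parsed value, `None` (empty), or the marker string `"Error"` (a failed type conversion).

```python
# Hypothetical sketch of Column quality: each cell is a parsed value,
# None (empty), or the string "Error" (failed conversion).
def column_quality(cells):
    total = len(cells)
    empty = sum(1 for c in cells if c is None)
    error = sum(1 for c in cells if c == "Error")
    valid = total - empty - error
    pct = lambda n: round(100 * n / total)
    return {"valid": pct(valid), "error": pct(error), "empty": pct(empty)}

# A 10-row date column in which one value failed to parse and one is empty:
cells = ["2024-01-01"] * 8 + ["Error", None]
print(column_quality(cells))  # {'valid': 80, 'error': 10, 'empty': 10}
```

The three percentages always sum to 100, which is why a quick glance at Column quality tells you how clean a column is.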
2. You’re preparing a dataset for import into Microsoft Power BI and have accidentally changed a column to the wrong data type. What is the next action you can take to correct the mistake? Select all that apply.
- Edit the Changed Type step in the Applied Steps list. (CORRECT)
- Append a new step to the Applied Steps list to change the data type.
- Remove the step from the Applied Steps list. (CORRECT)
Right! You can edit the incorrect Changed Type step in the Applied Steps list so that the column is converted to the intended data type.
You are right! Removing the incorrect step from the Applied Steps list undoes the change and returns the column to its previous state.
3. True or False: Column profile displays the minimum and maximum values within the first 1000 rows for a specific column.
- True (CORRECT)
- False
That’s correct! In the Power Query Editor, Column profile displays detailed statistics for a selected column, including its minimum and maximum values, value distribution, and distinct and unique counts, based on the first 1000 rows.
4. You are importing a dataset of 10 rows that contains a Date type column. One of the values in the Date column is Bicycle. Which of the following statements is correct? Select all that apply.
- Column quality will show 10% of values as Error. (CORRECT)
- You can replace the value using the Replace Values feature.
- You can replace the value using the Replace Errors feature. (CORRECT)
- Column profile will show 10% of values as Error.
That’s correct! Since Bicycle is not a valid date, Column quality counts it as an error, showing 10% of values (1 of 10 rows) as Error.
That’s right! You can use the Replace Errors feature to replace the error with a valid date.
5. True or False: Distinct values refer to the number of values that only occur once in the dataset.
- True
- False (CORRECT)
That’s correct! Values that occur only once in the dataset are unique values. Distinct values refer to the number of different values in the dataset, regardless of how many times each occurs.
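The distinction between distinct and unique counts can be illustrated with a short Python sketch (hypothetical sample values, mirroring what Column distribution reports):

```python
from collections import Counter

# Distinct values: the number of different values in the column.
# Unique values: the number of values that occur exactly once.
def distinct_and_unique(values):
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# "A" appears twice, so it is distinct but not unique; "B" and "C" are both.
print(distinct_and_unique(["A", "A", "B", "C"]))  # (3, 2)
```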
6. In which of the following scenarios can you use parameter values? Select all that apply.
- To filter data dynamically from a data source (CORRECT)
- To dynamically add a new visualization to a report
- To dynamically switch between a test and production data source (CORRECT)
- To dynamically append a value to the dataset
That’s right! Using parameter values, you can dynamically adjust filter and transformation arguments, filtering the data retrieved from a data source.
Correct! Parameter values can also dynamically change the data source input in Microsoft Power BI, making it easy to switch between test and production data sources.
7. Which of the following are limitations that must be considered when using dataflows?
- There is limited visibility of dependencies between dataflows. (CORRECT)
- To refresh more than 10 dataflows, a Premium subscription is required. (CORRECT)
- The maximum number of linked dataflows is 32. (CORRECT)
- Dataflows are limited to small volumes of data.
That’s correct! Each dataflow is managed separately, which limits the visibility of dependencies and makes it more difficult to see how dataflows relate to each other.
Yes indeed! A Power BI Premium subscription is required to refresh more than 10 dataflows within a workspace, enabling more robust dataflow management.
True! Linked dataflows can be chained to a maximum depth of 32, which limits how deeply dataflows can reference one another.
8. You are importing a dataset containing 500 rows. Which of the following should be validated before importing the data? Select all that apply.
- Column Distribution shows all values are unique.
- Column Profile shows that the total rows is equal to 500. (CORRECT)
- Column Quality shows data is 100% valid. (CORRECT)
- Column Data Types are correct. (CORRECT)
That’s correct! Confirming that all rows of the dataset are imported ensures no critical data is missing from the analysis.
Yes! Datasets should not contain any errors or empty values, as this ensures data quality and proper analysis.
That is correct! Assigning the correct data type to every column avoids problems with the queries used in the analysis and safeguards the correct treatment of the data.
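The pre-import checks above can be sketched as a small Python validation routine. This is a hypothetical sketch with made-up column names (`SaleID`, `Amount`), not a Power BI API:

```python
# Hypothetical pre-import validation for a 500-row dataset:
# check row count, no empty values, and correct column data types.
def validate(rows, expected_count, expected_types):
    assert len(rows) == expected_count, "row count mismatch"
    for row in rows:
        for col, typ in expected_types.items():
            value = row.get(col)
            assert value is not None, f"empty value in column {col}"
            assert isinstance(value, typ), f"wrong type in column {col}"

rows = [{"SaleID": i, "Amount": 9.99} for i in range(500)]
validate(rows, 500, {"SaleID": int, "Amount": float})
print("validation passed")
```

In Power BI itself these same checks correspond to reading Column profile (row count), Column quality (100% valid), and the column data type indicators.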
9. Why is it important to ensure that data is good quality? Select all that apply.
- To optimize visualizations
- To ensure effective decision-making (CORRECT)
- To ensure correct column data types
- To help ensure accurate reporting (CORRECT)
Correct! Stakeholders trust quality data to make pertinent decisions based on accurate and reliable insights.
That’s right! Good-quality data makes reports reliable, as poor-quality data generates incorrect conclusions and mistakes in decision-making.
SELF-REVIEW: WHAT DID YOU LEARN? 3
1. The sales team at Adventure Works has two tables of data for tracking bicycle sales: Sales and Product. Each row in the Sales table is associated with one of the bicycles in the Product table. Which of the following cardinalities is most suitable between the Sales and the Product table?
- Many-to-many
- One-to-one
- Many-to-one (CORRECT)
That’s correct! Many sales rows correspond to one product, which defines a many-to-one relationship between Sales and Product. This relationship is well structured and highly effective for analysis.
2. The inventory team at Adventure Works has two tables of data for tracking inventory levels: Stock and Product. If a product is not in stock, there is no row in the Stock table for the product. The team wants to merge the tables into a single table containing products that are in stock. If Stock is the left table and Product is the right table, which of the following join types will achieve the desired outcome? Select all that apply.
- Inner Join (CORRECT)
- Left Outer Join (CORRECT)
- Right Outer Join
- Full Outer Join
That’s correct! The Inner Join only retains those rows that match in both tables and eliminates rows whose values do not match in the two tables.
That’s right! The Left Outer Join keeps all rows from the Stock table and merges in the matching rows from the Product table. Because every product in the Stock table also exists in the Product table, this produces the same result as the Inner Join: only products that are in stock.
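The difference between the two join types can be sketched in plain Python (hypothetical sample rows; this simulates the idea, not Power Query's merge implementation):

```python
# Hypothetical sample tables: Stock (left) and Product (right).
stock = [{"ProductID": 1, "Qty": 5}, {"ProductID": 2, "Qty": 3}]
product = [{"ProductID": 1, "Name": "Road Bike"},
           {"ProductID": 2, "Name": "Mountain Bike"},
           {"ProductID": 3, "Name": "BMX"}]  # not in stock

def merge(left, right, key, join="inner"):
    right_by_key = {r[key]: r for r in right}
    rows = []
    for l in left:
        match = right_by_key.get(l[key])
        if match is not None:
            rows.append({**l, **match})        # matched in both tables
        elif join == "left_outer":
            rows.append({**l, "Name": None})   # keep unmatched left rows
    return rows

# Every stocked product exists in Product, so both joins give the same rows:
inner = merge(stock, product, "ProductID", "inner")
left_outer = merge(stock, product, "ProductID", "left_outer")
assert inner == left_outer  # BMX (ProductID 3) appears in neither result
```

Note that the right-side row for BMX is dropped by both joins because a merge only keeps unmatched rows from the *left* table (Left Outer) or neither table (Inner).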
3. True or False: When performing an append query, the columns of both tables must be the same.
- True
- False (CORRECT)
That’s correct! When tables with different columns are appended, the resulting table contains all columns from both tables. Where a column is missing from one table, null values are added in that column for that table’s rows.
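The append behavior can be sketched in Python (hypothetical tables with a deliberately mismatched `Region` column; a simulation of the idea, not Power Query's Append Queries implementation):

```python
# Hypothetical sketch of appending tables with different columns:
# the result has the union of columns; missing cells become None (null).
def append_tables(*tables):
    columns = []
    for t in tables:               # collect columns in first-seen order
        for row in t:
            for c in row:
                if c not in columns:
                    columns.append(c)
    return [{c: row.get(c) for c in columns} for t in tables for row in t]

sales_2023 = [{"Product": "Bike", "Amount": 100}]
sales_2024 = [{"Product": "Helmet", "Amount": 25, "Region": "EU"}]
print(append_tables(sales_2023, sales_2024))
# The 2023 row gets Region: None because sales_2023 lacks that column.
```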
4. You are importing a data source containing two columns: SalePrice and Cost. You need to create a calculated column to represent the Profit. Which of the following DAX queries will achieve the desired result?
- Profit = SalePrice / Cost
- SalePrice / Cost
- SalePrice – Cost
- Profit = SalePrice – Cost (CORRECT)
That’s correct! This formula creates a calculated column named Profit that subtracts Cost from SalePrice, giving the profit for each entry in the dataset.
5. True or False: Reducing columns and rows will reduce the required storage space for a table.
- True (CORRECT)
- False
That’s correct! Importing only the necessary rows and columns reduces the amount of data stored, optimizing storage and performance. Because only relevant data is imported, resources are put to better use.
6. You need to create a calculated column for the tax amount per sale. The tax amount is 25% of the SalesAmount column. Which of the following DAX formulas will create the column?
- 0.25 * SalesAmount
- TaxAmount = 0.25 * SalesAmount (CORRECT)
- TaxAmount = 25 * SalesAmount
- 25 * SalesAmount
That’s correct! This formula creates a calculated column named TaxAmount, and Power BI computes 25% of the SalesAmount value on every single row.
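The row-by-row evaluation of a calculated column can be sketched in Python (hypothetical sample rows; 25% tax, i.e. a factor of 0.25, as in the question):

```python
# Sketch of row context: a calculated column evaluates once per row,
# so each row's TaxAmount is 25% of that row's SalesAmount.
sales = [{"SalesAmount": 100.0}, {"SalesAmount": 40.0}]
for row in sales:
    row["TaxAmount"] = 0.25 * row["SalesAmount"]
print(sales)  # TaxAmount becomes 25.0 and 10.0
```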
7. You have created a duplicate query in Microsoft Power BI from a query that contained transformations. Which of the following statements is true? Select all that apply.
- Transformations copied from the original query will execute in both queries. (CORRECT)
- Changes to transformations in the original query will not impact the duplicate query. (CORRECT)
- Changes to transformations in the original query will automatically update the duplicate query.
- Transformations copied from the original query will execute only in the original query.
Yes, that is true! Transformations that existed in the original query at the time of duplication are copied into the duplicate, so they execute in both queries.
That is right! After duplication the two queries are independent, so later changes to the original query’s transformations do not affect the duplicate query.
8. You have a Product table as a left table and a Category table as a right table. You want to merge these tables so that you have a table containing only products and each product row containing its category name or null. Which of the following joins is most suitable?
- Left Outer Join (CORRECT)
- Inner Join
- Right Outer Join
- Full Outer Join
That’s correct! A Left Outer Join keeps every record from the Product table and joins any matching rows from the Category table. Where there is no match in the Category table, the category name in the resulting table will be null.
9. Adventure Works’ head office must keep track of employee keycards. For security reasons, each employee can only have one keycard. There are two tables: Employees and Keycards. What is the most suitable relationship for these two tables?
- Many-to-many
- Many-to-one (Employees to Keycards)
- One-to-one (CORRECT)
- One-to-many (Employees to Keycards)
That’s correct! A one-to-one relationship is suitable here, since each employee can have exactly one keycard and each keycard can be assigned to only one employee, giving a direct one-to-one match between the entities.
CONCLUSION – Preparing data
This concluding module encapsulates the key concepts you will need to prepare and load data in Power BI and offers an opportunity to prepare for the PL-300 exam. It reinforces fundamental principles and techniques to solidify your understanding and sharpen your practical data-management skills in Power BI, enhancing your proficiency and confidence for the PL-300 exam.