INTRODUCTION – The right tools for the job
This module provides an extensive introduction to data management, leading students through the core aspects of collecting, transforming, and cleaning data. It begins with the importance of collecting accurate, relevant data from the many sources available to businesses and organizations.
Learners are introduced to the Extract, Transform, and Load (ETL) process, an important step in data integration. This process extracts data from various sources, transforms it into a format that is easy to analyze, and loads it into a destination.
In addition, the module places strong emphasis on checking the quality and reliability of data before analysis so that it can be used to generate business insights. This gives learners the core skills needed to prepare and handle data well so that their analysis projects succeed.
Learning Objectives
- Explain the generation and collection of data from businesses and organizations
- Describe the ETL process
- Show the importance of data cleaning and transformation for analysis
SELF-REVIEW: DATASET
1. What is the primary task assigned to you by Renee in the Assessing a data set case study?
- To sort and list the prices of all products, displaying them in descending order
- To list the products in ascending order by the Supplier and Date Entered columns, respectively.
- To list the products purchased from suppliers in ascending order by the Date Entered column.
- To track the price increase on a supplier basis from the past to the present for each product the company purchases from suppliers and detect any unusual price situations, if any. (CORRECT)
That’s correct! The task involves monitoring changes in supplier prices over time for each product purchased and comparing the previous price with the current price. Its purpose is to identify any unusual or abnormal price change so that it can be investigated. This process supports price consistency and the detection of significant price deviations so that proper action can be taken.
2. In the Assessing a data set case study, which of the following columns need to be considered to track the price increase on a supplier basis from the past to the present for each product and detect anomalies? Select all that apply.
- Product name (CORRECT)
- Supplier (CORRECT)
- Category
- Date Entered (CORRECT)
Absolutely! The ProductName column identifies each product, so price changes can be tracked product by product.
Exactly! At this stage, each product is purchased from the same supplier. In the future, this may change, as different suppliers offer different prices. Sorting by the Supplier column is therefore a good idea.
That’s right! The DateEntered column keeps the list in chronological order, from the oldest entry to the newest.
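The task behind questions 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the case study: the sample rows, the `Price` column, and the 20% threshold are all hypothetical, and the column names are written as `Supplier`, `ProductName`, and `DateEntered` for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; the real case-study data is not reproduced here.
rows = [
    {"Supplier": "A", "ProductName": "Helmet", "DateEntered": date(2023, 1, 10), "Price": 20.0},
    {"Supplier": "A", "ProductName": "Helmet", "DateEntered": date(2023, 3, 10), "Price": 21.0},
    {"Supplier": "A", "ProductName": "Helmet", "DateEntered": date(2023, 5, 10), "Price": 30.0},
    {"Supplier": "B", "ProductName": "Gloves", "DateEntered": date(2023, 2, 1), "Price": 10.0},
    {"Supplier": "B", "ProductName": "Gloves", "DateEntered": date(2023, 4, 1), "Price": 10.5},
]


def flag_unusual_increases(rows, threshold=0.20):
    """Return (supplier, product, date, change) for price jumps above the threshold."""
    # Group the price history by supplier and product.
    history = defaultdict(list)
    for row in rows:
        history[(row["Supplier"], row["ProductName"])].append(row)

    flagged = []
    for (supplier, product), entries in history.items():
        # Oldest entry first, so each price is compared with the one before it.
        entries.sort(key=lambda r: r["DateEntered"])
        for previous, current in zip(entries, entries[1:]):
            if previous["Price"] == 0:
                continue  # avoid dividing by zero on a free or missing price
            change = (current["Price"] - previous["Price"]) / previous["Price"]
            if change > threshold:
                flagged.append((supplier, product, current["DateEntered"], round(change, 4)))
    return flagged


flagged = flag_unusual_increases(rows)
print(flagged)
```

Grouping by supplier and product, then sorting by date, mirrors the three columns selected in question 2. The threshold is a judgment call that a real analysis would need to agree with the business.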
3. Based on your analysis of the data in the Assessing a data set case study, is this dataset suitable for the business need as determined by the task assigned to you by Renee?
- No
- Yes (CORRECT)
Correct! The dataset contains the required fields for the business need.
4. What type of data is primarily collected, stored, and interpreted by Adventure Works’ Enterprise Resource Planning (ERP) system?
- Photo and video data
- Unstructured data
- Structured data (CORRECT)
- Semi-structured data
That’s correct! ERP systems are designed to collect, store, manage, and interpret structured data coming from various business activities. Structured data is data organized into a predetermined format, such as a database. This format is searchable, analyzable, and retrievable, making it easy to glean insights from the data. The consistent structure makes it easier for businesses to interpret their data, streamline operations, and improve decision-making.
5. Which of the following is the SaaS (Software as a Service) based web application of Microsoft Power BI?
- Power BI Service (CORRECT)
- Power BI Apps
- Power BI Desktop
That’s correct! The Power BI Service is the SaaS part of Power BI, intended for report consumption and for administration. It enables users to develop, share, and collaborate on reports and dashboards online. In this service, admins can set up access, security, and configuration. This gives an organization a convenient distribution channel for insights and data visualizations, and users can access up-to-date information from anywhere.
KNOWLEDGE CHECK: DATA COLLECTION
1. Which product has strong reporting features and is typically used to begin a workflow in Power BI?
- Microsoft Power BI Service
- Microsoft Power BI Apps
- Microsoft Power BI Desktop (CORRECT)
That’s correct! Microsoft Power BI Desktop is a Windows application for designing and building reports. It connects to data sources, transforms and shapes the data, and creates many types of interactive visualizations and reports. It is one of the main tools that data analysts and report developers use to prepare data before publishing it to the Power BI Service for sharing and collaboration. It offers a comprehensive feature set for thorough data analysis, including data modeling, DAX (Data Analysis Expressions), and custom visualizations.
2. If you are given Microsoft Excel data and informed of a business need, what method would you use to determine if the data provided is compatible with the business need?
- Check if the Microsoft Excel data fulfills the business requirements by examining the format of the data.
- Check if the Microsoft Excel data fulfills the business requirements by examining its content and data types. (CORRECT)
- Check if the Microsoft Excel data fulfills the business requirements by examining the source of the data.
That’s correct! To fulfill a business need with Microsoft Excel data, the columns in the worksheet must match the specifications in the business requirements. This means the content, data types, and naming of the columns have to correspond to the fields and metrics named in the requirements. Getting this match right is the foundation for accurate data interpretation, reporting, and decision-making, because it is what turns the data into something useful for business objectives.
3. You want to publish your report and share your data with others by creating dashboards. Which of the following products would you use to accomplish this?
- Microsoft Power BI Service (CORRECT)
- Microsoft Power BI Desktop
- Microsoft Power BI Apps
That’s correct! To create dashboards and share your report directly with others, use Microsoft Power BI Service.
4. True or False: The typical workflow in Microsoft Power BI starts with the creation of a report in Power BI Desktop.
- True (CORRECT)
- False
That’s correct! The usual Power BI workflow starts in Power BI Desktop, continues in the Power BI Service, and ends in the Power BI mobile apps. It covers the entire cycle from creating a report, through publishing it, to consuming it.
5. What term is used to classify data such as word-processing files, images, video, and audio files?
- Structured data
- Semi-Structured data
- Unstructured data (CORRECT)
That’s correct! Examples of unstructured data include word-processing files, images, video, and audio files.
SELF-REVIEW: DATA STORAGE AND MANAGEMENT
1. What is the main advantage of hybrid storage for Adventure Works?
- It reduces the IT management overhead.
- It is the most affordable storage solution.
- It is suitable for storing only structured data.
- It combines the benefits of on-premises and cloud-based storage solutions. (CORRECT)
That’s correct! Hybrid storage offers the flexibility and scalability of cloud storage, while maintaining control over sensitive information.
2. Which statement best describes structured data?
- Data that is not organized in a predefined format, consisting of data types that do not fit neatly into rows and columns.
- Data that is easily searchable and analyzable, consisting of data types that can be neatly arranged in rows and columns. (CORRECT)
- Data that is stored on physical hardware located within the company’s premises.
- Data that is stored on remote servers managed by a third-party provider.
That’s correct! Structured data is data organized into a predetermined format, which makes it easy to search, analyze, and retrieve.
3. What data source mentioned in the case study is an example of unstructured data?
- Financial data
- Manufacturing data
- Sales data
- Social media and online reviews data (CORRECT)
That’s correct! Customer reviews, social media interactions, and multimedia material do not fit a structured format of rows and columns, so they are classified as unstructured data.
4. In the ETL process, which step involves retrieving raw data from different sources, such as databases and files?
- Extract (CORRECT)
- Load
- Visualize
- Transform
That’s correct! The Extract step involves retrieving raw data from different sources.
5. Which method of data ingestion is most suitable for gathering data from many Excel spreadsheets?
- Database connections
- Manual data entry
- Web scraping
- File-based ingestion (CORRECT)
That’s correct! File-based ingestion is particularly useful for reading and parsing the contents of files in different formats, such as Excel spreadsheets.
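As a rough illustration of file-based ingestion, the sketch below reads every file in a folder and combines the rows into one list. It makes several assumptions: CSV exports stand in for Excel spreadsheets (reading `.xlsx` files directly needs a third-party library), and the file names, the columns, and the `ingest_csv_folder` helper are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def ingest_csv_folder(folder):
    """Read every .csv file in `folder` and return all rows as one list of dicts."""
    combined = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["SourceFile"] = path.name  # record which file each row came from
                combined.append(row)
    return combined


# Create two small hypothetical files so the example runs on its own.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "sales_january.csv").write_text(
        "ProductID,Quantity\n35,2\n36,1\n", encoding="utf-8"
    )
    Path(folder, "sales_february.csv").write_text(
        "ProductID,Quantity\n35,4\n", encoding="utf-8"
    )
    ingested = ingest_csv_folder(folder)

print(len(ingested))
```

Keeping the source file name on each row is a design choice that makes it easier to trace a value back to the spreadsheet it came from.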
6. You need to consolidate data from multiple sources into a unified view. Which aspect of data management involves this task?
- Data integration. (CORRECT)
- Data governance.
- Data quality.
- Data retention and archiving.
That’s correct! Data integration is the process of combining data gathered from different sources, typically different departments or systems, into one unified structure.
KNOWLEDGE CHECK: INTRODUCTION TO THE ETL PROCESS
1. What is a benefit of on-premises storage?
- Full control over data and infrastructure. (CORRECT)
- No need for physical hardware.
- Easy scalability.
- Reduced IT management overhead.
That’s correct! On-premises storage is a good fit whenever a company needs tight control over its own hardware and data, is bound by strict security and compliance regulations, or handles sensitive or mission-critical data.
2. What is the primary purpose of the Transform step in the ETL process?
- To load the transformed data into the final storage system.
- To clean, structure, and enrich the data to make it more suitable for analysis. (CORRECT)
- To extract data from multiple sources.
- To analyze and visualize the data.
That’s correct! The Transform step in the ETL process involves cleaning, structuring, and enriching the data before it is analyzed.
3. Which method of data ingestion allows real-time access to data but may require knowledge of database languages and complex configurations?
- Data streaming
- File-based ingestion
- Web scraping
- Database connections (CORRECT)
That’s correct! Database connections allow records to be retrieved from the database management system in real time. Using them effectively requires a good understanding of database languages, schemas, and configurations in order to query and retrieve the data.
4. What does source data refer to? Select all that apply.
- Pre-processed data used for analysis and decision-making.
- Data that has been analyzed and refined for specific purposes.
- Raw, unprocessed information collected, stored, and managed by an organization. (CORRECT)
- The initial input used as the basis for further processing, transformation, and analysis. (CORRECT)
Definitely! Source data is the raw, unprocessed information that an organization has collected, stored, and managed. It is the initial input for further processing, transformation, and analysis.
5. Which aspect of data management is primarily responsible for establishing clear policies and procedures for data handling throughout an organization?
- Data archiving
- Data quality
- Data security
- Data governance (CORRECT)
That’s correct! Data governance provides the foundation for disciplined data management by defining clear policies, procedures, and standards for how data is handled across an organization.
SELF-REVIEW: EVALUATING DATA FOR TRANSFORMATION
1. Based on the Adventure Works Inventory dataset, what is the RestockingFrequency for the product Kidz-K400?
- 30 days
- 45 days (CORRECT)
- 60 days
- 90 days
Correct! In the Inventory dataset, the Kidz-K400 product has a RestockingFrequency of 45 days, as shown by the rows for ProductID 47, all of which relate to Kidz-K400.
2. In the Customer Feedback dataset, which ProductID received a feedback score of 3.5 on May 23rd, 2023?
- 51
- 49 (CORRECT)
- 52
- 50
Correct! The product with ProductID 49 received a feedback score of 3.5 on May 23rd, 2023. A score of 3.5 suggests a level of customer satisfaction that is higher than neutral but short of high satisfaction.
3. According to the Adventure Works Sales dataset, what is the total quantity of products sold on 2023-05-05?
- 2
- 4
- 3 (CORRECT)
- 1
Correct! The total quantity of products sold on 2023-05-05 is 3. This can be confirmed by cross-referencing ProductID 35, TransactionID 35, and the SalesAmount of 750 in the dataset.
4. How does Power Query promote a structured and repeatable approach to data preparation?
- By recording data transformation steps in the Applied Steps pane. (CORRECT)
- By automatically generating data visualizations.
- By creating real-time dashboards.
- By integrating with third-party applications.
Exactly! This is what Power Query is designed for. It records each data transformation as a step and adds it to the list in the Applied Steps pane, so the same preparation can be reviewed and repeated.
5. How does Power Query promote a structured and repeatable approach to data preparation?
- Documentation (CORRECT)
- Detection
- Illustration
- Investigation
Correct! Documenting means recording something in detail for a definite purpose. Here, it means recording the data preparation steps so that they can be reviewed and repeated.
KNOWLEDGE CHECK: INTRODUCTION TO TRANSFORMING DATA
1. Which process involves altering the data’s structure, format, or values to make it more suitable for analysis?
- Data validation
- Data cleaning
- Data aggregating
- Data transformation (CORRECT)
That’s correct! Data transformation changes the structure, format, or values of data to make it more usable. It can range from aggregating data to changing data types and standardizing values.
2. What is a primary advantage of cleaning data at the source?
- Eliminating the need for data transformation.
- Reducing the need for data documentation.
- Ensuring future analyses have a clean and consistent foundation. (CORRECT)
- Making it easier to import data into Microsoft Power BI.
That’s correct. Cleaning data at the source ensures that any future analyses using this data will have a clean and consistent foundation, saving time and effort in future analyses.
3. Which Excel function capitalizes the first letter of each word in a text string?
- MATCH()
- SUMIF()
- PROPER() (CORRECT)
- UPPER()
That’s correct! The PROPER function capitalizes the first letter of each word in a piece of text and converts the remaining letters to lowercase. It takes a single argument: the text, or a reference to the cell containing the text, that you want the function to work on.
4. What is the primary function of Power Query in the Microsoft Power BI suite?
- Data visualization.
- Sharing and collaboration.
- Creating advanced calculations.
- Data connectivity and preparation. (CORRECT)
Yes it is! The chief task of Power Query is getting and shaping data. It enables users to connect to a broad range of data sources, standardize and transform the data, and load it into the Microsoft Power BI data model for further analysis and visualization. This provides a streamlined, repeatable data preparation environment that allows users to extract value from their data efficiently.
5. Which data transformation functions are commonly performed in Power Query? Select all that apply.
- Changing data types. (CORRECT)
- Filling in missing values. (CORRECT)
- Encrypting data.
- Removing duplicates. (CORRECT)
That’s right! Changing data types is one of the most common data transformation tasks in Power Query. Correct data types are essential for smooth analysis and visualization, because raw data often arrives with types that do not fit how the values will be used. Power Query helps keep the types consistent across the entire dataset at this stage.
That’s right! Filling in missing values is a common data transformation task in Power Query. Missing values can distort analysis and interpretation, leading to inconsistencies and errors in calculations or aggregations. Power Query makes it easy to fill those gaps, which improves the quality and usability of the data.
That’s right! Removing duplicates is another common data transformation in Power Query. Duplicate entries can skew analysis results, so they need to be identified and removed to ensure accurate and reliable data-driven insights.
MODULE QUIZ: THE RIGHT TOOLS FOR THE JOB
1. What is the purpose of the transform stage in the ETL process?
- Removing duplicates and refining the data. (CORRECT)
- Combining different data sources.
- Loading transformed data into a data warehouse.
- Retrieving raw data from different sources.
Correct! The transform stage is primarily focused on refining the data, which includes operations like removing duplicates, converting data types, and handling missing values.
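The three operations named above can be sketched in plain Python. This is an illustrative example rather than how Power Query or any particular ETL tool implements them: the column names are hypothetical, and filling a missing quantity with 0 is an assumption that would not suit every dataset.

```python
def transform(records):
    """Remove exact duplicates, convert data types, and fill missing quantities with 0."""
    seen = set()
    cleaned = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row: skip it
        seen.add(key)
        quantity_text = record["Quantity"].strip()
        cleaned.append({
            "ProductID": int(record["ProductID"]),  # text -> whole number
            "Quantity": int(quantity_text) if quantity_text else 0,  # fill missing value
        })
    return cleaned


# Hypothetical raw rows, all values still stored as text.
raw = [
    {"ProductID": "35", "Quantity": "2"},
    {"ProductID": "35", "Quantity": "2"},
    {"ProductID": "36", "Quantity": ""},
]
transformed = transform(raw)
print(transformed)
```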
2. Which of the following tasks are related to data ingestion in the ETL process? Select all that apply:
- Obtaining data from various sources. (CORRECT)
- Cleaning and formatting data for analysis.
- Importing data for immediate use or storage in a database. (CORRECT)
- Loading data into a target database or data warehouse.
Data ingestion means obtaining data from various sources.
Data ingestion also covers importing that data for immediate use or for storage in a database.
3. What is a crucial factor to consider when estimating storage capacity for data storage?
- The number of employees who analyze data.
- The size of your organization. (CORRECT)
- The operating system used
- The time of day data is most frequently accessed
Correct! The size of your organization is an important factor when determining how much storage capacity you need.
4. In the context of data analysis, which of the following statements are true about data cleaning and data transformation? Select all that apply.
- Data cleaning involves identifying and correcting errors and inconsistencies in datasets. (CORRECT)
- Data transformation is an ongoing process, while data cleaning is a one-time process.
- Data transformation involves altering the data’s structure, format, or values to make it more suitable for analysis. (CORRECT)
- Data cleaning is only done at the source.
Data cleaning is the process of identifying and correcting errors and inconsistencies in datasets, such as inaccurately entered values. It includes removing duplicates, filling in missing values, and correcting data types.
Data transformation means modifying the structure, format, or values of data to make it more suitable for analysis.
5. Which Excel feature can help you quickly spot errors, outliers, or patterns in your data by applying different formats to cells based on specific conditions?
- Data validation
- VLOOKUP()
- Conditional formatting (CORRECT)
- TRIM()
Correct! Conditional formatting enables you to apply different formats (such as colors, fonts, or icons) to cells based on specific conditions, helping you quickly spot errors, outliers, or patterns in your data.
6. True or False: Microsoft Power Query’s main purpose in the Microsoft Power BI suite is to generate data visualizations automatically.
- True
- False (CORRECT)
Correct! Within the Microsoft Power BI suite, the main role of Microsoft Power Query is to connect to various data sources and to clean, transform, and load that data into the Power BI data model.
7. True or False: Having too many visualizations is a common issue in raw data that can hinder accurate analysis.
- True
- False (CORRECT)
Correct! Having too many visualizations is an issue of report presentation. It is not an issue with the quality or structure of the raw data itself, which is what can hinder accurate analysis.
8. What is the main goal of identifying and evaluating required data for a business decision?
- To gather as much data as possible.
- To avoid using external data sources.
- To understand the factors that influence the decision and collect relevant data. (CORRECT)
- To focus only on structured data sources.
Correct! Identifying and evaluating required data helps you understand the factors that influence the decision and collect the most relevant data for the analysis.
9. What type of data source is an Enterprise Resource Planning (ERP) system classified as?
- Unstructured data
- Structured data (CORRECT)
- Semi-Structured data
- Streaming data
Correct! ERP systems are structured, rule-based sources of data, which makes them well suited to reporting and analysis.
10. In a typical Microsoft Power BI workflow, what is the primary purpose of Microsoft Power BI Desktop?
- To view and interact with reports.
- To assign user permissions.
- To design and create reports. (CORRECT)
- To share dashboards with other users.
Correct! Microsoft Power BI Desktop is primarily used by data analysts and report designers to clean, transform, and load data, create data models, design reports, and publish those reports.
11. In the context of the ETL process, what does the term data ingestion primarily refer to?
- Obtaining and importing data from various sources. (CORRECT)
- Converting raw data into insights.
- Loading transformed data into a data warehouse.
- Cleaning and formatting data.
Correct! In the context of ETL, data ingestion means obtaining data from various sources and importing it, either for immediate use or for storage in a database for later use.
12. When estimating storage capacity for data storage, which factor is the least relevant?
- The number of departments within the organization.
- How long you need to store the data.
- The time of day data is most frequently accessed (CORRECT)
- The size of your organization.
Correct! The time of day at which data is accessed is a system performance consideration. Even if it affects how quickly system-intensive tasks run, it does not directly influence the amount of storage capacity required.
13. What task is primarily involved in the process of data cleaning in the context of data analysis?
- Transforming the data to be more suitable for Microsoft Power BI.
- Removing duplicate entries from datasets.
- Identifying and correcting errors and inconsistencies in datasets. (CORRECT)
- Altering the structure, format, or values of the data.
Correct! Data cleaning includes identification and correction of errors and inconsistencies in data, while data transformation focuses on modification of the structure, format, or values of data so that the data is more suitable for analysis.
14. Which of the following are main purposes of Microsoft Power Query in the Microsoft Power BI suite? Select all that apply.
- To clean and transform data. (CORRECT)
- To load data into Microsoft Power BI data models. (CORRECT)
- To connect to multiple data sources. (CORRECT)
- To create real-time dashboards.
Power Query provides a range of advanced data cleaning and transformation features to prepare data for analysis and visualization.
Power Query loads the cleaned, well-organized data into Power BI data models, ready for advanced analytics and visualization.
Power Query connects to many different data sources in order to populate data models in Power BI.
15. Which of the following are main goals of identifying and evaluating required data for a business decision? Select all that apply:
- To understand the factors that influence the decision and collect relevant data. (CORRECT)
- To gather as much data as possible.
- To consider both internal and external data sources. (CORRECT)
- To consider all types of data sources, including structured, semi-structured, and unstructured data. (CORRECT)
Identifying and evaluating required data reveals the factors affecting the decision and ensures that suitable data is collected for analysis.
Identifying and evaluating required data means considering both internal and external data sources to reveal as much as possible about the influences on a decision.
Identifying and evaluating required data involves exploring all types of data sources to ensure a comprehensive understanding of the factors affecting the decision.
16. Which of the following data sources are classified as structured data sources? Select all that apply.
- Log files
- Relational databases (CORRECT)
- Messages
- Enterprise Resource Planning (ERP) system (CORRECT)
Relational databases are organized and rule-based, so they are structured data sources.
ERP systems are also structured data sources, because they are organized and rule-based.
17. Which of the following tasks are performed during the transform stage of the ETL process? Select all that apply:
- Loading transformed data into a data warehouse.
- Converting data types. (CORRECT)
- Handling missing values. (CORRECT)
- Retrieving raw data from different sources.
Correct! Converting data types is one of the tasks performed during the transform stage of the ETL process.
Correct! Handling missing values is one of the tasks performed during the transform stage of the ETL process.
18. Which of the following factors should be considered when estimating storage capacity for data storage? Select all that apply.
- The size of your organization. (CORRECT)
- The color of the storage devices.
- How long you need to store the data. (CORRECT)
- The type of data you collect. (CORRECT)
The size of your organization affects how much data is generated, and therefore how much storage capacity is needed.
How long the company needs to keep its data is a significant question to ask when planning capacity.
The type of data you collect, whether website visits, customer orders, or conversations, affects the volume of storage needed.
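A back-of-the-envelope calculation shows how these three factors combine. The figures below are hypothetical and chosen only to make the arithmetic easy to follow: organization size is approximated by records per day, data type by bytes per record, and retention by the number of days kept. Real estimates would also need to allow for backups, indexes, and growth.

```python
def estimate_storage_gb(records_per_day, bytes_per_record, retention_days):
    """Rough capacity estimate: daily volume times retention period, in decimal gigabytes."""
    total_bytes = records_per_day * bytes_per_record * retention_days
    return total_bytes / 1_000_000_000


# Hypothetical inputs: 50,000 records a day, 2,000 bytes each, kept for one year.
estimate = estimate_storage_gb(
    records_per_day=50_000, bytes_per_record=2_000, retention_days=365
)
print(estimate)  # 36.5
```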
19. Which of the following Microsoft Excel features can help you quickly spot errors, outliers, or patterns in your data by applying different formats to cells based on specific conditions?
- Data validation
- TRIM()
- VLOOKUP()
- Conditional formatting (CORRECT)
20. Which of the following are common issues in raw data that can hinder accurate analysis? Select all that apply:
- Too few data points. (CORRECT)
- Too many visualizations.
- Incompatible data sources. (CORRECT)
- Missing values. (CORRECT)
Having too few data points limits the analysis and reduces its accuracy.
Incompatible data sources cause extraction and integration problems, because data that comes from various sources often differs in format and structure.
Missing values make raw data less precise and less useful for drawing inferences.
21. True or False: The main goal of identifying and evaluating required data for a business decision is to gather as much data as possible.
- True
- False (CORRECT)
22. True or False: An Enterprise Resource Planning (ERP) system is classified as an Unstructured data source.
- True
- False (CORRECT)
23. Which of the following are primary purposes of Microsoft Power BI Desktop? Select all that apply.
- To clean, transform, and load data. (CORRECT)
- To view and interact with reports.
- To design and create reports. (CORRECT)
- To assign user permissions.
Microsoft Power BI Desktop is used to clean, transform, and load data, as well as to create data models and design reports.
Microsoft Power BI Desktop is primarily used by data analysts and report designers to clean, transform, and load data, create data models, design reports, and publish those reports.
24. What is a common issue in raw data that can hinder accurate analysis?
- Too few data points.
- Too many visualizations.
- Incompatible data sources.
- Missing values. (CORRECT)
25. Which stage of the ETL process is responsible for loading the transformed data into a data warehouse or another storage system for analysis?
- Load (CORRECT)
- Extract
- Visualize
- Transform
Correct! The load stage in the ETL process is about loading the transformed data into a data warehouse or another storage system where it can be accessed and analyzed by various tools like Power BI.
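A minimal sketch of the load stage, using an in-memory SQLite database as a stand-in for a data warehouse. The table name, the column names, and the sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical output of a transform step: (product_id, quantity) pairs.
transformed_rows = [(35, 2), (36, 0)]

connection = sqlite3.connect(":memory:")  # stand-in for the target storage system
connection.execute("CREATE TABLE sales (product_id INTEGER, quantity INTEGER)")
connection.executemany(
    "INSERT INTO sales (product_id, quantity) VALUES (?, ?)", transformed_rows
)
connection.commit()

# Once loaded, the data can be queried by analysis tools.
row_count = connection.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total_quantity = connection.execute("SELECT SUM(quantity) FROM sales").fetchone()[0]
connection.close()
print(row_count, total_quantity)
```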
26. What is the primary difference between data cleaning and data transformation in the context of data analysis?
- Data cleaning focuses on removing duplicate entries, while data transformation focuses on altering data structure.
- Data cleaning is a one-time process, while data transformation is an ongoing process.
- Data cleaning is only done at the source, while data transformation is only done in Microsoft Power BI.
- Data cleaning involves identifying and correcting errors and inconsistencies in datasets, while data transformation involves altering the data’s structure, format, or values. (CORRECT)
Correct! Data cleaning involves identifying and correcting errors and inconsistencies in datasets, whereas data transformation refers to changes in the data’s structure, format, or values so that it can be used for more intricate types of analysis.
27. True or False: The primary purpose of Microsoft Power BI Desktop is to share dashboards with other users.
- True
- False (CORRECT)
Correct! Microsoft Power BI Desktop primarily serves the needs of data cleaning, transformation, modeling, report design, and report publishing. Sharing dashboards is handled by the Power BI Service.
28. What is the primary purpose of data ingestion in the ETL process?
- To clean and format data for analysis.
- To load data into a target database or data warehouse.
- To obtain and import data from various sources for immediate use or storage in a database. (CORRECT)
- To analyze data and extract insights.
Correct! Data ingestion obtains the necessary data from a large number of heterogeneous source systems and imports it for immediate use or storage. The later ETL stages then transform and integrate that data into a format that is much more usable from a business intelligence (BI) point of view, and load it into the data warehouse.
29. Which Microsoft Excel feature enables setting criteria for allowable data in a cell or range of cells?
- VLOOKUP()
- Data validation (CORRECT)
- TRIM()
- Conditional formatting
Correct! Data validation sets criteria for the data that can be entered in a cell or range of cells, so that the data stays as free from error as possible and conforms to the stipulated criteria.
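The idea behind a validation rule can also be shown outside Excel. The sketch below is a Python analogy, not Excel's own mechanism: the `validate_feedback_score` helper and the allowed range of 1 to 5 are assumptions made for illustration.

```python
def validate_feedback_score(value, minimum=1.0, maximum=5.0):
    """Return True when `value` is a number inside the allowed range."""
    try:
        score = float(value)
    except (TypeError, ValueError):
        return False  # not a number at all
    return minimum <= score <= maximum


# Hypothetical entries typed into a feedback score column.
entries = ["3.5", "7", "abc", "1"]
rejected = [entry for entry in entries if not validate_feedback_score(entry)]
print(rejected)  # ['7', 'abc']
```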
CONCLUSION – The right tools for the job
In wrapping up the module, students are provided with a strong basis in fundamental data management practices. They are taken through all the steps, beginning with the identification and collection of valuable data sources, continuing through an understanding of the ETL process, and ending with accurate assessment of data ahead of analysis. By the end, students have learned not only how to keep data in order and manage it effectively, but also how to keep its quality and reliability in mind so that their analysis produces something meaningful. This knowledge should allow them to navigate large volumes of data with confidence and carry out analysis that leads to informed decision-making.