INTRODUCTION – Dynamic Database Design
The main objective of this advanced course is to deepen learners’ knowledge of database systems, covering core subjects such as data marts, data lakes, data warehouses, and ETL processes. While working through some of the complications these systems present, learners will also see how each one contributes to the shared infrastructure that stores and retrieves data for an organization’s business intelligence. The course is designed to move learners beyond a basic understanding and toward a strategic view of how data structures can be used to optimize performance and support more efficient decision making.
At the heart of this course are the five factors that affect database performance: workload, throughput, resources, optimization, and contention. Participants will learn to evaluate, troubleshoot, and optimize database systems by taking these factors into consideration. BI professionals who want to make their data infrastructure more efficient and responsive need to understand how these factors work together.
Learning Outcomes:
- Explore the various ways in which one can build an ETL process that will be efficient and fit the organizational stakeholders’ needs.
- Understand different data storage and extraction processes and tools (for example Extract: Stitch/Segment/Fivetran, Transform: DBT/Airflow/Looker).
- Explain how processes can be optimized when creating new tables.
- Identify places where new tables can be introduced in the data pipeline.
- Understand the varied aspects of different databases, for example OLAP vs. OLTP, columnar vs. relational, and distributed vs. single-homed databases.
- Understand database performance and optimization.
- Understand the five determinants of database performance: workload, throughput, resources, optimization, and contention.
PRACTICE QUIZ: DATABASE PERFORMANCE
1. Fill in the blank: A data mart is a _____ database that can be a subset of a larger data warehouse. This means it is a convenient way to access the data pertaining to specific areas or departments of a business.
- Specialized
- Categorical
- Subject-oriented (CORRECT)
- Departmental
Correct: A data mart is a subject-oriented database that can be a subset of a larger data warehouse. It gives specific departments or business areas focused access to their data, so teams can retrieve and analyze that information without navigating the whole data warehouse.
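As a minimal sketch of this idea, a data mart can be modeled as a subject-specific subset of a warehouse table. The table name `warehouse_sales`, the view name `marketing_mart`, and the sample rows are assumptions made up for illustration, and Python's built-in `sqlite3` module stands in for a real warehouse platform.

```python
import sqlite3

# In-memory database standing in for a data warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (id INTEGER PRIMARY KEY, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales (department, amount) VALUES (?, ?)",
    [("marketing", 120.0), ("finance", 300.0), ("marketing", 80.0)],
)

# Hypothetical data mart: only the rows relevant to the marketing team.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT id, amount FROM warehouse_sales WHERE department = 'marketing'"
)

mart_rows = conn.execute("SELECT id, amount FROM marketing_mart ORDER BY id").fetchall()
print(mart_rows)
```

The marketing team queries `marketing_mart` and never has to filter the full warehouse table itself.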
2. A business intelligence team manager wants to support their team’s ability to perform at a high level. They investigate the overall capability of their company’s database hardware and software tools to enable the team to process stakeholder requests. In this situation, which of the factors of database performance do they consider?
- Workload
- Resources
- Throughput (CORRECT)
- Optimization
Correct: Throughput is the overall capability of the database’s hardware and software to process requests. It reflects how much work the system can handle over a period of time, and therefore how effectively the database processes operations and queries.
3. What term is used to describe data that is broken up into many pieces that are not stored together?
- Split data
- Modified data
- Archived data
- Fragmented data (CORRECT)
Correct: Fragmentation generally occurs when data is used frequently, when new data files are created, or when existing data files are modified or deleted. These events scatter data across the storage system in non-contiguous blocks, which makes access less efficient and hurts performance.
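To make "non-contiguous blocks" concrete, the sketch below counts how many separate pieces a file occupies, given the storage block numbers it uses. The function `count_fragments` and the block numbers are hypothetical and are not taken from the course or from any real storage engine.

```python
def count_fragments(block_numbers):
    """Count runs of contiguous block numbers in the order the blocks are stored."""
    if not block_numbers:
        return 0
    fragments = 1
    for previous, current in zip(block_numbers, block_numbers[1:]):
        # A new fragment starts whenever the next block is not adjacent.
        if current != previous + 1:
            fragments += 1
    return fragments

contiguous_file = [10, 11, 12, 13]      # stored together
fragmented_file = [10, 11, 40, 41, 7]   # scattered after edits and deletions

print(count_fragments(contiguous_file))
print(count_fragments(fragmented_file))
```

The more fragments a file has, the more separate locations the system must visit to read it, which is why fragmentation slows access.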
QUIZ: MODULE 2 CHALLENGE
1. Which of the following statements accurately describe data marts and data lakes? Select all that apply.
- Data lakes are subject-oriented, which means they are associated with specific areas or departments of a business.
- Data marts are designed to enable information accessibility because their data doesn’t require a lot of processing.
- Data lakes are designed to enable information accessibility because their data doesn’t require a lot of processing. (CORRECT)
- Data marts are subject-oriented, which means they are associated with specific areas or departments of a business. (CORRECT)
2. Fill in the blank: A business intelligence professional gathers data, loads it into a unified destination system, and then transforms it into a useful format. They do this using an _____ data pipeline.
- oriented
- ELT (CORRECT)
- Interpreted
- ETL
3. What is a measure of the workload that can be processed by a database, as well as the associated costs?
- Scalability
- Maturity
- Database performance (CORRECT)
- Distribution
4. A business intelligence professional is considering the transactions, queries, analyses, and system commands being processed by a database system. Which of the five factors of database performance are they evaluating?
- Workload (CORRECT)
- Throughput
- Optimization
- Contention
5. Which of the following statements accurately describe database resources? Select all that apply.
- Resources may not be shared with other users.
- Resources include disk space and memory. (CORRECT)
- Resources include hardware and software tools. (CORRECT)
- Resources can be both internal and external. (CORRECT)
6. Optimization involves decreasing _____, which is how long it takes for a database to respond to a user request.
- Scope
- Data view
- Contention
- Response time (CORRECT)
7. Fill in the blank: In a relational database system that uses SQL, a _____ describes how the database system will execute a query.
- query plan (CORRECT)
- run method
- HOW statement
- data limitation
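A query plan like the one described in question 7 can be inspected directly in many SQL systems. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` statement through Python's built-in `sqlite3` module. The `orders` table, the index name, and the data are assumptions made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, float(i)) for i in range(1000)],
)

query = "SELECT total FROM orders WHERE customer_id = 7"

def plan_details(connection, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]

# Without an index, the plan is expected to describe a scan of the whole table.
plan_before = plan_details(conn, query)

# Adding an index gives the planner a faster access path for this filter.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan_details(conn, query)

print(plan_before)
print(plan_after)
```

Comparing the two plans shows how a structural change, here a new index, alters the steps the database takes to answer the same query.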
8. Fill in the blank: A business intelligence team uses _____ to divide their cloud database system into logical parts. This helps improve query processing and manageability.
- the SPLIT function
- data partitioning (CORRECT)
- database migration
- metadata
9. Fragmented data is broken up into many pieces that are not stored together. What are some common reasons for this fragmentation? Select all that apply.
- Using the data infrequently
- Modifying data files (CORRECT)
- Deleting data files (CORRECT)
- Creating new data files (CORRECT)
10. When two or more data analysts attempt to use a single data resource in a conflicting way, what is the result?
- Redundancy
- Duplicates
- Contention (CORRECT)
- Argument
11. What business intelligence tool enables data to be gathered from different sources, then loaded into a unified destination system and transformed into a useful format?
- Data lake
- ELT (CORRECT)
- ETL
- Data mart
12. Fill in the blank: Database performance is a measure of the workload that can be _____ by a database, as well as the associated costs.
- Measured
- Processed (CORRECT)
- stored
- visualized
13. Which of the following statements accurately describes workload with regard to database performance?
- Workload involves maximizing the speed and efficiency with which data is retrieved in order to ensure high levels of database performance.
- Workload is the combination of transactions, queries, analysis, and system commands being processed by the database system at any given time. (CORRECT)
- Workload is the overall capability of the database’s hardware and software to process requests.
- Workload involves two or more components attempting to use a single resource in a conflicting way.
14. Which of the following statements accurately describe database resources? Select all that apply.
- Resources do not fluctuate.
- Only internal factors affect resource performance.
- External factors can affect resource performance. (CORRECT)
- Resources can fluctuate. (CORRECT)
15. A business intelligence team is optimizing the performance of their database. What does this involve? Select all that apply.
- Evaluating the effectiveness of the team’s spreadsheets
- Examining resource use (CORRECT)
- Identifying better data sources and structures (CORRECT)
- Comparing workload to cost (CORRECT)
16. Fill in the blank: A query plan describes the _____ involved with executing a query by a relational database.
- spreadsheets
- reasoning
- steps (CORRECT)
- business strategy
17. A business intelligence team can cause _____ when two or more data analysts attempt to use a single data resource in a conflicting way.
- annotation
- verification
- contention (CORRECT)
- repetition
18. What does database performance measure? Select all that apply.
- Improvements made to data tools and processes
- The ability of the database to be reconfigured
- Any costs associated with the workload being processed by the database (CORRECT)
- The workload that can be processed by the database (CORRECT)
19. Which of the following statements accurately describe indexes versus data partitions? Select all that apply.
- Indexes can only locate one section of a table at a time.
- Data partitioning is typically used in cloud-based systems handling big data. (CORRECT)
- Data partitioning is the process of dividing a database into distinct, logical parts. (CORRECT)
- Indexes are organizational tags used to locate data. (CORRECT)
20. Fill in the blank: Fragmented data occurs when data is broken up into many pieces that are not _____, often as a result of using the data frequently.
- Structured
- sorted and filtered
- stored together (CORRECT)
- labeled
21. There are four main reasons why data becomes fragmented. The first is using the data files frequently. What are the other three?
- Clearing data files from the cache
- Deleting data files (CORRECT)
- Modifying the data files (CORRECT)
- Creating new data files (CORRECT)
22. Which of the following statements accurately describe data marts and data lakes? Select all that apply.
- A data mart is a database system that stores large amounts of raw data in its original format until it’s needed.
- A data lake is a subject-oriented database that can be a subset of a larger data warehouse.
- A data mart is a subject-oriented database that can be a subset of a larger data warehouse. (CORRECT)
- A data lake is a database system that stores large amounts of raw data in its original format until it’s needed. (CORRECT)
23. A business intelligence professional is investigating the steps their database system takes in order to execute a query. They discover that creating a new table will enhance performance. What does this scenario describe?
- Limiting data
- Evaluating contentions
- Checking a query plan (CORRECT)
- Considering run methodology
24. Fill in the blank: Contention occurs when two or more data analysts attempt to use a _____ in a conflicting way.
- section of a spreadsheet
- series of reports
- single data resource (CORRECT)
- data strategy
25. When evaluating a database system’s resources, what does a business intelligence professional consider? Select all that apply.
- Users
- Disk space and memory (CORRECT)
- Software (CORRECT)
- Hardware (CORRECT)
26. When evaluating the workload of a database system, what does a business intelligence professional consider? Select all that apply.
- Context
- Queries and analyses (CORRECT)
- System commands (CORRECT)
- Transactions (CORRECT)
27. What are some key benefits of ELT data pipelines in business intelligence?
- ELT enables business intelligence professionals to transform data while it is being transported.
- ELT reduces storage costs and enables businesses to scale storage and computation resources independently. (CORRECT)
- ELT enables business intelligence professionals to transform only the data they need. (CORRECT)
- ELT can ingest many different kinds of data into a storage system as soon as that data is available. (CORRECT)
28. Fill in the blank: The goal of _____ is to enable a database system to process the largest possible workload at the most reasonable cost.
- visibility
- optimization (CORRECT)
- business intelligence strategy
- application development
29. Fill in the blank: A data lake is a database system that stores large amounts of _____ in its original format until it’s needed.
- live data
- structured data
- clean data
- raw data (CORRECT)
Correct: A data lake is a database system that stores large amounts of raw data in its original, unprocessed format until it is needed for analysis. Unlike systems that store data in a structured manner, a data lake keeps data free-form, although the data can be tagged so it can be identified later. A data lake can therefore hold structured, semi-structured, and unstructured data, which makes it suitable for advanced use cases.
30. What is the term for a pipeline that extracts, loads, then transforms the data?
- Warehouse
- ETL
- Lineage
- ELT (CORRECT)
Correct: Extract, load, transform (ELT) is a type of data pipeline in which data extracted from source systems is loaded into a target system, such as a data warehouse, before any transformations are performed. It gathers data from different sources, loads it into a unified destination system, and then transforms it into a useful format.
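The extract, load, then transform order can be sketched in a few lines. This is only an illustration under stated assumptions: the sample rows, the table names `raw_orders` and `clean_orders`, and the use of Python's built-in `sqlite3` module as the destination are all made up for the example. Production ELT pipelines typically rely on dedicated tools.

```python
import sqlite3

# Extract: rows as they might arrive from a source system (made-up sample data).
source_rows = [("2024-01-05", " Alice ", "19.99"), ("2024-01-06", "bob", "5.00")]

# Load: store the raw values untouched in the destination first.
destination = sqlite3.connect(":memory:")
destination.execute("CREATE TABLE raw_orders (order_date TEXT, customer TEXT, amount TEXT)")
destination.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", source_rows)

# Transform: clean and type the data inside the destination, after loading.
destination.execute(
    "CREATE TABLE clean_orders AS "
    "SELECT order_date, LOWER(TRIM(customer)) AS customer, CAST(amount AS REAL) AS amount "
    "FROM raw_orders"
)

clean_rows = destination.execute(
    "SELECT order_date, customer, amount FROM clean_orders ORDER BY order_date"
).fetchall()
print(clean_rows)
```

In an ETL pipeline, by contrast, the cleaning step would run before the insert, so the destination would never hold the raw values.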
31. A database is performing slowly because multiple components are attempting to use the same piece of data at the same time. Which of the factors of database performance should be addressed?
- Contention (CORRECT)
- Throughput
- Workload
- Resources
Correct: The factor to address is contention. Contention occurs when two or more components attempt to use a single resource in a conflicting way.
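Contention can be reproduced on a small scale with two connections that both want to write to the same database file. The sketch below is an illustration only: the connection names `analyst_a` and `analyst_b`, the `report` table, and the choice of SQLite (via Python's built-in `sqlite3` module) are assumptions, and other database systems handle write conflicts differently.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "shared.db")

    # isolation_level=None lets the script issue BEGIN and COMMIT explicitly.
    # timeout=0 makes analyst_b fail immediately instead of waiting for the lock.
    analyst_a = sqlite3.connect(path, isolation_level=None)
    analyst_b = sqlite3.connect(path, timeout=0, isolation_level=None)
    analyst_a.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, note TEXT)")

    # Analyst A starts a write transaction and holds the write lock.
    analyst_a.execute("BEGIN IMMEDIATE")
    analyst_a.execute("INSERT INTO report (note) VALUES ('draft from A')")

    # Analyst B asks for the same write lock while A still holds it.
    try:
        analyst_b.execute("BEGIN IMMEDIATE")
        contention_detected = False
    except sqlite3.OperationalError:
        contention_detected = True  # SQLite reports that the database is locked

    # Once A commits, the resource is free and B can proceed.
    analyst_a.execute("COMMIT")
    analyst_b.execute("BEGIN IMMEDIATE")
    analyst_b.execute("INSERT INTO report (note) VALUES ('draft from B')")
    analyst_b.execute("COMMIT")
    row_count = analyst_b.execute("SELECT COUNT(*) FROM report").fetchone()[0]

    analyst_a.close()
    analyst_b.close()

print(contention_detected, row_count)
```

The second writer is blocked only while the first holds the lock, which is why long-running transactions on shared data tend to slow everyone else down.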
32. What is the process of dividing a database into distinct, logical parts in order to improve query processing and increase manageability?
- Data fragmentation
- Data processing
- Data partitioning (CORRECT)
- Data indexing
Correct: Data partitioning is the process of dividing a database into distinct, logical parts in order to improve query processing and increase manageability. Partitioning data appropriately is one way to enhance database performance.
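The benefit of partitioning is that a query can read only the part of the data it needs. The sketch below shows the idea in plain Python, partitioning made-up order rows by year. The function names and data are hypothetical, and real cloud databases implement partitioning inside the storage engine, not in application code.

```python
from collections import defaultdict

# Made-up rows: (order_id, year, amount).
orders = [(1, 2022, 50.0), (2, 2023, 75.0), (3, 2023, 20.0), (4, 2024, 10.0)]

def partition_by_year(rows):
    """Group rows into logical partitions keyed by year."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[1]].append(row)
    return dict(partitions)

def total_for_year(partitions, year):
    """Answer a query by reading only the partition for the requested year."""
    return sum(amount for _, _, amount in partitions.get(year, []))

partitions = partition_by_year(orders)
print(sorted(partitions))
print(total_for_year(partitions, 2023))
```

A query for 2023 touches two rows, not four. On large tables, the same principle lets the database skip most of the data.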
CONCLUSION – Dynamic Database Design
Working through this advanced course gives learners a deeper understanding of database systems. It introduces the concepts of data marts, data lakes, data warehouses, and ETL processes that underpin today’s business intelligence frameworks. By the end, learners will appreciate how these structures fit together into a rich, dynamic, and sound data ecosystem.
The course also examines the five factors that affect database performance: workload, throughput, resources, optimization, and contention. This focus helps learners develop the conceptual skills needed to analyze and improve their data infrastructure. Armed with these capabilities, BI practitioners can help ensure that their systems meet or exceed expected performance.
Learners also benefit from learning to write efficient queries that meet organizational objectives while using system resources optimally. This prepares them to take a leading role in advanced database management and equips them to develop and present innovative, data-driven solutions in the fast-changing field of business intelligence.