
Who we are and why we created the course, and a first look at what you'll get out of the course.
A detailed overview of what the course will cover, and how to follow along with each module to get the most out of the course.
Data-driven decision making requires integrated data from around your enterprise, and even from external data providers. Data warehousing helps integrate the data to better focus the majority of your work on analysis and decision making rather than repeatedly gathering, consolidating, and cleansing data. Data warehousing is here to stay, and an understanding of this discipline is essential to data-driven decision making.
Brief introduction to and overview of what you will learn in this section.
Data warehouses have been around for a while, but the entire discipline continues to evolve. This first content video sets the stage for all that follow by clearly defining and describing what a data warehouse is.
Data warehouses provide a place to integrate and consolidate data from many applications and systems, both within the enterprise and from the outside. Once this integration is accomplished in an architecturally sound manner, business intelligence and analytics become "easier" through the consolidated and organized data.
In some ways, a data lake is a modern successor to a data warehouse; yet the two disciplines co-exist in most organizations. The similarities and differences are clearly explained.
Data virtualization is a long-standing "offshoot" of data warehousing that plays an important role in many organizations for certain use cases. The two related disciplines are compared side-by-side, including guidance on situations for which each is best suited.
Seeing a simple, end-to-end data warehouse high-level architecture sets the stage for the sections that follow. This video brings together key points from this section as the gateway to what follows.
Pull together key concepts from this foundational section before moving on.
Brief introduction to and overview of what you will learn in this section.
The most straightforward DW architecture is with a single, centralized, monolithic database (e.g., a single SQL Server database) serving as the DW. While straightforward, a centralized DW faces some challenges.
The concept of a data warehouse versus a data mart can be confusing because no official definitions exist, and the demarcation can be fluid. Further data marts are often built as part of an overall data warehousing environment rather than a centralized data warehouse. The terms are clarified and guidance presented.
Various data mart-based component architectures (versus a centralized DW) can be built, with data marts placed in front of and/or behind a data warehouse...or possibly used in place of a data warehouse. The various options are presented along with tradeoffs and guidance.
In addition to a relational database foundation for data warehousing, we often use "cubes" or specialized multi-dimensional databases for part of our overall environment. The basics are presented and discussed in concert with relational database components.
An operational data store (ODS) is a specialized data warehouse dedicated to operational business intelligence. Some of the rules and guidelines for an ODS are different than a DW, and these are described and built into business use cases.
"Peeling the lid back" on a data warehouse (or data mart) we find different sections or layers, including a "staging layer" where data first lands upon entering the DW environment. We describe the staging layer and various options we have for our architecture.
Details of persistent vs. non-persistent staging layers - specific examples, tradeoffs.
Pull together key concepts of data warehousing architecture before moving on.
Brief introduction to and overview of what you will learn in this section.
Extraction, transformation, and loading (ETL) is the primary mechanism to bring data into the DW...but for big data environments (e.g., data lakes), ELT is the dominant paradigm. The two are described to help ensure that the viewer is clear on the mechanics of each for data warehouses and data lakes, respectively.
The initial "stocking" of a data warehouse is important because this is where we first load data for BI and analytical usage. We need to decide which data to bring in; why; and how.
After the initial ETL, we regularly update and refresh the DW through incremental ETL. We have various models or patterns that we can use, and each is described using examples.
Within ETL, the "T" - transformation - is where data from multiple source applications and systems becomes unified to be able to be used together for BI and analytics.
The most common transformation models are described with examples.
Incremental ETL is typically implemented in a mix-and-match manner, with different intervals (e.g., hourly/daily/weekly) for different sources and even different data within a source, along with different ETL patterns for the various sources.
Pull together key concepts of ETL before moving on.
Brief introduction to and overview of what you will learn in this section.
Traditionally, data warehouses are used for dimensional ly analyzing data, and dimensional analysis remains the backbone of business intelligence. Increasingly, data warehouses also support various data mining capabilities such as predictive analytics. Determining how a DW will be used is critical to deciding how to structure its data.
Dimensional analysis presents users with key measurements that are "sliced and diced" by various dimensions. These key concepts are critical to being able to effectively and correctly design our data warehouse's contents.
A fact is not the same as a fact table, nor is a dimension the same as a dimension table. All four of these key concepts and terms are clearly explained.
Some facts are "additive" meaning that they can be added, while others are either non-additive or semi-additive. The three classes of additivity are discussed in the context of why each is important for various aspects of our data warehousing data design.
The backbone of dimensional analysis is the star schema, but sometimes we extend the star structure into a "snowflake" by expanding our dimensions into multiple tables along a given hierarchy. All of these options are clearly explained with detailed examples.
Data warehouse designers and data modelers have several options with how to link data among fact tables and dimension tables: using either surrogate keys or natural keys. Even if one selects surrogate keys (a best practice), natural keys play an important role in overall data design. Both concepts are introduced and explained in detail.
Pull together key concepts about data warehousing design fundamentals before moving on.
Brief introduction to and overview of what you will learn in this section.
Dimension tables provide the context for key data warehouse subject areas: customer, product, vendor, employee, location, etc. In a star schema, we design dimension tables according to a specific set of rules; while in a snowflake schema, we follow a different set of rules. Examples for both will guide the presentation and discussion.
Each of the four main types of fact tables is introduced, along with a clear description of each type's role. This video leads into the four that follow, in which each of the four are described.
The transaction fact table is the "backbone" of the star schema. Understanding the structural aspects of transaction fact tables is essential to building dimensional models.
While relatively simple to understand, several design rules and best practices govern how we relate our facts (measurements) to our fact tables.
The basic concept of a database key is easy to follow, but data warehousing best practices impose rules and constraints over how we use different forms of keys, and when.
The periodic snapshot fact table can be an easier-to-access summarization of contents from a transaction fact table, or it can represent snapshots of data that otherwise cannot be captured in transactions. Both uses of periodic snapshot fact tables are demonstrated.
Periodic snapshot fact tables are built around a unique form of fact, called a semi-additive fact. Understanding both the structure of these types of tables, as well as the rules that govern what we can and can't do arithmetically with those special facts, is critical to building out our dimensional models.
A lesser-used but still important type of fact table tracks and measures progress along a given business process.
Clear examples demonstrate when an accumulating snapshot fact table fits into a data warehouse's design.
Though a "factless fact table" sounds like an oxymoron, because a fact is not the same as a fact table, we can actually build fact tables for several specific uses in which we have no facts (i.e., measurements). Clear examples demonstrate the usage of factless fact tables.
Whereas dimension tables are structured very differently in a star schema versus a snowflake schema, we structure our fact tables the same regardless of star versus snowflake. Examples of fact tables in both schemas are shown side-by-side to highlight this point.
Examples of SQL are presented for various types of dimension and fact tables.
Pull together key concepts of fact and dimension tables before moving on to additional design and structural topics about each.
Brief introduction to and overview of what you will learn in this section.
The somewhat mystically named "slowly changing dimension" is the mechanism through which a data warehouse manages its historical data. The basic concept is introduced as a lead-in for the other videos in this section.
The "Type 1 SCD" overwrites existing data with new incoming data, and essentially does not preserve history. We have certain use cases where the Type 1 SCD is particularly applicable and we'll look at examples.
The "Type 2 SCD" is the "backbone" of managing and maintaining history within a data warehouse. Though the basics are easy to understand, significant complications occur which requires careful usage of surrogate and natural keys. Clear examples linking fact tables, dimension tables, surrogate and natural keys, and BI/reporting usage of Type 2 data present the critical concepts.
An extension to the basic Type 2 SCD model helps us maintain and be able to report and analyze the correct historical sequence of our data. Two different approaches can be used, and both are discussed along with tradeoffs...and a hybrid model that makes use of both techniques together.
The "Type 3 SCD" is a lesser-used but still valuable way to represent limited history through a switching-back-and-forth paradigm among various columns.
Pull together key concepts of SCDs before moving on to ETL design, which is designed in part by the type of SCDs applicable to your data.
Brief introduction to and overview of what you will learn in this section.
Overall ETL architecture needs to be decomposed into specific designs for our fact and dimension tables.
The basics of dimension table ETL are introduced, leading in to the detailed discussion in the next two videos
Type 1 and Type 2 SCDs in particular require a specific workflow ordering for incremental ETL. The workflow is illustrated and walked through.
Type 1 and Type 2 SCDs in particular require a specific workflow ordering for incremental ETL. The workflow is illustrated and walked through.
SCDs also impact our fact tables with regards to ensuring that we properly link surrogate keys back to dimension tables.
Pull together key concepts from this section before moving on.
Brief introduction to and overview of what you will learn in this section.
Cloud hosting for data warehouses is an increasingly popular option, yet at the same time, an on-premises data center is still viable for hosting. Tradeoffs between the two are explored.
Depending on whether cloud or on-premises hosting is selected, architecture and design implications need to be considered. For example, certain cloud-based database platforms may represent dimensionality differently than "classic" relational databases with fact and dimension tables. Checklists for what to focus on are presented.
Some final points to conclude this course.
If you are a current or aspiring IT professional in search of sound, practical techniques to plan, design, and build a data warehouse or data mart, this is the course for you.
During the course, you’ll put what you learn to work and define sample data warehousing architectures and dimensional data structures to help emphasize the best practices and techniques covered in this course. Each section has either scenario based quiz questions or hands on assignments that emphasizes key learning objectives for that section’s material. This way, you can be confident as you move through the course that you’re picking up the key points about data warehousing.
To build this course, I drew from more than 30 years of my own data warehousing work on more than 40 client projects and engagements. I’ve been a thought leader in the discipline of data warehousing since the early 1990s when modern data warehousing came onto the scene. I’ve literally seen it all...and written about the discipline of data warehousing in books such as the original Data Warehousing For Dummies ® , along with articles, white papers, and as a monthly data warehousing columnist. I’ve led global consulting practices delivering data warehousing (and its related discipline, business intelligence) to some of the most recognizable brand name customers, along with smaller-sized organizations and governmental agencies. My own consulting firm, Thinking Helmet, Inc., specializes in data warehousing, business intelligence, and related disciplines. I’ve rolled up my sleeves and personally tackled every aspect of what you’ll learn in this course. I’ve even learned a few painful lessons, and have built a healthy share of “lessons learned” into the course material.
In this course, I take you from the fundamentals and concepts of data warehousing all the way through best practices for the architecture, dimensional design, and data interchange that you’ll need to implement data warehousing in your organization. You’ll find many examples that clearly demonstrate the key concepts and techniques covered throughout the course. By the end of the course, you’ll be all set to not only put these principles to work, but also to make the key architecture and design decisions required by the “art” of data warehousing that transcend the nuts-and-bolts techniques and design patterns.
Specifically, this course will cover:
Foundational data warehousing concepts and fundamentals
The symbiotic relationship between data warehousing and business intelligence
How data warehousing co-exists with data lakes and data virtualization
Your many architectural alternatives, from highly centralized approaches to numerous multi-component alternatives
The fundamentals of dimensional analysis and modeling
The key relational database capabilities that you will put to work to build your dimensional data models
Different alternatives for handling changing data history within your environment, and how to decide which approaches to apply in various situations
How to organize and design your Extraction, Transformation, and Loading (ETL) capabilities to keep your data warehouse up to date
Data warehousing is both an art and a science. While we have developed a large body of best practices over the years, we still have to make this-or-that types of decisions from the earliest stages of a data warehousing project all the way through architecture, design, and implementation. That’s what I’ve instilled into this course: the fusion of data warehousing art and science that you can bring to your organization and your own work. So come join me on this journey through the world of data warehousing!