Data Warehouse Fundamentals for Beginners

Name: Data Warehouse Fundamentals for Beginners
Rating: 4.5 (34704 reviews)

Best Practices and Concepts for Architecture and Dimensional Design

Created byAlan Simon

Last updated 3/2020

English

What you'll learn

Master the techniques needed to build a data warehouse for your organization.
Determine your options for the architecture of your data warehousing environment.
Apply the key design principles of dimensional data modeling.
Combine various models and approaches to unify and load data within your data warehouse.

Course content

10 sections • 68 lectures • 5h 9m total length

Welcome3:52
Who we are and why we created the course, and a first look at what you'll get out of the course.
About This Course3:59
A detailed overview of what the course will cover, and how to follow along with each module to get the most out of the course.
Reflection: The Value of Data Warehousing1:48
Data-driven decision making requires integrated data from around your enterprise, and even from external data providers. Data warehousing helps integrate the data to better focus the majority of your work on analysis and decision making rather than repeatedly gathering, consolidating, and cleansing data. Data warehousing is here to stay, and an understanding of this discipline is essential to data-driven decision making.

Introduction to Data Warehousing Concepts1:08
Brief introduction to and overview of what you will learn in this section.
What is a Data Warehouse?3:41
Data warehouses have been around for a while, but the entire discipline continues to evolve. This first content video sets the stage for all that follow by clearly defining and describing what a data warehouse is.
Reasons for You to Build a Data Warehouse2:39
Data warehouses provide a place to integrate and consolidate data from many applications and systems, both within the enterprise and from the outside. Once this integration is accomplished in an architecturally sound manner, business intelligence and analytics become "easier" through the consolidated and organized data.
Compare a Data Warehouse to a Data lake3:08
In some ways, a data lake is a modern successor to a data warehouse; yet the two disciplines co-exist in most organizations. The similarities and differences are clearly explained.
Compare a Data Warehouse to Data Virtualization4:09
Data virtualization is a long-standing "offshoot" of data warehousing that plays an important role in many organizations for certain use cases. The two related disciplines are compared side-by-side, including guidance on situations for which each is best suited.
Look at a Simple End-to-End Data Warehousing Environment1:59
Seeing a simple, end-to-end data warehouse high-level architecture sets the stage for the sections that follow. This video brings together key points from this section as the gateway to what follows.
Summarize Data Warehousing Concepts0:41
Pull together key concepts from this foundational section before moving on.
Data Warehousing Concepts

Introduction to Data Warehousing Architecture1:55
Brief introduction to and overview of what you will learn in this section.
Build a Centralized Data Warehouse3:39
The most straightforward DW architecture is with a single, centralized, monolithic database (e.g., a single SQL Server database) serving as the DW. While straightforward, a centralized DW faces some challenges.
Compare a Data Warehouse to a Data Mart5:30
The concept of a data warehouse versus a data mart can be confusing because no official definitions exist, and the demarcation can be fluid. Further data marts are often built as part of an overall data warehousing environment rather than a centralized data warehouse. The terms are clarified and guidance presented.
Decide Which Component-Based Architecture is Your Best Fit11:25
Various data mart-based component architectures (versus a centralized DW) can be built, with data marts placed in front of and/or behind a data warehouse...or possibly used in place of a data warehouse. The various options are presented along with tradeoffs and guidance.
Include Cubes in Your Data Warehousing Environment3:50
In addition to a relational database foundation for data warehousing, we often use "cubes" or specialized multi-dimensional databases for part of our overall environment. The basics are presented and discussed in concert with relational database components.
Include Operational Data Stores in Your Data Warehousing Environment4:43
An operational data store (ODS) is a specialized data warehouse dedicated to operational business intelligence. Some of the rules and guidelines for an ODS are different than a DW, and these are described and built into business use cases.
Explore the Role of the Staging Layer Inside a Data Warehouse7:18
"Peeling the lid back" on a data warehouse (or data mart) we find different sections or layers, including a "staging layer" where data first lands upon entering the DW environment. We describe the staging layer and various options we have for our architecture.
Compare the Two Types of Staging Layers6:58
Details of persistent vs. non-persistent staging layers - specific examples, tradeoffs.
Summarize Data Warehousing Architecture0:59
Pull together key concepts of data warehousing architecture before moving on.
Data Warehousing Architecture

Introduction to ETL and Data Movement for Data Warehousing1:14
Brief introduction to and overview of what you will learn in this section.
Compare ETL to ELT6:07
Extraction, transformation, and loading (ETL) is the primary mechanism to bring data into the DW...but for big data environments (e.g., data lakes), ELT is the dominant paradigm. The two are described to help ensure that the viewer is clear on the mechanics of each for data warehouses and data lakes, respectively.
Design the Initial Load ETL3:46
The initial "stocking" of a data warehouse is important because this is where we first load data for BI and analytical usage. We need to decide which data to bring in; why; and how.
Compare Different Models for Incremental ETL6:09
After the initial ETL, we regularly update and refresh the DW through incremental ETL. We have various models or patterns that we can use, and each is described using examples.
Explore the Role of Data Transformation7:28
Within ETL, the "T" - transformation - is where data from multiple source applications and systems becomes unified to be able to be used together for BI and analytics.
More Common Transformations Within ETL5:37
The most common transformation models are described with examples.
Implement Mix-and-Match Incremental ETL2:55
Incremental ETL is typically implemented in a mix-and-match manner, with different intervals (e.g., hourly/daily/weekly) for different sources and even different data within a source, along with different ETL patterns for the various sources.
Summarize ETL Concepts and Models0:33
Pull together key concepts of ETL before moving on.
ETL Fundamentals

Data Warehousing Structure Fundamentals1:36
Brief introduction to and overview of what you will learn in this section.
Deciding What Your Data Warehouse Will Be Used For2:13
Traditionally, data warehouses are used for dimensional ly analyzing data, and dimensional analysis remains the backbone of business intelligence. Increasingly, data warehouses also support various data mining capabilities such as predictive analytics. Determining how a DW will be used is critical to deciding how to structure its data.
The Basic Principles of Dimensionality9:04
Dimensional analysis presents users with key measurements that are "sliced and diced" by various dimensions. These key concepts are critical to being able to effectively and correctly design our data warehouse's contents.
Compare Facts, Fact Tables, Dimensions, and Dimension Tables5:01
A fact is not the same as a fact table, nor is a dimension the same as a dimension table. All four of these key concepts and terms are clearly explained.
Compare Different Forms of Additivity in Facts6:03
Some facts are "additive" meaning that they can be added, while others are either non-additive or semi-additive. The three classes of additivity are discussed in the context of why each is important for various aspects of our data warehousing data design.
Compare a Star Schema to a Snowflake Schema7:28
The backbone of dimensional analysis is the star schema, but sometimes we extend the star structure into a "snowflake" by expanding our dimensions into multiple tables along a given hierarchy. All of these options are clearly explained with detailed examples.
Database Keys for Data Warehousing8:49
Data warehouse designers and data modelers have several options with how to link data among fact tables and dimension tables: using either surrogate keys or natural keys. Even if one selects surrogate keys (a best practice), natural keys play an important role in overall data design. Both concepts are introduced and explained in detail.
Summarize Data Warehousing Structure1:12
Pull together key concepts about data warehousing design fundamentals before moving on.
Data Warehouse Structure

Introduction to Dimensional Modeling0:50
Brief introduction to and overview of what you will learn in this section.
Design Dimension Tables for Star Schemas and Snowflake Schemas12:02
Dimension tables provide the context for key data warehouse subject areas: customer, product, vendor, employee, location, etc. In a star schema, we design dimension tables according to a specific set of rules; while in a snowflake schema, we follow a different set of rules. Examples for both will guide the presentation and discussion.
The Four Main Types of Data Warehousing Fact Tables2:13
Each of the four main types of fact tables is introduced, along with a clear description of each type's role. This video leads into the four that follow, in which each of the four are described.
The Role of Transaction Fact Tables4:32
The transaction fact table is the "backbone" of the star schema. Understanding the structural aspects of transaction fact tables is essential to building dimensional models.
The Rules Governing Facts and Transaction Fact Tables6:19
While relatively simple to understand, several design rules and best practices govern how we relate our facts (measurements) to our fact tables.
Primary and Foreign Keys for Fact Tables5:10
The basic concept of a database key is easy to follow, but data warehousing best practices impose rules and constraints over how we use different forms of keys, and when.
The Role of Periodic Snapshot Fact Tables7:57
The periodic snapshot fact table can be an easier-to-access summarization of contents from a transaction fact table, or it can represent snapshots of data that otherwise cannot be captured in transactions. Both uses of periodic snapshot fact tables are demonstrated.
Periodic Snapshots and Semi-Additive Facts6:02
Periodic snapshot fact tables are built around a unique form of fact, called a semi-additive fact. Understanding both the structure of these types of tables, as well as the rules that govern what we can and can't do arithmetically with those special facts, is critical to building out our dimensional models.
Transaction and Periodic Snapshot Fact Tables
The Role of Accumulating Snapshot Fact Tables9:26
A lesser-used but still important type of fact table tracks and measures progress along a given business process.
Accumulating Snapshot Fact Table Example6:41
Clear examples demonstrate when an accumulating snapshot fact table fits into a data warehouse's design.
Why a Factless Fact Table isn't a Contradiction in Terms9:49
Though a "factless fact table" sounds like an oxymoron, because a fact is not the same as a fact table, we can actually build fact tables for several specific uses in which we have no facts (i.e., measurements). Clear examples demonstrate the usage of factless fact tables.
Compare the Structure of Fact Tables in Star Schemas vs. Snowflake Schemas4:09
Whereas dimension tables are structured very differently in a star schema versus a snowflake schema, we structure our fact tables the same regardless of star versus snowflake. Examples of fact tables in both schemas are shown side-by-side to highlight this point.
SQL for Dimension and Fact Tables9:03
Examples of SQL are presented for various types of dimension and fact tables.
Summarize Fact and Dimension Tables0:59
Pull together key concepts of fact and dimension tables before moving on to additional design and structural topics about each.
Factless fact tables and accumulating snapshot fact tables

Introduction to Slowly Changing Dimensions0:45
Brief introduction to and overview of what you will learn in this section.
Slowly Changing Dimensions (SCDs) and Data Warehouse History5:01
The somewhat mystically named "slowly changing dimension" is the mechanism through which a data warehouse manages its historical data. The basic concept is introduced as a lead-in for the other videos in this section.
Design a Type 1 SCD3:47
The "Type 1 SCD" overwrites existing data with new incoming data, and essentially does not preserve history. We have certain use cases where the Type 1 SCD is particularly applicable and we'll look at examples.
Design a Type 2 SCD8:29
The "Type 2 SCD" is the "backbone" of managing and maintaining history within a data warehouse. Though the basics are easy to understand, significant complications occur which requires careful usage of surrogate and natural keys. Clear examples linking fact tables, dimension tables, surrogate and natural keys, and BI/reporting usage of Type 2 data present the critical concepts.
Maintain Correct Data Order with Type 2 SCDs5:37
An extension to the basic Type 2 SCD model helps us maintain and be able to report and analyze the correct historical sequence of our data. Two different approaches can be used, and both are discussed along with tradeoffs...and a hybrid model that makes use of both techniques together.
Design a Type 3 SCD7:01
The "Type 3 SCD" is a lesser-used but still valuable way to represent limited history through a switching-back-and-forth paradigm among various columns.
Summarize SCD concepts and implementations1:07
Pull together key concepts of SCDs before moving on to ETL design, which is designed in part by the type of SCDs applicable to your data.
Slowly Changing Dimensions (SCDs)

Introduction to ETL Design1:11
Brief introduction to and overview of what you will learn in this section.
Build your ETL Design from your ETL Architecture5:53
Overall ETL architecture needs to be decomposed into specific designs for our fact and dimension tables.
Dimension Table ETL6:45
The basics of dimension table ETL are introduced, leading in to the detailed discussion in the next two videos
Process SCD Type 1 Changes to a Dimension Table4:27
Type 1 and Type 2 SCDs in particular require a specific workflow ordering for incremental ETL. The workflow is illustrated and walked through.
Process SCD Type 2 Changes to a Dimension Table2:31
Type 1 and Type 2 SCDs in particular require a specific workflow ordering for incremental ETL. The workflow is illustrated and walked through.
Design ETL for Fact Tables9:07
SCDs also impact our fact tables with regards to ensuring that we properly link surrogate keys back to dimension tables.
Summarize ETL Design0:52
Pull together key concepts from this section before moving on.
ETL Design

Introduction to Data Warehousing Environments0:49
Brief introduction to and overview of what you will learn in this section.
Decide Between Cloud and On-Premises Settings for Your Data Warehouse5:04
Cloud hosting for data warehouses is an increasingly popular option, yet at the same time, an on-premises data center is still viable for hosting. Tradeoffs between the two are explored.
Architecture and Design Implications for Your Selected Platform5:30
Depending on whether cloud or on-premises hosting is selected, architecture and design implications need to be considered. For example, certain cloud-based database platforms may represent dimensionality differently than "classic" relational databases with fact and dimension tables. Checklists for what to focus on are presented.
Data Warehousing Environments

Requirements

A basic understanding of (but not necessarily programming experience with) relational databases and SQL fundamentals, specifically how you use the SQL CREATE TABLE statement to create data structures in a relational database.

Description

If you are a current or aspiring IT professional in search of sound, practical techniques to plan, design, and build a data warehouse or data mart, this is the course for you.

During the course, you’ll put what you learn to work and define sample data warehousing architectures and dimensional data structures to help emphasize the best practices and techniques covered in this course. Each section has either scenario based quiz questions or hands on assignments that emphasizes key learning objectives for that section’s material. This way, you can be confident as you move through the course that you’re picking up the key points about data warehousing.

To build this course, I drew from more than 30 years of my own data warehousing work on more than 40 client projects and engagements. I’ve been a thought leader in the discipline of data warehousing since the early 1990s when modern data warehousing came onto the scene. I’ve literally seen it all...and written about the discipline of data warehousing in books such as the original Data Warehousing For Dummies ® , along with articles, white papers, and as a monthly data warehousing columnist. I’ve led global consulting practices delivering data warehousing (and its related discipline, business intelligence) to some of the most recognizable brand name customers, along with smaller-sized organizations and governmental agencies. My own consulting firm, Thinking Helmet, Inc., specializes in data warehousing, business intelligence, and related disciplines. I’ve rolled up my sleeves and personally tackled every aspect of what you’ll learn in this course. I’ve even learned a few painful lessons, and have built a healthy share of “lessons learned” into the course material.

In this course, I take you from the fundamentals and concepts of data warehousing all the way through best practices for the architecture, dimensional design, and data interchange that you’ll need to implement data warehousing in your organization. You’ll find many examples that clearly demonstrate the key concepts and techniques covered throughout the course. By the end of the course, you’ll be all set to not only put these principles to work, but also to make the key architecture and design decisions required by the “art” of data warehousing that transcend the nuts-and-bolts techniques and design patterns.

Specifically, this course will cover:

Foundational data warehousing concepts and fundamentals
The symbiotic relationship between data warehousing and business intelligence
How data warehousing co-exists with data lakes and data virtualization
Your many architectural alternatives, from highly centralized approaches to numerous multi-component alternatives
The fundamentals of dimensional analysis and modeling
The key relational database capabilities that you will put to work to build your dimensional data models
Different alternatives for handling changing data history within your environment, and how to decide which approaches to apply in various situations
How to organize and design your Extraction, Transformation, and Loading (ETL) capabilities to keep your data warehouse up to date

Data warehousing is both an art and a science. While we have developed a large body of best practices over the years, we still have to make this-or-that types of decisions from the earliest stages of a data warehousing project all the way through architecture, design, and implementation. That’s what I’ve instilled into this course: the fusion of data warehousing art and science that you can bring to your organization and your own work. So come join me on this journey through the world of data warehousing!

Who this course is for:

A business analyst, data engineer, or database designer, currently with little or no exposure to or experience with data warehousing, who desires to build a personal toolbox of data warehousing best practices and techniques.
After completing this course, you will be ready to begin working on real-world data warehousing projects, either with expanded responsibilities as part of an existing role or to find a new position involving data warehousing. Example positions include data warehousing architect, dimensional data modeler, ETL architect and designer, and data warehousing business analyst.

Data Warehouse Fundamentals for Beginners

What you'll learn

Explore related topics

Course content

Welcome3 lectures • 10min

Data Warehousing Concepts7 lectures • 17min

Data Warehousing Architecture9 lectures • 46min

Bring Data Into Your Data Warehouse8 lectures • 34min

Data Warehousing Design: Building Blocks8 lectures • 41min

Design Facts, Fact Tables, Dimensions, and Dimension Tables14 lectures • 1hr 25min

Managing Data Warehouse History Through Slowly Changing Dimensions7 lectures • 32min

Designing Your ETL7 lectures • 31min

Selecting Your Data Warehouse Environment3 lectures • 11min

Conclusion2 lectures • 1min

Requirements

Description

Who this course is for: