Data Warehouse Fundamentals for Beginners
4.5 (982 ratings)
Course Ratings are calculated from individual students’ ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
4,746 students enrolled

Data Warehouse Fundamentals for Beginners

Best Practices and Concepts for Architecture and Dimensional Design
4.5 (982 ratings)
Course Ratings are calculated from individual students’ ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
4,746 students enrolled
Created by Alan Simon
Last updated 3/2020
English [Auto]
Current price: $30.99 Original price: $44.99 Discount: 31% off
5 hours left at this price!
30-Day Money-Back Guarantee
This course includes
  • 5 hours on-demand video
  • 1 article
  • Full lifetime access
  • Access on mobile and TV
  • Assignments
  • Certificate of Completion
Training 5 or more people?

Get your team access to 4,000+ top Udemy courses anytime, anywhere.

Try Udemy for Business
What you'll learn
  • Master the techniques needed to build a data warehouse for your organization.
  • Determine your options for the architecture of your data warehousing environment.
  • Apply the key design principles of dimensional data modeling.
  • Combine various models and approaches to unify and load data within your data warehouse.
  • A basic understanding of (but not necessarily programming experience with) relational databases and SQL fundamentals, specifically how you use the SQL CREATE TABLE statement to create data structures in a relational database.

If you are a current or aspiring IT professional in search of sound, practical techniques to plan, design, and build a data warehouse or data mart, this is the course for you.

During the course, you’ll put what you learn to work and define sample data warehousing architectures and dimensional data structures to help emphasize the best practices and techniques covered in this course. Each section has either scenario based quiz questions or hands on assignments that emphasizes key learning objectives for that section’s material. This way, you can be confident as you move through the course that you’re picking up the key points about data warehousing.

To build this course, I drew from more than 30 years of my own data warehousing work on more than 40 client projects and engagements. I’ve been a thought leader in the discipline of data warehousing since the early 1990s when modern data warehousing came onto the scene. I’ve literally seen it all...and written about the discipline of data warehousing in books such as the original Data Warehousing For Dummies ® , along with articles, white papers, and as a monthly data warehousing columnist. I’ve led global consulting practices delivering data warehousing (and its related discipline, business intelligence) to some of the most recognizable brand name customers, along with smaller-sized organizations and governmental agencies. My own consulting firm, Thinking Helmet, Inc., specializes in data warehousing, business intelligence, and related disciplines. I’ve rolled up my sleeves and personally tackled every aspect of what you’ll learn in this course. I’ve even learned a few painful lessons, and have built a healthy share of “lessons learned” into the course material.

In this course, I take you from the fundamentals and concepts of data warehousing all the way through best practices for the architecture, dimensional design, and data interchange that you’ll need to implement data warehousing in your organization. You’ll find many examples that clearly demonstrate the key concepts and techniques covered throughout the course. By the end of the course, you’ll be all set to not only put these principles to work, but also to make the key architecture and design decisions required by the “art” of data warehousing that transcend the nuts-and-bolts techniques and design patterns.

Specifically, this course will cover:

  • Foundational data warehousing concepts and fundamentals

  • The symbiotic relationship between data warehousing and business intelligence

  • How data warehousing co-exists with data lakes and data virtualization

  • Your many architectural alternatives, from highly centralized approaches to numerous multi-component alternatives

  • The fundamentals of dimensional analysis and modeling

  • The key relational database capabilities that you will put to work to build your dimensional data models

  • Different alternatives for handling changing data history within your environment, and how to decide which approaches to apply in various situations

  • How to organize and design your Extraction, Transformation, and Loading (ETL) capabilities to keep your data warehouse up to date

Data warehousing is both an art and a science. While we have developed a large body of best practices over the years, we still have to make this-or-that types of decisions from the earliest stages of a data warehousing project all the way through architecture, design, and implementation. That’s what I’ve instilled into this course: the fusion of data warehousing art and science that you can bring to your organization and your own work. So come join me on this journey through the world of data warehousing!

Who this course is for:
  • A business analyst, data engineer, or database designer, currently with little or no exposure to or experience with data warehousing, who desires to build a personal toolbox of data warehousing best practices and techniques.
  • After completing this course, you will be ready to begin working on real-world data warehousing projects, either with expanded responsibilities as part of an existing role or to find a new position involving data warehousing. Example positions include data warehousing architect, dimensional data modeler, ETL architect and designer, and data warehousing business analyst.
Course content
Expand all 68 lectures 05:09:10
+ Welcome
3 lectures 09:39

Who we are and why we created the course, and a first look at what you'll get out of the course.

Preview 03:52

A detailed overview of what the course will cover, and how to follow along with each module to get the most out of the course.

Preview 03:59

Data-driven decision making requires integrated data from around your enterprise, and even from external data providers. Data warehousing helps integrate the data to better focus the majority of your work on analysis and decision making rather than repeatedly gathering, consolidating, and cleansing data. Data warehousing is here to stay, and an understanding of this discipline is essential to data-driven decision making.

Preview 01:48
+ Data Warehousing Concepts
7 lectures 17:25

Brief introduction to and overview of what you will learn in this section.

Preview 01:08

Data warehouses have been around for a while, but the entire discipline continues to evolve. This first content video sets the stage for all that follow by clearly defining and describing what a data warehouse is.

Preview 03:41

Data warehouses provide a place to integrate and consolidate data from many applications and systems, both within the enterprise and from the outside. Once this integration is accomplished in an architecturally sound manner, business intelligence and analytics become "easier" through the consolidated and organized data.

Reasons for You to Build a Data Warehouse

In some ways, a data lake is a modern successor to a data warehouse; yet the two disciplines co-exist in most organizations. The similarities and differences are clearly explained.

Compare a Data Warehouse to a Data lake

Data virtualization is a long-standing "offshoot" of data warehousing that plays an important role in many organizations for certain use cases. The two related disciplines are compared side-by-side, including guidance on situations for which each is best suited.

Compare a Data Warehouse to Data Virtualization

Seeing a simple, end-to-end data warehouse high-level architecture sets the stage for the sections that follow. This video brings together key points from this section as the gateway to what follows.

Look at a Simple End-to-End Data Warehousing Environment

Pull together key concepts from this foundational section before moving on.

Summarize Data Warehousing Concepts

Test your knowledge about data warehousing concepts before moving ahead.

Data Warehousing Concepts
4 questions
+ Data Warehousing Architecture
9 lectures 46:17

Brief introduction to and overview of what you will learn in this section.

Introduction to Data Warehousing Architecture

The most straightforward DW architecture is with a single, centralized, monolithic database (e.g., a single SQL Server database) serving as the DW. While straightforward, a centralized DW faces some challenges.

Build a Centralized Data Warehouse

The concept of a data warehouse versus a data mart can be confusing because no official definitions exist, and the demarcation can be fluid. Further data marts are often built as part of an overall data warehousing environment rather than a centralized data warehouse. The terms are clarified and guidance presented.

Compare a Data Warehouse to a Data Mart

Various data mart-based component architectures (versus a centralized DW) can be built, with data marts placed in front of and/or behind a data warehouse...or possibly used in place of a data warehouse. The various options are presented along with tradeoffs and guidance.

Decide Which Component-Based Architecture is Your Best Fit

In addition to a relational database foundation for data warehousing, we often use "cubes" or specialized multi-dimensional databases for part of our overall environment. The basics are presented and discussed in concert with relational database components.

Include Cubes in Your Data Warehousing Environment

An operational data store (ODS) is a specialized data warehouse dedicated to operational business intelligence. Some of the rules and guidelines for an ODS are different than a DW, and these are described and built into business use cases.

Include Operational Data Stores in Your Data Warehousing Environment

"Peeling the lid back" on a data warehouse (or data mart) we find different sections or layers, including a "staging layer" where data first lands upon entering the DW environment. We describe the staging layer and various options we have for our architecture.

Explore the Role of the Staging Layer Inside a Data Warehouse

Details of persistent vs. non-persistent staging layers - specific examples, tradeoffs.

Compare the Two Types of Staging Layers

Pull together key concepts of data warehousing architecture before moving on.

Summarize Data Warehousing Architecture
To help you understand the relationship between source systems, data warehouses, and data marts, you will define two different architectural alternatives for the scenario you'll be given.
Data Warehousing Architecture
1 question
+ Bring Data Into Your Data Warehouse
8 lectures 33:49

Brief introduction to and overview of what you will learn in this section.

Introduction to ETL and Data Movement for Data Warehousing

Extraction, transformation, and loading (ETL) is the primary mechanism to bring data into the DW...but for big data environments (e.g., data lakes), ELT is the dominant paradigm. The two are described to help ensure that the viewer is clear on the mechanics of each for data warehouses and data lakes, respectively.

Compare ETL to ELT

The initial "stocking" of a data warehouse is important because this is where we first load data for BI and analytical usage. We need to decide which data to bring in; why; and how.

Design the Initial Load ETL

After the initial ETL, we regularly update and refresh the DW through incremental ETL. We have various models or patterns that we can use, and each is described using examples.

Compare Different Models for Incremental ETL

Within ETL, the "T" - transformation - is where data from multiple source applications and systems becomes unified to be able to be used together for BI and analytics.

Explore the Role of Data Transformation

The most common transformation models are described with examples.

More Common Transformations Within ETL

Incremental ETL is typically implemented in a mix-and-match manner, with different intervals (e.g., hourly/daily/weekly) for different sources and even different data within a source, along with different ETL patterns for the various sources.

Implement Mix-and-Match Incremental ETL

Pull together key concepts of ETL before moving on.

Summarize ETL Concepts and Models
Define and show simple data transformations to help emphasize important ETL principles
ETL Fundamentals
1 question
+ Data Warehousing Design: Building Blocks
8 lectures 41:26

Brief introduction to and overview of what you will learn in this section.

Data Warehousing Structure Fundamentals

Traditionally, data warehouses are used for dimensional ly analyzing data, and dimensional analysis remains the backbone of business intelligence. Increasingly, data warehouses also support various data mining capabilities such as predictive analytics. Determining how a DW will be used is critical to deciding how to structure its data.

Deciding What Your Data Warehouse Will Be Used For

Dimensional analysis presents users with key measurements that are "sliced and diced" by various dimensions. These key concepts are critical to being able to effectively and correctly design our data warehouse's contents.

Preview 09:04

A fact is not the same as a fact table, nor is a dimension the same as a dimension table. All four of these key concepts and terms are clearly explained.

Preview 05:01

Some facts are "additive" meaning that they can be added, while others are either non-additive or semi-additive. The three classes of additivity are discussed in the context of why each is important for various aspects of our data warehousing data design.

Compare Different Forms of Additivity in Facts

The backbone of dimensional analysis is the star schema, but sometimes we extend the star structure into a "snowflake" by expanding our dimensions into multiple tables along a given hierarchy. All of these options are clearly explained with detailed examples.

Compare a Star Schema to a Snowflake Schema

Data warehouse designers and data modelers have several options with how to link data among fact tables and dimension tables: using either surrogate keys or natural keys. Even if one selects surrogate keys (a best practice), natural keys play an important role in overall data design. Both concepts are introduced and explained in detail.

Database Keys for Data Warehousing

Pull together key concepts about data warehousing design fundamentals before moving on.

Summarize Data Warehousing Structure

Test your knowledge about data warehousing structure before moving ahead.

Data Warehouse Structure
5 questions
+ Design Facts, Fact Tables, Dimensions, and Dimension Tables
14 lectures 01:25:12

Brief introduction to and overview of what you will learn in this section.

Introduction to Dimensional Modeling

Dimension tables provide the context for key data warehouse subject areas: customer, product, vendor, employee, location, etc. In a star schema, we design dimension tables according to a specific set of rules; while in a snowflake schema, we follow a different set of rules. Examples for both will guide the presentation and discussion.

Design Dimension Tables for Star Schemas and Snowflake Schemas

Each of the four main types of fact tables is introduced, along with a clear description of each type's role. This video leads into the four that follow, in which each of the four are described.

Preview 02:13

The transaction fact table is the "backbone" of the star schema. Understanding the structural aspects of transaction fact tables is essential to building dimensional models.

The Role of Transaction Fact Tables

While relatively simple to understand, several design rules and best practices govern how we relate our facts (measurements) to our fact tables.

The Rules Governing Facts and Transaction Fact Tables

The basic concept of a database key is easy to follow, but data warehousing best practices impose rules and constraints over how we use different forms of keys, and when.

Primary and Foreign Keys for Fact Tables

The periodic snapshot fact table can be an easier-to-access summarization of contents from a transaction fact table, or it can represent snapshots of data that otherwise cannot be captured in transactions. Both uses of periodic snapshot fact tables are demonstrated.

The Role of Periodic Snapshot Fact Tables

Periodic snapshot fact tables are built around a unique form of fact, called a semi-additive fact. Understanding both the structure of these types of tables, as well as the rules that govern what we can and can't do arithmetically with those special facts, is critical to building out our dimensional models.

Periodic Snapshots and Semi-Additive Facts
In this assessment, you will design examples of two of the fact table types that we’ve seen: Transaction-grained fact table (easy) Periodic snapshot fact table (intermediate)
Transaction and Periodic Snapshot Fact Tables
2 questions

A lesser-used but still important type of fact table tracks and measures progress along a given business process.

The Role of Accumulating Snapshot Fact Tables

Clear examples demonstrate when an accumulating snapshot fact table fits into a data warehouse's design.

Accumulating Snapshot Fact Table Example

Though a "factless fact table" sounds like an oxymoron, because a fact is not the same as a fact table, we can actually build fact tables for several specific uses in which we have no facts (i.e., measurements). Clear examples demonstrate the usage of factless fact tables.

Why a Factless Fact Table isn't a Contradiction in Terms

Whereas dimension tables are structured very differently in a star schema versus a snowflake schema, we structure our fact tables the same regardless of star versus snowflake. Examples of fact tables in both schemas are shown side-by-side to highlight this point.

Compare the Structure of Fact Tables in Star Schemas vs. Snowflake Schemas

Examples of SQL are presented for various types of dimension and fact tables.

SQL for Dimension and Fact Tables

Pull together key concepts of fact and dimension tables before moving on to additional design and structural topics about each.

Summarize Fact and Dimension Tables
In this assessment, you will design examples of 2 of the fact table types that we’ve seen: Factless fact table (intermediate) Accumulating snapshot fact table (difficult)
Factless fact tables and accumulating snapshot fact tables
2 questions
+ Managing Data Warehouse History Through Slowly Changing Dimensions
7 lectures 31:47

Brief introduction to and overview of what you will learn in this section.

Introduction to Slowly Changing Dimensions

The somewhat mystically named "slowly changing dimension" is the mechanism through which a data warehouse manages its historical data. The basic concept is introduced as a lead-in for the other videos in this section.

Slowly Changing Dimensions (SCDs) and Data Warehouse History

The "Type 1 SCD" overwrites existing data with new incoming data, and essentially does not preserve history. We have certain use cases where the Type 1 SCD is particularly applicable and we'll look at examples.

Design a Type 1 SCD

The "Type 2 SCD" is the "backbone" of managing and maintaining history within a data warehouse. Though the basics are easy to understand, significant complications occur which requires careful usage of surrogate and natural keys. Clear examples linking fact tables, dimension tables, surrogate and natural keys, and BI/reporting usage of Type 2 data present the critical concepts.

Design a Type 2 SCD

An extension to the basic Type 2 SCD model helps us maintain and be able to report and analyze the correct historical sequence of our data. Two different approaches can be used, and both are discussed along with tradeoffs...and a hybrid model that makes use of both techniques together.

Maintain Correct Data Order with Type 2 SCDs

The "Type 3 SCD" is a lesser-used but still valuable way to represent limited history through a switching-back-and-forth paradigm among various columns.

Design a Type 3 SCD

Pull together key concepts of SCDs before moving on to ETL design, which is designed in part by the type of SCDs applicable to your data.

Summarize SCD concepts and implementations
This short assessment will help you test your knowledge of Type 1 and Type 2 Slowly Changing Dimensions (SCDs)...especially the "tricky" part of when Type 1 and Type 2 changes are applied to the same data.
Slowly Changing Dimensions (SCDs)
2 questions
+ Designing Your ETL
7 lectures 30:46

Brief introduction to and overview of what you will learn in this section.

Introduction to ETL Design

Overall ETL architecture needs to be decomposed into specific designs for our fact and dimension tables.

Build your ETL Design from your ETL Architecture

The basics of dimension table ETL are introduced, leading in to the detailed discussion in the next two videos

Dimension Table ETL

Type 1 and Type 2 SCDs in particular require a specific workflow ordering for incremental ETL. The workflow is illustrated and walked through.

Process SCD Type 1 Changes to a Dimension Table

Type 1 and Type 2 SCDs in particular require a specific workflow ordering for incremental ETL. The workflow is illustrated and walked through.

Process SCD Type 2 Changes to a Dimension Table

SCDs also impact our fact tables with regards to ensuring that we properly link surrogate keys back to dimension tables.

Design ETL for Fact Tables

Pull together key concepts from this section before moving on.

Summarize ETL Design

Test your understanding of ETL design concepts and techniques.

ETL Design
4 questions
+ Selecting Your Data Warehouse Environment
3 lectures 11:23

Brief introduction to and overview of what you will learn in this section.

Introduction to Data Warehousing Environments

Cloud hosting for data warehouses is an increasingly popular option, yet at the same time, an on-premises data center is still viable for hosting. Tradeoffs between the two are explored.

Decide Between Cloud and On-Premises Settings for Your Data Warehouse

Depending on whether cloud or on-premises hosting is selected, architecture and design implications need to be considered. For example, certain cloud-based database platforms may represent dimensionality differently than "classic" relational databases with fact and dimension tables. Checklists for what to focus on are presented.

Architecture and Design Implications for Your Selected Platform

Test your knowledge about your data warehousing environment options.

Data Warehousing Environments
3 questions
+ Conclusion
2 lectures 01:26

Some final points to conclude this course.

Thank you for taking the course!
Additional resources for further study