Data Warehouse Fundamentals for Beginners
- 5 hours on-demand video
- 1 article
- Full lifetime access
- Access on mobile and TV
- Certificate of Completion
Get your team access to 4,000+ top Udemy courses anytime, anywhere.Try Udemy for Business
- Master the techniques needed to build a data warehouse for your organization.
- Determine your options for the architecture of your data warehousing environment.
- Apply the key design principles of dimensional data modeling.
- Combine various models and approaches to unify and load data within your data warehouse.
- A basic understanding of (but not necessarily programming experience with) relational databases and SQL fundamentals, specifically how you use the SQL CREATE TABLE statement to create data structures in a relational database.
If you are a current or aspiring IT professional in search of sound, practical techniques to plan, design, and build a data warehouse or data mart, this is the course for you.
During the course, you’ll put what you learn to work and define sample data warehousing architectures and dimensional data structures to help emphasize the best practices and techniques covered in this course. Each section has either scenario based quiz questions or hands on assignments that emphasizes key learning objectives for that section’s material. This way, you can be confident as you move through the course that you’re picking up the key points about data warehousing.
To build this course, I drew from more than 30 years of my own data warehousing work on more than 40 client projects and engagements. I’ve been a thought leader in the discipline of data warehousing since the early 1990s when modern data warehousing came onto the scene. I’ve literally seen it all...and written about the discipline of data warehousing in books such as the original Data Warehousing For Dummies ® , along with articles, white papers, and as a monthly data warehousing columnist. I’ve led global consulting practices delivering data warehousing (and its related discipline, business intelligence) to some of the most recognizable brand name customers, along with smaller-sized organizations and governmental agencies. My own consulting firm, Thinking Helmet, Inc., specializes in data warehousing, business intelligence, and related disciplines. I’ve rolled up my sleeves and personally tackled every aspect of what you’ll learn in this course. I’ve even learned a few painful lessons, and have built a healthy share of “lessons learned” into the course material.
In this course, I take you from the fundamentals and concepts of data warehousing all the way through best practices for the architecture, dimensional design, and data interchange that you’ll need to implement data warehousing in your organization. You’ll find many examples that clearly demonstrate the key concepts and techniques covered throughout the course. By the end of the course, you’ll be all set to not only put these principles to work, but also to make the key architecture and design decisions required by the “art” of data warehousing that transcend the nuts-and-bolts techniques and design patterns.
Specifically, this course will cover:
Foundational data warehousing concepts and fundamentals
The symbiotic relationship between data warehousing and business intelligence
How data warehousing co-exists with data lakes and data virtualization
Your many architectural alternatives, from highly centralized approaches to numerous multi-component alternatives
The fundamentals of dimensional analysis and modeling
The key relational database capabilities that you will put to work to build your dimensional data models
Different alternatives for handling changing data history within your environment, and how to decide which approaches to apply in various situations
How to organize and design your Extraction, Transformation, and Loading (ETL) capabilities to keep your data warehouse up to date
Data warehousing is both an art and a science. While we have developed a large body of best practices over the years, we still have to make this-or-that types of decisions from the earliest stages of a data warehousing project all the way through architecture, design, and implementation. That’s what I’ve instilled into this course: the fusion of data warehousing art and science that you can bring to your organization and your own work. So come join me on this journey through the world of data warehousing!
- A business analyst, data engineer, or database designer, currently with little or no exposure to or experience with data warehousing, who desires to build a personal toolbox of data warehousing best practices and techniques.
- After completing this course, you will be ready to begin working on real-world data warehousing projects, either with expanded responsibilities as part of an existing role or to find a new position involving data warehousing. Example positions include data warehousing architect, dimensional data modeler, ETL architect and designer, and data warehousing business analyst.
Who we are and why we created the course, and a first look at what you'll get out of the course.
A detailed overview of what the course will cover, and how to follow along with each module to get the most out of the course.
Data-driven decision making requires integrated data from around your enterprise, and even from external data providers. Data warehousing helps integrate the data to better focus the majority of your work on analysis and decision making rather than repeatedly gathering, consolidating, and cleansing data. Data warehousing is here to stay, and an understanding of this discipline is essential to data-driven decision making.
Brief introduction to and overview of what you will learn in this section.
Data warehouses have been around for a while, but the entire discipline continues to evolve. This first content video sets the stage for all that follow by clearly defining and describing what a data warehouse is.
Data warehouses provide a place to integrate and consolidate data from many applications and systems, both within the enterprise and from the outside. Once this integration is accomplished in an architecturally sound manner, business intelligence and analytics become "easier" through the consolidated and organized data.
Data virtualization is a long-standing "offshoot" of data warehousing that plays an important role in many organizations for certain use cases. The two related disciplines are compared side-by-side, including guidance on situations for which each is best suited.
The concept of a data warehouse versus a data mart can be confusing because no official definitions exist, and the demarcation can be fluid. Further data marts are often built as part of an overall data warehousing environment rather than a centralized data warehouse. The terms are clarified and guidance presented.
Various data mart-based component architectures (versus a centralized DW) can be built, with data marts placed in front of and/or behind a data warehouse...or possibly used in place of a data warehouse. The various options are presented along with tradeoffs and guidance.
In addition to a relational database foundation for data warehousing, we often use "cubes" or specialized multi-dimensional databases for part of our overall environment. The basics are presented and discussed in concert with relational database components.
An operational data store (ODS) is a specialized data warehouse dedicated to operational business intelligence. Some of the rules and guidelines for an ODS are different than a DW, and these are described and built into business use cases.
"Peeling the lid back" on a data warehouse (or data mart) we find different sections or layers, including a "staging layer" where data first lands upon entering the DW environment. We describe the staging layer and various options we have for our architecture.
Extraction, transformation, and loading (ETL) is the primary mechanism to bring data into the DW...but for big data environments (e.g., data lakes), ELT is the dominant paradigm. The two are described to help ensure that the viewer is clear on the mechanics of each for data warehouses and data lakes, respectively.
Traditionally, data warehouses are used for dimensional ly analyzing data, and dimensional analysis remains the backbone of business intelligence. Increasingly, data warehouses also support various data mining capabilities such as predictive analytics. Determining how a DW will be used is critical to deciding how to structure its data.
Dimensional analysis presents users with key measurements that are "sliced and diced" by various dimensions. These key concepts are critical to being able to effectively and correctly design our data warehouse's contents.
A fact is not the same as a fact table, nor is a dimension the same as a dimension table. All four of these key concepts and terms are clearly explained.
Some facts are "additive" meaning that they can be added, while others are either non-additive or semi-additive. The three classes of additivity are discussed in the context of why each is important for various aspects of our data warehousing data design.
The backbone of dimensional analysis is the star schema, but sometimes we extend the star structure into a "snowflake" by expanding our dimensions into multiple tables along a given hierarchy. All of these options are clearly explained with detailed examples.
Data warehouse designers and data modelers have several options with how to link data among fact tables and dimension tables: using either surrogate keys or natural keys. Even if one selects surrogate keys (a best practice), natural keys play an important role in overall data design. Both concepts are introduced and explained in detail.
Dimension tables provide the context for key data warehouse subject areas: customer, product, vendor, employee, location, etc. In a star schema, we design dimension tables according to a specific set of rules; while in a snowflake schema, we follow a different set of rules. Examples for both will guide the presentation and discussion.
Each of the four main types of fact tables is introduced, along with a clear description of each type's role. This video leads into the four that follow, in which each of the four are described.
The periodic snapshot fact table can be an easier-to-access summarization of contents from a transaction fact table, or it can represent snapshots of data that otherwise cannot be captured in transactions. Both uses of periodic snapshot fact tables are demonstrated.
Periodic snapshot fact tables are built around a unique form of fact, called a semi-additive fact. Understanding both the structure of these types of tables, as well as the rules that govern what we can and can't do arithmetically with those special facts, is critical to building out our dimensional models.
Though a "factless fact table" sounds like an oxymoron, because a fact is not the same as a fact table, we can actually build fact tables for several specific uses in which we have no facts (i.e., measurements). Clear examples demonstrate the usage of factless fact tables.
Whereas dimension tables are structured very differently in a star schema versus a snowflake schema, we structure our fact tables the same regardless of star versus snowflake. Examples of fact tables in both schemas are shown side-by-side to highlight this point.
The "Type 2 SCD" is the "backbone" of managing and maintaining history within a data warehouse. Though the basics are easy to understand, significant complications occur which requires careful usage of surrogate and natural keys. Clear examples linking fact tables, dimension tables, surrogate and natural keys, and BI/reporting usage of Type 2 data present the critical concepts.
An extension to the basic Type 2 SCD model helps us maintain and be able to report and analyze the correct historical sequence of our data. Two different approaches can be used, and both are discussed along with tradeoffs...and a hybrid model that makes use of both techniques together.
Depending on whether cloud or on-premises hosting is selected, architecture and design implications need to be considered. For example, certain cloud-based database platforms may represent dimensionality differently than "classic" relational databases with fact and dimension tables. Checklists for what to focus on are presented.