
Master the foundations of etl and data warehouses, including star schema, explore etl testing across extract, transform, and load, and compare etl with elt for cloud data warehouses.
Learn how a data warehouse consolidates heterogeneous data sources—from MySQL, MongoDB, flat files, and CSV—through ETL to enable reporting and business intelligence decisions.
Learn how a data warehouse centralizes data for analysis and how ETL extracts, transforms, and loads diverse sources, cleans data, and consolidates records.
Explore how etl tools like Pentaho extract data from csv and databases, transform with rules, and load into a data warehouse, emphasizing etl testing and data models.
Discover cloud data warehouses as scalable, on-demand, secure centralized stores that serve as one source of truth, with ELT loading data directly into the warehouse and transforming there.
Understand how fact tables store measures and foreign keys to dimension tables, how dimension tables provide context, and how star schema enables fast queries by joining only necessary tables.
Plan an end-to-end etl project by extracting from mysql and csv, apply dedup and city mapping rules, trim emails, compute total amount, and load into a star-schema data warehouse.
Create a source mysql database and csv data, load into a staging database, transform with pentaho in an etl pipeline, and build a star schema data warehouse for end-to-end testing.
Build an end-to-end etl pipeline by creating source, staging, and data warehouse schemas, including star schema tables, loading data into source, and implementing etl transformations with ready-to-run sql queries.
Explore how transformation and job components form an end-to-end ETL pipeline in Pentaho, executing sequentially to extract data from source tables, stage it, transform, and load into a data warehouse.
Learn how to extract data with the ETL extract feature, filtering by the latest staging timestamp to avoid duplicates, using shared connections and get and set variables.
Test delta extraction in an ETL flow by validating null and empty max load timestamps, using a JavaScript step to default to 1970, and verify data load behavior.
Learn how to perform delta load testing for ETL by validating staging against source, using max timestamp logic to prevent duplicates, and wiring jobs and transformations for end-to-end testing.
Create a separate transformation to clean staging data by removing duplicates, using order id and descending order to preserve the latest. Preview results and note changes aren’t written back yet.
Demonstrates updating records by removing beauty category, filtering rows, and applying string operations like email trimming in a data warehouse ETL workflow, with a QA testing focus.
Learn to test etl transformation logic with sql queries, covering data quality and transformation tests, including null city handling, duplicate checks by order id, and zero quantity or null emails.
Design a data warehouse with a star schema, creating a fact table and product and customer dimensions, then load staging data via an etl pipeline.
Learn how to load data into the prod dimension with an ETL tool, using insert and update logic on the dim product table to avoid duplicates and update categories.
Learn how to load data into the customer dimension table with etl tool operations, using email-based insert/update logic and fields like customer name, city, email, and source city.
Loads a fact table by joining prod and customer dimension data with staging through ETL lookups, extracting customer and product keys, and preparing for testing.
design and run an end-to-end etl pipeline that extracts from mysql and csv, transforms data, and loads into a star-schema data warehouse with product and customer dimensions and fact table.
Explore etl testing scenarios from delta testing in extraction to end-to-end data warehouse validation, including transformation logic and dimension checks for city and product data.
Learn functional ETL testing by verifying end-to-end data flow from source to data warehouse, using left joins to compare source and fact tables and identify missing records by order ID.
Master ETL testing and data warehouse fundamentals by validating data consistency across source, staging, and fact tables with associated queries and data quality checks on order records.
Verify fact and dimension relationships; detect orphan keys and duplicates via left joins to product and customer tables for etl tests. Preview how slowly changing dimensions will be covered next.
Learn how slowly changing dimensions track changes in dimension attributes, using type 1 for direct replacements and type 2 to preserve history with new rows and dates.
Learn ETL testing for slowly changing dimension type 2, validating two records, current flags, and correct row creation in the dimension table using SQL queries.
Learn how to use group by and aggregate functions in SQL to compute monthly and branch totals, apply where and having filters, and extract max, min, avg, and counts.
Master practical sql joins and group by for etl testing and data warehouse fundamentals, with inner, left, and right join examples merging two tables and counting employees by location.
A hands-on tutorial that takes you from the ground up and gives you a solid understanding of Data Warehouse and ETL Testing concepts.
What will you learn from this course?
Learn why and where ETL is required with a real-time business problem.
Understand the fundamentals of Data Warehousing and common data models such as Star Schema.
Gain a complete architectural overview of how ETL works with a Data Warehouse.
Get an overview of popular ETL tools used in the industry.
Build a real-time ETL project from scratch using Pentaho Data Integration (PDI) tool.
Understand the scope of ETL testing at each layer of the pipeline with practical examples.
Learn how to build ETL test scenarios and validate them using SQL queries.
Write test cases for advanced concepts such as Slowly Changing Dimensions (SCDs).
Explore Cloud Data Warehouses and how ETL/ELT fits in modern data stacks.
Understand the differences between ETL vs ELT and where each is applicable.
Discover the critical role of ETL data quality testing in training Large Language Models (LLMs) — ensuring reliable and accurate data pipelines is a key foundation for any AI/ML system.
Learn how bad data quality can lead to hallucinations, bias, and inaccurate results in LLM outputs, and why robust ETL testing is crucial before model ingestion.
Prerequisites:
Basic knowledge of SQL (Insert, Update, Delete).
Core SQL concepts such as Joins, Group By, and Subqueries are used frequently in ETL test scenarios.
A refresher on these SQL topics is available in the last section of the course — recommended for those who need it.