Building a Data Mart with Pentaho Data Integration

A step-by-step tutorial that takes you through the creation of an ETL process to populate a Kimball-style star schema
4.0 (3 ratings)
90 students enrolled
$85
  • Lectures: 25
  • Contents: Video, 2 hours
  • Skill Level: All Levels
  • Languages: English
  • Includes: Lifetime access, 30-day money back guarantee, available on iOS and Android, certificate of completion


About This Course

Published 12/2015 English

Course Description

Companies store a lot of data, but in most cases, it is not available in a format that makes it easily accessible for analysis and reporting tools. Ralph Kimball realized this a long time ago, so he paved the way for the star schema.

Learning Pentaho Data Integration walks you through building an ETL process to populate a data mart for a fictional company. This course will show you, step by step, how to source the raw data and prepare it for the star schema. The practical approach of this course will get you up and running quickly and will explain the key concepts in an easy-to-understand manner.

Learning Pentaho Data Integration teaches you how to source raw data with Pentaho Kettle and transform it so that the output can be a Kimball-style star schema. After sourcing the raw data with our ETL process, you will quality-check the data using an agile approach. Next, you will learn how to load slowly changing dimensions and the fact table. The star schema will reside in a column-oriented database, so you will learn about bulk loading the data whenever possible. You will also learn how to create an OLAP schema and easily analyze the output of your ETL process.
By covering all the essential topics in a hands-on manner, this course will put you in a position to create your own ETL processes within a short span of time.

What are the requirements?

  • You need to have a basic understanding of star schemas and Pentaho Data Integration to take the next step: putting everything into practice.

What am I going to get from this course?

  • Create a star schema
  • Populate and maintain slowly changing dimensions type 1 and type 2
  • Load fact and dimension tables in an efficient manner
  • Use a columnar database to store the data for the star schema
  • Analyze the quality of the data in an agile manner
  • Implement logging and scheduling for the ETL process
  • Get an overview of the whole process: from source data to the end user analyzing the data
  • Learn how to auto-generate data for a date dimension

What is the target audience?

  • If you are eager to learn how to create an ETL process to populate a star schema, and at the end of the course you want to be in a position to apply your new knowledge to your specific business requirements, then Learning Pentaho Data Integration is for you.


Curriculum

Section 1: Getting Started
06:49

Get an insight into the raw data, which we will be working with in this video tutorial.

04:29

Create a Star Schema derived from the raw data.
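
The raw data and the resulting schema are specific to the course's fictional company; purely as a generic illustration of what a Kimball-style star schema looks like, here is a minimal sketch using Python's built-in sqlite3, with a sales fact table and date, customer, and product dimensions whose names are invented for this example.

```python
import sqlite3

# Minimal, generic star schema: one fact table referencing three dimensions.
# All table and column names are invented; the course derives its own schema
# from the tutorial's raw data.
ddl = """
CREATE TABLE dim_date (
    date_tk     INTEGER PRIMARY KEY,   -- surrogate (technical) key, e.g. 20151231
    the_date    TEXT,
    year        INTEGER,
    month       INTEGER,
    day         INTEGER
);

CREATE TABLE dim_customer (
    customer_tk INTEGER PRIMARY KEY,   -- surrogate key
    customer_id TEXT,                  -- natural/business key from the source
    name        TEXT,
    city        TEXT
);

CREATE TABLE dim_product (
    product_tk  INTEGER PRIMARY KEY,
    product_id  TEXT,
    name        TEXT,
    category    TEXT
);

CREATE TABLE fact_sales (
    date_tk     INTEGER REFERENCES dim_date(date_tk),
    customer_tk INTEGER REFERENCES dim_customer(customer_tk),
    product_tk  INTEGER REFERENCES dim_product(product_tk),
    quantity    INTEGER,
    amount      REAL                   -- additive measures live on the fact table
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
print([r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")])
```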

07:07

We will create the required databases for our project, add JDBC drivers to PDI, and create JNDI connections.

Section 2: Agile BI – Creating ETLs to Prepare Joined Data Set
03:22

Create an ETL transformation that imports your raw data so that you can apply further manipulations downstream and output the data to the data mart.
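
In PDI this is roughly a "CSV file input" (or "Table input") step feeding a "Table output" step; purely to illustrate that flow, here is a small Python sketch that lands a raw CSV extract in a staging table. The file name, columns, and sample rows are all invented for this sketch.

```python
import csv
import sqlite3

# Illustrative only: write a tiny sample extract, then load it into a staging
# table, which is the essence of an "import raw data" transformation.
with open("raw_sales.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sale_id", "sale_date", "customer_id", "product_id", "amount"])
    writer.writerows([[1, "2015-01-05", "C1", "P1", 19.99],
                      [2, "2015-01-06", "C2", "P1", 39.98]])

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stg_sales (
    sale_id INTEGER, sale_date TEXT, customer_id TEXT,
    product_id TEXT, amount REAL)""")

with open("raw_sales.csv", newline="") as f:
    rows = [(r["sale_id"], r["sale_date"], r["customer_id"],
             r["product_id"], r["amount"]) for r in csv.DictReader(f)]

conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
print("loaded", len(rows), "raw rows into stg_sales")
```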

04:33

We will learn how to easily make sure that the data types of the ETL output step are in sync with the database table column types.

04:32

Loading huge amounts of data in the traditional way takes too long; speed it up by using the bulk loader.
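
Columnar databases usually ship a dedicated bulk-load utility, and PDI provides bulk-loader steps for several databases; which one applies depends on the database used in the course. As a rough, stand-in illustration of why batching beats one-statement-per-row loading, here is a small Python timing sketch against SQLite (not a columnar engine, but the principle is the same).

```python
import sqlite3
import time

# Stand-in comparison: per-row inserts vs. a batched load. A real bulk loader
# (e.g. a columnar database's COPY/LOAD path) gives a far bigger win.
rows = [(i, f"customer_{i}", i * 1.5) for i in range(100_000)]

def load_row_by_row(conn):
    for r in rows:
        conn.execute("INSERT INTO t VALUES (?, ?, ?)", r)
    conn.commit()

def load_batched(conn):
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    conn.commit()

for name, loader in [("row by row", load_row_by_row), ("batched", load_batched)]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT, amount REAL)")
    start = time.perf_counter()
    loader(conn)
    print(f"{name:12s}: {time.perf_counter() - start:.2f} s")
```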

Section 3: Agile BI – Building OLAP Schema, Analyzing Data, and Implementing Required ETL Improvements
03:25

In this first step to Agile ETL development, you will learn how to create a Pentaho Analysis Model so that you can analyze the data later on in Pentaho Analyzer.

03:49

A very important point is to understand the quality of the data: are there any duplicates, misspellings, and so on? We will find such problems and feed this new knowledge back into the ETL design.
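
The course profiles the data interactively in Pentaho Analyzer, but the underlying checks boil down to simple aggregations. A minimal Python sketch of the idea, with records and a column name invented for the example:

```python
from collections import Counter

# Illustrative data-quality checks on an in-memory sample of the joined data.
# The records and the "city" column are invented for this sketch.
records = [
    {"customer_id": "C1", "city": "Hamburg"},
    {"customer_id": "C1", "city": "Hamburg"},    # exact duplicate
    {"customer_id": "C2", "city": "Hamburgh"},   # likely misspelling
    {"customer_id": "C3", "city": "Berlin"},
]

# 1) Exact duplicates: count identical rows.
dupes = {row: n for row, n in Counter(
    tuple(sorted(r.items())) for r in records).items() if n > 1}
print("duplicate rows:", len(dupes))

# 2) Suspicious spellings: list the distinct values of a column and flag
#    values that occur only once as candidates for misspellings.
city_counts = Counter(r["city"] for r in records)
print("distinct city values:", dict(city_counts))
print("seen only once:", [c for c, n in city_counts.items() if n == 1])
```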

04:15

Learn how to implement ETL improvements to iron out the data problems found.

Section 4: Slowly Changing Dimensions
06:47

Learn how to populate a simple dimension.

04:58

Learn how to populate a simple dimension and make it future-proof.
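
Making the dimension future-proof typically means handling changed source attributes as well as new members; in Kimball terms the simplest policy is slowly changing dimension type 1, where new members get a fresh surrogate key and changed attributes are overwritten in place. A minimal Python sketch of that logic, with invented tables and columns:

```python
import sqlite3

# Sketch of SCD type 1 maintenance: insert unknown business keys with a new
# surrogate key, overwrite the attributes of known ones. Tables are invented.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_customer (
    customer_tk INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id TEXT UNIQUE,   -- natural key from the source system
    name TEXT, city TEXT)""")

def upsert_type1(customer_id, name, city):
    row = conn.execute(
        "SELECT customer_tk FROM dim_customer WHERE customer_id = ?",
        (customer_id,)).fetchone()
    if row is None:                                   # new member: insert
        conn.execute(
            "INSERT INTO dim_customer (customer_id, name, city) VALUES (?, ?, ?)",
            (customer_id, name, city))
    else:                                             # known member: overwrite
        conn.execute(
            "UPDATE dim_customer SET name = ?, city = ? WHERE customer_id = ?",
            (name, city, customer_id))

upsert_type1("C1", "Ann Smith", "Berlin")
upsert_type1("C1", "Ann Smith", "Hamburg")   # city is overwritten, history is lost
print(conn.execute("SELECT * FROM dim_customer").fetchall())
```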

05:18

Learn how to keep historic versions in your dimension table.
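
Keeping history is slowly changing dimension type 2: instead of overwriting, the current row is closed (its validity end date is set) and a new row with a fresh surrogate key and version number is inserted. PDI typically handles this with its Dimension lookup/update step; here is a rough Python sketch of the bookkeeping involved, with invented tables and columns.

```python
import sqlite3

# Sketch of SCD type 2: close the current version and insert a new one.
# Tables and columns are invented for this illustration.
FAR_FUTURE = "2199-12-31"

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_customer (
    customer_tk INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id TEXT, name TEXT, city TEXT,
    version INTEGER, date_from TEXT, date_to TEXT)""")

def upsert_type2(customer_id, name, city, load_date):
    current = conn.execute(
        """SELECT customer_tk, name, city, version FROM dim_customer
           WHERE customer_id = ? AND date_to = ?""",
        (customer_id, FAR_FUTURE)).fetchone()
    if current and (current[1], current[2]) == (name, city):
        return                                   # nothing changed
    version = 1
    if current:                                  # close the current version
        conn.execute("UPDATE dim_customer SET date_to = ? WHERE customer_tk = ?",
                     (load_date, current[0]))
        version = current[3] + 1
    conn.execute(                                # insert the new version
        """INSERT INTO dim_customer
           (customer_id, name, city, version, date_from, date_to)
           VALUES (?, ?, ?, ?, ?, ?)""",
        (customer_id, name, city, version, load_date, FAR_FUTURE))

upsert_type2("C1", "Ann Smith", "Berlin", "2015-01-01")
upsert_type2("C1", "Ann Smith", "Hamburg", "2015-06-01")   # new version, old one kept
for row in conn.execute("SELECT * FROM dim_customer"):
    print(row)
```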

Section 5: Populating the Date Dimension
05:17

To make our date dimension transformation more dynamic, we will allow users to define a start and end date to specify the period.

04:26

Based on the provided parameters, the number of days between the start and end date will be calculated. This figure will be used to generate a data set with the same number of rows.

06:27

In this part, you will learn how to derive various date attributes, such as year, week, and day, from the input date.
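
Taken together, the three steps in this section come down to: take a start and end date as parameters, work out how many days lie between them, generate that many rows, and derive the attributes for each day. In PDI this is typically built from row-generation and calculator-style steps; a compact Python sketch of the same logic, with invented column names:

```python
from datetime import date, timedelta

def build_date_dimension(start: date, end: date):
    """Generate one row per day between start and end (inclusive)."""
    n_days = (end - start).days + 1       # row count derived from the parameters
    rows = []
    for offset in range(n_days):
        d = start + timedelta(days=offset)
        rows.append({
            "date_tk": int(d.strftime("%Y%m%d")),   # e.g. 20151231 as surrogate key
            "the_date": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "week": d.isocalendar()[1],              # ISO week number
            "day": d.day,
            "weekday": d.strftime("%A"),
        })
    return rows

# Example: the user defines the period via the two parameters.
for row in build_date_dimension(date(2015, 1, 1), date(2015, 1, 7)):
    print(row)
```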

Section 6: Creating the Fact Transformation
03:52

Learn how to efficiently create an input query for your fact transformation.

04:28

Learn how to configure the step to look up the SCD type 1 keys.

06:08

Learn how to configure the step to look up the SCD type 2 keys.
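
Together with the input query from the previous step, the core of the fact load is swapping each business key in the source rows for the surrogate key of the matching dimension row; for a type 2 dimension the lookup also has to respect the validity dates, so that the transaction date falls inside the version's date range. A rough Python sketch of both lookups, with invented tables and data:

```python
import sqlite3

# Sketch of the key lookups done while loading the fact table.
# dim_product is type 1 (one row per business key); dim_customer is type 2
# (several versions per business key, delimited by date_from/date_to).
# All tables, columns, and rows are invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_tk INTEGER, product_id TEXT);
CREATE TABLE dim_customer (customer_tk INTEGER, customer_id TEXT,
                           date_from TEXT, date_to TEXT);
INSERT INTO dim_product  VALUES (10, 'P1');
INSERT INTO dim_customer VALUES (1, 'C1', '2015-01-01', '2015-06-01'),
                                (2, 'C1', '2015-06-01', '2199-12-31');
""")

def lookup_type1(product_id):
    row = conn.execute("SELECT product_tk FROM dim_product WHERE product_id = ?",
                       (product_id,)).fetchone()
    return row[0] if row else None

def lookup_type2(customer_id, sale_date):
    # The version that was valid at the time of the transaction.
    row = conn.execute(
        """SELECT customer_tk FROM dim_customer
           WHERE customer_id = ? AND date_from <= ? AND ? < date_to""",
        (customer_id, sale_date, sale_date)).fetchone()
    return row[0] if row else None

sale = {"product_id": "P1", "customer_id": "C1", "sale_date": "2015-07-15"}
fact_keys = (lookup_type1(sale["product_id"]),
             lookup_type2(sale["customer_id"], sale["sale_date"]))
print("surrogate keys for the fact row:", fact_keys)   # -> (10, 2)
```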

Section 7: Orchestration
06:20

In our setup, dimensions can be loaded in parallel; therefore, we can create an ETL job that runs the dimension load transformations side by side.
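
As a toy illustration of why that helps, here is a Python sketch in which three placeholder dimension loads run concurrently in a thread pool instead of one after another (in a real PDI job you would configure the job entries to launch in parallel).

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Placeholder "dimension loads" standing in for the real transformations.
def load_dimension(name):
    time.sleep(1)               # pretend this is the actual load
    return f"{name} loaded"

dimensions = ["dim_date", "dim_customer", "dim_product"]

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    for result in pool.map(load_dimension, dimensions):
        print(result)
print(f"all dimensions loaded in {time.perf_counter() - start:.1f} s "
      "(about 1 s rather than 3 s, because the loads ran in parallel)")
```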

04:09

We will create the main job, which runs all the required child jobs and transformations.

Section 8: ID-based Change Data Capture
04:58

In this section, you will learn how new data can be automatically loaded into the data mart using the Change Data Capture (CDC) approach.
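
ID-based CDC relies on the source rows carrying a strictly increasing ID: after each run you remember the highest ID you loaded, and the next run only pulls rows above it. A minimal Python sketch of that bookkeeping, with invented tables and columns:

```python
import sqlite3

# Sketch of ID-based change data capture: only rows with an ID greater than
# the last loaded ID are pulled. Tables and columns are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_sales (sale_id INTEGER, amount REAL);
CREATE TABLE cdc_state (table_name TEXT PRIMARY KEY, max_id INTEGER);
INSERT INTO src_sales VALUES (1, 10.0), (2, 20.0), (3, 30.0);
INSERT INTO cdc_state VALUES ('src_sales', 0);
""")

def load_new_rows():
    last = conn.execute(
        "SELECT max_id FROM cdc_state WHERE table_name = 'src_sales'"
    ).fetchone()[0]
    new_rows = conn.execute(
        "SELECT sale_id, amount FROM src_sales WHERE sale_id > ? ORDER BY sale_id",
        (last,)).fetchall()
    if new_rows:                        # remember the new high-water mark
        conn.execute("UPDATE cdc_state SET max_id = ? WHERE table_name = 'src_sales'",
                     (new_rows[-1][0],))
    return new_rows

print("first run :", load_new_rows())   # all three rows
conn.execute("INSERT INTO src_sales VALUES (4, 40.0)")
print("second run:", load_new_rows())   # only the new row
```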

04:48

We will define the order of execution for all the transformations involved.

Section 9: Final Touches: Logging and Scheduling
01:22

We will create a dedicated environment for logging.

04:22

Pentaho Kettle features built-in logging. You will learn how to configure it.

05:30

Learn how to schedule a daily run of your ETL process.


Instructor Biography

Packt Publishing, Tech Knowledge in Motion

Over the past ten years Packt Publishing has developed an extensive catalogue of over 2000 books, e-books and video courses aimed at keeping IT professionals ahead of the technology curve. From new takes on established technologies through to the latest guides on emerging platforms, topics and trends – Packt's focus has always been on giving our customers the working knowledge they need to get the job done. Our Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.
