Data Quality & Profiling with ETL Pentaho DI & DataCleaner
3.4 (5 ratings)
59 students enrolled
Explore how to improve data quality by profiling, cleansing, and automating the DQM process with ETL and cleansing tools
Created by Rajkumar V
Last updated 7/2016
English
Current price: $10 Original price: $30 Discount: 67% off
30-Day Money-Back Guarantee
Includes:
  • 1.5 hours on-demand video
  • 4 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Understand the key terms in Data Quality
  • Understand the basics of Pentaho Data Integration (PDI) and the DataCleaner tool
  • Understand how to install and use the tools to create data profiling and cleansing jobs
  • Understand how to integrate DataCleaner with PDI
  • Understand how to automate the key phases of a Data Quality Management (DQM) framework
  • Practice, in several hands-on sessions, how to implement data quality jobs and data profiling jobs and how to schedule them
Requirements
  • You should have some exposure to Linux and basic shell commands
Description

Learn the key terminology, basic concepts, and implementation techniques you need to build fully functional data quality implementations with the popular ETL tool Pentaho Data Integration and the data quality tool DataCleaner. Comprehensive hands-on sessions show how to improve data quality by profiling, cleansing, and automating the DQM process with these tools.

The concepts learned in this course can also be applied in other ETL or data quality assurance tools.

Take care of the data to take care of the business
Learning the fundamentals of data quality and understanding how data quality tasks are implemented is essential for any business that wants to keep up in a competitive world. Ensuring data quality helps protect your business or projects from losses.

There are plenty of opportunities in the data domain, and appreciating the importance of data quality will give you the confidence to tackle the challenges you encounter while handling data of any volume and format.

Content and Overview

Through this 8-session course, with 43 lectures and 103 minutes of content, you will learn the key terms in data quality, the basics of Pentaho Data Integration and DataCleaner, how to install the tools, and how to automate the key data quality and data profiling tasks through a series of demo sessions.

You can test the knowledge gained in the sessions by taking the quizzes, and every use case mentioned in the course is explained with a demo session, so you can practice the newly learned skills.

Downloadable Resources

You can also download the files used in the demo sessions (available in the last session of this course) to practice on your own.

Learners who complete this course will have the knowledge and confidence to implement fully functional, automated data quality solutions in their projects.

Who is the target audience?
  • Developers in the ETL, database, data warehouse, and BI domains
  • Testers in the ETL, database, data warehouse, and BI domains
  • Automation builders
  • Data governance professionals
  • Quality analysts
  • Beginners who are not familiar with data quality, PDI, or DataCleaner but are willing to learn about these areas
Curriculum For This Course
43 Lectures
01:42:59
What does this Course cover?
3 Lectures 07:53

A brief introduction to the target audience and how one could benefit from this course.

Preview 02:14

This lecture explains the importance of data quality in any business domain and the consequences of not ensuring it.

Why to bother about Data Quality?
01:57

This lecture briefly lists what the other lectures in the course will cover.

Preview 03:42
Building the Foundation
1 Lecture 10:29

At the end of this lecture, students will be able to list and define some of the key terms in the data quality domain.

Key Terminologies and Concepts in Data Quality
10:29
Tools Introduction And Installation
6 Lectures 13:01

This lecture explains

  1. The advantages of using a tool for data quality tasks
  2. Why Pentaho Data Integration and DataCleaner are applied in this area
Why to use a tool for DQA Tasks?
02:24

As part of this lecture, students will get to know the key building blocks of PDI.

Preview 03:53

As part of this lecture, students will get to know the key building blocks of DataCleaner.

Introduction to DataCleaner
02:13

This lecture shows how to install the PDI community edition and verify the installation.

Installation of PDI
00:44

This lecture shows how to install the DataCleaner community edition and verify the installation.

Installation of DataCleaner in Stand-alone mode
01:04

This lecture shows how to install DataCleaner and integrate it with PDI.

Installation of DataCleaner as Plug-in with PDI
02:43
Working with DataCleaner
12 Lectures 18:49

This lecture lists the topics covered in this section, which include various features of the DataCleaner tool and how to use them, shown through a series of hands-on demos.

Section Outline
00:55

This DataCleaner demo session shows

  • The options available for connecting to various data sources
  • How to connect to data sources
  • How to list the configured data sources
  • How to edit and remove saved datastores
Preview 01:39

This demo session focuses on how to design a job with the DataCleaner tool.

Designing a Job with DataCleaner
02:08

This demo session focuses on how to preview or execute a DataCleaner job.

Previewing or Executing a DataCleaner Job
00:55

This demo session focuses on how to store the cleansed data with the DataCleaner tool for immediate or later use.

Preserving Cleansed data with DataCleaner
00:59

This demo session focuses on the various command-line options available with the DataCleaner tool.

Command line options
01:34

This demo session focuses on how to execute a job from the command line.

Executing a DataCleaner Job in Command line
00:51

This demo session covers

  1. How to schedule a DataCleaner job
  2. How the same idea can be used to schedule the job on various operating systems

Scheduling a DataCleaner Job
01:07

This demo session covers

  1. How to design a parameter-enabled job
  2. How to execute the parameter-enabled job from the command line
Parameterizable DataCleaner Jobs
03:21
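
As a rough illustration of the command-line and parameterized execution described in the lectures above, the Python sketch below assembles a DataCleaner console invocation from a job file, a configuration file, and job variables. The `-job`, `-conf`, and `-var` options come from the DataCleaner command-line documentation; the launcher path, the file names, and the `country` variable are hypothetical placeholders, not values used in the course.

```python
import shlex


def build_datacleaner_command(script, job_file, conf_file=None, variables=None):
    """Return the argument list for running a DataCleaner job from a shell.

    `script` is the path to the DataCleaner launcher (assumed here to be
    datacleaner.sh on Linux). Nothing is executed by this function.
    """
    args = [script, "-job", job_file]
    if conf_file:
        args += ["-conf", conf_file]
    # Each job variable is passed as its own "-var name=value" pair.
    for name, value in sorted((variables or {}).items()):
        args += ["-var", f"{name}={value}"]
    return args


# Hypothetical paths and variable, for illustration only.
command = build_datacleaner_command(
    "/opt/datacleaner/datacleaner.sh",
    "customers.analysis.xml",
    conf_file="conf.xml",
    variables={"country": "DK"},
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run` once the paths point at a real installation; building the arguments separately keeps the same job file reusable with different variable values.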

This demo session covers

  1. The various configuration files
  2. How to configure DataCleaner using the conf.xml file
Configuration in DataCleaner
01:28

After this lecture, students will know

  1. The default log location
  2. The default log configuration file location
  3. How to change the logging path to a custom location
Logging in DataCleaner
01:29

This lecture covers

  1. The various logging options in PDI
  2. How to configure and use the various logging features in PDI
Logging in PDI
02:23
Integrating DataCleaner with PDI
5 Lectures 08:19
Section Outline
00:37

As part of this demo session, students will learn how to design a job in PDI using the Spoon GUI and save it.

Additional resources on various PDI job entries are attached to this lecture.

Creating a Job with PDI
03:09

As part of this demo session, students will learn how to design a transformation in PDI using the Spoon GUI and save it.

Additional resources on various PDI transformation steps, along with instructions for configuring database connectivity settings, are attached to this lecture.

Creating a Transformation with PDI
01:36

As part of this demo session, students will learn how to execute a job in PDI using the Spoon GUI.

Executing a DataCleaner Job in PDI
01:53

As part of this demo session, students will learn

  1. How to schedule a PDI job using the Kitchen utility and crontab
  2. How the same idea can be extended to schedule PDI jobs on other operating systems with various schedulers
Scheduling a PDI Job
01:04
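
The scheduling idea described above can be sketched as a crontab entry that calls Kitchen. The Python snippet below only composes such an entry as text; `-file` and `-level` are standard Kitchen options, while the installation path, job file, log file, and the 02:30 daily schedule are hypothetical values chosen for illustration.

```python
def kitchen_cron_line(minute, hour, kitchen, job_file, log_file, level="Basic"):
    """Build one crontab entry that runs a PDI job every day via Kitchen."""
    command = f"{kitchen} -file={job_file} -level={level}"
    # Append stdout and stderr to a log file so scheduled runs can be reviewed.
    return f"{minute} {hour} * * * {command} >> {log_file} 2>&1"


# Hypothetical paths and schedule, for illustration only.
line = kitchen_cron_line(
    30,
    2,
    "/opt/pentaho/data-integration/kitchen.sh",
    "/home/etl/jobs/dq_checks.kjb",
    "/home/etl/logs/dq_checks.log",
)
print(line)
```

The printed line would be added with `crontab -e`; on other operating systems the same Kitchen command could be registered with whichever scheduler is available.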
Walk Through on Demo Use Cases
5 Lectures 07:30
Section Outline
01:10

Data Quality Dimensions - Use Cases - Part 1
02:22

Data Quality Dimensions - Use Cases - Part 2
01:33

Data Quality Dimensions - Use Cases - Part 3
00:46

Data Profiling and Other Use Cases
01:39
Demo
10 Lectures 34:36

Demo on Conformity
04:12

Demo on Referential Integrity
03:13

Demo on Validity
01:58

Demo on Accuracy
02:13

Demo on Duplicate check - Single field
01:34

Demo on various Data Profiling Tasks
04:00

Demo on Duplicate check - Multiple fields and Adhoc Profiling
06:44

This demo session explains

  1. How to integrate a DataCleaner job with PDI
  2. The use case for integrating a DataCleaner job with PDI
  3. How to generalize the job design so that the data quality issues observed are stored in a database table and this stored information is later used in data quality automation tasks
Demo on Integrating DataCleaner Job with PDI
07:20
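
The idea of storing observed data quality issues in a database table and reusing them later can be sketched as follows. This is a minimal Python example using an in-memory SQLite database, not the PDI job built in the course; the `dq_issue_log` table, its columns, and the sample check results are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def log_issue(conn, check_name, table_name, failed_rows):
    """Insert one data quality observation into the dq_issue_log table."""
    conn.execute(
        "INSERT INTO dq_issue_log (checked_at, check_name, table_name, failed_rows) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), check_name, table_name, failed_rows),
    )
    conn.commit()


def checks_needing_attention(conn):
    """Return (check_name, table_name, failed_rows) for checks that found failures."""
    rows = conn.execute(
        "SELECT check_name, table_name, failed_rows FROM dq_issue_log "
        "WHERE failed_rows > 0 ORDER BY failed_rows DESC"
    )
    return rows.fetchall()


# Hypothetical log table and sample observations, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dq_issue_log ("
    "checked_at TEXT, check_name TEXT, table_name TEXT, failed_rows INTEGER)"
)
log_issue(conn, "conformity", "customers", 12)
log_issue(conn, "referential_integrity", "orders", 0)
log_issue(conn, "duplicate_check", "customers", 3)

alerts = checks_needing_attention(conn)
print(alerts)
```

A later notification or reporting step only needs to query this table, which is what makes the logged observations reusable across automated runs.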

In this demo session, students will take away tips for automating a data quality solution by

  1. Executing a data quality control job using the tools
  2. Logging the observations in a table
  3. Using the persisted information for notifications and report generation
Tips on Automating Data Quality Solution
01:37
What Next?
1 Lecture 02:22

Thank-you note and attached resources, including

  1. Reference links
  2. Helpful links for seeking support
  3. Source code files and other relevant files used in the demos of this course
Ways to move forward
02:22

Pentaho Data Integration Command Line Utilities

PDI Command Line
2 questions

DataCleaner Command Line Options

DataCleaner Command Line
2 questions

Parameterizable DataCleaner Jobs
2 questions
About the Instructor
Rajkumar V
4.0 Average rating
21 Reviews
161 Students
3 Courses
Data Architect

My name is Rajkumar, and I am excited to share what I have learned from my industry experience.

With more than a decade of experience in IT, I have spent the majority of my time dealing with "DATA", including data modeling, data profiling, cleansing, data transformation, storage, retrieval, optimization, governance, mining, and reporting.

I have played various roles in my career, including Developer, Data Modeler, Tester, Project Lead, Product Consultant, Data Architect, ETL Specialist, Solution Architect, and Release Manager.

To sum up, I am absolutely passionate about anything to do with "DATA", and I am looking forward to sharing my passion and knowledge with you!