
Explore data quality measurement dimensions, including accuracy, completeness, consistency, timeliness, validity, and uniqueness, to assess data usability and maintain reliable, fit-for-use information.
Maintain data uniqueness by preventing duplicates across customer records from multiple touchpoints through strict validation and deduplication, ensuring reliable data for accurate analysis and informed decision making.
Define data quality and its business impact, and introduce the data quality measurement dimensions (accuracy, completeness, consistency, timeliness, validity, and uniqueness) in preparation for data quality testing.
Navigate a data pipeline from operational data through ETL to the data lake, warehouse, and marts, highlighting data quality testing at each stage for schema, completeness, and rules.
Define data quality testing as verifying that data meets accuracy, completeness, and consistency standards, and integrate quality checks in the data pipeline with real-time monitoring, automation, and Great Expectations.
Explore how Great Expectations Core, an open-source Python library, enables automated data quality testing within data pipelines by validating data against defined standards and safeguarding data integrity.
Explore the core building blocks of Great Expectations, from data context to docs, and learn how to write data quality tests, run validations, and track quality with a checkpoint.
Control which attributes appear in the validation result using a checkpoint in the data context hierarchy, and explore the four result formats: boolean-only, basic, summary, and complete.
Learn how to create parameterized expectations that adapt to different data sets by supplying runtime temperature ranges via a parameter object, enabling reusable checks across checkpoints.
Define parameterized expectations for temperature range and city names using variables, then run a checkpoint with a runtime object to supply values dynamically, reusing the same suite across data sets.
Define conditional expectations that evaluate the temperature range only for Mumbai records using pandas syntax. Verify the subset contains 31 Mumbai records out of 62, matching the Delhi count.
Apply set-based expectations to validate categorical data against predefined sets and exclude unwanted entries, for example by expecting column values to be in a set or not to be in a set.
Learn to build a customized SQL expectation in Great Expectations to validate data directly in the database using an unexpected row query for a temperature range.
Reload the file data context, load and reuse configurations with get_context(mode='file'), reload checkpoints, and run validations to preserve results while exploring data sources, data assets, expectations, and validation definitions.
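The conditional and set-based checks described in the lectures above can be sketched in plain Python. This is a minimal illustration of the idea only, not the Great Expectations API: the helper names (expect_values_between, expect_values_in_set), the Result class, the column names, and the temperature values are all hypothetical. Only the row counts (31 Mumbai records out of 62) follow the lecture summary.

```python
from dataclasses import dataclass


@dataclass
class Result:
    success: bool
    element_count: int   # number of rows the check actually evaluated
    unexpected: list     # values that violated the expectation


def expect_values_between(rows, column, min_value, max_value, row_condition=None):
    """Range check, optionally restricted to rows matching row_condition."""
    subset = [r for r in rows if row_condition is None or row_condition(r)]
    unexpected = [r[column] for r in subset
                  if not (min_value <= r[column] <= max_value)]
    return Result(not unexpected, len(subset), unexpected)


def expect_values_in_set(rows, column, value_set):
    """Set-based check: every value in the column must belong to value_set."""
    unexpected = [r[column] for r in rows if r[column] not in value_set]
    return Result(not unexpected, len(rows), unexpected)


# Made-up data: 31 daily readings for each of two cities (62 rows in total).
rows = [
    {"city": city, "day": day, "temp_c": base + day % 5}
    for city, base in (("Mumbai", 28), ("Delhi", 18))
    for day in range(1, 32)
]

# Conditional check: the range applies to Mumbai rows only.
mumbai = expect_values_between(
    rows, "temp_c", 25, 35, row_condition=lambda r: r["city"] == "Mumbai"
)
# The same range applied to every row fails, because Delhi is cooler.
all_rows = expect_values_between(rows, "temp_c", 25, 35)
# Set-based check on the categorical city column.
cities = expect_values_in_set(rows, "city", {"Mumbai", "Delhi"})

print(mumbai.element_count, mumbai.success, all_rows.success, cities.success)
```

The conditional check evaluates only the 31 Mumbai rows, which is why it passes while the unconditional version of the same range check does not.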
Data Quality Testing Unleashed: From Theory to Implementation is your comprehensive roadmap to mastering Data Quality Testing using Python and the powerful Great Expectations framework. It is designed for those who want to elevate their data projects by ensuring high-quality and reliable data. This course takes you from foundational principles to hands-on implementation.
In this course, we'll explore:
Fundamentals of Data Quality & Testing: Discover the core principles that underpin data quality and testing, with a focus on critical dimensions like accuracy, completeness, and consistency. You’ll understand how these elements contribute to trustworthy, dependable data.
Introduction to the Great Expectations Framework: Gain proficiency with Great Expectations, a widely used open-source tool for data validation, documentation, and profiling. The framework lets you define and enforce data standards so that data is checked against explicit quality benchmarks.
The Building Blocks of Great Expectations: Uncover the core components of Great Expectations, learning how to structure workflows that bring them to life. You’ll dive into the extensive expectations library, equipping yourself with versatile tools to meet diverse data validation needs.
Hands-On Data Quality Testing: With a focus on practical application, this course will guide you through creating multiple testing workflows. You’ll learn how to publish results, automate actions based on test outcomes, and build experience in efficiently managing data quality testing in real-world scenarios.
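As a rough picture of the workflow covered in the hands-on part, the sketch below runs one reusable suite with parameters supplied at run time and then triggers an action based on the outcome. It is a plain-Python illustration under stated assumptions, not the Great Expectations API: run_checkpoint, temp_in_range, alert_on_failure, the "$PARAMETER" placeholder convention used here, and the sample readings are all hypothetical.

```python
def run_checkpoint(rows, suite, parameters, actions):
    """Resolve parameter placeholders, run every check, then trigger actions."""
    results = []
    for check in suite:
        # Replace {"$PARAMETER": name} placeholders with run-time values.
        kwargs = {
            key: parameters[value["$PARAMETER"]]
            if isinstance(value, dict) and "$PARAMETER" in value
            else value
            for key, value in check["kwargs"].items()
        }
        unexpected = [r for r in rows if not check["test"](r, **kwargs)]
        results.append({
            "name": check["name"],
            "success": not unexpected,
            "unexpected_count": len(unexpected),
        })
    summary = {"success": all(r["success"] for r in results), "results": results}
    for action in actions:
        action(summary)
    return summary


def temp_in_range(row, min_temp, max_temp):
    return min_temp <= row["temp_c"] <= max_temp


# One suite, defined once, with the range left open until run time.
suite = [{
    "name": "temp_in_range",
    "test": temp_in_range,
    "kwargs": {
        "min_temp": {"$PARAMETER": "min_temp"},
        "max_temp": {"$PARAMETER": "max_temp"},
    },
}]

alerts = []


def alert_on_failure(summary):
    """Hypothetical action: record an alert only when a check fails."""
    if not summary["success"]:
        failed = [r["name"] for r in summary["results"] if not r["success"]]
        alerts.append(f"data quality checks failed: {failed}")


readings = [{"temp_c": 29}, {"temp_c": 31}, {"temp_c": 41}]  # made-up values

lenient = run_checkpoint(readings, suite, {"min_temp": 20, "max_temp": 45}, [alert_on_failure])
strict = run_checkpoint(readings, suite, {"min_temp": 25, "max_temp": 35}, [alert_on_failure])
print(lenient["success"], strict["success"], alerts)
```

Keeping the range out of the suite definition is what lets the same checks be reused across data sets; only the parameter dictionary changes between runs.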
By the end of this course, you’ll have a thorough understanding of data quality testing principles and hands-on skills in applying the Great Expectations framework. You’ll be ready to deliver data that meets rigorous quality standards and confidently contribute to any data project with best-in-class testing practices.