Data Engineering 101 with Kestra

Name: Data Engineering 101 with Kestra
Rating: 4.6 (73 reviews)

A hands-on course demonstrating workflow orchestration in Kestra

Created by Shruti Mantri

Last updated 8/2024

English

What you'll learn

Get to know Data Engineering 101 in a compact but informative way
Learn a cutting-edge data orchestration tool for Big Data: Kestra
Hands-on practicals that help you understand Kestra clearly
Understand the best practices for using Kestra efficiently

Course content

6 sections • 26 lectures • 1h 57m total length

What is Data Engineering? • 4:59
Data Engineering Processes - Part 1 • 3:56
Data Engineering Processes - Part 2 • 7:54
Design and manage data pipelines with orchestration, scheduling, triggering, and monitoring using Kestra and other tools. Delve into data governance and veracity, covering cataloging, lineage, validation, cleansing, and data quality.
Batch Processing v/s Stream Processing • 4:04
ETL v/s ELT • 3:05
Explore data processing methodologies with ETL and ELT, learning when to transform before loading versus after loading, and how tools like Spark, dbt, and Redshift enable scalable data pipelines.
Data Lake v/s Data Warehouse • 3:22
Compare data lakes and data warehouses to decide when to store data in its raw format with schema on read versus schema on write, considering cost, flexibility, and performance.
Change Data Capture • 4:42
Master change data capture (CDC) to track inserts, updates, and deletes using logs, triggers, or polling; enable real-time data synchronization, efficient data integration, and support for event-driven architectures.
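The polling approach to change data capture mentioned above can be illustrated with a minimal sketch. It assumes each poll yields a full snapshot of a table keyed by primary key; the `diff_snapshots` helper and the sample rows are hypothetical and not taken from the course material.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Snapshot = Dict[int, Row]  # primary key -> row contents


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[Tuple[str, int]]:
    """Return (operation, primary_key) change events between two polls."""
    events: List[Tuple[str, int]] = []
    # Keys present only in the newer snapshot are inserts.
    for key in sorted(current.keys() - previous.keys()):
        events.append(("insert", key))
    # Keys present in both snapshots are updates if the row changed.
    for key in sorted(current.keys() & previous.keys()):
        if current[key] != previous[key]:
            events.append(("update", key))
    # Keys present only in the older snapshot are deletes.
    for key in sorted(previous.keys() - current.keys()):
        events.append(("delete", key))
    return events


# Hypothetical sample data standing in for two consecutive polls.
previous = {1: {"status": "new"}, 2: {"status": "new"}, 3: {"status": "paid"}}
current = {1: {"status": "new"}, 2: {"status": "shipped"}, 4: {"status": "new"}}

events = diff_snapshots(previous, current)
print(events)
```

Log-based and trigger-based CDC avoid comparing full snapshots, so this sketch is only meant to show what "inserts, updates, and deletes" look like as events.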

Kestra: Orchestration Platform for Engineers • 7:27
Kestra is an orchestration platform for engineers that enables declarative YAML workflows and API-first automation. Schedule events, run anywhere, and integrate with clouds and tools via a rich plugin ecosystem.
Kestra UI Overview • 10:36
Explore the Kestra UI overview, from the welcome page to creating and editing flows. Navigate the flows, source and topology views, documentation, executions, dashboards, triggers, plugins, and admin settings.
Flows and Tasks • 4:55
Discover how flows and tasks are defined in YAML, identified by an id and namespace, and how flowable and runnable tasks orchestrate work like logs, HTTP calls, and retries.
Namespaces • 2:02
Explore namespaces as logical groupings for flows, like folders that organize environments, projects, teams, and departments, using dot-separated, indefinitely nestable names such as company.engineering.product1.
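To make the folder analogy for dot-separated namespaces concrete, here is a small sketch that expands a namespace into each of its nesting levels. The `namespace_ancestors` helper and the example name are hypothetical illustrations, not part of Kestra's API.

```python
from typing import List


def namespace_ancestors(namespace: str) -> List[str]:
    """Expand a dot-separated namespace into every level of its hierarchy."""
    parts = namespace.split(".")
    # Join the first 1, 2, ... n parts to get each enclosing "folder".
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


# Hypothetical namespace used only for illustration.
levels = namespace_ancestors("company.team.project")
print(levels)
```

Each entry in the result behaves like a parent folder of the next one, which is why deeper nesting only requires appending another dot-separated segment.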

Kestra Architecture • 5:32
Explore Kestra architecture, comparing JDBC-based backend and Kafka with Elasticsearch for scalable, fault-tolerant orchestration, including metadata servers, schedulers, executors, workers, and enterprise deployment options.
Installing Kestra using Docker Compose • 3:29
Install Kestra with Docker Compose by following the installation guide, verifying Docker and Docker Compose versions, and launching the Kestra and Postgres services, then access the Kestra UI at localhost:8080.
Other Kestra Installations • 1:54
Explore diverse Kestra installations, from Docker and Docker Compose to Kubernetes clusters on AWS EKS, GCP GKE, and Azure AKS, including Podman rootless setups.

Flow Architecture • 1:14
Design a flow architecture that downloads orders and products CSV files over HTTP, loads them into Postgres, performs a join, and uploads enriched orders to MongoDB, all orchestrated with Kestra.
Installing Postgres and MongoDB • 4:38
Set up Postgres and MongoDB with Docker, running Postgres on port 15432 and MongoDB on port 27017, then create an order_details collection.
Installing Kestra with Secrets • 2:40
Install Kestra server by creating base64 encoded secrets for PostgreSQL and MongoDB, save them in kestra.env, update docker-compose to use the env file, and start with docker compose up -d.
Flow Creation - HTTP Download • 4:59
Create a parallel flow with two sequential tasks to download orders.csv and products.csv via HTTP, saving the files to internal storage and validating outputs.
Flow Creation - Using Postgres Plugin • 10:08
Create and populate orders and products tables in a Postgres flow using HTTP download and CopyIn, with CSV data and a step that clears existing records.
Flow Creation - Query Postgres and Upload Data to MongoDB • 5:35
Create a Kestra data flow that extracts data from orders and products in Postgres, performs a join on orders.product_id = products.product_id, and uploads the joined dataset to MongoDB's order_details collection.
Outputs and Metrics • 3:16
Clear existing MongoDB records with a delete-many step before loading new data, then review the outputs and metrics, including deleted and inserted counts and the joined results.
Plugin Defaults • 3:40
Define plugin defaults for PostgreSQL query and copy tasks to share common attributes like URL, username, and password, reducing repetition with validated syntax and execution.
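The join at the center of this section's pipeline can be sketched outside of Kestra and Postgres as follows. Only the join key `product_id` comes from the lecture descriptions; the other column names, the sample rows, and the helper functions are assumptions made for illustration, and the course itself performs this step as a SQL query in Postgres.

```python
import csv
import io
from typing import Dict, List

# Hypothetical stand-ins for orders.csv and products.csv.
ORDERS_CSV = """order_id,product_id,quantity
1,10,2
2,11,1
3,10,5
"""

PRODUCTS_CSV = """product_id,product_name,price
10,Keyboard,25.00
11,Mouse,10.00
"""


def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of row dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def join_orders_with_products(
    orders: List[Dict[str, str]], products: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Inner join on orders.product_id = products.product_id."""
    products_by_id = {product["product_id"]: product for product in products}
    enriched = []
    for order in orders:
        product = products_by_id.get(order["product_id"])
        if product is None:
            continue  # inner join: drop orders without a matching product
        enriched.append({**order, **product})
    return enriched


order_details = join_orders_with_products(
    read_rows(ORDERS_CSV), read_rows(PRODUCTS_CSV)
)
for row in order_details:
    print(row)
```

Each resulting dictionary has the shape of one document that could be written to the order_details collection.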

Kestra Plugins • 3:40
Explore Kestra plugins spanning core tasks, data stores like MySQL and PostgreSQL, queues such as Kafka and Kinesis, and ELT and notification integrations, plus CDC and infrastructure plugins.
Kestra Adoption • 3:21
Explore how Kestra adoption across companies replaces Apache Airflow with simpler, faster, more reliable orchestration, as Leroy Merlin, Clever Cloud, and Chorus share data mesh and automated reporting success.

Requirements

You do not need any prior knowledge for this course

Description

Unlock the full potential of data engineering with our comprehensive course on Kestra, a powerful open-source data orchestration platform that streamlines complex workflows across a wide range of industries and domains.

This course starts by building a solid foundation in the basics of data engineering, ensuring you have the essential knowledge needed to delve into more advanced topics. We then introduce Kestra, an advanced open-source tool designed to simplify and enhance the management of complex data workflows.

Throughout the course, you'll explore Kestra’s user-friendly interface, which allows for intuitive navigation and seamless workflow creation. You'll learn how to design and implement data workflows using Kestra’s visual flow editor, making complex data processes straightforward and manageable. We guide you through the process of writing detailed workflows, incorporating various components, and adding triggers to automate and optimize your data pipelines.

Kestra has quickly become a favored choice among industries due to its flexibility and scalability. Leading organizations across a wide range of sectors have adopted Kestra to streamline their data operations, from ETL processes to real-time data integration, enhancing overall efficiency and responsiveness. By mastering Kestra, you’ll gain practical skills that are highly valued in the industry, preparing you to tackle real-world data engineering challenges.

This course not only teaches you how to effectively use Kestra but also offers insights into industry best practices and real-world applications. It’s an invaluable resource for anyone looking to advance their career in data engineering and workflow automation. Join us to deepen your expertise, stay ahead in the dynamic field of data engineering, and leverage Kestra to its fullest potential.

Who this course is for:

Those interested in learning about Data Engineering
Those who want to learn Kestra from scratch through to a live project implementation
Those who want to learn about a modern data orchestration tool and its capabilities

Course content

Introduction to Data Engineering • 7 lectures • 32min

Introduction to Kestra • 4 lectures • 25min

Kestra Architecture and Installation • 3 lectures • 11min

ETL using Kestra • 8 lectures • 36min

Kestra Triggers • 2 lectures • 6min

Kestra Industry Adoption • 2 lectures • 7min
