
Design and manage data pipelines with orchestration, scheduling, triggering, and monitoring using Kestra and other tools. Delve into data governance and veracity, covering cataloging, lineage, validation, cleansing, and data quality.
Explore data processing methodologies with ETL and ELT, learning when to transform before loading versus after loading, and how tools like Spark, dbt, and Redshift enable scalable data pipelines.
Compare data lakes and data warehouses to decide when to store data in its raw format with schema on read versus schema on write, considering cost, flexibility, and performance.
Master change data capture (CDC) to track inserts, updates, and deletes using logs, triggers, or polling, and use it to enable real-time data synchronization, efficient data integration, and event-driven architectures.
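As a rough illustration of the polling approach to CDC, the sketch below compares two snapshots of a table keyed by primary key and reports inserts, updates, and deletes. The function name `diff_snapshots` and the sample rows are hypothetical; log-based and trigger-based CDC work differently and are not shown here.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots (dicts keyed by primary key) and classify changes."""
    # Keys present only in the new snapshot are inserts.
    inserts = {k: v for k, v in current.items() if k not in previous}
    # Keys present only in the old snapshot are deletes.
    deletes = {k: v for k, v in previous.items() if k not in current}
    # Keys present in both snapshots with different values are updates.
    updates = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return inserts, updates, deletes


# Hypothetical snapshots taken on two consecutive polls.
before = {1: {"status": "new"}, 2: {"status": "paid"}}
after = {1: {"status": "shipped"}, 3: {"status": "new"}}

inserts, updates, deletes = diff_snapshots(before, after)
print("inserts:", inserts)
print("updates:", updates)
print("deletes:", deletes)
```

Snapshot comparison is the simplest form of polling; in practice a polling query often filters on a timestamp or an incrementing id instead of rereading the whole table.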
Kestra is an orchestration platform for engineers that enables declarative YAML workflows and API-first automation. Schedule events, run anywhere, and integrate with clouds and tools via a rich plugin ecosystem.
Explore the Kestra UI overview, from the welcome page to creating and editing flows. Navigate the flows, source and topology views, documentation, executions, dashboards, triggers, plugins, and admin settings.
Discover how flows and tasks are defined in YAML, identified by an id and namespace, and how flowable and runnable tasks orchestrate work like logs, HTTP calls, and retries.
Explore namespaces as logical groupings for flows, like folders that organize environments, projects, teams, and departments, using dot-separated names that can be nested to any depth, such as company.engineering.product1.
Explore the Kestra architecture, comparing the JDBC-based backend with the Kafka and Elasticsearch backend for scalable, fault-tolerant orchestration, including metadata servers, schedulers, executors, workers, and enterprise deployment options.
Install Kestra with Docker Compose by following the installation guide, verifying the Docker and Docker Compose versions, and launching the Kestra and Postgres services, then access the Kestra UI at localhost:8080.
Explore diverse Kestra installations, from Docker and Docker Compose to Kubernetes clusters on AWS EKS, GCP GKE, and Azure AKS, including Podman rootless setups.
Design a flow architecture that downloads orders and products CSV files over HTTP, loads them into Postgres, performs a join, and uploads enriched orders to MongoDB, all orchestrated with Kestra.
Set up Postgres and MongoDB with Docker, running Postgres on port 15432 and MongoDB on port 27017, then create an order_details collection.
Install the Kestra server by creating base64-encoded secrets for PostgreSQL and MongoDB, saving them in kestra.env, updating docker-compose to use the env file, and starting it with docker compose up -d.
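The encoding step can be sketched in a few lines of Python. This assumes the open-source Kestra convention of reading secrets from environment variables that are prefixed with `SECRET_` and hold base64-encoded values; the secret names and values below are placeholders, not the ones used in the course.

```python
import base64


def to_secret_line(name: str, value: str) -> str:
    """Return a `SECRET_<NAME>=<base64 value>` line for an env file."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"SECRET_{name.upper()}={encoded}"


# Hypothetical secret names with placeholder values.
secrets = {
    "postgres_password": "example-pg-password",
    "mongodb_password": "example-mongo-password",
}

env_lines = [to_secret_line(name, value) for name, value in secrets.items()]

# Print the lines; redirect the output into kestra.env if they look right.
print("\n".join(env_lines))
```

Base64 is an encoding, not encryption, so the resulting env file should still be kept out of version control.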
Create a parallel flow with two sequential tasks to download orders.csv and products.csv over HTTP, saving the files to internal storage and validating the outputs.
Create and populate the orders and products tables in a Postgres flow using HTTP download and CopyIn tasks, with CSV data and a step that clears existing records.
Create a Kestra data flow that extracts data from orders and products in Postgres, performs a join on orders.product_id = products.product_id, and uploads the joined dataset to MongoDB's order_details collection.
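To make the join condition concrete, here is a minimal plain-Python sketch of an inner join on `product_id`. It only illustrates the logic; in the flow itself the join runs as a SQL query in Postgres. The function name `inner_join`, the column names other than `product_id`, and the sample rows are hypothetical.

```python
def inner_join(orders, products, key="product_id"):
    """Inner-join two lists of row dicts on a shared key column."""
    # Index products by key so each order needs only one lookup.
    products_by_key = {product[key]: product for product in products}
    joined = []
    for order in orders:
        product = products_by_key.get(order[key])
        if product is not None:  # orders without a matching product are dropped
            joined.append({**product, **order})
    return joined


# Hypothetical sample rows.
orders = [
    {"order_id": 1, "product_id": 10, "quantity": 2},
    {"order_id": 2, "product_id": 99, "quantity": 1},
]
products = [{"product_id": 10, "product_name": "Widget", "price": 4.5}]

enriched = inner_join(orders, products)
print(enriched)
```

Each resulting dict has the shape of one document that could be written to the order_details collection.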
Clear MongoDB records before loading new data by deleting many records, then load and update data, and review outputs and metrics, including deleted and inserted counts and the joined results.
Define plugin defaults for PostgreSQL query and copy tasks to share common attributes like URL, username, and password, reducing repetition with validated syntax and execution.
Discover triggers in Kestra, including scheduled, flow, webhook, real-time, and polling triggers. Learn their key properties, like id, type, description, disabled, and worker group key.
Explore triggers in action by building a Postgres table and a pull-based trigger that runs every 30 seconds to detect new records and trigger a flow.
Explore Kestra plugins spanning core tasks, data stores like MySQL and PostgreSQL, queues such as Kafka and Kinesis, and ELT and notification integrations, plus CDC and infrastructure plugins.
Explore how Kestra adoption across companies replaces Apache Airflow with simpler, faster, more reliable orchestration, as Leroy Merlin, Clever Cloud, and Chorus share data mesh and automated reporting success.
Unlock the full potential of data engineering with our comprehensive course on Kestra, a powerful open-source data orchestration platform that streamlines complex workflows across a wide range of industries and domains.
This course starts by building a solid foundation in the basics of data engineering, ensuring you have the essential knowledge needed to delve into more advanced topics. We then introduce Kestra, an advanced open-source tool designed to simplify and enhance the management of complex data workflows.
Throughout the course, you'll explore Kestra’s user-friendly interface, which allows for intuitive navigation and seamless workflow creation. You'll learn how to design and implement data workflows using Kestra’s visual flow editor, making complex data processes straightforward and manageable. We guide you through the process of writing detailed workflows, incorporating various components, and adding triggers to automate and optimize your data pipelines.
Kestra has quickly become a favored choice among industries due to its flexibility and scalability. Leading organizations across a wide range of sectors have adopted Kestra to streamline their data operations, from ETL processes to real-time data integration, enhancing overall efficiency and responsiveness. By mastering Kestra, you’ll gain practical skills that are highly valued in the industry, preparing you to tackle real-world data engineering challenges.
This course not only teaches you how to effectively use Kestra but also offers insights into industry best practices and real-world applications. It’s an invaluable resource for anyone looking to advance their career in data engineering and workflow automation. Join us to deepen your expertise, stay ahead in the dynamic field of data engineering, and leverage Kestra to its fullest potential.