Apache Airflow: The Hands-On Guide

Rating: 4.4 (5699 reviews)

Master Apache Airflow from A to Z. Hands-on videos on Airflow with AWS, Kubernetes, Docker and more

Created by Marc Lamberti

Last updated 11/2025

English

What you'll learn

Coding Production Grade Data pipelines by Mastering Airflow through Hands-on Examples
How to Follow Best Practices with Apache Airflow
How to Scale Airflow with the Local, Celery and Kubernetes Executors
How to Set Up Monitoring with Elasticsearch and Grafana
How to Secure Airflow with authentication, crypto and the RBAC UI
Core and Advanced Concepts with Pros and Limitations
Mastering DAGs with timezones, unit testing, backfill and catchup
Organising the DAG folder and keeping things clean

Course content

10 sections • 103 lectures • 10h 54m total length

Important Prerequisites • 1:09
Develop Python-based data pipelines with Airflow by meeting the essential prerequisites: solid Python programming experience and Docker, which provides a shared environment for guided hands-on learning.
The Roadmap • 1:42
Navigate the Airflow roadmap from basics to advanced features, build your first data pipeline, and learn to scale, run in production, monitor, and secure your environment.
Who am I? • 1:03
Meet the instructor, who builds data pipelines with Apache Airflow every day, explains how to run Airflow in production with Astronomer, and invites you to follow him on LinkedIn, YouTube, and Udemy for tips.
Development Environment • 2:27
Set up your development environment for Airflow by installing Docker, using the Astro CLI to run Airflow locally, verifying the installation with astro version, and following the OS-specific steps for Mac, Windows, and Linux.
Learning Advice [Must Read] • 0:26
Share your data engineering projects • 0:15

Why data orchestration? • 2:17
Define data orchestration as coordinating extraction, cleaning, transformation, and loading with dependencies and automatic retries using Airflow. Manage thousands of tasks and monitor data workflows at scale.
Why Airflow? • 8:15
Discover why Airflow is a scalable data orchestrator that manages dependencies, enables Python-based pipelines, and integrates with Airbyte, dbt, and Snowflake for end-to-end data workflows.
The Core Components • 2:41
Map Airflow's core components: the web server and user interface, the metadata database, the scheduler, and the executor with its workers, including the Kubernetes, Celery, and local options.
The Core Concepts • 2:12
Discover Airflow's core concepts, including tasks and operators, and learn how to build a DAG by linking tasks with dependencies into a data pipeline.
How does Airflow work? • 3:05
See how Airflow takes a DAG from the DAGs directory to a DAG run using the scheduler, metadata database, and executor. Monitor task states and progress via the web UI.
Airflow limitations • 1:20
Explore Airflow's limitations, including its batch-oriented design and lack of real-time streaming. Pipelines may run with delays; integrate with Kafka, and offload processing to Spark to avoid memory overflow.
[Practice] Installing Airflow • 11:47
Explore multiple Airflow installation methods, from pip to Docker and the Astro CLI, and learn best practices for a reliable, containerized Airflow environment.
IMPORTANT • 0:11
[Practice] Quick tour of Airflow UI • 7:42
Explore the Airflow user interface, focusing on the DAGs view and the grid, graph, Gantt, code, and task duration views to monitor and troubleshoot data pipelines.
[Practice] Quick tour of Airflow CLI • 4:23
Discover how the Airflow CLI extends workflow control beyond the UI, supports CI/CD pipelines, and can be accessed via the Astro CLI or Docker to run essential DAG, database, and task commands.
[Practice] The REST API • 2:01
Explore the Airflow REST API and its endpoints to trigger data pipelines from external tools. Review authentication options, view the DAG routes, and consult the official docs for practical use.
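
To make the core concepts in this section concrete, here is a minimal plain-Python sketch (not Airflow code) of how tasks linked by dependencies form a DAG from which a valid run order can be derived. The task names are hypothetical, and the standard library's graphlib stands in for whatever Airflow does internally.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "transform": {"clean"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(deps):
    """Return one valid order in which every task comes after its upstream tasks."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Adding a cycle (for example making "extract" depend on "report") would make graphlib raise a CycleError, which is why the graph has to be acyclic.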

Introduction • 0:40
Build data pipelines with Airflow by defining tasks such as waiting for an event, executing Python functions, and interacting with a database, then monitor and debug the workflow.
The Project! What you will build? • 2:10
Build a stock market data pipeline in Airflow that fetches Apple's prices, stores them in MinIO, formats them with Spark, loads them into Postgres, and powers dashboards with Metabase.
Project materials • 5:44
Learn to set up the project with Docker and Docker Compose, download and unzip the materials from Udemy, and explore the Docker Compose services, including Airflow, MinIO, Spark, and Metabase.
Running the new environment • 4:16
Build the Docker images for the Spark master and worker, start the project with astro dev, and access the Airflow user interface and Postgres while verifying that the containers and ports are running.
Important • 0:11
Import warnings are OK • 0:16
Create the DAG with the dag decorator • 2:55
Create a DAG using the dag decorator in Apache Airflow to verify yfinance API availability for a stock market data pipeline with daily scheduling and tags.
The new way of authoring DAGs with Taskflow • 6:10
Explore the TaskFlow API in Airflow, using the dag and task decorators to reduce boilerplate and share data via XComs between tasks, with automatic dependencies.
Playing with the Taskflow API
Checking API availability with the Sensor decorator • 8:37
Poll the yfinance API using the task.sensor decorator every 30 seconds for up to 5 minutes, returning a PokeReturnValue when it is available. Set up the stock_api connection and verify it via requests.
Fetching stock prices with the PythonOperator • 9:25
Learn to fetch stock prices with Airflow by building a PythonOperator task that calls a finance API, uses templating and XComs, and returns Nvidia's latest stock data.
Storing stock prices in MinIO (AWS S3 like) • 11:46
Store stock prices in MinIO by connecting to a MinIO server as you would to an AWS S3 bucket, creating a bucket, building a MinIO client, and saving JSON data under company symbols.
Formatting stock prices with Spark and the DockerOperator • 11:29
Format stock prices by running a Spark job inside a Docker container via the DockerOperator, converting prices.json in MinIO to a CSV file with a header.
Fetching formatted prices from MinIO (AWS S3 like) • 6:49
Build an Airflow task to fetch the formatted stock prices CSV from MinIO using a Python operator, a MinIO client, and file-not-found handling in the data pipeline.
The best way to load files into data warehouses with Postgres and Astro SDK • 7:37
Configure an Airflow task to load a CSV into Postgres using the Astro SDK load_file operator, from MinIO to stock_market in the public schema, and validate via logs.
Creating the dashboard to track Apple stock with Metabase • 4:27
Build a Metabase dashboard to monitor Apple stock prices stored in your data warehouse, creating questions for average closing price, volume, and closing price visualizations.
The pipeline in action! • 1:12
See the Airflow UI run the stock DAG, triggering a data pipeline that fetches stock prices, formats them to CSV, and loads them into the data warehouse to create a dataset.
Getting alerts on Slack with the new Notifiers • 6:24
Airflow 2.7 introduces notifiers that encapsulate notification logic for task success or failure, enabling Slack and other notifier integrations via plug-and-play providers.
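
The sensor lecture in this section describes checking an API every 30 seconds for up to 5 minutes. The sketch below shows that polling pattern in plain Python; it is not the task.sensor decorator itself, and poke_until, FakeClock and api_available are hypothetical names used only for illustration. A fake clock is injected so the demo finishes instantly.

```python
import time


def poke_until(condition, poke_interval=30, timeout=300,
               sleep=time.sleep, clock=time.monotonic):
    """Call `condition` every `poke_interval` seconds until it returns a truthy
    value, or raise TimeoutError once `timeout` seconds would be exceeded."""
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result
        if clock() + poke_interval > deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(poke_interval)


class FakeClock:
    """Test double: sleeping just advances a counter instead of waiting."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


attempts = []


def api_available():
    attempts.append(1)
    return len(attempts) >= 3  # pretend the API answers on the third check


fake = FakeClock()
result = poke_until(api_available, sleep=fake.sleep, clock=fake.time)
print(result, len(attempts), fake.now)
```

Passing the sleep and clock functions as parameters is a design choice for testability; with the defaults the function really waits between checks.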

Set up the new Airflow environment • 1:40
Clean the environment by stopping the Docker containers with astro dev stop, then initialize a new Airflow project locally using astro dev init for a fresh data pipeline.
The best way to create your DAGs • 4:54
Explore three ways to define your DAGs in Airflow (the old DAG notation, the with DAG context manager, and the dag decorator) and learn why the dag decorator is recommended.
The parameters your DAGs need • 6:29
Define essential Airflow DAG parameters, including the DAG id, start date, schedule, catchup, description, tags, default arguments, DAG run timeout, and max consecutive failed DAG runs.
DAG scheduling: the basics • 3:14
Explore the basics of DAG scheduling, including start date, schedules, and presets. See how each DAG run defines a data interval and how max active runs sets the limit per DAG.
Backfill and Catchup • 2:03
Explore how backfilling and catchup control DAG runs in Airflow, and learn to disable catchup or backfill with the CLI or UI for targeted reruns.
The most important rule to follow when creating tasks • 3:42
Backfill your DAG safely by ensuring tasks are idempotent, which avoids duplicates when rerunning data pipelines, and use the data_interval_end variable in SQL instead of NOW().
Play by scheduling your DAGs • 10:34
Explore configuring Airflow DAG scheduling, including start date, daily versus weekly cadence, and catchup behavior. Learn to manage DAG runs, data intervals, and safe schedule changes.
Dealing with timezones in Airflow • 5:39
Learn how Airflow handles time zones, stores data in UTC, displays it in local time, and uses pendulum, while comparing cron and timedelta scheduling across daylight saving time.
Scheduling DAGs based on data with Datasets • 4:34
Airflow enables data-based scheduling of DAGs using datasets, allowing a producer DAG and a consumer DAG to be linked via a dataset update that triggers the downstream workflow.
Conditional Dataset scheduling • 3:46
Master conditional dataset scheduling with and/or operators to run a DAG when (A or B) and (C or D) update, using parentheses in the schedule parameter and timetables.
Datasets in action! • 8:15
Create and link Airflow DAGs that extract JSON data from an API, write it to a dataset, and trigger a dependent DAG with dataset scheduling.
Sharing data between tasks with XComs • 7:56
Learn how to share data between Airflow tasks using XComs. Push values with xcom_push and retrieve them with xcom_pull; the values are stored in the Airflow metadata database.
Organize your DAGs folder and clean your DAGs • 6:40
Centralize datasets in an include folder, reference them from your DAG files, and use an ignore file to keep the dags folder lean and fast.
Manage task and DAG failures • 10:47
Configure on-success and on-failure callbacks for DAG and task runs, inspect the context, and implement retries with default args and exponential retry backoff.
Test your tasks and DAGs • 14:17
Discover how to test Apache Airflow pipelines with DAG tests, validation tests, unit tests, and integration tests using the Astro CLI, then debug with IDE breakpoints.
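
The idempotency rule from this section (rerunning a task for the same data interval must not create duplicates) can be sketched with the standard library's sqlite3. This is an illustration under assumptions, not the course's code: the prices table, the load_daily_prices helper and the sample values are made up, and the interval end is passed in as a parameter where an Airflow task would receive data_interval_end.

```python
import sqlite3


def load_daily_prices(conn, data_interval_end, rows):
    """Idempotent load: replace everything stored for the given interval.

    The interval end comes in as an argument (an ISO date string here) instead
    of being read from the wall clock, so a rerun targets the same rows.
    """
    with conn:  # one transaction: the delete and the inserts commit together
        conn.execute("DELETE FROM prices WHERE interval_end = ?", (data_interval_end,))
        conn.executemany(
            "INSERT INTO prices (interval_end, symbol, close) VALUES (?, ?, ?)",
            [(data_interval_end, symbol, close) for symbol, close in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (interval_end TEXT, symbol TEXT, close REAL)")

rows = [("AAPL", 189.5), ("NVDA", 120.25)]  # made-up sample values
load_daily_prices(conn, "2024-01-02", rows)
load_daily_prices(conn, "2024-01-02", rows)  # rerun or backfill of the same interval

count = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
print(count)
```

Had the task filtered or stamped rows with the current time instead, a backfill run days later would write to a different interval than the one it was meant to reprocess.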

The right way of grouping tasks • 10:47
Learn how to group tasks with task groups in Airflow to improve organization, apply defaults to sets of tasks, enable dynamic mapping, and create modular, reusable task blocks across DAGs.
Choosing tasks with branching and conditions • 8:12
Learn to use Airflow's branch operator and branch Python operator to route tasks based on conditions, such as data size or a cocktail being alcoholic, with task and group dependencies.
Changing execution behaviours with Trigger Rules • 13:24
Discover how trigger rules change when a task runs in Airflow. Explore defaults like all_success and alternatives such as all_done, all_skipped, one_failed, and none_failed_min_one_success.
Templating your tasks • 11:42
Discover how Airflow templating uses a template engine to replace placeholders with runtime data. Use ds, data interval start and end, templates dict, and template fields to avoid hardcoding.
The smart way of storing data with Custom XCom backends • 11:17
Learn to store XCom data outside the Airflow metadata database using a custom XCom backend with AWS S3, enabling versioning and archiving, then configure and verify the setup.
Using variables to avoid hardcoding values • 9:50
Learn how to avoid hardcoding values in Airflow by using variables, created via the UI, the CLI, or environment variables, with encryption and template integration for reuse across tasks.
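
To illustrate how the trigger rules named in this section differ, here is a simplified plain-Python sketch. It is not Airflow's implementation: should_run is a hypothetical helper, and it assumes every upstream task has already finished, whereas in Airflow a rule such as one_failed can fire before the other upstream tasks are done.

```python
def should_run(trigger_rule, upstream_states):
    """Decide whether a task may run, given the final states of its upstream tasks.

    States used here: "success", "failed", "skipped", "upstream_failed".
    """
    states = list(upstream_states)
    failed = [s for s in states if s in ("failed", "upstream_failed")]
    succeeded = [s for s in states if s == "success"]
    skipped = [s for s in states if s == "skipped"]

    if trigger_rule == "all_success":
        return len(succeeded) == len(states)
    if trigger_rule == "all_done":
        return True  # all upstream tasks have finished, whatever their state
    if trigger_rule == "all_skipped":
        return len(skipped) == len(states)
    if trigger_rule == "one_failed":
        return len(failed) >= 1
    if trigger_rule == "none_failed_min_one_success":
        return not failed and len(succeeded) >= 1
    raise ValueError(f"unknown trigger rule: {trigger_rule}")


# A join task after a branch: one path succeeded, the other was skipped.
after_branch = ["success", "skipped"]
print(should_run("all_success", after_branch))
print(should_run("none_failed_min_one_success", after_branch))
```

The branch example is the typical reason to change the default: with all_success the join task would not run because one upstream path was skipped.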

Executing tasks sequentially with the SequentialExecutor and SQLite • 6:46
This lecture explains the executor inside the Airflow scheduler and how the SequentialExecutor runs tasks one at a time using SQLite, with config overrides via environment variables.
Executing tasks in parallel with the LocalExecutor and Postgres • 4:17
The LocalExecutor runs tasks in parallel within the scheduler using Postgres, avoids SQLite because of concurrent writes, and is configured via the Astro CLI.
Concurrency settings to control how tasks and dags run in parallel • 7:03
Configure parallelism, max active tasks per DAG, max active runs per DAG, and max active task instances per DAG to control Airflow concurrency and understand how tasks run across DAG runs.
Start scaling Airflow with the CeleryExecutor • 12:22
Learn to scale Airflow with the CeleryExecutor by distributing tasks across multiple workers via a Redis broker, using Docker Compose, and monitoring a sample DAG.
Track your tasks using Flower with the CeleryExecutor • 6:09
Monitor Airflow with Flower to view the real-time status of CeleryExecutor workers and tasks, explore the default Redis queue, and adjust max concurrency using the AIRFLOW__CELERY__WORKER_CONCURRENCY setting.
Add new workers and configure queues to distribute your tasks • 11:04
Add a second Airflow worker and create GPU and CPU queues to distribute tasks. Restart Docker Compose, run the DAG, and confirm tasks run on the correct workers.
Quick introduction to Kubernetes • 10:02
Discover how Airflow runs on Kubernetes using the CeleryExecutor, learn the Kubernetes concepts (pods, nodes, master node, scheduler, controller), and see how Helm deploys Airflow on a Docker Desktop cluster.
Introduction to the KubernetesExecutor • 10:05
Explore the shift from Celery to the KubernetesExecutor, discovering one-task-per-pod isolation, granular resources and environment customization, and scheduling driven by the Kubernetes API in a Kubernetes cluster.
Installing Airflow on a Kubernetes cluster • 8:13
Install Airflow on a Kubernetes cluster with Helm, create the airflow namespace, verify nodes and pods, and access the UI via port-forward for admin access.
How to configure Airflow on Kubernetes • 6:24
Configure Airflow on Kubernetes with Helm by editing values.yaml, upgrading the deployment, and verifying the pods and the web UI, then switch from the Celery to the Kubernetes executor for scalable DAG execution.
Deploying DAGs with Airflow on Kubernetes using GitSync • 12:43
Deploy Airflow on Kubernetes and fetch DAGs with a git-sync sidecar, then configure a Git repo, SSH keys, and a Kubernetes secret to run DAGs on the Kubernetes executor.
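
The difference between running tasks one at a time and running them in parallel under a concurrency limit can be illustrated with a small thread-pool analogy. This is not how Airflow's executors are implemented; run_tasks is a hypothetical helper, and its parallelism argument only mimics the idea of a cap on simultaneously running tasks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_tasks(task_ids, parallelism):
    """Run placeholder tasks on a pool of `parallelism` workers and report the
    highest number of tasks that were running at the same moment."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(task_id):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        threading.Event().wait(0.01)  # placeholder for real work
        with lock:
            running -= 1
        return task_id

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        finished = list(pool.map(task, task_ids))
    return finished, peak


task_ids = [f"task_{i}" for i in range(8)]

finished, sequential_peak = run_tasks(task_ids, parallelism=1)
print(sequential_peak)  # one worker, so tasks never overlap

finished, parallel_peak = run_tasks(task_ids, parallelism=4)
print(parallel_peak)  # bounded by the limit of four workers
```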

Introduction • 1:28
Set up a Kubernetes cluster with EKS and Rancher, install Airflow, and run it with the Kubernetes Executor in AWS, including creating an EC2 instance, an IAM user, and an ECR repo.
Quick overview of AWS EKS • 3:45
Explore AWS EKS as a managed Kubernetes service, compare it to ECS, and learn how to deploy Airflow with the Kubernetes Executor on AWS while considering costs.
[Practice] Set up an EC2 instance for Rancher • 8:17
Set up an EC2 instance for Rancher on AWS using Amazon Linux 2 and the t2.small instance type, and open the HTTP and HTTPS ports; install Docker and run Rancher to access its interface.
[Practice] Create an IAM User with permissions • 2:34
Create an IAM user in the AWS console for programmatic access, attach AdministratorAccess for testing, download the credentials CSV, and review the permissions Rancher needs to set up an EKS cluster.
[Practice] Create an ECR repository • 6:49
Configure an ECR repository to store and deploy Docker images for Airflow, install and configure the AWS CLI, build and tag the image, then push it as v1.0.
[Practice] Create an EKS cluster with Rancher • 6:21
Create an Amazon EKS cluster with Rancher to simplify Kubernetes deployment on AWS. Configure credentials, region, and kubectl access, then monitor with the Rancher dashboard.
How to access your applications from the outside • 4:19
Explore how to expose a Kubernetes web server from outside the cluster using NodePort, LoadBalancer, and Ingress, with ClusterIP as the internal address.
[Practice] Deploy Nginx Ingress with Catalogs (Helm) • 4:56
Install Nginx Ingress in your Kubernetes cluster using Rancher catalogs and Helm charts: enable the Helm catalog, launch Nginx Ingress, customize values.yaml, and access the cluster via ports 80 and 443.
[Practice] Deploy and run Airflow with the Kubernetes Executor on EKS • 5:21
Deploy and run Airflow on an EKS cluster with the Kubernetes executor, using a catalog to install airflow-eks, then verify via the Airflow UI and a running DAG.
[Practice] Cleaning your AWS services • 2:50
Clean up your AWS setup after using Rancher by terminating the EKS cluster, deleting CloudFormation stacks, terminating EC2 instances, removing load balancers, and deleting the VPC to avoid charges.

Introduction • 1:28
Monitor your Airflow instance and DAGs in production by configuring logging and dashboards. Explore the ELK and TIG stacks to visualize metrics and set Grafana alerts.
How the logging system works in Airflow • 3:43
Discover how Airflow uses Python logging to manage logs from the web server, scheduler, and workers. Configure loggers, handlers, and formats in airflow.cfg, with optional remote logging via REMOTE_LOG_CONN_ID and REMOTE_LOGGING.
[Practice] Setting up custom logging • 17:16
Explore Airflow logging customization by editing airflow.cfg parameters like base_log_folder and loglevel, configuring logging_config_class, and creating a custom log_config.py from the default logging config.
[Practice] Storing your logs in AWS S3 • 14:40
Store Airflow logs in AWS S3 by creating a bucket and an IAM user with read/write access, then enable remote logging with the AWSS3LogStorage connection.
Elasticsearch Reminder • 4:13
Explore how Elasticsearch stores, searches, and analyzes JSON documents and logs in near real time. Learn how indices, mappings, and the ELK stack enable ingestion, visualization, and monitoring for Airflow.
[Practice] Configuring Airflow with Elasticsearch • 18:08
Configure Airflow to read logs from Elasticsearch via an ELK stack with Logstash and Filebeat. Set up containers, generate the log_id and offset fields, and visualize DAG logs in Kibana.
[Practice] Monitoring your DAGs with Elasticsearch • 10:40
Learn to monitor Airflow DAGs with Elasticsearch and Kibana by creating indices, dashboards, and visualizations that track failed tasks over the last seven days.
Introduction to metrics • 4:33
Learn how to monitor Airflow using metrics sent via UDP to StatsD, including counters, gauges, and timers, and visualize them with the TIG stack (Telegraf, InfluxDB, Grafana).
[Practice] Monitoring Airflow with TIG stack • 12:12
Set up the TIG stack to monitor Airflow by configuring Telegraf to send metrics to InfluxDB. Build a Grafana dashboard to visualize Airflow metrics, including dagbag_size.
[Practice] Triggering alerts for Airflow with Grafana • 11:30
Set up Grafana alerts for Airflow by triggering the logger_dag to collect metrics, configuring an SMTP email channel via Gmail, and alerting when the logger_dag.t2 duration exceeds 5 seconds.
Airflow maintenance DAGs • 2:59
Explore maintenance DAGs to keep Airflow running smoothly, including log-cleanup, db-cleanup, and kill-halted-tasks. Learn how to configure them and prevent metastore clutter.
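
As a rough illustration of the counters, gauges, and timers sent via UDP to StatsD mentioned in this section, the sketch below formats metrics in the plain-text StatsD line format and defines a fire-and-forget UDP sender. The helper names, the "airflow" prefix, the port 8125 and the sample values are assumptions for the example; only dagbag_size and logger_dag.t2 come from the lecture descriptions.

```python
import socket


def statsd_line(name, value, metric_type, prefix="airflow"):
    """Format one metric in the StatsD line format: <name>:<value>|<type>."""
    codes = {"counter": "c", "gauge": "g", "timer": "ms"}
    return f"{prefix}.{name}:{value}|{codes[metric_type]}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric line over UDP; there is no delivery guarantee or reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


lines = [
    statsd_line("dagbag_size", 12, "gauge"),                   # current value
    statsd_line("ti_failures", 1, "counter"),                  # increment by 1
    statsd_line("dag.logger_dag.t2.duration", 5300, "timer"),  # milliseconds
]
for line in lines:
    print(line)
```

Calling send_metric(lines[0]) would push the gauge to a StatsD-compatible listener such as Telegraf, if one is listening on that host and port.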

Introduction • 0:54
Secure your Airflow deployment by encrypting passwords and managing the Fernet key. Hide sensitive variables in the UI, filter DAGs by owner, enable password authentication, and activate role-based access control.
[Practice] Encrypting sensitive data with Fernet • 16:54
Discover how Airflow encryption secures sensitive data by enabling secure_mode and encrypting credentials with Fernet keys, installing crypto, and updating connections through the UI and metastore.
[Practice] Rotating the Fernet Key • 7:19
Learn to rotate the Fernet key in Airflow without breaking credentials, verify the rotation, remove old keys, and configure it securely with environment variables instead of airflow.cfg.
[Practice] Hiding variables • 3:24
Learn how Airflow hides variable values in the UI based on keywords, with values encrypted in the database, and why this security is cosmetic unless the values are only decrypted at the final destination.
[Practice] Password authentication and filter by owner • 9:38
Enable password authentication for the Airflow UI by setting authenticate to true and auth_backend to airflow.contrib.auth.backends.password_auth. Enable owner-based filtering so only the logged-in user's DAGs are shown.
[Practice] RBAC UI • 14:15
Learn to implement RBAC in Apache Airflow, create admin and viewer users, and tailor permissions and roles to control access to DAGs and UI features.
What to expect from Airflow 2.0? • 10:41
Airflow 2.0 delivers active-active schedulers, DAG serialization for a stateless web UI, DAG versioning, a stable REST API with OpenAPI 3.0, functional DAGs, and a pluggable storage engine.
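
The keyword-based hiding of variable values described in this section can be pictured with a small plain-Python sketch. It is an illustration only: mask_variables and the keyword list are hypothetical and do not reproduce Airflow's own list or code. It also shows why the protection is cosmetic, since the original values remain untouched underneath.

```python
SENSITIVE_KEYWORDS = ("password", "secret", "api_key", "token")  # illustrative list


def mask_variables(variables, keywords=SENSITIVE_KEYWORDS, mask="***"):
    """Return a copy of `variables` in which a value is replaced by `mask`
    whenever its key contains one of the sensitive keywords."""
    return {
        key: mask if any(word in key.lower() for word in keywords) else value
        for key, value in variables.items()
    }


stored = {"db_password": "hunter2", "bucket_name": "stock-market"}  # made-up values
shown = mask_variables(stored)
print(shown)
print(stored["db_password"] == "hunter2")  # the stored value itself is unchanged
```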

Backfill your DAGs in Airflow like a PRO • 23:36
Learn how to backfill Airflow DAGs and tasks using the CLI, including when to backfill, how to rerun failures, and cloning DAGs to run in parallel.
How to define variables through environment variables • 0:10
[BLOG POST] Running Apache Airflow on a multi-nodes Kubernetes cluster locally • 0:20
[BLOG POST] Best Practices in Apache Airflow (Part 1) • 0:23
[BLOG POST] The PostgresOperator: All you need to know • 0:20
[VIDEO] Running Airflow with the Official Helm Chart • 0:24
[VIDEO] The DockerOperator: The basics and more • 19:35
Learn to use the Docker operator to run tasks in a Docker image, manage dependencies, test scripts, and work with mounts, XComs, and resources.
[VIDEO] Airflow with DBT: The best way! • 0:07
Surprise? • 0:13

Requirements

Basic knowledge of Docker and Python
Docker Desktop installed and running
The course "The Complete Hands-On Introduction to Apache Airflow" can be a nice plus.

Description

Apache Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.

It is scalable, dynamic, extensible, and modular.

Without any doubt, mastering Airflow is becoming a must-have and an attractive skill for anyone working with data.

What you will learn in the course:

The fundamentals of Airflow are explained, such as what Airflow is and how the scheduler and the web server work.
The Forex Data Pipeline project is an incredible way to discover many operators in Airflow and deal with Slack, Spark, Hadoop, and more.
Mastering your DAGs is a top priority, and you can play with timezones, unit test your DAGs, structure your DAG folder, and much more.
Scaling Airflow through different executors such as the Local Executor, the Celery Executor, and the Kubernetes Executor will be explained in detail. You will discover how to specialize your workers, add new workers, and what happens when a node crashes.
A Kubernetes cluster of 3 nodes will be set up locally with Rancher, Airflow, and the Kubernetes Executor to run your data pipelines.
Advanced concepts will be shown through practical examples such as templating your DAGs, how to make your DAG dependent on another, what SubDAGs and deadlocks are, and more.
You will set up a Kubernetes cluster in the cloud with AWS EKS and Rancher to use Airflow and the Kubernetes Executor.
Monitoring Airflow is extremely important! That's why you will learn how to do it with Elasticsearch and Grafana.
Security will also be addressed to make your Airflow instance compliant with your company's requirements: specifying roles and permissions for your users with RBAC, restricting access to the Airflow UI with password authentication, encrypting data, and more.

In addition:

Many practical exercises are given along the course so that you will have occasions to apply what you learn.
Best practices are stated when needed to give you the best ways of using Airflow.
Quizzes are available to assess your comprehension at the end of each section.
Answering your questions fast is my top priority, and I will do my best for you.

I put a lot of effort into giving you the best content, and I hope you will enjoy it as much as I enjoyed making it.

At the end of the course, you will be more confident than ever in using Airflow.

I wish you great success!

Marc Lamberti

Who this course is for:

Data Engineers
Aspiring Data Engineers
DevOps
Software Engineers
Data Scientists

Course sections

Introduction • 6 lectures • 7min
The basics of Apache Airflow • 11 lectures • 46min
The Stock Market Pipeline • 17 lectures • 1hr 30min
Mastering your DAGs • 15 lectures • 1hr 35min
Improving your DAGs with advanced concepts • 6 lectures • 1hr 5min
Scaling Airflow • 11 lectures • 1hr 35min
Deploying Airflow on AWS EKS with Kubernetes Executors and Rancher • 10 lectures • 47min
Monitoring Apache Airflow • 11 lectures • 1hr 41min
Security in Apache Airflow • 7 lectures • 1hr 3min
APPENDIX • 9 lectures • 45min