
Explore a complete hands-on introduction to Apache Airflow, designed for beginners. Learn the prerequisites—Docker installed and basic Python knowledge—and how to install and run Airflow on your local machine.
This complete hands-on introduction to Apache Airflow for beginners focuses on building simple data pipelines with basic features. Take notes, use bookmarks, and look for solutions on Stack Overflow, Google, or with AI tools.
Meet Marc Lamberti, a French data engineer and best-selling Udemy instructor who uses Airflow in production and leads customer education at Astronomer, the cloud platform for running Airflow at scale.
Install Docker Desktop and allocate at least 8 GB of memory, 4 CPUs, and enough free disk space to run Airflow locally, then install uv and verify it in the terminal.
Explore how Docker packages an application with its dependencies into portable containers and learn to build a Dockerfile, create images, and run containers for consistent, isolated environments.
Explore how Apache Airflow orchestrates data tasks through defined order and timing, provides a clear visibility dashboard, and scales by connecting to diverse data sources and tools.
Explore how Airflow, an open-source platform, lets you programmatically author, schedule, and monitor workflows with Python-based dynamic tasks. Benefit from a fully functional user interface and extensibility through providers.
Explore the core components of Apache Airflow 3: the metadata database, scheduler, DAG file processor, executor, API server, workers, queue, and triggerer, and how they orchestrate task execution.
Explore the core concepts of Airflow, including DAGs, operators, tasks, task instances, and dependencies, and see how a DAG structures a data workflow like a recipe.
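The recipe analogy can be sketched with the standard library alone. This is a toy model, not Airflow code: the task names are made up, and the only point is that a DAG's dependencies determine a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical recipe-style workflow: each key lists the tasks it depends on.
dependencies = {
    "mix_ingredients": {"gather_ingredients"},
    "bake": {"mix_ingredients", "preheat_oven"},
    "serve": {"bake"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its upstream tasks."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Several orders can be valid (for example, preheat_oven may come before or after gather_ingredients), which is why an orchestrator is free to run independent tasks in parallel.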
Airflow is an orchestrator, not a data processing framework or a real-time streaming system. It relies on a metadata database and is not suited to high-frequency scheduling, large-scale data processing, or real-time data.
Explore the single node architecture where Airflow components run on one machine and the API server, scheduler, executor, and workers interact with the metadata database, then contrast with multi-node setups.
Discover how Airflow runs tasks in a single-node setup, from a dag.py file in the dags folder to serialization in the metadata database, with the API server and scheduler triggering a DAG run.
Download docker-compose.yml, create an airflow-intro folder, set up a Python virtual environment with uv, install Airflow, and run Docker Compose to launch the Airflow components and access the Airflow home page.
Explore the new home view of Airflow 3.x, monitor health statuses of core components, and quickly navigate failed, running, and active dags with history filters.
Explore the DAGs view in Airflow 3 to search, filter by latest run state, toggle scheduling, and manage dag runs and task instances with triggers, restarts, and status filters.
Explore the assets view, where assets are logical groupings of data (formerly datasets), showing their relationships to DAGs and how materializing assets and asset events trigger downstream DAGs.
Explore essential Airflow features, including XComs for sharing values between tasks, variables for cross-DAG data, and providers and connections for AWS interactions, plus UI preferences.
Create a Postgres-backed data pipeline in Airflow by verifying an API, extracting and processing a user, and storing results in Postgres, following best practices.
Take your time and watch the entire video first, then compare your code with the provided examples. Beginners may find Airflow challenging; ask questions in the Udemy Q&A.
Create a DAG by adding a Python file to the dags folder and applying the @dag decorator to a function with a unique name.
Create a table in Postgres with the SQLExecuteQueryOperator: import the operator and define a task that sets conn_id and runs a query creating a users table with id, first name, last name, email, and created at columns.
Create a Postgres connection in the Airflow UI, set the connection ID to postgres, fill in the host, login, password, and port 5432, then save. Use this connection with the SQLExecuteQueryOperator.
Install the appropriate Airflow provider to unlock connection types and operators for tools like Flink or Airbyte. The Astronomer Registry helps you find and install providers.
Learn to quickly validate a new data pipeline task using the airflow tasks test command in a Docker-backed Airflow environment, and confirm it completes successfully.
Learn how to verify API availability in Airflow using a sensor that polls every 30 seconds with a 5-minute timeout, using a fake API to pass data downstream.
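The polling behaviour described here can be illustrated without Airflow. The sketch below is an assumption-laden toy, not the Airflow sensor API: wait_until and FakeClock are hypothetical names, and the fake clock only exists so the example runs instantly instead of sleeping for real.

```python
import time


def wait_until(condition, poke_interval=30, timeout=300,
               sleep=time.sleep, clock=time.monotonic):
    """Call `condition` every `poke_interval` seconds until it returns True.

    Returns True on success and False once `timeout` seconds have elapsed.
    """
    start = clock()
    while True:
        if condition():
            return True
        if clock() - start >= timeout:
            return False
        sleep(poke_interval)


class FakeClock:
    """Hypothetical stand-in for real time so the demo does not block."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


# Simulated API that becomes available on the third check.
clock = FakeClock()
responses = iter([False, False, True])
api_ready = wait_until(lambda: next(responses), sleep=clock.sleep, clock=clock.time)
print(api_ready, clock.now)
```

In a real pipeline the condition would be an HTTP call to the API, and the result of the successful check is what gets passed downstream.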
Explore useful sensors in Apache Airflow, compare the sensor decorator's flexibility with pre-built sensors like HTTP, file, and S3 key sensors, and decide when to use each.
Discover how to extract a user's id, first name, last name, and email from a fake API in Airflow using the PythonOperator and XComs.
Switch to the task decorator from the Python operator to implement extract user, achieving the same result with less code, clearer data flow, and visible tasks in the Airflow UI.
Implement a process_user task with the task decorator to store the extracted user info in tmp/user_info.csv using csv.DictWriter with the fields id, first name, last name, and email.
Implement the store_user task using a PostgresHook and its copy_expert method to load data from user_info.csv into the users table.
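The CSV-writing step on its own might look like the sketch below. It is plain Python rather than an Airflow task: the field names, the sample user, and the temporary directory are assumptions made so the example is self-contained.

```python
import csv
import os
import tempfile

# Assumed column names; the course's actual keys may differ.
FIELDNAMES = ["id", "firstname", "lastname", "email"]


def write_user_csv(user_info, path):
    """Write a single user dict to `path` as a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerow(user_info)


# Made-up user standing in for the value returned by the extract step.
user = {"id": 1, "firstname": "Ada", "lastname": "Lovelace", "email": "ada@example.com"}
csv_path = os.path.join(tempfile.mkdtemp(), "user_info.csv")
write_user_csv(user, csv_path)
```

Inside a DAG, the same function body would sit under the task decorator and receive the user dict from the upstream task.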
Learn how to define task dependencies in Apache Airflow 3 by using bitshift operators to chain create table, API available, extract user, process user, and store user in a dag.
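To see why bitshift chaining works, here is a toy model of the idea. The Task class below is hypothetical and is not Airflow's operator class; it only shows that >> records a downstream task and returns the right-hand side so the chain can continue.

```python
class Task:
    """Toy stand-in for a task; not the Airflow implementation."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # a >> b records b as downstream of a ...
        self.downstream.append(other)
        # ... and returns b, which is what lets a >> b >> c chain.
        return other


def chain_ids(first):
    """Follow the first downstream link from `first` and collect task ids."""
    ids, current = [first.task_id], first
    while current.downstream:
        current = current.downstream[0]
        ids.append(current.task_id)
    return ids


names = ["create_table", "is_api_available", "extract_user", "process_user", "store_user"]
create_table, is_api_available, extract_user, process_user, store_user = (Task(n) for n in names)

create_table >> is_api_available >> extract_user >> process_user >> store_user
print(chain_ids(create_table))
```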
Trigger the DAG manually in Airflow 3 to validate the data pipeline, and review the @dag decorator, the SQLExecuteQueryOperator, hooks, and sensors.
Shift your thinking from tasks to assets, viewing workflows as linked assets that transform into outputs. Recognize dependencies, such as an API feeding A.txt, to guide Airflow design.
Explore what an asset is in Airflow 3.0, how assets replace datasets and trigger downstream workflows, and how the @asset decorator improves data lineage and code.
Create your first asset, named user, with the @asset decorator from the Airflow SDK to fetch a fake user from the Random User API, scheduled daily at midnight.
Discover how to materialize an asset in Airflow 3 using the CLI or UI, turning assets into DAGs with one task, and managing upstream and downstream DAG dependencies.
Define dependencies between assets in Apache Airflow 3 by creating a second asset that materializes when the user asset does, and retrieve its XCom to determine the user location.
Explore how to view and manage asset dependencies in Airflow 3 by materializing the user, location, and login assets, triggering DAGs, and inspecting XComs in the Airflow UI.
Refactor with @asset.multi in Airflow 3 to materialize two assets, user location and user login, via the outlets parameter while avoiding code duplication.
Airflow executors define how and on which system to run your tasks. Choose between local, Celery, or Kubernetes executors to scale task execution across machines and pods.
Access and override Airflow settings in Docker using environment variables, and learn to change the executor from the default local executor without editing the config file.
Explore the sequential executor, an Airflow executor that runs one task at a time on a single machine. It was the default before Airflow 3.0 and has since been replaced by the local executor.
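Airflow reads configuration overrides from environment variables named AIRFLOW__{SECTION}__{KEY}, with double underscores as separators. The helper below is a small illustrative sketch (airflow_env_var is a hypothetical name, not part of Airflow) that builds such names; the CeleryExecutor value is just an example.

```python
def airflow_env_var(section, key):
    """Build the environment variable name that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Example override: what you would place under `environment:` in docker-compose.yml.
overrides = {airflow_env_var("core", "executor"): "CeleryExecutor"}

for name, value in overrides.items():
    print(f"{name}={value}")
```

Setting the variable in the Compose file's environment section takes precedence over the value in airflow.cfg, which is why no config file edit is needed.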
Adopt the local executor as the default in Airflow 3.0 to run tasks in parallel on a single machine, and start with it before moving to advanced executors.
Explore the Celery executor for Apache Airflow, built on a distributed task queue that spreads tasks across multiple workers via brokers and queues, with a result backend for task outputs.
Configure Airflow to run with the Celery executor using the official Docker Compose setup, enable Redis as the broker, and deploy Celery workers to distribute tasks.
Monitor Celery tasks with Flower, a real-time web app that shows worker and task status. Enable Flower with docker compose --profile flower to view queues and concurrency.
Run a Celery executor pipeline with four tasks, A to D, that sleep five seconds each, monitor it via the Airflow and Flower UIs, and trigger the DAG to observe worker execution.
Learn to add a new Airflow worker to distribute tasks across multiple workers using Docker Compose by duplicating and renaming the worker service and running the celery worker command.
Discover how a queue in Airflow enforces first-in, first-out task execution and, with Celery, lets you route tasks to specialized workers via separate queues for CPU, GPU, and default workloads.
Create a high CPU queue linked to worker two and assign tasks B, C, and D to it using the task's queue parameter, while worker one uses the default queue.
Switch from the Celery executor to the local executor by editing docker-compose.yml, commenting out Redis, the Airflow workers, and Flower, then restart Airflow to boot Postgres, the scheduler, the DAG processor, and the API server.
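The routing described here can be pictured with a small toy model. It is not Celery or Airflow code: the worker names, the queue name high_cpu, and the route function are assumptions used only to show which worker would be eligible to pick up which task.

```python
from collections import defaultdict, deque

# Hypothetical setup: which queues each worker listens to.
worker_queues = {"worker_1": ["default"], "worker_2": ["high_cpu"]}

# Hypothetical queue assignment for tasks A to D.
task_queues = {"a": "default", "b": "high_cpu", "c": "high_cpu", "d": "high_cpu"}


def route(task_queues, worker_queues):
    """Map each worker to the tasks it may pick up, in FIFO order per queue."""
    queues = defaultdict(deque)
    for task_id, queue in task_queues.items():
        queues[queue].append(task_id)
    return {
        worker: [task_id for queue in listened for task_id in queues[queue]]
        for worker, listened in worker_queues.items()
    }


assignment = route(task_queues, worker_queues)
print(assignment)
```

If two workers listened to the same queue, both would appear as candidates for its tasks, and whichever worker is free first would take the next one.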
Group related tasks into task groups to simplify large DAGs, expand groups in the Airflow UI, apply group-level defaults, and export reusable modules for cross-DAG reuse.
Learn to implement and organize tasks with task groups in Airflow DAGs: create groups, nest groups, share data between tasks, and apply default arguments while understanding task ID prefixes.
Discover how to share data between tasks with XComs by pushing and pulling values stored in the Airflow metadata database, and note the XCom size limits per database: Postgres 1 GB, SQLite 2 GB, MySQL 64 KB.
Explore how to share data between Airflow tasks using XComs, including pushing values, pulling them in downstream tasks, and returning dictionaries for multiple XComs.
Learn how Airflow branching uses the branch operator to choose between two downstream tasks, one for a value equal to one and one for a value different from one, based on the value returned by task A.
Design a SQL decorator for Airflow that runs a SQL query directly from a Python function, reducing boilerplate and enabling dynamic SQL generation from Python logic.
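The core of branching is a function that returns the id of the task to follow. The sketch below is plain Python rather than an Airflow DAG, and the task ids equal_1 and different_than_1 are assumed names; it shows the decision and its effect on the two candidate tasks.

```python
def choose_branch(value):
    """Return the id of the downstream task that should run for `value`."""
    return "equal_1" if value == 1 else "different_than_1"


def apply_branch(value, downstream=("equal_1", "different_than_1")):
    """Mark the chosen task as run and every other candidate as skipped."""
    chosen = choose_branch(value)
    return {task_id: ("run" if task_id == chosen else "skipped") for task_id in downstream}


print(apply_branch(1))
print(apply_branch(42))
```

In Airflow the value would come from task A's XCom, and the tasks that were not chosen end up in the skipped state.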
Explore how Airflow's core works with providers to connect to services like AWS and Databricks using operators, hooks, and decorators, and how to build a custom provider named my-sdk.
Create a provider folder named my-sdk outside the dags folder, then add a hatchling configuration, a readme, and a get_provider_info function, with dependencies such as Apache Airflow and typing extensions.
Implement the get_provider_info function to turn your Python package into an Airflow provider, returning a dict with the package name, provider name, description, version, and sql decorator details.
Learn to create a SQL decorated operator in Apache Airflow by combining DecoratedOperator with the SQLExecuteQueryOperator, including templating and runtime rendering.
Create a Python function that maps to the sql decorator and returns a task decorator built with task_decorator_factory, after importing the TaskDecorator class and the factory.
Learn to install a custom provider in an Apache Airflow 3 setup by building a Docker image, copying the my-sdk folder, and installing it with pip from a local path.
Learn to use the sql decorator in Airflow to execute SQL by returning statements from Python functions, wire up a Postgres connection, run a simple DAG, and verify the XCom results.
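A provider info function could look roughly like the sketch below. Treat it as an assumption-heavy illustration: the key names follow what the Airflow provider metadata schema is understood to use, while the package name, version, description, and the my_sdk.decorators.sql.sql_task path are hypothetical placeholders.

```python
def get_provider_info():
    """Return provider metadata as a plain dict (illustrative values only)."""
    return {
        "package-name": "my-sdk",           # hypothetical package name
        "name": "My SDK",                   # hypothetical display name
        "description": "Custom provider exposing a sql task decorator.",
        "versions": ["0.0.1"],              # hypothetical version
        "task-decorators": [
            {
                "name": "sql",
                # Hypothetical import path to the decorator factory function.
                "class-name": "my_sdk.decorators.sql.sql_task",
            }
        ],
    }


info = get_provider_info()
print(info["task-decorators"][0]["name"])
```

The function is typically exposed through an entry point in the package configuration so that Airflow can discover it when the package is installed.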
Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows. If you have many ETLs to manage, Airflow is a must-have.
In this course, you are going to learn everything you need to start using Apache Airflow 3 through theory and practical videos.
You will start with the basics such as:
What is Apache Airflow?
The core concepts of Airflow
Different architectures to run Airflow
What happens when a workflow runs
Then you will create your first data pipeline covering many Airflow features such as:
Sensors, to wait for specific conditions
Hooks, for interacting with a database
TaskFlow, for writing efficient, easy-to-read DAGs
XComs, for sharing data
and much more.
At the end of the project, you will be equipped to create your own workflows!
After the project, you will also discover the new Asset syntax, which completely changes the way you think about your tasks in Airflow 3.
What is an Asset
How to create dependencies between Assets
How to materialize an Asset
and more.
You will discover the different executors for running Airflow at scale, in particular the CeleryExecutor, which is extremely popular.
How to configure Airflow for using the CeleryExecutor
How to distribute your tasks on different Workers
How to choose your Workers with Queues
and more.
You will explore advanced features to elevate your DAGs to a new level, and conclude by creating your own Airflow provider and a new decorator for executing SQL requests.
If you're working in a company with Airflow, you will love that part.
Enjoy