
This chapter covers
What is Apache Airflow?
Scenarios for using Apache Airflow
Benefits of Apache Airflow
This chapter covers
What is virtualization?
How to run two applications with different dependencies on the same computer.
What is Docker?
How to install Docker.
What are Docker commands?
This chapter covers
How to install Airflow on Docker
Dependencies required to install Airflow
How to test the Airflow installation
What is the Airflow Web UI?
1. How to create a Dockerfile
2. Airflow installation with a Dockerfile
This chapter covers the various components of the Airflow Web UI
This chapter covers the following
What is a DAG?
What are tasks in Airflow?
How to create your first DAG
How to schedule a DAG
How to create operators in Airflow
How to visualize tasks in Airflow
This chapter covers the following
How to fetch data from web URLs using HttpSensor
What is HttpOperator?
How to store the fetched data in a database.
This chapter covers
How to sense a file in a folder
How to trigger an action when a file becomes available in a folder
How to store and retrieve data from an AWS S3 bucket
This chapter covers
How to share data between two tasks using XCom
How to transfer or retrieve data from Microsoft Azure Blob Storage
This chapter covers
How to branch tasks based on a condition
How to run past DAG executions
How to send an email notification for a particular task
This chapter covers
How to use PostgreSQL or MySQL as the Airflow metadata database
Why use a database server for the metadata
How to create custom operators
Apache Airflow is an open-source platform for automating, scheduling, and orchestrating complex data pipelines. As data volumes and complexity continue to grow, efficient and scalable data processing and management become critical. In this comprehensive course, you will learn how to master Apache Airflow, starting from the basics and progressing to advanced concepts.
The course is designed for data engineers, data scientists, Python developers, software engineers, and anyone interested in learning how to automate and manage data workflows.
You will learn how to use Apache Airflow to build and manage data pipelines, schedule and trigger tasks, monitor and troubleshoot workflows, and integrate with various data sources and services.
The course will cover the following topics:
Introduction to Apache Airflow and workflow management
Introduction to Docker and Docker commands
Installation and configuration of Apache Airflow
Building and managing workflows with Apache Airflow
Scheduling and triggering tasks in Apache Airflow
Operators in Apache Airflow
Fetching data from Web APIs or HTTP
File Sensors in Apache Airflow
Connecting with Azure or AWS
Using AWS S3 Bucket and Azure Blob Storage to store and retrieve data
Creating custom operators and sensors
Handling dependencies and task retries
Monitoring and troubleshooting workflows
Integrating with data sources and services
Scaling and optimizing Apache Airflow for large-scale data processing using the Celery executor
Securing DAG connections using Fernet keys
Throughout the course, you will work on practical exercises and projects to apply the concepts you learn. By the end of the course, you will have a strong understanding of Apache Airflow and the skills to build and manage complex data workflows.