Data Engineering Project SQL, Python, Airflow, Docker, CI/CD

Rating: 4.5 (603 reviews)

Become a Data Engineer by Learning APIs, SQL, Python, Docker, Airflow, CI/CD, Functional/Data Quality Tests and more!

Created by Matthew Schembri

Last updated 2/2026

English

What you'll learn

Build Python scripts that extract data from APIs (explored first with Postman), load it into the data warehouse and transform it there (ELT).
Use PostgreSQL as a data warehouse and interact with it using both psql and DBeaver.
Discover how to containerize data applications using Docker, making your data pipelines portable and easy to scale.
Master the basics of orchestrating and automating your data workflows with Apache Airflow, a must-have tool in data engineering.
Understand how to perform unit, integration and end-to-end (E2E) tests using a combination of pytest and Airflow's DAG tests to validate your data pipelines.
Implement data quality tests using SODA to ensure your data meets business and technical requirements.
Learn to automate deployment pipelines using GitHub Actions for smooth continuous integration and delivery (CI/CD).

Course content

7 sections • 66 lectures • 5h 12m total length

Welcome! (0:50)
Prerequisites (0:39)
Tools Installation for Course - [IMPORTANT] (2:29)
Project Overview (4:24)
Extract YouTube data via the YouTube API and load it into a Postgres data warehouse with Python (ELT), then run data quality checks with SODA and set up CI/CD with Docker and GitHub Actions.
Building the Code (0:40)
Build the code from the ground up alongside the instructor, using GitHub to store and version the project, and refer to the final code as you move into data extraction.
APPENDIX (0:02)

Data Extraction Introduction (0:28)
What is an API (1:03)
Getting the YouTube API Key (3:08)
Learn to create and secure a YouTube Data API v3 key via the Google Developer Console: set up a project, restrict the key to public data, and manage credentials.
Google Cloud Shell (1:17)
YouTube API Explorer and Postman (6:14)
Setting Up Git Remote (2:54)
Create Virtual Environment (5:38)
Set up a virtual environment to isolate Python projects and avoid conflicts between versions such as 3.4 and 3.10. Activate it, install packages with pip, and use .gitignore to exclude the venv and __pycache__ folders.
Analysis of Data Extraction Variables (2:30)
Building the Videos Statistics script - Part 1: Playlist ID (17:47)
Develop a Python script that fetches a YouTube channel's playlist ID via the YouTube API using requests, handles errors with try/except, and stays modular so it is ready for ELT, Docker, and Airflow.
Introducing the .env (3:56)
Building the Videos Statistics script - Part 2: Unique Video IDs (15:03)
Develop a Python function that fetches unique video IDs from a playlist using the playlistItems resource, handling pagination with nextPageToken and robustly parsing contentDetails.videoId.
Building the Videos Statistics script - Part 3: Video Data (11:55)
Implement a batch-based function that maps video IDs to seven variables drawn from the snippet, contentDetails, and statistics parts: batch the IDs, build the API URL, fetch the data, and accumulate the results.
Building the Videos Statistics script - Part 4: Save to JSON (5:24)
Put logs/ folder in .gitignore (0:20)
APPENDIX (0:17)
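A minimal sketch of the four extraction steps described above (playlist ID, paginated video IDs, batched video details, JSON output) might look like the following. It assumes the public YouTube Data API v3 endpoints `channels`, `playlistItems` and `videos`, and it uses the standard-library `urllib` instead of `requests` so it runs without extra dependencies. The helper names, the lookup by channel handle (`forHandle`) and the seven output fields are illustrative assumptions, not necessarily what the course uses. The HTTP call is injected as `fetch` so the logic can be exercised without a network connection or API key.

```python
import json
import logging
import os
import urllib.parse
import urllib.request
from typing import Callable, Iterator

logger = logging.getLogger(__name__)

BASE_URL = "https://www.googleapis.com/youtube/v3"
MAX_RESULTS = 50  # the API returns at most 50 items per page and accepts at most 50 IDs per call

# A fetcher takes an endpoint name and query parameters and returns decoded JSON.
Fetcher = Callable[[str, dict], dict]


def http_get_json(endpoint: str, params: dict) -> dict:
    """Default fetcher: GET BASE_URL/<endpoint>?<params> and decode the JSON body."""
    url = f"{BASE_URL}/{endpoint}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def get_uploads_playlist_id(handle: str, api_key: str, fetch: Fetcher = http_get_json) -> str:
    """Look up the channel's 'uploads' playlist ID (lookup by handle is an assumption)."""
    data = fetch("channels", {"part": "contentDetails", "forHandle": handle, "key": api_key})
    try:
        return data["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]
    except (KeyError, IndexError) as exc:
        logger.error("Unexpected channels response for %s: %r", handle, exc)
        raise


def get_video_ids(playlist_id: str, api_key: str, fetch: Fetcher = http_get_json) -> list[str]:
    """Follow nextPageToken until the playlist is exhausted, keeping each video ID once."""
    seen: set[str] = set()
    video_ids: list[str] = []
    page_token = None
    while True:
        params = {"part": "contentDetails", "playlistId": playlist_id,
                  "maxResults": MAX_RESULTS, "key": api_key}
        if page_token:
            params["pageToken"] = page_token
        data = fetch("playlistItems", params)
        for item in data.get("items", []):
            video_id = item.get("contentDetails", {}).get("videoId")
            if video_id and video_id not in seen:  # skip malformed items and repeats
                seen.add(video_id)
                video_ids.append(video_id)
        page_token = data.get("nextPageToken")
        if not page_token:
            return video_ids


def batched(ids: list[str], size: int = MAX_RESULTS) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def extract_video_data(video_ids: list[str], api_key: str,
                       fetch: Fetcher = http_get_json) -> list[dict]:
    """Map each video ID to seven illustrative fields from snippet/contentDetails/statistics."""
    rows = []
    for batch in batched(video_ids):
        data = fetch("videos", {"part": "snippet,contentDetails,statistics",
                                "id": ",".join(batch), "key": api_key})
        for item in data.get("items", []):
            snippet = item.get("snippet", {})
            details = item.get("contentDetails", {})
            stats = item.get("statistics", {})
            rows.append({
                "video_id": item["id"],
                "title": snippet.get("title"),
                "published_at": snippet.get("publishedAt"),
                "duration": details.get("duration"),  # ISO 8601 string, e.g. "PT4M13S"
                "view_count": stats.get("viewCount"),
                "like_count": stats.get("likeCount"),
                "comment_count": stats.get("commentCount"),
            })
    return rows


def save_to_json(rows: list[dict], path: str) -> None:
    """Write the extracted rows to a JSON file, creating the parent folder if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=4, ensure_ascii=False)
```

In a real run, the API key would typically be read from the environment after the `.env` file is loaded (for example `os.environ["API_KEY"]`, where the variable name is hypothetical) and passed to each function. Injecting `fetch` also keeps the pagination and batching logic easy to unit-test with a fake response.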

Why Docker (0:56)
Dockerfile (4:20)
Dockerfile versions - [IMPORTANT] (0:12)
Build the Docker Image (4:50)
Airflow Architecture (3:45)
Airflow Directories (2:30)
.env file (0:13)
Amending the .env (6:33)
docker-compose.yaml file to use - [VERY IMPORTANT] (5:12)
init-multiple-databases.sh script - [VERY IMPORTANT] (0:51)
Docker Compose (16:22)
Docker commands (5:48)
Stopping Docker containers before shutting down laptop - [IMPORTANT] (0:21)
APPENDIX (0:06)

Postgres Data Warehouse Introduction (0:31)
Loading to Data Warehouse & Transformations (1:55)
Setting up Connection to Data Warehouse using Airflow (6:30)
Creating the Schemas and Tables (7:31)
Loading the JSON data (5:06)
Build a Python loading script that reads the JSON API data from the data directory with a load_path function, parsing the JSON and using logging for robust error handling.
Inserts, Updates & Deletes (10:14)
Transformations (8:53)
Populating Staging and Core Tables (10:22)
Defining the Data Warehouse DAG & Debugging (6:32)
Interacting with the Data Warehouse using DBeaver (8:39)
APPENDIX (0:04)

CI/CD Introduction (0:35)
Commit and Push (1:46)
Before the CI/CD section, commit and push all changes to version your work: add any untracked changes, commit them with a meaningful message, and push to update the GitHub repository and its workflow.
CI-CD Part 1 - Docker Image Builds (13:50)
CI-CD Part 2 - Testing (13:15)
GitHub Actions Workflow Dispatch (4:25)
APPENDIX (0:05)
The End (0:33)
You have successfully finished this data engineering course: celebrate your achievement, apply what you learned in the workplace, and consider leaving a rating about your experience.

Requirements

At least 8 GB of RAM, though 16 GB is better for smoother performance
Python, Docker and Git installed to run and access the course code
Beginner-level SQL knowledge is required
Intermediate-level Python knowledge is required
Basic understanding of Docker is needed
Knowledge of CI/CD is a plus but not necessary

Description

Data Engineering is the backbone of modern data-driven companies. To excel, you need experience with the tools and processes that power data pipelines in real-world environments. This course gives you practical, project-based learning with the following tools: PostgreSQL, Python, Docker, Airflow, Postman, SODA and GitHub Actions, and I will guide you through how to use each of them.

What you will learn in the course:

Python for Data Engineering: Build Python scripts that extract data from APIs (explored first with Postman), load it into the data warehouse and transform it there (ELT). In this course we use Python version 3.10.
SQL for Data Pipelines: Use PostgreSQL as a data warehouse and interact with it using both psql and DBeaver.
Docker for Containerized Deployments: Discover how to containerize data applications using Docker, making your data pipelines portable and easy to scale.
Airflow for Workflow Automation: Master the basics of orchestrating and automating your data workflows with Apache Airflow, a must-have tool in data engineering. In this course we use Airflow version 2.9.2.
Testing and Data Quality Assurance: Understand how to perform unit, integration and end-to-end (E2E) tests using a combination of pytest and Airflow's DAG tests to validate your data pipelines. Implement data quality tests using SODA to ensure your data meets business and technical requirements.
CI/CD for Automated Testing & Deployment: Learn to automate deployment pipelines using GitHub Actions for smooth continuous integration and delivery.
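As a rough illustration of what a data quality check asserts, the sketch below computes a few metrics in plain Python. The names loosely echo SodaCL metrics (`row_count`, `missing_count`, `duplicate_count`), but this is not SODA and does not reproduce its configuration format or exact metric semantics; the `video_id` and `view_count` columns are hypothetical. The `test_`-prefixed function shows how such a check could be unit-tested with pytest, which by default collects functions with that prefix.

```python
from collections import Counter


def row_count(rows: list[dict]) -> int:
    return len(rows)


def missing_count(rows: list[dict], column: str) -> int:
    """Rows where the column is absent, None or an empty string."""
    return sum(1 for row in rows if row.get(column) in (None, ""))


def duplicate_count(rows: list[dict], column: str) -> int:
    """Distinct non-missing values that appear more than once (one possible definition)."""
    counts = Counter(row.get(column) for row in rows if row.get(column) not in (None, ""))
    return sum(1 for n in counts.values() if n > 1)


def run_checks(rows: list[dict], key: str = "video_id") -> list[str]:
    """Return one message per failed check; an empty list means every check passed."""
    failures = []
    if row_count(rows) == 0:
        failures.append("row_count > 0 failed")
    if (n := missing_count(rows, key)) != 0:
        failures.append(f"missing_count({key}) = {n}, expected 0")
    if (n := duplicate_count(rows, key)) != 0:
        failures.append(f"duplicate_count({key}) = {n}, expected 0")
    # Hypothetical business rule: view counts (stored as strings) must be non-negative.
    negative = sum(1 for row in rows
                   if row.get("view_count") not in (None, "") and int(row["view_count"]) < 0)
    if negative:
        failures.append(f"view_count < 0 in {negative} row(s), expected 0")
    return failures


def test_run_checks_flags_duplicates():
    rows = [{"video_id": "a", "view_count": "3"}, {"video_id": "a", "view_count": "1"}]
    assert run_checks(rows) == ["duplicate_count(video_id) = 1, expected 0"]
```

Returning a list of failure messages rather than raising on the first problem lets a pipeline task report every broken rule in one run before deciding whether to fail.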

Who this course is for:

Aspiring Data Engineers: If you already have basic SQL & intermediate-level Python and want to learn Data Engineering by working with real tools and projects, this course will help you build strong foundational skills and practical experience to start your career.
Early-Career Data Professionals: If you have some experience in data-related roles (Data Analyst, Junior Data Engineer, Data Scientist) and want to deepen your understanding of essential tools like Docker, CI/CD, and automated testing, this course will help you level up your engineering skills.

Course content by section

Introduction: 6 lectures • 9min
Data Extraction using API: 15 lectures • 1hr 18min
Docker: 14 lectures • 52min
Airflow: 3 lectures • 17min
Postgres Data Warehouse: 11 lectures • 1hr 6min
Testing: 10 lectures • 55min
CI/CD: 7 lectures • 34min
