Complete Data Engineering Bootcamp: SQL, ETL & Data Pipeline

Rating: 4.6 (7 reviews)

Build Real-World Data Engineering Skills Using SQL, ETL, Data Pipelines, and Modern Data Architecture, From Beginner to

Created by Ganiyu Shakirudeen Kola

Last updated 3/2026

English

What you'll learn

Build strong data engineering foundations, including data pipelines, ETL/ELT concepts, and real-world data workflows
Use SQL and Python to extract, transform, and load data efficiently for analytics and reporting
Work with big data tools such as Apache Spark and understand batch and streaming data processing
Design and implement an end-to-end data engineering project using cloud storage and data warehouses

Course content

10 sections • 59 lectures • 42h 48m total length

Introduction to Data Engineering • 6:57
In this foundational module, you will gain a clear understanding of what Data Engineering is and why it plays a critical role in modern data-driven organizations.
We begin by defining Data Engineering and exploring how it differs from other data roles. You will understand how data engineers design, build, and maintain the systems that move and transform raw data into reliable, usable formats for analysis and decision-making.
Next, we examine the role and responsibilities of a Data Engineer, including:
Building scalable data pipelines
Designing data architectures
Ensuring data quality and reliability
Working with databases and cloud platforms
You will also get an overview of the core tools and technologies used in Data Engineering, including:
SQL for querying and managing data
Python for data processing and automation
ETL tools and workflow orchestration systems
Data warehouses and cloud storage platforms
Finally, we clarify the differences between:
Data Engineering
Data Analytics
Data Science
So you can clearly understand how these roles collaborate within a data team and where a Data Engineer fits in the ecosystem.
By the end of this module, you will have a solid conceptual foundation that prepares you for the hands-on technical sections that follow in the course.

Understanding Data and Databases • 6:18
In this module, we build the foundational knowledge every Data Engineer must have about data and database systems.
We begin by exploring the different types of data you will encounter in real-world systems:
Structured Data – organized data stored in relational databases
Semi-Structured Data – flexible formats such as JSON and XML
Unstructured Data – text, images, videos, and logs
Understanding these categories helps you choose the right storage and processing approach for different business problems.
Next, we examine the critical difference between OLTP and OLAP systems:
OLTP (Online Transaction Processing) systems optimized for fast, real-time transactions
OLAP (Online Analytical Processing) systems optimized for large-scale analysis and reporting
You will learn when each system is used and how they support modern data architectures.
We then introduce Relational Databases, covering:
Tables, rows, and columns
Primary and foreign keys
Relationships and normalization
Common systems such as MySQL and PostgreSQL
Finally, we explore NoSQL Databases and why they are important in big data and scalable systems. You will understand:
Key-value databases
Document databases
Column-family databases
When to choose NoSQL over relational databases
By the end of this module, you will clearly understand how data is structured, stored, and managed across different systems, a critical step before building real ETL pipelines and data warehouses in later sections of this course.

SQL for Data Engineering • 34:14
In this module, we dive deep into SQL, the most essential skill for every Data Engineer.
Since data engineers work extensively with databases, mastering SQL is critical for extracting, transforming, and preparing data efficiently.
We begin with core SQL fundamentals, including:
SELECT statements
Filtering with WHERE
Sorting and grouping data
By the end of this module, you will be able to:
Write SELECT statements
Filter rows with WHERE
Sort and group data
Join • 15:16
This lecture provides a comprehensive introduction to SQL joins and their critical role in data engineering. Learners will understand how to combine data from multiple tables using INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN. The session emphasizes real-world data integration scenarios, relationship modeling, and efficient query design. By the end of this lecture, students will confidently retrieve and merge relational data to build accurate, analysis-ready datasets for data pipelines and reporting systems.
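As a rough illustration of the join types named above, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The customers and orders tables and their contents are hypothetical, invented only for this example; they are not taken from the course.

```python
import sqlite3

# Hypothetical tables, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN keeps every customer; a customer without orders gets NULL (None).
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

conn.close()
```

Here the customer with no orders is missing from inner_rows but appears in left_rows with an amount of None, which is often the difference that matters when building a complete reporting dataset.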
Aggregation • 27:35
This lecture introduces aggregation techniques used to summarize and analyze data efficiently within relational databases. Learners will explore essential aggregate functions such as COUNT, SUM, AVG, MIN, and MAX, along with the powerful GROUP BY and HAVING clauses. The session focuses on transforming raw data into meaningful insights, enabling engineers to compute metrics, generate reports, and prepare datasets for analytics workflows. By the end, students will confidently perform data summarization operations essential for scalable data engineering solutions.
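The GROUP BY and HAVING clauses mentioned above can be sketched as follows, again with sqlite3 and an in-memory database. The sales table and its values are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('north', 100.0), ('north', 50.0),
        ('south', 30.0),
        ('west', 70.0), ('west', 90.0), ('west', 20.0);
""")

# HAVING filters whole groups after aggregation
# (a WHERE clause would filter individual rows before grouping).
summary = conn.execute("""
    SELECT region, COUNT(*) AS n, SUM(amount) AS total, MAX(amount) AS largest
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 100
    ORDER BY region
""").fetchall()

conn.close()
```

The south region is dropped because its total does not pass the HAVING condition, while the other two regions are returned with their counts and totals.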
Subquery • 46:13
This lecture introduces subqueries and their practical application in building dynamic, layered SQL queries. Learners will explore how to use subqueries within SELECT, WHERE, and FROM clauses to perform advanced filtering, conditional logic, and data transformation. The session emphasizes real-world data engineering scenarios where nested queries enhance flexibility and analytical precision. By the end, students will confidently construct efficient subqueries to solve complex data retrieval and processing challenges.
View • 18:56
In this lecture, we explore SQL Views as a strategic tool for abstraction, security, and reusable data modeling. You will learn how to create and manage views to simplify complex queries, standardize business logic, and provide controlled access to datasets. From a data engineering perspective, views play a critical role in building clean, maintainable, and production-ready data environments.
Indexes • 16:31
This lecture introduces SQL indexing as a core performance optimization technique in relational databases. You will learn how indexes improve query execution speed, how to create and manage them effectively, and when to use them in production environments. From a data engineering standpoint, understanding indexing is essential for building scalable, high-performance data systems that handle large volumes of data efficiently.
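To make the effect of an index visible, the sketch below asks SQLite for its query plan before and after creating one. The orders table, the idx_orders_customer index name, and the data are hypothetical. The exact wording of EXPLAIN QUERY PLAN output differs between SQLite versions and between database engines, so treat the plan text as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT amount FROM orders WHERE customer_id = 42"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable plan step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(query)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # with the index: a search that names the index

conn.close()
```

Printing plan_before and plan_after side by side is a quick way to confirm that a filter column is actually served by the index you created.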
Stored Procedure • 31:49
This lecture introduces stored procedures as a powerful way to encapsulate business logic within the database. You will learn how to create, execute, and manage stored procedures to automate repetitive tasks, enforce consistency, and improve performance. From a data engineering perspective, stored procedures are essential for building structured, reusable, and production-grade database operations.
Common Table Expression (CTE) • 55:44
This session explores Common Table Expressions (CTEs) as a modern approach to writing clean, readable, and modular SQL queries. You will learn how CTEs simplify complex transformations, improve query maintainability, and enhance logical flow in ETL processes. CTEs are a key tool for professional data engineers working with layered data transformations.
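A layered transformation of the kind described above might look like the following sketch, where each CTE names one step. The raw_events table and its columns are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (user_id INTEGER, status TEXT, amount REAL);
    INSERT INTO raw_events VALUES
        (1, 'ok', 10.0), (1, 'ok', 5.0), (1, 'failed', 99.0),
        (2, 'ok', 7.5), (3, 'failed', 1.0);
""")

# Each CTE names one transformation step, so the query reads top to bottom:
# first keep valid rows, then total them per user.
per_user = conn.execute("""
    WITH valid AS (
        SELECT user_id, amount FROM raw_events WHERE status = 'ok'
    ),
    totals AS (
        SELECT user_id, SUM(amount) AS total FROM valid GROUP BY user_id
    )
    SELECT user_id, total FROM totals ORDER BY user_id
""").fetchall()

conn.close()
```

The same result could be written with nested subqueries, but naming the steps tends to make long ETL queries easier to review and change.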
Window Function • 45:33
In this lecture, you will master SQL window functions for advanced analytical computations. Topics include ranking, running totals, partitioning, and lead/lag analysis. Window functions allow data engineers to perform complex calculations without collapsing datasets, making them indispensable for building analytical data pipelines.
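The running totals and lead/lag analysis mentioned above can be sketched like this. The daily_sales table is hypothetical, and the example assumes the SQLite library bundled with your Python is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_sales (store TEXT, day INTEGER, amount REAL);
    INSERT INTO daily_sales VALUES
        ('A', 1, 10.0), ('A', 2, 20.0), ('A', 3, 5.0),
        ('B', 1, 7.0), ('B', 2, 3.0);
""")

# Every input row is kept; the window columns are computed per store.
rows = conn.execute("""
    SELECT store, day, amount,
           SUM(amount) OVER (PARTITION BY store ORDER BY day) AS running_total,
           LAG(amount) OVER (PARTITION BY store ORDER BY day) AS previous_amount
    FROM daily_sales
    ORDER BY store, day
""").fetchall()

conn.close()
```

Unlike GROUP BY, the query returns all five input rows, each extended with a running total and the previous day's amount for the same store (None on a store's first day).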
Writing Optimized SQL for ETL • 33:09
This lecture focuses on writing efficient, scalable SQL queries tailored for ETL processes. You will learn performance optimization strategies, indexing considerations, query tuning techniques, and best practices for handling large datasets. The goal is to equip you with the mindset and technical skills required to design production-ready data pipelines.

Python Basics for Data Task • 34:39
This session introduces Python fundamentals tailored specifically for data engineering. You will cover variables, data types, control structures, functions, and basic scripting techniques. The focus is on building a strong programming foundation required for automation and data processing tasks.
Python for Data Task 2 • 30:46
Building on foundational concepts, this lecture dives deeper into practical Python applications for data workflows. You will explore file handling, error handling, modular scripting, and working with structured data formats. This module strengthens your ability to write efficient, reusable data processing scripts.
Pandas for Data Manipulation • 55:46
This lecture introduces Pandas as a core library for data manipulation and transformation. You will learn how to clean, filter, aggregate, and reshape datasets efficiently. From a data engineering perspective, Pandas is a powerful tool for preprocessing, exploratory analysis, and preparing structured datasets for downstream systems.
Working with APIs and JSON Data • 57:57
This session teaches you how to extract and process data from APIs using Python. You will learn how to handle JSON responses, authenticate requests, and transform semi-structured data into structured formats. This is a critical skill for modern data engineers who integrate external data sources into data pipelines.
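Turning a semi-structured JSON response into structured rows can be sketched as below. The response body is hard-coded so the example needs no network access, and its shape (a results list with a nested user object) is an assumption, not the format of any particular API.

```python
import json

# A hypothetical API response body, hard-coded so no request is needed.
response_text = """
{
  "page": 1,
  "results": [
    {"id": 1, "user": {"name": "Ada", "country": "NG"}, "tags": ["a", "b"]},
    {"id": 2, "user": {"name": "Ben"}, "tags": []}
  ]
}
"""

def flatten(record):
    """Turn one nested result into a flat, table-like dict."""
    user = record.get("user", {})
    return {
        "id": record["id"],
        "user_name": user.get("name"),
        "user_country": user.get("country"),  # None when the field is absent
        "tag_count": len(record.get("tags", [])),
    }

payload = json.loads(response_text)
rows = [flatten(r) for r in payload["results"]]
```

Using dict.get for optional fields keeps the pipeline from failing when a record omits a key, at the cost of producing None values that later steps must handle.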
Working with APIs and JSON Data (Using API Key) • 1:04:18
This lecture focuses on securely accessing APIs using API keys and processing JSON responses in Python. You will learn authentication methods, request handling, and transforming semi-structured data into structured datasets. This skill is essential for integrating external data sources into modern data pipelines.
File Handling (Using Excel) • 15:23
In this session, you will learn how to read, process, and export Excel files programmatically. The lecture covers structured data extraction, sheet handling, and automation of reporting workflows. Excel integration remains a practical requirement in many enterprise data environments.
File Handling (Using CSV) • 12:34
This lecture introduces efficient techniques for reading and writing CSV files in data workflows. You will explore parsing, cleaning, and transforming flat-file data. CSV handling is fundamental for batch data ingestion and ETL processes.
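A minimal sketch of parsing, cleaning, and rewriting flat-file data with the standard csv module follows. The column names and sample rows are invented for the example, and in-memory text buffers stand in for real files.

```python
import csv
import io

# In-memory text stands in for a real CSV file.
raw_csv = "order_id,amount,city\n1, 10.50 ,Lagos\n2,,Abuja\n3,7,  Kano\n"

cleaned = []
for row in csv.DictReader(io.StringIO(raw_csv)):
    amount = row["amount"].strip()
    if not amount:
        continue  # skip rows with a missing amount
    cleaned.append({
        "order_id": int(row["order_id"]),
        "amount": float(amount),
        "city": row["city"].strip(),
    })

# Write the cleaned rows back out as CSV.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["order_id", "amount", "city"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(cleaned)
clean_csv = out.getvalue()
```

With real files, replace the StringIO objects with open(path, newline="") handles, which is what the csv module documentation recommends.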
File Handling (Using Parquet) • 1:17:26
This session covers working with Parquet files, a columnar storage format optimized for performance and scalability. You will understand why Parquet is widely used in big data ecosystems and how it improves storage efficiency and query speed in analytical systems.
Data Validation & Error Handling • 50:46
This lecture emphasizes building reliable data pipelines through validation checks and structured error handling. You will learn strategies to detect inconsistencies, manage exceptions, and ensure data quality. Robust validation is critical for production-grade data engineering systems.
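One possible shape for validation checks with structured error handling is sketched below. The rules (an integer id and a non-negative numeric amount) and the sample records are assumptions chosen for illustration; real pipelines would take their rules from the data contract.

```python
def validate(record):
    """Return a list of problems found in one record (empty means valid)."""
    problems = []
    if not isinstance(record.get("id"), int):
        problems.append("id must be an integer")
    try:
        if float(record.get("amount")) < 0:
            problems.append("amount must be non-negative")
    except (TypeError, ValueError):
        # TypeError covers None, ValueError covers text such as "abc".
        problems.append("amount must be numeric")
    return problems

records = [
    {"id": 1, "amount": "12.5"},
    {"id": "x", "amount": "3"},
    {"id": 3, "amount": "abc"},
    {"id": 4, "amount": None},
    {"id": 5, "amount": "-1"},
]

# Route each record to the valid set or to a rejected set with its reasons.
valid, rejected = [], []
for rec in records:
    issues = validate(rec)
    if issues:
        rejected.append((rec, issues))
    else:
        valid.append(rec)
```

Collecting the reasons alongside each rejected record, rather than raising on the first failure, lets the pipeline keep running and leaves a trail for later investigation.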

Introduction to ETL and ELT • 1:47:16
This session introduces the core concepts of ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform). You will understand architectural differences, use cases, and how modern data platforms leverage these approaches to manage scalable data workflows.
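The difference in where the transformation happens can be sketched with a toy example: ETL cleans the data in application code before loading, while ELT loads the raw data first and transforms it with SQL inside the database. The table names and sample rows are hypothetical, and SQLite stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw extract: inconsistent name casing and one bad amount.
source_rows = [("ada", "10.5"), ("BEN", "4"), ("cy", "n/a")]

conn = sqlite3.connect(":memory:")

# ETL: transform in application code, then load only the clean result.
conn.execute("CREATE TABLE etl_sales (name TEXT, amount REAL)")
clean = [
    (name.title(), float(amount))
    for name, amount in source_rows
    if amount.replace(".", "", 1).isdigit()
]
conn.executemany("INSERT INTO etl_sales VALUES (?, ?)", clean)

# ELT: load the raw data as-is, then transform inside the database with SQL.
conn.execute("CREATE TABLE raw_sales (name TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", source_rows)
conn.execute("""
    CREATE TABLE elt_sales AS
    SELECT UPPER(SUBSTR(name, 1, 1)) || LOWER(SUBSTR(name, 2)) AS name,
           CAST(amount AS REAL) AS amount
    FROM raw_sales
    WHERE amount NOT GLOB '*[^0-9.]*'
""")

etl_result = conn.execute("SELECT name, amount FROM etl_sales ORDER BY name").fetchall()
elt_result = conn.execute("SELECT name, amount FROM elt_sales ORDER BY name").fetchall()
conn.close()
```

Both paths end with the same clean table here. The practical difference is that the ELT path also keeps raw_sales in the database, so the transformation can be rerun or changed later without extracting the data again.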
Designing Scalable ETL Pipelines • 1:18:27
In this lecture, you will explore principles for building scalable, maintainable, and fault-tolerant ETL pipelines. Topics include modular design, orchestration, performance optimization, and monitoring. The focus is on engineering pipelines that can handle growing data volumes efficiently.
Handling Bad Data and Logging • 1:34:56
This lecture focuses on identifying, managing, and documenting bad data within pipelines. You will learn structured logging techniques, data quality checks, and monitoring strategies to ensure reliability and traceability in production-grade systems.
Batch Vs Streaming (Batch Processing) • 27:52
This session explains batch processing as a traditional data processing approach. You will learn its architecture, scheduling strategies, and common use cases in analytics and reporting systems. Batch remains a foundational concept in data engineering.
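The core idea of batch processing, handling records in fixed-size groups rather than one at a time, can be sketched with a small generator. The batches helper and the batch size are hypothetical choices for this example.

```python
from itertools import islice

def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

records = range(1, 11)  # stand-in for rows read from a source system

# Process each batch as a unit; here the "work" is just summing the batch.
batch_totals = [sum(chunk) for chunk in batches(records, 4)]
```

Ten records with a batch size of four produce three batches, the last one partial. In a scheduled job, each batch would typically be one database write or one file.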
Batch Vs Streaming (Streaming Processing) • 1:27:30
This lecture introduces real-time streaming processing and event-driven architectures. You will explore how streaming systems handle continuous data flows and support near real-time analytics.
Batch Vs Streaming Processing (Streaming with Kafka) • 25:45
In this session, you will examine streaming architectures using Apache Kafka. The lecture covers producers, consumers, topics, and real-time data pipelines, highlighting Kafka’s role in scalable event-driven systems.

Introduction to Apache Airflow • 11:41
This module introduces learners to Apache Airflow, a powerful workflow automation and orchestration tool widely used in data engineering. Students will explore the fundamentals of DAGs (Directed Acyclic Graphs), task scheduling, and workflow automation. The lecture provides a hands-on understanding of how Airflow helps automate data pipelines, schedule tasks efficiently, and manage complex workflows. By the end, learners will grasp core concepts that power modern data engineering workflows and scalable automation solutions.
Installation of Apache Airflow • 20:12
This lecture provides a step-by-step guide to installing and configuring Apache Airflow. You will understand environment setup, dependencies, and initial configuration to prepare for workflow orchestration.
DAG and Tasks • 29:51
This session introduces Directed Acyclic Graphs (DAGs) and task structures in Airflow. You will learn how to define workflows, set dependencies, and design modular task pipelines for automation.
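The idea behind a DAG of tasks, that each task runs only after the tasks it depends on, can be illustrated without Airflow by using the standard library's graphlib module (Python 3.9 or newer). This is not Airflow code and does not use Airflow's API. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"validate"},
    "report": {"load"},
}

# A topological order is a valid execution order for the DAG.
run_order = list(TopologicalSorter(dependencies).static_order())
```

If the dependencies contained a cycle, for example extract depending on report, static_order would raise graphlib.CycleError. That is the "acyclic" requirement in practice: a workflow with a cycle has no valid execution order.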
Scheduling and Monitoring (Scheduling) • 23:09
Scheduling and Monitoring (Monitoring) • 1:03:26
In this session, you will explore Airflow’s monitoring capabilities, including logs, task tracking, and failure handling. Monitoring ensures transparency and operational reliability in data pipelines.
Integrating Python, SQL, and Bash in Workflows • 1:11:41
This lecture demonstrates how to orchestrate Python scripts, SQL queries, and Bash commands within automated workflows. You will learn cross-technology integration techniques essential for real-world data engineering environments.

What is a Data Warehouse? • 36:50
This session introduces the concept of a data warehouse as a centralized repository for analytical data. You will understand its architecture, purpose, and role in business intelligence and large-scale analytics.
Dimensional Modeling (Star Schema) • 1:11:47
This lecture explains the Star Schema design pattern for organizing data warehouses. You will learn how fact and dimension tables interact to optimize query performance and simplify reporting.
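A very small star schema, with one fact table referencing two dimension tables, might be sketched as follows. The table names, keys, and values are assumptions for illustration, and SQLite stands in for a real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);

    -- The fact table holds measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product (product_key),
        date_key INTEGER REFERENCES dim_date (date_key),
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (20240101, 2024), (20250101, 2025);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 10.0), (1, 20250101, 15.0),
        (2, 20250101, 30.0), (2, 20250101, 5.0);
""")

# A typical report: join the fact to its dimensions, then aggregate.
revenue_by_category_year = conn.execute("""
    SELECT p.category, d.year, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()

conn.close()
```

Every reporting query follows the same pattern of one join per dimension, which is a large part of why the layout simplifies reporting.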
Dimensional Modeling (Snowflake Schema) • 45:02
In this session, you will explore the Snowflake Schema as an extension of the Star Schema. The lecture covers normalization of dimension tables and trade-offs between performance and storage efficiency in warehouse design.
Introduction to Redshift • 44:05
This session introduces Amazon Redshift as a scalable cloud data warehouse solution. You will explore its architecture, columnar storage model, and role in high-performance analytics at scale.
Introduction to BigQuery • 35:16
In this lecture, you will explore Google BigQuery as a serverless, fully managed data warehouse. The focus will be on its architecture, SQL capabilities, and advantages for large-scale analytical workloads.
Introduction to Snowflake • 30:05
This session provides a foundational overview of Snowflake’s cloud-native data platform. You will understand its unique architecture, separation of storage and compute, and scalability benefits for modern data engineering.
Loading data into Redshift • 35:38
This lecture covers best practices for ingesting structured data into Redshift. You will learn bulk loading techniques, COPY commands, and performance optimization strategies for efficient data warehousing.
Loading data into Warehouse (PostgreSQL) • 1:14:47
This session focuses on loading data into PostgreSQL as a traditional relational warehouse. You will explore bulk insert strategies and schema design considerations for analytics use cases.
Loading data into BigQuery • 28:14
In this session, you will explore methods for importing data into BigQuery, including batch uploads and cloud storage integration. Emphasis is placed on performance and cost efficiency.
Loading data into Snowflake • 20:34
This lecture demonstrates structured approaches to loading data into Snowflake using staging areas and bulk ingestion techniques. You will learn how to manage scalable and reliable data loads.
Using Snowflake Connector with Python • 44:18
This lecture introduces the Snowflake Python Connector for programmatic database interaction. You will learn how to establish secure connections, execute queries, and automate data operations.

Cloud Storage Concepts (S3, GCS) • 49:46
This session explains cloud object storage systems such as Amazon S3 and Google Cloud Storage. You will understand storage architecture, scalability, durability, and their role in modern data pipelines.
Cloud Storage (GCS) • 35:39
This lecture provides a deeper dive into Google Cloud Storage, covering buckets, access control, lifecycle management, and integration with analytics services.
Serverless ETL • 8:52
This session introduces the concept of serverless ETL architectures. You will learn how serverless computing eliminates infrastructure management while enabling scalable data processing.
Serverless ETL with AWS Glue • 54:27
In this lecture, you will explore AWS Glue as a managed ETL service. The focus includes job orchestration, data catalog integration, and scalable transformation workflows.
Serverless ETL with Google Dataflow • 50:17
This session covers Google Dataflow for stream and batch data processing. You will understand pipeline design and how managed services simplify distributed data processing.
Cloud Databases • 8:54
This lecture introduces cloud-managed databases and their advantages over traditional on-premise systems, including scalability, high availability, and reduced operational overhead.
Cloud Databases using (RDS) • 1:13:11
In this session, you will explore Amazon RDS as a managed relational database service. The lecture covers deployment, scaling, and operational best practices.
Cloud Databases using (BigQuery) • 16:52
This lecture explains BigQuery as a cloud-native analytical database. You will understand its serverless architecture and integration within modern data ecosystems.
Cloud Databases using (Snowflake) • 14:28
This session focuses on Snowflake as a multi-cloud data platform, emphasizing scalability, performance optimization, and secure data sharing capabilities.
Deploying Pipelines to the Cloud • 1:02:55
This lecture teaches best practices for deploying data pipelines to cloud environments. You will learn automation, configuration management, and production deployment strategies.

Introduction to Data Lakes • 55:19
This session introduces data lakes as centralized repositories for structured and unstructured data. You will understand architecture, storage layers, and use cases in big data environments.
HDFS and Distributed Systems • 52:22
This lecture explains the Hadoop Distributed File System (HDFS) and core distributed computing principles. You will explore fault tolerance, replication, and scalability concepts.
Intro to Spark and PySpark • 1:06:38
This session introduces Apache Spark and PySpark for distributed data processing. You will understand parallel computation, resilient distributed datasets (RDDs), and large-scale transformations.
Use Cases for Big Data Tools • 50:03
This lecture explores real-world applications of big data technologies across industries. You will learn when and why to use tools like Spark, Kafka, and distributed storage systems.

Design and Build a Complete ETL Pipeline, Business Insight from Transformed Data • 2:02:44
This comprehensive project guides you through designing and building an end-to-end ETL pipeline using SQL, Python, and Airflow. You will optionally deploy the pipeline to the cloud and present business insights derived from transformed data. This capstone reinforces practical skills and demonstrates your readiness for real-world data engineering roles.

Requirements

Basic computer skills are required; no prior data engineering experience is needed as everything is explained from scratch

Description

Data Engineering is one of the most in-demand skills in today’s data-driven world. Organizations rely on data engineers to collect, transform, organize, and prepare data so analysts, data scientists, and decision-makers can generate valuable insights.

This course is designed to take you from complete beginner to advanced level in Data Engineering through a structured and practical learning path.

Instead of focusing only on theory, this course emphasizes hands-on learning, real datasets, and practical business scenarios so you can build the skills that companies actually need.

Throughout this course, you will learn how data engineers design data systems, build pipelines, and transform raw data into clean and reliable datasets that support business decisions.

You will start by learning the fundamentals of data engineering, then gradually move into more advanced topics including SQL for data engineering, ETL processes, data modeling, and data pipeline concepts.

Every concept is explained clearly and supported with practical examples and step-by-step demonstrations, helping you develop real-world skills.

By the end of this course, you will understand how modern data systems work and how data engineers manage data workflows in real organizations.

What You'll Learn

• Understand the role of a Data Engineer in modern data-driven organizations
• Learn SQL from beginner to advanced level for real-world data analysis and transformation
• Use advanced SQL techniques such as joins, subqueries, aggregations, and window functions
• Understand and implement ETL (Extract, Transform, Load) processes
• Learn how to design and understand data pipelines used in real-world systems
• Apply data modeling and relational database design principles
• Create and use views, temporary tables, and stored procedures
• Write optimized SQL queries for better performance on large datasets
• Understand data warehousing concepts and modern data architecture
• Work with real business scenarios and practical datasets

Requirements

• Basic computer knowledge
• A laptop or computer to practice the examples
• Interest in learning how data systems and data pipelines work

No prior data engineering experience is required. Everything in this course is explained step-by-step from beginner level.

Who This Course Is For

• Beginners who want to start a career in Data Engineering
• Data Analysts who want to transition into data engineering
• Aspiring data professionals interested in ETL and data pipelines
• Software developers who want to understand data systems and database workflows
• Anyone interested in learning modern data engineering concepts

Instructor: Ganiyu Shakirudeen Kola

I'm a Data Engineer and educator passionate about teaching practical data skills. I specialize in SQL, ETL processes, data pipelines, and modern data engineering practices. Through my courses and educational content, I help students develop the technical skills needed to work with real-world data systems.

Who this course is for:

Beginners who want to start a career in data engineering and learn from the fundamentals to advanced concepts
Data analysts, software developers, or IT professionals looking to transition into data engineering roles
Students and professionals who want hands-on experience building real-world data pipelines and projects

Course content

Introduction to data engineering • 1 lecture • 7min

Understanding Data and Databases • 1 lecture • 6min

SQL for Data Engineering • 10 lectures • 5hr 25min

Python for Data Engineers • 9 lectures • 6hr 40min

ETL (Extract, Transform, Load) Processes • 6 lectures • 7hr 2min

Building ETL Pipelines with Apache Airflow • 6 lectures • 3hr 40min

Data Warehousing • 11 lectures • 7hr 47min

Cloud Data Engineering Basics • 10 lectures • 6hr 15min

Data Lake and Big Data Tools • 4 lectures • 3hr 44min

Final Capstone Project • 1 lecture • 2hr 3min
