Azure Databricks & Spark for Data Engineers: Hands-on Project

Rating: 4.6 (27428 reviews)

[Fully Refreshed 2026] Real World Project on Formula1 using Databricks, Spark, Delta Lake, Unity Catalog, Lakeflow Jobs

Bestseller

Created by Ramesh Retnasamy • 200,000+ learners

Last updated 5/2026

English

What you'll learn

You will learn how to build a real-world data project using Azure Databricks and Spark Core. The course is taught using real-world data.
You will acquire professional-level data engineering skills in Azure Databricks, Delta Lake, Spark Core, Azure Data Lake Storage Gen2 and Azure Data Factory (ADF)
You will learn how to create notebooks, dashboards, clusters, cluster pools and jobs in Azure Databricks
You will learn how to ingest and transform data using PySpark in Azure Databricks
You will learn how to transform and analyse data using Spark SQL in Azure Databricks
You will learn about Data Lake and Lakehouse architectures, and how to implement a Lakehouse architecture using Delta Lake.
You will learn how to build and orchestrate data pipelines using Lakeflow Jobs in Databricks.
You will learn how to implement incremental data processing using Delta Lake.
You will gain a comprehensive understanding of Unity Catalog and how it is used to organise and manage data in Databricks.
You will gain practical experience working with modern Databricks features and best practices used in real-world data engineering projects.
You will build practical skills that support certifications such as Databricks Certified Data Engineer Associate and Databricks Certified Associate Developer for Apache Spark.
You will strengthen your understanding of key concepts commonly tested in Databricks and Azure Databricks certification exams.

Course content

44 sections • 283 lectures • 31h 45m total length

Course Update Announcement • 2:08
An overview of the refreshed Azure Databricks and Spark project, which modernizes features and practices with Delta Lake, Unity Catalog, Lakeflow Jobs, dashboards, and Genie; sections 1–19 are the recommended path.
Course Introduction • 5:00
Course Structure • 2:37
Explore a hands-on course structure for Azure Databricks with Spark, covering fundamentals, compute, notebooks, Unity Catalog, the medallion (bronze, silver, gold) pipeline, Delta Lake, Lakeflow, and analytics.
Course Slides Download • 0:26
Course Notebooks Download • 0:36
Course Data Download • 0:26

Introduction to Azure Databricks • 6:54
Explore how Databricks unifies data analytics with Apache Spark to build data lakehouses, featuring Photon, Unity Catalog, Delta Lake, and cloud integration on Azure, AWS, and Google Cloud.
Creating Azure Databricks Service • 6:38
Create an Azure Databricks service in the Azure portal by selecting the subscription, resource group, workspace name, region, and premium tier, then launch the workspace with Azure Active Directory single sign-on.
Databricks User Interface Overview • 6:28
Databricks Architecture Overview • 7:58

Section Overview • 0:52
Explore Databricks compute as the engine that runs notebooks, jobs, and pipelines. Learn to configure, create, and troubleshoot compute, including Azure quota and VM availability issues.
Introduction to Databricks Compute • 5:18
Discover Databricks compute options, including serverless and classic compute, and learn about clusters with driver and worker nodes, auto-scaling, execution-based billing, and workloads from ETL to machine learning.
Compute Configuration • 6:27
Explore classic compute configurations: choose between single-node and multi-node clusters, and select access modes and runtimes to optimize Spark workloads for scalability, security, and cost.
Creating Databricks Cluster • 14:04
Troubleshooting Databricks Cluster Quota and VM Issues • 8:12

Introduction to Unity Catalog • 3:27
Unity Catalog Object Model • 5:38
Accessing Databricks Account Console • 10:35
Create Unity Catalog Metastore • 7:45
Create a Unity Catalog metastore and attach your Databricks workspace to it, ensuring regional alignment and proper access settings via the Databricks account console.
Configure Access to Cloud Storage • 4:56
Configure access to cloud storage with Unity Catalog by creating storage credentials and external locations tied to per-catalog containers, using managed identities or service principals and the Azure access connector.
Configure Access to Cloud Storage Demo (Azure) • 7:29
Configure an Azure Databricks access connector, create an Azure Data Lake Storage account, and assign the Storage Blob Data Contributor role to enable secure access to the data lake.
Configure Access to Cloud Storage Demo (Databricks) • 10:01
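In the Unity Catalog object model, tables live inside a schema, which lives inside a catalog, so a table is addressed by a three-level name: catalog.schema.table. A minimal sketch of building such names from configuration, rather than hardcoding them in every notebook (in the spirit of the later "Refactor Code to Remove Hardcoded Values" lecture), might look like this. It is illustrative only, not course code; the helper names and the catalog, schema, and table names formula1_dev, bronze, and circuits are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Quote one identifier part with backticks, doubling any embedded backtick."""
    if not name:
        raise ValueError("identifier parts must be non-empty")
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a three-level Unity Catalog name of the form `catalog`.`schema`.`table`."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


if __name__ == "__main__":
    # Hypothetical settings; in a notebook these might come from widgets or a config file.
    catalog, schema = "formula1_dev", "bronze"
    print(qualified_name(catalog, schema, "circuits"))
```

Switching environments then means changing only the catalog setting; the quoted name can be used wherever Spark SQL expects a table identifier.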

Data Ingestion Overview • 2:43
Ingest all six F1 datasets from CSV and JSON formats, apply schemas and audit metadata, and store them as Delta tables in the bronze layer using Spark DataFrame APIs for a production-ready ingestion workflow.
Circuits File - Requirements • 2:46
Load the circuits CSV file from the landing layer into a bronze Delta table, enforcing or inferring the schema, and add ingestion metadata such as the source file name and timestamp using PySpark.
Circuits File - Dataframe Reader • 11:06
Circuits File - Specify Schema • 9:28
Circuits File - Add Ingestion Metadata • 5:38
Circuits File - Dataframe Writer • 4:17
Races File - Ingestion (Assignment) • 4:22
Refactor Code to Remove Hardcoded Values • 10:36
Refactor Code to Handle Repeated Logic • 4:32
Refactor repetitive notebook logic by extracting the ingestion metadata code into a helper function, add_ingestion_metadata, then apply it across the circuits and races notebooks to add the ingestion_timestamp and source_file columns, improving readability and maintainability.
Constructors File - Ingestion • 10:04
Drivers File - Ingestion • 7:33
Results File - Ingestion (Assignment) • 5:46
Ingest the results dataset from a folder into the bronze layer with Spark, reading all JSON files in the folder path and writing them to a Delta table with ingestion metadata.
Sprints File - Ingestion • 4:54

Requirements

All the code and step-by-step instructions are provided. The following background will help you get the most out of the course:
Basic Python programming experience is required
Basic SQL knowledge is required
Knowledge of cloud fundamentals is beneficial but not necessary
An Azure subscription is required. If you don't have one, we will create a free account in the course
No prior experience with Azure Databricks is required.

Description

Course Fully Refreshed for 2026

This course has been completely rebuilt for 2026 using the latest Azure Databricks features and best practices.

Instead of relying on legacy approaches such as Hive Metastore and external orchestration tools, this course focuses on modern Databricks capabilities like Unity Catalog, Lakeflow Jobs, Databricks SQL Dashboards, and Genie.

Welcome!

In this course, you will build a complete end-to-end data engineering project using Azure Databricks and Apache Spark based on Formula 1 Motor Racing data.

You won’t just learn individual concepts. You will design and implement a cloud data platform from scratch, following the same approach used in real-world data engineering and data platform projects.

What You Will Build

Throughout the course, you will:

Design a modern Data Lakehouse architecture using Azure Databricks
Implement the Medallion Architecture (Bronze, Silver, Gold) for scalable data pipelines
Ingest, transform, and model data using Apache Spark (PySpark and Spark SQL)
Store and manage data using Delta Lake in Databricks
Organise and govern data using Unity Catalog in Azure Databricks
Build and orchestrate pipelines using Lakeflow Jobs in Databricks
Create analytical views and dashboards using Databricks SQL and Dashboards
Enhance the pipeline with incremental data processing using Delta Lake

By the end of the course, you will have built a production-ready data engineering pipeline on Azure Databricks.

Technologies You Will Use

As part of building the project, you will learn:

Azure Databricks
Apache Spark using PySpark and Spark SQL
Delta Lake and modern Lakehouse architecture
Unity Catalog for data governance and organisation in Databricks
Databricks SQL and Dashboards for analytics and reporting

How You Will Learn

This is a hands-on, project-based Azure Databricks course.

You will build the solution step by step
Concepts are explained in the context of a real-world project
Each section builds on the previous one

This approach ensures that you not only understand the concepts, but also know how to apply them in real-world data engineering scenarios.

I value your time as much as I do mine. So, I’ve designed this course to be focused, practical, and to the point. The lessons are explained in simple English, without unnecessary jargon, and we start from the basics. By the end of the course, you will be confident building real-world data engineering solutions.

How This Course Supports Certification Preparation

This course can help you build many of the core skills required for the following certifications:

Databricks Certified Data Engineer Associate
Databricks Certified Associate Developer for Apache Spark
Microsoft Exam DP-750: Implementing Data Engineering Solutions Using Azure Databricks
Databricks Certified Data Engineer Professional

The hands-on project will strengthen your practical understanding of key Databricks and Spark concepts tested in these exams.

However, this course is not designed as a certification preparation course and does not cover all exam topics.

What’s Included (and What’s Not)

This course focuses on core Spark and Databricks concepts
It does not cover Spark Streaming, Spark ML, or Lakeflow Declarative Pipelines
Spark is taught using PySpark and Spark SQL (not Scala or Java)

Final Outcome

By the end of this course, you will have built a complete, production-ready data engineering solution using Azure Databricks and Spark, and gained the confidence to apply these skills in real-world projects.

Who this course is for:

University students looking to start a career in Data Engineering
Developers working in other areas who want to move into Data Engineering
Data Engineers or Data Warehouse developers working on on-premises systems or other cloud platforms (such as AWS or GCP) who want to learn Azure Databricks and modern data engineering
Data Architects looking to gain a practical understanding of the Azure Data Engineering stack

Course content by section (first 10 of 44 sections)

Introduction • 6 lectures • 11 min
Azure Subscription (Optional) • 2 lectures • 9 min
Azure Databricks Overview • 4 lectures • 28 min
Databricks Compute • 5 lectures • 35 min
Databricks Notebooks • 5 lectures • 56 min
Introduction to Unity Catalog • 7 lectures • 50 min
Formula1 Project Overview • 3 lectures • 13 min
Formula1 Solution Overview • 4 lectures • 23 min
Formula1 Environment Set-up • 3 lectures • 26 min
Data Ingestion - Bronze • 13 lectures • 1 hr 24 min