
Welcome to the Azure Databricks Intermediate course, designed for data engineers, PySpark developers, ETL professionals, and other working professionals who want to build strong, real-world Azure Databricks skills beyond the beginner level.
This course focuses on intermediate-level Azure Databricks concepts, taught through hands-on examples. It covers performance optimization techniques, Delta Lake implementations, and production-oriented ETL development scenarios drawn from real-world data engineering projects.
You will start with advanced Spark concepts such as partitions, shuffle operations, repartition vs. coalesce, caching and persistence, execution plans, the Catalyst optimizer, the Tungsten execution engine, and Spark join strategies including broadcast joins, sort-merge joins, and shuffle hash joins.
The course also covers processing complex file formats such as JSON, XML, and Excel using PySpark. You will learn advanced transformation techniques, including flattening nested JSON structures with the explode and arrays_zip functions.
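To make the flattening idea concrete, here is a minimal plain-Python sketch of what zipping parallel arrays and then exploding them does to a nested record. It is not PySpark code and not taken from the course material: the helper names (arrays_zip_py, explode_py, flatten_order) and the sample order record are hypothetical, and the helpers only imitate the row-level behaviour of the PySpark functions of similar name.

```python
from itertools import zip_longest


def arrays_zip_py(*arrays):
    """Pair elements by position; shorter arrays are padded with None."""
    return list(zip_longest(*arrays, fillvalue=None))


def explode_py(rows, array_key):
    """Yield one row per array element; rows with an empty or missing array are dropped."""
    for row in rows:
        for element in row.get(array_key) or []:
            flat = {k: v for k, v in row.items() if k != array_key}
            flat[array_key] = element
            yield flat


def flatten_order(order):
    """Turn one nested order (hypothetical shape) into one flat row per line item."""
    zipped = arrays_zip_py(order["items"], order["quantities"])
    staged = {"order_id": order["order_id"], "line": zipped}
    return [
        {"order_id": r["order_id"], "item": r["line"][0], "quantity": r["line"][1]}
        for r in explode_py([staged], "line")
    ]


# Hypothetical nested record with two parallel arrays.
order = {"order_id": 1, "items": ["pen", "ink"], "quantities": [2, 1]}
flat_rows = flatten_order(order)
```

Zipping first keeps each item aligned with its own quantity. Exploding the two arrays one after the other would instead produce every item paired with every quantity.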
A major focus of this course is Delta Lake. You will learn Delta table creation, managed vs. external tables, insert, update, delete, and merge operations, time travel, restore, vacuum, partitioning strategies, Z-ordering, liquid clustering, and related performance optimization techniques.
You will also build Slowly Changing Dimension (SCD) Type 1 and Type 2 pipelines using Delta Lake merge logic, audit columns, and hash key generation techniques commonly used in enterprise ETL projects.
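As a rough illustration of the hash key idea, the sketch below uses plain Python to hash the tracked attributes of a record and compare the result with the stored version to decide whether a row is new, changed, or unchanged. The function names, column names, delimiter, and sample data are all assumptions made for this example, not the course's actual implementation. In a Delta Lake pipeline the same comparison would typically drive the matched and not-matched branches of a merge.

```python
import hashlib


def row_hash(record, columns):
    """SHA-256 over the tracked columns, joined with a delimiter.

    None is rendered as an empty string, so None and "" hash identically;
    that is a simplification accepted for this sketch.
    """
    parts = ["" if record.get(c) is None else str(record[c]) for c in columns]
    return hashlib.sha256("||".join(parts).encode("utf-8")).hexdigest()


def classify(incoming, current_by_key, key, tracked):
    """Return 'insert', 'changed', or 'unchanged' for one incoming record."""
    existing = current_by_key.get(incoming[key])
    if existing is None:
        return "insert"
    if row_hash(existing, tracked) != row_hash(incoming, tracked):
        return "changed"
    return "unchanged"


# Hypothetical current dimension rows, indexed by business key.
current = {1: {"id": 1, "name": "Ann", "city": "Oslo"}}
tracked_columns = ["name", "city"]

moved = classify({"id": 1, "name": "Ann", "city": "Rome"}, current, "id", tracked_columns)
same = classify({"id": 1, "name": "Ann", "city": "Oslo"}, current, "id", tracked_columns)
new = classify({"id": 2, "name": "Bo", "city": "Lund"}, current, "id", tracked_columns)
```

For SCD Type 1, a changed row simply overwrites the stored attributes. For SCD Type 2, the current row is closed out (for example with an end date or a current flag in the audit columns) and the incoming row is inserted as a new version.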
Additionally, the course covers Auto Loader, schema evolution, idempotent data pipelines, Databricks cluster types, ADLS integration, ETL logging frameworks, and cluster sizing concepts.
By the end of this course, you will have practical experience building scalable, optimized, and production-ready Azure Databricks ETL pipelines with Delta Lake and PySpark.