Udemy
Azure Databricks Intermediate | Delta Lake & Optimization
New
10 students
Created by Mallaiah Somula
Last updated 5/2026
English

What you'll learn

  • Optimize Spark applications using partitions, caching, execution plans, join strategies, and cluster sizing techniques
  • Build scalable Delta Lake pipelines using MERGE, SCD Type 1 & 2, Time Travel, Partitioning, and Z-Ordering
  • Develop incremental and idempotent ETL pipelines using Auto Loader, COPY INTO, schema evolution, and logging
  • Process complex JSON, XML, and Excel files in Azure Databricks using advanced PySpark transformation techniques

Course content

8 sections • 25 lectures • 30h 53m total length
  • Spark Partitions | Architecture, Shuffle | Repartition vs Coalesce Explained (1:30:48)
  • Spark Cache vs Persist vs Unpersist Explained | Why Spark Recomputes Jobs? (1:31:43)
  • Spark Cache vs Persist Hands-On | Tungsten Optimization | Parquet vs ORC vs Avro (1:36:08)
  • Spark Execution Plan Explained | Lazy Evaluation | Catalyst & Tungsten Optimizer (1:09:30)
  • Spark Join Algorithms Explained | Sort Merge vs Broadcast vs Shuffle Hash (1:30:25)
  • Databricks Cluster Sizing Explained | Memory & Partition Calculation (58:32)

Requirements

  • Basic knowledge of Azure Databricks, PySpark DataFrames, joins, and Python programming is recommended
  • Students should be familiar with Databricks notebooks, clusters, and basic ETL development concepts
  • A free Azure account or Databricks workspace is recommended for hands-on practice and exercises
  • Completion of a beginner-level Azure Databricks or PySpark course will help learners understand concepts faster

Description

Welcome to the Azure Databricks Intermediate course, designed for data engineers, PySpark developers, and ETL professionals who want to build real-world Azure Databricks skills beyond beginner concepts.


This course focuses on intermediate-level Azure Databricks concepts with practical hands-on examples, performance optimization techniques, Delta Lake implementations, and production-oriented ETL development scenarios used in real-world data engineering projects.


You will start by understanding advanced Spark concepts such as partitions, shuffle operations, repartition vs coalesce, caching, persistence, execution plans, Catalyst Optimizer, Tungsten Optimization, and Spark join algorithms including broadcast joins, sort merge joins, and shuffle hash joins.


The course also covers processing complex file formats such as JSON, XML, and Excel using PySpark. You will learn advanced transformation techniques including flattening nested JSON structures using explode and arrays_zip functions.
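
To give a feel for the flattening idea mentioned above, here is a minimal sketch in plain Python rather than PySpark. It only imitates the behaviour: the parallel arrays are zipped element-wise (in the spirit of arrays_zip, which pads the shorter array with nulls), then one row is emitted per zipped element (in the spirit of explode). The record layout and the names `flatten_order`, `order_id`, `items`, and `prices` are assumptions made up for this illustration and are not taken from the course.

```python
import json
from itertools import zip_longest


def flatten_order(record):
    """Zip two parallel arrays element-wise, then emit one flat row per pair."""
    rows = []
    # zip_longest pads the shorter list with None, loosely mirroring how
    # arrays_zip fills missing positions with nulls.
    for item, price in zip_longest(record["items"], record["prices"]):
        # One output row per zipped element, loosely mirroring explode.
        rows.append({"order_id": record["order_id"], "item": item, "price": price})
    return rows


# Hypothetical nested JSON record with two parallel arrays.
raw = '{"order_id": 1, "items": ["pen", "ink"], "prices": [2.5, 4.0]}'
flat_rows = flatten_order(json.loads(raw))

for row in flat_rows:
    print(row)
```

In a Databricks notebook the same shape of transformation would typically be expressed with DataFrame column functions instead of a Python loop, so that the work stays distributed across the cluster.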


A major focus of this course is Delta Lake. You will learn Delta table creation, managed vs external tables, merge operations, insert, update, delete, time travel, restore, vacuum, partitioning strategies, Z-Ordering, liquid clustering, and performance optimization techniques.


You will also build Slowly Changing Dimension (SCD) Type 1 and Type 2 pipelines using Delta Lake merge logic, audit columns, and hash key generation techniques commonly used in enterprise ETL projects.
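
As a rough illustration of the SCD Type 2 logic described above, the sketch below uses plain Python lists and dictionaries, not the Delta Lake MERGE API. It assumes a hash over the tracked columns is used to detect changes, and that audit columns mark which version of a row is current. All names here (`row_hash`, `apply_scd2`, `start_date`, `end_date`, `is_current`, `customer_id`, `city`) are hypothetical and chosen only for this example.

```python
import hashlib
from datetime import date


def row_hash(row, tracked_cols):
    """Hash the tracked columns so a change can be detected with one comparison."""
    joined = "||".join(str(row.get(c)) for c in tracked_cols)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def apply_scd2(dimension, incoming, key, tracked_cols, load_date):
    """Expire changed current rows and append new versions (SCD Type 2 idea)."""
    current = {r[key]: r for r in dimension if r["is_current"]}
    result = [dict(r) for r in dimension]  # work on copies, leave input untouched
    for src in incoming:
        new_hash = row_hash(src, tracked_cols)
        existing = current.get(src[key])
        if existing is not None and existing["row_hash"] == new_hash:
            continue  # unchanged row: re-running the same load adds nothing
        if existing is not None:
            # Close out the old version by setting its audit columns.
            for r in result:
                if r[key] == src[key] and r["is_current"]:
                    r["is_current"] = False
                    r["end_date"] = load_date
        # Insert the new version as the current row.
        result.append({**src, "row_hash": new_hash, "start_date": load_date,
                       "end_date": None, "is_current": True})
    return result


dim = apply_scd2([], [{"customer_id": 1, "city": "Pune"}],
                 "customer_id", ["city"], date(2024, 1, 1))
dim = apply_scd2(dim, [{"customer_id": 1, "city": "Delhi"}],
                 "customer_id", ["city"], date(2024, 2, 1))
print(len(dim), [r["is_current"] for r in dim])
```

Skipping rows whose hash is unchanged is what makes a repeated load harmless in this sketch: applying the same batch twice leaves the dimension as it was.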


Additionally, the course covers Auto Loader, schema evolution, idempotent data pipelines, Databricks cluster types, ADLS integration, ETL logging frameworks, and cluster sizing concepts.


By the end of this course, you will have practical experience in building scalable, optimized, and production-ready Azure Databricks ETL pipelines using Delta Lake and PySpark.

Who this course is for:

  • Data engineers and PySpark developers who want to improve Spark optimization and Delta Lake development skills
  • Azure Databricks professionals looking to build scalable ETL pipelines using Auto Loader and Delta Lake
  • Working professionals preparing for intermediate Databricks, Spark, and Azure Data Engineering interviews
  • Learners who already understand Databricks basics and want to move into real-world production concepts