
Identify prerequisite topics such as Spark SQL, Spark DataFrame API, Spark Structured Streaming API, Python basics, and Spark architecture and internals to benefit from the course.
Learn how to access and download course resources, including notebooks, sample data, and capstone project, then import notebooks into Azure Databricks and upload data to cloud storage.
Encourage students to share honest reviews and five-star ratings to support ongoing course updates and high-quality content, with a 30-day refund if the course doesn't meet expectations.
Learn to set up Unity Catalog and a metastore within Databricks on Azure, linking storage accounts, access connectors, and workspaces to enable data governance and fine-grained access control.
Discover Delta Lake, an open-source storage framework that sits between the processing engine and cloud storage, enabling ACID transactions, delete/update/merge, schema enforcement, data versioning with time travel, and unified streaming and batch processing.
Master Delta table operations in Spark, including delete, update, and merge, with Spark SQL and the Delta table API in Python.
Convert a partitioned Parquet data set to a Delta data set in place with the CONVERT TO DELTA command, which creates the Delta log without moving the data files.
Explore incremental data ingestion in lakehouses, covering architecture and use cases, then learn the COPY INTO command, Spark Structured Streaming, and Auto Loader for ingestion with manual and automatic schema evolution.
Learn to use COPY INTO to ingest landing zone data into a bronze table with a fixed schema, and apply manual schema evolution to handle new columns.
Master Databricks COPY INTO with automatic schema evolution to ingest CSV data from a landing zone into a schema-less Delta table, inferring and merging the schema on the fly.
Explore Databricks Auto Loader, a cloud-native framework built on Spark Structured Streaming that efficiently ingests new files from cloud storage with incremental listing, optimized reads, and optional file arrival notifications.
Build Delta Live Tables pipelines from the landing zone to the bronze, silver, and gold layers with incremental processing, apply SCD Type 2, and implement CDC with merge for a UK 2022 daily sales report.
Learn to create and schedule a Delta Live Tables pipeline using the UI, connect your code from a workspace or repo, and run it against Unity Catalog or the Hive metastore.
Learn to build Delta Live Tables pipelines in Python, creating bronze raw tables, cleaning data with data quality rules, and applying SCD Type 2 merges in the silver layer, plus daily materialized views for final analytics.
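To give a flavor of the COPY INTO objectives listed above, here is a minimal sketch of how such a statement might be assembled in Python. It only builds the SQL text; on a Databricks cluster you would pass the result to spark.sql(). The helper name build_copy_into, the table name dev.bronze.invoices, and the path /mnt/landing/invoices are hypothetical placeholders and do not come from the course material.

```python
def build_copy_into(target_table, source_path, file_format="CSV", evolve_schema=True):
    """Build a Databricks COPY INTO statement as plain text.

    Hypothetical helper for illustration only. With evolve_schema=True the
    statement asks Databricks to merge newly discovered columns into the
    target table (automatic schema evolution); with False the target schema
    stays fixed.
    """
    format_options = {"header": "true", "inferSchema": "true"}
    copy_options = {}
    if evolve_schema:
        # mergeSchema is set both for reading the files and for the copy itself
        format_options["mergeSchema"] = "true"
        copy_options["mergeSchema"] = "true"

    def render(options):
        # Render a dict as 'key' = 'value' pairs in insertion order
        return ", ".join(f"'{key}' = '{value}'" for key, value in options.items())

    lines = [
        f"COPY INTO {target_table}",
        f"FROM '{source_path}'",
        f"FILEFORMAT = {file_format}",
        f"FORMAT_OPTIONS ({render(format_options)})",
    ]
    if copy_options:
        lines.append(f"COPY_OPTIONS ({render(copy_options)})")
    return "\n".join(lines)


# Table name and landing path below are assumptions, not course resources.
statement = build_copy_into("dev.bronze.invoices", "/mnt/landing/invoices")
print(statement)
```

Calling the helper with evolve_schema=False drops the mergeSchema options, which corresponds to the fixed-schema case where new columns are handled manually.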
About the Course
I am creating "Databricks - Master Azure Databricks for Data Engineers" using the Azure cloud platform. This course will help you learn the following topics:
Databricks in Azure Cloud
Working with DBFS and Mounting Storage
Unity Catalog - Configuring and Working
Unity Catalog User Provisioning and Security
Working with Delta Lake and Delta Tables
Manual and Automatic Schema Evolution
Incremental Ingestion into Lakehouse
Databricks Auto Loader
Delta Live Tables and DLT Pipelines
Databricks Repos and Databricks Workflow
Databricks REST API and CLI
Capstone Project
This course also includes an end-to-end Capstone project. The project will help you understand real-life project design, coding, implementation, testing, and the CI/CD approach.
Who should take this Course?
I designed this course for data engineers who want to develop Lakehouse projects following the Medallion architecture approach on the Databricks cloud platform. I am also creating this course for data and solution architects responsible for designing and building their organization’s Lakehouse platform infrastructure. A third audience is managers and architects who do not work on Lakehouse implementation directly but do work with the teams implementing it at the ground level.
Spark Version used in the Course.
This course uses Databricks in Azure Cloud and Apache Spark 3.5. I have tested all the source code and examples used in this course on Azure Databricks Cloud using Databricks Runtime 13.3.