
Prepare for the Databricks Certified Data Engineer Associate exam by blending theory with hands-on practice on the Databricks lakehouse, featuring Spark and Delta Live Tables projects plus a practice exam.
Learn the Databricks lakehouse platform and exam topics across five categories, with hands-on guidance on Unity Catalog, Delta Lake, Spark, and productionizing data pipelines.
Learn how to create a free Azure account, explore the 12 months of free services and 200 USD of free credit, compare it with the free student option, and walk through the sign-up steps.
Explore how Databricks enables a modern data lakehouse that blends data lake flexibility with data warehouse governance, supporting BI and ML workloads on a unified platform.
Discover how medallion architecture structures data as bronze, silver, and gold layers in a Databricks data lakehouse, enabling quality, governance, lineage, incremental processing, and role-based access control.
Explore the Databricks user interface, navigate the left menu for data warehousing, data engineering, and machine learning, and manage notebooks, clusters, jobs, data ingestion, and Delta Live Tables.
Explore the two main Databricks planes (control plane and compute plane) and how classic and serverless compute, Unity Catalog data governance, and workspace storage determine where resources and data reside.
Explore Databricks notebooks and their Jupyter-style environment, attach a cluster, organize notebooks in the workspace, and mix Python, SQL, and Markdown cells to document and run code.
Explore Databricks utilities to combine file operations with ETL tasks in notebooks, using dbutils.fs and the %fs magic command, plus the secrets, widgets, and workflow utilities.
Databricks Git folders offer a visual Git client in the workspace that bridges notebooks and scripts to GitHub, Azure DevOps, and Bitbucket, enabling branches, reviews, and pull requests.
Explore how to use the Databricks notebook debugger to set breakpoints, step through Python code, inspect variables, and run debug console snippets, with hands-on examples fixing a tax calculation error.
Navigate the Unity Catalog object model from Metastore to catalogs, schemas, volumes, and tables, contrast managed versus external data, and understand three-level namespaces and Hive Metastore compatibility.
Configure Databricks clusters for Unity Catalog by selecting runtime 11.3 or higher, choosing an appropriate access mode, and disabling credential passthrough, ensuring the workspace has Unity Catalog enabled.
Learn to design and implement ETL pipelines with Apache Spark, validate and transform data from bronze to silver to gold in a data lakehouse, using Spark SQL and PySpark.
Develop a simple ETL project with Apache Spark to build a Gizmo Box data lakehouse, covering bronze, silver, and gold layers, Unity Catalog, and external vs internal data workflows.
Set up the data lake project environment by creating the Gizmo Box container, organizing landing data into operational and external folders, uploading files, and granting Databricks access via an external location.
Set up the Unity Catalog project environment with the Gizmo Box catalog, the landing, bronze, silver, and gold schemas, and an operational data volume; learn to create catalogs, schemas, and volumes, and to manage permissions.
Create a Unity Catalog view in the bronze schema to reference landing-layer data, using a three-level namespace and CREATE OR REPLACE VIEW, enabling lineage, security, and easy data access.
Create temporary views and global temporary views in Spark within Databricks, and understand their lifecycles for use in intermediate ETL results across notebooks.
Extract orders data from a complex JSON file with nested arrays and data quality issues, using the text format for pre-processing before JSON parsing, then create a bronze view for the raw data.
Process unstructured membership image files using the binary file format in Databricks. Query with select, view the data schema, and access metadata across subfolders of PNG identity cards.
Learn to read tab-delimited CSV addresses data with the read_files function, including header handling and delimiter specification. Compare the limitations of select with alternatives like external tables and the bronze layer.
Learn to access refunds data from an Azure SQL database in Databricks by creating an external table via JDBC in the Hive metastore, then query the external table.
Learn to run SQL commands from Python using Spark SQL, read JSON and other formats with the DataFrame Reader API, and create temporary views with spark.table.
Transform refunds data by splitting refund reason and source using split and regexp_extract, extract date and time from the refund timestamp, and write the results to the Hive metastore silver layer.
Transform memberships data by extracting the customer id from the file path using regexp_extract. Create a silver memberships table and join it to the customers table for integrated insights.
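As a rough illustration of the extraction step, here is a minimal plain-Python sketch of pulling a customer id out of a file path with a regular expression. It mirrors the idea behind regexp_extract rather than calling Spark itself, and the folder layout, file names, and the helper name extract_customer_id are assumptions made up for the example.

```python
import re

# Hypothetical file paths; the real folder layout and file names are assumptions.
paths = [
    "/landing/memberships/2024-10/1001.png",
    "/landing/memberships/2024-11/1002.png",
]

# Capture the digits that sit between the last "/" and the ".png" extension.
CUSTOMER_ID_PATTERN = re.compile(r"/(\d+)\.png$")


def extract_customer_id(path):
    """Return the customer id embedded in the file name, or None if absent."""
    match = CUSTOMER_ID_PATTERN.search(path)
    return int(match.group(1)) if match else None


customer_ids = [extract_customer_id(p) for p in paths]
```

In Spark SQL the same pattern would typically be passed to regexp_extract with the capture group index, applied to the path column of the bronze data.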
Extract information from JSON strings in the orders data using extraction paths and array indexing. Use dot notation for fields and cast as needed, noting the performance limits of string reads.
Transform orders data by converting JSON strings to JSON objects, fixing data quality issues with pre-processing, and building a SQL table in the silver schema for downstream processing.
Learn how to inner join customers and addresses on customer_id, pivot shipping and billing addresses into a single row, and create a gold layer customer address table for downstream use.
Master Spark aggregate functions to summarize orders by customer and month, calculating total orders, total items bought, and total amount spent using price times quantity.
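The summary described above can be sketched in plain Python as a group-by over order lines. This is only an illustration of the arithmetic (distinct orders, summed quantities, and price times quantity), not Spark code; the sample rows, field names, and the summarize helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical order lines; field names are assumptions for illustration.
order_lines = [
    {"order_id": 1, "customer_id": 1, "month": "2024-10", "price": 10.0, "quantity": 2},
    {"order_id": 1, "customer_id": 1, "month": "2024-10", "price": 5.0, "quantity": 1},
    {"order_id": 2, "customer_id": 1, "month": "2024-10", "price": 20.0, "quantity": 1},
    {"order_id": 3, "customer_id": 2, "month": "2024-11", "price": 7.5, "quantity": 4},
]


def summarize(lines):
    """Group by (customer_id, month) and compute the three totals."""
    groups = defaultdict(lambda: {"order_ids": set(), "items": 0, "amount": 0.0})
    for line in lines:
        group = groups[(line["customer_id"], line["month"])]
        group["order_ids"].add(line["order_id"])            # count distinct orders
        group["items"] += line["quantity"]                   # total items bought
        group["amount"] += line["price"] * line["quantity"]  # price times quantity
    return {
        key: {
            "total_orders": len(group["order_ids"]),
            "total_items": group["items"],
            "total_amount": group["amount"],
        }
        for key, group in groups.items()
    }


summary = summarize(order_lines)
```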
Explore higher-order functions that operate on arrays and maps, using lambdas with transform, filter, exists, and aggregate; see examples with named structs and total order calculations.
Explore higher-order functions for maps, including transform_values, transform_keys, and map_filter, with examples converting keys to uppercase, applying 10% tax, and filtering items above 500.
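To make the three map operations concrete, here is a small plain-Python analogue using dict comprehensions. It is a sketch of the same logic, not the Spark SQL functions themselves, and the item names and prices are made up.

```python
# Hypothetical order items mapped to their prices.
item_prices = {"laptop": 1000.0, "mouse": 20.0, "monitor": 600.0}

# Like transform_keys: convert every key to uppercase, values unchanged.
upper_keys = {name.upper(): price for name, price in item_prices.items()}

# Like transform_values: apply 10% tax to every price, keys unchanged.
with_tax = {name: round(price * 1.10, 2) for name, price in item_prices.items()}

# Like map_filter: keep only the entries priced above 500.
expensive = {name: price for name, price in item_prices.items() if price > 500}
```

In each case a lambda over the key and value of every entry does the work, which is the same shape the Spark map functions expect.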
Explore PySpark, the Python API for Apache Spark, and learn how DataFrames enable flexible, programmable ETL from data sources through the DataFrame API, from read to transform to write.
Convert the Gizmo Box extract-customers workflow from Spark SQL to PySpark, read JSON with the DataFrame Reader API, and write to a Delta table using the DataFrame Writer v2 API.
Extract orders data from a JSON file by reading it as text to handle corrupt records, then write the JSON strings to the py_orders table in the Gizmo Box catalog using the PySpark DataFrame Writer v2 API.
Extract memberships data from binary image files using PySpark, read all PNGs, and write to the py_memberships table in the Gizmo Box bronze schema via DataFrame Writer v2.
Extract payments data from CSV using the DataFrame Reader API in PySpark, define the schema (DDL format or Python format), and write to a table with DataFrame Writer v2.
Extract refunds data from an Azure SQL table using the Spark DataFrame Reader with JDBC, then write it to a bronze Delta Lake table via the DataFrame Writer API.
Transform the bronze customer data into a silver table using PySpark by cleaning nulls and duplicates, keeping latest by created timestamp, and writing to the silver table.
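The cleaning rule in this step (drop rows without a customer id, then keep only the most recent row per customer) can be sketched in plain Python as follows. The sample rows, column names, and the latest_per_customer helper are hypothetical, and the sketch stands in for the PySpark transformation rather than reproducing it.

```python
from datetime import datetime

# Hypothetical bronze rows; column names are assumptions for illustration.
bronze_customers = [
    {"customer_id": 1, "name": "Ann", "created_timestamp": datetime(2024, 1, 1)},
    {"customer_id": 1, "name": "Ann B.", "created_timestamp": datetime(2024, 3, 1)},
    {"customer_id": None, "name": "Ghost", "created_timestamp": datetime(2024, 2, 1)},
    {"customer_id": 2, "name": "Raj", "created_timestamp": datetime(2024, 2, 15)},
]


def latest_per_customer(records):
    """Drop rows with a null customer_id and keep the newest row per customer."""
    latest = {}
    for record in records:
        customer_id = record["customer_id"]
        if customer_id is None:
            continue  # null ids cannot be matched to a customer
        current = latest.get(customer_id)
        if current is None or record["created_timestamp"] > current["created_timestamp"]:
            latest[customer_id] = record
    return [latest[customer_id] for customer_id in sorted(latest)]


silver_customers = latest_per_customer(bronze_customers)
```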
Transform payments data by extracting date and time from the payment timestamp, translating numeric statuses to text, and writing the results to the silver layer.
Transform bronze-layer membership data to the silver layer by extracting the customer id from the file path and writing the results to the silver memberships table.
Denormalize addresses from bronze to silver by grouping by customer id, pivoting on address type, and aggregating address line, city, state, and postcode with max to produce a single record per customer.
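A plain-Python sketch of the pivot may help picture the result: each address type becomes a column prefix, so a customer's shipping and billing rows collapse into one record. The sample data, the reduced set of fields, and the pivot_addresses helper are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical bronze rows, one per customer and address type.
bronze_addresses = [
    {"customer_id": 1, "address_type": "shipping", "city": "Leeds", "postcode": "LS1"},
    {"customer_id": 1, "address_type": "billing", "city": "York", "postcode": "YO1"},
    {"customer_id": 2, "address_type": "shipping", "city": "Bath", "postcode": "BA1"},
]


def pivot_addresses(records, fields=("city", "postcode")):
    """Return one dict per customer with <address_type>_<field> columns."""
    pivoted = defaultdict(dict)
    for record in records:
        row = pivoted[record["customer_id"]]
        for field in fields:
            column = f"{record['address_type']}_{field}"
            value = record[field]
            # Mirror the max aggregate: keep the larger value on a repeat.
            row[column] = value if column not in row else max(row[column], value)
    return dict(pivoted)


silver_addresses = pivot_addresses(bronze_addresses)
```

Unlike a Spark pivot, which would show a null billing column for customer 2, this sketch simply leaves the missing columns out.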
Transform orders data by converting JSON strings to JSON objects, fixing data quality issues from bronze to silver using regexp_replace and from_json in Spark.
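The repair-then-parse pattern can be sketched in plain Python with re.sub and json.loads standing in for regexp_replace and from_json. The specific data quality issue shown here, an unquoted date value, is an assumption chosen for the example, as are the field names.

```python
import json
import re

# Hypothetical raw record: the date is not quoted, so it is not valid JSON.
raw_order = '{"order_id": 1, "order_date": 2024-10-01, "items": [{"item_id": 7, "quantity": 2}]}'

# Step 1 (like regexp_replace): wrap the bare date in double quotes.
fixed_order = re.sub(
    r'"order_date":\s*(\d{4}-\d{2}-\d{2})',
    r'"order_date": "\1"',
    raw_order,
)

# Step 2 (like from_json): parse the repaired string into a nested object.
order = json.loads(fixed_order)
```

Reading the file as text first is what makes step 1 possible, because a strict JSON reader would reject the record before it could be repaired.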
Join the silver customers and silver addresses tables on customer_id using PySpark DataFrame joins to create the customer_address table in the gold schema, and write it to a Delta table.
May 2026 - Updated to include changes from the latest exam syllabus.
The Databricks Data Engineer Associate Certification is a gateway to industry recognition, better job opportunities, and higher salaries. It showcases your ability to handle real-world data engineering projects, helps future-proof your career, and gives you a chance to achieve your professional goals!
I want to help you pass the Databricks Data Engineer Associate Certification with ease!
Many of the technical concepts and practical skills covered in this course are also relevant for the Microsoft Azure Databricks Data Engineer Associate (DP-750) certification, including Spark, Delta Lake, Unity Catalog, Lakehouse architecture, data ingestion, orchestration, and governance. While this course is primarily focused on the Databricks Certified Data Engineer Associate exam, it can also serve as a strong Databricks-focused foundation for students preparing for DP-750.
I have designed the course to give you the right level of theory and hands-on practice so that you can not only pass the certification exam but also build the skills required to work in the industry using Databricks. So, I have designed the course with the following in mind:
It covers all the topics required to pass the certification
It includes a Full Length Practice Exam with detailed explanations
It provides detailed explanations of each topic
It takes a hands-on approach to learning. 80% of the course requires you to be working with Databricks
It has 2 small projects to give you the practical knowledge required to work in the industry
It's fast paced and to the point. I genuinely value your time.
All the 300+ slides are available to download as PDF
All the Databricks notebooks created during the course are available to download
I provide guidance on how to approach the Databricks Exam and pass with ease
Beginners are welcome! I teach everything about Databricks from scratch and provide step-by-step instructions.
Disclaimer: This course is an independent preparation resource and is not affiliated with, endorsed by, or sponsored by Databricks, Inc. All instructional content, practice questions, and materials have been originally developed by the instructor based on the publicly available exam guide and real-world data engineering experience. No actual exam content or proprietary Databricks materials have been used or reproduced.
About the Instructor
My name is Ramesh and I am going to be your instructor for this course. I am a data engineer with over 25 years of experience working on large data projects, most recently for Microsoft UK and some of the top consulting firms.
I hold a number of certifications including the Databricks Certified Data Engineer Associate Certification that I am teaching in this course.
Over the last 4 years, I've taught over 200,000 students on Udemy, and my courses are highly rated and best sellers. I’m extremely passionate about teaching and committed to making your learning journey enjoyable and worthwhile.
I am active in the Q&A section of the course, so you will be able to ask questions and I will be there to answer them!
So, if you’re ready to take the next step in your data engineering career and become a Databricks Certified Data Engineer Associate, enroll now, and let’s get started! I look forward to seeing you inside the course!