Databricks Certified Data Engineer Associate - Preparation

Name: Databricks Certified Data Engineer Associate - Preparation
Rating: 4.5 (17634 reviews)

Complete preparation for Databricks Data Engineer Associate certification + hands-on training

Bestseller

Created by Derar Alhussein | 10x Databricks Certified, Acadford ™

Last updated 6/2026

English

What you'll learn

Understand how to use the Databricks Lakehouse Platform and its tools
Build ETL pipelines using Apache Spark SQL and Python
Process data incrementally in batch and streaming mode
Orchestrate production pipelines
Understand and follow best security practices in Databricks

Course content

8 sections • 59 lectures • 5h 19m total length

Course Overview • 1:36
This course overview outlines the Databricks Certified Data Engineer Associate preparation, covering the lakehouse platform, ETL with Spark SQL and Python, incremental data processing, production pipelines, and data governance.
New Exam Version (Update in Progress) • 0:55
What is Databricks • 5:05
Learn how Databricks delivers a multi-cloud lakehouse built on Apache Spark through its cloud service, runtime, and workspace, with DBFS and Delta Lake support for batch and streaming analytics.
Free trial on Azure • 3:46
Learn to sign up for a 14-day free trial of Databricks on Azure, including creating a resource group, selecting the 14-day free premium option, and launching a workspace.
Exploring Workspace • 4:00
Navigate the Databricks workspace interface, including the left sidebar and workspace explorer, to organize notebooks, folders, and data assets across SQL, data engineering, and machine learning.
Course Materials • 1:43
Import notebooks into the Databricks workspace via Git folders with a GitHub repository; clone the repository, access the course materials, and follow along to recreate solutions.
Creating Cluster • 6:51
Create a Databricks cluster by navigating to compute, naming it, and choosing a single or multi-node setup with a driver and workers, runtime version 13.3 LTS, and auto termination.
Notebooks Fundamentals • 12:04
Create and manage Databricks notebooks, switch languages with magic commands, use Markdown for notes, run cells, and export, import, or revert revisions for modular workflows.
New Notebook Features • 0:36
Git folders • 8:03
Explore Git folders (Databricks Repos) for source control by integrating with Git providers like GitHub, then clone, commit, push, and manage branches and pull requests.

Delta Lake • 5:25
Delta Lake is an open-source storage framework that brings reliability to data lakes and enables the lakehouse architecture through a transaction log that provides ACID transactions and consistent reads of Parquet data.
Understanding Delta Tables (Hands On) • 10:20
Explore Delta Lake tables in the Hive metastore catalog, create a Delta table, perform multiple inserts, and review metadata and history via DESCRIBE DETAIL and DESCRIBE HISTORY.
Advanced Delta Lake Features • 4:17
Explore advanced Delta Lake features, including time travel with history and restore, performance optimization with file compaction and Z-order indexing, and vacuum garbage collection.
Applying Advanced Delta Features (Hands On) • 7:03
Apply advanced Delta Lake features in a hands-on session, including time travel, OPTIMIZE and VACUUM, and restoring data from prior table versions.
Optimizing Data File Layout • 6:18
Explore data file layout optimization in Databricks by using partitioning, Z-order indexing, and liquid clustering to accelerate queries through data skipping and efficient file organization.
Databases and Tables on Databricks (Hands On) • 6:58
Learn to create and manage databases and tables on Databricks, including managed and external tables, their locations, and the behavior of DESCRIBE EXTENDED and DROP.
Set Up Delta Tables • 6:38
Learn to set up Delta tables with CTAS, infer schemas, rename columns, and apply NOT NULL constraints. Explore partitioning, external locations, and deep or shallow cloning to copy Delta tables.
Views • 3:40
Explore Databricks views, including stored, temporary, and global temporary views. These virtual tables are defined by saved SQL queries and scoped by session or cluster.
Working with Views (Hands On) • 7:14
Create and query stored, temporary, and global temporary views from a smartphones table, demonstrating persistence across sessions, and use SHOW TABLES to manage tables and views in Databricks.
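
The Delta Lake lectures in this section describe a transaction log that backs table history, time travel, and restore. As a rough illustration of that idea only, the plain-Python sketch below models a table as an append-only list of committed snapshots. The class ToyDeltaTable and all of its methods are hypothetical names made up for this example; this is not the Delta Lake API, and real Delta tables store file-level actions rather than full row snapshots.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Hypothetical in-memory model of a versioned table (not the Delta Lake API)."""

    # Each log entry is one committed version: (operation name, row snapshot).
    log: list = field(default_factory=list)

    def _commit(self, operation, rows):
        # A commit appends a new immutable snapshot; older versions stay readable.
        self.log.append((operation, tuple(rows)))
        return len(self.log) - 1  # version number of the new commit

    def insert(self, new_rows):
        return self._commit("WRITE", self.read() + list(new_rows))

    def delete_where(self, predicate):
        kept = [row for row in self.read() if not predicate(row)]
        return self._commit("DELETE", kept)

    def read(self, version=None):
        # Time travel: read the snapshot as of a given version (default: latest).
        if not self.log:
            return []
        if version is None:
            version = len(self.log) - 1
        return list(self.log[version][1])

    def history(self):
        # One (version, operation) pair per commit, oldest first.
        return [(v, op) for v, (op, _) in enumerate(self.log)]

    def restore(self, version):
        # Restoring is itself a new commit, so earlier history is never rewritten.
        return self._commit("RESTORE", self.read(version))


table = ToyDeltaTable()
table.insert([{"id": 1, "name": "A"}, {"id": 2, "name": "B"}])  # version 0
table.insert([{"id": 3, "name": "C"}])                          # version 1
table.delete_where(lambda row: row["id"] == 2)                  # version 2
table.restore(1)                                                # version 3
```

Because every change is a new entry in the log, reading an old version and restoring to it are both cheap lookups in this toy; that is the intuition the lectures build on, not a description of how Delta Lake is implemented internally.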

Querying Files • 6:13
Query files in Databricks with Spark SQL across JSON, Parquet, and CSV; create Delta Lake tables via CTAS or external tables with the USING keyword, then load external data via temporary views.
Reminder: Technical Considerations • 0:28
Querying Files (Hands On) • 12:39
Explore querying and ingesting files with Spark SQL on Databricks, reading JSON and CSV data, creating external and Delta tables, and handling schema, caching, and CTAS workflows.
Simplified File Querying • 0:30
The _metadata Column • 0:23
Writing to Tables (Hands On) • 9:00
Explore SQL for writing to Delta tables, using CREATE OR REPLACE TABLE, INSERT OVERWRITE, and MERGE INTO to upsert, overwrite, or append while preserving ACID guarantees and time travel.
Advanced Transformations (Hands On) • 8:50
Explore advanced Spark SQL transformations on a bookstore dataset, parsing JSON strings, converting to struct types, flattening fields, exploding arrays, and applying joins, unions, and pivots.
Legacy JSON Querying Syntax • 0:26
Null-Safe Join • 0:33
Higher Order Functions and SQL UDFs (Hands On) • 7:15
Explore higher-order functions and user-defined functions (UDFs) in Spark SQL. Apply filter and transform on the books array, and define UDFs to reuse SQL logic across Spark sessions.
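
The Writing to Tables lecture in this section mentions using a merge to upsert rows. To make the matched / not-matched idea concrete, here is a minimal plain-Python sketch on lists of dicts. The helper merge_upsert, the customer_id key, and the sample rows are all assumptions invented for illustration; this is not Spark SQL and it ignores everything a real merge handles (conditions, deletes, concurrency).

```python
def merge_upsert(target, updates, key):
    """Toy upsert: update rows whose key matches, insert the rest.

    Hypothetical helper for illustration, not a Spark or Delta Lake function.
    Returns a new list and leaves both inputs unmodified.
    """
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        if row[key] in merged:
            merged[row[key]].update(row)  # matched: update the existing row
        else:
            merged[row[key]] = dict(row)  # not matched: insert a new row
    return list(merged.values())


customers = [
    {"customer_id": 1, "email": None},
    {"customer_id": 2, "email": "b@example.com"},
]
customer_updates = [
    {"customer_id": 1, "email": "a@example.com"},  # fills in a missing email
    {"customer_id": 3, "email": "c@example.com"},  # brand-new customer
]

result = merge_upsert(customers, customer_updates, key="customer_id")
```

Running the same updates a second time against result leaves it unchanged, which is the property that makes upserts convenient for repeated loads.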

Structured Streaming • 7:30
Learn Spark Structured Streaming, which treats an infinite data stream as an unbounded table and uses readStream and writeStream with micro-batches, triggers, checkpoints, Delta Lake integration, and exactly-once semantics.
Structured Streaming (Hands On) • 8:35
Learn Spark Structured Streaming with a bookstore dataset, query a Delta table as a stream source using spark.readStream, and create streaming views for incremental processing.
Incremental Data Ingestion • 4:41
Discover incremental data ingestion in Databricks using COPY INTO and Auto Loader to load only new files into a Delta table, with inferred schema, checkpointing, and exactly-once guarantees.
Auto Loader (Hands On) • 5:36
Learn incremental data ingestion with Auto Loader, reading Parquet via Spark Structured Streaming and loading new files into the Delta Lake table orders_updates.
Auto Loader options • 2:18
Multi-hop Architecture • 2:16
Explore the multi-hop (medallion) architecture for lakehouse data, using bronze, silver, and gold layers to incrementally refine raw data into business-ready insights and support hybrid batch and streaming ETL.
Multi-hop Architecture (Hands On) • 10:07
Build a Delta Lake multi-hop pipeline with Auto Loader streaming, metadata enrichment, and bronze, silver, gold tables, using a static customers lookup to enrich streaming data.
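
The incremental ingestion lectures in this section describe loading only new files into a table and tracking progress with a checkpoint. The plain-Python sketch below shows that bookkeeping in its simplest form. The function ingest_new_files, the file names, and the in-memory landing_zone are hypothetical stand-ins; this is not Auto Loader or COPY INTO, and a real system must also make the write and the checkpoint update atomic.

```python
def ingest_new_files(available_files, checkpoint, table):
    """Toy incremental load: process only files not recorded in the checkpoint.

    `available_files` maps a file name to its rows, `checkpoint` is the set of
    file names already loaded, and `table` is a list acting as the target.
    All three are hypothetical stand-ins for illustration.
    """
    loaded = []
    for name in sorted(available_files):
        if name in checkpoint:
            continue  # already ingested on an earlier run, skip it
        table.extend(available_files[name])
        checkpoint.add(name)  # record progress so a rerun does not reload it
        loaded.append(name)
    return loaded


landing_zone = {
    "01.parquet": [{"order_id": 1}],
    "02.parquet": [{"order_id": 2}],
}
checkpoint, orders_updates = set(), []

first_run = ingest_new_files(landing_zone, checkpoint, orders_updates)
second_run = ingest_new_files(landing_zone, checkpoint, orders_updates)  # nothing new

landing_zone["03.parquet"] = [{"order_id": 3}]
third_run = ingest_new_files(landing_zone, checkpoint, orders_updates)
```

The second run loads nothing because every file is already in the checkpoint, so rerunning the job does not duplicate rows in the target.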

Lakeflow Declarative Pipelines (Hands On) • 13:29
Explore Delta Live Tables (DLT) to build maintainable multi-hop data pipelines with bronze, silver, and gold layers, using Auto Loader to ingest Parquet data and enforce data quality via constraints.
Lakeflow Jobs (Hands On) • 9:03
Orchestrate multi-task Databricks jobs by chaining a land data notebook, a Delta Live Tables pipeline, and a results notebook, with scheduling, dependencies, and error repair.
Databricks Asset Bundles • 3:03
NOTE: DABs Renaming • 0:31
Deploying Jobs with DABs (Hands On) • 13:23
Deploy and manage Databricks workflows across development, staging, and production with Databricks Asset Bundles, enabling testing, packaging, and CI/CD through a streamlined CLI and VS Code workflow.
DABs Advanced Configurations • 1:33
DABs Important Commands • 1:43

Databricks SQL • 12:40
Explore Databricks SQL (DBSQL) to run SQL and BI workloads at scale, create SQL warehouses, and build dashboards, queries, and alerts with unified governance.
Data Objects Privileges • 3:42
Learn data object privileges in Databricks and how the governance model grants, denies, and revokes access to catalogs, schemas, tables, and views with privileges like SELECT and MODIFY.
Managing Permissions (Hands On) • 7:50
Explore managing permissions in Databricks SQL by creating HR DB with an employees table and Paris view, assigning HR Team privileges, and reviewing permissions via SHOW GRANTS and Data Explorer.
Unity Catalog • 8:42
Unity Catalog provides centralized governance across workspaces and clouds with a three-level namespace (catalog, schema, table) under a metastore, and identity-based access, discovery, and lineage features.
Unity Catalog (Hands On) • 7:56
Explore Unity Catalog in Databricks, verify metastore linking, manage catalogs and permissions, enable Delta Sharing, and trace data lineage across workspaces and regions.
Unity Catalog Advanced Topics • 1:49
Cluster Best Practices • 9:45
Explore Databricks cluster best practices, comparing classic and serverless compute, choosing between all-purpose and jobs clusters, and leveraging instance pools and SQL warehouses for cost-efficient workloads.
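
The permissions lectures in this section talk about granting and revoking privileges such as select and modify on data objects, then reviewing them. As a loose mental model only, the plain-Python sketch below keeps a registry of privileges per object and principal. The class ToyGrants, its methods, and the names hr_db.employees and hr_team are hypothetical and chosen to echo the lecture description; the sketch does not model deny rules, ownership, or inheritance from catalogs and schemas, and it is not the Databricks permission system.

```python
from collections import defaultdict


class ToyGrants:
    """Hypothetical privilege registry keyed by (object, principal)."""

    def __init__(self):
        self._grants = defaultdict(set)

    def grant(self, privilege, obj, principal):
        self._grants[(obj, principal)].add(privilege)

    def revoke(self, privilege, obj, principal):
        # discard() is a no-op if the privilege was never granted.
        self._grants[(obj, principal)].discard(privilege)

    def show_grants(self, obj):
        # (principal, privilege) pairs on one object, sorted for stable output.
        return sorted(
            (principal, privilege)
            for (granted_obj, principal), privileges in self._grants.items()
            if granted_obj == obj
            for privilege in privileges
        )

    def is_allowed(self, privilege, obj, principal):
        return privilege in self._grants.get((obj, principal), set())


acl = ToyGrants()
acl.grant("SELECT", "hr_db.employees", "hr_team")
acl.grant("MODIFY", "hr_db.employees", "hr_team")
acl.revoke("MODIFY", "hr_db.employees", "hr_team")
```

After these three calls, hr_team can still read hr_db.employees but can no longer modify it, which is the grant-then-revoke flow the hands-on lecture walks through in the UI and in SQL.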

Requirements

Basic SQL knowledge will be required
Basic Python programming experience will be required
Knowledge of cloud fundamentals will be beneficial, but not necessary

Description

If you are interested in becoming a Certified Data Engineer Associate from Databricks, you have come to the right place! This study guide will help you prepare for the certification exam.

By the end of this course, you should be able to:

Understand how to use the Databricks Lakehouse Platform and its tools, and the benefits of doing so, including:
- Data Lakehouse (architecture, descriptions, benefits)
- Data Science and Engineering workspace (clusters, notebooks, data storage)
- Delta Lake (general concepts, table management and manipulation, optimizations)
Build ETL pipelines using Apache Spark SQL and Python, including:
- Relational entities (databases, tables, views)
- ELT (creating tables, writing data to tables, cleaning data, combining and reshaping tables, SQL UDFs)
- Python (facilitating Spark SQL with string manipulation and control flow, passing data between PySpark and Spark SQL)
Incrementally process data, including:
- Structured Streaming (general concepts, triggers, watermarks)
- Auto Loader (streaming reads)
- Multi-hop Architecture (bronze-silver-gold, streaming applications)
- Delta Live Tables (benefits and features)
Build production pipelines for data engineering applications and Databricks SQL queries and dashboards, including:
- Jobs (scheduling, task orchestration, UI)
- Dashboards (endpoints, scheduling, alerting, refreshing)
Understand and follow best security practices, including:
- Unity Catalog (benefits and features)
- Entity Permissions (data object privileges)

With the knowledge you gain during this course, you will be ready to take the certification exam.

I am looking forward to meeting you!

Who this course is for:

Anyone aiming to pass the Databricks Data Engineer Associate certification exam
University students looking for a career in Data Engineering
Data Engineers moving from other technologies and aiming to apply their skills to Databricks
Data Engineers / Data Warehouse Developers currently working on on-premises technologies
Anyone new to Databricks who wants to save time by learning Databricks fundamentals

Course content

Introduction • 10 lectures • 45min

Databricks Intelligence Platform • 9 lectures • 58min

Data Ingestion and Loading • 6 lectures • 26min

Data Transformation • 10 lectures • 46min

Incremental Data Processing • 7 lectures • 41min

Productionizing Data Pipelines • 7 lectures • 43min

Data Governance & Quality • 7 lectures • 52min

Certification Overview • 3 lectures • 9min