Mastering Databricks & Apache Spark - Build ETL data pipeline

Rating: 4.2 (436 reviews)

Learn the fundamental concepts of Databricks and process big data by building your first data pipeline on Azure.

Created by Priyank Singh

Last updated 8/2021

English

What you'll learn

Databricks fundamentals
Build your first data pipeline to process CSV, JSON and XML files
Orchestrate the data pipeline with Azure Data Factory
Spin up a Spark cluster
Delta tables
Time travel and VACUUM on Delta tables
Apache Spark SQL
Filtering DataFrames
Renaming, dropping, selecting and casting columns
Aggregation operations: SUM, AVG (average), MAX, MIN
Window functions: RANK, ROW_NUMBER, DENSE_RANK
Building dashboards
Build a complete project
Build an end-to-end data pipeline
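The filtering, renaming, dropping, selecting and casting operations in this list are row-and-column transformations whose semantics can be shown without a cluster. The following is a minimal plain-Python sketch over a list of dicts, not Spark code; the sample rows and the column names `name`, `country`, `salary` and `dept_id` are hypothetical. In PySpark the corresponding DataFrame methods would be `filter`, `withColumnRenamed`, `drop` and `select`, with `col(...).cast(...)` for casting.

```python
# Plain-Python illustration of common DataFrame operations (not Spark code).
# Hypothetical sample data: values arrive as strings, as they often do from CSV.
rows = [
    {"name": "Asha", "country": "India", "salary": "5000", "dept_id": "1"},
    {"name": "Ben", "country": "UK", "salary": "7000", "dept_id": "2"},
    {"name": "Chen", "country": "India", "salary": "6500", "dept_id": "2"},
]

def cast_column(data, column, to_type):
    """Return new rows with `column` converted by `to_type` (like col(...).cast(...))."""
    return [{**row, column: to_type(row[column])} for row in data]

def filter_rows(data, predicate):
    """Keep only the rows for which predicate(row) is true (like DataFrame.filter)."""
    return [row for row in data if predicate(row)]

def rename_column(data, old, new):
    """Rename a column in every row (like DataFrame.withColumnRenamed)."""
    return [{(new if key == old else key): value for key, value in row.items()} for row in data]

def drop_column(data, column):
    """Remove a column from every row (like DataFrame.drop)."""
    return [{key: value for key, value in row.items() if key != column} for row in data]

def select_columns(data, *columns):
    """Keep only the listed columns, in that order (like DataFrame.select)."""
    return [{column: row[column] for column in columns} for row in data]

typed = cast_column(rows, "salary", int)
high_paid = filter_rows(typed, lambda row: row["salary"] > 6000)
renamed = rename_column(high_paid, "name", "employee_name")
without_dept = drop_column(renamed, "dept_id")
result = select_columns(without_dept, "employee_name", "salary")
print(result)  # [{'employee_name': 'Ben', 'salary': 7000}, {'employee_name': 'Chen', 'salary': 6500}]
```

Casting first matters: comparing the string "7000" with the integer 6000 would raise a TypeError in Python, just as filtering an uncast string column in Spark can give surprising results.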

Course content

5 sections • 47 lectures • 4h 23m total length

Introduction • 0:20
What is Databricks • 1:04
Project • 1:28
Create Azure Account • 1:41
Setting up the Databricks environment • 2:16
Importing Notebooks • 0:59
Understanding Distributed Processing • 0:35
How to create a cluster • 2:04
Learn how to create a Databricks cluster: name it, choose Standard or Single Node mode, set the minimum and maximum number of workers, and select the runtime and machine type; then edit, clone, restart, stop or delete it.
Notebook • 2:52
Why Databricks • 0:46
Create a table or DataFrame by uploading data • 1:16

Window Functions • 1:25
Scala - Filtering DataFrame • 11:08
Scala - Common Operations • 12:41
Scala - Aggregation Commands • 6:43
Scala - Rank, Row Number, Dense Rank • 13:17
Apply window functions in Scala to compute the maximum and minimum salary per country, and explore RANK, DENSE_RANK and ROW_NUMBER scenarios using PARTITION BY and ORDER BY.
Python - Filtering DataFrame • 10:27
Python - Common Operations • 8:06
Python - Aggregation Commands • 6:47
Python - Rank, Row Number, Dense Rank • 12:25
Spark SQL - Common Operations • 7:47
Spark SQL - Aggregation Commands • 6:04
Spark SQL - Rank, Row Number, Dense Rank • 9:24
Spark SQL - Global View • 4:01
Spark SQL - Temp View • 3:01
Joins • 1:50
Explore the different kinds of joins (inner, left, right, full) used to combine tables A and B on the department ID, and see how each join affects the resulting rows.
Scala - Joins • 8:53
Learn how to perform left, right and full outer joins on Spark DataFrames in Scala, create temporary views, and run Spark SQL queries to combine department data and handle nulls.
Python - Joins • 7:03
Spark SQL - Joins • 3:55
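The join lectures combine tables A and B on the department ID. The plain-Python sketch below (not Spark code) shows which rows inner, left, right and full outer joins keep, with `None` standing in for SQL nulls; the table contents and the `join` helper are hypothetical. In PySpark the same joins would be written as `a.join(b, on="dept_id", how=...)` with `how` set to `"inner"`, `"left"`, `"right"` or `"full"`; passing the key as a column name, as here, yields a single `dept_id` column in the result.

```python
# Plain-Python illustration of inner, left, right and full outer joins on dept_id.
# Hypothetical tables: A holds employees, B holds departments.
table_a = [
    {"emp": "Asha", "dept_id": 1},
    {"emp": "Ben", "dept_id": 2},
    {"emp": "Dev", "dept_id": 4},  # no matching department in B
]
table_b = [
    {"dept_id": 1, "dept_name": "Sales"},
    {"dept_id": 2, "dept_name": "IT"},
    {"dept_id": 3, "dept_name": "HR"},  # no employees in A
]

def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; columns missing on the unmatched side become None."""
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unsupported join type: {how}")
    # Column names of each side (excluding the join key), in first-seen order.
    left_cols = [c for c in dict.fromkeys(col for row in left for col in row) if c != key]
    right_cols = [c for c in dict.fromkeys(col for row in right for col in row) if c != key]
    result, matched_right = [], set()
    for l_row in left:
        matches = [i for i, r_row in enumerate(right) if r_row[key] == l_row[key]]
        for i in matches:
            matched_right.add(i)
            result.append({**l_row, **right[i]})
        if not matches and how in ("left", "full"):
            # Keep the unmatched left row and pad the right-side columns with None.
            result.append({**l_row, **dict.fromkeys(right_cols)})
    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                # Keep the unmatched right row and pad the left-side columns with None.
                result.append({**dict.fromkeys(left_cols), **r_row})
    return result

for how in ("inner", "left", "right", "full"):
    print(how, [(r["emp"], r["dept_id"], r["dept_name"]) for r in join(table_a, table_b, "dept_id", how)])
```

With this data the inner join keeps 2 rows, the left join 3 (Dev with a null department), the right join 3 (HR with a null employee) and the full join 4. Row order here simply follows the input; Spark does not guarantee any order in join output.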

Project Description • 1:28
Spinning up Azure SQL • 3:51
Key Vault • 3:00
Secret Scopes • 2:13
Project building and mounting of containers • 3:24
Create two storage containers, named landing and archive, and mount them in the Databricks project. Set up a batch-process folder and create the data warehouse database, named Grandison, as a one-time step.
Reading XML, JSON, CSV and loading to Delta tables & Azure SQL • 12:01
Process multi-format data in notebooks by reading CSV, JSON and XML files, casting column types, and loading the results into Delta tables and into Azure SQL via JDBC, with a separate notebook for each format.
Move files from one container to another • 5:48
Dashboard • 4:54
Azure Data Factory to orchestrate • 12:35
Congratulations • 0:11
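The project lectures read CSV, JSON and XML files from the landing container, cast column types, load the results, and then move the processed files to the archive container. The sketch below mimics that flow with the Python standard library on local folders; it is not the course's Databricks code. The folder names `landing` and `archive` come from the lecture description, while the helper names, the `<record>` XML element and the `id`/`amount` columns are assumptions made for this example. In Databricks the reads would typically go through `spark.read`, the move through `dbutils.fs.mv`, and the load step would write to Delta tables and to Azure SQL over JDBC.

```python
import csv
import json
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def read_records(path):
    """Parse one landing file into a list of dicts, choosing the parser by extension."""
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        # Assumption: each JSON file holds a list of objects.
        return json.loads(path.read_text())
    if suffix == ".xml":
        # Assumption: each row is a <record> element whose children are the columns.
        root = ET.parse(path).getroot()
        return [{child.tag: child.text for child in rec} for rec in root.iter("record")]
    raise ValueError(f"unsupported file type: {path.name}")

def cast_record(record):
    """Cast the hypothetical id/amount columns so rows from every format share one schema."""
    return {"id": int(record["id"]), "amount": float(record["amount"])}

def process_landing(landing, archive):
    """Read and cast every file in `landing`, then move the processed files to `archive`."""
    archive.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in landing.iterdir() if p.is_file())
    loaded = [cast_record(r) for path in files for r in read_records(path)]
    # Placeholder: a real pipeline would write `loaded` to Delta tables / Azure SQL here
    # and archive the files only after that write has succeeded.
    for path in files:
        shutil.move(str(path), str(archive / path.name))
    return loaded

# Usage on a throwaway directory with one file per format.
with tempfile.TemporaryDirectory() as tmp:
    landing, archive = Path(tmp, "landing"), Path(tmp, "archive")
    landing.mkdir()
    (landing / "a.csv").write_text("id,amount\n1,10.5\n")
    (landing / "b.json").write_text('[{"id": 2, "amount": 20}]')
    (landing / "c.xml").write_text("<rows><record><id>3</id><amount>30.25</amount></record></rows>")
    result = process_landing(landing, archive)
    archived = sorted(p.name for p in archive.iterdir())
    remaining = list(landing.iterdir())

print(result)    # [{'id': 1, 'amount': 10.5}, {'id': 2, 'amount': 20.0}, {'id': 3, 'amount': 30.25}]
print(archived)  # ['a.csv', 'b.json', 'c.xml']
```

Casting after parsing is what lets the three formats end up in one table: CSV and XML deliver every value as text, while JSON already carries numbers.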

Requirements

There are no prerequisites for this course.

Description

Welcome to the course Mastering Databricks & Apache Spark - Build ETL data pipeline.

Databricks combines the best of data warehouses and data lakes into a lakehouse architecture. In this course we will learn how to perform the same operations in Scala, Python and Spark SQL. Writing the same commands in each language prepares you to build batch processes in whichever language your client needs and to deliver a solution that fits. We will build an end-to-end solution in Azure Databricks.

Key Learning Points

We will build our own cluster to process the data and, with a one-click operation, load data from different sources into Azure SQL and Delta tables.
We will then use a Databricks notebook to build a dashboard that answers business questions.
We will deploy the infrastructure the project needs on the Azure cloud.
These scenarios give students broad exposure to the cloud platform and to setting up its various resources.
All data processing is performed in Azure Databricks.

Fundamentals

Databricks
Delta tables
Versions (time travel) and VACUUM on Delta tables
Apache Spark SQL
Filtering DataFrames
Renaming, dropping, selecting and casting columns
Aggregation operations: SUM, AVG (average), MAX, MIN
Window functions: RANK, ROW_NUMBER, DENSE_RANK
Building dashboards
Analytics
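The window-function lectures compute the maximum and minimum salary per country and compare RANK, DENSE_RANK and ROW_NUMBER over a partition. The plain-Python sketch below (not Spark code, with made-up rows and a hypothetical `add_window_columns` helper) shows how the three ranking functions treat ties when rows are partitioned by country and ordered by salary descending. In PySpark the ranking columns would come from `F.rank()`, `F.dense_rank()` and `F.row_number()` applied `.over(Window.partitionBy("country").orderBy(F.desc("salary")))`, while partition-wide maximum and minimum would use `F.max("salary")` and `F.min("salary")` over `Window.partitionBy("country")` alone, because adding `orderBy` changes the default frame and turns them into running values.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical employee rows; Asha and Ravi tie on salary within India.
rows = [
    {"name": "Asha", "country": "India", "salary": 9000},
    {"name": "Ravi", "country": "India", "salary": 9000},
    {"name": "Meera", "country": "India", "salary": 7000},
    {"name": "Ben", "country": "UK", "salary": 8000},
    {"name": "Tom", "country": "UK", "salary": 6000},
]

def add_window_columns(data):
    """Partition by country, order by salary descending, and add window columns."""
    out = []
    # Sorting groups each country together and orders salaries high to low;
    # Python's sort is stable, so tied rows keep their input order.
    ordered = sorted(data, key=lambda r: (r["country"], -r["salary"]))
    for _, partition in groupby(ordered, key=itemgetter("country")):
        part = list(partition)
        max_salary, min_salary = part[0]["salary"], part[-1]["salary"]
        rank = dense_rank = 0
        previous = None
        for row_number, row in enumerate(part, start=1):
            if row["salary"] != previous:
                rank = row_number   # RANK: tied rows share a rank, then positions are skipped
                dense_rank += 1     # DENSE_RANK: tied rows share a rank, no gaps
                previous = row["salary"]
            out.append({**row, "max_salary": max_salary, "min_salary": min_salary,
                        "rank": rank, "dense_rank": dense_rank,
                        "row_number": row_number})  # ROW_NUMBER: unique within the partition
    return out

windowed = add_window_columns(rows)
for r in windowed:
    print(r["country"], r["name"], r["max_salary"], r["min_salary"],
          r["rank"], r["dense_rank"], r["row_number"])
```

For India this prints ranks 1, 1, 3, dense ranks 1, 1, 2 and row numbers 1, 2, 3, with a maximum of 9000 and a minimum of 7000 on every row. In Spark, which of two tied rows receives ROW_NUMBER 1 is not guaranteed unless the ordering includes a tie-breaking column; this sketch simply keeps input order.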

This course is suitable for data engineers, BI architects, data analysts, ETL developers and BI managers.

Who this course is for:

Data engineers
People who are interested in building end-to-end ETL data pipelines
Anyone who wants to learn fundamental commands in Python, Apache Spark SQL and Scala

Course sections

Getting Started with Databricks • 11 lectures • 15min
Extraction of Data • 5 lectures • 30min
Transformation of Data • 18 lectures • 2hr 15min
Processing XML, JSON, Delta tables • 3 lectures • 33min
Loading data and building ETL data pipeline with dashboard • 10 lectures • 49min
