Mastering Databricks & Apache Spark - Build ETL data pipeline
Rating: 4.2 out of 5 (436 ratings)
2,518 students

Learn the fundamental concepts of Databricks and process big data by building your first data pipeline on Azure
Created by Priyank Singh
Last updated 8/2021
English

What you'll learn

  • Databricks
  • Build your first data pipeline to process CSV, JSON, and XML
  • Orchestrate a data pipeline with Azure Data Factory
  • Spin up a Spark cluster
  • Delta tables
  • Concept of time travel and vacuum on Delta tables
  • Apache Spark SQL
  • Filtering DataFrames
  • Rename, drop, select, cast
  • Aggregation operations: SUM, AVERAGE, MAX, MIN
  • Rank, Row Number, Dense Rank
  • Building dashboards
  • Build a complete project
  • Build an end-to-end data pipeline

Course content

5 sections, 47 lectures, 4h 23m total length
  • Introduction (0:20)
  • What is Databricks (1:04)
  • Project (1:28)
  • Create Azure Account (1:41)
  • Setting up the Databricks environment (2:16)
  • Importing Notebooks (0:59)
  • Understanding Distributed Processing (0:35)
  • How to create a cluster (2:04)

    Learn how to create a Databricks cluster, name it, choose standard or single mode, set min and max workers, select runtime and machine, then edit, clone, restart, stop, or delete.

  • Notebook (2:52)
  • Why Databricks (0:46)
  • Create a table or DataFrame by uploading data (1:16)

Requirements

  • There are no prerequisites for this course

Description

Welcome to the course on Mastering Databricks & Apache Spark - Build ETL data pipeline

Databricks combines the best of data warehouses and data lakes into a lakehouse architecture. In this course we will learn how to perform various operations in Scala, Python, and Spark SQL. This will help students build solutions that create value and develop the mindset to build batch processes in any of these languages. The course shows how to write the same commands in different languages, so that you can adapt to your client's needs and deliver a world-class solution. We will build an end-to-end solution in Azure Databricks.


Key Learning Points

  • We will build our own cluster to process our data and, with a one-click operation, load data from different sources into Azure SQL and Delta tables

  • After that we will use Databricks notebooks to prepare a dashboard that answers business questions

  • Based on our needs we will deploy infrastructure on the Azure cloud

  • These scenarios will give students 360-degree exposure to the cloud platform and to how to set up various resources

  • All activities are performed in Azure Databricks


Fundamentals

  • Databricks

  • Delta tables

  • Concept of versions and vacuum on Delta tables

  • Apache Spark SQL

  • Filtering DataFrames

  • Rename, drop, select, cast

  • Aggregation operations: SUM, AVERAGE, MAX, MIN

  • Rank, Row Number, Dense Rank

  • Building dashboards

  • Analytics
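
As a rough illustration of the ranking topic listed above (not taken from the course material), the plain-Python sketch below shows how Row Number, Rank, and Dense Rank differ when values tie. The function name `ranking_columns` and the sample scores are hypothetical. In Spark the same results would come from the `row_number`, `rank`, and `dense_rank` window functions rather than hand-written code.

```python
def ranking_columns(scores):
    """Return (score, row_number, rank, dense_rank) tuples, highest score first.

    Hypothetical helper for illustration only; it mimics the semantics of the
    SQL ranking functions over a single ordered partition.
    """
    ordered = sorted(scores, reverse=True)
    rows = []
    rank = 0
    dense_rank = 0
    previous = None
    for row_number, score in enumerate(ordered, start=1):
        if score != previous:
            rank = row_number   # rank leaves gaps after ties
            dense_rank += 1     # dense rank never leaves gaps
            previous = score
        rows.append((score, row_number, rank, dense_rank))
    return rows


# Made-up sample data with one tie at the top.
for row in ranking_columns([90, 85, 90, 70]):
    print(row)
```

With a tie for first place, row number keeps counting (1, 2), rank repeats and then skips (1, 1, 3), and dense rank repeats without skipping (1, 1, 2).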

This course is suitable for data engineers, BI architects, data analysts, ETL developers, and BI managers.

Who this course is for:

  • Data engineers
  • People who are interested in building an end-to-end ETL data pipeline
  • People who want to learn fundamental commands in Python, Apache Spark SQL, and Scala