Databricks Fundamentals & Apache Spark Core
What you'll learn
- Databricks
- Apache Spark Architecture
- Apache Spark DataFrame API
- Apache Spark SQL
- Selecting and manipulating columns of a DataFrame
- Filtering, dropping, sorting rows of a DataFrame
- Joining, reading, writing and partitioning DataFrames
- Aggregating DataFrames rows
- Working with User Defined Functions
- Using the DataFrameWriter API
Requirements
- Basic Scala knowledge
- Basic SQL knowledge
Description
Welcome to this course on Databricks and Apache Spark 2.4 and 3.0.0.
Apache Spark is a distributed big-data processing framework that runs at scale.
In this course, we will learn how to write Spark Applications using Scala and SQL.
Databricks is a company founded by the creators of Apache Spark.
Databricks offers a managed and optimized version of Apache Spark that runs in the cloud.
The main focus of this course is to teach you how to use the DataFrame API & SQL to accomplish tasks such as:
- Write and run Apache Spark code using Databricks
- Read and write data from the Databricks File System (DBFS)
- Explain how Apache Spark runs on a cluster with multiple nodes
- Use the DataFrame API and SQL to perform data manipulation tasks such as:
  - Selecting, renaming and manipulating columns
  - Filtering, dropping and aggregating rows
  - Joining DataFrames
- Create UDFs and use them with the DataFrame API or Spark SQL
- Write DataFrames to external storage systems
- List and explain the elements of the Apache Spark execution hierarchy:
  - Jobs
  - Stages
  - Tasks
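As a small taste of what the course covers, here is a minimal Scala sketch of the DataFrame API. The dataset, column names, and paths are illustrative only; it assumes a Databricks notebook (where `spark` is already provided) or a local SparkSession:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// In a Databricks notebook `spark` already exists; locally we build one.
val spark = SparkSession.builder()
  .appName("dataframe-sketch")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Hypothetical in-memory data; on Databricks you would typically read
// from DBFS instead, e.g. spark.read.parquet("dbfs:/path/to/sales")
val sales = Seq(
  ("2023-01-01", "books", 12.50),
  ("2023-01-01", "games", 59.99),
  ("2023-01-02", "books", 7.99)
).toDF("date", "category", "amount")

// Filtering rows, then grouping and aggregating
val totals = sales
  .filter($"amount" > 5.0)
  .groupBy($"category")
  .agg(sum($"amount").as("total"))

// A simple UDF, usable from both the DataFrame API and Spark SQL
val shout = udf((s: String) => s.toUpperCase)
spark.udf.register("shout", shout)
totals.select(shout($"category").as("category"), $"total").show()

// Writing out with the DataFrameWriter API, partitioned by a column
// totals.write.mode("overwrite").partitionBy("category").parquet("dbfs:/tmp/totals")
```

Running an action such as `show()` triggers a job, which Spark breaks into stages and tasks — the execution hierarchy examined in the course.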
Who this course is for:
- Software developers curious about big data, data engineering and data science
- Beginner data engineers who want to learn how to work with Databricks
- Beginner data scientists who want to learn how to work with Databricks
Instructor
I'm a software developer specializing in building data-intensive applications.
I've been developing software for over 10 years.
I've worked in data-intensive industries such as finance and industrial image processing.
Over the years, the volume of data produced by systems and humans outgrew the storage and compute capacity of legacy relational database systems, so I had to learn new tools and frameworks for processing big data.
As a data engineer, I'm very motivated and passionate about building applications that can leverage the power and flexibility of cloud computing and big-data processing frameworks.