Apache Spark 3 - Spark Programming in Python for Beginners

Data Engineering using Spark Structured API
Highest Rated
4.8 (37 ratings)
183 students enrolled
Last updated 7/2020
English
30-Day Money-Back Guarantee
This course includes
  • 5.5 hours on-demand video
  • 10 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Apache Spark Foundation and Spark Architecture
  • Data Engineering and Data Processing in Spark
  • Working with Data Sources and Sinks
  • Working with Data Frames and Spark SQL
  • Using PyCharm IDE for Spark Development and Debugging
  • Unit Testing, Managing Application Logs and Cluster Deployment
Course content
51 lectures 05:26:46
+ Installing and Using Apache Spark
5 lectures 23:41
Apache Spark in Local Mode Command Line REPL
05:49
Did you notice?
3 questions
Apache Spark in the IDE - PyCharm
05:55
Did you notice?
3 questions
Apache Spark in Cloud - Databricks Community and Notebooks
04:33
Check your knowledge
3 questions
Apache Spark in Anaconda - Jupyter Notebook
04:32
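To give a flavour of what these environment demos lead to, here is a minimal sanity check you might run in a local-mode PySpark session (REPL or notebook) once Spark is installed. This sketch is not taken from the course materials; the application name and thread count are illustrative.

    # Quick local-mode check; names and settings here are only examples.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName("HelloSpark") \
        .master("local[3]") \
        .getOrCreate()

    df = spark.range(10)     # a tiny DataFrame with numbers 0..9
    print(spark.version)     # should report a 3.x version
    print(df.count())        # prints 10
    spark.stop()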
+ Spark Execution Model and Architecture
9 lectures 37:13
Execution Methods - How to Run Spark Programs?
05:01
Check your knowledge
4 questions
Spark Distributed Processing Model - How Does Your Program Run?
03:11
Spark Execution Modes and Cluster Managers
04:55

Execution modes and cluster managers are among the most confusing topics. Check your understanding with this quiz.

Check your knowledge
10 questions
Summarizing Spark Execution Models - When to use What?
02:24
Working with PySpark Shell - Demo
04:31
Installing Multi-Node Spark Cluster - Demo
05:36
Working with Notebooks in Cluster - Demo
06:58
Working with Spark Submit - Demo
02:55
Section Summary
01:42
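To make the execution-mode discussion concrete, the sketch below keeps the master out of the application code so the same script can run locally, on YARN, or on Kubernetes depending on how it is submitted (for example, spark-submit --master local[3] app.py versus spark-submit --master yarn --deploy-mode cluster app.py). This is an illustrative sketch, not the course's exact code.

    # Intended to be launched via spark-submit, which supplies the master
    # and deploy mode; nothing cluster-specific is hard-coded here.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName("ExecutionModesDemo") \
        .getOrCreate()

    conf = spark.sparkContext.getConf()
    print("master:", conf.get("spark.master", "not set"))
    print("deploy mode:", conf.get("spark.submit.deployMode", "client"))
    spark.stop()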
+ Spark Programming Model and Developer Experience
11 lectures 01:27:22
Creating Spark Project Build Configuration
06:10
Configuring Spark Project Application Logs
10:50
Creating Spark Session
08:26
Configuring Spark Session
09:12
Data Frame Introduction
07:43
Data Frame Partitions and Executors
05:24
Spark Transformations and Actions
11:02
Spark Jobs, Stages and Tasks
08:34
Understanding your Execution Plan
09:33
Unit Testing Spark Application
05:01
Rounding off Summary
05:27
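As a small, self-contained illustration of the ideas in this section (a configured SparkSession, lazy transformations, and actions that trigger jobs), consider the sketch below. The data, column names, and configuration values are made up for the example and are not taken from the course.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder \
        .appName("ProgrammingModelDemo") \
        .master("local[3]") \
        .config("spark.sql.shuffle.partitions", "3") \
        .getOrCreate()

    df = spark.createDataFrame(
        [("Alice", 34), ("Bob", 29), ("Cathy", 41)],
        ["name", "age"],
    )

    # Transformations are lazy: they only build up an execution plan.
    adults = df.filter(col("age") > 30)
    renamed = adults.withColumnRenamed("name", "full_name")

    # Actions trigger Spark jobs, stages and tasks.
    renamed.show()
    print(renamed.count())
    spark.stop()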
+ Spark Structured API Foundation
5 lectures 25:12
Introduction to Spark APIs
05:11
Introduction to Spark RDD API
13:13
Working with Spark SQL
02:37
Spark SQL Engine and Catalyst Optimizer
02:53
Section Summary
01:18
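A quick sketch of how the DataFrame API and Spark SQL relate: both front ends express the same logic and go through the Catalyst optimizer. Table, view and column names below are illustrative, not from the course.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName("SparkSQLDemo") \
        .master("local[2]") \
        .getOrCreate()

    df = spark.createDataFrame(
        [("2020-01-01", 100), ("2020-01-02", 250)],
        ["order_date", "amount"],
    )

    # DataFrame API
    df.where("amount > 150").show()

    # Spark SQL over the same data via a temporary view
    df.createOrReplaceTempView("orders")
    spark.sql("SELECT * FROM orders WHERE amount > 150").show()

    spark.stop()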
+ Spark Data Sources and Sinks
8 lectures 59:03
Spark Data Sources and Sinks
06:44
Spark DataFrameReader API
05:00
Reading CSV, JSON and Parquet files
07:59
Creating Spark DataFrame Schema
06:06
Spark DataFrameWriter API
06:09
Writing Your Data and Managing Layout
12:51
Spark Databases and Tables
05:33
Working with Spark SQL Tables
08:41
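For a feel of the reader/writer code this section covers, here is a sketch with an explicit schema, a CSV read, and a partitioned Parquet write. The file paths, schema fields, and partition column are assumptions made for the example.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder \
        .appName("SourcesAndSinksDemo") \
        .master("local[2]") \
        .getOrCreate()

    # Define the schema explicitly instead of relying on inference.
    schema = StructType([
        StructField("flight_date", StringType()),
        StructField("carrier", StringType()),
        StructField("delay", IntegerType()),
    ])

    flights_df = spark.read \
        .format("csv") \
        .option("header", "true") \
        .schema(schema) \
        .load("data/flights.csv")        # illustrative path

    flights_df.write \
        .format("parquet") \
        .mode("overwrite") \
        .partitionBy("carrier") \
        .save("output/flights_parquet")  # illustrative path

    spark.stop()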
+ Spark Dataframe and Dataset Transformations
7 lectures 54:04
Introduction to Data Transformation
02:44
Working with Dataframe Rows
05:02
DataFrame Rows and Unit Testing
04:02
Dataframe Rows and Unstructured data
06:08
Working with Dataframe Columns
10:33
Creating and Using UDF
10:01
Misc Transformations
15:34
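To sketch what "Creating and Using UDF" involves, here is a small user-defined function made available both to the DataFrame API and to Spark SQL. The normalization logic, column names and view name are invented for this example.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import StringType

    spark = SparkSession.builder \
        .appName("UDFDemo") \
        .master("local[2]") \
        .getOrCreate()

    df = spark.createDataFrame([("F",), ("M",), ("Female",)], ["gender"])

    def normalize_gender(value):
        # Toy normalization, purely illustrative.
        if value and value.strip().upper().startswith("F"):
            return "Female"
        if value and value.strip().upper().startswith("M"):
            return "Male"
        return "Unknown"

    # Use with the DataFrame API
    normalize_udf = udf(normalize_gender, StringType())
    df.withColumn("gender_clean", normalize_udf(col("gender"))).show()

    # Register for Spark SQL
    spark.udf.register("normalize_gender", normalize_gender, StringType())
    df.createOrReplaceTempView("people")
    spark.sql("SELECT gender, normalize_gender(gender) AS gender_clean FROM people").show()

    spark.stop()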
+ Aggregations in Apache Spark
3 lectures 18:50
Aggregating Dataframes
08:58
Grouping Aggregations
04:25
Windowing Aggregations
05:27
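A compact sketch of the two aggregation styles named above: a grouping aggregation and a windowing aggregation (running total). The data set and column names are made up for illustration.

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as f

    spark = SparkSession.builder \
        .appName("AggregationsDemo") \
        .master("local[2]") \
        .getOrCreate()

    sales = spark.createDataFrame(
        [("2020-01", "US", 100), ("2020-01", "IN", 75), ("2020-02", "US", 120)],
        ["month", "country", "amount"],
    )

    # Grouping aggregation
    sales.groupBy("country") \
         .agg(f.sum("amount").alias("total_amount"),
              f.count("*").alias("num_orders")) \
         .show()

    # Windowing aggregation: running total per country, ordered by month
    running_window = Window.partitionBy("country") \
                           .orderBy("month") \
                           .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    sales.withColumn("running_total", f.sum("amount").over(running_window)).show()

    spark.stop()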
Requirements
  • Programming Knowledge Using Python Programming Language
  • A Recent 64-bit Windows/Mac/Linux Machine with 8 GB RAM
Description

This course does not require any prior knowledge of Apache Spark or Hadoop. Spark architecture and fundamental concepts are explained in enough depth to bring you up to speed and help you grasp the rest of the content.


About the Course

I created Apache Spark 3 - Spark Programming in Python for Beginners to help you understand Spark programming and apply that knowledge to build data engineering solutions. The course is example-driven and follows a working-session-like approach: we take a live-coding approach and explain all the needed concepts along the way.

Who should take this Course?

I designed this course for software engineers who want to develop data engineering pipelines and applications using Apache Spark, and for data architects and data engineers who are responsible for designing and building their organization's data-centric infrastructure. It is also useful for managers and architects who do not work on the Spark implementation directly but work with the people who implement Apache Spark at the ground level.

Spark Version used in the Course

This course uses Apache Spark 3.x. All the source code and examples have been tested on the Apache Spark 3.0.0 open-source distribution.

Who this course is for:
  • Software engineers and architects who want to design and develop big data engineering projects using Apache Spark
  • Programmers and developers aspiring to grow and learn data engineering using Apache Spark