Apache Spark, Spark sql & Streaming basic to advance 2024

Rating: 4.0 (21 reviews)

Complete Spark 2.4 guide with practice sessions, Spark job performance tuning, Structured Streaming, and Scala basics

Created by Anshul Jain

Last updated 8/2025

English

What you'll learn

Apache Spark in the Big Data ecosystem
Spark Internal Architecture
Integration of Spark with Hive Warehouse
One course to learn Spark, Spark SQL, DStream and Structured Streaming
Basic Scala Programming for Spark
Spark Streaming Basics with Java
Spark performance-boosting techniques
How to design a Big Data project using Spark 2.x with Hive 2.8.x
Structured Streaming with Spark 3
Code a Spark or Streaming application in Eclipse and run it on a YARN cluster
Free Google Cloud Big Data environment setup for hands-on practice

Course content

4 sections • 36 lectures • 9h 19m total length

Introduction to Apache Spark (4:11)
Course agenda
What is Apache Spark (7:51)
What is Apache Spark?
Why Apache Spark?
Who uses Apache Spark?
Real use cases of Apache Spark
Spark on Google Cloud for free, with Hive and Hadoop setup in one click (16:20)
Get your Google Cloud Platform free trial for the first 90 days and use Google Cloud machines for practice.
You will not be charged. We will create an account and a cluster, and run spark-shell directly.
That's it, the setup is done.
https://console.cloud.google.com/
Other option: https://cloud.ibm.com/catalog
Create a Big Data cluster, check the Spark and Hadoop applications, and shut down the cluster
Spark Working Architecture (17:07)
Components in Apache Spark
How the Spark engine actually works
Spark Architecture 2: How a Job Is Executed and the Different Modes of Execution (10:01)
Spark in local mode, standalone or client mode, and cluster mode
How exactly a Spark job is executed in the Spark engine
Set Up Spark 2.4 with Hadoop 2.x on a Local Machine (13:57)
Configure a local machine running a Unix/Linux OS such as Ubuntu
Install Java 1.8.x
Install Hadoop 2.8.x
Install Hive 3.x or Hive 2.x
Install Spark 2.4.x
Install Scala 2.11.3
Set the properties required for a standalone setup
Spark Shell & Scala Basics for Spark (15:14)
Spark Shell in local mode
Spark Shell in the Google Cloud environment
Scala Basics
Spark Session & SparkContext (8:24)
What exactly SparkContext and SparkSession are
Tabs in the Spark job UI
Scala Basics 2 for Spark Programming (23:43)
For loops, switch-case (match expressions), var and val in Scala programming
How to read a file line by line in Spark Shell using Scala code
Broadcast & Accumulators in Spark using Java (14:30)
Special variables in Spark
Broadcast and accumulator
Spark RDD, Transformation & Action (18:34)
What is an RDD? RDD features
Transformation
Action
Spark Word Count Program (15:55)
Logic & Demo
map, flatMap, reduceByKey
Dataset in Spark (21:18)
What is a Spark Dataset?
RDD vs DataFrame vs Dataset
How to analyze spark-shell commands in the Spark UI
Read Different Formats of Data with the Spark Engine (22:53)
Read CSV, JSON, XML, or text data in Spark
See https://github.com/databricks/spark-xml to learn more about XML file read and write operations
Write Data in Different Formats with Spark & Configure Spark Job Parameters (15:35)
How to write data in JSON, Parquet, Avro, or text format to a local or Hadoop path

Integration of Spark with Hive & Spark SQL Basics (20:20)
Spark config changes to add Hive details
Spark SQL working architecture
Spark SQL to Apply Hive Concepts (20:34)
Create a Hive table with a CTAS query
PARTITION, BUCKETING, SORTING, MSCK and REFRESH utilities in Spark SQL statements
Trick of the Day
Spark SQL on Hive Tables or data files (17:57)
Spark SQL and the equivalent Dataset/DataFrame API functions
Hive operations in Spark SQL such as select, filter, where, group by, order by, agg, sum, count
Spark SQL 4: Advanced Window Functions in Hive (21:54)
How to write queries using window functions such as:
RANK()
DENSE_RANK()
ROW_NUMBER()
Sampling of data: analyzing big files using random sampling with the RAND() function and block sampling using SQL
Persist & Caching of Dataset (14:25)
Spark optimization technique 1: cache or persist your DataFrame or RDD in memory or on disk to speed up your actions
Spark SQL Join Types (19:06)
Different types of joins in Spark
Joins with SQL queries and the Dataset API
LEFT, RIGHT, INNER, FULL, LEFT SEMI, and LEFT ANTI joins
How to read a join DAG in the Spark UI
Spark property details:
https://spark.apache.org/docs/latest/configuration.html#viewing-spark-properties
Spark Join Techniques (15:04)
Sort Merge Join, Broadcast Join, Bucket Map Join on Hive tables
DAG analysis of SQL queries
Which join to prefer, and the configuration settings for it
How to Calculate Spark Job Cluster Configuration (13:05)
How to calculate Spark executor and driver memory and cores, and the number of executors
How to set parallelism in a Spark job
Read this page after watching the video:
https://spark.apache.org/docs/latest/configuration.html#spark-properties
Setting Spark Job Configuration with Spark Shell, Spark App & Spark Submit (12:35)
How to set Spark configuration parameters with spark-shell
In Spark application code
With the spark-submit command
Spark Java App to Write Data in an Optimized Way to Solve the Real Small-File Problem (17:18)
We will create a Scala project with Maven in Eclipse that reads any CSV, JSON, Parquet, or Avro data and writes it to any HDFS external table path in overwrite mode.
The write operation ensures that every file written is less than or equal to 128 MB, so that Hadoop blocks are not wasted on small files.
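As a rough illustration of the 128 MB rule described above, the sketch below computes how many output files are needed so that an even split stays at or under the cap. It is plain Java with no Spark dependency; the class name OutputFilePlanner and the method filesNeeded are hypothetical names chosen for this example and are not taken from the course code. It also assumes the data is split evenly across files.

```java
// Hypothetical helper for illustration only; not code from the course.
final class OutputFilePlanner {

    // 128 MB cap per output file, as stated in the lecture description.
    static final long MAX_FILE_BYTES = 128L * 1024 * 1024;

    // Returns the smallest file count for which an even split of totalBytes
    // keeps every file at or below maxFileBytes. Empty input still yields 1.
    static int filesNeeded(long totalBytes, long maxFileBytes) {
        if (totalBytes < 0 || maxFileBytes <= 0) {
            throw new IllegalArgumentException("size must be >= 0 and cap must be > 0");
        }
        long files = totalBytes / maxFileBytes;   // number of completely full files
        if (totalBytes % maxFileBytes != 0) {
            files++;                              // one extra file for the remainder
        }
        return (int) Math.max(1L, Math.min(files, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        System.out.println(filesNeeded(oneGiB, MAX_FILE_BYTES));     // prints 8
        System.out.println(filesNeeded(oneGiB + 1, MAX_FILE_BYTES)); // prints 9
    }
}
```

In a Spark job, a count like this could be passed to repartition before the write. Treat it as an estimate: the size on disk after compression or a format change can differ from the input size.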

Spark Streaming Basics (10:42)
What is Spark Streaming?
Use cases where it is needed
How does it work?
Spark UI for a Spark Streaming job
Spark Streaming Demo Application Coding (26:30)
Scala code to run a Spark Streaming (RDD API) DStream application
Live demo on the cloud that consumes HDFS text files in real time and runs a word count; we display the word count on the console and also redirect all output to a Unix stream.log file
How to use checkpoints in Spark
How to use broadcast variables and accumulators in Spark Streaming to skip irrelevant words such as "is", "a", and ";" in the word count
Spark Streaming Demo Running on Cloud (11:17)
Live demo on the cloud that consumes HDFS text files in real time and runs a word count; we display the word count on the console and also redirect all output to a Unix stream.log file
How to use broadcast variables and accumulators in Spark Streaming to skip irrelevant words such as "is", "a", and ";" in the word count
Structured Streaming Basic Concepts & Watermarking (16:47)
What is Structured Streaming?
Why and where to use it?
Watermarking in Spark Structured Streaming
Spark Structured Streaming Application Code & Demo (19:49)
Scala application that reads employee data from HDFS, counts the employees in each department, and displays the department-wise counts on the console.
Design a Big Data Project with Spark Streaming, Hive & Kafka (9:50)
Solve a real-world problem by designing a Big Data application.
Explanation of the working model of a sample Spark application for Walmart / D-Mart
Spark Streaming Quiz

Driver & Executor Tuning in a Spark Job (12:32)
Which parameters to set to tune the executor
Which parameters to set to tune the driver
Apache Spark Config with Different Data Sizes (17:06)
When we have lots of small files
When we have lots of big files
When we have a single large file
Spark Internal Memory Management (10:17)
Understanding Spark executor memory and how operations such as shuffle, join, and persist/cache use it
How to configure it to avoid out-of-memory errors
Apache Spark Performance Boost Techniques & Functions (23:21)
Preferring broadcast join over sort-merge join, the HashAggregate algorithm over SortAggregate, reduceByKey over groupByKey, etc.
APIs for join, count, isEmpty, take, head
Spark Streaming Performance Boost Tricks (13:41)
Spark Streaming with HDFS or Kafka as source
Spark Structured Streaming with HDFS or Kafka
Apache Spark with the Hadoop ecosystem

Requirements

Knowledge of basic Java
Knowledge of basic SQL

Description

I have been a Big Data Solution Designer in the IT industry for the last few years. I am putting all my learning and experience into this video series so that you can understand how the Spark ecosystem works, work like a professional big data engineer, and get a good job. The course has been updated for the latest version.

Benefits of this course:

Enroll in this course to get end-to-end knowledge of Apache Spark, Spark SQL, Spark Streaming, Spark with Hive, real-world use cases, designing a Big Data project with the Spark ecosystem, and use cases asked about in interviews. This course is rare of its kind and includes even very fine details of Spark that are not readily available online.

In this course you will learn Spark step by step, from the very basics to advanced Spark as it is actually used in real-time projects, with the latest Spark version 3.x:

Spark setup, all file formats, Hive optimization concepts such as partitioning, bucketing, and joins, and expert-style Spark code review: all demo / interactive sessions

Spark Google Cloud account setup for hands-on practice of all concepts

Spark SQL clauses: distribute by, order by, clustered by, sort by

Scala coding basics

Coding an application in Eclipse with Java 8 as a Maven project using the Spark API

Window functions such as rank, row_number, dense_rank: all demo / interactive sessions

RDD, Dataset & DataFrame API

Different ways to create or insert data in a Hadoop or Hive table

Spark Job Configuration Optimization

Spark application DAG analysis and debugging using the Spark UI

Spark Streaming & Structured Streaming with coding in Java

Performance techniques that big companies use to query data fast.

This course is a full package that explains even rarely used commands and concepts in Spark. After completing this course, you won't find any Spark topic left uncovered. The course was made with the real implementation of Spark in live projects in mind.

Additionally, you can download the step-by-step installation guide (doc) for installing Scala and Apache Spark.

Who this course is for:

IT engineers who want to move their career into Big Data technologies
Beginners in Big Data, Hadoop, and Spark
Students who want to crack interviews for Big Data technology positions
Data analysts who work with large data or continuous flows of data

Course content

Introduction & Environment Setup • 15 lectures • 3hr 36min

Spark with BigData Ecosystem • 10 lectures • 2hr 52min

Spark Streaming & Structured Streaming • 6 lectures • 1hr 35min

Professional Techniques & Concepts • 5 lectures • 1hr 17min