Udemy
Spark Performance Tuning for Data Engineers: Part2 - Spill
Rating: 4.9 out of 5 (24 ratings)
111 students
Created by Suprobho Santra
Last updated 3/2026
English

What you'll learn

  • Hands-on demos based on different scenarios and use cases
  • Learn the nuances of Spark performance tuning
  • Get detailed insights into different operations in Spark
  • Get a clear understanding of how Spark configs work hand in hand and which combinations give optimal results
  • Learn to identify and solve bottlenecks and errors in your Spark application

Course content

4 sections • 15 lectures • 3h 9m total length
  • Introduction (4:37)

    LinkedIn - www.linkedin.com/in/suprobho-santra

  • What is Optimization (5:41)
  • What is Benchmarking (8:27)
  • Suggest for Upcoming Courses (0:09)

Requirements

  • Basic Spark Architecture & internals
  • Spark programming in PySpark or Scala
  • Databricks Cloud Platform

Description

Unlock the true potential of Apache Spark by mastering storage-related performance tuning techniques. This hands-on course is packed with real-world scenarios, guided demos, and practical use cases that will help you fine-tune Spark storage strategies for speed, efficiency, and scalability.


This course is ideal for intermediate Data Engineers and Spark Developers, as well as aspiring Architects, who want to optimize Spark jobs, reduce resource costs, and ensure fast, reliable performance for large-scale data applications.


What You’ll Learn

1. Understand how Apache Spark handles storage internally: memory vs disk

2. Learn when and how to use Spark caching and persistence effectively

3. Compare and choose the right storage levels: MEMORY_ONLY, MEMORY_AND_DISK, etc.

4. Use real-world examples and hands-on demos to benchmark storage decisions

5. Learn how to monitor storage metrics using the Spark UI

6. Handle memory spills, disk I/O bottlenecks, and storage tuning in cluster environments

7. Apply best practices for storage optimization in cloud and on-prem Spark clusters
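
To make the memory-versus-disk topics in the list above more concrete, here is a minimal sketch (not taken from the course material) of how Spark's unified memory region is sized on the JVM heap. It assumes the open-source Spark defaults of 300 MiB reserved memory, `spark.memory.fraction = 0.6`, and `spark.memory.storageFraction = 0.5`. The function name `unified_memory_regions` is hypothetical, and managed platforms may apply their own overheads, so treat the numbers as rough estimates only.

```python
# Rough estimate of Spark's on-heap unified memory regions.
# Assumption: open-source Spark defaults (300 MiB reserved,
# spark.memory.fraction=0.6, spark.memory.storageFraction=0.5).
# The helper name below is hypothetical, for illustration only.

RESERVED_BYTES = 300 * 1024 * 1024  # heap that Spark sets aside for itself


def unified_memory_regions(executor_heap_bytes,
                           memory_fraction=0.6,
                           storage_fraction=0.5):
    """Return approximate sizes (in bytes) of the unified, storage,
    and execution regions for a given executor heap size."""
    usable = max(executor_heap_bytes - RESERVED_BYTES, 0)
    unified = int(usable * memory_fraction)
    # The storage share is a soft boundary: execution may borrow from it,
    # and cached blocks may be evicted when execution needs the space.
    storage = int(unified * storage_fraction)
    return {
        "unified": unified,
        "storage": storage,
        "execution": unified - storage,
    }


if __name__ == "__main__":
    heap = 4 * 1024 ** 3  # a 4 GiB executor heap, chosen only as an example
    for name, size in unified_memory_regions(heap).items():
        print(f"{name}: {size / (1024 * 1024):.1f} MiB")
```

If the data a task must hold during a shuffle, sort, or aggregation exceeds the execution memory available to it, Spark spills to disk. Likewise, a dataset cached with `MEMORY_ONLY` that does not fit is partially recomputed on access, while `MEMORY_AND_DISK` writes the overflow to disk instead.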


Why Take This Course?

  • 100% Hands-on: Focused on practical implementation, not just theory

  • Designed for Data Engineers, Spark Developers, and Big Data Practitioners

  • Covers both foundational concepts and advanced tuning techniques

  • Teaches how to measure performance gains using real metrics

  • Helps you make cost-efficient decisions for big data storage


Tools & Technologies Covered

  • Apache Spark (2.x and 3.x)

  • Databricks

  • Spark UI

  • HDFS, Data Lake (for storage scenarios)

Who this course is for:

  • Data Engineers and Spark Developers, as well as aspiring Architects, who are curious about advanced performance tuning and optimization techniques