Udemy
Data Engineering Vol2 AWS : Data Processing - Spark & Kafka
Highest Rated
Rating: 4.9 out of 5 (14 ratings)
74 students
Created by Soumyadeep Dey
Last updated 2/2026
English

What you'll learn

  • Deep dive on Spark and Kafka using AWS EMR, Databricks, MSK
  • Understand Data Engineering (Volume 2) on AWS using Spark and Kafka
  • Batch and Stream processing using Spark and Kafka
  • Production-level projects and hands-on exercises that give candidates on-the-job-like training
  • Get access to datasets of 100 GB - 200 GB and practice with them
  • Learn Python for Data Engineering with HANDS-ON (Functions, Arguments, OOP (class, object, self), Modules, Packages, Multithreading, file handling etc.)
  • Learn SQL for Data Engineering with HANDS-ON (Database objects, CASE, Window Functions, CTE, CTAS, MERGE, Materialized View etc.)
  • AWS Data Analytics services - S3, EMR, Databricks, MSK

Course content

20 sections • 193 lectures • 47h 25m total length
  • Introduction - Data, Data Lifecycle & Data Engineering Pipeline (26:28)
  • Data Engineering Volume 2 Course & Projects Overview, Roles in Data (25:18)
  • AWS Resource Cost for the Course (6:25)

Requirements

  • Good to have AWS and SQL knowledge

Description

This is Volume 2 of the Data Engineering course. In this course I will talk about the open-source data processing technologies Spark and Kafka, which are among the most widely used frameworks for batch and stream processing. You will learn Spark from Level 100 to Level 400 with real-life hands-on exercises and projects. I will also introduce you to the Data Lake on AWS (that is, S3) and the Data Lakehouse using Apache Iceberg.


I will use AWS as the hosting platform and talk about the AWS services EMR, S3 and MSK. I will cover Databricks as a Spark hosting platform. I will also show you Spark integration with other services like AWS RDS (MySQL or PostgreSQL) and Redshift.


You will get opportunities for hands-on practice with large datasets (100 GB - 300 GB or more of data). This course provides hands-on exercises that match real-world scenarios such as Spark batch processing, stream processing, performance tuning, streaming ingestion, Window functions, ACID transactions on Iceberg etc.

Some other highlights:

  • 10 Projects with different datasets. Total dataset size of 250 GB or more.

  • Other technologies covered - EC2, EBS, VPC and IAM.

  • Optional Python videos

  • Optional AWS and SQL Essentials videos


I will conclude the Data Engineering course with Volume 3, in which I will cover the following topics.

  • Flink

  • Apache Airflow

  • Apache Pinot

  • AWS Kinesis

Please provide feedback and suggestions if you want me to add any other topics.

Who this course is for:

  • Python Developers, Application Developers, Big Data Developers
  • Data Engineers, Data Scientists, Data Analysts
  • Database Administrators, Big Data Administrators
  • Data Engineering Aspirants
  • Solutions Architects, Cloud Architects, Big Data Architects
  • Technical Managers, Engineering Managers, Project Managers