Apache Spark for Java Developers

Get processing Big Data using RDDs, DataFrames, SparkSQL and Machine Learning - and real time streaming with Kafka!
Bestseller
4.6 (1,113 ratings)
6,456 students enrolled
Last updated 10/2019
English
English [Auto-generated]
30-Day Money-Back Guarantee
This course includes
  • 21.5 hours on-demand video
  • 7 articles
  • 9 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Use functional style Java to define complex data processing jobs
  • Learn the differences between the RDD and DataFrame APIs
  • Use an SQL style syntax to produce reports against Big Data sets
  • Use Machine Learning Algorithms with Big Data and SparkML
  • Connect Spark to Apache Kafka to process Streams of Big Data
  • See how Structured Streaming can be used to build pipelines with Kafka
Requirements
  • Java 8 is required for the course. Spark does not currently support Java 9+, and you need Java 8 for the functional lambda syntax
  • Previous knowledge of Java is assumed, but anything above the basics is explained
  • Some previous SQL experience will be useful for part of the course, but if you've never used it before, this will be a good first experience
Description

Get started with the amazing Apache Spark parallel computing framework - this course is designed especially for Java Developers.

If you're new to Data Science and want to find out how massive datasets are processed in parallel, then the Java API for Spark is a great way to get started, fast.

All of the fundamentals you need to understand the main operations you can perform in Spark Core, SparkSQL and DataFrames are covered in detail, with easy-to-follow examples. You'll be able to follow along with all of the examples and run them on your own local development computer.
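
To give a flavour of what that looks like, here's a minimal sketch of the kind of local Spark Core and SparkSQL job covered in the course. It assumes a Spark 2.x dependency on the classpath, and the class name, file paths and CSV layout are purely illustrative.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkBasicsSketch {
    public static void main(String[] args) {
        // Run against the local machine's cores, as in the course examples
        SparkSession spark = SparkSession.builder()
                .appName("sparkBasics")
                .master("local[*]")
                .getOrCreate();

        JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

        // RDD API: load a text file (hypothetical path), transform and count it
        JavaRDD<String> lines = sc.textFile("src/main/resources/input.txt");
        long nonEmpty = lines.map(String::trim)
                             .filter(line -> !line.isEmpty())
                             .count();
        System.out.println("Non-empty lines: " + nonEmpty);

        // DataFrame / SparkSQL API: a CSV file (hypothetical) queried with SQL-style syntax
        Dataset<Row> df = spark.read().option("header", "true")
                .csv("src/main/resources/data.csv");
        df.createOrReplaceTempView("records");
        spark.sql("select count(*) as total from records").show();

        spark.close();
    }
}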

Included with the course is a module covering SparkML, an exciting addition to Spark that allows you to apply Machine Learning models to your Big Data! No mathematical experience is necessary!
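
As a taster of the SparkML style, here's a hedged sketch that fits a linear regression against a tiny in-memory DataFrame. It assumes the Spark ML (spark-mllib) artifact is available, and the column names and sample values are invented for illustration.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.ml.regression.LinearRegression;
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class SparkMlSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("sparkMl").master("local[*]").getOrCreate();

        // Tiny in-memory dataset standing in for real Big Data (made-up values)
        StructType schema = new StructType()
                .add("hours", DataTypes.DoubleType)
                .add("score", DataTypes.DoubleType);
        List<Row> rows = Arrays.asList(
                RowFactory.create(1.0, 20.0),
                RowFactory.create(2.0, 38.0),
                RowFactory.create(3.0, 61.0));
        Dataset<Row> data = spark.createDataFrame(rows, schema);

        // SparkML expects the inputs gathered into a single "features" vector column
        Dataset<Row> prepared = new VectorAssembler()
                .setInputCols(new String[] {"hours"})
                .setOutputCol("features")
                .transform(data);

        // Fit a simple regression model - no maths needed to use the API
        LinearRegressionModel model = new LinearRegression()
                .setLabelCol("score")
                .fit(prepared);
        System.out.println("Coefficients: " + model.coefficients());

        spark.close();
    }
}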

And finally, there's a full 3-hour module covering Spark Streaming, where you'll get hands-on experience of integrating Spark with Apache Kafka to handle real-time big data streams. We use both the DStream and the Structured Streaming APIs.
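
To give an idea of the Structured Streaming side, here's a minimal sketch that subscribes to a Kafka topic and keeps a running count of messages per key. The broker address and topic name are placeholders, and it assumes the spark-sql-kafka integration artifact is on the classpath.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaStreamingSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafkaStreaming").master("local[*]").getOrCreate();

        // Subscribe to a Kafka topic as an unbounded DataFrame (placeholder broker/topic)
        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "viewrecords")
                .load();

        // Treat the raw Kafka key/value as strings and count messages per key
        Dataset<Row> counts = stream
                .selectExpr("cast(key as string)", "cast(value as string)")
                .groupBy("key")
                .count();

        // Write the running counts to the console until the query is stopped
        counts.writeStream()
                .format("console")
                .outputMode("complete")
                .start()
                .awaitTermination();
    }
}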


Optionally, if you have an AWS account, you'll see how to deploy your work to a live EMR (Elastic Map Reduce) hardware cluster. If you're not familiar with AWS you can skip this video, but it's still worthwhile to watch it rather than follow along with the coding.

You'll be going deep into the internals of Spark and you'll find out how it optimizes your execution plans. We'll be comparing the performance of RDDs vs SparkSQL, and you'll learn about the major performance pitfalls; avoiding them could save a lot of money on live projects.
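
One of the tools used for that kind of investigation is Spark's plan output. Here's a rough, hedged illustration of printing the plans the Catalyst optimizer chooses; the file path and column name are hypothetical.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ExecutionPlanSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("plans").master("local[*]").getOrCreate();

        // Hypothetical CSV file with a "level" column
        Dataset<Row> logs = spark.read().option("header", "true")
                .csv("src/main/resources/logs.csv");
        logs.createOrReplaceTempView("logs");

        // explain(true) prints the parsed, analysed, optimised and physical plans
        // that Spark actually intends to run for this query
        Dataset<Row> report = spark.sql("select level, count(1) from logs group by level");
        report.explain(true);

        spark.close();
    }
}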

Throughout the course, you'll get plenty of practice with Java 8 lambdas - a great way to learn functional-style Java if you're new to it.
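
For instance, here's the same RDD transformation written first as a pre-Java-8 anonymous inner class and then as a Java 8 lambda, the style used throughout the course. The class and method names are just for illustration.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

public class LambdaStyleSketch {
    static JavaRDD<Integer> lengthsOldStyle(JavaRDD<String> lines) {
        // Verbose pre-Java-8 anonymous inner class
        return lines.map(new Function<String, Integer>() {
            @Override
            public Integer call(String line) {
                return line.length();
            }
        });
    }

    static JavaRDD<Integer> lengthsLambdaStyle(JavaRDD<String> lines) {
        // The same transformation as a Java 8 lambda
        return lines.map(line -> line.length());
    }
}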

NOTE: Java 8 is required for the course. Spark does not currently support Java 9+ (we will update when this changes) and Java 8 is required for the lambda syntax.


Who this course is for:
  • Anyone who already knows Java and would like to explore Apache Spark
  • Anyone new to Data Science who wants a fast way to get started, without learning Python, Scala or R!
Course content
143 lectures • 21:42:39 total length
+ Getting Started
2 lectures 21:41
Warning - Java 9/10/11 is not supported by Spark
00:39
+ Mapping and Outputting
4 lectures 23:55
Mapping Operations
07:00
Outputting Results to the Console
04:45
Counting Big Data Items
06:17
If you've had a "NotSerializableException" in Spark
05:53
+ Tuples
2 lectures 18:06
RDDs of Objects
08:05
Tuples and RDDs
10:01
+ PairRDDs
5 lectures 41:19
Overview of PairRDDs
08:46
Building a PairRDD
09:11
Coding a ReduceByKey
11:29
Using the Fluent API
06:45
Grouping By Key
05:08
+ Keyword Ranking Practical
3 lectures 41:25
Practical Requirements
11:35
Worked Solution
15:15
Worked Solution (continued) with Sorting
14:35
+ Sorts and Coalesce
3 lectures 29:33
Why do sorts not work with foreach in Spark?
10:31
Why Coalesce is the Wrong Solution
14:18
What is Coalesce used for in Spark?
04:44