Apache Spark with Scala By Example
3.8 (73 ratings)
1,121 students enrolled


Advance your Spark skills and become more valuable, confident, and productive
Created by Todd McGrath
Last updated 5/2016
  • 3 hours on-demand video
  • 6 Articles
  • 1 Supplemental Resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
Gain confidence and hands-on knowledge exploring, running and deploying Apache Spark
Access a wide variety of Spark with Scala, Spark SQL, Spark Streaming and Spark MLlib source code examples
Create hands-on Spark environments for experimenting with course examples
Participate in course discussion boards with instructor and other students
Know when and how Spark with Scala, Spark SQL, Spark Streaming and Spark MLlib may be an appropriate solution
Requirements
  • Prior programming or scripting experience in at least one programming language is preferred, but not required.
  • If you are training for a new career or looking to advance your career
  • You are curious how and when the Apache Spark ecosystem might be beneficial for your operations or product development efforts

Understanding how to manipulate, deploy and leverage Apache Spark is quickly becoming essential for data engineers, architects, and data scientists.  So, it's time for you to stay ahead of the crowd by learning Spark with Scala from an industry veteran and nice guy. 

This course is designed to give you the core principles needed to understand Apache Spark and build your confidence through hands-on experiences. 

In this course, you’ll be guided through a wide range of core Apache Spark concepts using Scala source code examples, all of which are designed to give you fundamental, working knowledge. Each section carefully builds upon previous sections, so your learning is reinforced every step of the way.

All of the source code is conveniently available for download, so you can run and modify it yourself.

Here are just a few of the concepts this course will teach you using more than 50 hands-on examples:

  • Learn the fundamentals and run examples of Spark's Resilient Distributed Datasets, Actions and Transformations through Scala
  • Run Spark on your local cluster and on Amazon EC2
  • Troubleshooting tricks for deploying Scala applications to Spark clusters
  • Explore Spark SQL with CSV, JSON and MySQL database (JDBC) data sources
  • Discover Spark Streaming through numerous examples and build a custom application which streams from Slack
  • Hands-on machine learning experiments with Spark MLlib
  • Reinforce your understanding through multiple quizzes and lecture recap


As an added bonus, this course will teach you about Scala and the Scala ecosystem such as SBT and SBT plugins to make packaging and deploying to Spark easier and more efficient.  

As another added bonus, on top of all the extensive course content, the course offers a private message board so you can ask the instructor questions at any time during your Spark learning journey.

This course will make you more knowledgeable about Apache Spark.  It offers you the chance to build your confidence, productivity and value in your Spark adventures. 

Who is the target audience?
  • People looking to expand their working knowledge of Apache Spark and Scala
  • A desire to learn more about the Spark ecosystem such as Spark SQL, Spark Streaming and Spark MLlib
  • Software developers wanting to expand their skills and abilities for future career growth. Spark with Scala is an in-demand skill set.
  • Anyone who suspects an on-demand Spark course with access to both source code and questions and answers with the instructor is probably more efficient than buying a Spark book or reading blog posts
Curriculum For This Course
45 Lectures 02:48:02
3 Lectures 03:54

Let's show and describe the structure of this Apache Spark with Scala course from a high level. 

  1. What Apache Spark topics will be covered? 
  2. Why is it structured this way? 
  3. What are the course activities and resources? 

After watching this video, you'll know how the sections of this course build on each other. As we progress through Spark Core and Spark SQL, you'll see why these early sections are relevant when learning Spark Streaming and Spark MLlib.

Preview 02:04

Download, review and run the source code.  Customize the source code and re-run.  The way to build confidence is through doing.  

Participate in the course discussion boards.  Through discussion and collaboration, you'll have the opportunity to teach others and ask questions.  This will strengthen your Spark with Scala skills.

A note for Windows users.

Where and how to download the course source code.

How to Succeed in this Course

Provides link to download all source code used in this Apache Spark with Scala course.

Course Source Code
Introducing the Apache Spark Fundamentals
3 Lectures 13:34

Before we jump into Spark with Scala examples, let's present a high-level overview of the key concepts you need to know. These fundamentals will be used throughout the rest of this Spark with Scala course.

Key constructs: Resilient Distributed Datasets (RDDs), Transformations, Actions, Spark Driver programs, SparkContext and how applications deployed to a Spark cluster utilize the parallel nature of Spark.

Preview 06:05

We're going to be running many examples in this next section. I don't expect you to follow every detail. Rather, I just want you to experience loading external data and running some simple examples of Spark Transformations and Actions.

Preview 01:08

To begin the course, let's run some Spark code with Scala from the shell.

I don't expect you to follow all the details of this code. I just want to get us motivated to continue our Spark learning adventure.

In this example, we'll get a glimpse into Spark core concepts such as Resilient Distributed Datasets, Transformations, Actions and Spark drivers from a Scala perspective. Again, I'll fill in all the details of this Scala code in later lectures.

Let's run some Apache Spark code!

Before moving to more advanced examples, we need to ensure the Apache Spark fundamentals are understood. This quiz will ensure the student is ready to proceed.

[Milestone] Quiz - Spark Core Fundamentals
3 questions
Preparing your Spark environment
4 Lectures 07:29

In this section of the Spark with Scala course, we'll set up and verify your Spark with Scala environment. With your own environment in place, you can choose to run the course examples and experiment with the Scala Spark API.

Preview 00:52

Walk through all the steps required to set up Apache Spark on your machine.

Download and Install Spark

We need sample data to run Scala examples in the Spark Console. This lecture will prepare the Apache Spark environment for loading data and confirm the Spark console.

[Milestone] Prepare Sample Data Source and Confirm Console

Reference links used in this section of the Spark with Scala course

Setup Resources
Deeper Dive into Spark Actions and Transformations
6 Lectures 24:46

There are two kinds of Spark RDD operations: Transformations and Actions. Transformations create a new RDD from an existing one and are evaluated lazily. Actions compute a value from an RDD and return it to the driver program.

In this section of the Apache Spark with Scala course, we'll go over a variety of Spark Transformation and Action functions.

This should build your confidence and understanding of how to apply these functions to your use cases. It also lays more of the foundation we will build on as you continue learning Apache Spark with Scala.
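A minimal sketch of the distinction, assuming a local SparkContext and the RDD API of the Spark 1.x releases the course targets (it is unchanged in 2.x). The object and method names are illustrative, not taken from the course source code: `doubled` is a transformation that returns a new RDD without running a job, while `total` is an action that runs the job and hands a plain `Int` back to the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TransformVsAction {

  // Transformation: returns a new RDD describing the work; no job runs yet
  def doubled(numbers: RDD[Int]): RDD[Int] = numbers.map(_ * 2)

  // Action: evaluates the RDD's lineage and returns a plain value to the driver
  def total(numbers: RDD[Int]): Int = numbers.reduce(_ + _)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TransformVsAction").setMaster("local[*]"))
    val twice = doubled(sc.parallelize(1 to 5)) // only a plan so far
    println(total(twice))                       // the job runs here
    sc.stop()
  }
}
```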

Preview 02:13

What are Spark Transformations? Let's review common Spark Transformation functions through Scala code examples.

We're going to break Apache Spark transformations into groups. In this video, we'll cover common Spark transformations that produce new RDDs, such as map, flatMap and filter.

We're going to use a CSV dataset of baby names in New York. As we progress through transformations and actions in this Apache Spark with Scala course, we'll determine more and more results for this sample data set.

So, let's begin with some commonly used Spark transformations.
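The sketch below shows filter, map and flatMap side by side. It assumes the baby names CSV has the header `Year,First Name,County,Sex,Count` (adjust the column index if your copy differs); the sample rows are hypothetical stand-ins, not data from the course.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TransformationsPart1 {

  // Hypothetical rows in the assumed layout: Year,First Name,County,Sex,Count
  val sampleLines = Seq(
    "Year,First Name,County,Sex,Count",
    "2013,GAVIN,ST LAWRENCE,M,9",
    "2013,LOGAN,NEW YORK,M,44",
    "2013,HAZEL,ERIE,F,8")

  // filter: keep only the elements that match a predicate (drop the header row)
  def dataRows(lines: RDD[String]): RDD[String] = lines.filter(line => !line.startsWith("Year,"))

  // map: exactly one output per input (the First Name column)
  def firstNames(rows: RDD[String]): RDD[String] = rows.map(row => row.split(",")(1))

  // flatMap: zero or more outputs per input (every field of every row)
  def allFields(rows: RDD[String]): RDD[String] = rows.flatMap(row => row.split(","))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TransformationsPart1").setMaster("local[*]"))
    // Pass the path to your copy of baby_names.csv, or fall back to the sample rows
    val lines = if (args.nonEmpty) sc.textFile(args(0)) else sc.parallelize(sampleLines)
    firstNames(dataRows(lines)).distinct().take(10).foreach(println)
    sc.stop()
  }
}
```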

Transformations Part 1

In part 2 of Spark Transformations, we'll discover Spark transformations used when we need to combine, compare and contrast the elements of two RDDs. This is something we often have to do when working with datasets. Spark supports these comparisons through transformation functions such as union, intersection and distinct.
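As a sketch of how these set-style transformations differ, the helper below applies each one to two RDDs and collects a sorted copy of the result. The name lists are hypothetical, and `subtract` is included as one more function from the same family.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TransformationsPart2 {

  // Apply each set-style transformation and collect a sorted copy of the result
  def compare(a: RDD[String], b: RDD[String]): Map[String, List[String]] = Map(
    "union"        -> a.union(b).collect().toList.sorted,        // all elements of both; duplicates kept
    "intersection" -> a.intersection(b).collect().toList.sorted, // elements in both; deduplicated
    "distinct"     -> a.distinct().collect().toList.sorted,      // a without duplicates
    "subtract"     -> a.subtract(b).collect().toList.sorted      // elements of a that are not in b
  )

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TransformationsPart2").setMaster("local[*]"))
    // Hypothetical name lists standing in for two slices of the baby names data
    val namesA = sc.parallelize(Seq("EMMA", "OLIVIA", "LIAM", "EMMA"))
    val namesB = sc.parallelize(Seq("LIAM", "NOAH"))
    compare(namesA, namesB).foreach { case (op, result) => println(s"$op: $result") }
    sc.stop()
  }
}
```

Note that `union` behaves like a concatenation, so duplicates survive; call `distinct()` afterwards when you need set semantics.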

Transformations Part 2

In part 3 of our focus on Spark Transformation functions, we're going to work with the "key" functions, including groupByKey, reduceByKey, aggregateByKey and sortByKey.

All these transformations work with key/value pair RDDs, so we will cover the creation of PairRDDs as well.

We'll continue to use the baby_names.csv file used in Part 1 and Part 2 of Spark Transformations
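A sketch of creating a pair RDD and applying the four "key" functions, again assuming the `Year,First Name,County,Sex,Count` layout and using hypothetical rows. No import is needed for the pair functions: since Spark 1.3 the implicit conversion to `PairRDDFunctions` lives in the RDD companion object.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TransformationsPart3 {

  // Create a pair RDD of (First Name, Count) from data rows in the assumed layout
  def nameCounts(rows: RDD[String]): RDD[(String, Int)] =
    rows.map(_.split(",")).map(fields => (fields(1), fields(4).trim.toInt))

  // reduceByKey: merge values per key, combining within each partition before the shuffle
  def totals(pairs: RDD[(String, Int)]): RDD[(String, Int)] = pairs.reduceByKey(_ + _)

  // groupByKey: gather every value per key (all values cross the network)
  def grouped(pairs: RDD[(String, Int)]): RDD[(String, Iterable[Int])] = pairs.groupByKey()

  // aggregateByKey: build a (sum, rowCount) accumulator per key, starting from a zero value
  def sumAndRows(pairs: RDD[(String, Int)]): RDD[(String, (Int, Int))] =
    pairs.aggregateByKey((0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1), // fold one value into a partition-local accumulator
      (x, y) => (x._1 + y._1, x._2 + y._2)) // merge accumulators from different partitions

  // sortByKey: order the totals alphabetically by name
  def sortedTotals(pairs: RDD[(String, Int)]): RDD[(String, Int)] = totals(pairs).sortByKey()

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TransformationsPart3").setMaster("local[*]"))
    // Hypothetical data rows (header already removed)
    val rows = sc.parallelize(Seq("2013,EMMA,KINGS,F,10", "2013,EMMA,QUEENS,F,5", "2013,LIAM,KINGS,M,7"))
    sortedTotals(nameCounts(rows)).collect().foreach(println)
    sc.stop()
  }
}
```

For simple sums, `reduceByKey` is usually preferred over `groupByKey` followed by a sum, because it shrinks the data before the shuffle.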

Transformations Part 3

Test and confirm your knowledge of Spark Transformations.

[Milestone] Transformation Quiz
3 questions

Run and review common Spark actions. You have already seen many Spark action examples before this lecture, so we will go quickly to review.

Spark Actions return values to the Spark Driver program. Also, recall that calling an Action on an RDD triggers evaluation of that previously lazy RDD. So, in the real world, when working with large datasets, we need to be careful about when we trigger RDD evaluation through Spark Actions.

This video shows commonly used Spark Actions.
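A sketch of a few common actions, with illustrative names and hypothetical data. Each call runs a job; `cache()` in `main` keeps the RDD in memory so the repeated actions do not recompute its lineage, and bounded actions such as `take` are safer than `collect` on large datasets.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ActionsSketch {

  case class Summary(count: Long, first: String, firstTwo: List[String], perName: Map[String, Long])

  // Every call below is an action: it runs a job and returns a value to the driver
  def summarize(names: RDD[String]): Summary = Summary(
    count = names.count(),
    first = names.first(),
    firstTwo = names.take(2).toList,      // bounded: only 2 elements reach the driver
    perName = names.countByValue().toMap) // one entry per distinct value reaches the driver

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ActionsSketch").setMaster("local[*]"))
    val names = sc.parallelize(Seq("EMMA", "LIAM", "EMMA"))
    names.cache() // several actions follow; avoid recomputing the lineage for each one
    println(summarize(names))
    // names.collect() would copy the entire RDD into driver memory: avoid on large datasets
    sc.stop()
  }
}
```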


Test and confirm knowledge of Spark Actions.

[Milestone] Actions Quiz
2 questions

Links to conveniently download the Spark source code examples presented in this section of the course. Links to the latest programming guides for Spark Transformations and Actions are also included.

Transformations and Actions Source Code and Programming Guides
Utilizing Clusters with Apache Spark
7 Lectures 26:31

Clusters allow Spark to process huge volumes of data by distributing the workload across multiple nodes. This is also referred to as "running in parallel" or "horizontal scaling".

A cluster manager is required to run Spark on a cluster. Spark supports three types of cluster managers: Apache Hadoop YARN, Apache Mesos and Standalone, a simple cluster manager distributed with Spark.

Preview 03:37

Let's run a Spark Standalone cluster within your environment. We'll start a Spark Master and one Spark worker. We'll introduce the Spark UI web console.
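In application code, the cluster manager is selected by the master URL. A sketch assuming Spark 1.6-era `SparkConf`; the host names are hypothetical, while 7077 and 5050 are the default Standalone and Mesos master ports.

```scala
import org.apache.spark.SparkConf

object MasterUrls {
  // The master URL selects the cluster manager. Host names here are hypothetical.
  val local      = "local[*]"                // no cluster manager: one JVM, one thread per core
  val standalone = "spark://localhost:7077"  // Standalone master started with sbin/start-master.sh
  val mesos      = "mesos://mesos-host:5050" // Mesos master (5050 is the Mesos default port)
  // YARN is selected with "yarn" plus --deploy-mode (older 1.x releases also used
  // "yarn-client"/"yarn-cluster"); the ResourceManager address comes from the Hadoop config.
  // A Standalone worker joins with sbin/start-slave.sh <master-url> (later releases
  // rename the script start-worker.sh).

  def confFor(master: String, appName: String): SparkConf =
    new SparkConf().setAppName(appName).setMaster(master)

  def main(args: Array[String]): Unit = {
    // e.g. pass spark://<your-host>:7077, as shown at the top of the Spark UI web console
    val conf = confFor(args.headOption.getOrElse(standalone), "MasterUrls")
    println(conf.get("spark.master"))
  }
}
```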

Run Standalone Cluster

Set up, compile and package a Scala Spark program using `sbt`. `sbt` is short for "simple build tool" and is most often used in Scala-based projects.

This is an easy example to ensure you're ready for more advanced builds and cluster deploys later in this Apache Spark with Scala course.
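A minimal deployable driver, sketched under assumptions: the build settings in the header comment (Spark 1.6.1, Scala 2.10.6, project name `spark-sample`) are illustrative versions from this era, not the course's exact `build.sbt`. Spark is marked `provided` because the cluster already supplies it, and the code does not call `setMaster`, leaving that choice to `spark-submit`.

```scala
// build.sbt (assumed versions for a Spark 1.6 / Scala 2.10 setup):
//   name := "spark-sample"
//   version := "1.0"
//   scalaVersion := "2.10.6"
//   libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
//
// Build and submit (master host and input path are placeholders):
//   sbt package
//   $SPARK_HOME/bin/spark-submit --class SparkSample --master spark://localhost:7077 \
//     target/scala-2.10/spark-sample_2.10-1.0.jar /path/to/baby_names.csv
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SparkSample {

  // Count the lines that mention a given word
  def countMatching(lines: RDD[String], word: String): Long = lines.filter(_.contains(word)).count()

  def main(args: Array[String]): Unit = {
    // No setMaster: spark-submit's --master flag decides where the driver connects
    val sc = new SparkContext(new SparkConf().setAppName("SparkSample"))
    val word = if (args.length > 1) args(1) else "EMMA"
    println(s"Lines mentioning $word: " + countMatching(sc.textFile(args(0)), word))
    sc.stop()
  }
}
```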

[Milestone] Deploy a Scala Program to a Cluster

Let's configure an Apache Spark cluster running on two instances of Amazon EC2.

Create an Amazon EC2 Based Cluster Part 1

Before the EC2 cluster is ready to use from a locally running shell, we need to open port 7077.

Create an Amazon EC2 Based Cluster Part 2

Review key takeaways from this section on Spark running in a cluster and deploying a Scala based Spark program to the cluster.

[Milestone] Cluster Section Recap

To reinforce the key takeaways from the Cluster section of the course

Cluster Section Quiz
4 questions

Convenient link to download all source code used in this section

Cluster Section Resources
Spark SQL
6 Lectures 31:42

Spark SQL background, key concepts and high-level examples of CSV, JSON and MySQL (JDBC) data sources. This lecture lays the groundwork for the next lectures in this section. It provides overview examples and common patterns of Spark SQL from a Scala perspective.

Preview 03:20

Spark SQL uses DataFrames, which are built on Resilient Distributed Datasets and are composed of Row objects accompanied by a schema. The schema describes the data type of each column. A DataFrame may be considered similar to a table in a traditional relational database.


We’re going to use the baby names dataset and the spark-csv package available from Spark Packages to make our lives easier. The spark-csv package is described as a “library for parsing and querying CSV data with Apache Spark, for Spark SQL and DataFrames”. This library is compatible with Spark 1.3 and above.
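A sketch, assuming the Spark 1.6 `SQLContext` API and spark-csv 1.4.0 (the package coordinates in the comment are illustrative). `fromRows` shows the "Rows plus schema" idea directly, `loadCsv` reads the file through spark-csv, and `totalsByName` registers a temporary table and queries it with SQL. The column layout and the renaming of `First Name` to `FirstName` are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SparkSqlCsv {

  // Schema for the assumed layout; "First Name" becomes FirstName to avoid the space
  val schema = StructType(Seq(
    StructField("Year", IntegerType, nullable = true),
    StructField("FirstName", StringType, nullable = true),
    StructField("County", StringType, nullable = true),
    StructField("Sex", StringType, nullable = true),
    StructField("Count", IntegerType, nullable = true)))

  // A DataFrame is Rows plus a schema: build one explicitly from in-memory rows
  def fromRows(sqlContext: SQLContext, rows: Seq[Row]): DataFrame =
    sqlContext.createDataFrame(sqlContext.sparkContext.parallelize(rows), schema)

  // Requires spark-csv, e.g. spark-shell --packages com.databricks:spark-csv_2.10:1.4.0
  def loadCsv(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // scan the data to guess column types
      .load(path)
      .withColumnRenamed("First Name", "FirstName")

  // Register the DataFrame as a table and query it with SQL (Count is backquoted)
  def totalsByName(sqlContext: SQLContext, names: DataFrame): DataFrame = {
    names.registerTempTable("baby_names")
    sqlContext.sql(
      "SELECT FirstName, SUM(`Count`) AS total FROM baby_names GROUP BY FirstName ORDER BY FirstName")
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkSqlCsv").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    totalsByName(sqlContext, loadCsv(sqlContext, args(0))).show()
    sc.stop()
  }
}
```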

Spark SQL with CSV source

Let's load a JSON input source into Spark SQL’s SQLContext. This Spark SQL JSON with Scala portion of the course has two parts. The first part shows examples of JSON input sources with a specific structure. The second part warns you of something you might not expect when using Spark SQL with a JSON data source.


We are going to use two JSON inputs. We’ll start with a simple, trivial example and then move to an analysis of a more realistic JSON example.
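A sketch of loading JSON with the Spark 1.6 `SQLContext` API. The records and field names are hypothetical. The comment notes one well-known behavior of the default JSON reader; it is offered as general background and is not necessarily the surprise the lecture covers.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

object SparkSqlJson {

  // Spark infers the schema by scanning the records. By default each input line must hold one
  // complete JSON object; a pretty-printed object spread over several lines is not parsed as one record.
  def fromJsonLines(sqlContext: SQLContext, lines: RDD[String]): DataFrame = sqlContext.read.json(lines)

  def fromFile(sqlContext: SQLContext, path: String): DataFrame = sqlContext.read.json(path)

  // Names whose total exceeds a threshold, using the DataFrame API
  def namesAbove(df: DataFrame, threshold: Long): List[String] =
    df.where(df("total") > threshold).select("name").collect().map(_.getString(0)).toList

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkSqlJson").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    // Hypothetical records; use fromFile(sqlContext, path) for a real file
    val df = fromJsonLines(sqlContext, sc.parallelize(Seq(
      """{"name": "EMMA", "total": 15}""",
      """{"name": "LIAM", "total": 7}""")))
    df.printSchema() // inferred: name as string, total as long
    println(namesAbove(df, 8L))
    sc.stop()
  }
}
```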

Spark SQL with JSON source

Now that we have Spark SQL experience with CSV and JSON, connecting to and using a MySQL database will be easy. So, let’s cover how to use Spark SQL with Scala and a MySQL database input data source.


We’re going to load data into a database. Then, we’re going to fire up spark-shell with a command line argument to specify the JDBC driver needed to connect to the JDBC data source. We’ll make sure we can authenticate and then start running some queries.
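A sketch of reading a MySQL table through Spark SQL's `jdbc` data source, assuming the Spark 1.6 `DataFrameReader` API and MySQL Connector/J 5.x. The host, database, table, credentials and jar version are all hypothetical placeholders.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Start the shell with the MySQL JDBC driver on the classpath, e.g. (jar version illustrative):
//   spark-shell --driver-class-path mysql-connector-java-5.1.38-bin.jar \
//               --jars mysql-connector-java-5.1.38-bin.jar
object SparkSqlJdbc {

  // Connection options for Spark's "jdbc" data source; all values passed in are hypothetical
  def mysqlOptions(host: String, database: String, table: String,
                   user: String, password: String): Map[String, String] = Map(
    "url"      -> s"jdbc:mysql://$host:3306/$database", // 3306 is MySQL's default port
    "driver"   -> "com.mysql.jdbc.Driver",              // driver class in Connector/J 5.x
    "dbtable"  -> table,
    "user"     -> user,
    "password" -> password)

  // load() connects once to read the table's schema; rows are fetched only when an action runs
  def load(sqlContext: SQLContext, options: Map[String, String]): DataFrame =
    sqlContext.read.format("jdbc").options(options).load()
}

// In spark-shell (sqlContext is predefined there):
//   val df = SparkSqlJdbc.load(sqlContext,
//     SparkSqlJdbc.mysqlOptions("localhost", "babynames", "names", "user", "secret"))
//   df.registerTempTable("names"); sqlContext.sql("SELECT COUNT(*) FROM names").show()
```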

Spark SQL with MySQL (JDBC) source

Earlier in the course, we performed a simple deploy to an Apache Spark Cluster.  Let's build upon the simple example and deploy our Spark SQL code examples.    

Deploying the Spark SQL examples introduces a new challenge.  How do we deploy when our application uses 3rd party libraries such as CSV parsing and JDBC drivers?
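One common answer, sketched here as an assumption rather than the lecture's exact build, is to keep Spark itself out of the application jar and bundle only the third-party libraries with the sbt-assembly plugin. The plugin and library versions below are illustrative 2016-era choices.

```scala
// project/plugins.sbt  (sbt 0.13-era plugin version, assumed)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt
name := "spark-sql-examples"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // Already on every Spark node: mark "provided" so they stay out of the fat jar
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.1" % "provided",
  // Not shipped with Spark: bundled into the jar built by `sbt assembly`
  "com.databricks" %% "spark-csv" % "1.4.0",
  "mysql" % "mysql-connector-java" % "5.1.38"
)
```

Running `sbt assembly` then writes a single jar, whose name ends in `-assembly-<version>.jar`, under `target/scala-2.10/`; that jar is what gets passed to `spark-submit` instead of the `sbt package` output.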

[Milestone] Spark SQL Deploying to a Spark Cluster

Links to download Spark SQL code examples and videos on setting up MySQL

Spark SQL Section Resources
1 page
Spark Streaming
9 Lectures 34:27

Spark Streaming introduction, key concepts and our approach to learning Apache Spark Streaming through examples and building our own streaming application.

Preview 03:08

Present an overview of the lessons contained in this Spark Streaming section. Some of you may be able to skip the first two examples and move to a more complex Spark Streaming custom application.

Spark Streaming Overview

To ensure your environment is ready for more complex Spark Streaming examples, let's run through a trivial example. This is a word count example which streams from the netcat utility found on Linux and Mac. Windows users can check https://nmap.org/ncat/, which may be used to run this example.
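A sketch of a netcat word count using the Spark Streaming `socketTextStream` API; the port, batch interval and object name are assumptions. The per-batch logic is kept as a plain RDD function and applied with `transform`, which makes it reusable and testable without a live stream.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetcatWordCount {

  // Per-batch logic as a plain RDD function
  def countWords(lines: RDD[String]): RDD[(String, Int)] =
    lines.flatMap(_.split("\\s+")).filter(_.nonEmpty).map(word => (word, 1)).reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // First start a server to type into:  nc -lk 9999   (Windows, with ncat: ncat -lk 9999)
    // local[2]: one thread runs the socket receiver, the other processes the batches
    val conf = new SparkConf().setAppName("NetcatWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5)) // one micro-batch every 5 seconds
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.transform(rdd => countWords(rdd)).print()  // applies countWords to each batch's RDD
    ssc.start()
    ssc.awaitTermination()
  }
}
```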

Spark Streaming Example Part 1

Let's continue to take one step at a time as we learn Spark Streaming. In this example, we will build and deploy a Spark Streaming application to a Spark cluster.

Spark Streaming Example Part 2

This video demonstrates our custom Spark Streaming application and how you can configure Slack to stream your own channel content.  

I think it's important to show you a running example of a Spark Streaming application.

Spark Streaming Application - Streaming from Slack

Spark Streaming example code review. This answers two questions: how do I write my own custom receiver, and how did the Slack Spark Streaming example work?
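The Slack-specific code is not reproduced in this description, so here is a generic sketch of Spark Streaming's `Receiver` API (`onStart`, `onStop`, `store`, `restart`, `isStopped`). The class name `PollingReceiver` and its `fetchBatch` parameter are hypothetical stand-ins for a real API client such as the course's Slack integration.

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// fetchBatch is a hypothetical stand-in for a call to an external API; it must be serializable
// because Spark ships the receiver object to an executor.
class PollingReceiver(fetchBatch: () => Seq[String], pollMillis: Long)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  // onStart must not block, so the polling loop gets its own thread
  override def onStart(): Unit = {
    new Thread("polling-receiver") {
      override def run(): Unit = poll()
    }.start()
  }

  // Nothing to close here: the loop sees isStopped() and exits on its own
  override def onStop(): Unit = ()

  private def poll(): Unit = {
    try {
      while (!isStopped()) {
        fetchBatch().foreach(record => store(record)) // hand each record to Spark
        Thread.sleep(pollMillis)
      }
    } catch {
      case e: Exception => restart("Error polling source, restarting", e)
    }
  }
}

// Usage in a driver: ssc.receiverStream(new PollingReceiver(() => Seq("hello"), 1000L)).print()
```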

Spark Streaming Custom Example Code Review

Our Spark Streaming with Slack program contains 3rd party libraries.  As we've seen previously in the course, we can use the sbt-assembly plugin to make "fat jars" for Spark Driver programs using 3rd party libraries.

But, what happens when things do not deploy according to plan?  

In this video, we'll cover three advanced issues when deploying to a Spark Cluster and how to address them.

1) What happens if your Spark Driver program is compiled for Scala 2.11, but you are deploying to a Spark build compiled for Scala 2.10?

2) What happens if your 3rd party library conflicts with your Spark Cluster?

3) What should you do if your Spark Cluster uses a jar which is older than, and incompatible with, a jar needed by your driver program?
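The lecture's exact fixes are not spelled out here. The fragment below is one common set of sbt-assembly and Spark settings aimed at each of the three issues, offered as an assumption. It requires sbt-assembly 0.14.x for `ShadeRule`, and the shaded package name is illustrative.

```scala
// build.sbt additions, assuming sbt-assembly 0.14.x (package names below are illustrative)

// 1) Match the cluster's Scala binary version: Spark 1.6 downloads are built for Scala 2.10
//    unless you built Spark for 2.11, and %% picks artifacts for the version set here
scalaVersion := "2.10.6"

// 2) Rename a third-party package inside the fat jar so it cannot collide with the
//    version already on the cluster's classpath (Guava is a common example)
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shadedguava.@1").inAll
)

// 3) Or, at submit time, ask Spark to load the application's jars before its own
//    (both settings are marked experimental in the Spark configuration docs):
//    spark-submit --conf spark.driver.userClassPathFirst=true \
//                 --conf spark.executor.userClassPathFirst=true ...
```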

[Advanced] Spark Streaming Deploy to Cluster Introduction


[Milestone] Advanced Spark Deploy Troubleshooting and Tactics

A list of resources used in this Spark Streaming section of the Apache Spark course tutorials 

Spark Streaming Resources
Spark Machine Learning
5 Lectures 23:00

Machine Learning is an exciting and growing topic of interest these days.  Let's start this section on Spark MLlib with a background on Machine Learning.  

Afterwards, we'll have a foundation of machine learning concepts when we run demos and review source code in later videos in this Spark MLlib section.

Preview 05:05

In this video, let's run a demo of a custom Spark MLlib based program so we have some context when reviewing the source code later in the course.

In this demo, we'll train our machine learning model.  Then, we'll use the trained model to make predictions on an incoming data stream.  

That's right, we're going to make machine learning based predictions on data arriving from a Spark Streaming source.  

Should be fun :)
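The description does not say which algorithm the demo uses. As an assumption, this sketch uses MLlib's `LogisticRegressionWithLBFGS` to show the two-step pattern: train on a batch RDD of `LabeledPoint`s, then apply the trained model to every vector arriving on a stream. The training data and the `[x]` input format read by `Vectors.parse` are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object TrainThenPredict {

  // 1) Train a binary classifier on historical, labeled examples
  def train(examples: RDD[LabeledPoint]): LogisticRegressionModel =
    new LogisticRegressionWithLBFGS().setNumClasses(2).run(examples)

  // 2) Apply the trained model to every feature vector in each micro-batch
  def predictStream(model: LogisticRegressionModel, features: DStream[Vector]): DStream[Double] =
    features.map(v => model.predict(v))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TrainThenPredict").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    // Hypothetical training data: one feature; label 1.0 when the feature is positive
    val training = ssc.sparkContext.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(-2.0)), LabeledPoint(0.0, Vectors.dense(-1.0)),
      LabeledPoint(1.0, Vectors.dense(1.0)), LabeledPoint(1.0, Vectors.dense(2.0))))
    val model = train(training)
    // Each incoming line is a vector such as [1.5], e.g. typed into `nc -lk 9999`
    val features = ssc.socketTextStream("localhost", 9999).map(line => Vectors.parse(line))
    predictStream(model, features).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```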

Machine Learning Demonstration - Running our Custom Machine Learning Code

Review of Spark MLlib based source code from the demo of the near real-time machine learning prediction model.  The model used a Spark Streaming data source which will also be analyzed.  

The code has tons of comments in it to help.  Also, the source code is available for students to download from the course repository.

[Milestone] Source Code Review of Custom Spark MLlib Example Application

Up to now, we've seen a machine learning demo of near real-time prediction of stream data and we've reviewed the custom demo code.   

So, now let's cover aspects of machine learning specific to Spark MLlib.

Spark MLlib Overview

A suggested list of free resources for machine learning and Spark MLlib.

Spark Machine Learning (MLlib) Resources
Conclusion and Suggested Next Steps
2 Lectures 01:52

Conclusion of version 2 of the Apache Spark with Scala course.  We review the content of version 2 of this course, suggested next steps and ask for ideas for version 3 of the Apache Spark with Scala course.

Version 1 major release:  End of January 2015

- Spark Core and Clustering

Version 1.1, 1.2, 1.3 minor releases: February, March 2016

- Section introductions

- Add more resources to each section

- Spark SQL section

Version 2 major release: May 2016

- Spark Streaming

- Spark machine learning with Spark MLlib

Conclusion v2

Bonus lecture with access to free Spark learning resources, course coupons, tutorials and free software development, data engineering and data science books.

Bonus Lecture: Free Resources, Coupons and More
About the Instructor
Todd McGrath
4.0 Average rating
139 Reviews
1,585 Students
2 Courses
Data Engineer, Software Developer, Mentor

Todd has an extensive and proven track record in software development leadership and building solutions for the world's largest brands and Silicon Valley startups.

His courses are taught using the same skills used in his consulting and mentoring projects. Todd believes the only way to gain confidence and become productive is to be hands-on through examples. Each new subject should build upon previous examples or presentations, so each step is also a way to re-emphasize a prior topic.

To learn more about Todd, visit his LinkedIn profile.