Advanced Apache Spark for Data Scientists and Developers

3.2 (23 ratings)
181 students enrolled
$50
  • Lectures: 71
  • Contents: Video 2.5 hours, Other 3 hours
  • Skill Level: Intermediate
  • Languages: English
  • Includes: Lifetime access, 30-day money-back guarantee, availability on iOS and Android, Certificate of Completion

About This Course

Published 12/2015 English

Course Description

Apache Spark is an open-source data processing engine designed for fast processing of large datasets and high performance across a wide range of analytics applications. Unlike MapReduce, Spark supports in-memory cluster computing, which greatly improves the speed of iterative algorithms and interactive data mining tasks.

Adastra Academy’s Advanced Apache Spark includes illuminating video lectures, thorough application examples, a guide to installing the NetBeans Integrated Development Environment, and quizzes. Through this course, you will learn about Spark’s four built-in libraries: Spark Streaming, DataFrames (Spark SQL), MLlib, and GraphX. You will also learn how to develop, build, tune, and debug Spark applications. The course exercises will help you become proficient at creating fully functional, real-world applications using the Apache Spark libraries. Unlike other courses, this one takes the guided, ground-up approach to learning Spark that you need in order to become an expert.

What are the requirements?

  • Completion of an introductory Apache Spark course. Adastra Academy's Introduction to Apache Spark for Developers and Engineers is recommended.
  • A beginner to intermediate understanding of the Scala programming language. Adastra Academy's Scala in Practice is recommended.
  • A basic understanding of Apache Hadoop and Big Data.

What am I going to get from this course?

  • Understand the functionality of Spark's four built-in libraries
  • Create real-world applications using Spark’s libraries
  • Understand how to develop, debug and optimize the performance of Spark applications

What is the target audience?

  • Data Scientists
  • Developers
  • Data Engineers

Curriculum

Section 1: Introduction to Advanced Apache Spark
  • Introduction to Apache Spark (04:19, preview)
  • Spark Installation (16 pages)
  • Spark Installation Quiz (1 question)
  • IDE Installation (14 pages)
  • IDE Installation Quiz (1 question)
Section 2: Tuning and Debugging
  • Introduction and Topics (00:32, preview)
  • Spark Configuration with SparkConf (02:48, preview)
  • Web User-Interface and Log Files (03:10, preview)
  • Data Serialization (01:56)
  • Memory Tuning (03:36)
  • Level of Parallelism (02:20)
  • Section Topics (00:33)
Section 3: Spark Streaming
  • Introduction and Topics (00:41)
  • Overview of Spark Streaming (01:17)
  • Linking Input Sources (00:52)
  • Streaming Context (01:15)
  • Discretized Streams (DStreams) (00:47)
  • Input DStreams (02:29)
  • Hands-on Exercise 1: Spark Streaming (11 pages)
  • Stateless Transformations on DStreams (03:51)
  • Stateful Transformations (03:30)
  • Hands-on Exercise 2: Spark Streaming (6 pages)
  • Output Operations (01:54)
  • Hands-on Exercise 3: Spark Streaming (7 pages)
  • Checkpointing (00:46)
  • Caching and Persisting (00:44)
  • Tuning and Debugging (02:28)
  • Section Topics (00:32)
Section 4: Spark SQL
  • Introduction to Spark SQL (00:59)
  • Spark SQL Overview (06:48)
  • The Spark Shell hands-on (2 pages)
  • Hands-on Exercise 1: part a) Import CSV (30 pages)
  • Schema Inference (06:25)
  • Data Query Select (05:19)
  • Data Query Select (1 question)
  • DataFrame.Reader DataFrame.Writer (08:11)
  • Hands-on Exercise 1: part b) Import JSON (18 pages)
  • Data Query INNER JOINs (06:40)
  • Data Query INNER JOINs (2 questions)
  • Group By, Order By, Window Functions (05:41)
  • Group By, Order By, Window Functions (2 questions)
  • Data Query OUTER JOINs, SEMI JOIN (09:50)
  • Data Query OUTER JOINs, SEMI JOIN (1 question)
  • Custom UDF (User Defined Function) (04:41)
  • Custom UDF (User Defined Function) (1 question)
  • API or SQL? (03:43)
  • Hands-on Exercise 2: Spark SQL (18 pages)
Section 5: Spark MLlib
  • Introduction and Topics (00:41)
  • Machine Learning (01:17)
  • MLlib (02:32)
  • Basic Statistics (01:00)
  • Optimization (01:49)
  • Classification (06:20)
  • Hands-on Exercise 1: Spark MLlib: Classification (12 pages)
  • Validation (01:07)
  • Regression (02:18)
  • Clustering (03:51)
  • Hands-on Exercise 2: Spark MLlib: Clustering (12 pages)
  • Feature Extraction and Transformation (01:00)
  • Dimensionality Reduction (05:23)
  • Collaborative Filtering (00:55)
  • Evaluation Metrics (03:37)
Section 6: Spark GraphX
  • Introduction to Spark GraphX (07:18)
  • Graph creation examples (2 pages)
  • Graph Operators Overview, Information about a Graph (03:18)
  • Information about a graph example (1 page)
  • Transform Graph Items (02:35)
  • Transform graph items examples (1 page)
  • Modify Graph Structure (01:24)
  • Modify graph structure example (1 page)
  • Graph Neighborhood Aggregations (02:30)
  • Neighborhood Aggregations Examples (2 pages)
  • Graph Algorithms (02:36)
  • Triangle Count Example (1 page)
  • Pregel - Graph Parallel Computation (02:11)
  • Pregel Example (1 page)
  • Optimized Graph Representation (03:00)
  • Hands-on Exercise: Spark GraphX (23 pages)

Instructor Biography

Adastra Academy, Emerging Data Management and Analytics Technology Educators

We're focused on the tools and technologies that matter most for today and tomorrow.

Adastra Academy is a leading source of training and development for Information Management professionals and individuals interested in Data Management and Analytics technology. Our dedication to identifying and mastering emerging technologies ensures that our students are the first to gain access to critical skills. Our programs consist of hands-on labs and real-world examples, allowing students to easily apply their new knowledge.

As a division of Adastra Corporation, we leverage twenty years of world-class Information Management knowledge, experience, services and solutions to fuel the Academy and to advance Information Management professionals everywhere.