Advanced Apache Spark for Data Scientists and Developers
3.5 (54 ratings)
484 students enrolled

Created by Adastra Academy
Last updated 1/2016
English
English [Auto]
This course includes
  • 2.5 hours on-demand video
  • 29 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Understand the functionality of Spark's four built-in libraries
  • Create real-world applications using Spark’s libraries
  • Understand how to develop, debug and optimize the performance of Spark applications
Course content
71 lectures 05:33:19
+ Introduction to Advanced Apache Spark
3 lectures 04:19
Spark Installation
16 pages
Spark Installation Quiz
1 question
IDE Installation
14 pages
IDE Installation Quiz
1 question
+ Spark Streaming
16 lectures 21:06
Introduction and Topics
00:41
Overview of Spark Streaming
01:17
Linking Input Sources
00:52
Streaming Context
01:15
Discretized Streams (DStreams)
00:47
Input DStreams
02:29
Hands-on Exercise 1: Spark Streaming
11 pages
Stateless Transformations on DStreams
03:51
Stateful Transformations
03:30
Hands-on Exercise 2: Spark Streaming
6 pages
Output Operations
01:54
Hands-on Exercise 3: Spark Streaming
7 pages
Checkpointing
00:46
Caching and Persisting
00:44
Tuning and Debugging
02:28
Section Topics
00:32
+ Spark SQL
14 lectures 58:17
Introduction to Spark SQL
00:59
Spark SQL Overview
06:48
The Spark Shell hands-on
2 pages
Hands-on Exercise 1: part a) Import CSV
30 pages
Schema Inference
06:25
Data Query Select
05:19
Data Query Select
1 question
DataFrameReader and DataFrameWriter
08:11
Hands-on Exercise 1: part b) Import JSON
18 pages
Data Query INNER JOINs
06:40
Data Query INNER JOINs
2 questions
Group By, Order By, Window Functions
05:41
Group By, Order By, Window Functions
2 questions
Data Query OUTER JOINs, SEMI JOIN
09:50
Data Query OUTER JOINs, SEMI JOIN
1 question
Custom UDF (User Defined Function)
04:41
Custom UDF (User Defined Function)
1 question
API or SQL?
03:43
Hands-on Exercise 2: Spark SQL
18 pages
+ Spark MLlib
15 lectures 31:50
Introduction and Topics
00:41
Machine Learning
01:17
MLlib
02:32
Basic Statistics
01:00
Optimization
01:49
Classification
06:20
Hands-on Exercise 1: Spark MLlib: Classification
12 pages
Validation
01:07
Regression
02:18
Clustering
03:51
Hands-on Exercise 2: Spark MLlib: Clustering
12 pages
Feature Extraction and Transformation
01:00
Dimensionality Reduction
05:23
Collaborative Filtering
00:55
Evaluation Metrics
03:37
+ Spark GraphX
16 lectures 24:52
Introduction to Spark GraphX
07:18
Graph creation examples
2 pages
Graph Operators Overview, Information about a Graph
03:18
Information about a graph example
1 page
Transform Graph Items
02:35
Transform graph items examples
1 page
Modify Graph Structure
01:24
Modify graph structure example
1 page
Graph Neighborhood Aggregations
02:30
Neighborhood Aggregations Examples
2 pages
Graph Algorithms
02:36
Triangle Count Example
1 page
Pregel: Graph-Parallel Computation
02:11
Pregel Example
1 page
Optimized Graph Representation
03:00
Hands-on Exercise: Spark GraphX
23 pages
Requirements
  • Completion of an introductory Apache Spark course. Adastra Academy's Introduction to Apache Spark for Developers and Engineers is recommended.
  • A beginner-to-intermediate understanding of the Scala programming language. Adastra Academy's Scala in Practice is recommended.
  • A basic understanding of Apache Hadoop and Big Data
Description

Apache Spark is an open-source data processing engine. Spark is designed to provide fast processing of large datasets and high performance for a wide range of analytics applications. Unlike MapReduce, Spark enables in-memory cluster computing, which greatly improves the speed of iterative algorithms and interactive data mining tasks.

Adastra Academy’s Advanced Apache Spark includes video lectures, thorough application examples, a guide to installing the NetBeans Integrated Development Environment, and quizzes. Through this course, you will learn about Spark’s four built-in libraries (Spark Streaming, Spark SQL with DataFrames, MLlib, and GraphX) and how to develop, build, tune, and debug Spark applications. The course exercises will help you become proficient at creating fully functional real-world applications using the Apache Spark libraries. The course takes a guided, ground-up approach to learning Spark.

Who this course is for:
  • Data Scientists
  • Developers
  • Data Engineers