Introduction to Apache Spark for Developers and Engineers

Name: Introduction to Apache Spark for Developers and Engineers
Rating: 3.8 (123 reviews)

A basic- to intermediate-level introduction to Apache Spark that provides the main skills required to use the technology

Created by Adastra Academy

Last updated 9/2015

English

What you'll learn

Identify and understand the concepts of Big Data
Clearly describe Apache Spark
Understand and explain the various components of the Spark framework
Differentiate between Spark and Hadoop MapReduce
Download, install and use Spark on a local machine
Identify and understand the main Scala programming language concepts
Develop basic Spark applications
Explain and use Spark Resilient Distributed Datasets

Course content

5 sections • 55 lectures • 2h 36m total length

1.1 Section 1 Introduction and topics (2:24)
1.2 Overview of Big Data and Hadoop (6:40)
This lecture discusses:

What big data is

Creation history of Hadoop

Overview of the MapReduce model
1.3 Big Data Features and Traditional Data Warehousing Characteristics (5:40)
This lecture discusses:

Traditional data warehousing features

Big data features
1.4 Use Case: Adastra's Big Data Reference Architecture (2:01)
This lecture discusses:

How big data tools fit into an enterprise solution
1.5 Section Conclusion (0:46)
1.6 Big Data Concepts Quiz

2.1 Introduction and topics (0:33)
2.2 Apache Spark Overview (2:54)
This lecture discusses:

What Apache Spark is

Spark programming languages

Spark's built-in libraries
2.3 Spark's History (3:21)
This lecture discusses:

Creation history of Spark

Spark's growth

Companies using Spark
2.4 Why Use Spark (5:38)
This lecture discusses:

Comparison of Spark and MapReduce

Reasons for choosing Spark
2.5 Section Conclusion (0:36)
2.6 Spark Concepts Quiz

3.1 Introduction and Topics (0:31)
3.2 Spark Deployment Modes (3:49)
This lecture discusses:

Spark deployment modes

Local stand-alone

Stand-alone cluster

Shared cluster
3.3 Hands-on Exercise: Installing Stand-Alone Spark (0:15)
3.4 Hands-on Exercise: Install Stand-Alone Spark on your computer (16:00)
This hands-on exercise will guide you through:

Installation of Scala

Local installation of stand-alone Apache Spark

Downloading of sample data used for course exercises
3.5 Spark Install Quiz
3.6 The Spark Framework (10:08)
This lecture discusses:

Cluster managers

Spark core

Built-in libraries
3.7 Spark Application Concepts (6:45)
This lecture discusses:

Driver program

SparkContext

Executors

Stand-alone applications
3.8 Section Conclusion (0:36)
3.9 Spark Infrastructure Quiz

4.1 Introduction and topics (0:36)
4.2 Scala Introduction & Language Features (5:06)
This lecture discusses:

Introduction to Scala

Scala main features
4.3 Scala Language Basics-Base Types (2:17)
This lecture discusses:

Scala base types
4.4 Hands-on Examples: Scala Base Types (5:00)
This hands-on exercise provides practice with:

Scala base types
4.5 Scala Language Basics-Operators (3:36)
This lecture discusses:

Scala operators
4.6 Hands-on Examples: Scala Operators (5:00)
This hands-on exercise provides practice with:

Scala operators
4.7 Scala Language Constructs-Variables (1:52)
This lecture discusses:

Variables in Scala
4.8 Hands-on Examples: Scala Variables (1:00)
This hands-on exercise provides practice with:

Variables in Scala
4.9 Scala Language Constructs-Variables Quiz
4.10 Scala Language Constructs-Arrays (2:17)
This lecture discusses:

Arrays in Scala
4.11 Hands-on Examples: Scala Arrays (1:00)
This hands-on exercise gives practice with:

Arrays in Scala
4.12 Scala Language Constructs-Lists (2:18)
This lecture discusses:

Lists in Scala
4.13 Hands-On Exercise: Scala Lists (2:00)
This hands-on exercise provides practice with:

Lists in Scala
4.14 Scala Language Constructs-Collections (2:06)
This lecture discusses:

Collections in Scala
4.15 Quiz: Scala Arrays and Lists
4.16 Scala Language Constructs-IF Expressions (2:39)
This lecture discusses:

Scala IF expressions
4.17 Hands-On Exercise: Scala IF Expressions (1:00)
This hands-on exercise provides practice with:

Scala IF expressions
4.18 Scala Language Constructs-MATCH-CASE Expressions (1:28)
This lecture discusses:

Scala Match-case expressions
4.19 Hands-On Exercise: Scala MATCH-CASE Expressions (1:00)
This hands-on exercise provides practice with:

Scala Match-case expressions
4.20 Scala Language Constructs-WHILE & FOR Loop Expressions (2:09)
This lecture discusses:

Scala while loop expressions

Scala for loop expressions
4.21 Hands-On Exercise: Scala WHILE & FOR Loop Expressions (2:00)
This hands-on exercise provides practice with:

Scala while loop expressions

Scala for loop expressions
4.22 Quiz: Scala Loops and Execution Flow
4.23 Scala Language Basics-Functions (1:39)
This lecture discusses:

Functions in Scala
4.24 Hands-On Exercise: Scala Functions (1:00)
This hands-on exercise provides practice with:

Functions in Scala
4.25 Quiz: Scala Functions: Greatest Common Divisor
4.26 Scala Language Basics-Anonymous Functions (3:26)
This lecture discusses:

Anonymous functions in Scala
4.27 Hands-on Examples: Anonymous Functions (1:00)
This hands-on exercise provides practice with:

Anonymous functions in Scala
4.28 Scala Functions - Create your own function
4.29 Scala Functions - quiz solution (2:00)
4.30 Section Conclusion (0:53)

5.1 Introduction and sections (1:09)
5.2 Resilient Distributed Datasets-Overview (3:04)
This lecture discusses:

What are Resilient Distributed Datasets (RDDs)?

Why use RDDs?
5.3 Resilient Distributed Datasets (10:26)
This lecture discusses:

RDD Operations

Transformations

RDD Fault Tolerance

Directed Acyclic Graph

Lazy Evaluation

Actions
5.4 Hands-On Exercise: RDDs Lazy Evaluation & Actions (5:00)
This hands-on exercise provides practice with:

Creating RDDs

Performing transformations and actions on RDDs
5.5 RDDs Lazy Evaluation & Actions
5.6 Resilient Distributed Datasets-How to Create (3:32)
This lecture discusses:

RDD creation methods

Loading from an external dataset

Parallelizing an existing dataset

Creating from an existing RDD
5.7 Hands-On Exercise: Creating an RDD from a Collection (1:00)
This hands-on exercise provides practice with:

Creating RDDs from a collection
5.8 RDD Creation
5.9 Pair Resilient Distributed Datasets (4:18)
This lecture discusses several topics relating to RDD key/value pairs:

What pair RDDs are

Creating pair RDDs

Performing transformations on pair RDDs
5.10 Hands-On Exercise: Pair RDDs (6:00)
This hands-on exercise provides practice with:

Creating pair RDDs
5.11 Pair RDDs - Joining datasets
5.12 Resilient Distributed Datasets-Persistence (4:15)
This lecture discusses:

RDD persistence

cache() method

persist() method
5.13 Resilient Distributed Datasets-Shared Variables (4:55)
This lecture discusses:

shuffle operations

shared variables

broadcast variables

accumulator variables
5.14 Hands-on Examples: Distributed Shared Variables (4:00)
This hands-on exercise provides practice with:

Creating and using shared variables
5.15 "Advanced" data processing with Spark
5.16 "Advanced" data processing with Spark - quiz solution (1:00)
5.17 Section Conclusion (1:10)

Requirements

Basic understanding of Big Data concepts
Some understanding of a programming language such as Python, Java or Scala
Administrator privileges on a computer to download and install software

Description

What is Apache Spark?

Apache Spark is a next-generation, open-source Big Data processing engine. Spark is designed to provide fast processing of large datasets and high performance for a wide range of applications. Spark enables in-memory cluster computing, which greatly improves the speed of iterative algorithms and interactive data mining tasks.
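
As a rough illustration of the in-memory model described above, the sketch below builds an RDD from a local collection, applies lazy transformations, caches the result, and then runs two actions against it. It is not taken from the course materials. It assumes Spark 1.3 or later with spark-core on the classpath and a local master; the object name RddSketch, the helper wordCounts, and the sample sentences are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddSketch {
  // Hypothetical helper: counts word occurrences in the given lines.
  def wordCounts(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    val words = sc.parallelize(lines)      // RDD created from a local collection
      .flatMap(_.split("\\s+"))            // transformation: lazy, nothing runs yet
      .filter(_.nonEmpty)

    val counts = words
      .map(word => (word, 1))              // pair RDD of (key, value)
      .reduceByKey(_ + _)                  // still lazy
      .cache()                             // mark the result to be kept in memory

    counts.count()                         // first action: triggers the computation
    counts.collect().toMap                 // second action: can reuse the cached data
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark inside this JVM using all available cores.
    val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(wordCounts(sc, Seq("spark is fast", "spark runs in memory")))
    } finally {
      sc.stop()
    }
  }
}
```

Without the cache() call, each action would recompute the whole chain of transformations from the source data, which is the cost that in-memory computing avoids for iterative workloads.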

Course Outcomes

'Introduction to Apache Spark' includes illuminating video lectures, practical hands-on Scala and Spark exercises, a guide to local installation of Spark, and quizzes. In this course, we guide students through:

An explanation of the Spark framework
The basics of programming in Scala, Spark's native language
An outline of how to work with Spark's primary abstraction, resilient distributed datasets (RDDs).

Upon completion of the course, students will be able to explain core concepts relating to Spark, understand the fundamentals of coding in Scala, and execute basic programming and data manipulation in Spark. This course will take approximately 8 hours to complete.
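
To give a sense of the Scala fundamentals listed in the course content (functions, anonymous functions, match-case expressions, and lists), here is a minimal sketch in plain Scala with no Spark dependency. It is not one of the course exercises; the object name ScalaBasicsSketch and the helpers gcd and describe are hypothetical, with gcd chosen only because the outline mentions a greatest common divisor quiz.

```scala
object ScalaBasicsSketch {
  // Named recursive function: Euclid's algorithm for the greatest common divisor.
  @annotation.tailrec
  def gcd(a: Int, b: Int): Int =
    if (b == 0) a else gcd(b, a % b)

  // Match-case expression: like if, it is an expression that yields a value.
  def describe(n: Int): String = n match {
    case 0               => "zero"
    case x if x % 2 == 0 => "even"
    case _               => "odd"
  }

  def main(args: Array[String]): Unit = {
    val numbers = List(12, 18, 30)                    // immutable List bound to a val
    val doubled = numbers.map(x => x * 2)             // anonymous function
    val common  = numbers.reduce((a, b) => gcd(a, b)) // fold the list with gcd

    println(doubled)
    println(common)
    for (n <- numbers) println(describe(n))           // for loop over a collection
  }
}
```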

Recommended Experience

Programming Languages recommended for this course:

Scala (course exercises are in Scala)
Java
Python

Recommended for:

Data scientists and engineers
Developers
Individuals with a basic understanding of: Apache Hadoop, Big Data, programming languages (Scala, Java, or Python)

For students unfamiliar with Big Data and Hadoop, the course will provide a brief overview of each topic.

Why Adastra Academy?

Adastra Academy is a leading source of training and development for Information Management professionals and individuals interested in Data Management and Analytics technology. Our dedication to identifying and mastering emerging technologies guarantees our students are the first to have access to these quality courses. For an exceptional learning experience, our programs include hands-on labs and real-world examples, allowing students to easily apply their new knowledge.

Who this course is for:

Big Data Developers
Data Engineers
Big Data Consultants
Data Scientists

Course content

Overview of Big Data • 5 lectures • 18 min

What is Apache Spark • 5 lectures • 13 min

Spark Infrastructure • 7 lectures • 38 min

The Scala Programming Language • 25 lectures • 54 min

Resilient Distributed Datasets • 13 lectures • 50 min
