Introduction to Apache Spark for Developers and Engineers

A basic-to-intermediate introduction to Apache Spark that teaches the main skills required to use the technology
4.0 (56 ratings)
355 students enrolled
  • Lectures 55
  • Length 3 hours
  • Skill Level Beginner Level
  • Languages English
  • Includes Lifetime access
    30 day money back guarantee!
    Available on iOS and Android
    Certificate of Completion

About This Course

Published 6/2015 English

Course Description

What is Apache Spark?

Apache Spark is a next-generation, open-source Big Data processing engine. Spark is designed to provide fast processing of large datasets and high performance for a wide range of applications. Spark enables in-memory cluster computing, which greatly improves the speed of iterative algorithms and interactive data mining tasks.

Course Outcomes

'Introduction to Apache Spark' includes illuminating video lectures, practical hands-on Scala and Spark exercises, a guide to local installation of Spark, and quizzes. In this course, we guide students through:

  • An explanation of the Spark framework
  • The basics of programming in Scala, Spark's native language
  • An outline of how to work with Spark's primary abstraction, resilient distributed datasets (RDDs)

Upon completion of the course, students will be able to explain core concepts relating to Spark, understand the fundamentals of coding in Scala, and execute basic programming and data manipulation in Spark. This course will take approximately 8 hours to complete.

Recommended Experience

Programming Languages recommended for this course:

  • Scala (course exercises are in Scala)
  • Java
  • Python

Recommended for:

  • Data scientists and engineers
  • Developers
  • Individuals with a basic understanding of Apache Hadoop, Big Data, and a programming language (Scala, Java, or Python)

For students unfamiliar with Big Data and Hadoop, the course will provide a brief overview of each topic.

Why Adastra Academy?

Adastra Academy is a leading source of training and development for Information Management professionals and individuals interested in Data Management and Analytics technology. Our dedication to identifying and mastering emerging technologies guarantees our students are the first to have access to these quality courses. For an exceptional learning experience, our programs include hands-on labs and real-world examples, allowing students to easily apply their new knowledge.

What are the requirements?

  • Basic understanding of Big Data concepts
  • Some understanding of a programming language such as Python, Java or Scala
  • Administrator privileges on a computer to download and install software

What am I going to get from this course?

  • Identify and understand the concepts of Big Data
  • Clearly describe Apache Spark
  • Understand and explain the various components of the Spark framework
  • Differentiate between Spark and Hadoop MapReduce
  • Download, install and use Spark on a local machine
  • Identify and understand the main Scala programming language concepts
  • Develop basic Spark applications
  • Explain and use Spark Resilient Distributed Datasets

What is the target audience?

  • Big Data Developers
  • Data Engineers
  • Big Data Consultants
  • Data Scientists

Curriculum

Section 1: Overview of Big Data
1.1 Section 1 Introduction and topics
02:24
06:40

This lecture discusses:

  • What big data is
  • Creation history of Hadoop
  • Overview of the MapReduce model
05:40

This lecture discusses:

  • Traditional data warehousing features
  • Big data features
02:01

This lecture discusses:

  • How big data tools fit into an enterprise solution
1.5 Section Conclusion
00:46
1.6 Big Data Concepts Quiz
3 questions
Section 2: What is Apache Spark
2.1 Introduction and topics
00:33
02:54

This lecture discusses:

  • What Apache Spark is
  • Spark programming languages
  • Spark's built-in libraries
03:21

This lecture discusses:

  • Creation history of Spark
  • Spark's growth
  • Companies using Spark
05:38

This lecture discusses:

  • Comparison of Spark and MapReduce
  • Reasons for choosing Spark
2.5 Section Conclusion
00:36
2.6 Spark Concepts Quiz
5 questions
Section 3: Spark Infrastructure
3.1 Introduction and Topics
00:31
03:49

This lecture discusses:

  • Spark deployment modes
    • Local stand-alone
    • Stand-alone cluster
    • Shared cluster
3.3 Hands-on Exercise: Installing Stand-Alone Spark
00:15
16 pages

This hands-on exercise will guide you through:

  • Installation of Scala
  • Local installation of stand-alone Apache Spark
  • Downloading of sample data used for course exercises
3.5 Spark Install Quiz
1 question
10:08

This lecture discusses:

  • Cluster managers
  • Spark core
  • Built-in libraries
06:45

This lecture discusses:

  • Driver program
  • SparkContext
  • Executors
  • Stand-alone applications
3.8 Section Conclusion
00:36
3.9 Spark Infrastructure Quiz
5 questions
Section 4: The Scala Programming Language
4.1 Introduction and topics
00:36
05:06

This lecture discusses:

  • Introduction to Scala
  • Scala main features
02:17

This lecture discusses:

  • Scala base types
5 pages

This hands-on exercise provides practice with:

  • Scala base types
03:36

This lecture discusses:

  • Scala operators
5 pages

This hands-on exercise provides practice with:

  • Scala operators
01:52

This lecture discusses:

  • Variables in Scala
1 page

This hands-on exercise provides practice with:

  • Variables in Scala
4.9 Scala Language Constructs-Variables Quiz
2 questions
02:17

This lecture discusses:

  • Arrays in Scala
1 page

This hands-on exercise gives practice with:

  • Arrays in Scala
02:18

This lecture discusses:

  • Lists in Scala
2 pages

This hands-on exercise provides practice with:

  • Lists in Scala
02:06

This lecture discusses:

  • Collections in Scala
4.15 Quiz: Scala Arrays and Lists
2 questions
02:39

This lecture discusses:

  • Scala IF expressions
1 page

This hands-on exercise provides practice with:

  • Scala IF expressions
01:28

This lecture discusses:

  • Scala Match-case expressions
1 page

This hands-on exercise provides practice with:

  • Scala Match-case expressions
02:09

This lecture discusses:

  • Scala while loop expressions
  • Scala for loop expressions
2 pages

This hands-on exercise provides practice with:

  • Scala while loop expressions
  • Scala for loop expressions
4.22 Quiz: Scala Loops and Execution Flow
1 question
01:39

This lecture discusses:

  • Functions in Scala
1 page

This hands-on exercise provides practice with:

  • Functions in Scala
4.25 Quiz: Scala Functions: Greatest Common Divisor
1 question
03:26

This lecture discusses:

  • Anonymous functions in Scala
1 page

This hands-on exercise provides practice with:

  • Anonymous functions in Scala
4.28 Scala Functions - Create your own function
2 questions
4.29 Scala Functions - quiz solution
2 pages
4.30 Section Conclusion
00:53
Section 5: Resilient Distributed Datasets
5.1 Introduction and sections
01:09
03:04

This lecture discusses:

  • What are Resilient Distributed Datasets (RDDs)?
  • Why use RDDs?
10:26

This lecture discusses:

  • RDD Operations
    • Transformations
      • RDD Fault Tolerance
      • Directed Acyclic Graph
      • Lazy Evaluation
    • Actions
5 pages

This hands-on exercise provides practice with:

  • Creating RDDs
  • Performing transformations and actions on RDDs
5.5 RDDs Lazy Evaluation & Actions
2 questions
03:32

This lecture discusses:

  • RDD creation methods
    • Loading from an external dataset
    • Parallelizing an existing dataset
    • Creating from an existing RDD
1 page

This hands-on exercise provides practice with:

  • Creating RDDs from a collection
5.8 RDD Creation
2 questions
04:18

This lecture discusses several topics relating to RDD key/value pairs:

  • What pair RDDs are
  • Creating pair RDDs
  • Performing transformations on pair RDDs
6 pages

This hands-on exercise provides practice with:

  • Creating pair RDDs
5.11 Pair RDDs - Joining datasets
1 question
04:15

This lecture discusses:

  • RDD persistence
    • cache() method
    • persist() method
04:55

This lecture discusses:

  • shuffle operations
  • shared variables
    • broadcast variables
    • accumulator variables
4 pages

This hands-on exercise provides practice with:

  • Creating and using shared variables
5.15 "Advanced" data processing with Spark
1 question
5.16 "Advanced" data processing with Spark - quiz solution
1 page
5.17 Section Conclusion
01:10

Instructor Biography

Adastra Academy, Emerging Data Management and Analytics Technology Educators

We're focused on the tools and technologies that matter most for today and tomorrow.

Adastra Academy is a leading source of training and development for Information Management professionals and individuals interested in Data Management and Analytics technology. Our dedication to identifying and mastering emerging technologies guarantees our students are the first to gain access to critical skills. Our programs consist of hands-on labs and real-world examples, allowing students to easily apply their new knowledge.

As a division of Adastra Corporation, we leverage twenty years of world-class Information Management knowledge, experience, services and solutions to fuel the Academy and to advance Information Management professionals everywhere.
