Introduction to Apache Spark for Developers and Engineers
Rating: 3.8 out of 5 (123 ratings)
625 students

Basic to intermediate level introduction to Apache Spark that provides the main skills required to use the technology
Created by Adastra Academy
Last updated 9/2015
English

What you'll learn

  • Identify and understand the concepts of Big Data
  • Clearly describe Apache Spark
  • Understand and explain the various components of the Spark framework
  • Differentiate between Spark and Hadoop MapReduce
  • Download, install and use Spark on a local machine
  • Identify and understand the main Scala programming language concepts
  • Develop basic Spark applications
  • Explain and use Spark Resilient Distributed Datasets

Course content

5 sections • 55 lectures • 2h 36m total length
  • 1.1 Section 1 Introduction and topics (2:24)
  • 1.2 Overview of Big Data and Hadoop (6:40)

    This lecture discusses:

    • What big data is
    • Creation history of Hadoop
    • Overview of the MapReduce model
  • 1.3 Big Data Features and Traditional Data Warehousing Characteristics (5:40)

    This lecture discusses:

    • Traditional data warehousing features
    • Big data features
  • 1.4 Use Case: Adastra's Big Data Reference Architecture (2:01)

    This lecture discusses:

    • How big data tools fit into an enterprise solution
  • 1.5 Section Conclusion (0:46)
  • 1.6 Big Data Concepts Quiz
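
As a rough illustration of the MapReduce model that lecture 1.2 covers, the sketch below runs a word count through map, shuffle, and reduce phases in plain Python on a single machine. It is not Hadoop code and is not taken from the course; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


# Hypothetical sample input, chosen only for illustration.
sample_lines = ["big data is big", "spark processes big data"]
counts = word_count(sample_lines)
print(counts)
```

In a real cluster the map and reduce functions would run in parallel on different machines and the shuffle would move data over the network; here everything happens in one process so the data flow is easy to follow.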

Requirements

  • Basic understanding of Big Data concepts
  • Some understanding of a programming language such as Python, Java or Scala
  • Administrator privileges on a computer to download and install software

Description

What is Apache Spark?

Apache Spark is a next-generation, open-source Big Data processing engine. Spark is designed to process large datasets quickly and to deliver high performance across a wide range of applications. Spark enables in-memory cluster computing, which greatly improves the speed of iterative algorithms and interactive data mining tasks.

Course Outcomes

'Introduction to Apache Spark' includes illuminating video lectures, practical hands-on Scala and Spark exercises, a guide to local installation of Spark, and quizzes. In this course, we guide students through:

  • An explanation of the Spark framework
  • The basics of programming in Scala, Spark's native language
  • An outline of how to work with Spark's primary abstraction, resilient distributed datasets (RDDs)
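
To give a feel for the RDD programming style mentioned above, here is a minimal sketch in plain Python, assuming only the general idea that transformations such as `map` and `filter` are recorded lazily and that work happens when an action such as `collect` or `reduce` is called. `ToyRDD` is a hypothetical class written for this example; it is not Spark's API, it is not distributed, and it is not taken from the course, whose exercises use Scala.

```python
import functools


class ToyRDD:
    """Hypothetical, single-machine stand-in for an RDD (not Spark's API)."""

    def __init__(self, source, steps=()):
        self._source = list(source)   # the underlying data
        self._steps = tuple(steps)    # recorded transformations, in order

    def map(self, fn):
        """Transformation: record the step and return a new ToyRDD."""
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        """Transformation: record the step and return a new ToyRDD."""
        return ToyRDD(self._source, self._steps + (("filter", predicate),))

    def _evaluate(self):
        """Replay the recorded steps over the source data."""
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    def collect(self):
        """Action: run the recorded steps and return the results as a list."""
        return list(self._evaluate())

    def reduce(self, fn):
        """Action: run the recorded steps and fold the results with fn."""
        return functools.reduce(fn, self._evaluate())


calls = []  # records every time square() actually runs


def square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD(range(1, 6))
even_squares = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # nothing has been computed yet

result = even_squares.collect()
total = even_squares.reduce(lambda a, b: a + b)
print(calls_before_action, result, total)
```

Each action in this toy replays the recorded steps from the source data, so `square` runs again for the second action. The sketch leaves out partitioning, fault tolerance, and keeping results in memory between actions.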

Upon completion of the course, students will be able to explain core concepts relating to Spark, understand the fundamentals of coding in Scala, and execute basic programming and data manipulation in Spark. This course will take approximately 8 hours to complete.

Recommended Experience

Programming Languages recommended for this course:

  • Scala (course exercises are in Scala)
  • Java
  • Python

Recommended for:

  • Data scientists and engineers
  • Developers
  • Individuals with a basic understanding of: Apache Hadoop, Big Data, programming languages (Scala, Java, or Python)

For students unfamiliar with Big Data and Hadoop, the course will provide a brief overview of each topic.

Why Adastra Academy?

Adastra Academy is a leading source of training and development for Information Management professionals and individuals interested in Data Management and Analytics technology. Our dedication to identifying and mastering emerging technologies means our students are among the first to have access to these quality courses. For an exceptional learning experience, our programs include hands-on labs and real-world examples, allowing students to easily apply their new knowledge.

Who this course is for:

  • Big Data Developers
  • Data Engineers
  • Big Data Consultants
  • Data Scientists