Learning Path: Big Data Analytics and Streaming with Spark 2

Get the most out of the trending big data framework for all your data processing needs
0.0 (0 ratings)
0 students enrolled
Created by Packt Publishing
Last updated 8/2017
English
30-Day Money-Back Guarantee
Includes:
  • 4.5 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Introduction to Apache Hadoop and Spark
  • Understand the Spark API and its architecture
  • Find out how to load and save data in Spark
  • Write Spark applications in Scala and execute them on a Hadoop cluster
  • Learn to join large amounts of data
  • Implement stream processing using Apache Spark Streaming
  • Master event time and processing time
Requirements
  • Some familiarity with Scala
Description

Every year, the amount of data we need to store and analyze grows substantially. To process such volumes, we need a technology that can distribute computations across many machines and make them more efficient. Apache Spark is a technology that allows us to process big data, leading to faster and more scalable processing. If you're looking for a complete, comprehensive source on Apache Spark, then go for this Learning Path.

Packt’s Video Learning Paths are a series of individual video products put together in a logical and stepwise manner such that each video builds on the skills learned in the video before it.

The highlights of this Learning Path are:

  • Explore the Apache Spark architecture and delve into its API and key features
  • Write code that is maintainable and easy to test
  • Get to know the Apache Spark Streaming API and create jobs that analyze data in near real time

Let's take a quick look at your journey. This Learning Path introduces you to the various components of the Spark framework so you can efficiently process, analyze, and visualize data. You will learn about Apache Spark programming fundamentals, such as RDDs, and see which operations can be used to perform transformations or actions on an RDD. You will then learn how to load and save data from various data sources, such as different types of files, NoSQL databases, and RDBMS databases. Moving ahead, you will explore advanced programming concepts, such as managing key-value pairs and accumulators. You'll also discover how to create an effective Spark application and execute it on a Hadoop cluster to gain insights from your data and make informed business decisions.
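
To make this concrete, here is a minimal sketch of the kind of RDD pipeline the path builds toward, assuming Spark 2.x with a local master; the input path and the word-count logic are illustrative, not taken from the course:

    import org.apache.spark.{SparkConf, SparkContext}

    object RddPipeline {
      def main(args: Array[String]): Unit = {
        // Local master for experimentation; on a cluster the master is set by spark-submit
        val sc = new SparkContext(new SparkConf().setAppName("rdd-pipeline").setMaster("local[*]"))

        // Load a text file into an RDD (path is hypothetical)
        val lines = sc.textFile("data/input.txt")

        // Transformations are lazy: nothing runs until an action is called
        val words = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)

        // count() is an action, so it triggers the computation
        println(s"word count: ${words.count()}")
        sc.stop()
      }
    }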

Moving ahead, you'll learn about data mining and data cleaning, wherein we will look at the input data structure and how input data is loaded. You'll then write actual jobs that analyze data. You'll learn how to handle large amounts of unbounded, infinite streams of data. Furthermore, you'll look at common problems when processing event streams: sorting, watermarks, deduplication, and keeping state (for example, user sessions). Finally, you'll implement stream processing using Spark Streaming and analyze traffic on a web page in real time.
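
As a taste of the streaming side, here is a minimal Spark Streaming sketch that counts page views over a sliding window; the socket source, host, port, and intervals are assumptions for illustration:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object PageTraffic {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("page-traffic").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(5)) // micro-batches every 5 seconds

        // Hypothetical source: one page-view URL per line on a local socket
        val views = ssc.socketTextStream("localhost", 9999)

        // Count views per page over a sliding 60-second window
        val counts = views.map(url => (url, 1))
          .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(5))

        counts.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }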

After completing this Learning Path, you will have a sound understanding of the Spark framework, which will help you in analyzing and processing big data.

About the Author:

We have combined the best works of the following esteemed authors to ensure that your learning journey is smooth:

Nishant Garg has over 16 years of software architecture and development experience in various technologies, such as Java Enterprise Edition, SOA, Spring, Hadoop, Hive, Flume, Sqoop, Oozie, Spark, YARN, Impala, Kafka, Storm, Solr/Lucene, NoSQL databases (such as HBase, Cassandra, and MongoDB), and MPP databases (such as GreenPlum).

He received his MS in software systems from the Birla Institute of Technology and Science, Pilani, India, and is currently working as a senior technical architect for the Big Data R&D Labs at Impetus Infotech Pvt. Ltd. Nishant has also undertaken many speaking engagements on big data technologies and is the author of Learning Apache Kafka and HBase Essentials, both published by Packt.

Tomasz Lelek is a software engineer who programs mostly in Java and Scala. He is a fan of microservices architecture and functional programming, and he dedicates considerable time and effort to getting better every day. He recently dived into big data technologies such as Apache Spark and Hadoop. He has spoken at conferences in Poland (Confitura and JDD, Java Developers Day) and at the Krakow Scala User Group, and has conducted a live coding session at the GeeCON conference.

Who is the target audience?
  • This Learning Path is aimed at data analysts, data scientists, and big data enthusiasts who want to learn big data analytics and streaming.
Curriculum For This Course
45 Lectures
04:42:23
Apache Spark Fundamentals
19 Lectures 02:18:10

This video provides an overview of the entire course.

Preview 03:44

What are the origins of Apache Spark and what are its uses?

Spark Introduction
04:53

What are the various components in Apache Spark?

Spark Components
06:02

This video gets us familiar with the tools used in Apache Spark.

Getting Started
10:41

This video explains the complete historical journey from the Nutch project to Apache Hadoop: how the Hadoop project was started, which research papers influenced it, and so on. In the end, it explains the various goals achieved by developing Hadoop.

Preview 06:49

In this video, we are going to look at the JVM processes that Apache Hadoop runs in the background: the name node, data node, resource manager, and node manager. It also provides an overview of the Hadoop components: HDFS, YARN, and the Map Reduce programming model.

Hadoop Processes and Components
07:24

This video shares more details about the Hadoop Distributed File System (HDFS): its goals, its components, and how it works. It also explains YARN, another Hadoop component, covering its components, lifecycle, and use cases.

HDFS and YARN
07:10

This video provides an overview of Map Reduce—the Hadoop programming model and its execution behavior at various stages.

Map Reduce
06:47

The aim of this video is to introduce the Scala language and its features, and by the end of this video, you should be able to get started with Scala.

Preview 07:16

The aim of this video is to explain the fundamentals of Scala programming, such as Scala classes, fields, and methods, and the different types of arguments, such as default and named arguments, passed to class constructors and methods (a small example follows).

Scala Programming Fundamentals
07:42
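
As a hedged illustration of those pieces, here is a class with a field, a method, and default and named constructor arguments; the Server class is invented for this sketch:

    // A class with a field, a method, and a default constructor argument
    class Server(val host: String, val port: Int = 8080) {
      def url: String = s"http://$host:$port"
    }

    object ServerDemo extends App {
      val a = new Server("example.com")                   // default argument used for port
      val b = new Server(port = 9090, host = "localhost") // named arguments in any order
      println(a.url) // http://example.com:8080
      println(b.url) // http://localhost:9090
    }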

The aim of this video is to explain objects in the Scala language and the singleton object in Scala, and to outline the usage of objects in Scala applications. It also describes companion objects (illustrated below).

Objects in Scala
06:22
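
A small sketch of a singleton object serving as a companion with a factory method; the Account class is invented for illustration:

    // The companion object holds shared state and factory methods for its class
    class Account private (val id: Int)

    object Account {
      private var lastId = 0
      def apply(): Account = { lastId += 1; new Account(lastId) } // factory method
    }

    object AccountDemo extends App {
      println(Account().id) // 1
      println(Account().id) // 2
    }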

The aim of this video is to explain the structure of the Scala collections hierarchy, looking at examples of different collection types, such as Array, Set, and Map. It also covers how to apply functions to data in collections and outlines the basics of structural sharing (a short example follows).

Collections
08:36
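
A brief sketch of the collection types and operations mentioned here:

    object CollectionsDemo extends App {
      val numbers = Array(1, 2, 3, 4)
      val engines = Set("spark", "hadoop", "spark")   // duplicates collapse: 2 elements
      val ports   = Map("http" -> 80, "https" -> 443)

      // Applying functions to data in collections
      println(numbers.map(_ * 2).mkString(","))         // 2,4,6,8
      println(numbers.filter(_ % 2 == 0).mkString(",")) // 2,4
      println(engines.size)                             // 2
      println(ports("https"))                           // 443
    }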

The aim of this video is to start your learning of Apache Spark fundamentals. It introduces you to the Spark component architecture and how different components are stitched together for Spark execution.

Preview 07:39

The aim of this video is to take the first step towards Spark programming. It explains the Spark Context and the need for Resilient Distributed Datasets (RDDs). It also explains how RDDs changed the execution approach used in Map Reduce (a minimal example follows).

Understanding RDD
07:06
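
As a minimal sketch (Spark 2.x, local master, hypothetical file path), an RDD can be created from a local collection or from a file through the Spark Context:

    import org.apache.spark.{SparkConf, SparkContext}

    object RddIntro extends App {
      val sc = new SparkContext(new SparkConf().setAppName("rdd-intro").setMaster("local[*]"))

      val fromCollection = sc.parallelize(Seq(1, 2, 3, 4, 5)) // RDD from a local collection
      val fromFile = sc.textFile("data/input.txt")            // RDD from a file (hypothetical path)

      // RDDs are resilient: lineage lets Spark recompute lost partitions
      println(fromCollection.sum()) // 15.0
      sc.stop()
    }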

The aim of this video is to explain the operations that can be applied to RDDs. These operations come in the form of transformations and actions, and the video explains various operations in both categories with examples; the sketch below illustrates the distinction.

RDD Operations
09:06
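
A quick sketch of the two categories, assuming a local Spark 2.x setup: transformations build a lazy lineage, while actions trigger execution:

    import org.apache.spark.{SparkConf, SparkContext}

    object RddOpsDemo extends App {
      val sc = new SparkContext(new SparkConf().setAppName("rdd-ops").setMaster("local[*]"))
      val nums = sc.parallelize(1 to 10)

      // Transformations (lazy): return new RDDs, nothing runs yet
      val squares = nums.map(n => n * n)
      val bigOnes = squares.filter(_ > 25)

      // Actions: trigger execution and return results to the driver
      println(bigOnes.collect().mkString(",")) // 36,49,64,81,100
      println(squares.reduce(_ + _))           // 385
      sc.stop()
    }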

The aim of this video is to explain and demonstrate loading and storing data in Spark from different file types, such as text, CSV, JSON, and sequence files; different filesystems, such as the local filesystem, Amazon S3, and HDFS; and different databases, such as MySQL, Postgres, HBase, and so on (sketched below).

Preview 10:15
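
A hedged sketch of loading and saving across several of these formats with Spark 2; all paths are hypothetical, and the DataFrame reader is used for CSV and JSON while the RDD API covers text and sequence files:

    import org.apache.spark.sql.SparkSession

    object LoadSaveDemo extends App {
      val spark = SparkSession.builder.appName("load-save").master("local[*]").getOrCreate()
      val sc = spark.sparkContext

      val text  = sc.textFile("hdfs:///data/logs.txt")            // plain text via the RDD API
      val csv   = spark.read.option("header", "true").csv("data/users.csv")
      val json  = spark.read.json("data/events.json")
      val pairs = sc.sequenceFile[String, Int]("data/counts.seq") // key-value sequence file

      json.write.parquet("out/events.parquet") // saving in another format
      spark.stop()
    }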

The aim of this video is to explain the motivations behind key-value-based RDDs and the creation of such RDDs. Next, it explains the various transformations and actions that can be applied to key-value-based RDDs. Finally, it explains data partitioning techniques in Spark (a short sketch follows).

Managing Key-Value Pairs
06:56
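
For instance, a minimal pair-RDD sketch using reduceByKey and explicit partitioning; the sales data is illustrative:

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object PairRddDemo extends App {
      val sc = new SparkContext(new SparkConf().setAppName("pairs").setMaster("local[*]"))
      val sales = sc.parallelize(Seq(("books", 10), ("music", 5), ("books", 7)))

      // reduceByKey combines values per key on each partition before shuffling
      val totals = sales.reduceByKey(_ + _)

      // Explicit partitioning can cut shuffle costs for repeated keyed operations
      val partitioned = totals.partitionBy(new HashPartitioner(4))

      println(partitioned.collect().toMap) // Map(books -> 17, music -> 5)
      sc.stop()
    }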

The aim of this video is to explain a few more advanced concepts, such as accumulators, broadcast variables, and passing data to external programs using pipes; the example below shows accumulators and broadcast variables in action.

Accumulators
06:56
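
A minimal sketch of both kinds of shared variables in Spark 2; the stop-word list is illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    object SharedVarsDemo extends App {
      val sc = new SparkContext(new SparkConf().setAppName("shared-vars").setMaster("local[*]"))

      val stopWords = sc.broadcast(Set("a", "an", "the")) // read-only, shipped once per executor
      val skipped   = sc.longAccumulator("skipped words") // write-only counter, read on the driver

      val words = sc.parallelize(Seq("the", "spark", "a", "rdd"))
      val kept = words.filter { w =>
        val isStop = stopWords.value.contains(w)
        if (isStop) skipped.add(1)
        !isStop
      }

      println(kept.collect().mkString(",")) // spark,rdd
      println(skipped.value)                // 2
      sc.stop()
    }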

The aim of this video is to demonstrate writing Spark jobs using the Eclipse-based Scala IDE, creating Spark job JAR files, and, finally, copying and executing the Spark job on a Hadoop cluster; the sketch below shows the overall shape.

Writing a Spark Application
06:46
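
A hedged sketch of a submittable job plus the commands to package and run it; the class name, JAR path, and input path are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    object MyJob {
      def main(args: Array[String]): Unit = {
        // No setMaster here: the master is supplied by spark-submit on the cluster
        val sc = new SparkContext(new SparkConf().setAppName("my-job"))
        println(sc.textFile(args(0)).count())
        sc.stop()
      }
    }

    // Package with sbt, then submit to YARN, for example:
    //   sbt package
    //   spark-submit --class MyJob --master yarn --deploy-mode cluster \
    //     target/scala-2.11/my-job_2.11-1.0.jar hdfs:///data/input.txt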

Test Your Knowledge
5 questions
Big Data Processing using Apache Spark
13 Lectures 01:24:40

This video provides an overview of the entire course.

Preview 01:37

In this video, we will cover the Spark Architecture.

Overview of the Apache Spark and its Architecture
11:29

This video focuses on creating a project.

Start a Project Using Apache Spark, Look at build.sbt
03:32

This video shows the installation of spark-submit on our machine.

Creating the Spark Context
07:00

In this video, we will look at the Spark API.

Looking at API of Spark
07:34

Thinking about what problem we want to solve.

Preview 04:42

In this video, we will learn about the Spark API for loading data.

Using RDD API in the Data Mining Process
04:22

In this video, we will cover how to load input data.

Loading Input Data
04:42

In this video, we will look at how to tokenize input data.

Cleaning Input Data
07:44

This video shows how to implement the word-counting logic.

Preview 07:37

In this video, we will focus on solving problems.

Using RDD API Transformations and Actions to Solve a Problem
10:23

This video shows how to write a robust Spark test suite; a sketch follows below.

Testing Spark Job
09:38
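
A sketch of such a test, assuming ScalaTest and a local-mode Spark Context; the job logic shown is a stand-in word count rather than the course's actual job:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest.{BeforeAndAfterAll, FlatSpec, Matchers}

    class WordCountSpec extends FlatSpec with Matchers with BeforeAndAfterAll {
      private var sc: SparkContext = _

      override def beforeAll(): Unit =
        sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local[2]"))

      override def afterAll(): Unit = sc.stop()

      "the word count job" should "count tokens" in {
        val counts = sc.parallelize(Seq("spark spark hadoop"))
          .flatMap(_.split(" "))
          .map((_, 1))
          .reduceByKey(_ + _)
          .collectAsMap()

        counts("spark") shouldBe 2
        counts("hadoop") shouldBe 1
      }
    }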

This video shows how to run our Apache Spark job on two text books.

Summary of Data Processing
04:20

Test Your Knowledge
5 questions
Real Time Streaming using Apache Spark Streaming
13 Lectures 59:33

This video provides an overview of the entire course.

Preview 02:19

In this video, we will explore the Spark Streaming architecture and API.

Introduction to Spark Streaming API
09:13

In this video, we will see how to create a project in Spark Streaming.

Creating a Project in Spark Streaming
05:26

In this video, we will define the data source and data sink.

Defining Data Source and Data Sink
07:58

In this video, we will see how to implement testing.

Creating Base for Testing Spark Streaming
03:30

In this video, we will handle unbounded data.

Preview 02:57

In this video, we will use event time and processing time; an event-time sketch follows below.

Using Event Time and Processing Time
02:49
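
The lectures use the Spark Streaming API, but event time with lateness bounds is most directly expressed through Spark 2's Structured Streaming watermarks; here is a hedged sketch assuming a socket source that emits "timestamp,page" lines (host, port, and intervals are illustrative):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.window

    object EventTimeDemo extends App {
      val spark = SparkSession.builder.appName("event-time").master("local[*]").getOrCreate()
      import spark.implicits._

      // Hypothetical source emitting lines such as "2017-08-01 12:00:00,/home"
      val views = spark.readStream
        .format("socket").option("host", "localhost").option("port", "9999").load()
        .as[String]
        .map { line =>
          val Array(ts, page) = line.split(",")
          (java.sql.Timestamp.valueOf(ts), page)
        }
        .toDF("eventTime", "page")

      // The watermark admits events up to 10 minutes late, then discards old state
      val counts = views
        .withWatermark("eventTime", "10 minutes")
        .groupBy(window($"eventTime", "5 minutes"), $"page")
        .count()

      counts.writeStream.outputMode("update").format("console").start().awaitTermination()
    }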

In this video, we will sort a stream of data.

Sorting Stream Data
05:15

In this video, we will deduplicate our events; see the sketch below.

Deduplicating Data
05:52
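
In Structured Streaming terms, watermark-bounded deduplication looks roughly like this; the column names are assumptions:

    import org.apache.spark.sql.DataFrame

    object Dedup {
      // Assumes a streaming DataFrame with eventId and eventTime columns
      def dedup(events: DataFrame): DataFrame =
        events
          .withWatermark("eventTime", "10 minutes") // bounds the state kept for dedup
          .dropDuplicates("eventId", "eventTime")   // keeps the first event per ID in the window
    }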

In this video, we will implement the transformations and the actual logic of our processing.

Preview 02:49

In this video, we will write tests for the streaming job.

Writing Tests for the Streaming Job
03:05

In this video, we will create processing logic that needs to keep the state of the user session (sketched below).

Creating Processing Logic That Needs to Keep State of the User Session
04:42
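
With the DStream API, per-key session state can be kept with mapWithState; a hedged sketch in which the counting logic and timeout are illustrative:

    import org.apache.spark.streaming.{Seconds, State, StateSpec}

    object SessionState {
      // Running event count per user, kept across micro-batches
      val spec = StateSpec.function(
        (userId: String, event: Option[Int], state: State[Int]) => {
          val total = state.getOption.getOrElse(0) + event.getOrElse(0)
          if (!state.isTimingOut()) state.update(total) // cannot update a timing-out state
          (userId, total)                               // emitted downstream each batch
        }).timeout(Seconds(1800))                       // drop idle sessions after 30 minutes

      // Usage, given a DStream[(String, Int)] of (userId, count) pairs:
      //   val sessions = pairs.mapWithState(spec)
    }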

In this video, we will summarize all the topics covered in this course.

Summary of Stream Processing
03:38

Test Your Knowledge
5 questions
About the Instructor
Packt Publishing
3.9 Average rating
7,297 Reviews
52,159 Students
616 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then, but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and how to put them to work.

With an extensive library of content (more than 4,000 books and video courses), Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource for making you a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.