Apache Spark 2 for Beginners

Get to grips with data processing using Spark, Python, and Scala. A complete beginner's guide!
1.0 (1 rating)
23 students enrolled
Created by Packt Publishing
Last updated 1/2017
English
Current price: $10 Original price: $125 Discount: 92% off
30-Day Money-Back Guarantee
Includes:
  • 5.5 hours on-demand video
  • 1 Supplemental Resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Understand the fundamentals of Apache Spark
  • Process and display data with Python and Scala
  • Stream processing, machine learning and graph processing
  • Develop a complete Spark application
Requirements
  • Some exposure to Python is advantageous
  • An understanding of SQL concepts
Description

No matter where you are in your coding journey, this course's 5.5 hours of top-quality video tutorials will get you up and running with Apache Spark, taking you from installation and configuration all the way to power user.

The first chapters are a step-by-step guide through the fundamentals of Spark programming, covering DataFrames, aggregations, and Datasets.

Next, you'll dive into what you can do with all the data you collect: use Spark to filter results with R, and expose your data to Python for deeper processing and presentation with charts and graphs. After that, you'll go further into the capabilities of Spark's stream processing, machine learning, and graph processing libraries.

The last chapter combines all the skills you learned in the preceding chapters to develop a real-world Spark application. By the end of this course, you will be able to consolidate data processing, stream processing, machine learning, and graph processing into one unified and highly interoperable framework with a uniform API, using Scala or Python.

About The Author

Rajanarayanan Thottuvaikkatumana, Raj, is a seasoned technologist with more than 23 years of software development experience at various multinational companies. He has lived and worked in India, Singapore, and the USA, and is presently based in the UK. His experience includes architecting, designing, and developing software applications. He has worked on various technologies, including major databases, application development platforms, web technologies, and big data technologies. Since 2000, he has been working mainly in Java-related technologies and does heavy-duty server-side programming in Java and Scala. He has worked on highly concurrent, highly distributed, high-transaction-volume systems. Currently, he is building a next-generation Hadoop YARN-based data processing platform and an application suite built with Spark using Scala.

Raj holds a master's degree in Mathematics and a master's degree in Computer Information Systems, and has many certifications in ITIL and cloud computing to his credit. Raj is the author of Cassandra Design Patterns - Second Edition, published by Packt.

When not working on the assignments his day job demands, Raj is an avid listener of classical music and watches a lot of tennis.


Who is the target audience?
  • Anyone who needs to process large amounts of data
Curriculum For This Course
45 Lectures
05:38:30
Spark Fundamentals
4 Lectures 29:21

This video gives an overview of the entire course.

Preview 04:30

This video will take you through the overview of Apache Hadoop. You will also explore the Apache Hadoop Framework and the MapReduce process. 

An Overview of Apache Hadoop
05:50

By the end of this video, you will have an in-depth understanding of Spark and its advantages. You will also go through the Spark libraries and then dive into the Spark programming paradigm. 

Understanding Apache Spark
05:13

In this video, you will learn how to install Python and also how to install R. Finally, you will set up the Spark environment on your machine. 

Installing Spark on Your Machines
13:48
Spark Programming Model
5 Lectures 45:14

Understand how side effects in program logic can prevent a program or function from returning consistent results, which makes many applications very complex 

Preview 08:44

Learn to process data using RDDs from the relevant data source, such as text files and NoSQL data stores 

Data Transformations and Actions with RDDs
05:21
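
As a taste of what this lecture covers, here is a minimal PySpark sketch of lazy transformations followed by actions; the input file name is hypothetical, not from the course.

    # Minimal RDD sketch: transformations are lazy, actions trigger execution.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "RDDBasics")

    lines = sc.textFile("data/sales.txt")                  # hypothetical input
    amounts = lines.map(lambda l: float(l.split(",")[1]))  # transformation
    big = amounts.filter(lambda x: x > 100.0)              # transformation
    print(big.count())                                     # action
    print(big.take(5))                                     # action

    sc.stop()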

Learn to handle the tools for monitoring the jobs running in a given Spark ecosystem 

Monitoring with Spark
04:01

Understand the core concepts behind Spark's elementary data items. 

The Basics of Programming with Spark
20:30

Learn to choose the appropriate Spark connector program and the appropriate API for reading data. 

Creating RDDs from Files and Understanding the Spark Library Stack
06:38
Spark SQL
5 Lectures 43:11

What if you could not make use of the RDD-based Spark programming model because it requires some amount of functional programming? The solution is Spark SQL, which you will learn about in this video. 

Preview 09:38

This video will take you through the structure and internal workings of Spark SQL. 

Anatomy of Spark SQL
05:08

This video will demonstrate two DataFrame programming models, one using SQL queries and the other using the DataFrame API for Spark. 

DataFrame Programming
12:00
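
A hedged sketch of the two models this lecture contrasts, in PySpark; the toy account data is illustrative, not from the course.

    # Two ways to express the same query: SQL vs. the DataFrame API.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("DataFrameDemo").getOrCreate()

    df = spark.createDataFrame(
        [("1001", "Alice", 5000.0), ("1002", "Bob", 3000.0)],
        ["acct", "name", "balance"])

    # Model 1: register a temporary view and run a SQL query against it
    df.createOrReplaceTempView("accounts")
    spark.sql("SELECT name FROM accounts WHERE balance > 4000").show()

    # Model 2: the equivalent query through the DataFrame API
    df.filter(df.balance > 4000).select("name").show()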

Spark SQL allows the aggregation of data. Instead of running SQL statements on a single data source located on a single machine, you can use Spark SQL to do the same on distributed data sources. 

Understanding Aggregations and Multi-Datasource Joining with Spark SQL
08:32
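
A minimal PySpark sketch of the idea, with made-up tables: aggregate one DataFrame, then join the result with a second source.

    # Aggregate per key, then join the result with another DataFrame.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("AggJoin").getOrCreate()

    orders = spark.createDataFrame(
        [("c1", 100.0), ("c1", 50.0), ("c2", 75.0)], ["cust", "amount"])
    customers = spark.createDataFrame(
        [("c1", "Alice"), ("c2", "Bob")], ["cust", "name"])

    totals = orders.groupBy("cust").agg(F.sum("amount").alias("total"))
    totals.join(customers, "cust").select("name", "total").show()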

This video will show you the methods used to create a Dataset, along with its usage, the conversion of an RDD to a DataFrame, and the conversion of a DataFrame to a Dataset. You will also learn the usage of the Catalog API in Scala and Python. 

Introducing Datasets and Understanding Data Catalogs
07:53
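
Datasets themselves are a Scala/Java abstraction; what a Python sketch can show is the Catalog API side of this lecture. The view name below is illustrative.

    # Inspect registered tables and their columns through the Catalog API.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("CatalogDemo").getOrCreate()

    spark.range(10).createOrReplaceTempView("numbers")

    for table in spark.catalog.listTables():
        print(table.name, table.isTemporary)
    for col in spark.catalog.listColumns("numbers"):
        print(col.name, col.dataType)
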
Spark Programming with R
4 Lectures 19:59

This video will help you understand the necessity of SparkR and the basic data types in the R language. 

Preview 08:09

You may encounter several situations where you need to convert an R DataFrame to a Spark DataFrame or vice versa. Let's see how to do it. 

DataFrames in R and Spark
02:57

This video will show you how to write programs with SQL and R DataFrame APIs. 

Spark DataFrame Programming with R
04:42

In SQL, the aggregation of data is very flexible. The same is true in Spark SQL. Let's see its use and the implementation of multi-datasource joins. 

Understanding Aggregations and Multi-Datasource Joins in SparkR
04:11
Spark Data Analysis with Python
4 Lectures 22:13

This video will walk you through the Charting and Plotting Libraries and give a brief description of the application stack. You will also learn how to set up a dataset with Spark in conjunction with Python, NumPy, SciPy, and matplotlib. 

Preview 03:59

There are several instances where you need to create various charts and plots to visually represent the various aspects of the dataset and then perform data processing, charting, and plotting. This video will enable you to do this with Spark. 

Charts, Plots, and Histograms
05:36
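
As a sketch of the Spark-plus-matplotlib workflow this section uses: aggregate in Spark, bring the small result to the driver, then plot. The data below is made up.

    # Plot a small Spark aggregate with matplotlib (requires pandas).
    import matplotlib.pyplot as plt
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("Charting").getOrCreate()

    df = spark.createDataFrame(
        [("Mon", 3), ("Tue", 7), ("Wed", 5)], ["day", "sales"])

    pdf = df.toPandas()          # safe only for small, pre-aggregated data
    plt.bar(pdf["day"], pdf["sales"])
    plt.xlabel("Day")
    plt.ylabel("Sales")
    plt.show()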

This video will let you explore more types of charts, namely the Stacked Bar Chart, Donut Chart, Box Plot, and Vertical Bar Chart. So, let's do it! 

Bar Chart and Pie Chart
07:45

Through this video, you will learn in detail about scatter plots and line graphs using Spark. You will also see how to enhance a scatter plot. 

Scatter Plot and Line Graph
04:53
Spark Stream Processing
5 Lectures 52:16

Data sources generate data as a stream, and many real-world use cases require it to be processed in real time. This video will give you a deep understanding of stream processing in Spark. 

Preview 08:36

These days, it is very common to have a central repository of application log events in many enterprises. Also, the log events are streamed live to data processing applications in order to monitor the performance of the running applications on a real-time basis. This video demonstrates the real-time processing of log events using a Spark Streaming data processing application. 

A Log Event Processor
16:22
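
A minimal sketch in the spirit of this lecture, using the PySpark DStream API; the host, port, and log format are assumptions, not the course's actual setup.

    # Filter ERROR lines from a live text stream in 10-second batches.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "LogProcessor")
    ssc = StreamingContext(sc, 10)            # 10-second micro-batches

    lines = ssc.socketTextStream("localhost", 9999)
    lines.filter(lambda line: "ERROR" in line).pprint()

    ssc.start()
    ssc.awaitTermination()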

This video will introduce the different processing options you can pick in Spark to work smartly with any data. 

Windowed Data Processing and More Processing Options
07:26
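
A hedged sketch of a sliding window over the same kind of socket stream; the durations are illustrative and must be multiples of the batch interval.

    # Word counts over a 60-second window, recomputed every 20 seconds.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "Windowed")
    ssc = StreamingContext(sc, 10)
    ssc.checkpoint("checkpoint")   # required by the inverse-reduce form below

    words = ssc.socketTextStream("localhost", 9999).flatMap(lambda l: l.split())
    counts = words.map(lambda w: (w, 1)).reduceByKeyAndWindow(
        lambda a, b: a + b,        # add counts entering the window
        lambda a, b: a - b,        # subtract counts leaving the window
        windowDuration=60, slideDuration=20)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()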

Kafka is a publish-subscribe messaging system used by many IoT applications to process a huge number of messages. Let’s see how to use it! 

Kafka Stream Processing
10:43
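
A minimal sketch of consuming a Kafka topic with the Spark 2.x direct stream API (pyspark.streaming.kafka); the broker address and topic name are assumptions.

    # Print message bodies from a Kafka topic in 10-second batches.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext("local[2]", "KafkaDemo")
    ssc = StreamingContext(sc, 10)

    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "localhost:9092"})
    stream.map(lambda kv: kv[1]).pprint()   # records arrive as (key, value)

    ssc.start()
    ssc.awaitTermination()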

When a Spark Streaming application is processing the incoming data, it is very important to have an uninterrupted data processing capability so that all the data that is getting ingested is processed. This video will take you through those tasks that enable you to achieve this goal. 

Spark Streaming Jobs in Production
09:09
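
One technique in the spirit of this lecture is checkpoint-based recovery, so a restarted job resumes instead of losing state. A minimal sketch, with a hypothetical checkpoint path:

    # Rebuild or recover the StreamingContext from a checkpoint on restart.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    CHECKPOINT = "/tmp/stream-checkpoint"   # hypothetical path

    def create_context():
        sc = SparkContext(appName="ProdStream")
        ssc = StreamingContext(sc, 10)
        ssc.checkpoint(CHECKPOINT)
        ssc.socketTextStream("localhost", 9999).count().pprint()
        return ssc

    ssc = StreamingContext.getOrCreate(CHECKPOINT, create_context)
    ssc.start()
    ssc.awaitTermination()
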
Spark Machine Learning
5 Lectures 37:03

This video will teach you the basics of machine learning and help you understand Spark's ability to achieve the goals of machine learning in an efficient manner. 

Preview 06:22

By the end of this video, you will be able to perform predictions on large datasets such as the wine quality dataset, which is widely used in data analysis. 

Wine Quality Prediction and Model Persistence
10:43
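
A hedged sketch of the shape of such a model with spark.ml: assemble features, fit a regression, persist it. The file name and the choice of plain linear regression are assumptions; the UCI wine-quality CSV is semicolon-separated.

    # Train a regression on the wine-quality data and save the model.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.appName("WineQuality").getOrCreate()

    df = (spark.read.option("header", "true").option("sep", ";")
          .option("inferSchema", "true").csv("winequality-red.csv"))

    features = [c for c in df.columns if c != "quality"]
    data = (VectorAssembler(inputCols=features, outputCol="features")
            .transform(df)
            .select("features", df["quality"].alias("label")))

    model = LinearRegression().fit(data)
    model.save("wine-lr-model")  # reload with LinearRegressionModel.load(...)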

Let’s use Spark to perform Wine classification by using various algorithms.

Wine Classification
05:57

Spam filtering is a very common use case that is used in many applications. It is ubiquitous in e-mail applications. It is one of the most widely used classification problems. This video will enable you to deal with this problem and show you the best approach to resolve it in Spark.

Spam Filtering
07:07
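
A minimal sketch of one common approach (not necessarily the video's exact one): tokenize, hash term frequencies, and fit logistic regression in a Pipeline. The toy messages are made up.

    # Text classification: Tokenizer -> HashingTF -> LogisticRegression.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, HashingTF
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("SpamFilter").getOrCreate()

    train = spark.createDataFrame([
        ("win a free prize now", 1.0),
        ("meeting moved to noon", 0.0),
        ("cheap pills online", 1.0),
        ("lunch tomorrow?", 0.0)], ["text", "label"])

    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="text", outputCol="words"),
        HashingTF(inputCol="words", outputCol="features"),
        LogisticRegression(maxIter=10)])

    model = pipeline.fit(train)
    model.transform(train).select("text", "prediction").show(truncate=False)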

It is not very easy to get raw data into the appropriate form of features and labels for training a model. Through this video, you will be able to work with raw data and use it efficiently for processing. 

Feature Algorithms and Finding Synonyms
06:54
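
Finding synonyms in spark.ml is typically done with Word2Vec; a minimal sketch on a toy corpus (a real run needs far more text):

    # Learn word vectors, then query the nearest neighbors of a term.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import Word2Vec

    spark = SparkSession.builder.appName("Synonyms").getOrCreate()

    docs = spark.createDataFrame(
        [("spark processes big data fast".split(),),
         ("hadoop processes big data too".split(),)], ["words"])

    model = Word2Vec(vectorSize=16, minCount=1,
                     inputCol="words", outputCol="vec").fit(docs)
    model.findSynonyms("data", 2).show()
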
Spark Graph Processing
7 Lectures 47:41

Graphs are widely used in data analysis. Let’s explore some commonly used graphs and their usage. 

Preview 04:35

Many libraries are available in the open source world. Giraph, Pregel, GraphLab, and Spark GraphX are some of them. Spark GraphX is one of the recent entrants into this space. Let's dive into it! 

The Spark GraphX Library
10:08

Just like any other data structure, a graph also undergoes lots of changes because of the change in the underlying data. Let’s learn to process these changes. 

Graph Processing and Graph Structure Processing
09:44

Now that the basic graph processing fundamentals are in place, it is time to take up a real-world use case that uses graphs. Let's analyze a tennis tournament's results. 

Tennis Tournament Analysis
05:34

When you search the web with Google, pages that are ranked highly by its algorithm are displayed. In the context of graphs, if vertices are ranked with the same algorithm instead of web pages, lots of new inferences can be made. Let's jump right in and see how to do this. 

Applying PageRank Algorithm
03:30
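
GraphX's PageRank is a Scala API; from Python the same idea is reachable through the GraphFrames package, which the last lecture of this section introduces. A sketch with made-up vertices and edges (run with the graphframes package on the classpath):

    # Rank vertices of a tiny directed graph with PageRank.
    from pyspark.sql import SparkSession
    from graphframes import GraphFrame

    spark = SparkSession.builder.appName("PageRankDemo").getOrCreate()

    v = spark.createDataFrame(
        [("a", "Alice"), ("b", "Bob"), ("c", "Carol")], ["id", "name"])
    e = spark.createDataFrame(
        [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")], ["src", "dst"])

    ranks = GraphFrame(v, e).pageRank(resetProbability=0.15, maxIter=10)
    ranks.vertices.select("id", "pagerank").show()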

In a graph, finding a subgraph consisting of connected vertices is a very common requirement with tremendous applications. This video will enable you to find the connected vertices, making it easy for you to work on the given data. 

Connected Component Algorithm
04:39
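
In the same hedged GraphFrames form, connected components labels each vertex with a component id; the checkpoint directory is required by the implementation, and the path is illustrative.

    # Label each vertex with the id of its connected component.
    from pyspark.sql import SparkSession
    from graphframes import GraphFrame

    spark = SparkSession.builder.appName("ComponentsDemo").getOrCreate()
    spark.sparkContext.setCheckpointDir("/tmp/cc-checkpoint")

    v = spark.createDataFrame([("a",), ("b",), ("c",), ("d",)], ["id"])
    e = spark.createDataFrame([("a", "b"), ("c", "d")], ["src", "dst"])

    GraphFrame(v, e).connectedComponents().show()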

GraphFrames is a new graph processing library available as an external Spark package developed by Databricks. Through this video, you will learn the concepts and queries used in GraphFrames. 

Understanding GraphFrames and Its Queries
09:31
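
One kind of query GraphFrames adds is motif finding; a minimal sketch that looks for pairs of vertices connected in both directions, on made-up data:

    # Motif query: find mutual (bidirectional) edges.
    from pyspark.sql import SparkSession
    from graphframes import GraphFrame

    spark = SparkSession.builder.appName("MotifDemo").getOrCreate()

    v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
    e = spark.createDataFrame(
        [("a", "b", "follows"), ("b", "a", "follows")],
        ["src", "dst", "relationship"])

    GraphFrame(v, e).find("(x)-[]->(y); (y)-[]->(x)").show()
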
Designing Spark Applications
6 Lectures 41:32

Application architecture is very important for any kind of software development. Lambda Architecture is a recent and popular architecture that's ideal for developing data processing applications. Let’s dive into it! 

Preview 04:47

In recent years, the concept of microblogging has brought the general public into the culture of blogging. Let's see how we can build one and have fun! 

Micro Blogging with Lambda Architecture
07:13

Since the Lambda Architecture is a technology-agnostic architecture framework, when designing applications with it, it is imperative to capture the technology choices used in the specific implementations. This video does exactly that. 

Implementing Lambda Architecture and Working with Spark Applications
08:19

You may need to use different coding styles and perform data ingestion. This video will enhance your knowledge and enable you to implement these tasks with ease. 

Coding Style, Setting Up the Source Code, and Understanding Data Ingestion
09:09

This video will show you how to create the purposed views and queries discussed in the previous videos of this section. 

Generating Purposed Views and Queries
05:53

Let’s explore custom data processes with this video!

Understanding Custom Data Processes
06:11
About the Instructor
Packt Publishing
3.9 Average rating
7,241 Reviews
51,759 Students
616 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and how to put them to work.

With an extensive library of content - more than 4,000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource for becoming a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.