Master Apache Spark - Hands On!
4.5 (401 ratings)
Course Ratings are calculated from individual students’ ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
2,744 students enrolled

Learn how to slice and dice data using the next generation big data platform - Apache Spark!
Created by Imtiaz Ahmad
Last updated 8/2019
English
English [Auto]
This course includes
  • 7 hours on-demand video
  • 5 articles
  • 6 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Utilize the most powerful big data batch and stream processing engine to solve big data problems
  • Master the new Spark Java Datasets API to slice and dice big data in an efficient manner
  • Build, deploy and run Spark jobs on the cloud and benchmark performance on various hardware configurations
  • Optimize Spark clusters to work on big data efficiently and understand performance tuning
  • Transform structured and semi-structured data using Spark SQL, Dataframes and Datasets
  • Implement popular Machine Learning algorithms in Spark such as Linear Regression, Logistic Regression, and K-Means Clustering
Course content
31 lectures 06:54:56
+ Introduction
6 lectures 01:00:19
Spark High Level Components
04:04
Creating a Spark Maven Project
09:16
Import Source Code into Eclipse
05:51
First Spark Application
21:24
+ Spark Java Dataset API Basics
5 lectures 01:27:08
How to reduce logging in the console
00:27
Real World Dataframes Example
21:13
Union Dataframes and Other Set Transformations
29:15
Converting Between Datasets and Dataframes
14:31
+ Diving Deeper with Datasets, Dataframes, Transformations and the DAG
6 lectures 01:48:56
Using Datasets with User Defined POJOs
19:14
Using Datasets with Unstructured Textual Data
18:16
Joining Dataframes and Using Various Filter Transformations
23:29
Aggregation Transformations + Join Assignment
14:05
More on Transformations, Actions and the DAG
17:17
+ Running Spark Jobs on the Cloud
3 lectures 48:27
Using Spark to Analyze Reddit Comments
26:39
Running the Reddit Spark Application on an EMR Cluster
20:11
Instructions for Configuring a Spark Stand-alone Cluster
01:37
+ Spark Streaming Applications
3 lectures 42:22
Streaming Network Socket Example
21:40
Stock Market Files Streaming Example
06:23
Using Kafka with Spark Streaming
14:19
+ Machine Learning with Spark MLlib
8 lectures 01:07:44
Machine Learning Resources
00:48
Overview of Linear Regression
06:28
Spark Java Linear Regression Example
23:04
Overview of Logistic Regression
02:19
Spark Java Logistic Regression (Classification Algorithm)
16:03
Overview of K-Means Clustering
07:46
Spark Java K-Means Clustering Example
10:52
Get Access to All of my current and future courses!
00:23
Requirements
  • Some basic Java programming experience is required. A crash course on Java 8 lambdas is included
  • You will need a personal computer with an internet connection.
  • The software needed for this course is completely free, and I'll walk you through the steps to get it installed on your computer
Description


Apache Spark is the next generation batch and stream processing engine. It has been reported to run up to 100 times faster than Hadoop MapReduce on some in-memory workloads, and it makes distributed big data applications much easier to develop. Its demand has skyrocketed in recent years, and having this technology on your resume is truly a game changer. Over 3000 companies are using Spark in production right now and the list is growing very quickly! Some of the big names include Oracle, Hortonworks, Cisco, Verizon, Visa, Microsoft and Amazon, as well as most of the big world banks and financial institutions!

In this course you'll learn everything you need to know to use Apache Spark in your organization with its latest and greatest Java Datasets API. Below are some of the things you'll learn:

  • How to develop Spark Java Applications using Spark SQL Dataframes

  • Understand how the Spark Standalone cluster works behind the scenes

  • How to use various transformations to slice and dice your data in Spark Java

  • How to marshal/unmarshal Java domain objects (POJOs) while working with Spark Datasets

  • Master joins, filters and aggregations, and ingest data of various sizes and file formats (TXT, CSV, JSON etc.)

  • Analyze over 18 million real-world comments on Reddit to find the most trending words used

  • Develop programs using Spark Streaming for streaming stock market index files

  • Stream network sockets and messages queued on a Kafka cluster

  • Learn how to develop the most popular machine learning algorithms using Spark MLlib

  • Covers the most popular algorithms: Linear Regression, Logistic Regression and K-Means Clustering
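
To give a feel for the "trending words" analysis mentioned above, here is a minimal single-machine sketch written with plain Java 8 streams and lambdas. It is not Spark code and is not taken from the course; the class name `TrendingWords`, the method `topWords` and the sample comments are all made up for illustration. The course itself works with Spark Datasets across a cluster, but the shape of the job (split into words, group, count, sort) is similar.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class name; a local illustration only, not Spark code.
class TrendingWords {

    // Returns the `limit` most frequent words across all comments,
    // ordered by descending count, with ties broken alphabetically.
    static List<Map.Entry<String, Long>> topWords(List<String> comments, int limit) {
        Map<String, Long> counts = comments.stream()
                // split each comment into lowercase words on non-letter characters
                .flatMap(c -> Arrays.stream(c.toLowerCase().split("[^a-z]+")))
                // drop empty tokens produced by leading punctuation
                .filter(w -> !w.isEmpty())
                // group identical words and count each group
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for real comments
        List<String> comments = Arrays.asList(
                "Spark is fast",
                "I like Spark",
                "Fast and easy");
        topWords(comments, 2)
                .forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```

On a real cluster the same steps would be expressed as Dataset transformations so the work can be distributed; this sketch only shows the logic on an in-memory list.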


You'll develop over 15 practical Spark Java applications that crunch through real-world data, slicing and dicing it with several data transformation techniques. This course is especially important for people who would like to be hired as a Java developer or data engineer, because Spark is a hugely sought-after skill. We'll even go over how to set up a live cluster and configure Spark jobs to run on the cloud. You'll also learn about the practical implications of performance tuning and scaling out a cluster to work with big data, so you'll definitely be learning a ton in this course. This course has a 30-day money-back guarantee. You will have access to all of the code used in this course.


Who this course is for:
  • Anyone who is a Java developer and wants to add this seriously marketable technology to their resume
  • Anyone who wants to get into the data science field
  • Anyone who is interested in the world of big data
  • Anyone who wants to implement machine learning algorithms in Spark