Apache Spark : Best Practices for High Performance
Explore how to improve your Spark queries to achieve low latency and high throughput in your applications
2.3 (4 ratings)
25 students enrolled
Created by Ashok M
Last updated 5/2017
English
Current price: $10 Original price: $25 Discount: 60% off
30-Day Money-Back Guarantee
Includes:
  • 3 hours on-demand video
  • 1 Article
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • You will learn the best practices to follow in Spark
  • Features of Spark 2.0
  • How to improve the performance of Spark SQL joins
  • How to improve the performance of Spark programs
  • Areas to consider to avoid Out of Memory exceptions
Requirements
  • You should have a basic understanding of SQL
  • You should have basic knowledge of Big Data
  • You should have basic knowledge of Spark
Description

Apache Spark is an open-source framework that provides highly general methods for processing data in parallel. On its own, Spark is not a data storage solution. Spark can be run locally, on a single machine with a single JVM (called local mode). More often, Spark is used in tandem with a distributed storage system (such as HDFS, Cassandra, or S3) to which it writes the data it processes, and a cluster manager to manage the distribution of the application across the cluster. Spark currently supports three kinds of cluster managers: the manager included in Spark, called the Standalone Cluster Manager, which requires Spark to be installed on each node of the cluster; Apache Mesos; and Hadoop YARN.
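
As a rough illustration of local mode versus running under a cluster manager, here is a minimal Scala sketch. It assumes Spark 2.x with the spark-sql dependency on the classpath; the application name and the sample job are illustrative and not taken from the course.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object LocalModeSketch {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs the whole application inside a single JVM using all
    // available cores (local mode). On a cluster you would omit .master here
    // and pass --master spark://host:7077, mesos://..., or yarn to spark-submit.
    val spark = SparkSession.builder()
      .appName("local-mode-sketch")
      .master("local[*]")
      .getOrCreate()

    // A tiny job just to prove the session works: sum the numbers 1..1,000,000.
    val total = spark.range(1, 1000001)
      .agg(sum("id"))
      .first()
      .getLong(0)
    println(s"Sum = $total")

    spark.stop()
  }
}
```

Leaving the master out of the code and supplying it via spark-submit keeps the same jar usable in local mode and on any of the three cluster managers.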

Various components of Spark (a short sketch contrasting Spark Core and Spark SQL follows this list):
  • Spark Core
  • Spark SQL
  • Spark Streaming
  • Spark MLlib
  • Spark GraphX
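
To make the first two components concrete, here is a small Scala sketch contrasting a Spark Core (RDD) aggregation with the equivalent Spark SQL (DataFrame) aggregation, plus the broadcast-join hint covered later in the curriculum. The data and names are made up for illustration, and it assumes the same Spark 2.x setup as the previous sketch.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object CoreVsSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("core-vs-sql-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Spark Core: word counting on an RDD. reduceByKey combines values within
    // each partition before shuffling, which is why it usually beats groupByKey.
    val counts = spark.sparkContext
      .parallelize(Seq("a", "b", "a", "c", "b", "a"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)

    // Spark SQL: the same aggregation expressed as a DataFrame operation,
    // which lets the Catalyst optimizer plan the execution.
    Seq("a", "b", "a", "c", "b", "a").toDF("word")
      .groupBy("word")
      .count()
      .show()

    // Broadcast hash join: hint Spark to ship the small table to every executor
    // so the large side is joined without a shuffle (illustrative data only).
    val large = Seq((1, "x"), (2, "y"), (3, "z")).toDF("id", "payload")
    val small = Seq((1, "dim1"), (2, "dim2")).toDF("id", "label")
    large.join(broadcast(small), "id").show()

    spark.stop()
  }
}
```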

Who is the target audience?
  • This course is for anyone who is working on Big Data
  • This is for all Architects
  • This is for all Managers
  • This is for all students who are interested in Spark
Curriculum For This Course
24 Lectures
02:46:45
Introduction
3 Lectures 16:06
Spark 1.6 vs Spark 2.0
2 Lectures 10:00
High Performance of Apache Spark
13 Lectures 59:00

Performance: reduceByKey vs groupByKey vs DataFrame
03:27

Avoiding garbage collection overhead in Spark to get more performance
04:44

How fast can Spark 1.6 sum up 1 billion numbers
01:23

How fast can Spark 2.0 sum 1 billion numbers
01:44

How fast can Spark 1.6 join 1 billion records
01:11

How fast can Spark 2.0 join 1 billion records
01:32

Broadcast Hash Join to speed up joins in Spark
03:59

Areas to consider to avoid Out of Memory issues
03:19

Why Spark RDD is immutable
01:05

Spark with In-Memory Data Grid Use Case
09:37

Top 5 Considerations in Production - Part 1
08:55

Top 5 Considerations in Production - Part 2
14:22
Spark with Flume Integration
4 Lectures 01:05:57
Flume Overview
13:01

Flume Use Cases
25:38

Flume Use Cases - Part 2
14:00

Spark with Flume Integration DEMO
13:18
Google Cloud Platform
2 Lectures 15:41
Processing billions of records in GCP
08:41

Clickstream Data Processing Patterns using MapReduce
07:00
About the Instructor
Ashok M
2.4 Average rating
61 Reviews
337 Students
29 Courses
Architect

I am Reddy, and I have 10 years of IT experience. For the last 4 years I have been working on Big Data.
From a Big Data perspective, I have working experience with Kafka, Spark, HBase, Cassandra, and Hive.
I also have working experience with AWS and Java technologies.

I have experience in designing and implementing Lambda Architecture solutions in Big Data.

I have experience working with REST APIs and have worked in various domains such as finance, insurance, and manufacturing.

I am passionate about new technologies.


BigDataTechnologies is an online training provider with many experienced lecturers who provide excellent training.

BigDataTechnologies has extensive experience in providing training for Java, AWS, iPhone, MapReduce, Hive, Pig, HBase, Cassandra, MongoDB, Spark, Storm, and Kafka.

We cover everything from skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges.

Our main objective is to provide high-quality content to all students.