Real-World Data Science with Spark 2
3.8 (13 ratings)
262 students enrolled
Address Big Data challenges with the fast and scalable features of Spark.
Created by Packt Publishing
Last updated 4/2017
Current price: $10 Original price: $200 Discount: 95% off
30-Day Money-Back Guarantee
  • 3.5 hours on-demand video
  • 13 Articles
  • 1 Supplemental Resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • An introduction to Big Data and data science
  • Get to know the fundamentals of Spark 2
  • Understand Spark and its ecosystem of packages in data science
  • Consolidate, clean, and transform your data acquired from various data sources
  • Unlock the capabilities of various Spark components to perform efficient data processing, machine learning, and graph processing
  • Dive deeper and explore various facets of data science with Spark
Requirements
  • A basic knowledge of statistics and computational mathematics
  • Prior knowledge of Python and Scala would be beneficial

Are you looking to expand your knowledge of performing data science operations in Spark? Are you a data scientist who wants to understand how algorithms are implemented in Spark, or a newcomer with minimal development experience who wants to learn about Big Data analytics? If yes, then this course is ideal for you. Let’s get on this data science journey together.

When people want a way to process Big Data at speed, Spark is often the solution. With its ease of development (in comparison to the relative complexity of Hadoop), it’s unsurprising that it’s becoming popular with data analysts and engineers everywhere. It is one of the most widely used large-scale data processing engines and is known for its speed.

The aim of the course is to make you comfortable and confident in performing real-time data processing using Spark.

What is included?

This course is meticulously designed and developed to give you the right and relevant information on Spark. However, I want to highlight that the road ahead may be bumpy on occasion, and some topics may be more challenging than others, but I hope that you will embrace this opportunity and focus on the reward. Remember that throughout this course, we will add many powerful techniques to your arsenal that will help you solve problems.

Let’s take a look at the learning journey. The course begins with the basics of Spark 2 and covers the core data processing framework and API, installation, and application development setup. Then, you’ll be introduced to the Spark programming model through real-world examples. Next, you’ll learn how to collect, clean, and visualize the data coming from Twitter with Spark Streaming. Then, you will get acquainted with Spark machine learning algorithms and different machine learning techniques. You will also learn to apply statistical analysis and mining operations to your dataset. The course will give you ideas on how to perform analysis, including graph processing. Finally, we will take up an end-to-end case study and apply all that we have learned so far.

By the end of the course, you should be able to put what you have learned into practice for faster, slicker Big Data projects.

Why should I choose this course?

Packt courses are very carefully designed to make sure that they're delivering the best learning experience possible. This course is a blend of text, videos, code examples, and quizzes, which together make your learning journey all the more exciting and truly rewarding. This helps you learn a range of topics at your own speed and also move towards your goal of learning the technology. We have prepared this course using extensive research and curation skills. Each section adds to the skills learned and helps you to achieve mastery of Spark.

This course is an amalgamation of sections that form a sequential flow of concepts covering a focused learning path presented in a modular manner. We have combined the best of the following Packt products:

  • Data Science with Spark by Eric Charles
  • Spark for Data Science by Bikramaditya Singhal and Srinivas Duvvuri
  • Apache Spark 2 for Beginners by Rajanarayanan Thottuvaikkatumana

Meet your expert instructors:

For this course, we have combined the best works of these extremely esteemed authors:

Eric Charles has 10 years of experience in the field of data science and is the founder of Datalayer, a social network for data scientists. He is passionate about using software and mathematics to help companies get insights from data.

Bikramaditya Singhal is a data scientist with about 7 years of industry experience. He is an expert in statistical analysis, predictive analytics, machine learning, Bitcoin, Blockchain, and programming in C, R, and Python. He has extensive experience in building scalable data analytics solutions in many industry sectors.

Srinivas Duvvuri is currently the senior vice president of development, heading the development teams for the fixed income suite of products at Broadridge Financial Solutions (India) Pvt Ltd. He also leads the Big Data and Data Science COE and is the principal member of the Broadridge India Technology Council.

Rajanarayanan Thottuvaikkatumana, Raj, is a seasoned technologist with more than 23 years of software development experience at various multinational companies. He has worked on various technologies including major databases, application development platforms, web technologies, and Big Data technologies.

Who is the target audience?
  • This course is for anyone who wants to work with Spark on large and complex datasets.
  • Data analysts, data scientists, and Big Data architects interested in exploring the data processing power of Apache Spark will find this course very useful.
Curriculum For This Course
55 Lectures
Big Data and Data Science
2 Lectures 11:11

An introduction to Big Data
The Spark Programming Model
5 Lectures 33:37

Install Spark on your laptop with Docker, or scale fast in the cloud

Apache Zeppelin, a web-based notebook for Spark with matplotlib and ggplot2


Test Your Knowledge
5 questions
Spark SQL and DataFrames
2 Lectures 24:10
Understanding the structure of data and the need for Spark SQL

The DataFrame API and its operations

Test Your Knowledge
2 questions
Data Analysis on Spark
4 Lectures 49:34
Data analytics life cycle

Basics of statistics

Descriptive statistics

Inferential statistics

Test Your Knowledge
5 questions
First Step with Spark Visualization
7 Lectures 37:07
Data visualization

Manipulating data with the core RDD API

Using DataFrame, dataset, and SQL – natural and easy!

Manipulating rows and columns

Dealing with file format

Visualizing more – ggplot2, matplotlib, and Angular.js to the rescue


Test Your Knowledge
2 questions
The Spark Machine Learning Algorithms
7 Lectures 31:32
An introduction to machine learning

Discovering and spark.mllib - and other libraries

Wrapping up basic statistics and linear algebra

Cleansing data and engineering the features

Reducing the dimensionality

Pipeline for a life


Test Your Knowledge
3 questions
Collecting and Cleansing the Dirty Tweets
4 Lectures 17:26
Streaming tweets to disk

Streaming tweets on a map

Cleansing and building your reference dataset

Querying and visualizing tweets with SQL
Statistical Analysis on Tweets
4 Lectures 17:23
Indicators, correlations, and sampling

Validating statistical relevance

Running SVD and PCA

Extending the basic statistics to your needs
Extracting Features from the Tweets
4 Lectures 17:24
Analyzing free text from the tweets

Dealing with stemming, syntax, idioms, and hashtags

Detecting tweet sentiment

Identifying topics with LDA
Mine Data and Share Results
4 Lectures 16:38
Word cloudify your dataset

Locating users and displaying heatmaps with GeoHash

Collaborating on the same note with peers

Create visual dashboards for your business stakeholders

Test Your Knowledge
2 questions
5 More Sections
About the Instructor
Packt Publishing
3.9 Average rating
8,197 Reviews
58,863 Students
687 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and at how to put them to work.

With an extensive library of content - more than 4000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you to develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource to make you a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.