Spark for Data Analysis in Scala
5.0 (2 ratings)
21 students enrolled

Spark, the new Data Analysis Library in Scala
Created by Packt Publishing
Last updated 6/2017
  • 2 hours on-demand video
  • 1 Supplemental Resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Learn to load your data in Spark
  • Work with and plot your data
  • Transform your data
  • Work with machine learning in Spark
  • This friendly course takes you through the tools Apache Spark offers for a standard data science workflow. It is packed with step-by-step instructions and working examples. The course is divided into clear, bite-size chunks so you can learn at your own pace and focus on the areas that interest you most.

Scala has emerged as an important tool for performing various data analysis tasks efficiently. This video will help you leverage popular Scala libraries and tools to perform core data analysis tasks with ease.

This course will give you everything that you need to perform data analysis with Scala libraries. You will master loading raw datasets with Spark and performing exploratory data analysis on them via plotting. Along the way you will learn what Spark has to offer when it comes to transforming datasets and how you can build a statistical model of a dataset with Spark.

About the Author

Anatolii Kmetiuk has been working with Scala-based technologies for four years. He has experience in Deep Learning models for text processing.

He is interested in Category Theory and Type-level programming in Scala. His other fields of interest are Chaos and Complexity Theory and Artificial Life, and ways to implement them in programming languages.

Who is the target audience?
  • Data scientists, data analysts, or Scala developers who want to learn how to perform data analysis using Scala will find this video useful. You need not be an expert in Scala programming; a fundamental understanding of the language will be sufficient.
Curriculum For This Course
12 Lectures
Setting Up the Environment
3 Lectures 21:07

This video provides an overview of the entire course.

Preview 03:45

We need a dataset to practice the skills learned in this course, so we download the House Prices dataset from Kaggle.

Downloading the Competition Dataset

Spark Notebook is a convenient environment for data analysis and reproducible research. We need to install it.
Installing Spark Notebook
Loading the Data
2 Lectures 20:29

Before loading the data, we need to understand how Spark represents and handles it. This video covers the theory.

Preview 07:48

Now that we know the theory, we need to actually see how to load the example dataset in Spark.
Loading CSV data into DataFrame
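
As a rough illustration of what loading CSV data into a DataFrame can look like, here is a minimal sketch. It is not taken from the course: the object name `LoadHousePrices`, the helper `loadCsv`, the file name `train.csv`, and the local master setting are all assumptions made for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of the course material.
object LoadHousePrices {
  // Reads a CSV file that has a header row into a DataFrame,
  // asking Spark to infer column types instead of treating everything as strings.
  def loadCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)

  def main(args: Array[String]): Unit = {
    // A local session for experimentation; Spark Notebook normally provides one for you.
    val spark = SparkSession.builder()
      .appName("load-house-prices")
      .master("local[*]")
      .getOrCreate()

    // "train.csv" is an assumed file name for the downloaded dataset.
    val houses = loadCsv(spark, "train.csv")
    houses.printSchema()
    houses.show(5)

    spark.stop()
  }
}
```

Schema inference costs an extra pass over the file, so for large inputs an explicit schema may be the better choice.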
Exploratory Data Analysis
2 Lectures 26:32
Before building a statistical model of a dataset, one must have some understanding of that dataset. This video provides tools to build a visual intuition about the data in the dataset.
Preview 10:49

Another way to draw insights from the data is to look at its statistical metrics. This video describes how to compute them with Spark.
Statistical Functions Supported by Spark
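
To give a feel for the kind of statistical metrics Spark can compute, here is a small sketch using `describe` and `stat.corr` on a toy DataFrame. The column names `area` and `price`, the toy values, and the object name `SummaryStats` are assumptions for illustration only, not the course's data or code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of the course material.
object SummaryStats {
  // Pearson correlation between two numeric columns.
  def correlation(df: DataFrame, a: String, b: String): Double =
    df.stat.corr(a, b)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("summary-stats")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for two numeric columns of a real dataset.
    val houses = Seq(
      (1000.0, 100000.0),
      (1500.0, 150000.0),
      (2000.0, 200000.0)
    ).toDF("area", "price")

    // Count, mean, standard deviation, min and max for the chosen columns.
    houses.describe("area", "price").show()

    println(correlation(houses, "area", "price"))

    spark.stop()
  }
}
```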
Data Processing in Spark
3 Lectures 31:24

We need to preprocess the data before feeding it to an ML algorithm. This video describes how to do that with standard SQL/Collections methods.

Preview 12:53

Spark SQL operations are powerful, but Spark ML supports some common ML operations out of the box. Learning them may greatly reduce the work to be done.

Feature Transformers
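
One commonly used feature transformer is `VectorAssembler`, which packs several numeric columns into the single vector column that Spark's ML algorithms expect. The sketch below is an illustration under assumptions: the column names, the toy rows, and the object name `AssembleFeatures` are made up, and the course may well demonstrate different transformers.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of the course material.
object AssembleFeatures {
  // Combines the given numeric columns into one vector column named "features".
  def assemble(df: DataFrame, inputCols: Array[String]): DataFrame =
    new VectorAssembler()
      .setInputCols(inputCols)
      .setOutputCol("features")
      .transform(df)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("assemble-features")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy rows; the column names are assumptions.
    val houses = Seq(
      (1000.0, 3.0, 100000.0),
      (1500.0, 4.0, 150000.0)
    ).toDF("area", "rooms", "price")

    assemble(houses, Array("area", "rooms")).show(truncate = false)

    spark.stop()
  }
}
```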

A particular kind of operation on data that is commonly used is slicing the features (taking a subset of them) based on a predicate.
Feature Selectors
Machine Learning in Spark with House Prices
2 Lectures 25:53

Before proceeding to concrete examples of using Spark ML, we need to understand its structure.

Preview 08:32

The result of data analysis is usually a model of the data in question. This video explains how to do data modeling with the ML algorithms that Spark has.
Algorithms: Linear Regression and Regression Trees
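
For orientation, fitting a linear regression with Spark's ML library might look roughly like the sketch below. It covers linear regression only, not regression trees, and everything specific in it is an assumption for illustration: the object name `FitLinearRegression`, the column names, and the toy data, which is not the House Prices dataset.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of the course material.
object FitLinearRegression {
  // Assembles the feature columns into a vector and fits a linear model to labelCol.
  def fit(df: DataFrame, featureCols: Array[String], labelCol: String): LinearRegressionModel = {
    val assembled = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")
      .transform(df)

    new LinearRegression()
      .setFeaturesCol("features")
      .setLabelCol(labelCol)
      .fit(assembled)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fit-linear-regression")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy data generated as price = 100 * area + 5000.
    val houses = Seq(
      (1000.0, 105000.0),
      (1500.0, 155000.0),
      (2000.0, 205000.0),
      (2500.0, 255000.0)
    ).toDF("area", "price")

    val model = fit(houses, Array("area"), "price")
    println(s"coefficients: ${model.coefficients}, intercept: ${model.intercept}")

    spark.stop()
  }
}
```

On a real dataset you would normally hold out part of the data and evaluate the fitted model on it before trusting the coefficients.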
About the Instructor
Packt Publishing
3.9 Average rating
7,282 Reviews
52,028 Students
616 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and at how to put them to work.

With an extensive library of content - more than 4000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource to make you a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.