Spark and Python for Big Data with PySpark
4.5 (575 ratings)
Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.
5,016 students enrolled

Learn how to use Spark with Python, including Spark Streaming, Machine Learning, Spark 2.0 DataFrames and more!
Best Seller
Created by Jose Portilla
Last updated 7/2017
English [Auto-generated]
30-Day Money-Back Guarantee
  • 10.5 hours on-demand video
  • 6 Articles
  • 3 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Use Python and Spark together to analyze Big Data
  • Learn how to use the new Spark 2.0 DataFrame Syntax
  • Work on Consulting Projects that mimic real-world situations!
  • Classify Customer Churn with Logistic Regression
  • Use Spark with Random Forests for Classification
  • Learn how to use Spark's Gradient Boosted Trees
  • Use Spark's MLlib to create Powerful Machine Learning Models
  • Learn about the Databricks Platform!
  • Get set up on Amazon Web Services EC2 for Big Data Analysis
  • Learn how to use AWS Elastic MapReduce Service!
  • Learn how to leverage the power of Linux with a Spark Environment!
  • Create a Spam filter using Spark and Natural Language Processing!
  • Use Spark Streaming to Analyze Tweets in Real Time!
Requirements
  • General Programming Skills in any Language (Preferably Python)
  • 20 GB of free space on your local computer (or alternatively a strong internet connection for AWS)

Learn the latest Big Data Technology - Spark! And learn to use it with one of the most popular programming languages, Python!

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! The top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems!

Spark can run in-memory workloads up to 100x faster than Hadoop MapReduce, which has caused an explosion in demand for this skill! Because the Spark 2.0 DataFrame framework is so new, you can quickly become one of the most knowledgeable people in the job market!

This course teaches the basics with a crash course in Python, then moves on to using Spark DataFrames with the latest Spark 2.0 syntax! Once we've done that, we'll go through how to use the MLlib Machine Learning Library with the DataFrame syntax and Spark. All along the way you'll have exercises and Mock Consulting Projects that put you right into a real-world situation where you need to use your new skills to solve a real problem!

We also cover the latest Spark Technologies, like Spark SQL, Spark Streaming, and advanced models like Gradient Boosted Trees! After you complete this course you will feel comfortable putting Spark and PySpark on your resume! This course also has a full 30-day money-back guarantee and comes with a LinkedIn Certificate of Completion!

If you're ready to jump into the world of Python, Spark, and Big Data, this is the course for you!

Who is the target audience?
  • Someone who knows Python and would like to learn how to use it for Big Data
  • Someone who is very familiar with another programming language and needs to learn Spark
Curriculum For This Course
65 Lectures
Introduction to Course
4 Lectures 30:12

Course Overview

Frequently Asked Questions

What is Spark? Why Python?
Setting up Python with Spark
2 Lectures 06:11

Let's explain the set-up for the course!

Set-up Overview

Note on Installation Sections
Local VirtualBox Set-up
3 Lectures 31:09

Let's walk through the local installation of Ubuntu

Local Installation VirtualBox Part 1

Local Installation VirtualBox Part 2

Setting up PySpark
AWS EC2 PySpark Set-up
4 Lectures 38:58

Let's show you how to use Amazon Web Services' EC2 Instances for Spark!

AWS EC2 Set-up Guide

Creating the EC2 Instance

SSH with Mac or Linux

Installations on EC2
Databricks Setup
1 Lecture 11:41
Databricks Setup
AWS EMR Cluster Setup
1 Lecture 17:16
Python Crash Course
7 Lectures 58:50
Introduction to Python Crash Course

Jupyter Notebook Overview

Python Crash Course Part One

Python Crash Course Part Two

Python Crash Course Part Three

Python Crash Course Exercises

Python Crash Course Exercise Solutions
Spark DataFrame Basics
7 Lectures 01:04:52
Introduction to Spark DataFrames

Learn the basics of Spark DataFrames!

Spark DataFrame Basics

Spark DataFrame Basics Part Two

Learn some basic operations with Spark 2.0

Spark DataFrame Basic Operations

Groupby and Aggregate Operations

Dates and Timestamps
Spark DataFrame Project Exercise
2 Lectures 20:06
DataFrame Project Exercise

DataFrame Project Exercise Solutions
Introduction to Machine Learning with MLlib
2 Lectures 19:25
Introduction to Machine Learning and ISLR

Machine Learning with Spark and Python with MLlib
8 More Sections
About the Instructor
Jose Portilla
4.5 Average rating
54,139 Reviews
258,833 Students
13 Courses
Data Scientist

Jose Marcial Portilla has a BS and MS in Mechanical Engineering from Santa Clara University and years of experience as a professional instructor and trainer for Data Science and programming. He has publications and patents in various fields such as microfluidics, materials science, and data science technologies. Over the course of his career he has developed a skill set in analyzing data, and he hopes to use his experience in teaching and data science to help other people learn the power of programming and the ability to analyze data, as well as to present the data in clear and beautiful visualizations. Currently he works as the Head of Data Science for Pierian Data Inc. and provides in-person data science and Python programming training courses to employees working at top companies, including General Electric, Cigna, The New York Times, Credit Suisse, and many more. Feel free to contact him on LinkedIn for more information on in-person training sessions.