Udemy

Taming Big Data with Apache Spark and Python - Hands On!

Apache Spark tutorial with 20+ hands-on examples of analyzing large data sets on your desktop or on Hadoop with Python!
Bestseller
Rating: 4.5 out of 5 (10,016 ratings)
56,605 students
Created by Sundog Education by Frank Kane, Frank Kane
Last updated 2/2021
English
English [Auto], French [Auto], 
30-Day Money-Back Guarantee

What you'll learn

  • Use DataFrames and Structured Streaming in Spark 3
  • Frame big data analysis problems as Spark problems
  • Use Amazon's Elastic MapReduce service to run your job on a cluster with Hadoop YARN
  • Install and run Apache Spark on a desktop computer or on a cluster
  • Use Spark's Resilient Distributed Datasets to process and analyze large data sets across many CPUs
  • Implement iterative algorithms such as breadth-first-search using Spark
  • Use the MLLib machine learning library to answer common data mining questions
  • Understand how Spark SQL lets you work with structured data
  • Understand how Spark Streaming lets you process continuous streams of data in real time
  • Tune and troubleshoot large jobs running on a cluster
  • Share information between nodes on a Spark cluster using broadcast variables and accumulators
  • Understand how the GraphX library helps with network analysis problems
Curated for the Udemy for Business collection

Course content

8 sections • 64 lectures • 6h 54m total length

  • Preview01:46
  • How to Use This Course
    01:41
  • Udemy 101: Getting the Most From This Course
    02:10
  • [Activity] Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies.
    14:42
  • Alternate MovieLens download location
    00:05
  • [Activity] Installing the MovieLens Movie Rating Dataset
    03:35
  • Preview06:12

  • Preview06:48
  • Introduction to Spark
    10:11
  • The Resilient Distributed Dataset (RDD)
    12:35
  • Ratings Histogram Walkthrough
    13:27
  • Key/Value RDD's, and the Average Friends by Age Example
    16:08
  • Preview05:40
  • Filtering RDD's, and the Minimum Temperature by Location Example
    08:11
  • [Activity] Running the Minimum Temperature Example, and Modifying it for Maximums
    05:06
  • [Activity] Running the Maximum Temperature by Location Example
    03:19
  • [Activity] Counting Word Occurrences using flatMap()
    07:24
  • [Activity] Improving the Word Count Script with Regular Expressions
    04:42
  • Preview07:46
  • [Exercise] Find the Total Amount Spent by Customer
    04:01
  • [Exercise] Check your Results, and Now Sort them by Total Amount Spent.
    05:09
  • Check Your Sorted Implementation and Results Against Mine.
    02:44

  • Introducing SparkSQL
    09:29
  • [Activity] Executing SQL commands and SQL-style functions on a DataFrame
    07:51
  • Preview07:39
  • [Exercise] Friends by Age, with DataFrames
    01:45
  • Exercise Solution: Friends by Age, with DataFrames
    07:54
  • [Activity] Word Count, with DataFrames
    09:37
  • [Activity] Minimum Temperature, with DataFrames (using a custom schema)
    10:27
  • [Exercise] Implement Total Spent by Customer with DataFrames
    02:08
  • Exercise Solution: Total Spent by Customer, with DataFrames
    04:07

  • [Activity] Find the Most Popular Movie
    04:16
  • [Activity] Use Broadcast Variables to Display Movie Names Instead of ID Numbers
    10:34
  • Preview03:15
  • [Activity] Run the Script - Discover Who the Most Popular Superhero is!
    08:00
  • [Exercise] Find the Most Obscure Superheroes
    02:16
  • Exercise Solution: Most Obscure Superheroes
    04:13
  • Superhero Degrees of Separation: Introducing Breadth-First Search
    07:56
  • Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
    06:44
  • [Activity] Superhero Degrees of Separation: Review the Code and Run it
    09:35
  • Item-Based Collaborative Filtering in Spark, cache(), and persist()
    06:00
  • Preview13:43
  • [Exercise] Improve the Quality of Similar Movies
    03:05

  • Introducing Elastic MapReduce
    05:09
  • [Activity] Setting up your AWS / Elastic MapReduce Account and Setting Up PuTTY
    09:58
  • Partitioning
    04:21
  • Create Similar Movies from One Million Ratings - Part 1
    05:10
  • Create Similar Movies from One Million Ratings - Part 2
    11:26
  • Create Similar Movies from One Million Ratings - Part 3
    03:30
  • Troubleshooting Spark on a Cluster
    03:43
  • More Troubleshooting, and Managing Dependencies
    06:02

  • Introducing MLLib
    06:04
  • [Activity] Using Spark ML to Produce Movie Recommendations
    09:54
  • Analyzing the ALS Recommendations Results
    04:12
  • Preview13:25
  • [Exercise] Using Decision Trees in Spark ML to Predict Real Estate Prices
    05:34
  • Exercise Solution: Decision Trees with Spark
    06:19

  • Spark Streaming
    08:04
  • Preview08:47
  • [Exercise] Use Windows with Structured Streaming to Track Most-Viewed URL's
    05:49
  • Exercise Solution: Using Structured Streaming with Windows
    06:37
  • GraphX
    02:11

  • Learning More about Spark and Data Science
    03:43
  • Bonus Lecture: More courses to explore!
    00:44

Requirements

  • Access to a personal computer. This course uses Windows, but the sample code will work fine on Linux as well.
  • Some prior programming or scripting experience. Python experience will help a lot, but you can pick it up as we go.

Description

New! Updated for Spark 3, more hands-on exercises, and a stronger focus on DataFrames and Structured Streaming.

"Big data" analysis is a hot and highly valuable skill – and this course will teach you the hottest technology in big data: Apache Spark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You'll learn those same techniques, using your own Windows system right at home. It's easier than you might think.

Learn and master the art of framing data analysis problems as Spark problems through over 20 hands-on examples, and then scale them up to run on cloud computing services in this course. You'll be learning from an ex-engineer and senior manager from Amazon and IMDb.

  • Learn the concepts of Spark's DataFrames and Resilient Distributed Datasets

  • Develop and run Spark jobs quickly using Python

  • Translate complex analysis problems into iterative or multi-stage Spark scripts

  • Scale up to larger data sets using Amazon's Elastic MapReduce service

  • Understand how Hadoop YARN distributes Spark across computing clusters

  • Learn about other Spark technologies, like Spark SQL, Spark Streaming, and GraphX
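
To give a flavor of what "framing a problem as a Spark problem" can look like, here is a minimal plain-Python sketch of the map and reduce-by-key pattern behind an example like "Average Friends by Age" from the course outline. It is not the course's code and deliberately does not use the Spark API; the records and the function name average_by_key are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (age, number_of_friends) records; not data from the course.
records = [(33, 385), (33, 2), (55, 221), (40, 465), (55, 100)]


def average_by_key(pairs):
    """Average the values for each key using a map step and a reduce-by-key step."""
    # Map step: turn each value into a (sum, count) pair so partial
    # results for the same key can be merged in any order.
    mapped = [(key, (value, 1)) for key, value in pairs]

    # Reduce-by-key step: merge the (sum, count) pairs that share a key.
    totals = defaultdict(lambda: (0, 0))
    for key, (value, count) in mapped:
        running_sum, running_count = totals[key]
        totals[key] = (running_sum + value, running_count + count)

    # Final map step: divide each sum by its count.
    return {key: total / count for key, (total, count) in totals.items()}


averages = average_by_key(records)
for age, average in sorted(averages.items()):
    print(age, average)
```

Carrying a (sum, count) pair instead of a running average is the design choice that matters here: pairs can be merged independently on different machines, which is what makes this kind of aggregation suitable for a cluster.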

By the end of this course, you'll be running code that analyzes gigabytes' worth of information – in the cloud – in a matter of minutes.

This course uses the familiar Python programming language; if you'd rather use Scala to get the best performance out of Spark, see my "Apache Spark with Scala - Hands On with Big Data" course instead.

We'll have some fun along the way. You'll get warmed up with some simple examples of using Spark to analyze movie ratings data and text in a book. Once you've got the basics under your belt, we'll move to some more complex and interesting tasks. We'll use a million movie ratings to find movies that are similar to each other, and you may even discover some new movies you like in the process! We'll analyze a social graph of superheroes, and learn who the most "popular" superhero is – and develop a system to find "degrees of separation" between superheroes. Are all Marvel superheroes within a few degrees of being connected to The Incredible Hulk? You'll find the answer.
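
The "degrees of separation" idea can be sketched with an ordinary breadth-first search. The following is a small single-machine illustration in plain Python, assuming a made-up graph; it is not the course's Spark implementation, and the node names and the function degrees_of_separation are hypothetical.

```python
from collections import deque

# Hypothetical connection graph; the names and links are invented for illustration.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
    "F": [],  # not connected to anyone
}


def degrees_of_separation(graph, start, target):
    """Return the fewest hops from start to target, or None if unreachable."""
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])  # (node, distance from start)
    while queue:
        node, distance = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == target:
                return distance + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


print(degrees_of_separation(graph, "A", "E"))
print(degrees_of_separation(graph, "A", "F"))
```

Because the search expands the graph one level at a time, the first time the target is reached is guaranteed to be along a shortest path. That level-by-level structure is also what makes the algorithm a natural fit for an iterative job on a cluster.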

This course is very hands-on; you'll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon's Elastic MapReduce service. 7 hours of video content is included, with over 20 real examples of increasing complexity you can build, run and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Spark-based technologies, including Spark SQL, Spark Streaming, and GraphX.

Wrangling big data with Apache Spark is an important skill in today's technical world. Enroll now!

  • "I studied "Taming Big Data with Apache Spark and Python" with Frank Kane, and helped me build a great platform for Big Data as a Service for my company. I recommend the course!" - Cleuton Sampaio De Melo Jr.

Who this course is for:

  • People with some software development background who want to learn the hottest technology in big data analysis will want to check this out. This course focuses on Spark from a software development standpoint; we introduce some machine learning and data mining concepts along the way, but that's not the focus. If you want to learn how to use Spark to carve up huge datasets and extract meaning from them, then this course is for you.
  • If you've never written a computer program or a script before, this course isn't for you - yet. I suggest starting with a Python course first, if programming is new to you.
  • If your software development job involves, or will involve, processing large amounts of data, you need to know about Spark.
  • If you're training for a new career in data science or big data, Spark is an important part of it.

Featured review

Amiri McCain
70 courses
20 reviews
Rating: 5.0 out of 5 • 12 months ago
Great course to get you going with Apache Spark and Python! Frank's delivery is very thorough yet unpretentious; his explanations for each new concept that he introduces are down to earth and easy to follow. I took a similar course as part of Udacity's Data Engineering Nanodegree program and I am glad that I took Frank's course beforehand because Udacity's way of explaining it and the lessons they provided were unclear and confusing.

Instructors

Sundog Education by Frank Kane
Founder, Sundog Education. Machine Learning Pro
  • 4.5 Instructor Rating
  • 98,353 Reviews
  • 441,886 Students
  • 22 Courses

Sundog Education's mission is to make highly valuable career skills in big data, data science, and machine learning accessible to everyone in the world. Our consortium of expert instructors shares our knowledge in these emerging fields with you, at prices anyone can afford. 

Sundog Education is led by Frank Kane and owned by Frank's company, Sundog Software LLC. Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers, all the time. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.

Due to our volume of students we are unable to respond to private messages; please post your questions within the Q&A of your course. Thanks for understanding.

Frank Kane
Founder, Sundog Education
  • 4.5 Instructor Rating
  • 94,896 Reviews
  • 397,621 Students
  • 14 Courses

Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers, all the time. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.

Due to our volume of students, I am unable to respond to private messages; please post your questions within the Q&A of your course. Thanks for understanding.

© 2021 Udemy, Inc.