Udemy
2021-02-11 19:14:29
30-Day Money-Back Guarantee

Apache Spark for Java Developers

Get processing Big Data using RDDs, DataFrames, SparkSQL and Machine Learning - and real time streaming with Kafka!
Bestseller
Rating: 4.6 out of 5 (1,614 ratings)
9,344 students
Created by Richard Chesterwood, Matt Greencroft, Virtual Pair Programmers
Last updated 2/2021
English
English [Auto]

What you'll learn

  • Use functional style Java to define complex data processing jobs
  • Learn the differences between the RDD and DataFrame APIs
  • Use an SQL style syntax to produce reports against Big Data sets
  • Use Machine Learning Algorithms with Big Data and SparkML
  • Connect Spark to Apache Kafka to process Streams of Big Data
  • See how Structured Streaming can be used to build pipelines with Kafka
Curated for the Udemy for Business collection

Requirements

  • Java 8 is required for the course. Spark 2 does not support Java 9+, and you need Java 8 for the functional lambda syntax
  • Previous knowledge of Java is assumed, but anything above the basics is explained
  • Some previous SQL will be useful for part of the course, but if you've never used it before this will be a good first experience

Description

Get started with the amazing Apache Spark parallel computing framework - this course is designed especially for Java Developers.

If you're new to Data Science and want to find out how massive datasets are processed in parallel, then the Java API for Spark is a great way to get started, fast.

All of the fundamentals you need to understand the main operations you can perform in Spark Core, SparkSQL and DataFrames are covered in detail, with easy-to-follow examples. You'll be able to follow along with all of the examples and run them on your own local development computer.
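As a rough, Spark-free analogy for the kind of key-based aggregation the course content lists (for example "Coding a ReduceByKey"), the sketch below counts log lines per level using only the Java 8 standard library. It does not use the Spark API; the class name `CountByKeySketch`, the method `countByLevel` and the sample log lines are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain Java 8 only: a local analogy for reducing values by key, not Spark code.
final class CountByKeySketch {

    // Take the text before the first ':' as the key, pair it with 1,
    // and merge the counts of equal keys by adding them together.
    static Map<String, Long> countByLevel(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(":")[0])
                .collect(Collectors.toMap(
                        level -> level,   // key: the log level
                        level -> 1L,      // value: one occurrence
                        Long::sum,        // merge step: add counts for the same key
                        TreeMap::new));   // sorted map so printed output is stable
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<String> lines = Arrays.asList(
                "WARN: disk full", "INFO: started", "WARN: cpu high");
        System.out.println(countByLevel(lines));
    }
}
```

The merge function is the part that corresponds to a reduce step: it only ever sees two values for the same key, which is what lets such a computation be split across many machines in a real cluster.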

Included with the course is a module covering SparkML, an exciting addition to Spark that allows you to apply Machine Learning models to your Big Data! No mathematical experience is necessary!

And finally, there's a full 3-hour module covering Spark Streaming, where you will get hands-on experience of integrating Spark with Apache Kafka to handle real-time big data streams. We use both the DStream and the Structured Streaming APIs.


Optionally, if you have an AWS account, you'll see how to deploy your work to a live EMR (Elastic MapReduce) hardware cluster. If you're not familiar with AWS you can skip this video, though it's still worth watching even if you don't follow along with the coding.

You'll be going deep into the internals of Spark and you'll find out how it optimizes your execution plans. We'll be comparing the performance of RDDs vs SparkSQL, and you'll learn how to avoid the major performance pitfalls, which could save a lot of money on live projects.

Throughout the course, you'll be getting some great practice with Java Lambdas - a great way to learn functional-style Java if you're new to it.
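For readers new to the functional style mentioned above, here is a minimal sketch of lambdas used for mapping, filtering and reducing. It uses only `java.util.stream` from the Java 8 standard library, not Spark; the class `LambdaSketch`, its method names and the sample values are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain Java 8 only; no Spark classes are used here.
final class LambdaSketch {

    // Map each value to its square with a lambda, then reduce the squares to one sum.
    static int sumOfSquares(List<Integer> values) {
        return values.stream()
                .map(v -> v * v)
                .reduce(0, (a, b) -> a + b);
    }

    // Keep only the lines that contain the given keyword.
    static List<String> linesContaining(List<String> lines, String keyword) {
        return lines.stream()
                .filter(line -> line.contains(keyword))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4)));
        System.out.println(linesContaining(
                Arrays.asList("WARN: disk", "INFO: ok", "WARN: cpu"), "WARN"));
    }
}
```

Each lambda is a small function passed as an argument, so the pipeline describes what to compute rather than how to loop over the data.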



Who this course is for:

  • Anyone who already knows Java and would like to explore Apache Spark
  • Anyone new to Data Science who wants a fast way to get started, without learning Python, Scala or R!

Course content

46 sections • 143 lectures • 21h 43m total length

  • Preview 03:21
  • Downloading the Code
    00:11
  • Preview 05:12
  • Spark Architecture and RDDs
    Preview 10:43

  • Warning - Java 9+ is not supported by Spark 2. You can optionally use Spark 3.
    01:28
  • Installing Spark
    Preview 21:02

  • Reduces on RDDs
    Preview 13:22

  • Mapping Operations
    07:00
  • Outputting Results to the Console
    04:45
  • Counting Big Data Items
    06:17
  • If you've had a "NotSerializableException" in Spark
    05:53

  • RDDs of Objects
    08:05
  • Tuples and RDDs
    10:01

  • Overview of PairRDDs
    08:46
  • Building a PairRDD
    09:11
  • Coding a ReduceByKey
    11:29
  • Using the Fluent API
    06:45
  • Grouping By Key
    05:08

  • FlatMaps
    09:46
  • Filters
    04:55

  • Reading from Disk
    13:19

  • Practical Requirements
    11:35
  • Worked Solution
    15:15
  • Worked Solution (continued) with Sorting
    14:35

  • Why do sorts not work with foreach in Spark?
    10:31
  • Why Coalesce is the Wrong Solution
    14:18
  • What is Coalesce used for in Spark?
    04:44

Instructors

Richard Chesterwood
Software developer at VirtualPairProgrammers
  • 4.7 Instructor Rating
  • 9,781 Reviews
  • 69,175 Students
  • 4 Courses

Richard has been developing software for the past 25 years and has a particular fondness for the JVM ecosystem. For the last 15 years he's delivered training courses to projects around the world, and was one of the founders of VirtualPairProgrammers.

His main field of interest is in the DevOps area, managing several large scale projects in the cloud.

Matt Greencroft
Course tutor at Virtual Pair Programmers
  • 4.6 Instructor Rating
  • 6,002 Reviews
  • 49,521 Students
  • 10 Courses

Having worked for over 20 years as a professional programmer, mainly in banking, Matt now teaches for Virtual Pair Programmers. His specialist areas are JavaEE, Android, Hadoop and NoSQL. Matt's currently working on a Clojure project, which he finds an enjoyable challenge!

Outside of work, Matt enjoys cycling, but prefers going downhill to uphill, and he also plays the piano… very badly.

Virtual Pair Programmers
  • 4.6 Instructor Rating
  • 11,648 Reviews
  • 76,063 Students
  • 12 Courses

Virtual Pair Programmers are here to help you take your programming skills to the next level. We're a group of enthusiastic software trainers who are all professional developers, and have a really practical approach to learning - our courses are full of real-world case studies and hands-on examples. We teach what you need to know to be productive in the workplace and to get the job done, rather than going through each feature turn by turn.

© 2021 Udemy, Inc.