Udemy
30-Day Money-Back Guarantee

SGLearn@From 0 to 1 : Spark for Data Science with Python

This is an Adapted Course for Singaporeans picking up new skillsets and competencies under the CITREP+ Scheme.
Rating: 4.6 out of 5 (6 ratings)
13 students
Created by DioPACT SG
Last updated 6/2017
English

What you'll learn

  • Use Spark for a variety of analytics and Machine Learning tasks
  • Implement complex algorithms like PageRank or Music Recommendations
  • Work with a variety of datasets from Airline delays to Twitter, Web graphs, Social networks and Product Ratings
  • Use all the different features and libraries of Spark: RDDs, Dataframes, Spark SQL, MLlib, Spark Streaming and GraphX

Course content

12 sections • 55 lectures • 8h 39m total length

  • Preview
    02:15

  • Preview
    08:45
  • Why is Spark so cool?
    12:23
  • An introduction to RDDs - Resilient Distributed Datasets
    09:39
  • Built-in libraries for Spark
    15:37
  • Installing Spark
    06:42
  • The PySpark Shell
    04:50
  • Transformations and Actions
    13:33
  • See it in Action : Munging Airlines Data with PySpark - I
    10:13
  • [For Linux/Mac OS Shell Newbies] Path and other Environment Variables
    08:25

  • Preview
    12:35
  • RDD Characteristics: Lineage, RDDs know where they came from
    06:06
  • What can you do with RDDs?
    11:08
  • Create your first RDD from a file
    16:10
  • Average distance travelled by a flight using map() and reduce() operations
    05:50
  • Get delayed flights using filter(), cache data using persist()
    05:23
  • Average flight delay in one-step using aggregate()
    15:10
  • Frequency histogram of delays using countByValue()
    03:26
  • See it in Action : Analyzing Airlines Data with PySpark - II
    06:25

  • Preview
    14:45
  • Average delay per airport, use reduceByKey(), mapValues() and join()
    18:11
  • Average delay per airport in one step using combineByKey()
    11:53
  • Get the top airports by delay using sortBy()
    04:34
  • Lookup airport descriptions using lookup(), collectAsMap(), broadcast()
    14:03
  • See it in Action : Analyzing Airlines Data with PySpark - III
    04:58

  • Get information from individual processing nodes using accumulators
    13:35
  • See it in Action : Using an Accumulator variable
    02:40
  • Long running programs using spark-submit
    05:58
  • See it in Action : Running a Python script with Spark-Submit
    03:58
  • Behind the scenes: What happens when a Spark script runs?
    14:30
  • Running MapReduce operations
    13:44
  • See it in Action : MapReduce with Spark
    02:05

  • Preview
    15:58
  • Pair RDDs in Java
    04:49
  • Running Java code
    03:49
  • Installing Maven
    02:20
  • See it in Action : Running a Spark Job with Java
    05:08

  • Preview
    16:44
  • The PageRank algorithm
    06:15
  • Implement PageRank in Spark
    12:01
  • Join optimization in PageRank using Custom Partitioning
    07:27
  • Preview
    03:46

  • Dataframes: RDDs + Tables
    16:04
  • See it in Action : Dataframes and Spark SQL
    04:49

  • Preview
    12:19
  • Latent Factor Analysis with the Alternating Least Squares method
    11:39
  • Music recommendations using the Audioscrobbler dataset
    07:51
  • Implement code in Spark using MLlib
    16:05

  • Preview
    09:55
  • Implement stream processing in Spark using Dstreams
    10:54
  • Stateful transformations using sliding windows
    09:26
  • See it in Action : Spark Streaming
    04:17

Requirements

  • The course assumes knowledge of Python. You can write Python code directly in the PySpark shell. If you already have IPython Notebook installed, we'll show you how to configure it for Spark.
  • For the Java section, we assume basic knowledge of Java. An IDE that supports Maven, such as IntelliJ IDEA or Eclipse, would be helpful.
  • All examples work with or without Hadoop. If you would like to use Spark with Hadoop, you'll need to have Hadoop installed (either in pseudo-distributed or cluster mode).

Description

Welcome to the SGLearn Series targeted at Singapore-based learners picking up new skillsets and competencies.

This course is an adaptation of the same course by Janani Ravi and the team, and is specially produced in collaboration with Janani for Singaporean learners. If you are a Singaporean, you are eligible for the CITREP+ funding scheme (terms and conditions apply).

_____________ 

Note from the team ... 

Taught by a 4-person team including 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience in working with Java and with billions of rows of data.

Get your data to fly using Spark for analytics, machine learning and data science 

Let’s parse that.

What's Spark? If you are an analyst or a data scientist, you're used to having multiple systems for working with data: SQL, Python, R, Java, etc. With Spark, you have a single engine where you can explore and play with large amounts of data, run machine learning algorithms and then use the same system to productionize your code.

Analytics: Using Spark and Python you can analyze and explore your data in an interactive environment with fast feedback. The course will show how to leverage the power of RDDs and Dataframes to manipulate data with ease. 

Machine Learning and Data Science: Spark's core functionality and built-in libraries make it easy to implement complex algorithms like Recommendations with very few lines of code. We'll cover a variety of datasets and algorithms including PageRank, MapReduce and Graph datasets.
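
To give a feel for one of the algorithms mentioned above, here is a minimal plain-Python sketch of iterative PageRank. It is not the course's code and does not use Spark: the function name `pagerank`, the damping factor of 0.85, the iteration count and the toy graph are all assumptions chosen for illustration. A Spark version would typically express the same per-iteration step with joins and per-key reductions over pair RDDs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank over a dict mapping each page to the pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {dst for dsts in links.values() for dst in dsts}
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            outlinks = links.get(page, [])
            if outlinks:
                # Split this page's damped rank evenly among its outlinks.
                share = damping * ranks[page] / len(outlinks)
                for dst in outlinks:
                    new_ranks[dst] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                share = damping * ranks[page] / n
                for dst in pages:
                    new_ranks[dst] += share
        ranks = new_ranks
    return ranks


# Hypothetical toy web graph, not the Google web graph dataset used in the course.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
top_page = max(ranks, key=ranks.get)
```

In this toy graph, page "c" collects links from three other pages and ends up with the highest rank, while "d" has no inbound links and keeps only the random-jump share.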

What's Covered:

Lots of cool stuff ...

  • Music Recommendations using Alternating Least Squares and the Audioscrobbler dataset

  • Dataframes and Spark SQL to work with Twitter data

  • Using the PageRank algorithm with Google web graph dataset

  • Using Spark Streaming for stream processing 

  • Working with graph data using the Marvel Social network dataset



... and of course all the basic and advanced Spark features:

  • Resilient Distributed Datasets, Transformations (map, filter, flatMap), Actions (reduce, aggregate) 

  • Pair RDDs, reduceByKey, combineByKey

  • Broadcast and Accumulator variables 

  • Spark for MapReduce 

  • The Java API for Spark 

  • Spark SQL, Spark Streaming, MLlib and GraphFrames (GraphX for Python) 
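
The transformation, action and pair-RDD ideas in the list above can be modelled in plain Python without a Spark installation. The sketch below is an illustration under stated assumptions, not the course's code: the `flights` records and the `combine_by_key` helper are hypothetical, and the helper models only a single partition. Spark's own `combineByKey` additionally takes a third function that merges combiners built on different partitions.

```python
from functools import reduce

# Hypothetical (airport, delay_minutes) records; not taken from the course's dataset.
flights = [("SFO", 10.0), ("JFK", 30.0), ("SFO", 20.0), ("JFK", -5.0), ("ORD", 0.0)]

# Transformations (map, filter) only describe a new collection.
# Python's map and filter are lazy, loosely like Spark transformations.
delays = map(lambda rec: rec[1], flights)
late = filter(lambda d: d > 0, delays)

# An action (reduce) forces evaluation and returns a plain value.
total_late_minutes = reduce(lambda a, b: a + b, late)


def combine_by_key(pairs, create_combiner, merge_value):
    """Single-partition model of combineByKey: fold each key's values into a combiner."""
    combiners = {}
    for key, value in pairs:
        if key in combiners:
            combiners[key] = merge_value(combiners[key], value)
        else:
            combiners[key] = create_combiner(value)
    return combiners


# The combiner is a (sum, count) pair, so the per-key average needs one pass.
sums_counts = combine_by_key(
    flights,
    create_combiner=lambda v: (v, 1),
    merge_value=lambda acc, v: (acc[0] + v, acc[1] + 1),
)
avg_delay = {airport: s / n for airport, (s, n) in sums_counts.items()}
```

Carrying a (sum, count) pair per key is what lets an average be computed in one step, since averages themselves cannot be merged directly.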



Using discussion forums

Please use the discussion forums on this course to engage with other students and to help each other out. Unfortunately, much as we would like to, it is not possible for us at Loonycorn to respond to individual questions from students :-(

We're super small and self-funded with only 2-3 people developing technical video content. Our mission is to make high-quality courses available at super low prices.

The only way to keep our prices this low is to *NOT offer additional technical support over email or in-person*. The truth is, direct support is hugely expensive and just does not scale.

We understand that this is not ideal and that a lot of students might benefit from this additional support. Hiring resources for additional support would make our offering much more expensive, thus defeating our original purpose.

It is a hard trade-off.

Thank you for your patience and understanding!

Who this course is for:

  • Yep! Analysts who want to leverage Spark for analyzing interesting datasets
  • Yep! Data Scientists who want a single engine for analyzing and modelling data as well as productionizing it.
  • Yep! Engineers who want to use a distributed computing engine for batch or stream processing or both

Instructor

DioPACT SG
SGLearn
  • 4.4 Instructor Rating
  • 18 Reviews
  • 60 Students
  • 5 Courses

Dioworks is an e-learning design company focused on using technology as an enabler to make learning easy, engaging and effective. Premised on innovative designs, pedagogy and research, we provide quality learning experiences for learners globally. Dioworks offers bespoke solutions for organisations to integrate learning, training and assessment of work-based competencies via blended learning strategies. We are also the local partner to Udemy in Singapore.

More specifically, we combine the strengths of Classroom-Facilitated Learning, Massive Open Online Courses (MOOCs) in partnership with Udemy, Inc., and our "Kinetic Coach" automated response training solution to achieve learning outcomes.

© 2021 Udemy, Inc.