Mastering Big Data Analytics with PySpark
Rating: 4.5 out of 5 (63 ratings)
468 students

Effectively apply Advanced Analytics to large datasets using the power of PySpark
Last updated 6/2020
English

What you'll learn

  • Gain a solid knowledge of vital Data Analytics concepts via practical use cases
  • Create elegant data visualizations using Jupyter
  • Process and analyze large datasets using PySpark
  • Utilize Spark SQL to easily load big data into DataFrames
  • Create fast and scalable Machine Learning applications using MLlib with Spark
  • Perform exploratory Data Analysis in a scalable way
  • Achieve scalable, high-throughput and fault-tolerant processing of data streams using Spark Streaming

Course content

9 sections • 41 lectures • 8h 7m total length
  • Course Overview (6:50)

    This video gives an overview of the entire course.

  • Python versus Spark (10:35)

    One might wonder: why Spark, and where does Python fit in? In this video, we cover why Python is a good pick when working with Spark.

       •  Compare various programming languages; understand how Spark interacts with them

       •  Explore how Spark creates jobs

       •  Get a good understanding of where Python fits in

  • Preparing for the Course (6:23)

    Here, we prepare for the course by downloading the data and exploring what the lab environment will look like.

       •  Download all the courseware

       •  Familiarize yourself with the layout of the courseware

       •  Learn how to use Docker and Jupyter

  • Connecting Jupyter to Spark (14:58)

    To follow along with the labs in this course, some setup is required.

       •  Set up the local development environment

       •  Run the first PySpark ‘Hello World’ script

  • Test Your Knowledge
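
The "Hello World" script mentioned under Connecting Jupyter to Spark is not shown on this page. As a rough illustration only, a first PySpark script might look like the sketch below. It assumes PySpark is installed and that Spark runs in local mode; the function names, application name, and column names are hypothetical and are not taken from the courseware.

```python
"""Hypothetical 'Hello World' for PySpark; not the course's actual lab script."""


def build_greeting_rows(names):
    """Return (name, greeting) tuples; plain Python, no Spark needed."""
    return [(name, f"Hello, {name}!") for name in names]


def hello_spark(names=("World", "PySpark")):
    """Start a local Spark session, show the greetings as a DataFrame, and return them."""
    # Imported lazily so the helper above works even without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")       # assumption: run Spark locally on all cores
        .appName("hello-world")   # hypothetical application name
        .getOrCreate()
    )
    try:
        # Column names are illustrative, not from the course material.
        df = spark.createDataFrame(build_greeting_rows(names), ["name", "greeting"])
        df.show(truncate=False)   # prints the DataFrame as a small text table
        return [row.greeting for row in df.collect()]
    finally:
        spark.stop()              # release the local Spark resources
```

Calling hello_spark() from a Jupyter cell would start the session, print the small table, and stop Spark again. In a lab environment that already provides a Spark cluster, the master setting would differ.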

Requirements

  • A working knowledge of Python is assumed.

Description

PySpark helps you perform data analysis at scale; it enables you to build more scalable analyses and pipelines. This course starts by introducing you to PySpark's potential for performing effective analyses of large datasets. You'll learn how to interact with Spark from Python and connect Jupyter to Spark to provide rich data visualizations. After that, you'll delve into Spark's components and architecture.

You'll learn to work with Apache Spark and perform ML tasks more smoothly than before. You'll gather and query data using Spark SQL to overcome the challenges involved in reading it. You'll use the DataFrame API to work with Spark MLlib and learn about the Pipeline API. Finally, we provide tips and tricks for deploying your code and tuning performance.

By the end of this course, you will not only be able to perform efficient data analytics but will also have learned to use PySpark to easily analyze large datasets at scale in your organization.

About the Author

Danny Meijer works as the Lead Data Engineer in the Netherlands for the Data and Analytics department of a leading sporting goods retailer. He is a Business Process Expert, big data scientist, and data engineer, which gives him a unique mix of skills, the foremost of which is his business-first approach to data science and data engineering.

He has over 13 years of IT experience across various domains, with skills ranging from (big) data modeling, architecture, design, and development to project and process management; he also has extensive experience with process mining, data engineering on big data, and process improvement.

As a certified data scientist and big data professional, he knows his way around data and analytics, and is proficient in various programming languages. He has extensive experience with various big data technologies and is fluent in NoSQL, Hadoop, Python, and, of course, Spark.

Danny is a driven person, motivated by everything data and big data. He loves math, machine learning, and tackling difficult problems.

Who this course is for:

  • This course will greatly appeal to data science enthusiasts, data scientists, or anyone who is familiar with Machine Learning concepts and wants to scale out their work to big data.
  • If you find it difficult to analyze large datasets that keep growing, then this course is the perfect guide for you!