Data Science: Hands-on Diabetes Prediction with PySpark MLlib
4.2 (84 ratings)
11,727 students enrolled

Diabetes Prediction using Machine Learning in Apache Spark
Last updated 7/2020
English
English [Auto]
This course includes
  • 1 hour on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Diabetes prediction using Spark machine learning (Spark MLlib)
  • Learn PySpark fundamentals
  • Work with DataFrames in PySpark
  • Analyze and clean data
  • Process data with a machine learning model in Spark MLlib
  • Build and train a logistic regression model
  • Evaluate performance and save the model
Requirements
  • Basics of Python
Description

This is a hands-on, 1-hour machine learning project using PySpark. You learn by practice.


No unnecessary lectures. No unnecessary details.


A precise, to-the-point, and efficient course about machine learning in Spark.


About PySpark:


PySpark combines Apache Spark and Python, and it is used in big data analytics.

Apache Spark is an open-source cluster-computing framework built around speed, ease of use, and streaming analytics, whereas Python is a general-purpose, high-level programming language. Python offers a wide range of libraries and is widely used for machine learning and real-time streaming analytics.

In other words, PySpark is a Python API for Spark that lets you harness the simplicity of Python and the power of Apache Spark to tame big data. We will use big data tools in this project.


You will learn more in this one hour of practice than in hundreds of hours of unnecessary theoretical lectures.


Learn the most important aspects of Spark machine learning (Spark MLlib):

  • PySpark fundamentals and implementing Spark machine learning

  • Importing and working with datasets

  • Processing data with a machine learning model in Spark MLlib

  • Building and training a logistic regression model

  • Testing and analyzing the model

We will build a model to predict diabetes. This is a 1-hour project. In this hands-on project, we will complete the following tasks:


Task 1: Project overview


Task 2: Intro to the Colab environment & installing dependencies to run Spark on Colab


Task 3: Clone & explore the diabetes dataset


Task 4: Data Cleaning

  1. Check for missing values

  2. Replace unnecessary values


Task 5: Correlation & feature selection


Task 6: Build and train Logistic Regression Model using Spark MLlib


Task 7: Performance evaluation & testing the model


Task 8: Save & load model


Make a leap into data science with this Spark MLlib project and showcase your skills on your resume. Click the “ENROLL NOW” button and start learning, building, and testing the model.


Who this course is for:
  • Anyone interested in data analysis with Spark and ML
  • Anyone who wants to learn the fundamentals of Apache Spark in big data analytics