Data Science: Hands-on Diabetes Prediction with PySpark MLlib
What you'll learn
- Diabetes prediction using Spark machine learning (Spark MLlib)
- Learn PySpark fundamentals
- Working with DataFrames in PySpark
- Analyzing and cleaning data
- Processing data for a machine learning model with Spark MLlib
- Building and training a logistic regression model
- Evaluating performance and saving the model
- Basics of Python
Would you like to build, train, test and evaluate a machine learning model that is able to detect diabetes using logistic regression?
This is a hands-on machine learning course: you will practice alongside the lectures, and the dataset will be provided to you during them. For the best learning experience, we highly recommend coding along with each lecture.
You will learn more in this one hour of practice than in hundreds of hours of unnecessary theoretical lectures.
Learn the most important aspects of Spark machine learning (Spark MLlib):
- PySpark fundamentals and implementing Spark machine learning
- Importing and working with datasets
- Processing data for a machine learning model with Spark MLlib
- Building and training a logistic regression model
- Testing and analyzing the model
The entire course has been divided into tasks. Each task has been very carefully created and designed to give you the best learning experience. In this hands-on project, we will complete the following tasks:
Task 1: Project overview
Task 2: Intro to the Colab environment & installing dependencies to run Spark on Colab
Task 3: Clone & explore the diabetes dataset
Task 4: Data Cleaning
Task 5: Correlation & feature selection
Task 6: Build and train Logistic Regression Model using Spark MLlib
Task 7: Performance evaluation & testing the model
Task 8: Save & load model
PySpark combines Apache Spark and Python and is a popular tool for big data analytics.
Apache Spark is an open-source cluster-computing framework built around speed, ease of use, and streaming analytics, whereas Python is a general-purpose, high-level programming language. Python offers a wide range of libraries and is widely used for machine learning and real-time streaming analytics.
In other words, PySpark is a Python API for Spark that lets you combine the simplicity of Python with the power of Apache Spark to tame big data. We will be using these big data tools in this project.
Make a leap into Data science with this Spark MLlib project and showcase your skills on your resume.
Click on the “ENROLL NOW” button and start learning.
Who this course is for:
- Anyone interested in Data analysis with Spark and ML
- Anyone who wants to learn fundamentals of Apache Spark in Big Data Analytics
Welcome to the School of Disruptive Innovation. We are here to teach you what they don't teach you in school. We are unconventional in our ways, but we keep our promises and we over-deliver.
We have a community of more than 40,000 students and 60,000 enrollments across 166 countries. We offer courses on data science (classical machine learning, deep learning, big data, data visualization & analysis), Android development, web development, and graphic design.
Every course is created and delivered by professionals in the field: technology-related courses by software engineers and business-related courses by business experts.