Data pre-processing for Machine Learning in Python

Rating: 4.6 (263 reviews)

How to transform a dataset for a machine learning model

Highest Rated

Created by Gianluca Malato

Last updated 12/2023

English

What you'll learn

How to fill missing values in numerical and categorical variables
How to encode the categorical variables
How to transform the numerical variables
How to scale the numerical variables
Principal Component Analysis and how to use it
How to apply oversampling using SMOTE
How to use several useful objects in the scikit-learn library

Course content

11 sections • 48 lectures • 5h 35m total length

Introduction to the course • 2:58
Numerical and categorical variables • 2:07
The dataset • 0:09
Required Python packages • 0:21
Jupyter notebooks • 9:07

Requirements

Basic knowledge of the Python programming language

Description

In this course, we are going to focus on pre-processing techniques for machine learning.

Pre-processing is the set of manipulations that transform a raw dataset so that it can be used by a machine learning model. It is necessary for making our data suitable for some machine learning models, for reducing dimensionality, for better identifying the relevant data, and for increasing model performance. It's the most important part of a machine learning pipeline and it can strongly affect the success of a project. In fact, if we don't feed a machine learning model with correctly shaped data, it won't work at all.
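To make the last point concrete, here is a minimal sketch (not taken from the course material) of a model rejecting raw data and accepting it after one pre-processing step. The tiny dataset is made up for illustration, and the choice of LogisticRegression and mean imputation is an assumption, not necessarily what the course uses.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

# Made-up dataset: two numerical features, one missing value (np.nan).
X = np.array([[1.0, 20.0],
              [2.0, np.nan],
              [3.0, 40.0],
              [4.0, 50.0]])
y = np.array([0, 0, 1, 1])

# Without pre-processing, the model refuses the raw data:
# LogisticRegression raises a ValueError when the input contains NaN.
try:
    LogisticRegression().fit(X, y)
    raw_fit_failed = False
except ValueError:
    raw_fit_failed = True

# Fill the missing value with the mean of its column, then fit again.
X_filled = SimpleImputer(strategy="mean").fit_transform(X)
model = LogisticRegression().fit(X_filled, y)
```

Mean imputation is only one of several possible strategies; which one is appropriate depends on the variable and on the model.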

Sometimes, aspiring data scientists start studying neural networks and other complex models and forget to study how to manipulate a dataset so that their algorithms can use it. As a result, they fail to create good models, and only at the end do they realize that good pre-processing would have saved them a lot of time and increased the performance of their algorithms. Handling pre-processing techniques is therefore a very important skill. That's why I have created an entire course that focuses only on data pre-processing.

With this course, you are going to learn:

Data cleaning
Encoding of the categorical variables
Transformation of the numerical features
Scikit-learn Pipeline and ColumnTransformer objects
Scaling of the numerical features
Principal Component Analysis
Filter-based feature selection
Oversampling using SMOTE

All the examples will be given using the Python programming language and its powerful scikit-learn library. The environment that will be used is Jupyter, which is a standard in the data science industry. All the sections of this course end with some practical exercises, and the Jupyter notebooks are all downloadable.
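As a rough illustration of how several of the topics listed above can fit together in scikit-learn, the sketch below chains imputation, one-hot encoding, scaling, and Principal Component Analysis with Pipeline and ColumnTransformer. It is not taken from the course notebooks: the DataFrame, its column names, the choice of strategies, and the final LogisticRegression model are all assumptions made for the example. Feature selection and SMOTE are left out; SMOTE in particular lives in the separate imbalanced-learn package.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data with missing values in numerical and categorical columns.
# The categorical column is stored with object dtype explicitly.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 51.0, 46.0, 38.0],
    "income": [30.0, 45.0, 52.0, np.nan, 80.0, 61.0],
    "city": pd.Series(["Rome", "Milan", np.nan, "Rome", "Turin", "Milan"],
                      dtype=object),
})
y = np.array([0, 0, 0, 1, 1, 1])

# Numerical columns: fill missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical column: fill missing values with the most frequent category,
# then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Apply each sub-pipeline to its own columns.
# sparse_threshold=0.0 forces a dense output, which PCA can consume directly.
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", categorical, ["city"]),
], sparse_threshold=0.0)

# Full pipeline: pre-processing, dimensionality reduction, then the model.
model = Pipeline([
    ("preprocess", preprocess),
    ("pca", PCA(n_components=2)),
    ("clf", LogisticRegression()),
])
model.fit(df, y)
predictions = model.predict(df)

# The transformed features seen by the classifier (all steps but the last).
reduced = model[:-1].transform(df)
```

Wrapping every step in a single Pipeline means the same transformations learned on the training data are applied consistently whenever the model is used to predict.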

Who this course is for:

Python developers
Aspiring data scientists
People interested in machine learning and artificial intelligence

Course content

Introduction • 5 lectures • 15min
Data cleaning • 7 lectures • 57min
Encoding of the categorical features • 5 lectures • 46min
Transformations of the numerical features • 7 lectures • 43min
Pipelines • 3 lectures • 32min
Scaling • 3 lectures • 22min
Principal Component Analysis • 4 lectures • 20min
Filter-based feature selection • 9 lectures • 1hr
A complete pipeline • 1 lecture • 19min
Oversampling • 3 lectures • 21min
