
Data Science 2023: Data Preprocessing & Feature Engineering

Rating: 4.3 (41 reviews)

Become an expert in data cleaning and feature engineering for machine learning using pandas and scikit-learn

Created by Anupam Khare

Last updated 2/2023

English

What you'll learn

Preprocessing the data takes 60%-70% of the time. The course gives you the entire toolbox for converting raw data into model-ready data
Become an expert in Python, pandas, and scikit-learn for data manipulation and feature engineering
Become efficient at preprocessing data using various Python packages such as pandas_profiling, category_encoders, etc.
Learn feature engineering techniques such as encoding, imputation, and scaling using scikit-learn
Learn scikit-learn Pipeline and ColumnTransformer to make your code readable and efficient
Learn to write Python functions that wrap various pandas functionalities to automate tasks
Export analysis output to a text file or Excel (programmatically export multiple DataFrames to different sheets, or multiple DataFrames to the same sheet, in a workbook)
Bonus lecture to help you plan your interview preparation strategy

Course content

11 sections • 36 lectures • 6h 14m total length

Introduction (6:46)

Understanding Series and DataFrames (8:26)
Some pandas methods work with DataFrames and some with Series. This lecture explains the difference between the two and when to use each.
Dealing with Missing Values: Part 1 (8:31)
Data can contain invalid values such as '#' or 'none' that you want to treat as missing. Students will be able to convert these invalid values to NaN while importing the data.
Dealing with Missing Values: Part 2: Why df.dropna() Is a Bad Option (8:38)
People generally use df.dropna() to get rid of missing data. The lecture explains why this is not a great idea.
Removing Missing Data Intelligently (9:41)
Students will be able to use the parameters of df.dropna() to remove missing values intelligently
Dealing with Constant and Quasi-Constant Variables (9:52)
Many datasets contain constant and quasi-constant variables. The lecture explains what these variables are and what to do with them.
Cleaning the Text Data (15:31)
Most datasets contain some textual data in the form of categorical variables. Since these are often manual inputs, such data can have multiple issues. Students will learn how to work with textual data and clean it.
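The cleaning steps described above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the course's own code: the CSV content is made up, and the `thresh=4` cut-off and the 0.95 quasi-constant threshold are arbitrary choices for the example.

```python
import io

import pandas as pd

# Hypothetical raw CSV: "#" and "none" are placeholders that really mean "missing".
raw_csv = io.StringIO(
    "id,city,age,income,country\n"
    "1,  New York ,34,52000,US\n"
    "2,new york,#,48000,US\n"
    "3,BOSTON ,29,#,US\n"
    "4,boston,none,none,US\n"
    "5,Chicago,41,61000,US\n"
)

# 1. Convert the invalid markers to NaN while importing.
df = pd.read_csv(raw_csv, na_values=["#", "none"])

# 2. Clean the free-text category: trim whitespace and normalise case,
#    so "  New York " and "new york" become the same value.
df["city"] = df["city"].str.strip().str.title()

# 3. Drop rows selectively: keep rows with at least 4 non-missing values
#    instead of dropping every row that contains any NaN.
df = df.dropna(thresh=4)

# 4. Drop constant / quasi-constant columns, i.e. columns whose most frequent
#    value covers at least 95% of the rows (the cut-off is an assumption).
top_share = df.apply(lambda col: col.value_counts(normalize=True, dropna=False).iloc[0])
quasi_constant = top_share[top_share >= 0.95].index.tolist()
df = df.drop(columns=quasi_constant)

print(df)
```

Passing `thresh` (or `subset`) to `dropna()` keeps rows that are only partially missing, whereas a bare `dropna()` discards any row containing a single NaN. Here the `country` column is removed because every remaining row has the same value.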

Filtering Dataset (6:03)
Learn about the different filtering methods available
Filtering Dataset Using Column Names (0:19)
Learn how to use column names to subset a dataset
Understanding lambda, apply, and applymap (12:53)
During preprocessing, lambda, apply, and applymap can serve as alternatives to looping over rows. Students will learn to use all three and the differences between them.
Using IF ELSE Conditions to Generate New Columns (12:44)
Students will be able to use if-else conditions on the dataset to generate new columns
Subset Rows and Create New Columns Using Text Columns (13:08)
Students will be able to use text columns to subset data and create new columns
Working with Date and Time: Part 1 (16:28)
Students will learn how to manipulate dates and extract information from them
Working with Date and Time: Part 2 (7:10)
Students will learn how to manipulate dates and extract information from them
Create an Excel-like Pivot Table from Data (12:59)
Students will be able to create Excel-like pivot tables
Use Groupby and Transform to Consolidate Your Data (21:51)
Students will learn to use groupby to aggregate data. They will also learn the difference between combining groupby with an aggregation (one row per group) and combining groupby with transform (one value for every original row).
Using Groupby to Rank Rows within a Group (4:52)
Students will learn:
1- how to use groupby to rank rows within a group
2- how to compute the difference between rows in the same column
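Several of the manipulation techniques listed above (if-else columns, date parts, groupby with aggregation versus transform, ranking and row differences within groups, pivot tables) can be sketched on a small table. The sales data and column names are hypothetical, chosen only to illustrate the pandas calls.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data used only for illustration.
sales = pd.DataFrame(
    {
        "region": ["North", "North", "North", "South", "South"],
        "order_date": pd.to_datetime(
            ["2023-01-05", "2023-02-10", "2023-03-15", "2023-01-20", "2023-02-25"]
        ),
        "amount": [100, 250, 175, 300, 120],
    }
)

# IF-ELSE: derive a new column from a condition (vectorised, no row loop).
sales["size"] = np.where(sales["amount"] >= 200, "large", "small")

# Dates: extract parts with the .dt accessor.
sales["month"] = sales["order_date"].dt.month

# groupby + agg collapses each group to a single row ...
region_totals = sales.groupby("region")["amount"].agg("sum")

# ... while groupby + transform returns a value aligned with every original row.
sales["region_total"] = sales.groupby("region")["amount"].transform("sum")
sales["share_of_region"] = sales["amount"] / sales["region_total"]

# Rank rows within each group (1 = largest amount in that region).
sales["rank_in_region"] = sales.groupby("region")["amount"].rank(
    ascending=False, method="dense"
)

# Difference from the previous row in the same group (NaN for each group's first row).
sales = sales.sort_values(["region", "order_date"])
sales["change_vs_prev"] = sales.groupby("region")["amount"].diff()

# Excel-like pivot: regions as rows, months as columns, summed amounts.
pivot = sales.pivot_table(index="region", columns="month", values="amount", aggfunc="sum")

print(region_totals)
print(sales)
print(pivot)
```

The contrast between `region_totals` (two rows) and the `region_total` column (five rows) is the practical difference between aggregating and transforming after a groupby.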

Understanding the Feature Engineering Process Flow (5:13)
The presentation details the step-by-step sequence to follow in feature engineering
Understanding Scikit-learn fit, transform and fit_transform (6:40)
fit, transform and fit_transform are scikit-learn methods for transforming data: fit learns parameters (such as a column's mean) from the data, transform applies them, and fit_transform does both in one call. Students will learn the difference between them and how to use each.
Missing Value Imputation (11:35)
Students will learn various missing-value imputation methods in scikit-learn and how to apply them to DataFrames
Correlation Analysis (9:19)
Students will be able to understand the need for correlation analysis, understand the various statistics associated with correlation analysis and find the correlated variables
Outlier Treatment and Removal (13:56)
Students will be able to identify and treat outliers
Encoding Categorical Variables (18:29)
Categorical variables need to be encoded to convert textual data into numbers so that it can be fed into the model. In this lecture, students will learn various encoders, when to use each, and how to use them.
Scaling Numerical Variables (9:31)
Scaling variables before modeling puts them on comparable ranges, so that no variable dominates simply because of its units. In this lecture, students will learn various scaling techniques, when to use each, and how to use them.
Pipeline and ColumnTransformer (13:03)
Pipeline and ColumnTransformer let users chain various feature engineering steps. The lecture helps students create a pipeline and then use a ColumnTransformer to perform several feature engineering tasks in one go, which makes the code more readable and concise.

Part 1 - Writing a Simple Function with a Pandas DataFrame (11:19)
Students will be able to write a simple function using pandas functionality. Such a function lets users perform multiple tasks with a single line of code and automate tasks that are performed repeatedly
Part 2 - Writing Simple Functions (12:31)
Now we will add more functionality to our functions, such as if-else statements and looping through columns
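A function in the spirit of these two lectures might wrap several pandas calls, loop through the columns, and branch with if-else on the column type. `summarize_columns` below is a hypothetical helper written for illustration, not the course's own function.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame, top_n: int = 3) -> pd.DataFrame:
    """Return one summary row per column of ``df``.

    Hypothetical helper: loops through the columns and uses an if-else branch
    so that numeric and non-numeric columns get different statistics.
    """
    rows = []
    for col in df.columns:
        series = df[col]
        summary = {
            "column": col,
            "dtype": str(series.dtype),
            "missing": int(series.isna().sum()),
            "unique": int(series.nunique()),  # NaN is not counted as a value
        }
        if pd.api.types.is_numeric_dtype(series):
            summary["mean"] = series.mean()
            summary["top_values"] = None
        else:
            summary["mean"] = None
            # Most frequent values, most common first.
            summary["top_values"] = series.value_counts().head(top_n).index.tolist()
        rows.append(summary)
    return pd.DataFrame(rows).set_index("column")


# Usage: one line replaces several repeated pandas calls.
df = pd.DataFrame({"age": [34, None, 29], "city": ["Boston", "Boston", "Chicago"]})
summary = summarize_columns(df)
print(summary)
```

Returning a DataFrame rather than printing inside the function keeps the result reusable, for example for export or further filtering.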

Requirements

A beginner-level understanding of Python is preferred but not mandatory
You'll need to install Anaconda and run Jupyter Notebook

Description

Real-life data are dirty. This is why preprocessing tasks take approximately 70% of the time in the ML modeling process. Moreover, there is a lack of dedicated courses that deal with this challenging task.

Introducing "Data Science Course: Data Cleaning & Feature Engineering", a hardcore course dedicated entirely to the most tedious task in machine learning modeling: data preprocessing.

If you want to enhance your data preprocessing skills to build better, high-performing ML models, then this course is for you!

This course has been designed by experienced data scientists who will help you understand the WHYs and HOWs of preprocessing.

I will walk you step by step through the data preprocessing process. With every tutorial, you will develop new skills and improve your understanding of preprocessing challenges and the ways to overcome them.

It is structured the following way:

Part 1 - EDA (Exploratory Data Analysis): Get insights into your dataset

Part 2 - Data Cleaning: Clean your data based on those insights

Part 3 - Data Manipulation: Generating features, subsetting, working with dates, etc.

Part 4 - Feature Engineering: Get the data ready for modeling

Part 5 - Function Writing with Pandas DataFrames

Bonus Section: A few interview preparation tips and strategies for data science enthusiasts on the job hunt

Who this course is for:

Anyone who is interested in becoming efficient at data preprocessing
People who are learning data science and want to better understand the various nuances of data and its treatment
Budding data scientists who want to improve their data preprocessing skills
Anyone who is interested in the preprocessing part of data science
Beginner ML enthusiasts and ML engineers who want to improve their preprocessing and feature engineering skills
Programmers who want to enhance their skills and get familiar with packages like pandas and scikit-learn

This course is not for people who want to learn machine learning algorithms


Course content

Introduction and Way Forward: 1 lecture • 7 min

Install Anaconda and Jupyter Notebook and Resources: 1 lecture • 1 min

Working with Large Datasets: 2 lectures • 19 min

Understand Data with EDA (Exploratory Data Analysis): 2 lectures • 31 min

Data Cleaning: 6 lectures • 1 hr 1 min

Data Manipulation: 10 lectures • 1 hr 48 min

Feature Engineering: 8 lectures • 1 hr 28 min

Writing Functions in Python: 2 lectures • 24 min

Writing DataFrames and Analytical Output to Text or Excel Workbooks: 2 lectures • 20 min

Understanding and Debugging Common Errors: 1 lecture • 12 min
