Data Cleaning & Preprocessing in Python for Machine Learning
Rating: 4.4 out of 5 (35 ratings)
164 students

Learn how to resolve Data Quality issues in Machine Learning & Data Science using Data Cleaning in Python Pandas.
Last updated 7/2022
English

What you'll learn

  • You will learn how to detect and impute missing values in the data.
  • How to detect and rectify incorrect data types.
  • How to deal with Categorical Columns.
  • How to detect and replace incorrect values with correct ones.
  • How to use the apply method with lambda functions for advanced cleaning.
  • How to group the dataset by a particular column.
  • How to detect and remove outliers.
  • How to perform feature scaling.
  • How to clean and preprocess textual data for NLP.

Course content

4 sections • 31 lectures • 1h 34m total length
  • Introduction (1:16)

    Learn to detect and resolve common data issues in real-world datasets using Python, including missing values, incorrect data types, feature scaling, normalization, handling categorical variables, structural problems, and outliers.

  • Curriculum (1:39)

    Explore a three-section curriculum on data quality checks, fixing issues with imputation and data types, and NLP preprocessing such as tokenization, stop-word removal, and stemming.

  • Installation and Setup (2:03)

Requirements

  • Basic knowledge of Python.

Description

More often than not, real-world data is messy and can rarely be used directly. It needs a lot of cleaning and preprocessing before it can be used in analytics, machine learning, or other applications. Data cleaning can be a dirty job, which often requires a lot of effort and technical skills such as familiarity with Pandas and other libraries.

For most data cleaning, all you need is data manipulation skills in Python, and this course teaches just that. It includes lectures, quizzes, and Jupyter notebooks that teach you to deal with real-world raw data. The course contains tutorials on a range of data cleaning techniques, such as imputing missing values, feature scaling, and fixing data type issues.
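
As a rough illustration of the techniques named above, the following sketch fixes a data type, imputes missing values, and scales features with Pandas. The DataFrame, its column names, and the clean helper are hypothetical and are not taken from the course materials; median imputation and min-max scaling are just one reasonable choice among several.

```python
import pandas as pd

# Hypothetical raw data: 'age' arrives as strings, and both columns have gaps.
raw = pd.DataFrame({
    "age": ["25", "32", None, "41"],
    "income": [40000.0, None, 52000.0, 61000.0],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative helper: fix types, impute, then min-max scale."""
    out = df.copy()
    # Fix the data type: parse strings as numbers; unparseable entries become NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Impute missing values with each column's median.
    out = out.fillna(out.median(numeric_only=True))
    # Min-max scaling: map every column onto the range [0, 1].
    return (out - out.min()) / (out.max() - out.min())

cleaned = clean(raw)
print(cleaned)
```

Median imputation is used here because it is less sensitive to extreme values than the mean; a real dataset may call for a different strategy.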

In this course you will learn:

  • How to detect and deal with missing values in the data.

  • How to detect and rectify incorrect data types.

  • How to deal with Categorical Columns.

  • How to detect and replace incorrect values with correct ones.

  • How to use the apply method with lambda functions for advanced cleaning.

  • How to group the dataset by a particular column.

  • How to detect and remove outliers.

  • How to perform feature scaling.

  • How to clean and preprocess textual data for NLP.
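
A few of the topics listed above can be sketched in a handful of Pandas calls. The example below is only a minimal illustration under assumed data: the city and price columns are hypothetical, and the 1.5 × IQR rule is one common outlier convention, not necessarily the one the course teaches.

```python
import pandas as pd

# Hypothetical data: inconsistent category labels and one extreme price.
df = pd.DataFrame({
    "city": ["NY", "ny ", "LA", "LA", "NY", "LA"],
    "price": [10.0, 12.0, 11.0, 13.0, 11.5, 500.0],
})

# apply + lambda: replace inconsistent labels with a canonical form.
df["city"] = df["city"].apply(lambda s: s.strip().upper())

# Detect and remove outliers with the 1.5 * IQR rule.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
trimmed = df[in_range]

# Group the dataset by a particular column and aggregate.
mean_by_city = trimmed.groupby("city")["price"].mean()

# Handle the categorical column with one-hot encoding.
encoded = pd.get_dummies(trimmed, columns=["city"])

print(mean_by_city)
print(encoded)
```

Cleaning the labels before grouping matters here: without the apply step, "NY" and "ny " would be treated as two separate categories.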


Who this course is for:

  • Data Analysts, Data Engineers, Machine Learning Engineers and Data Scientists.