Udemy
Machine Learning: Data Preprocessing[Python][Hindi]
Rating: 4.1 out of 5 (233 ratings)
9,454 students
Created by Rishi Bansal
Last updated 6/2020
English

What you'll learn

  • What data preprocessing is, the various types of data preprocessing, and why we need it.

Course content

1 section, 11 lectures, 1h 10m total length
  • What is Data Preprocessing? (7:03)

    •Preprocessing refers to transforming raw data before feeding it to a machine learning algorithm

    •Data quality is critical to training a good model

    •Sources: government databases, professional or company data sources (e.g. Twitter), your own company's data, etc.

    •Data will almost never arrive in the format you need; a pandas DataFrame is used to reformat it

    •Columns to remove: columns with no values, and duplicate (highly correlated) columns, e.g. house size stored in both feet and metres

    •Learning algorithms understand only numbers, so text and images must be converted to numbers

    •Unscaled or unstandardized data might lead to unacceptable predictions

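  • Checking for Null Values: Concept + Python Code (7:45)

    •Check for null values

    •Remove or impute them

    •df.isnull().values.any() returns True if any value in the DataFrame is null

    •df = df.dropna(how='any', axis=0) drops every row that contains at least one null value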
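The two calls above can be tried on a small DataFrame. This is a minimal sketch, assuming pandas and NumPy are installed; the `age` and `income` columns and their values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing values (NaN)
df = pd.DataFrame({
    "age": [25, np.nan, 47, 33],
    "income": [52000, 61000, np.nan, 45000],
})

# True if at least one cell anywhere in the DataFrame is null
has_nulls = df.isnull().values.any()
print(has_nulls)

# Count nulls per column to see where the gaps are
print(df.isnull().sum())

# Drop every row (axis=0) that contains any null value
df = df.dropna(how="any", axis=0)
print(df.shape)  # (2, 2)
```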

  • Correlated Feature Check: Concept + Python Code (9:28)

    •Sometimes two features that are meant to measure different characteristics are influenced by a common mechanism, so they move together (they are correlated)

    How to Handle Correlation:

    •Remove one of the features

    •Apply Principal Component Analysis (PCA)
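Both options can be sketched in a few lines, assuming pandas, NumPy and scikit-learn are available. The columns, the 0.9 threshold and the use of scikit-learn's `PCA` are illustrative assumptions, not necessarily what the lecture uses:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: house size stored in both square feet and square metres,
# so the two size columns carry the same information
df = pd.DataFrame({
    "size_sqft": [850, 1200, 1500, 2000, 2400],
    "age_years": [30, 5, 12, 8, 20],
})
df["size_sqm"] = df["size_sqft"] * 0.092903

# Absolute pairwise correlation between features
corr = df.corr().abs()

# Keep only the upper triangle (k=1 skips the diagonal) so each pair is checked once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Option 1: drop one feature from every pair above the assumed 0.9 threshold
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df_reduced = df.drop(columns=to_drop)
print(to_drop)  # ['size_sqm']

# Option 2: replace the correlated pair with a single principal component
# (standardize first so the difference in units does not dominate)
size_scaled = StandardScaler().fit_transform(df[["size_sqft", "size_sqm"]])
pca = PCA(n_components=1)
size_component = pca.fit_transform(size_scaled)
print(pca.explained_variance_ratio_)
```

Dropping a column is simpler and keeps the remaining features interpretable; PCA keeps the shared information in one new feature but that feature no longer has an obvious real-world meaning.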

  • Data Molding (Encoding): Concept + Python Code (3:23)

    •Adjusting data types: inspect the data types to see whether there are any issues. Data should be numeric.

    •If required, create new columns
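Adjusting data types can be sketched with pandas as below. This is a minimal example, assuming pandas is installed; the `glucose`, `diabetes` and `high_glucose` columns and the 140 cut-off are hypothetical:

```python
import pandas as pd

# Hypothetical raw data: numbers stored as text and a True/False label column
df = pd.DataFrame({
    "glucose": ["148", "85", "n/a", "89"],
    "diabetes": [True, False, True, False],
})

# Inspect data types: "object" usually means text that needs converting
print(df.dtypes)

# Convert text to numbers; entries that cannot be parsed (e.g. "n/a") become NaN
df["glucose"] = pd.to_numeric(df["glucose"], errors="coerce")

# Map the boolean label to 1/0 so every column is numeric
df["diabetes"] = df["diabetes"].map({True: 1, False: 0})

# Create a new column if required (hypothetical flag); NaN compares as False,
# so missing glucose values end up as 0 here unless they are imputed first
df["high_glucose"] = (df["glucose"] > 140).astype(int)
print(df.dtypes)
```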

  • Data Splitting (5:30)
  • Data Splitting: Python Code (9:46)
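The lecture titles do not say which function is used for splitting; one common choice is scikit-learn's `train_test_split`. A minimal sketch, assuming scikit-learn is installed, a 70/30 train/test split and made-up columns:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: two features and a binary label, 10 rows
df = pd.DataFrame({
    "feature_a": range(10),
    "feature_b": range(10, 20),
    "label": [0, 1] * 5,
})
X = df[["feature_a", "feature_b"]]
y = df["label"]

# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(len(X_train), len(X_test))  # 7 3
```

The model is trained only on the training part; the held-out test part estimates how well it generalizes to unseen data.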
  • Impute Missing Values: Concept + Python Code (5:34)

    Missing Data: Ways to Handle It

    •Drop rows

    •Replace values (Impute)
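Both ways can be sketched with pandas and scikit-learn's `SimpleImputer`, assuming both libraries are installed. Mean imputation is only one possible strategy (median or most frequent value are others), and the data is hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value in each column
df = pd.DataFrame({
    "age": [25, np.nan, 47, 33],
    "income": [52000, 61000, np.nan, 45000],
})

# Option 1: drop rows that contain any missing value
dropped = df.dropna()

# Option 2: impute, replacing each NaN with the mean of its column
imputer = SimpleImputer(strategy="mean")
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(imputed)
```

Dropping rows is simple but loses data; imputing keeps every row. When a train/test split is used, fitting the imputer on the training data only avoids leaking test information into training.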

  • Scaling (5:56)

    •Feature scaling is a technique to bring the independent features in the data onto a common scale or into a fixed range

    •It is performed during data preprocessing to handle features whose magnitudes, values, or units vary widely

    Disadvantage of not scaling:

    •Without feature scaling, a machine learning algorithm tends to weigh larger values more heavily and treat smaller values as less important, regardless of the units of the values
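Two common ways to scale a feature are min-max scaling and standardization; the source does not say which one the lecture uses. For a feature x with values x_i, mean μ and standard deviation σ:

```latex
x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \qquad \text{(min-max scaling, result in } [0, 1]\text{)}

z_i = \frac{x_i - \mu}{\sigma} \qquad \text{(standardization, result has mean } 0 \text{ and standard deviation } 1\text{)}
```

For example, house sizes of 1000, 2000 and 3000 become 0, 0.5 and 1 after min-max scaling, which puts them on the same footing as a feature such as the number of rooms.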

  • Scaling: Python Code (6:15)
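A minimal sketch of both scalers using scikit-learn's `MinMaxScaler` and `StandardScaler`, assuming scikit-learn and NumPy are installed; the feature values are hypothetical:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales: house size and number of rooms
X = np.array([
    [1000.0, 2.0],
    [2000.0, 3.0],
    [3000.0, 4.0],
])

# Min-max scaling: each column is mapped to the range [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)
print(X_minmax)

# Standardization: each column is rescaled to mean 0 and standard deviation 1
X_std = StandardScaler().fit_transform(X)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

In practice, fit the scaler on the training split only and reuse it (`transform`) on the test split, so that no information from the test data leaks into training.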
  • Label Encoder: Concept + Code (4:52)

    Converts text values to numbers. Use it in the following situations:

    •A column has only two distinct values. The values then become 0/1, effectively a binary representation

    •The values are related to each other so that comparisons are meaningful (e.g. low < medium < high)
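A minimal sketch with scikit-learn's `LabelEncoder`, assuming scikit-learn and pandas are installed; column names and values are hypothetical. `LabelEncoder` assigns codes in sorted (alphabetical) order, so for ordered values like low/medium/high an explicit mapping keeps the intended order:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: a two-valued column and an ordered column
df = pd.DataFrame({
    "smoker": ["yes", "no", "no", "yes"],
    "size": ["low", "high", "medium", "low"],
})

# Binary column: codes follow sorted order, so "no" -> 0 and "yes" -> 1
le = LabelEncoder()
df["smoker_encoded"] = le.fit_transform(df["smoker"])

# Ordered column: LabelEncoder would sort alphabetically (high=0, low=1, medium=2),
# which breaks low < medium < high, so an explicit mapping is used instead
size_order = {"low": 0, "medium": 1, "high": 2}
df["size_encoded"] = df["size"].map(size_order)
print(df)
```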

  • One-Hot Encoder: Concept + Python Code (4:51)

    •Use it when there is no meaningful comparison (ordering) between the values in the column

    •Creates a new column for each unique value of the specified feature in the data set
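A minimal sketch using pandas' `get_dummies`, one common way to one-hot encode (scikit-learn's `OneHotEncoder` is another); the `city` and `price` columns are hypothetical:

```python
import pandas as pd

# Hypothetical data: city names have no meaningful order
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Pune", "Delhi"],
    "price": [50, 80, 40, 55],
})

# One new 0/1 column per unique city; dtype=int avoids boolean columns in newer pandas
encoded = pd.get_dummies(df, columns=["city"], dtype=int)
print(encoded.columns.tolist())
# ['price', 'city_Delhi', 'city_Mumbai', 'city_Pune']
```

Each row has exactly one 1 among the new city columns, so the model sees no artificial order such as Delhi < Mumbai < Pune.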

Requirements

  • Knowledge of Python is required for the coding sections
  • No other prerequisites. Anyone can take this course.
  • After completing this course, you can contact me on my blog with any questions.
  • Urdu-speaking learners can take this course as well.

Description


This course is designed to help you understand the basic concepts of data preprocessing. Anyone can take this course; no prior understanding of machine learning is required. Data preprocessing concepts and their implementation in Python are covered in detail.


Data quality is critical to a successful machine learning model, and data preprocessing is a prerequisite for machine learning. We cannot feed raw data directly into machine learning algorithms. It is important to clean the data, analyze it, and transform it into a form that machine learning algorithms can understand.


Who this course is for:

  • Anyone preparing to write machine learning code, since data preprocessing is a prerequisite for it.