Udemy
Data Cleaning Techniques in Data Science & Machine Learning
Rating: 3.8 out of 5 (9 ratings)
173 students

Explore all the concepts of Data Cleaning for AI & Data Science to become an expert with this complete online tutorial.
Last updated 1/2020
English

What you'll learn

  • Professional ways of handling data
  • Learn standard visualization techniques such as histograms and scatterplots
  • How to locate discrepancies in data and deal with them

Course content

5 sections, 30 lectures, 4h 54m total length
  • Identifying the task (8:57)

    Identify the task in data cleaning and data science by distinguishing supervised, unsupervised, and semi-supervised learning, and by applying classification, prediction, and clustering to data.

  • Model building (7:22)

    Identify the data task, select the target variable and features, and apply preprocessing to build a model for classification, prediction, or clustering on a dataset like Iris.

  • Some common solutions (12:45)

    Survey common supervised and unsupervised model types, including decision trees, random forests, boosting, regression, neural networks, SVMs, clustering, and PCA.

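  • Training and test data (13:13)

    Split data into training and test sets, shuffle the rows before splitting (using a fixed random state so the split is reproducible) to avoid ordering bias, and use a separate validation set for hyperparameter tuning so that the test set still gives an unbiased estimate of how well the model generalizes.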

  • Cross-validation (5:03)

    Apply k-fold cross-validation, such as five-fold or ten-fold (or leave-one-out when data is scarce), training on most of the data and testing on the held-out fold in each round.

  • Feature selection (14:04)

    Identify irrelevant and redundant features to improve model accuracy and efficiency. Explore filter, wrapper, and embedded feature selection methods, along with principal component analysis and feature engineering, to build lean models.

  • Accuracy measures (16:22)

    Explore how confusion matrices support binary classification, and learn to calculate and interpret accuracy, specificity, sensitivity, precision, recall, F1 score, and area under the ROC curve using Python and scikit-learn.

  • Overfitting (11:53)

    Explore how to identify and prevent overfitting by balancing model complexity, handling noise and outliers, and applying train-test splits and feature selection for robust data cleaning in machine learning.
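The split, cross-validation, and accuracy-measure lectures above can be sketched together with scikit-learn. This is a minimal illustration, not the course's own code: it assumes scikit-learn's bundled breast-cancer dataset as a stand-in binary problem (the course mentions Iris, which has three classes), and the logistic-regression model, 80/20 split, and `random_state=42` are arbitrary choices. It holds out a stratified test set, runs 5-fold cross-validation on the training portion only, then computes the confusion-matrix metrics by hand and prints scikit-learn's own helpers for comparison.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in binary dataset (assumption): label 0 = malignant, 1 = benign,
# so the "positive" class below is "benign".
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% as a test set. Shuffling avoids ordering bias, the fixed
# random_state makes the split reproducible, and stratify keeps class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=42
)

# Scaling lives inside the pipeline so each CV fold fits it on its own training part.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation on the training set only; the test set stays untouched.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(f"5-fold CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Fit once on the full training set and evaluate once on the held-out test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1

# For labels [0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # also called recall or true-positive rate
specificity = tn / (tn + fp)  # true-negative rate
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)
auc = roc_auc_score(y_test, y_score)  # area under the ROC curve

print(
    f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
    f"specificity={specificity:.3f} precision={precision:.3f} "
    f"f1={f1:.3f} auc={auc:.3f}"
)

# The hand-computed values should agree with scikit-learn's own helpers.
print(precision_score(y_test, y_pred), recall_score(y_test, y_pred), f1_score(y_test, y_pred))
```

Keeping the scaler inside the pipeline matters: scaling the whole dataset before splitting would let statistics from the held-out rows leak into training, which is one quiet way to overestimate performance.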

Requirements

  • Basic Knowledge of Python

Description

One of the most essential aspects of Data Science and Machine Learning is Data Cleaning. To get the most out of your data, it must be clean: unclean data makes it harder to train ML models. In ML and Data Science, data cleaning filters and modifies your data so that it is easier to explore, understand, and model.


It is often said that a statistician or researcher spends as much as 90% of their time collecting and cleaning data before developing a hypothesis, and only the remaining 10% actually analyzing the data and deriving results. Despite this, data cleaning is rarely discussed or taught in detail in data science or ML courses. With the rise of big data and ML, data cleaning has become just as important as the analysis itself.


Why should you learn Data Cleaning?


  • Improve decision-making

  • Improve efficiency

  • Increase productivity

  • Remove errors and inconsistencies from the dataset

  • Identify missing values

  • Remove duplicates
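The cleaning goals listed above map onto a few everyday pandas operations. The sketch below is only an illustration on a tiny hypothetical table (the `city` and `temp_c` columns and their values are invented for the example): it removes text inconsistencies, drops duplicate rows, identifies missing values, and then handles them with one simple strategy, median imputation, chosen here as an assumption rather than a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: inconsistent spacing/casing, duplicate rows, and gaps.
raw = pd.DataFrame(
    {
        "city": ["Paris", " paris", "Berlin", "Berlin", "Rome", None],
        "temp_c": [21.0, 21.0, 18.5, 18.5, np.nan, 25.0],
    }
)

df = raw.copy()

# Remove inconsistencies: trim whitespace and unify casing so " paris" becomes "Paris".
df["city"] = df["city"].str.strip().str.title()

# Remove duplication: drop rows that are exact duplicates after normalization.
df = df.drop_duplicates().reset_index(drop=True)

# Identify missing values: count missing entries (None/NaN) per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Handle them: drop rows with no city (cannot be recovered), then fill missing
# temperatures with the column median (one simple strategy among several).
df = df.dropna(subset=["city"]).reset_index(drop=True)
df = df.assign(temp_c=df["temp_c"].fillna(df["temp_c"].median()))
print(df)
```

Which fill strategy is appropriate (dropping rows, a constant, the median, a model-based estimate) depends on why the values are missing; the median is used here only because it is simple and robust to outliers.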


Why should you take this course?


Data Cleaning is an essential part of Data Science & AI, and it has become an equally important skill for a programmer. There are hundreds of online tutorials on Data Science and Artificial Intelligence, but few of them cover data cleaning, and those that do often give only a basic overview. This course covers data cleaning across several sections and nearly 5 hours of video, starting from the very beginning, so anyone can follow along.


This course covers the basics of Data Cleaning, reading data, merging and splitting datasets, different visualization tools, and locating and handling missing or absurd values, along with hands-on sessions in which you work through a dataset to consolidate what you have learned.
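The workflow described above (reading data, merging and splitting datasets, and handling missing or absurd values) might look like this in pandas. This is a hypothetical sketch: the two CSV tables are invented and inlined so the example is self-contained, and the 0 to 120 plausible-age range is an assumption chosen for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV sources, inlined so the example is self-contained.
patients_csv = io.StringIO(
    "patient_id,age,weight_kg\n"
    "1,34,70.5\n"
    "2,-3,82.0\n"
    "3,51,\n"
    "4,230,64.2\n"
)
visits_csv = io.StringIO(
    "patient_id,visit_date\n"
    "1,2019-11-02\n"
    "3,2019-12-15\n"
    "5,2020-01-07\n"
)

# Data reading: parse the CSV text into DataFrames (empty fields become NaN).
patients = pd.read_csv(patients_csv)
visits = pd.read_csv(visits_csv, parse_dates=["visit_date"])

# Merging: an outer join keeps every row from both sides; indicator=True adds a
# "_merge" column showing whether each row matched, which exposes orphan records.
merged = patients.merge(visits, on="patient_id", how="outer", indicator=True)
unmatched = merged[merged["_merge"] != "both"]

# Locating absurd values: ages outside the assumed plausible range are treated
# as errors and replaced with NaN, so they are handled like other missing values.
plausible_age = merged["age"].between(0, 120)
merged["age"] = merged["age"].where(plausible_age)

# Splitting: separate rows with complete measurements from rows needing review.
complete = merged.dropna(subset=["age", "weight_kg"])
needs_review = merged[merged[["age", "weight_kg"]].isna().any(axis=1)]

print(unmatched[["patient_id", "_merge"]])
print(complete)
print(needs_review)
```

Converting absurd values to NaN rather than deleting whole rows keeps the other, valid fields of those records available; whether to drop, impute, or re-collect them is then a separate decision.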


Enroll in this course now to learn about data cleaning concepts and techniques in detail!

Who this course is for:

  • Students who want to learn the basics of Data Cleaning