Data Cleaning and Visualization in Python

Imputation techniques | Outlier analysis | Data transformation | Data visualization

Created by Sairam V A

Last updated 2/2025

English

What you'll learn

Understand the various issues that can be present in real-world data
Understand imputation techniques and outlier analysis
Understand skewness and the data transformation techniques used to correct it
Understand univariate, bivariate and multivariate feature visualization techniques
Implement the above concepts on a real-world dataset using Python

Course content

5 sections • 10 lectures • 3h 5m total length

Introduction (3:35)
This lecture introduces Exploratory Data Analysis (EDA) and outlines the sections covered in this course.

Requirements

This course is for beginners who don't have much expertise in data cleaning and analytics.
Only minimal expertise is needed. A basic grasp of Python programming (variables, loops, conditional statements) is enough to follow the course.
Understanding the theory behind each concept is important, which is why this course leans more towards theory.

Description

This course provides a comprehensive understanding of Exploratory Data Analysis (EDA), a crucial step in the machine learning lifecycle. EDA helps in diagnosing issues within datasets and applying appropriate techniques to improve data quality.

The first phase of the course focuses on data cleaning, covering essential techniques such as handling missing values (imputation), data transformation, and outlier detection. Understanding these processes ensures the dataset is refined and structured for better model performance. Various imputation methods, including statistical, neighbor-based, and predictive filling, are discussed along with transformations like log, square root, and Box-Cox. Outlier detection techniques such as Z-score, IQR, and Mahalanobis distance are also explored.
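As a rough illustration of the cleaning steps named above, here is a minimal sketch using pandas and numpy. The toy `income` column and its values are hypothetical and not taken from the course material; the sketch picks median imputation, the IQR rule, and a log transform as one representative of each technique family.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value and one obvious outlier.
df = pd.DataFrame({"income": [32.0, 35.0, np.nan, 38.0, 40.0, 41.0, 400.0]})

# Statistical imputation: fill the gap with the column median.
median = df["income"].median()
df["income_filled"] = df["income"].fillna(median)

# IQR rule: flag points lying more than 1.5 * IQR outside the quartiles.
q1, q3 = df["income_filled"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["is_outlier"] = ~df["income_filled"].between(lower, upper)

# Log transform, a common way to compress a long right tail
# (log1p computes log(1 + x), so zeros are handled safely).
df["income_log"] = np.log1p(df["income_filled"])

print(df)
```

The median is used here rather than the mean because the mean would be pulled towards the extreme value of 400; whether that choice suits a given dataset depends on its distribution.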

The second phase delves into data visualization, covering univariate, bivariate, and multivariate analysis. It provides an extensive discussion on various plots, including histograms, box plots, scatter plots, heatmaps, and more, ensuring clarity in data interpretation.
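A minimal sketch of three of the plot types mentioned above, using matplotlib on synthetic data. The variables `x` and `y` are made up for illustration, and the course itself may use seaborn or different datasets.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (an assumption for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(50, 10, 200)
y = 2 * x + rng.normal(0, 5, 200)

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Univariate: histogram shows the shape of a single feature's distribution.
axes[0].hist(x, bins=20)
axes[0].set_title("Histogram of x")

# Univariate: box plot summarizes the quartiles and marks potential outliers.
axes[1].boxplot(x)
axes[1].set_title("Box plot of x")

# Bivariate: scatter plot shows the relationship between two features.
axes[2].scatter(x, y, s=10)
axes[2].set_xlabel("x")
axes[2].set_ylabel("y")
axes[2].set_title("x vs. y")

fig.tight_layout()
```

To view the result, call `fig.savefig("eda_plots.png")`, or drop the `matplotlib.use("Agg")` line and call `plt.show()` in an interactive session.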

The course concludes with real-world case studies that demonstrate how EDA helps derive meaningful insights. All implementations are carried out in Python using libraries such as pandas, numpy, seaborn, and matplotlib. By the end of this course, participants will have hands-on experience in performing EDA on a dataset and will be able to use these techniques to improve data quality for better results in machine learning analysis.

This course gives priority to the theoretical aspects of the concepts, since a solid understanding of the theory is expected in industry. It gives an in-depth view of the practical issues that arise with data and how to resolve them, followed by a range of visualization techniques. This knowledge is useful for working with real-world datasets and for developing Python programs for effective, insightful analysis. Furthermore, mastering the EDA process can help boost the performance of machine learning algorithms, which is useful for a career as a data analyst, data scientist, or machine learning engineer.

Who this course is for:

Beginner engineering aspirants who want to learn data science, machine learning and deep learning.
Learners who want to understand and apply the fundamental steps that can boost the performance of machine learning models.
Engineering students from various backgrounds who can apply these concepts in their own domain.
AI and data science aspirants who are looking for a single course on data cleaning, analysis and visualization using Python.

Course content

Introduction: 1 lecture • 4min

Issues with Real Time Data: 2 lectures • 17min

Imputation: 2 lectures • 19min

Outlier analysis: 2 lectures • 43min

Data Visualization Techniques: 3 lectures • 1hr 15min
