
Identify duplicate rows and keep one observation per unique entry, showing how removing duplicates affects the collection.
Identify missing values in a dataset, learn when to remove or impute them, visualize missingness, and apply mean or median imputation to numeric features to prepare the data for PCA.
Learn how to select a subset of the data based on specified criteria, such as year greater than 2000 and age less than 30, and compute the proportions for analysis.
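The three cleaning steps described above can be sketched in a few lines. This is only an illustration in Python with pandas, which may not be the tool the course itself uses; the toy data and the column names (`year`, `age`, `income`) are hypothetical, and median imputation is one of the two options the text mentions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative, not from the course.
df = pd.DataFrame({
    "year":   [1998, 2005, 2005, 2010, 2012, 2003],
    "age":    [45.0, 27.0, 27.0, np.nan, 24.0, 35.0],
    "income": [52.0, 31.0, 31.0, 40.0, np.nan, 38.0],
})

# 1. Duplicates: keep the first occurrence of each identical row.
deduped = df.drop_duplicates().reset_index(drop=True)
n_removed = len(df) - len(deduped)

# 2. Missing values: count them per column, then impute with the column median.
missing_per_column = deduped.isna().sum()
imputed = deduped.fillna(deduped.median(numeric_only=True))

# 3. Subset: rows with year > 2000 and age < 30, and their share of all rows.
mask = (imputed["year"] > 2000) & (imputed["age"] < 30)
subset = imputed[mask]
proportion = mask.mean()

print(f"removed {n_removed} duplicate row(s)")
print(missing_per_column)
print(subset)
print(f"proportion meeting the criteria: {proportion:.2f}")
```

Imputing before subsetting means a row whose age was missing is judged on its imputed value; dropping such rows instead (`deduped.dropna()`) is the other option the text mentions.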
Select the number of principal components in PCA by examining the curve of eigenvalues, noting a big drop at five, and diagnose each component with a correlation matrix.
Explore data manipulation and principal component analysis: compute principal components from the eigenvectors of the correlation matrix of the standardized data, then project the data into a reduced five-component space.
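As a rough sketch of the PCA steps described above (standardize, build the correlation matrix, take its eigenvalues and eigenvectors, project), the following Python/NumPy example uses randomly generated data. The data, the variable names, and the choice of Python are assumptions made for illustration; `k = 5` only mirrors the five components mentioned in the text and would depend on the data in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 8 numeric features driven by 3 latent factors.
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))

# Standardize each column to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Correlation matrix of X, which equals the covariance matrix of Z.
R = np.corrcoef(X, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order,
# so reorder them from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Share of total variance per component: the numbers behind the eigenvalue curve.
explained = eigenvalues / eigenvalues.sum()

# Project the standardized data onto the first k eigenvectors.
k = 5  # assumption taken from the text, not derived from this toy data
scores = Z @ eigenvectors[:, :k]

print(np.round(explained, 3))
print(scores.shape)
```

Each column of `scores` is one principal component, and its sample variance equals the corresponding eigenvalue, which is a quick sanity check on the computation.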
In this course, we learn the following:
How to set a working directory
How to import a txt or csv file
How to eliminate duplicate rows in the data
How to detect rows containing missing values
How to eliminate rows containing missing values
How to replace missing values
How to select a subset of the data based on specific criteria
How to do arithmetic on columns
How to detect strongly correlated variables (with some nice plots for visualization)
How to compute the correlation matrix, its eigenvalues, and its eigenvectors
How to select the number of components
How to compute the components
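
A few of the topics in this list (setting a working directory, importing a file, doing arithmetic on columns, and detecting strongly correlated variables) can be illustrated with a short sketch. It is written in Python with pandas, which is an assumption and may differ from the software used in the course; the file name `example.csv`, its columns, and the 0.9 threshold are all hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Set the working directory (a temporary folder here, so the sketch is self-contained).
original_dir = os.getcwd()
workdir = tempfile.mkdtemp()
os.chdir(workdir)

# Write a small hypothetical CSV file, then import it using a relative path.
with open("example.csv", "w") as f:
    f.write("height_cm,weight_kg,shoe_size\n"
            "160,55,37\n170,68,40\n180,80,43\n175,72,41\n165,60,38\n")
data = pd.read_csv("example.csv")
os.chdir(original_dir)  # restore the previous working directory

# Arithmetic on columns: derive a new column from two existing ones.
data["bmi"] = data["weight_kg"] / (data["height_cm"] / 100) ** 2

# Strongly correlated pairs: look only at the upper triangle of the absolute
# correlation matrix so that each pair is reported once.
corr = data.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
strong_pairs = upper.stack()
strong_pairs = strong_pairs[strong_pairs > 0.9]  # 0.9 is an arbitrary cutoff

print(data)
print(strong_pairs)
```

The same `corr` matrix could be passed to a heatmap plot to visualize the correlations instead of listing the pairs.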