# What is unsupervised learning used for?

**A free video tutorial from** Lazy Programmer Team

## Lecture description

This lecture describes what unsupervised machine learning (not just clustering) is used for in general.

There are two major categories:

1) **density estimation**

If we can estimate the probability distribution of the data, we get a model of the data, and we can also *sample* from that distribution to generate new data.

For example, we can train a model to read lots of Shakespeare and then generate writing in the style of Shakespeare.
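As a rough illustration of "estimate a distribution, then sample from it", here is a minimal sketch of a character-level bigram model. It is far simpler than anything that could convincingly imitate Shakespeare; the tiny corpus and the helper names `fit_bigram_model` and `sample_text` are assumptions made up for this example.

```python
import random
from collections import Counter, defaultdict


def fit_bigram_model(text):
    """Estimate p(next_char | current_char) by counting adjacent character pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        counts[current][nxt] += 1
    return counts


def sample_text(model, start, length, rng):
    """Generate up to `length` characters by repeatedly sampling the next character."""
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:  # this character was never followed by anything
            break
        chars = list(followers.keys())
        weights = list(followers.values())  # counts act as unnormalised probabilities
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


# Hypothetical toy corpus; a real model would be trained on far more text.
corpus = "to be or not to be that is the question"
model = fit_bigram_model(corpus)
generated = sample_text(model, start="t", length=40, rng=random.Random(0))
print(generated)
```

The output is gibberish that only reuses character pairs seen in the corpus, but the two steps are the same ones described above: fit a distribution, then draw new samples from it.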

2) **latent variables**

Latent variable models let us look for the underlying cause of the data we've observed by reducing it to a small set of hidden factors.

For example, if we measure the heights of all the people in our class and plot them on a histogram, we may notice 2 "bumps".

These "bumps" most likely correspond to male heights and female heights.

Thus, being male or female is the hidden (latent) variable behind the higher / lower height values.

Clustering does exactly this - it tells us how the data can be split up into distinct groups / segments / categories.
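The height example can be sketched with a plain 1-D k-means, assuming two hidden groups. The data here is synthetic: the group means (164 and 178), the spread, and the helper name `kmeans_1d` are invented for illustration and are not taken from the lecture.

```python
import random
from statistics import mean


def kmeans_1d(values, k, n_iters=20):
    """Plain k-means on scalars; returns (centers, labels)."""
    # Initialise centers at evenly spaced positions in the sorted data
    # (a simple deterministic choice, assumed here for reproducibility).
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its assigned points.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return centers, labels


rng = random.Random(42)
# Synthetic heights in cm from two hidden groups; the group label is never given to k-means.
heights = [rng.gauss(178, 5) for _ in range(200)] + [rng.gauss(164, 5) for _ in range(200)]

centers, labels = kmeans_1d(heights, k=2)
print([round(c, 1) for c in centers])
```

The recovered centers should land near the two "bumps", and each label is the algorithm's guess at the hidden group for that person.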

Unsupervised machine learning can also be used for:

- dimensionality reduction - modern datasets can have millions of features, but many of them may be correlated, so a much smaller set of features can often capture most of the information
- visualization - you can't see a million-dimensional dataset, but if you reduce the dimensionality to 2, then it can be visualized
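To make both points concrete, here is a minimal sketch of dimensionality reduction using PCA computed via NumPy's SVD. PCA is one common technique chosen for this example, not necessarily the one the lecture has in mind, and the synthetic dataset (50 correlated features generated from 2 hidden factors) and the helper name `pca_project` are assumptions.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return X_centered @ Vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic data: 50 observed features, all noisy mixtures of just 2 hidden factors.
factors = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 50))
X = factors @ mixing + 0.1 * rng.normal(size=(500, 50))

Z, explained = pca_project(X, n_components=2)
print(Z.shape)          # (500, 2)
print(explained.sum())  # fraction of total variance kept by the 2 components
```

Because the 50 features are highly correlated, two components retain almost all of the variance, and the two columns of `Z` can be passed straight to a scatter plot.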

### Learn more from the full course

Cluster Analysis and Unsupervised Machine Learning in Python

Data science techniques for pattern recognition, data mining, k-means clustering, hierarchical clustering, and KDE.

07:54:19 of on-demand video • Updated January 2021

- Understand the regular K-Means algorithm
- Understand and enumerate the disadvantages of K-Means Clustering
- Understand the soft or fuzzy K-Means Clustering algorithm
- Implement Soft K-Means Clustering in Code
- Understand Hierarchical Clustering
- Explain algorithmically how Hierarchical Agglomerative Clustering works
- Apply Scipy's Hierarchical Clustering library to data
- Understand how to read a dendrogram
- Understand the different distance metrics used in clustering
- Understand the difference between single linkage, complete linkage, Ward linkage, and UPGMA
- Understand the Gaussian mixture model and how to use it for density estimation
- Write a GMM in Python code
- Explain when GMM is equivalent to K-Means Clustering
- Explain the expectation-maximization algorithm
- Understand how GMM overcomes some disadvantages of K-Means
- Understand the Singular Covariance problem and how to fix it