Cluster Analysis: Unsupervised Machine Learning with Python
4.1 (8 ratings)
211 students enrolled
Discover two non-hierarchical clustering algorithms, k-means and DBSCAN.
Created by Ermin Dedic
Last updated 7/2017
English
Includes:
  • 1 hour on-demand video
  • 2 Articles
  • 6 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV

What Will I Learn?
  • Apply k-means clustering
  • Apply DBSCAN clustering
  • Appreciate and understand the purpose of unsupervised machine learning
Requirements
  • Understanding of Python at beginner or intermediate level is useful
Description

*New Course*


This course is ideal for those who are interested in data mining or data analysis.

Most data in the world (whether text, audio, visual, etc.) is raw or unlabeled. This is precisely why unsupervised machine learning has become so important. By using certain approaches to unsupervised machine learning (like clustering), we can discover patterns or underlying structures in data. This is a major component of exploratory data mining. Furthermore, exploratory data analysis (EDA) is used to draw hypotheses, to assess the assumptions behind our statistical inferences, and as a basis for further research. For example, the conclusion of a cluster analysis could result in the initiation of a full-scale experiment.

The course starts by covering two of the most important and common non-hierarchical clustering algorithms, K-means and DBSCAN, using Python. Later, I cover hierarchical clustering using the Agglomerative method, utilizing the SAS programming language. Quite a few examples are used to aid learning.

With K-Means, we start with a 'starter' (or simple) example. We then discuss 'Completeness Score'. In the next lesson, we discuss how k-means deals with larger variances and different shapes. Then we discuss 'Color Quantization'. This is used when an individual wants to decrease the size of an image and/or see if there is any underlying structure to an image. Finally, we will take a look at cells of the human body and do some cell segmentation. For DBSCAN, we will also look at a starter example, using Blobs. Then I will show you how DBSCAN overcomes some of the issues of K-means.

We will also cover (available soon) the Agglomerative method using the SAS programming language. Single-linkage and average-linkage criteria will be discussed. There will be a visual example (not using SAS), followed by a dataset analyzed with SAS.

Who is the target audience?
  • Students interested in clustering techniques and unsupervised machine learning
  • Students interested in data mining and/or data analysis
Curriculum For This Course
Introduction
3 Lectures 03:03

A short and easy intro to machine learning and cluster analysis.

Preview 02:18

Install Python
00:44

The Two Images Used in Lectures
00:00
K-Means Algorithm
8 Lectures 31:46

An intro to k-means. We talk about a couple of parameters (arguments) and look at a visual example.

An Intro To K-Means
07:00

A simple example to get us started with k-means.

Preview 05:20
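
A starter example along these lines might look like the sketch below. It assumes scikit-learn is installed; the synthetic blob data, the choice of three centers, and the parameter values are illustrative assumptions, not the dataset used in the lecture.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (an assumption for illustration): 300 points around
# 3 well-separated centers, 100 points per center.
X, y_true = make_blobs(n_samples=300, centers=[[-5, -5], [0, 5], [5, -5]],
                       cluster_std=0.6, random_state=0)

# n_clusters is k; n_init reruns the algorithm from several random starts
# and keeps the best result.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_.shape)  # one 2-D center per cluster
print(np.bincount(labels))            # number of points assigned to each cluster
```

The two arguments shown, n_clusters and n_init, are a reasonable place to start experimenting; random_state only makes the run repeatable.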

A completeness score essentially tells us how closely the grouping (cluster) resembles the class (or label).

Completeness Score
02:39
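
As a small illustration, the sketch below uses completeness_score from scikit-learn on hand-made label lists. The label values are made up for the example and are not from the lecture.

```python
from sklearn.metrics import completeness_score

true_classes = [0, 0, 1, 1, 2, 2]

# Cluster ids are arbitrary: only the grouping matters, not the label values.
same_grouping = [1, 1, 0, 0, 2, 2]
# Class 0 is split across two clusters, so completeness drops below 1.
split_class = [0, 3, 1, 1, 2, 2]

perfect = completeness_score(true_classes, same_grouping)
partial = completeness_score(true_classes, split_class)
print(perfect, partial)
```

Completeness is highest when all members of a class end up in the same cluster, which is why splitting one class lowers the score.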

This lesson helps you understand how to think about clustering. Essentially, in clustering, imagine that you have the attributes/dimensions (the different columns of data, such as age and weight), but you don't know the classes (labels). If you pick the wrong k, then it makes sense that the completeness score would not be great, as it measures how well the cluster (grouping) resembles the class (label).

Completeness Score When You Select The Wrong K
02:07
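
One way to see the effect of k is to score several values against known labels, as in the sketch below. The blob data and the values of k are assumptions chosen for illustration. Here the data has three classes, so k=6 is forced to split at least one class and completeness falls below 1.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import completeness_score

# Synthetic data with 3 true classes (illustrative, not the lecture's data).
X, y_true = make_blobs(n_samples=300, centers=[[-5, -5], [0, 5], [5, -5]],
                       cluster_std=0.6, random_state=0)

scores = {}
for k in (2, 3, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = completeness_score(y_true, labels)
    print(k, round(scores[k], 3))
```

Note that completeness only penalizes splitting a class across clusters. A k that is too small merges classes instead, which completeness may not penalize; the companion metric homogeneity_score is the one that reacts to merging.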

K-means struggles with data that has more variance (more spread) and with certain cluster shapes.

How Does K-Means Handle More Variance and Different Shape in Data?
02:07

K-means has real applications. One common application is reducing the size of an image, often while also trying to see if there is any underlying structure to the data.

Color Quantization
03:31
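
The idea can be sketched as follows: treat each pixel as a point in RGB space, cluster the colors, and repaint every pixel with its cluster center. The random 32x32 image is a stand-in assumption so the example runs without an image file; the lecture uses real images, and n_colors = 8 is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for a real photo: a random 32x32 RGB image with values in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Treat every pixel as a 3-D point (R, G, B) and cluster the colors.
pixels = image.reshape(-1, 3).astype(np.float64)
n_colors = 8
kmeans = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)

# Replace each pixel with the center of its cluster, then restore the image shape.
palette = kmeans.cluster_centers_.round().astype(np.uint8)
quantized = palette[kmeans.labels_].reshape(image.shape)

# The quantized image uses at most n_colors distinct colors.
n_unique = len(np.unique(quantized.reshape(-1, 3), axis=0))
print(n_unique)
```

Because the result needs only a small palette plus one palette index per pixel, it can be stored more compactly than the original.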


Clustering is increasingly used in medical research because algorithms like k-means can help researchers focus on certain portions of cells, and intensify and uncover certain structures that may not be seen with the naked eye.

Clustering Cell Segments (medical research example)
04:15

Quiz 1

K-Means Quiz
4 questions
DBSCAN
3 Lectures 12:41

I discuss the two main parameters of DBSCAN, and expand on them with a clear example.

An Intro To DBSCAN
05:46

A DBSCAN example with Blobs. It helps illustrate one way that DBSCAN overcomes some of the issues of k-means: good outlier detection.

DBSCAN Example With Blobs
03:36
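
A minimal sketch of DBSCAN on blobs, assuming scikit-learn's DBSCAN with its two main parameters, eps and min_samples. The data, the three hand-placed outliers, and the parameter values are illustrative assumptions rather than the lecture's own.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Three tight blobs (synthetic, for illustration).
X, _ = make_blobs(n_samples=300, centers=[[-5, -5], [0, 5], [5, -5]],
                  cluster_std=0.5, random_state=0)
# Append three far-away points to act as outliers.
outliers = np.array([[20.0, 20.0], [-20.0, 20.0], [0.0, -25.0]])
X = np.vstack([X, outliers])

# eps: neighborhood radius; min_samples: points needed within eps
# for a point to count as part of a dense region.
db = DBSCAN(eps=1.0, min_samples=5).fit(X)
labels = db.labels_

# DBSCAN marks noise points with the label -1.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```

Unlike k-means, which assigns every point to some cluster, DBSCAN can leave isolated points unassigned, and the number of clusters is not specified up front.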

DBSCAN works its magic when it comes to non-spherical data. I compare DBSCAN and k-means side by side on oddly shaped data.

DBSCAN and Non-Spherical Data
03:19
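
A side-by-side comparison could be sketched as below. The two-moons dataset, the parameter values, and the use of the adjusted Rand index as the agreement measure are all assumptions for this example; the lecture may use different data and a different way of comparing.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-circles: a classic non-spherical shape.
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means the clustering matches the true grouping.
km_ari = adjusted_rand_score(y_true, km_labels)
db_ari = adjusted_rand_score(y_true, db_labels)
print(round(km_ari, 2), round(db_ari, 2))
```

K-means assigns each point to the nearest center, so it tends to cut across the two moons, while DBSCAN follows the dense curve of each moon.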

DBSCAN Quiz

DBSCAN Quiz
2 questions
About the Instructor
Ermin Dedic
4.3 Average rating
329 Reviews
4,497 Students
9 Courses
Data Science, Graduate Student (Educational Psychology)

The success and fun I had with statistics-based courses in university led to my current teaching interests. My interest in data and statistics boils down to my passion for finding the objective truth and applying these findings in life and business. Currently, I teach five courses: a Statistics course, a SAS course in English, a SAS course in Portuguese (with subtitles, but English instruction), a SAS SQL course, and a Pandas (Python 3) course.

The success and fun I had during my statistics courses at university resulted in my interest in teaching. My interest in data and statistics comes from my passion for finding objective truths and applying these findings in life and business. Currently, I teach four courses: a Statistics course, a SAS course in English, a SAS course in Portuguese (with subtitles, but English instruction), and a Pandas (Python 3) course.