Clustering Data with Hierarchical Clustering

Packt Publishing
A free video tutorial from Packt Publishing
Tech Knowledge in Motion
3.9 instructor rating • 1266 courses • 371,590 students

Lecture description

Hierarchical clustering adopts either an agglomerative or a divisive method to build a hierarchy of clusters. This video shows us how to cluster data with the help of hierarchical clustering.

Learn more from the full course

Learning Path: R: Complete Machine Learning & Deep Learning

Unleash the true potential of R to unlock the hidden layers of data

17:36:05 of on-demand video • Updated June 2017

  • Develop R packages and extend the functionality of your model
  • Perform pre-model building steps
  • Understand the working behind core machine learning algorithms
  • Build recommendation engines using multiple algorithms
  • Incorporate R and Hadoop to solve machine learning problems on Big Data
  • Understand advanced strategies that help speed up your R code
  • Learn the basics of deep learning and artificial neural networks
  • Learn the intermediate and advanced concepts of artificial and recurrent neural networks
English [Auto] In our previous section we got acquainted with the ensemble learning method. Welcome to the ninth section of this course, titled Clustering. In this section we will see how to cluster data using hierarchical clustering and the k-means method. We will also cut trees into clusters and draw bivariate cluster plots. Further, we will compare clustering methods and see how to extract silhouette information from clustering. Next, we will see how to obtain the optimum number of clusters for k-means. Later we move on to clustering data with the density-based and the model-based methods. In the end we will visualize a dissimilarity matrix and validate clusters externally.

Let's get started with the first video of this section, titled Clustering Data with Hierarchical Clustering, where we will demonstrate how to cluster customers with hierarchical clustering. In this video we will perform hierarchical clustering on customer data, which involves segmenting customers into different groups. For that we will first download the data and examine the data structure. We will then use agglomerative hierarchical clustering to cluster the data.

To begin, hierarchical clustering adopts either an agglomerative or a divisive method to build a hierarchy of clusters. Regardless of which approach is adopted, both first use a distance similarity measure to combine or split clusters. The recursive process continues until there is only one cluster left, or until clusters cannot be split any further. Eventually, we can use a dendrogram to represent the hierarchy of clusters.

In order to perform hierarchical clustering on customer data, we will first download the data from this GitHub page. You need to right-click and download the customer.csv file as shown here. Let's go to the session now. We need to make sure that we place the customer data file in the working directory. Next, we need to perform a few steps in order to cluster the customer data into a hierarchy of clusters.
First, you need to load the data from customer.csv and save it into customer. This is the line of code that we need to run; you can see the output here, and all the values are loaded. You can then examine the dataset structure using the str function. You can see the output here. Next, you should normalize the customer data onto the same scale. This can be done using this line of code. Further, you can use agglomerative hierarchical clustering to cluster the customer data; this is the line of code to do that, and this is the output. Lastly, you can use the plot function to plot the dendrogram. This is the code for it, and this is how the dendrogram of hierarchical clustering looks. Here we specify hang to display labels at the bottom of the dendrogram, and use cex to shrink the labels to 70 percent of their normal size. In order to compare the differences between the ward.D2 and single methods for generating a hierarchy of clusters, let's draw another dendrogram using single. You can use the single method to perform hierarchical clustering and see how the generated dendrogram differs from the previous one. This is the line of code to do that, and you can see the difference between the two graphs. You can choose a different distance measure and method while performing hierarchical clustering; for more details, you can refer to the documentation for the dist and hclust functions using the help command. In this video we used hclust to perform agglomerative hierarchical clustering. If you would like to perform divisive hierarchical clustering instead, you can use the diana function. Hierarchical clustering is a clustering technique that tries to build a hierarchy of clusters.
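The steps above can be sketched in R as follows. Since the course's customer.csv file is not reproduced here, this sketch substitutes R's built-in mtcars data as a stand-in, and the chosen columns are assumptions, not the course's actual variables:

```r
# Sketch of the agglomerative clustering workflow described above.
# The course loads customer.csv; here the built-in mtcars data
# stands in so the example is self-contained.
customer <- mtcars[, c("mpg", "hp", "wt")]

str(customer)                       # examine the dataset structure

customer_scaled <- scale(customer)  # normalize onto the same scale

# Agglomerative hierarchical clustering with Ward's method
hc <- hclust(dist(customer_scaled, method = "euclidean"),
             method = "ward.D2")

# Plot the dendrogram: hang = -1 displays labels at the bottom,
# cex = 0.7 shrinks labels to 70 percent of normal size
plot(hc, hang = -1, cex = 0.7)

# For comparison, repeat with single linkage and note how
# the shape of the tree differs
hc_single <- hclust(dist(customer_scaled), method = "single")
plot(hc_single, hang = -1, cex = 0.7)
```

See ?dist and ?hclust for the other available distance measures and linkage methods.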
Intuitively, there are two approaches to building hierarchical clusters. Number one, agglomerative hierarchical clustering. This is a bottom-up approach: each observation starts in its own cluster. We then compute the similarity (or the distance) between each pair of clusters and merge the two most similar ones at each iteration, until there is only one cluster left. Number two, divisive hierarchical clustering. This is a top-down approach: all observations start in one cluster, and we split the cluster into the two least similar sub-clusters recursively until there is one cluster for each observation, as shown here.

Before performing hierarchical clustering, we need to determine how similar two clusters are. Here we list some common distance functions used for the measurement of similarity. Number one, single linkage: this refers to the shortest distance between two points in each cluster. Number two, complete linkage: this refers to the longest distance between two points in each cluster. Number three, average linkage: this refers to the average distance between two points in each cluster, where |Ci| is the size of cluster Ci and |Cj| is the size of cluster Cj. Number four, Ward's method: this refers to the sum of the squared distances from each point to the mean of the merged clusters.

Next, we will see how to perform divisive hierarchical clustering, for which you can use the diana function. This is the code to load the library, and this is the code to call the diana function. Then you can use the summary function to obtain the summary information; this is the code to do that, and you can see the summary information here. Lastly, you can plot a dendrogram and banner with the plot function. This is the code to plot the graph. It will ask you to hit the return key.
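The divisive steps above can be sketched with diana() from the cluster package, which ships with R. As before, mtcars stands in for the course's customer data, so the columns used are assumptions:

```r
# Divisive hierarchical clustering with diana() from the
# cluster package; mtcars stands in for customer.csv.
library(cluster)

customer_scaled <- scale(mtcars[, c("mpg", "hp", "wt")])

dv <- diana(customer_scaled, metric = "euclidean")

# Summary information, including the divisive coefficient
summary(dv)

# plot() on a diana object normally shows a banner and then a
# dendrogram, pausing for <Return> between them; which.plots = 2
# selects just the dendrogram so no prompt is needed
plot(dv, which.plots = 2)
```

The divisive coefficient reported by summary() lies between 0 and 1, with values near 1 indicating clearer clustering structure.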
You can do that and press Enter. If you are interested in drawing a horizontal dendrogram, you can use the dendextend package. Use the following procedure to generate a horizontal dendrogram. First, install and load the dendextend and magrittr packages. If your dendextend version is recent enough, you do not have to install and load the magrittr package separately. You can see the packages being installed: the dendextend package is loaded, and the magrittr package is loaded here. Next, let's set up the dendrogram, and finally plot the horizontal dendrogram shown here. Great! So in this video we saw how to cluster data with hierarchical clustering. In our next video, we will see how to cut trees into clusters.
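The horizontal dendrogram described above can also be sketched with base R's "dendrogram" class, which needs no extra packages; the dendextend pipeline the course uses is indicated in the comments, under the assumption that dendextend and magrittr are installed:

```r
# Drawing a horizontal dendrogram. Base R's as.dendrogram()
# plus horiz = TRUE is enough; mtcars again stands in for
# the course's customer data.
hc <- hclust(dist(scale(mtcars[, c("mpg", "hp", "wt")])),
             method = "ward.D2")

dend <- as.dendrogram(hc)   # convert the hclust result to a dendrogram
plot(dend, horiz = TRUE)    # horiz = TRUE rotates the tree sideways

# With dendextend and magrittr loaded, a roughly equivalent
# pipeline (as used in the course) would be:
#   library(dendextend); library(magrittr)
#   hc %>% as.dendrogram() %>% plot(horiz = TRUE)
```

Using horiz = TRUE places the leaf labels along the right-hand side, which is often easier to read when there are many observations.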