How it works...

Hierarchical clustering is a technique that builds a hierarchy of clusters iteratively. Generally, there are two approaches to building hierarchical clusters:

  • Agglomerative hierarchical clustering: This is a bottom-up approach. Each observation starts in its own cluster. At each iteration, we compute the similarity (or the distance) between every pair of clusters and merge the two most similar ones, until only one cluster is left.
  • Divisive hierarchical clustering: This is a top-down approach. All observations start in one cluster, which we then split recursively into the two most dissimilar groups until there is one cluster for each observation (both approaches are sketched in code after the following figure):
Figure: An illustration of hierarchical clustering
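To make the two strategies concrete, here is a minimal sketch on toy data, assuming the cluster package is installed: hclust performs agglomerative clustering, while diana from the cluster package performs divisive clustering. The toy matrix and object names are illustrative, not part of the recipe.

library(cluster)  # provides diana() and pltree()

set.seed(42)
toy <- matrix(rnorm(20), ncol = 2)  # 10 observations, 2 features

agg <- hclust(dist(toy))  # bottom-up: start with 10 clusters, merge
div <- diana(dist(toy))   # top-down: start with 1 cluster, split

par(mfrow = c(1, 2))
plot(agg, main = "Agglomerative")
pltree(div, main = "Divisive")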

Before performing hierarchical clustering, we need to decide how to measure the similarity between two clusters. Here, we list some common distance functions used to measure similarity (a short sketch verifying them in code follows the list):

  • Single linkage: This refers to the shortest distance between a point in one cluster and a point in the other:
    $d(A, B) = \min_{a \in A,\, b \in B} d(a, b)$
  • Complete linkage: This refers to the longest distance between a point in one cluster and a point in the other:
    $d(A, B) = \max_{a \in A,\, b \in B} d(a, b)$
  • Average linkage: This refers to the average distance over all pairs of points across the two clusters (where $|A|$ is the size of cluster $A$ and $|B|$ is the size of cluster $B$):
    $d(A, B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$
  • Ward's method: This refers to the sum of the squared distances from each point to the mean of the merged cluster (where $\mu_{A \cup B}$ is the mean vector of the merged cluster $A \cup B$):
    $d(A, B) = \sum_{x \in A \cup B} \lVert x - \mu_{A \cup B} \rVert^2$
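These four criteria can be checked by hand on two tiny clusters. The following is a minimal sketch; the matrices A and B and all object names are illustrative:

A <- matrix(c(1, 1, 2, 1), ncol = 2, byrow = TRUE)  # cluster A: 2 points
B <- matrix(c(5, 4, 6, 5), ncol = 2, byrow = TRUE)  # cluster B: 2 points

# Euclidean distances between every point in A and every point in B
pairwise <- as.matrix(dist(rbind(A, B)))[1:2, 3:4]

min(pairwise)   # single linkage
max(pairwise)   # complete linkage
mean(pairwise)  # average linkage: mean over all |A| * |B| pairs

# Ward criterion: squared distances to the mean of the merged cluster
merged <- rbind(A, B)
mu <- colMeans(merged)
sum(rowSums(sweep(merged, 2, mu)^2))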

In this recipe, we perform hierarchical clustering on customer data. First, we load the data from customer.csv into the customer data frame. Within the data, we find five variables of customer account information: ID, number of visits, average expense, sex, and age. As the variables are measured on different scales, we use the scale function to normalize them.
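As a reminder of what that preparation step looks like, here is a minimal sketch; the exact column names are assumptions based on the variables listed above:

customer <- read.csv("customer.csv", header = TRUE)
str(customer)  # assumed columns: ID, Visit.Times, Average.Expense, Sex, Age

# Drop the ID column and standardize each attribute to mean 0 and
# standard deviation 1, so no variable dominates the distance computation
customer <- scale(customer[, -1])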

After all the attributes are normalized, we perform hierarchical clustering using the hclust function. We use the Euclidean distance as the distance metric, and use Ward's minimum variance method to perform agglomerative clustering.
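That step corresponds to the following call; the object name hc is illustrative:

# Euclidean distances between the scaled observations, clustered with
# Ward's minimum variance method (ward.D2 applies the criterion to
# squared dissimilarities)
hc <- hclust(dist(customer, method = "euclidean"), method = "ward.D2")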

Finally, we use the plot function to plot the dendrogram of the hierarchical clusters. We specify a negative hang value to display the labels at the bottom of the dendrogram, and use cex to shrink the labels to 70 percent of their normal size. To compare the hierarchies that the ward.D2 and single methods generate, we draw another dendrogram using the single method (step 6), as reproduced in the sketch below.
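The plotting described above can be reproduced as follows; the exact hang value is an assumption (any negative value places all labels at height 0):

# Labels at the bottom (negative hang), shrunk to 70 percent size
plot(hc, hang = -0.01, cex = 0.7)

# The same data with single linkage, for comparison with ward.D2;
# single linkage tends to produce long, chained clusters
hc_single <- hclust(dist(customer, method = "euclidean"), method = "single")
plot(hc_single, hang = -0.01, cex = 0.7)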
