Although OpenCV does not provide an implementation of agglomerative hierarchical clustering, it is a popular algorithm that by all means belongs in our machine learning skill set:
- We start by generating 10 random data points, just like in the previous example:
In [1]: from sklearn.datasets import make_blobs
... X, y = make_blobs(random_state=100, n_samples=10)
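To see what this call produces, we can inspect the returned arrays in a quick sketch (note that when `centers` is left unspecified, `make_blobs` defaults to three blob centers):

```python
from sklearn.datasets import make_blobs

# Generate 10 random 2D points; with centers unspecified,
# make_blobs defaults to three blob centers
X, y = make_blobs(random_state=100, n_samples=10)

print(X.shape)  # (10, 2): 10 samples with 2 features each
print(y)        # ground-truth blob membership, values in {0, 1, 2}
```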
- Using the familiar statistical modeling API, we import the AgglomerativeClustering algorithm and specify the desired number of clusters:
In [2]: from sklearn import cluster
... agg = cluster.AgglomerativeClustering(n_clusters=3)
- Fitting the model to the data works, as usual, via the fit_predict method:
In [3]: labels = agg.fit_predict(X)
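A quick sanity check confirms what `fit_predict` returns: one cluster label per data point, drawn from the three clusters we requested. The following sketch puts the steps so far together:

```python
from sklearn.datasets import make_blobs
from sklearn import cluster

X, y = make_blobs(random_state=100, n_samples=10)
agg = cluster.AgglomerativeClustering(n_clusters=3)

# fit_predict fits the model and returns one label per sample
labels = agg.fit_predict(X)

print(len(labels))  # 10: one label for each of the 10 data points
print(set(labels))  # the labels come from the 3 requested clusters
```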
- We can generate a scatter plot where every data point is colored according to the predicted label:
In [4]: import matplotlib.pyplot as plt
... %matplotlib inline
... plt.style.use('ggplot')
... plt.scatter(X[:, 0], X[:, 1], c=labels, s=100)
The resulting clustering looks like the following diagram:
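Since agglomerative clustering builds a full merge hierarchy rather than just flat labels, it can also be visualized as a dendrogram. scikit-learn's estimator does not plot this directly, but SciPy can; the sketch below (using SciPy, which goes beyond the code shown above) applies Ward linkage, the default linkage criterion of `AgglomerativeClustering`:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, y = make_blobs(random_state=100, n_samples=10)

# Build the full merge hierarchy with Ward linkage
Z = linkage(X, method='ward')

# Z holds one row per merge: n_samples - 1 merges for 10 points
print(Z.shape)  # (9, 4)

# Plot the hierarchy as a dendrogram
dendrogram(Z)
plt.show()
```

Cutting this dendrogram at the height where three branches remain recovers the same three clusters found earlier.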
Finally, before we end this chapter, let's look at how to compare clustering algorithms and choose the correct clustering algorithm for the data you have!