Although OpenCV does not provide an implementation of agglomerative hierarchical clustering, it is a popular algorithm that by all means belongs in our machine learning skill set:
- We start by generating 10 random data points, just like in the previous example:
In [1]: from sklearn.datasets import make_blobs
... X, y = make_blobs(random_state=100, n_samples=10)
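To see what this call produces, we can inspect the returned arrays in a quick sketch (note that when `centers` is left unspecified, `make_blobs` defaults to three blob centers):

```python
from sklearn.datasets import make_blobs

# Generate 10 random 2D points; with centers unspecified,
# make_blobs defaults to three blob centers
X, y = make_blobs(random_state=100, n_samples=10)

print(X.shape)  # (10, 2): 10 samples with 2 features each
print(y)        # ground-truth blob membership, values in {0, 1, 2}
```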
- Using the familiar statistical modeling API, we import the AgglomerativeClustering algorithm and specify the desired number of clusters:
In [2]: from sklearn import cluster
... agg = cluster.AgglomerativeClustering(n_clusters=3)
- Fitting the model to the data works, as usual, via the fit_predict method:
In [3]: labels = agg.fit_predict(X)
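A quick sanity check confirms what `fit_predict` returns: one cluster label per data point, drawn from the three clusters we requested. The following sketch puts the steps so far together:

```python
from sklearn.datasets import make_blobs
from sklearn import cluster

X, y = make_blobs(random_state=100, n_samples=10)
agg = cluster.AgglomerativeClustering(n_clusters=3)

# fit_predict fits the model and returns one label per sample
labels = agg.fit_predict(X)

print(len(labels))  # 10: one label for each of the 10 data points
print(set(labels))  # the labels come from the 3 requested clusters
```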
- We can generate a scatter plot where every data point is colored according to the predicted label:
In [4]: import matplotlib.pyplot as plt
... %matplotlib inline
... plt.style.use('ggplot')
... plt.scatter(X[:, 0], X[:, 1], c=labels, s=100)
The resulting clustering looks like the following diagram:
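Since agglomerative clustering builds a full merge hierarchy rather than just flat labels, it can also be visualized as a dendrogram. scikit-learn's estimator does not plot this directly, but SciPy can; the sketch below (using SciPy, which goes beyond the code shown above) applies Ward linkage, the default linkage criterion of `AgglomerativeClustering`:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, y = make_blobs(random_state=100, n_samples=10)

# Build the full merge hierarchy with Ward linkage
Z = linkage(X, method='ward')

# Z holds one row per merge: n_samples - 1 merges for 10 points
print(Z.shape)  # (9, 4)

# Plot the hierarchy as a dendrogram
dendrogram(Z)
plt.show()
```

Cutting this dendrogram at the height where three branches remain recovers the same three clusters found earlier.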
Finally, before we end this chapter, let's look at how to compare clustering algorithms and choose the correct clustering algorithm for the data you have!