We encountered the concept of hierarchical clustering in Chapter 9, Ensemble Learning and Dimensionality Reduction. In this recipe, we will segment an image by clustering it hierarchically. We will apply agglomerative clustering, a type of hierarchical clustering with O(n³) time complexity.
In agglomerative clustering, each item starts in its own cluster at initialization. These clusters then merge (agglomerate) and move up the hierarchy. Naturally, we only merge clusters that are similar by some measure.
After initialization, we find the pair of clusters that is closest by some distance metric and merge it. The merged cluster is a higher-level cluster consisting of lower-level clusters. After that, we again find the closest pair and merge it, and so on. During this process, clusters can contain any number of items. We stop clustering when we reach a certain number of clusters, or when the remaining clusters are too far apart.
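The merge loop described above can be sketched with scikit-learn's AgglomerativeClustering on made-up one-dimensional data (the data values and cluster count here are illustrative, not part of the recipe):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data: three points near 1 and two points near 8.
X = np.array([[1.0], [1.1], [1.2], [8.0], [8.1]])

# Each point starts as its own cluster; the closest pairs merge
# repeatedly until only two clusters remain.
ac = AgglomerativeClustering(n_clusters=2)
labels = ac.fit_predict(X)
print(labels)
```

The three small values end up sharing one label and the two large values the other, because the nearest pairs are always merged first.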
import numpy as np
from scipy.misc import ascent
import matplotlib.pyplot as plt
from sklearn.feature_extraction.image import grid_to_graph
from sklearn.cluster import AgglomerativeClustering
import dautil as dl
img = ascent()
X = np.reshape(img, (-1, 1))
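The reshape turns the two-dimensional image into a feature matrix with one row per pixel and a single grayscale intensity column, which is the shape the clusterer expects. A hypothetical 2x2 "image" (values chosen only for illustration) shows the effect:

```python
import numpy as np

# Hypothetical 2x2 grayscale image: four pixels in total.
tiny = np.array([[0, 50],
                 [100, 150]])

# Flatten to one row per pixel, one intensity feature per row.
features = np.reshape(tiny, (-1, 1))
print(features.shape)  # (4, 1)
```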
connectivity = grid_to_graph(*img.shape)
NCLUSTERS = 9
ac = AgglomerativeClustering(n_clusters=NCLUSTERS, connectivity=connectivity)
ac.fit(X)
label = np.reshape(ac.labels_, img.shape)
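The connectivity constraint is what makes the clustering spatial: grid_to_graph builds a sparse adjacency matrix linking each pixel to its direct grid neighbors, so only contiguous regions of the image are allowed to merge. A minimal sketch on a tiny grid (the 2x2 size is just for illustration):

```python
from sklearn.feature_extraction.image import grid_to_graph

# A 2x2 grid has 4 pixels (nodes); the result is a sparse
# 4x4 adjacency matrix connecting each pixel to its
# horizontal and vertical neighbors.
graph = grid_to_graph(2, 2)
print(graph.shape)  # (4, 4)
```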
for l in range(NCLUSTERS):
    plt.contour(label == l, levels=[0.5],
                colors=[plt.cm.spectral(l / float(NCLUSTERS)), ])

dl.plotting.img_show(plt.gca(), img, cmap=plt.cm.gray)
Refer to the following screenshot for the end result:
The code is in the clustering_hierarchy.ipynb file in this book's code bundle.
The AgglomerativeClustering class documented at http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html (retrieved December 2015)