k-medoids

As described earlier, the k-means and k-medians algorithms are each tied to a particular distance metric: the squared Euclidean distance and the Manhattan distance, respectively. This is because the mean is the point that minimizes the total squared Euclidean distance to the members of a cluster, while the median minimizes the total Manhattan distance, and these totals are exactly the statistics the two algorithms attempt to minimize. In cases where we want to use another distance metric (such as one based on correlation), we can instead use the k-medoids method (Theodoridis, Sergios, and Konstantinos Koutroumbas. Pattern recognition. (2003).), which consists of the following steps:

  1. Select k of the datapoints as the initial cluster centers (the medoids).
  2. For each datapoint, find the nearest cluster center under the chosen distance metric and assign the point to that cluster.
  3. For each cluster center and each point that is not a center, swap the two and calculate the change in the total distance from all cluster members to their cluster center. If the swap does not reduce this total, undo it. Repeat step 3 for all points.
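
The steps above can be sketched in plain Python. This is a minimal illustration rather than the pyclust implementation: the function names (`manhattan`, `total_cost`, `assign`, `k_medoids`), the `seed` parameter, and the toy data are all assumptions made for the example. It also keeps sweeping over the candidate swaps until none of them reduces the total distance, which goes slightly beyond a single pass of step 3. Any function that takes two points and returns a number can be passed as `dist`.

```python
import random

def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))

def total_cost(points, medoids, dist):
    """Sum of each point's distance to its nearest medoid (medoids are indices)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def assign(points, medoids, dist):
    """Step 2: label each point with the position (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: dist(p, points[medoids[j]]))
            for p in points]

def k_medoids(points, k, dist=manhattan, seed=0):
    """Hypothetical helper: swap-based k-medoids on a list of tuples."""
    rng = random.Random(seed)
    # Step 1: pick k distinct datapoints (by index) as the initial medoids.
    medoids = rng.sample(range(len(points)), k)
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        # Step 3: try replacing each medoid with each non-medoid point.
        for j in range(k):
            for i in range(len(points)):
                if i in medoids:
                    continue
                candidate = medoids[:j] + [i] + medoids[j + 1:]
                cost = total_cost(points, candidate, dist)
                if cost < best:
                    # Keep the swap only if it lowers the total distance.
                    medoids, best = candidate, cost
                    improved = True
    return medoids, assign(points, medoids, dist), best

# Toy data (made up for this example): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels, cost = k_medoids(points, 2)
print(sorted(medoids), cost)  # [0, 3] 4
```

Because the medoids are always actual datapoints and the cost is just a sum of pairwise distances, the only place the metric enters is the `dist` argument, which is what makes the method usable with distances for which a mean or median is not the natural center.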

The k-medoids procedure is not an exhaustive search (we do not repeat step 1 with different starting points), so the result can depend on the initial cluster centers. Its advantage is that the optimality criterion is not tied to a specific statistic; instead, the algorithm improves the compactness of the clusters under whatever distance metric we choose. Can k-medoids improve our clustering of the concentric circles? Let's try it by running the following commands and plotting the result:

>>> import numpy as np
>>> from pyclust import KMedoids
>>> # cluster on every column of df except the first
>>> kmedoids_clusters = KMedoids(2).fit_predict(np.array(df)[:,1:])
>>> df.plot(kind='scatter', x='x_coord', y='y_coord', c=kmedoids_clusters)

Note

k-medoids is not included in scikit-learn, so you will need to install the pyclust library using easy_install or pip.

[Figure: k-medoids]

There isn't much improvement over k-means, so perhaps we need to change our clustering algorithm entirely. Instead of computing the similarity between datapoints in a single stage, we could examine hierarchical measures of similarity and clustering, which is the goal of the agglomerative clustering algorithms we will examine next.
