The first caveat – no guarantee of finding the global optimum

Although mathematicians have proved that the expectation-maximization procedure improves the result at each step, there is still no guarantee that, in the end, we will find the globally best solution. For example, if we use a different random seed in our simple example (such as seed 10 instead of 5), we suddenly get very poor results:

In [9]: centers, labels = find_clusters(X, 4, rseed=10)
... plt.scatter(X[:, 0], X[:, 1], c=labels, s=100, cmap='viridis');

This will generate the following diagram:

The preceding diagram shows an example of k-means missing the global optimum. What happened?

The short answer is that the random initialization of the cluster centers was unfortunate. It led the center of the yellow cluster to migrate in between the two top blobs, essentially combining them into one. As a result, the remaining three clusters had to divide the two other visually distinct blobs among themselves, so one of those blobs was split in two.

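For this reason, it is common to run the algorithm from multiple initial states and keep the best result. OpenCV supports this directly: the attempts parameter of cv2.kmeans sets how many times the algorithm is run with different initial labelings, and the function returns the run with the best compactness.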
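The idea of restarting from several initial states can be sketched in plain NumPy. This is a minimal illustration, not the book's find_clusters function or OpenCV's implementation: the helpers kmeans_once and kmeans_restarts are hypothetical names introduced here, and compactness is assumed to mean the sum of squared distances from each point to its assigned center.

```python
import numpy as np


def kmeans_once(X, n_clusters, rseed, max_iter=100):
    """One k-means run from a random initialization (hypothetical helper)."""
    rng = np.random.RandomState(rseed)
    # Pick n_clusters distinct data points as the initial centers.
    centers = X[rng.permutation(X.shape[0])[:n_clusters]]
    for _ in range(max_iter):
        # Expectation step: assign each point to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Maximization step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute assignments and compactness for the final centers.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    compactness = dists.min(axis=1).sum()
    return centers, labels, compactness


def kmeans_restarts(X, n_clusters, seeds):
    """Run k-means once per seed and keep the most compact result."""
    runs = [kmeans_once(X, n_clusters, rseed=s) for s in seeds]
    return min(runs, key=lambda run: run[2])
```

Calling kmeans_restarts(X, 4, seeds=range(10)) returns the centers, labels, and compactness of the best of ten runs. A single unlucky seed can then no longer determine the final clustering, although even the best of several runs is still not guaranteed to be the global optimum.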
