K-means clustering

K-means is a relatively simple and effective way to cluster data. The main idea is to start with K points as the initial cluster centers and assign each instance to its nearest center. The centers are then recalculated as the mean of their respective members. This process repeats until the cluster centers no longer change. The main steps are as follows:

  1. Select the number of clusters, K
  2. Select K random instances as the initial cluster centers
  3. Assign each instance to the closest cluster center
  4. Re-calculate the cluster centers as the mean of each cluster's members
  5. If the new centers differ from the previous ones, go back to Step 3; otherwise, stop
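
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation. The function names (k_means, squared_distance, mean_point), the seed and max_iterations parameters, the use of squared Euclidean distance, and the choice to leave an empty cluster's center unchanged are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(instances, k, seed=0, max_iterations=100):
    """Cluster instances (a list of tuples) into k groups.

    Step 1, choosing K, is left to the caller via the k argument.
    Returns the final centers and the members of each cluster.
    """
    rng = random.Random(seed)
    # Step 2: select K random instances as the initial cluster centers.
    centers = rng.sample(instances, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Step 3: assign each instance to the closest cluster center.
        clusters = [[] for _ in range(k)]
        for point in instances:
            nearest = min(range(k),
                          key=lambda i: squared_distance(point, centers[i]))
            clusters[nearest].append(point)
        # Step 4: recalculate each center as the mean of its members.
        # Assumption of this sketch: an empty cluster keeps its old center.
        new_centers = [mean_point(members) if members else centers[i]
                       for i, members in enumerate(clusters)]
        # Step 5: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

Calling k_means(points, 2) on a list of 2-D tuples returns two centers together with the points assigned to each. The max_iterations cap is only a safeguard for this sketch; the loop normally exits through the Step 5 check.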

A graphical example is depicted as follows. After four iterations, the algorithm converges:

The first four iterations on a toy dataset. Stars represent the cluster centers