K-means is a relatively simple and effective way to cluster data. The main idea is to start with K points as the initial cluster centers and assign each instance to its nearest center. The centers are then recalculated as the mean of their respective members. This process repeats until the cluster centers no longer change. The main steps are as follows:
- Select the number of clusters, K
- Select K random instances as the initial cluster centers
- Assign each instance to the closest cluster center
- Re-calculate the cluster centers as the mean of each cluster's members
- If the new centers differ from the previous ones, go back to the assignment step; otherwise, stop
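The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the names `k_means`, `squared_distance`, and `mean_point`, the `max_iter` and `seed` parameters, the use of squared Euclidean distance, and the rule that an empty cluster keeps its previous center are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(data, k, max_iter=100, seed=0):
    """Cluster `data` (a list of numeric tuples) into `k` groups.

    Returns the final centers and the cluster index of each instance.
    """
    rng = random.Random(seed)
    # Select K random instances as the initial cluster centers.
    centers = [tuple(p) for p in rng.sample(data, k)]
    labels = []
    for _ in range(max_iter):
        # Assign each instance to the closest cluster center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in data
        ]
        # Re-calculate each center as the mean of its cluster's members.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(data, labels) if label == j]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(mean_point(members) if members else centers[j])
        # Stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Calling `k_means(data, 2)` on a list of two-dimensional tuples returns two centers and one label per instance. The `max_iter` cap is only a safeguard for this sketch; if it is reached before the centers stop changing, the returned labels reflect the assignment made just before the last update.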
A graphical example follows, in which the algorithm converges after four iterations:
The first four iterations on a toy dataset. Stars represent the cluster centers