How it works...

The preceding recipe used TensorFlow's k-means clustering estimator to group the given data into clusters. Because we knew the number of clusters in advance, we set num_clusters=3, but in most cases with unlabeled data, one is never sure how many clusters exist. One way to estimate a suitable number of clusters is the elbow method. It is based on the principle that we should choose the number of clusters that keeps the sum of squared errors (SSE) low without using more clusters than necessary. If k is the number of clusters, then the SSE decreases as k increases and reaches SSE = 0 when k equals the number of data points, because each point is then its own cluster. We want a low value of k such that the SSE is also low. In TensorFlow, we can find the SSE using the score() method defined in the KMeansClustering class; the method returns the total sum of distances to the nearest clusters:

sum_distances = kmeans.score(input_fn=input_fn, steps=100)
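For reference, the SSE discussed here is commonly written as shown in the following formula. The symbols C_j (the set of points assigned to cluster j) and \mu_j (the centroid of that cluster) are notation introduced for this sketch, and the formula assumes squared Euclidean distance; whether score() returns exactly this quantity depends on the distance metric configured for the estimator:

```latex
\mathrm{SSE}(k) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^{2}
```

When k equals the number of data points, every C_j contains a single point that coincides with \mu_j, so every term in the sum is zero.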

For the Iris data, if we plot the SSE for different values of k, we see that the SSE drops sharply up to k=3; after that, it decreases only slowly, thus the elbow point is k=3.
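The elbow search described above can be sketched in a few lines. The following is a minimal, library-free illustration and not the TensorFlow estimator used in the recipe: it runs a plain Lloyd's k-means on synthetic data made of three well-separated blobs (not the Iris data), and the helper names (squared_distance, farthest_first_centroids, kmeans_sse, pick_elbow) are hypothetical. Two choices are assumptions made for this sketch: centroids are initialized by farthest-first traversal so that the run is deterministic, and the elbow is picked as the k with the largest second difference of the SSE curve, which is only one of several possible heuristics:

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centroids(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans_sse(points, k, n_iter=100):
    """Run Lloyd's algorithm and return the SSE of the final clustering."""
    centroids = farthest_first_centroids(points, k)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    # SSE: squared distance from each point to its nearest centroid.
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def pick_elbow(sse_by_k):
    """Return the k with the largest second difference of the SSE curve,
    that is, the k at which the curve bends most sharply."""
    ks = sorted(sse_by_k)
    inner = ks[1:-1]  # the second difference needs a neighbor on each side
    return max(inner, key=lambda k: (sse_by_k[k - 1] - sse_by_k[k])
                                    - (sse_by_k[k] - sse_by_k[k + 1]))


# Synthetic data: three 2-D Gaussian blobs of 30 points each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in blob_centers for _ in range(30)]

sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 7)}
elbow_k = pick_elbow(sse_by_k)

for k in sorted(sse_by_k):
    print(f"k={k}  SSE={sse_by_k[k]:.2f}")
print("elbow at k =", elbow_k)
```

With the TensorFlow estimator, the same loop would train one model per candidate k and collect the value returned by score() in place of kmeans_sse(). Plotting the collected values against k gives the elbow curve.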
