In Python Data Analysis, you learned about clustering: separating data into groups without providing any hints, which is a form of unsupervised learning. Sometimes we need to guess the number of clusters, as we did in the Clustering streaming data with Spark recipe.
Nothing prevents clusters from containing other clusters; in such a case, we speak of hierarchical clustering. We need a distance metric to separate data points. Take a look at the following equations:

$$d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \tag{9.2}$$

$$d(A, B) = \min\{\, d(a, b) : a \in A,\ b \in B \,\} \tag{9.3}$$
In this recipe, we will use the Euclidean distance (9.2), provided by the SciPy pdist() function. The distance between sets of points is given by the linkage criterion. In this recipe, we will use the single-linkage criterion (9.3), provided by the SciPy linkage() function, which takes the minimum distance over all pairs of points drawn from two clusters, as shown in the short sketch that follows.
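To see what these two functions produce before we apply them to the weather data, here is a minimal sketch; the toy points array is made up purely for illustration:

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: four 2-D points forming two obvious pairs.
points = np.array([[0.0, 0.0],
                   [0.0, 1.0],
                   [5.0, 5.0],
                   [5.0, 6.0]])

# pdist() returns the condensed distance matrix: the n*(n-1)/2
# pairwise Euclidean distances (6 values for 4 points).
dist = pdist(points)  # metric='euclidean' is the default

# linkage() builds the hierarchy; method='single' (the default)
# applies the single-linkage criterion (9.3). Each row of Z
# records one merge of two clusters and the distance between them.
Z = linkage(dist, method='single')
print(Z)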
The script is in the clustering_hierarchy.ipynb file in this book's code bundle:
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage
from scipy.cluster.hierarchy import dendrogram
import dautil as dl
import matplotlib.pyplot as plt

# Load the weather data and resample it to annual means (recent
# pandas versions require an explicit aggregation after resample()).
df = dl.data.Weather.load().resample('A').mean().dropna()

# Compute the condensed matrix of pairwise Euclidean distances.
dist = pdist(df)

# Build the single-linkage hierarchy and plot it as a dendrogram,
# labeling each leaf with its year.
dendrogram(linkage(dist),
           labels=[d.year for d in df.index],
           orientation='right')
plt.tick_params(labelsize=8)
plt.xlabel('Cluster')
plt.ylabel('Year')
Refer to the following screenshot for the end result:
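As an optional extension, not part of the original recipe: if you need flat cluster labels rather than a dendrogram, SciPy's fcluster() function can cut the tree at a given number of clusters. Continuing from the script above, where dist has already been computed:

from scipy.cluster.hierarchy import fcluster

# Cut the single-linkage tree into at most three flat clusters;
# the choice of three is arbitrary here, for illustration only.
labels = fcluster(linkage(dist), t=3, criterion='maxclust')
print(labels)  # one cluster id per year in df.index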
See also:

- The documentation of the pdist() function at https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html (retrieved November 2015)
- The documentation of the linkage() function at https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html (retrieved November 2015)