t-SNE (t-distributed stochastic neighbor embedding) is a dimensionality reduction technique that is best suited to the visualization of high-dimensional data.
In this section, we will see an example of how to visualize a high-dimensional dataset using t-SNE. Let's use the digits dataset that ships with scikit-learn, which contains 8x8 grayscale images of handwritten digits from 0 to 9, each flattened into 64 features. It's a publicly available dataset; note that it is a much smaller relative of the well-known MNIST dataset (which has 28x28 images), not MNIST itself. We will see how we can visualize this dataset in two dimensions using t-SNE:
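- First, let's load the dataset and rescale the pixel values. Note that the pixel intensities in this dataset range from 0 to 16, so dividing by 255.0 only shrinks them to a small range rather than mapping them onto [0, 1]: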
In [1]: import numpy as np
In [2]: from sklearn.datasets import load_digits
In [3]: digits = load_digits()
In [4]: X, y = digits.data/255.0, digits.target
In [5]: print(X.shape, y.shape)
Out[5]: (1797, 64) (1797,)
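To make the shape reported above more concrete, here is a minimal sketch that inspects the dataset a little further. It assumes only that scikit-learn is installed; the variable name `flattened` is introduced here for illustration and is not part of the original session.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# digits.images holds the original 8x8 arrays; digits.data holds the same
# pixels flattened into one 64-element row per image.
print(digits.images.shape, digits.data.shape)

# Range of the raw pixel intensities (before any rescaling).
print(digits.data.min(), digits.data.max())

# Flattening the images by hand should reproduce digits.data.
flattened = digits.images.reshape(len(digits.images), -1)
print((flattened == digits.data).all())
```

Each of the 64 columns that t-SNE sees is therefore one pixel position in the 8x8 grid.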
- When a dataset has a very large number of dimensions, it is usually advisable to first apply a dimensionality reduction technique such as PCA to bring the number of dimensions down, and only then use t-SNE to visualize the data; this reduces both noise and computation time. In this case, however, we have only 64 dimensions, so let's use all of them and apply t-SNE directly:
In [6]: from sklearn.manifold import TSNE
In [7]: tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
In [8]: tsne_results = tsne.fit_transform(X)
Out[8]: [t-SNE] Computing 121 nearest neighbors...
... [t-SNE] Indexed 1797 samples in 0.009s...
... [t-SNE] Computed neighbors for 1797 samples in 0.395s...
... [t-SNE] Computed conditional probabilities for sample 1000 / 1797
... [t-SNE] Computed conditional probabilities for sample 1797 / 1797
... [t-SNE] Mean sigma: 0.048776
... [t-SNE] KL divergence after 250 iterations with early exaggeration: 61.094833
... [t-SNE] KL divergence after 300 iterations: 0.926492
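For comparison, the two-stage approach mentioned in this step (PCA first, then t-SNE) could be sketched roughly as follows. This is only an illustration under stated assumptions: the choice of 30 principal components and the `random_state` values are arbitrary picks for this sketch, not settings from the original session. The iteration count is left at its default because the name of that argument differs across scikit-learn versions (`n_iter` in older releases, `max_iter` in newer ones).

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

digits = load_digits()
# Pixel intensities in this dataset range from 0 to 16, so this maps them to [0, 1].
X, y = digits.data / 16.0, digits.target

# Stage 1: compress the 64 pixel features into 30 principal components.
# (30 is an arbitrary choice for this sketch.)
pca = PCA(n_components=30, random_state=0)
X_pca = pca.fit_transform(X)

# Stage 2: run t-SNE on the PCA-reduced data instead of the raw pixels.
tsne = TSNE(n_components=2, perplexity=40, random_state=0)
embedding = tsne.fit_transform(X_pca)

print(X_pca.shape, embedding.shape)

# Fraction of the original variance kept by the PCA stage.
print(pca.explained_variance_ratio_.sum())
```

With only 64 input dimensions the PCA stage is unlikely to change the picture much here; it tends to matter more for datasets with hundreds or thousands of features.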
- Finally, let's visualize the two dimensions that t-SNE has produced with a scatterplot, coloring each point by its digit label:
In [9]: import matplotlib.pyplot as plt
In [10]: plt.scatter(tsne_results[:,0],tsne_results[:,1],c=y/10.0)
... plt.xlabel('x-tsne')
... plt.ylabel('y-tsne')
... plt.title('t-SNE')
In [11]: plt.show()
And we get the following output:
In the next section, let's discuss how we can represent categorical variables.