Visualizing the dimensionality reduction using t-Distributed Stochastic Neighbor Embedding (t-SNE)

t-SNE is a nonlinear dimensionality reduction technique that is particularly well suited to visualizing high-dimensional data in two or three dimensions.

In this section, we will see an example of how to visualize high-dimensional datasets using t-SNE. Let's use the digits dataset that ships with scikit-learn, which contains 8 x 8 images of handwritten digits from 0 to 9, so each sample has 64 features. It's a publicly available dataset, similar in spirit to the well-known MNIST dataset but much smaller. We will see how we can visualize the dimensionality reduction on this dataset using t-SNE:

  1. First, let's load the dataset:
In [1]: import numpy as np
In [2]: from sklearn.datasets import load_digits
In [3]: digits = load_digits()
In [4]: X, y = digits.data/255.0, digits.target
In [5]: print(X.shape, y.shape)
Out[5]: (1797, 64) (1797,)
  2. When the number of dimensions is very high, it is generally advisable to first apply a dimensionality reduction technique such as PCA to bring the data down to a lower number of dimensions, and then use t-SNE to visualize the result. In this case, however, we only have 64 dimensions, so let's use all of them and apply t-SNE directly:
In [6]: from sklearn.manifold import TSNE
In [7]: tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
In [8]: tsne_results = tsne.fit_transform(X)
Out[8]: [t-SNE] Computing 121 nearest neighbors...
... [t-SNE] Indexed 1797 samples in 0.009s...
... [t-SNE] Computed neighbors for 1797 samples in 0.395s...
... [t-SNE] Computed conditional probabilities for sample 1000 / 1797
... [t-SNE] Computed conditional probabilities for sample 1797 / 1797
... [t-SNE] Mean sigma: 0.048776
... [t-SNE] KL divergence after 250 iterations with early exaggeration: 61.094833
... [t-SNE] KL divergence after 300 iterations: 0.926492
  3. Finally, let's visualize the two dimensions that we have extracted using t-SNE with the help of a scatterplot:
In [9]: import matplotlib.pyplot as plt
In [10]: plt.scatter(tsne_results[:,0],tsne_results[:,1],c=y/10.0)
... plt.xlabel('x-tsne')
... plt.ylabel('y-tsne')
... plt.title('t-SNE')
In [11]: plt.show()

This produces a scatterplot of the two t-SNE components, with each point colored according to its digit label.
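
The two-stage approach mentioned in step 2 (PCA first, then t-SNE) can be sketched as follows. This is a minimal illustration rather than a recommended configuration: the choice of 30 intermediate components, the fixed `random_state`, and the variable names `X_pca` and `X_embedded` are all assumptions made for this example and do not come from the walkthrough above. The data is also left unscaled here for simplicity.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Load the 8 x 8 digit images as 64-dimensional feature vectors
digits = load_digits()
X, y = digits.data, digits.target

# Stage 1: PCA down to an intermediate dimensionality
# (30 components is an illustrative choice, not a tuned value)
pca = PCA(n_components=30, random_state=0)
X_pca = pca.fit_transform(X)
print(X_pca.shape)
print("Variance retained by PCA:", pca.explained_variance_ratio_.sum())

# Stage 2: t-SNE on the PCA-reduced data
# (random_state is fixed only to make the sketch repeatable)
tsne = TSNE(n_components=2, perplexity=40, random_state=0)
X_embedded = tsne.fit_transform(X_pca)
print(X_embedded.shape)
```

The resulting `X_embedded` array can be passed to `plt.scatter` in the same way as `tsne_results` in step 3. On a dataset this small, the PCA stage makes little difference to the running time; it becomes more useful as the number of features grows.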

In the next section, let's discuss how we can represent categorical variables.
