Applying linear discriminant analysis for dimension reduction

Linear discriminant analysis (LDA) is an algorithm that looks for a linear combination of features that best separates the classes. It can be used for classification or for dimensionality reduction by projecting the data onto a lower-dimensional subspace. LDA requires a target attribute for both classification and dimensionality reduction.
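
To make this concrete, here is a minimal sketch (not part of this recipe; it uses scikit-learn's built-in iris dataset as a stand-in) that shows both uses. Note that the target y must be supplied even when LDA is only used to reduce dimensions:

    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)

    # Classification: fit on the features and class labels, then score.
    clf = LinearDiscriminantAnalysis()
    print(clf.fit(X, y).score(X, y))

    # Dimensionality reduction: project onto at most n_classes - 1 components;
    # the target y is still required by fit_transform().
    reducer = LinearDiscriminantAnalysis(n_components=2)
    print(reducer.fit_transform(X, y).shape)  # (150, 2) for iris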

If we model the class-conditional densities as multivariate Gaussians, then LDA assumes that all classes share the same covariance matrix. The parameters of these class distributions (the per-class means and the common covariance) are estimated from the training data.
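
The following rough sketch (not the book's code; the helper name estimate_lda_params is hypothetical) shows how those parameters can be estimated directly with NumPy, one mean vector per class plus a covariance matrix pooled over all classes:

    import numpy as np

    def estimate_lda_params(X, y):
        classes = np.unique(y)
        # One mean vector per class.
        means = {c: X[y == c].mean(axis=0) for c in classes}
        # Pool the within-class scatter and normalize by the remaining
        # degrees of freedom to obtain the shared covariance estimate.
        scatter = sum(np.cov(X[y == c], rowvar=False, bias=True) * (y == c).sum()
                      for c in classes)
        pooled_cov = scatter / (len(X) - len(classes))
        return means, pooled_cov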

In scikit-learn, lda.LDA was deprecated in version 0.17 and renamed to discriminant_analysis.LinearDiscriminantAnalysis. The default solver of this class uses singular value decomposition (SVD); it does not need to compute the covariance matrix, which makes it fast.
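
The solver can also be set explicitly, as in this small sketch (the random data here is only a stand-in for demonstration):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.RandomState(0)
    X = rng.randn(60, 4)
    y = rng.randint(0, 3, size=60)

    # Default: the SVD-based solver, which never forms the covariance matrix.
    LinearDiscriminantAnalysis(solver='svd').fit(X, y)

    # The 'lsqr' and 'eigen' solvers do compute the covariance matrix and,
    # in exchange, support shrinkage regularization.
    LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto').fit(X, y)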

How to do it...

The code is in the applying_lda.ipynb file in this book's code bundle; a consolidated version of the script is also shown after the steps:

  1. The imports are as follows:
    import dautil as dl
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    import matplotlib.pyplot as plt
  2. Load the data as follows:
    df = dl.data.Weather.load().dropna()
    X = df.values
    y = df['WIND_DIR'].values
  3. Apply LDA to project the data into a two-dimensional space:
    lda = LinearDiscriminantAnalysis(n_components=2)
    X_r = lda.fit(X, y).transform(X).T
  4. Plot the result of the transformation:
    plt.scatter(X_r[0], X_r[1])
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title('Dimension Reduction with LDA')
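
For convenience, the preceding steps are gathered into a single script below. It assumes the dautil package from this book's code bundle is installed, and it only adds a plt.show() call so that the plot also appears when the code is run outside a notebook:

    import dautil as dl
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    import matplotlib.pyplot as plt

    # Load the weather data and drop rows with missing values.
    df = dl.data.Weather.load().dropna()
    X = df.values
    y = df['WIND_DIR'].values

    # Project onto two discriminant components; the wind direction column
    # serves as the target that LDA requires.
    lda = LinearDiscriminantAnalysis(n_components=2)
    X_r = lda.fit(X, y).transform(X).T

    # Scatter plot of the projected data.
    plt.scatter(X_r[0], X_r[1])
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title('Dimension Reduction with LDA')
    plt.show()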

Refer to the following screenshot for the end result:


See also
