Linear discriminant analysis (LDA) is an algorithm that looks for a linear combination of features that distinguishes between classes. It can be used for classification, or for dimensionality reduction by projecting the data onto a lower-dimensional subspace with at most one dimension fewer than the number of classes. LDA is a supervised method, so it requires a target attribute both for classification and for dimensionality reduction.
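As a rough illustration of the two uses described above, the following sketch fits one model and then both classifies and projects the same data. It assumes the Iris dataset bundled with scikit-learn, which is not part of this recipe and is used here only because it needs no download; the variable names are likewise illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Assumption: Iris (150 samples, 4 features, 3 classes) stands in for real data.
X, y = load_iris(return_X_y=True)

# The class labels y are required even if we only want the projection.
lda = LinearDiscriminantAnalysis(n_components=2)
lda.fit(X, y)

# Use 1: classification.
predicted = lda.predict(X)
train_accuracy = (predicted == y).mean()

# Use 2: dimensionality reduction. With 3 classes, at most 2 components exist.
X_projected = lda.transform(X)

print(X_projected.shape)
print(round(train_accuracy, 2))
```

Requesting more components than the number of classes minus one makes scikit-learn raise an error in recent versions, so n_components=2 is the largest valid value for three classes.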
If we represent class densities as multivariate Gaussians, then LDA assumes that the classes have the same covariance matrix. We can use training data to estimate the parameters of the class distributions.
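To make the parameter estimation concrete, here is a minimal NumPy sketch, not the scikit-learn implementation. It estimates the class priors, the class means, and a single pooled covariance matrix, and then evaluates the resulting linear discriminant scores. The synthetic two-class data and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: synthetic data, two Gaussian classes sharing one covariance matrix.
shared_cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], shared_cov, size=200),
    rng.multivariate_normal([3.0, 3.0], shared_cov, size=200),
])
y = np.repeat([0, 1], 200)

classes = np.unique(y)
n_samples, n_features = X.shape

# Per-class estimates: prior probability and mean vector.
priors = np.array([(y == k).mean() for k in classes])
means = np.array([X[y == k].mean(axis=0) for k in classes])

# Pooled within-class covariance: the one matrix shared by all classes.
pooled_cov = np.zeros((n_features, n_features))
for k, mean in zip(classes, means):
    centered = X[y == k] - mean
    pooled_cov += centered.T @ centered
pooled_cov /= n_samples - len(classes)

# Linear discriminant score of class k for a sample x:
# delta_k(x) = x . S^-1 mu_k - 0.5 * mu_k . S^-1 mu_k + log(prior_k)
cov_inv_means = np.linalg.solve(pooled_cov, means.T)  # (n_features, n_classes)
scores = (X @ cov_inv_means
          - 0.5 * np.sum(means.T * cov_inv_means, axis=0)
          + np.log(priors))

# Each sample is assigned to the class with the highest score.
predicted = classes[np.argmax(scores, axis=1)]
accuracy = (predicted == y).mean()
print(accuracy)
```

Because the covariance matrix is shared, the quadratic term in x is the same for every class and cancels, which is why the scores, and therefore the decision boundaries, are linear in x.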
In scikit-learn, lda.LDA was deprecated in version 0.17 and renamed discriminant_analysis.LinearDiscriminantAnalysis. The default solver of this class uses singular value decomposition and does not need to calculate the covariance matrix, which makes it fast, particularly for data with many features.
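The solver can also be chosen explicitly through the solver parameter. The sketch below compares the three solvers that scikit-learn offers ('svd', 'lsqr', and 'eigen'). It again assumes the bundled Iris dataset as stand-in data; the models and accuracies dictionaries are illustrative names.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Assumption: Iris is used only as convenient example data.
X, y = load_iris(return_X_y=True)

models = {}
accuracies = {}
for solver in ('svd', 'lsqr', 'eigen'):
    model = LinearDiscriminantAnalysis(solver=solver).fit(X, y)
    models[solver] = model
    accuracies[solver] = model.score(X, y)
    # With 'svd' the covariance matrix is stored only on request
    # (store_covariance=True); 'lsqr' and 'eigen' always compute it.
    has_covariance = hasattr(model, 'covariance_')
    print(solver, has_covariance, round(accuracies[solver], 3))
```

Note that the 'lsqr' solver supports classification only, so transform() is available with 'svd' and 'eigen' but not with 'lsqr'.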
The code is in the applying_lda.ipynb file in this book's code bundle:
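import dautil as dl
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
import matplotlib.pyplot as plt

# Load the weather data and drop rows with missing values.
df = dl.data.Weather.load().dropna()

# Use the wind direction as the target and exclude it from the features,
# so that the projection is not computed from the target itself.
X = df.drop('WIND_DIR', axis=1).values
y = df['WIND_DIR'].values

# Project onto the first two discriminant components;
# transpose so that each row of X_r holds one component.
lda = LinearDiscriminantAnalysis(n_components=2)
X_r = lda.fit(X, y).transform(X).T

plt.scatter(X_r[0], X_r[1])
plt.xlabel('x')
plt.ylabel('y')
plt.title('Dimension Reduction with LDA')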
Refer to the following screenshot for the end result: