PCA based on the covariance matrix

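We first compute the principal components using the square covariance matrix with the pairwise sample covariances for the features $x_i, x_j$, $i, j = 1, \dots, n$, as entries in row $i$ and column $j$:

$$
\Sigma = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}
\end{bmatrix},
\qquad \sigma_{ij} = \operatorname{cov}(x_i, x_j)
$$

The diagonal entries $\sigma_{ii}$ are the feature variances, and the matrix is symmetric because $\sigma_{ij} = \sigma_{ji}$.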
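A minimal sketch of how each entry is computed, assuming the usual unbiased estimator that divides by $N - 1$ for $N$ observations (the default of `np.cov`). The randomly generated `data` array is a hypothetical stand-in for the ellipse data, and `sample_covariance` is an illustrative helper, not part of any library:

```python
import numpy as np

# Hypothetical stand-in for the ellipse data: 100 observations (rows) x 3 features
rng = np.random.default_rng(seed=0)
data = rng.multivariate_normal(
    mean=np.zeros(3),
    cov=[[2.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 0.01]],
    size=100,
)


def sample_covariance(X: np.ndarray) -> np.ndarray:
    """Illustrative helper: entry (i, j) is the sample covariance of columns i and j."""
    n_obs = X.shape[0]
    centered = X - X.mean(axis=0)               # subtract each feature's mean
    return centered.T @ centered / (n_obs - 1)  # unbiased N - 1 denominator


cov_manual = sample_covariance(data)
print(cov_manual.shape)                                      # (3, 3)
print(np.allclose(cov_manual, np.cov(data, rowvar=False)))  # True
print(np.allclose(cov_manual, cov_manual.T))                # True: symmetric
```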

For a square $n \times n$ matrix $M$, we define the eigenvectors $\omega_i$ and eigenvalues $\lambda_i$, $i = 1, \dots, n$, as follows:

$$
M \omega_i = \lambda_i \omega_i, \qquad \omega_i \neq 0
$$
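To make the definition concrete, here is a small check that each column returned by NumPy's `eig` satisfies $M\omega_i = \lambda_i\omega_i$. The 2×2 matrix `M` uses made-up illustrative values; its eigenvalues are the roots of $\lambda^2 - 3\lambda + 1.36 = 0$ (trace 3, determinant $2 - 0.8^2 = 1.36$):

```python
import numpy as np
from numpy.linalg import eig

# Small symmetric matrix with illustrative, made-up entries
M = np.array([[2.0, 0.8],
              [0.8, 1.0]])
eigen_values, eigen_vectors = eig(M)

checks = []
for i, lam in enumerate(eigen_values):
    omega = eigen_vectors[:, i]                          # i-th eigenvector = i-th column
    checks.append(np.allclose(M @ omega, lam * omega))   # M w_i == lambda_i w_i

print(checks)                                 # [True, True]
print(np.linalg.norm(eigen_vectors, axis=0))  # eig returns unit-length columns
```

Note that the eigenvectors come back as columns, not rows, which is why the text later transposes scikit-learn's `components_` before comparing.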

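Hence, we can represent the matrix $M$ using its eigenvectors and eigenvalues, where $W$ is a matrix that contains the eigenvectors as column vectors and $L$ is a diagonal matrix that contains the $\lambda_i$ as diagonal entries (and 0s otherwise). This requires the $n$ eigenvectors to be linearly independent so that $W$ is invertible, which is always possible for a symmetric matrix such as a covariance matrix. We define the eigendecomposition as follows:

$$
M = W L W^{-1}
$$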
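One way to see why this factorization holds, assuming $W$ is invertible: collect the $n$ equations $M\omega_i = \lambda_i\omega_i$ as the columns of a single matrix equation. Right-multiplying $W$ by the diagonal matrix $L$ scales column $i$ by $\lambda_i$, which is exactly the right-hand side:

$$
M W = \begin{bmatrix} M\omega_1 & \cdots & M\omega_n \end{bmatrix}
    = \begin{bmatrix} \lambda_1\omega_1 & \cdots & \lambda_n\omega_n \end{bmatrix}
    = W L
\quad\Longrightarrow\quad
M = W L W^{-1}
$$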

Using NumPy, we implement this as follows, where the pandas DataFrame `data` contains the 100 three-dimensional data points of the ellipse:

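```python
import numpy as np
from numpy.linalg import eig, inv

# compute covariance matrix of the DataFrame `data` (100 rows x 3 features)
cov = np.cov(data, rowvar=False)  # columns are variables; np.cov expects rows by default
cov.shape  # (3, 3)
```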
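Because `data` is a pandas DataFrame, `DataFrame.cov()` offers an equivalent route, and the comparison also shows why `rowvar=False` matters: with the default, `np.cov` treats each of the 100 rows as a variable. A sketch on a hypothetical random DataFrame standing in for the ellipse data (the column names `x1`, `x2`, `x3` are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ellipse DataFrame: 100 rows, 3 feature columns
rng = np.random.default_rng(seed=1)
data = pd.DataFrame(rng.normal(size=(100, 3)) * [1.4, 0.75, 0.08],
                    columns=["x1", "x2", "x3"])

cov_np = np.cov(data, rowvar=False)  # columns treated as variables
cov_pd = data.cov()                  # pandas: pairwise column covariances (ddof=1)
wrong = np.cov(data)                 # default rowvar=True: each ROW is a variable

print(cov_np.shape, wrong.shape)               # (3, 3) (100, 100)
print(np.allclose(cov_np, cov_pd.to_numpy()))  # True
```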

Next, we calculate the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors, returned as the columns of the result, are the principal components; the sign of each one is arbitrary:

```python
eigen_values, eigen_vectors = eig(cov)
eigen_vectors  # columns = eigenvectors
# array([[ 0.71409739, -0.66929454, -0.20520656],
#        [-0.70000234, -0.68597301, -0.1985894 ],
#        [ 0.00785136, -0.28545725,  0.95835928]])
```
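NumPy's `eig` is a general-purpose solver and does not guarantee any particular ordering of the eigenvalues and their eigenvectors. For a symmetric matrix such as `cov`, `numpy.linalg.eigh` is the specialized routine: it returns real eigenvalues in ascending order, so a sketch that reorders them into PCA's largest-first convention looks like this (the random `data` array is a hypothetical stand-in for the ellipse data):

```python
import numpy as np
from numpy.linalg import eig, eigh

# Hypothetical stand-in for the ellipse data: 100 observations x 3 features
rng = np.random.default_rng(seed=2)
data = rng.normal(size=(100, 3)) * [1.4, 0.75, 0.08]
cov = np.cov(data, rowvar=False)

# eigh assumes a symmetric matrix and returns eigenvalues in ascending order
vals, vecs = eigh(cov)
order = np.argsort(vals)[::-1]            # indices for descending order
vals, vecs = vals[order], vecs[:, order]  # reorder eigenvector columns to match

# eig finds the same eigenvalues, but without a guaranteed order
vals_eig, _ = eig(cov)
print(vals)                                             # largest first
print(np.allclose(np.sort(vals_eig.real)[::-1], vals))  # True
```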

We can compare this result with the principal components obtained from scikit-learn and find that they match in absolute value, that is, up to sign:

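```python
from sklearn.decomposition import PCA

pca = PCA().fit(data)  # components_ is only available after fitting
C = pca.components_.T  # columns = principal components
C
# array([[ 0.71409739,  0.66929454,  0.20520656],
#        [-0.70000234,  0.68597301,  0.1985894 ],
#        [ 0.00785136,  0.28545725, -0.95835928]])
np.allclose(np.abs(C), np.abs(eigen_vectors))  # True
```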
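Beyond the components themselves, a fitted `PCA` also exposes `explained_variance_`, which scikit-learn computes with the same $N - 1$ denominator as `np.cov`, so it should equal the eigenvalues sorted largest first. The sketch below, again on a hypothetical random stand-in for `data`, checks this and flips the sign of each eigenvector so that it matches scikit-learn's orientation column by column, rather than only element-wise in absolute value:

```python
import numpy as np
from numpy.linalg import eigh
from sklearn.decomposition import PCA

# Hypothetical stand-in for the ellipse data: 100 observations x 3 features
rng = np.random.default_rng(seed=3)
data = rng.normal(size=(100, 3)) * [1.4, 0.75, 0.08]

pca = PCA().fit(data)

# Eigenpairs of the covariance matrix, sorted by descending eigenvalue
vals, vecs = eigh(np.cov(data, rowvar=False))
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# explained_variance_ uses the same N - 1 denominator as np.cov
print(np.allclose(pca.explained_variance_, vals))  # True

# Flip each eigenvector so it points the same way as sklearn's component
C = pca.components_.T                      # columns = principal components
signs = np.sign(np.sum(C * vecs, axis=0))  # +1 same direction, -1 opposite
print(np.allclose(C, vecs * signs))        # True
```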

We can also verify the eigendecomposition, starting with the diagonal matrix L that contains the eigenvalues:

```python
# eigenvalue matrix L: eigenvalues on the diagonal, 0s elsewhere
ev = np.zeros((3, 3))
np.fill_diagonal(ev, eigen_values)
ev  # diagonal matrix
# array([[1.92923132, 0.        , 0.        ],
#        [0.        , 0.55811089, 0.        ],
#        [0.        , 0.        , 0.00581353]])
```

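Multiplying the eigenvector matrix, the eigenvalue matrix, and the inverse of the eigenvector matrix reproduces the covariance matrix, so the decomposition does indeed hold: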

```python
# W L W^-1 should reproduce the covariance matrix
decomposition = eigen_vectors.dot(ev).dot(inv(eigen_vectors))
np.allclose(cov, decomposition)  # True
```
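Because the covariance matrix is symmetric and, in data like this, has distinct eigenvalues, its unit-length eigenvectors are orthonormal, so $W^{-1} = W^T$ and the explicit inverse is unnecessary. A sketch of that check on a hypothetical random stand-in for the ellipse data, using `np.diag` to build the same diagonal matrix that `np.fill_diagonal` produced:

```python
import numpy as np
from numpy.linalg import eig, inv

# Hypothetical stand-in for the ellipse data: 100 observations x 3 features
rng = np.random.default_rng(seed=4)
data = rng.normal(size=(100, 3)) * [1.4, 0.75, 0.08]
cov = np.cov(data, rowvar=False)

eigen_values, eigen_vectors = eig(cov)
L = np.diag(eigen_values)  # diagonal eigenvalue matrix in one call

# Orthonormal columns: W^T W = I, hence W^-1 = W^T
print(np.allclose(eigen_vectors.T @ eigen_vectors, np.eye(3)))  # True
print(np.allclose(inv(eigen_vectors), eigen_vectors.T))         # True
# ...so the transpose can replace the inverse in the decomposition
print(np.allclose(cov, eigen_vectors @ L @ eigen_vectors.T))    # True
```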
