The following screenshot illustrates several aspects of PCA for a two-dimensional random dataset (see the pca_key_ideas notebook):
- The left panel shows how the first and second principal components align with the directions of maximum variance while remaining orthogonal to each other.
- The central panel shows how the first principal component minimizes the reconstruction error, measured as the sum of the squared distances between the data points and the new axis.
- Finally, the right panel illustrates supervised OLS, which approximates the outcome variable (here, x2) by a line (a one-dimensional hyperplane) computed from the single feature. The vertical lines highlight how OLS minimizes the squared distances along the outcome axis, in contrast with PCA, which minimizes the squared distances orthogonal to the hyperplane.
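
The three panels can be checked numerically. The following is a minimal sketch rather than the notebook's code: the synthetic dataset, the seed, and the helper names `orthogonal_sse` and `vertical_sse` are assumptions made for illustration. It computes the principal components with an SVD, fits OLS of x2 on x1, and compares the two kinds of residuals.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical correlated 2-D data standing in for the notebook's random dataset
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)  # center so that all lines pass through the origin

# PCA via SVD: the rows of Vt are the principal directions (unit vectors)
_, s, Vt = np.linalg.svd(X, full_matrices=False)
pc1, pc2 = Vt[0], Vt[1]
explained_variance = s ** 2 / (n - 1)

# OLS of x2 on x1; no intercept is needed because the data are centered
beta = (X[:, 0] @ X[:, 1]) / (X[:, 0] @ X[:, 0])
ols_dir = np.array([1.0, beta]) / np.hypot(1.0, beta)  # unit vector along the OLS line


def orthogonal_sse(X, direction):
    """Sum of squared perpendicular distances to the line spanned by a unit vector."""
    projection = np.outer(X @ direction, direction)
    return np.sum((X - projection) ** 2)


def vertical_sse(X, slope):
    """Sum of squared distances along the x2 axis to the line x2 = slope * x1."""
    return np.sum((X[:, 1] - slope * X[:, 0]) ** 2)


pca_slope = pc1[1] / pc1[0]  # slope of the first principal axis

print("PC1 . PC2:", pc1 @ pc2)
print("orthogonal SSE, PC1 vs OLS line:",
      orthogonal_sse(X, pc1), orthogonal_sse(X, ols_dir))
print("vertical SSE, OLS vs PC1 line:",
      vertical_sse(X, beta), vertical_sse(X, pca_slope))
```

Under these assumptions, the first principal axis yields the smaller orthogonal error and the OLS line yields the smaller vertical error, so the two lines generally differ even though both are fit to the same points.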