Dimension reduction – feature transformations versus feature selection versus feature construction

In the last section, I mentioned how we could squish datasets down to fewer columns in order to describe the data in new ways. This sounds similar to the concept of feature selection: removing columns from our original dataset to create a different, potentially better view of the data by cutting out the noise and emphasizing the signal columns. While both feature selection and feature transformation are methods of performing dimension reduction, it is worth mentioning that they could not be more different in their methodologies.

Feature selection processes are limited to selecting features from the original set of columns, while feature transformation algorithms combine those original columns in useful ways to create new columns that describe the data better than any single column from the original dataset. Feature selection methods, then, reduce dimensions by isolating the signal columns and ignoring the noise columns.
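
To make that contrast concrete, here is a minimal sketch of feature selection using scikit-learn's SelectKBest on synthetic data. This is just one of many possible selection tools, chosen here only for illustration; the point is that the output is always a subset of the original columns:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Toy data: 100 rows, 5 columns, only 2 of which carry signal
X, y = make_classification(n_samples=100, n_features=5,
                           n_informative=2, n_redundant=0, random_state=0)

# Keep the 2 best original columns as scored by an ANOVA F-test
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape)                 # (100, 5)
print(X_selected.shape)        # (100, 2) -- a subset of the original columns
print(selector.get_support())  # boolean mask over the original 5 columns
```

No new values are created here; two of the five original columns survive, and the other three are discarded.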

Feature transformation methods create new columns using hidden structures in the original datasets to produce an entirely new, structurally different dataset. These algorithms create brand new columns that are so powerful that we only need a few of them to explain our entire dataset accurately.

We also mentioned that feature transformation works by producing new columns that capture the essence (variance) of the data. This is similar to the crux of feature construction: creating new features for the purpose of capturing latent structures in data. Again, we should mention that these two different processes achieve similar results using vastly different methods.

Feature construction, by contrast, is limited to building new columns using simple operations (addition, multiplication, and so on) between a few columns at a time, so each constructed feature draws on only a handful of the original columns. If our goal is to create enough features to capture all possible feature interactions, that might take an absurd number of additional columns. For example, if a given dataset had 1,000 features or more, the pairwise products alone would amount to nearly 500,000 constructed columns, and that still captures only a subset of all possible feature interactions (a quick sketch of this growth follows below).
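
As a rough sketch of that combinatorial growth, the snippet below uses scikit-learn's PolynomialFeatures, just one example of a construction tool, on synthetic data standing in for a real dataset:

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Pairwise interaction terms alone for 1,000 original features
print(comb(1000, 2))  # 499500 constructed columns, before any higher-order terms

# On a tiny synthetic example, PolynomialFeatures builds every pairwise product
X = np.random.rand(10, 5)  # 5 original columns
constructor = PolynomialFeatures(degree=2, interaction_only=True,
                                 include_bias=False)
X_constructed = constructor.fit_transform(X)
print(X_constructed.shape)  # (10, 15): the 5 originals plus 10 pairwise products
```

Each constructed column involves only two original columns at a time, which is exactly why the column count explodes as the original dimensionality grows.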

Feature transformation methods, on the other hand, are able to use small bits of information from all of the columns in every new super-column, so we do not need to create an inordinate number of new columns to capture latent feature interactions. Because these algorithms rely on matrices and linear algebra, feature transformation never creates more columns than we started with, yet it can still extract the latent structure that constructed columns attempt to capture.
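
Here is a minimal sketch of this idea, assuming scikit-learn's PCA (which we introduce formally in a moment) and synthetic data: every transformed column is a weighted mix of all of the original columns, and the transformed dataset never has more columns than the original:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 10)   # 10 original columns

# Ask for only 3 new "super-columns"
pca = PCA(n_components=3)
X_transformed = pca.fit_transform(X)

print(X_transformed.shape)    # (100, 3) -- never more columns than we started with
print(pca.components_.shape)  # (3, 10): each new column blends all 10 originals
```

The shape of the components matrix is the key: each of the three new columns is built from all ten original columns at once, rather than from a handful of them at a time.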

Feature transformation algorithms construct new features by drawing on the best of all the original columns and distilling that latent structure into a few brand-new columns. For this reason, we may consider feature transformation one of the most powerful sets of algorithms that we will discuss in this text. With that said, it is time to introduce our first algorithm and dataset of the book: Principal Component Analysis (PCA) and the iris dataset.
