The types of feature selection

Recall that our goal with feature selection is to improve our machine learning pipelines by increasing predictive power and reducing time costs. To do this, we introduce two broad categories of feature selection: statistical-based and model-based. Statistical-based feature selection relies heavily on statistical tests that are separate from our machine learning models; these tests score and select features before the training phase of our pipeline. Model-based selection relies on a preprocessing step that involves training a secondary machine learning model and using that model's predictive power to select features.
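As a concrete sketch of the statistical-based flavor, we can rank features by a statistical score computed entirely outside of any predictive model and keep the top *k*. The example below uses absolute Pearson correlation with the target as the score; the dataset and the helper names (`pearson_corr`, `select_k_best`) are illustrative, not taken from any library.

```python
# Statistical-based selection sketch: score each feature column against the
# target with a statistical test (Pearson correlation here), keep the top k.
# No predictive model is trained at any point.

def pearson_corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / ((vx * vy) ** 0.5)

def select_k_best(rows, target, k):
    """Return the indices of the k features most correlated with the target."""
    n_features = len(rows[0])
    scored = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scored.append((abs(pearson_corr(column, target)), j))
    scored.sort(reverse=True)
    return sorted(j for _, j in scored[:k])

# Toy dataset: feature 0 tracks the target closely, feature 1 is noise.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [1.1, 2.0, 3.2, 3.9]
print(select_k_best(X, y, k=1))  # the informative feature 0 is kept
```

In scikit-learn terms this is the role played by transformers such as `SelectKBest`, which pair a scoring function with a "keep the top k" rule.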

Both of these types of feature selection attempt to reduce the size of our data by keeping only the subset of original features with the highest predictive power. We might try to reason in advance about which feature selection method will work best for us, but in practice a perfectly valid way of working in this domain is to work through examples of each method and measure the performance of the resulting pipeline.
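That "try it and measure it" workflow can itself be sketched in a few lines: evaluate the same simple model on a holdout set, once with every feature and once with a candidate subset, and prefer whichever scores higher. The classifier below is a minimal 1-nearest-neighbour implementation and the dataset is a toy; both are illustrative assumptions, not the book's pipeline.

```python
# Compare feature subsets empirically: score a simple 1-NN classifier on a
# holdout set using all features, then using a candidate subset.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def one_nn_accuracy(train_X, train_y, test_X, test_y, cols):
    """Holdout accuracy of 1-NN restricted to the feature indices in cols."""
    pick = lambda row: [row[j] for j in cols]
    correct = 0
    for row, label in zip(test_X, test_y):
        nearest = min(zip(train_X, train_y),
                      key=lambda t: euclidean(pick(t[0]), pick(row)))
        correct += (nearest[1] == label)
    return correct / len(test_y)

# Feature 0 separates the classes; feature 1 is large-scale noise.
train_X = [[0.0, 50.0], [0.3, 45.0], [5.0, 10.0], [5.3, 5.0]]
train_y = [0, 0, 1, 1]
test_X = [[0.2, 12.0], [5.1, 48.0]]
test_y = [0, 1]

full = one_nn_accuracy(train_X, train_y, test_X, test_y, cols=[0, 1])
subset = one_nn_accuracy(train_X, train_y, test_X, test_y, cols=[0])
print(full, subset)  # the noisy feature drags accuracy down; the subset wins
```

However the candidate subsets are produced, whether by a statistical test or a secondary model, the final arbiter is the measured performance of the pipeline that uses them.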

To begin, let's take a look at the subclass of feature selection modules that are reliant on statistical tests to select viable features from a dataset.
