Selecting features

If we want to be nice to our machine learning algorithm, we provide it with features that are independent of each other, yet highly dependent on the value to be predicted. This means that each feature adds salient information, so removing any one of them leads to a drop in performance.

If we only have a handful of features, we could draw a matrix of scatter plots (one scatter plot for each feature pair combination). Relationships between the features could then be easily spotted. For every feature pair showing an obvious dependence, we would then consider whether to remove one of them or to design a new, cleaner feature combining both.
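As a quick sketch of this idea, the snippet below builds a small DataFrame of made-up features (the names x1, x2, x3 and the data are purely illustrative, not from the text) and uses pandas' scatter_matrix to render all pairwise scatter plots:

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x2": 2 * x1 + rng.normal(scale=0.1, size=200),  # nearly a rescaled copy of x1
    "x3": rng.normal(size=200),                       # independent of the others
})

# One scatter plot per feature pair; the x1/x2 panel shows a clear line,
# which is exactly the kind of dependence we would want to spot.
scatter_matrix(df, figsize=(6, 6), diagonal="kde")
plt.show()
```

In the resulting grid, the x1/x2 panel reveals a near-linear dependence, suggesting that one of the two features is redundant, while the panels involving x3 show no structure.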

Most of the time, however, we have more than a handful of features to choose from. Just think of the classification task where we had a bag of words to classify the quality of an answer: with a vocabulary of 1,000 words, this would require a 1,000 x 1,000 matrix of scatter plots. In this case, we need a more automated way to detect overlapping features and to resolve them. We will present two general ways to do so in the following subsections.
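Before turning to those subsections, here is a minimal sketch of what such an automated check could look like: a Pearson-correlation screen over all feature pairs. The function name and the 0.9 threshold are illustrative assumptions, not part of the methods presented later.

```python
import numpy as np

def highly_correlated_pairs(X, threshold=0.9):
    """Return (i, j, corr) for feature pairs with |correlation| above threshold.

    X is an (n_samples, n_features) array; hypothetical helper for illustration.
    """
    corr = np.corrcoef(X, rowvar=False)  # pairwise Pearson correlations
    pairs = []
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[1]):
            if abs(corr[i, j]) > threshold:
                pairs.append((i, j, corr[i, j]))
    return pairs
```

Note that a plain correlation screen only catches linear pairwise dependence; the more general approaches discussed next handle cases this sketch would miss.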
