Feature extraction

When the input data is too large to be processed directly, it is transformed into a reduced set of representative features. This transformation of the input data into a set of features is called feature extraction. Feature extraction starts from an initial set of measured data and builds derived values that retain the information contained in the original dataset while discarding what is redundant.

Feature extraction reduces the resources required to describe a large set of data accurately. When performing complex data analysis, one of the main problems is limiting the number of variables involved: analyzing a large number of variables generally demands a great deal of memory and computation, and machine learning algorithms trained on many variables tend to overfit the training samples and generalize poorly to new ones. Feature extraction is a general term for methods that construct combinations of the variables to circumvent these problems while still describing the data with sufficient accuracy.

The most widely used feature extraction method is principal component analysis (PCA). PCA generates a new set of variables, uncorrelated with one another, called principal components; each principal component is a linear combination of the original variables. All principal components are orthogonal to each other, so there is no redundant information, and together they constitute an orthogonal basis for the data space. The goal of PCA is to explain the maximum amount of variance with the smallest number of principal components. PCA is a form of multidimensional scaling: a linear transformation of the variables into a lower-dimensional space that retains the maximum amount of information about the variables. A principal component is therefore a combination of the original variables after a linear transformation.
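As a concrete illustration, the following sketch runs PCA with scikit-learn on the bundled Iris dataset (an assumed example, not data from this chapter) and verifies the properties just described: the components are linear combinations of the original variables, they form an orthonormal basis, and their scores are uncorrelated:

```python
# A minimal PCA sketch with scikit-learn; the Iris data is used purely
# for illustration.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                  # 150 samples, 4 original variables

pca = PCA(n_components=2)             # keep the first two principal components
scores = pca.fit_transform(X)         # project the data onto the new basis

# Each row of components_ is a principal component: a linear combination
# of the original variables. The rows are orthonormal.
print(pca.components_)
print(pca.explained_variance_ratio_)  # share of total variance per component

# The component scores are uncorrelated: off-diagonal covariances are ~0.
print(np.cov(scores, rowvar=False))
```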

The following plot shows a typical result of PCA, with the data distributed according to the first two principal components:
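The original figure is not reproduced here; a plot of this kind can be generated with matplotlib, as in the sketch below (the dataset and colouring are illustrative assumptions):

```python
# Sketch of the scatter plot described above: data projected onto the
# first two principal components (Iris is an assumed example dataset).
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
scores = PCA(n_components=2).fit_transform(iris.data)

plt.scatter(scores[:, 0], scores[:, 1], c=iris.target)
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
plt.title('Data distributed along the first two principal components')
plt.show()
```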

In this case, two new variables are represented, obtained as linear combinations of the original variables and chosen so that they explain a significant portion of the total variance of the data. The method has a strong geometric interpretation and finds its theoretical justification in the spectral theory of symmetric matrices.

The first principal component explains the maximum percentage of the variability present in the data that can be represented in a single dimension; it returns the direction along which the maximum dispersion of the data is recorded. This percentage of explained variability can be calculated through the variance, which is an index of the dispersion of the data along a particular direction. Moreover, the total variance is independent of the reference system: a rotation of the axes leaves it unchanged. The total variance is the sum of the variances along all the directions and is a measure of the variability present in the dataset.
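This invariance is easy to check numerically. In the sketch below (again using the Iris data as an assumed example), the variances summed along the original axes equal the variances summed along the principal components:

```python
# A short numerical check: the PCA rotation leaves the total variance
# in the data unchanged (Iris is used purely as an example).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA().fit(X)                             # keep all components

total_original = X.var(axis=0, ddof=1).sum()   # variance summed over the axes
total_rotated = pca.explained_variance_.sum()  # variance summed over the PCs

print(total_original, total_rotated)           # the two totals coincide

# The first principal component captures the largest single share:
print(pca.explained_variance_ratio_[0])
```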
