Feature selection

Feature selection is one of the hardest parts of building a financial model. It can be done statistically or by applying domain knowledge. Here we discuss only a few of the statistical feature selection methods used in finance.

Removing irrelevant features

A dataset may contain highly correlated features, and a model generally performs better when such redundant features are left out. The starting point is the correlation matrix between the features, which base R's cor() function computes and which the caret R package can then use to flag highly correlated features. The following example shows how to compute it.

The following code loads the data used for the correlation analysis and multiple regression analysis and displays its first few rows:

>DataMR = read.csv("C:/Users/prashant.vats/Desktop/Projects/BOOK R/DataForMultipleRegression.csv") 
>head(DataMR) 

  StockYPrice StockX1Price StockX2Price StockX3Price StockX4Price
1       80.13        72.86         93.1         63.7         83.1
2       79.57        72.88         90.2         63.5           82
3       79.93        71.72           99         64.5         82.8
4       81.69        71.54         90.9         66.7         86.5
5       80.82           71         90.7         60.7         80.8
6       81.07        71.78         93.1         62.9         84.2

The preceding output shows the five variables in DataMR: StockYPrice, StockX1Price, StockX2Price, StockX3Price, and StockX4Price. Here StockYPrice is the dependent variable and the other four are independent variables. Studying the dependence structure among these variables is an important step before any deeper analysis.
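As a first look at the dependence structure, the sketch below measures how strongly each independent variable moves with StockYPrice. It is only an illustration: it rebuilds the six rows printed above rather than reading the full CSV, so the numbers it produces say nothing about the complete dataset, and the names response, predictors, and corWithY are chosen here for the example.

```r
# Illustrative only: the six rows printed by head(DataMR), not the full file.
DataMR <- data.frame(
  StockYPrice  = c(80.13, 79.57, 79.93, 81.69, 80.82, 81.07),
  StockX1Price = c(72.86, 72.88, 71.72, 71.54, 71.00, 71.78),
  StockX2Price = c(93.1, 90.2, 99.0, 90.9, 90.7, 93.1),
  StockX3Price = c(63.7, 63.5, 64.5, 66.7, 60.7, 62.9),
  StockX4Price = c(83.1, 82.0, 82.8, 86.5, 80.8, 84.2)
)

# Separate the dependent variable from the independent variables.
response   <- DataMR$StockYPrice
predictors <- DataMR[, setdiff(names(DataMR), "StockYPrice")]

# Pearson correlation of each independent variable with the dependent one.
corWithY <- sapply(predictors, function(x) cor(x, response))
print(round(corWithY, 2))
```

A predictor with a correlation near zero carries little linear information about StockYPrice, although with so few rows such a reading is only suggestive.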

The following command calculates the correlation matrix for the first four columns, which are StockYPrice, StockX1Price, StockX2Price, and StockX3Price:

 > correlationMatrix<- cor(DataMR[,1:4]) 


Figure 3.11: Correlation matrix table

The preceding correlation matrix shows which variables are highly correlated. Features are then selected so that highly correlated features do not appear together in the model.
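One way to read a correlation matrix systematically is to list every pair of independent variables whose absolute correlation exceeds a chosen threshold. The following is a minimal base R sketch under stated assumptions: it uses only the six rows printed earlier instead of the full CSV, the cutoff of 0.75 is an arbitrary illustrative value, and predictorCor and highPairs are names introduced for this example.

```r
# Illustrative only: the six rows printed by head(DataMR), not the full file.
DataMR <- data.frame(
  StockYPrice  = c(80.13, 79.57, 79.93, 81.69, 80.82, 81.07),
  StockX1Price = c(72.86, 72.88, 71.72, 71.54, 71.00, 71.78),
  StockX2Price = c(93.1, 90.2, 99.0, 90.9, 90.7, 93.1),
  StockX3Price = c(63.7, 63.5, 64.5, 66.7, 60.7, 62.9),
  StockX4Price = c(83.1, 82.0, 82.8, 86.5, 80.8, 84.2)
)

# Correlation matrix of the independent variables only.
predictorCor <- cor(DataMR[, c("StockX1Price", "StockX2Price",
                               "StockX3Price", "StockX4Price")])

cutoff <- 0.75  # assumed threshold; choose one that suits the model

# Blank out the diagonal and lower triangle so each pair is reported once.
absCor <- abs(predictorCor)
absCor[lower.tri(absCor, diag = TRUE)] <- NA

# Row/column positions of the pairs above the cutoff (NA counts as FALSE).
idx <- which(absCor > cutoff, arr.ind = TRUE)

highPairs <- data.frame(
  feature1    = rownames(absCor)[idx[, "row"]],
  feature2    = colnames(absCor)[idx[, "col"]],
  correlation = predictorCor[idx]
)
print(highPairs)
```

For each pair listed, one of the two features is a candidate for removal. If the caret package is installed, its findCorrelation() function takes a correlation matrix and a cutoff, for example findCorrelation(predictorCor, cutoff = 0.75), and returns the indices of the columns it suggests dropping.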
