Creating forests

By creating a number of trees using any valid randomization method, we have essentially created a forest, hence the algorithm's name. After generating the ensemble's trees, their predictions must be combined for the ensemble to be functional. This is usually achieved through majority voting for classification problems and through averaging for regression problems. A number of hyperparameters are associated with Random Forests, such as the number of features to consider at each node split, the number of trees in the forest, and the size of the individual trees. As mentioned earlier, a good starting point for the number of features to consider is as follows (a short example follows the list):

  • The square root of the number of total features for classification problems
  • One-third of the number of total features for regression problems
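
The combination rules and hyperparameters above can be put into practice with scikit-learn's Random Forest implementation; this is an assumption, as the library is not named here, and the dataset and parameter values are purely illustrative. Note that scikit-learn combines the trees by averaging their probabilistic predictions, a soft variant of majority voting.

    # A minimal Random Forest sketch, assuming scikit-learn is available.
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    x, y = load_digits(return_X_y=True)
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

    # max_features='sqrt' follows the square-root rule of thumb for
    # classification; a regressor would instead consider roughly
    # one-third of the features at each split.
    forest = RandomForestClassifier(n_estimators=100,
                                    max_features='sqrt',
                                    random_state=0)
    forest.fit(x_train, y_train)

    # Predictions are combined across the 100 trees (soft majority voting).
    print(forest.score(x_test, y_test))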

The total number of trees can be tuned by hand, as the ensemble's error converges to a limit as this number increases. Out-of-bag errors can be utilized to find an optimal value, as sketched below. Finally, the size of each tree can be a deciding factor in overfitting; thus, if overfitting is observed, the tree size should be reduced.
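
One way to apply the out-of-bag idea, again assuming scikit-learn: oob_score enables the out-of-bag estimate, and warm_start lets the same forest grow incrementally so the OOB error can be tracked as trees are added. Tree size can be limited through parameters such as max_depth or min_samples_leaf if overfitting is observed.

    # Sketch: tracking out-of-bag error as the forest grows, assuming
    # scikit-learn; the dataset and value range are illustrative.
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier

    x, y = load_digits(return_X_y=True)

    # warm_start=True reuses the already-fitted trees, so each call to
    # fit only adds the extra trees; oob_score=True computes the
    # out-of-bag estimate after each fit.
    forest = RandomForestClassifier(warm_start=True, oob_score=True,
                                    max_features='sqrt', random_state=0)

    for n in range(10, 210, 10):
        forest.set_params(n_estimators=n)
        forest.fit(x, y)
        print(n, 1.0 - forest.oob_score_)  # OOB error flattens as n grows

The printed OOB error typically drops quickly and then plateaus, which is the convergence behavior described above; a reasonable choice for the number of trees is one beyond which the error no longer improves meaningfully.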
