Building trees

As mentioned in Chapter 1, A Machine Learning Refresher, a tree is created by selecting, at each node, a single feature and split point such that the training set is best split. When an ensemble is created, we want the base learners to be as uncorrelated (diverse) as possible.

Bagging is able to produce reasonably uncorrelated trees by diversifying each tree's training set through bootstrapping. But bagging diversifies the trees along only one axis: each set's instances. There is a second axis along which we can introduce diversity: the features. By selecting only a subset of the available features during training, the generated base learners can be even more diverse. In random forests, for each tree and at each node, only a subset of the available features is considered when choosing the best feature/split point combination. The number of features to consider can be tuned by hand, but one-third of all features for regression problems and the square root of all features for classification problems are considered good starting points.
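As a quick illustration, the following minimal sketch uses scikit-learn's RandomForestClassifier, whose max_features parameter controls how many features are considered at each node. The choice of the bundled breast cancer dataset and the variable names are our own assumptions, not part of the original example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed setup: scikit-learn's bundled breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_features='sqrt' considers the square root of the feature count at
# each node -- the suggested starting point for classification problems.
ensemble = RandomForestClassifier(n_estimators=100, max_features='sqrt',
                                  random_state=0)
ensemble.fit(x_train, y_train)
print(ensemble.score(x_test, y_test))
```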

The algorithm's steps are as follows; a from-scratch sketch of these steps appears after the list:

  1. Select the number of features m that will be considered at each node
  2. For each base learner, do the following:
    1. Create a bootstrap train sample
    2. Select the node to split
    3. Select m features randomly
    4. Pick the best feature and split point from m
    5. Split the node into two nodes
    6. Repeat from step 2-2 until a stopping criterion is met, such as maximum tree depth
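The sketch below maps each step of the list onto code. It is a minimal illustration under our own assumptions (NumPy arrays, non-negative integer class labels, Gini impurity as the split criterion, and a fixed maximum depth as the stopping criterion); the helper names such as best_split and build_tree are ours, not a library API:

```python
import numpy as np

rng = np.random.default_rng(0)

def gini(y):
    """Gini impurity of a vector of non-negative integer labels."""
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, features):
    """Step 2-4: pick the best feature/split point among the m features."""
    best_feature, best_point, best_score = None, None, np.inf
    for f in features:
        for point in np.unique(X[:, f]):
            left = X[:, f] <= point
            if left.all() or not left.any():
                continue  # split must leave instances on both sides
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / len(y)
            if score < best_score:
                best_feature, best_point, best_score = f, point, score
    return best_feature, best_point

def build_tree(X, y, m, depth=0, max_depth=5):
    """Steps 2-2 to 2-6: recursively split nodes, considering only
    m randomly selected features at each node."""
    # Step 2-6 stopping criterion: maximum tree depth or a pure node.
    if depth == max_depth or len(np.unique(y)) == 1:
        return {'leaf': True, 'prediction': np.bincount(y).argmax()}
    # Step 2-3: randomly select m of the available features.
    features = rng.choice(X.shape[1], size=m, replace=False)
    feature, point = best_split(X, y, features)
    if feature is None:  # no valid split among the m features
        return {'leaf': True, 'prediction': np.bincount(y).argmax()}
    # Step 2-5: split the node into two child nodes.
    left = X[:, feature] <= point
    return {'leaf': False, 'feature': feature, 'point': point,
            'left': build_tree(X[left], y[left], m, depth + 1, max_depth),
            'right': build_tree(X[~left], y[~left], m, depth + 1, max_depth)}

def random_forest(X, y, n_trees, m):
    """Step 2-1: grow each tree on its own bootstrap sample."""
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))
        forest.append(build_tree(X[idx], y[idx], m))
    return forest
```

Prediction would then majority-vote (or, for regression, average) the individual trees' outputs; scikit-learn's random forest classes implement exactly this process, with max_features playing the role of m.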