Creating the ensemble

Assuming a classification problem, the AdaBoost algorithm can be described at a high level through the following basic steps. For regression, the steps are similar:

  1. Initialize the weights of all train set instances equally, so that they sum to 1.
  2. Generate a new set by sampling with replacement, according to the weights.
  3. Train a weak learner on the sampled set.
  4. Calculate its error on the original train set.
  5. Add the weak learner to the ensemble and save its error rate.
  6. Adjust the weights, increasing the weights of misclassified instances and decreasing the weights of correctly classified instances.
  7. Repeat from Step 2 until the desired number of weak learners has been trained.
  8. Combine the weak learners by voting. Each learner's vote is weighted according to its error rate.
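
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation. It makes several assumptions that the text does not specify: labels are encoded as -1 and +1, the error is the weighted share of misclassified train instances, and each learner's vote is alpha = 0.5 * ln((1 - error) / error), as in the classic discrete AdaBoost formulation. The names train_stump, stump_predict, adaboost_fit, and adaboost_predict are hypothetical helpers introduced for this example.

```python
import math
import random


def train_stump(X, y):
    """Fit a decision stump: the (feature, threshold, polarity) with the fewest errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[f] <= t else -polarity for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    _, f, t, polarity = best
    return f, t, polarity


def stump_predict(stump, row):
    f, t, polarity = stump
    return polarity if row[f] <= t else -polarity


def adaboost_fit(X, y, n_learners=3, seed=0):
    rng = random.Random(seed)
    n = len(X)
    weights = [1.0 / n] * n  # equal weights that sum to 1
    ensemble = []
    for _ in range(n_learners):
        # Sample with replacement, according to the current weights.
        idx = rng.choices(range(n), weights=weights, k=n)
        stump = train_stump([X[i] for i in idx], [y[i] for i in idx])

        # Weighted error on the original train set (assumed definition).
        preds = [stump_predict(stump, row) for row in X]
        error = sum(w for w, p, label in zip(weights, preds, y) if p != label)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero

        # Vote weight derived from the error rate (assumed formula).
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))

        # Increase weights of misclassified instances, decrease the rest,
        # then renormalize so the weights sum to 1 again.
        weights = [w * math.exp(-alpha * label * p)
                   for w, p, label in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, row):
    # Weighted vote of all weak learners.
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

A call such as adaboost_fit(X, y, n_learners=3) returns a list of (alpha, stump) pairs, and adaboost_predict(ensemble, row) returns -1 or +1 for a single instance. The exhaustive threshold search in train_stump is chosen for readability; it is only practical for small datasets.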

The whole process is depicted in the following diagram:

The process of creating the ensemble for the nth learner

In essence, this makes each new classifier focus on the instances that the previous learners could not handle correctly. Assuming a binary classification problem, we may start with a dataset that looks like the following diagram:

Our initial dataset

Here, all weights are equal. The first decision stump partitions the problem space as follows. The dotted line represents the decision boundary. The two black + and - symbols mark the sub-spaces in which the decision stump classifies every instance as positive or negative, respectively. This leaves two misclassified instances. The weights of these instances will be increased, while all other weights will be decreased:

The first decision stump's space partition and errors
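
To make the weight adjustment concrete, the following sketch works through one update by hand. The numbers are assumptions for illustration only: a dataset of ten instances, two of which (hypothetical indices 3 and 7) are misclassified, and the update rule of the classic discrete AdaBoost formulation, where alpha = 0.5 * ln((1 - error) / error).

```python
import math

n_instances = 10          # hypothetical dataset size
misclassified = {3, 7}    # hypothetical indices of the two misclassified instances
weights = [1.0 / n_instances] * n_instances  # all weights start equal

# Weighted error of the stump: the total weight of its mistakes.
error = sum(weights[i] for i in misclassified)

# Vote weight of the stump (assumed formula).
alpha = 0.5 * math.log((1 - error) / error)

# Scale misclassified weights up by e^alpha and all others down by e^-alpha.
updated = [w * math.exp(alpha if i in misclassified else -alpha)
           for i, w in enumerate(weights)]

# Renormalize so the weights sum to 1 again.
total = sum(updated)
updated = [w / total for w in updated]

print("error:", round(error, 4))
print("misclassified weight:", round(updated[3], 4))
print("correctly classified weight:", round(updated[0], 4))
```

Under these assumptions, the error is 0.2, each misclassified instance ends up with a weight of about 0.25, and each correctly classified instance with about 0.0625. The two misclassified instances therefore carry half of the total weight going into the next round.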

We then create another dataset in which the two misclassified instances are dominant (they may be included several times, as we sample with replacement and their weights are larger than those of the other instances). Trained on this dataset, the second decision stump partitions the space as follows:

The second decision stump's space partition and errors
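
The effect of weighted sampling with replacement can be seen in a small sketch. The weights are hypothetical: ten instances, where indices 3 and 7 stand in for the two misclassified instances and carry 0.25 each, while the other eight carry 0.0625 each.

```python
import random
from collections import Counter

# Hypothetical weights after the first round: two heavy instances, eight light ones.
weights = [0.0625] * 10
weights[3] = weights[7] = 0.25

rng = random.Random(42)  # fixed seed, so the draw is repeatable
# Draw a new dataset of the same size, with replacement, according to the weights.
sample = rng.choices(range(len(weights)), weights=weights, k=len(weights))

counts = Counter(sample)
for index in sorted(counts):
    print(f"instance {index}: drawn {counts[index]} time(s)")
```

With these weights, each of the two heavy instances is expected to appear about 2.5 times in a sample of ten, while some of the light instances will typically not be drawn at all. Any single draw can deviate from these averages.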

Finally, after repeating the process for a third decision stump, the final ensemble has partitioned the space as depicted in the following diagram:

The final ensemble's partition of the problem space