In OpenCV, random forests can be built using the RTrees_create function from the ml module:
In [7]: import numpy as np
...     import cv2
...     rtree = cv2.ml.RTrees_create()
The tree object provides a number of options, the most important of which are the following:
- setMaxDepth: This sets the maximum possible depth of each tree in the ensemble. The actual obtained depth may be smaller if other termination criteria are met first.
- setMinSampleCount: This sets the minimum number of samples that a node must contain in order to be split.
- setMaxCategories: This sets the maximum number of categories allowed. Setting this to a value smaller than the actual number of classes in the data leads to subset estimation, where categories are clustered together when searching for splits.
- setTermCriteria: This sets the termination criteria of the algorithm. This is also where you set the number of trees in the forest.
We can specify the number of trees in the forest by passing an integer, n_trees, to the setTermCriteria method. Here, we also want to tell the algorithm to quit once the score does not increase by at least eps from one iteration to the next:
In [8]: n_trees = 10
... eps = 0.01
... criteria = (cv2.TERM_CRITERIA_MAX_ITER + cv2.TERM_CRITERIA_EPS,
... n_trees, eps)
... rtree.setTermCriteria(criteria)
Then, we are ready to train the classifier on the data from the preceding code:
In [9]: rtree.train(X_train.astype(np.float32), cv2.ml.ROW_SAMPLE,
...                 y_train);
The test labels can be predicted with the predict method:
In [10]: _, y_hat = rtree.predict(X_test.astype(np.float32))
Using scikit-learn's accuracy_score, we can evaluate the model on the test set:
In [11]: from sklearn.metrics import accuracy_score
... accuracy_score(y_test, y_hat)
Out[11]: 0.83999999999999997
After training, we can pass the predicted labels to the plot_decision_boundary function:
In [12]: plot_decision_boundary(rtree, X_test, y_test)
This produces a plot of the decision landscape learned by the random forest classifier on the test data.