Alternatively, we can implement random forests using scikit-learn:
In [13]: from sklearn.ensemble import RandomForestClassifier
... forest = RandomForestClassifier(n_estimators=10, random_state=200)
Here, we have a number of options to customize the ensemble:
- n_estimators: This specifies the number of trees in the forest.
- criterion: This specifies the node-splitting criterion. Setting criterion='gini' implements the Gini impurity, whereas setting criterion='entropy' implements information gain.
- max_features: This specifies the number (or fraction) of features to consider at each node split.
- max_depth: This specifies the maximum depth of each tree.
- min_samples_split: This specifies the minimum number of samples required to split an internal node.
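To see these options together, here is a minimal sketch of a customized ensemble; the specific values (entropy criterion, half the features, depth 5, and so on) are chosen purely for illustration and are not from the text:

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical settings chosen to exercise each option listed above
forest = RandomForestClassifier(
    n_estimators=10,        # 10 trees in the forest
    criterion='entropy',    # split nodes by information gain
    max_features=0.5,       # consider half of the features at each split
    max_depth=5,            # cap each tree at depth 5
    min_samples_split=4,    # need at least 4 samples to split a node
    random_state=200,
)
print(forest.get_params()['criterion'])
```

All of these settings can also be inspected or changed later via `get_params` and `set_params`, as with any scikit-learn estimator.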
We can then fit the random forest to the data and score it like any other estimator:
In [14]: forest.fit(X_train, y_train)
... forest.score(X_test, y_test)
Out[14]: 0.84
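The splits `X_train`, `X_test`, `y_train`, and `y_test` come from earlier in the chapter. As a self-contained stand-in, the same fit-and-score workflow can be sketched on synthetic data (the `make_moons` dataset here is an assumption for illustration, not the chapter's data):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the chapter's dataset
X, y = make_moons(n_samples=200, noise=0.25, random_state=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(n_estimators=10, random_state=200)
forest.fit(X_train, y_train)
score = forest.score(X_test, y_test)  # mean accuracy on the test set
print(score)
```

`score` returns the mean accuracy on the held-out data, which is what `Out[14]` reports above.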
This gives roughly the same result as in OpenCV. We can use our helper function to plot the decision boundary:
In [15]: plot_decision_boundary(forest, X_test, y_test)
The resulting plot shows the decision boundary that the random forest learned for the test set.
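The helper function `plot_decision_boundary` is defined earlier in the book; a minimal sketch of how such a helper might work is shown below. It classifies every point on a dense grid covering the test data and draws the predictions as filled contours (the grid step, colormap, and return value are assumptions for this sketch):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render off-screen so the sketch runs headless
import matplotlib.pyplot as plt

def plot_decision_boundary(classifier, X_test, y_test):
    # Build a dense grid spanning the test data, classify every grid
    # point, and draw the predicted labels as filled contours.
    x_min, x_max = X_test[:, 0].min() - 1, X_test[:, 0].max() + 1
    y_min, y_max = X_test[:, 1].min() - 1, X_test[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                         np.arange(y_min, y_max, 0.02))
    zz = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
    zz = zz.reshape(xx.shape)
    plt.contourf(xx, yy, zz, cmap=plt.cm.coolwarm, alpha=0.8)
    plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, s=30)
    return zz  # returned so the grid predictions can be inspected
```

Because the forest averages many axis-aligned trees, the boundary it draws is piecewise rectangular but typically smoother than that of a single decision tree.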