Bootstrap aggregating, or bagging, is an algorithm introduced by Leo Breiman in 1994 that applies bootstrapping to machine learning problems. Bagging was also mentioned in the Learning with random forests recipe.
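The algorithm aims to reduce the chance of overfitting with the following steps:
1. Generate new training sets from the input training data by sampling with replacement (bootstrapping).
2. Fit a model to each generated training set.
3. Combine the predictions of the models by averaging (regression) or majority voting (classification).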
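As a rough illustration of bagging's resample, fit, and vote procedure, here is a minimal from-scratch sketch using NumPy and scikit-learn decision trees. The helper names bagging_fit and bagging_predict, the synthetic data from make_classification, and the choice of 25 trees are illustrative assumptions, not the recipe's code; the recipe itself relies on BaggingClassifier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(X, y, n_models=25, seed=0):
    """Hypothetical helper: fit one decision tree per bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Draw n row indices with replacement (a bootstrap sample)
        idx = rng.integers(0, n, size=n)
        # Fit a model on that bootstrap sample
        tree = DecisionTreeClassifier(max_depth=4, random_state=0)
        models.append(tree.fit(X[idx], y[idx]))
    return models


def bagging_predict(models, X):
    """Hypothetical helper: majority vote; assumes non-negative integer labels."""
    votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_rows)
    # For each row (column of votes), pick the most frequent label
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)


# Usage on synthetic binary data: train on 200 rows, predict the other 100
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
models = bagging_fit(X[:200], y[:200])
preds = bagging_predict(models, X[200:])
print("accuracy:", np.mean(preds == y[200:]))
```

Majority voting is used here because the example is a classification problem; for regression the last step would average the predictions instead.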
The scikit-learn BaggingClassifier class allows us to bootstrap training examples, and we can also bootstrap features, somewhat like the random forests algorithm. When we perform a grid search, we refer to hyperparameters of the base estimator with the prefix base_estimator__ (scikit-learn 1.2 renamed the base_estimator parameter to estimator and 1.4 removed the old name, so newer versions use the prefix estimator__). We will use a decision tree as the base estimator so that we can reuse some of the hyperparameter configuration from the Learning with random forests recipe.
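The double-underscore prefix is scikit-learn's general convention for addressing parameters of a nested estimator; GridSearchCV applies each candidate through set_params. The following minimal sketch assumes scikit-learn is installed; the NAME check is an illustrative addition that picks estimator or base_estimator depending on the installed version.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# scikit-learn 1.2 renamed base_estimator to estimator (the old name was removed in 1.4),
# so use whichever name the installed version exposes
NAME = "estimator" if "estimator" in BaggingClassifier().get_params() else "base_estimator"

clf = BaggingClassifier(**{NAME: DecisionTreeClassifier(min_samples_leaf=3, max_depth=4)},
                        random_state=43)

# get_params(deep=True) lists the tree's hyperparameters under the "<NAME>__" prefix
nested = sorted(k for k in clf.get_params(deep=True) if k.startswith(NAME + "__"))
print(nested[:3])

# The same prefix works with set_params, which is how a grid search changes the tree
clf.set_params(**{NAME + "__criterion": "entropy"})
print(getattr(clf, NAME).criterion)  # entropy
```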
The code is in the bagging.ipynb file in this book's code bundle. The imports are as follows:

    import ch9util
    from sklearn.ensemble import BaggingClassifier
    # sklearn.grid_search was removed in scikit-learn 0.20;
    # GridSearchCV now lives in sklearn.model_selection
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier
    import numpy as np
    import dautil as dl
    from IPython.display import HTML
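Load the data and create a BaggingClassifier with a decision tree as the base estimator:

    X_train, X_test, y_train, y_test = ch9util.rain_split()
    # base_estimator is called estimator in scikit-learn 1.2 and later
    clf = BaggingClassifier(base_estimator=DecisionTreeClassifier(
                                min_samples_leaf=3, max_depth=4),
                            random_state=43)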
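Run a grid search over the number of estimators, feature bootstrapping, and the split criterion of the trees, then predict with the best model:

    params = {
        'n_estimators': [320, 640],
        'bootstrap_features': [True, False],
        'base_estimator__criterion': ['gini', 'entropy']
    }
    # n_jobs=-1 uses all available CPU cores
    gscv = GridSearchCV(estimator=clf, param_grid=params, cv=5, n_jobs=-1)
    gscv.fit(X_train, y_train)
    preds = gscv.predict(X_test)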
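Because ch9util.rain_split() is specific to this book's code bundle, here is a self-contained sketch of the same grid search on synthetic data. The make_classification dataset, the smaller ensemble sizes, and cv=3 are assumptions chosen to keep it fast; they are not the recipe's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for ch9util.rain_split(), which is not available here
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Handle the base_estimator -> estimator rename from scikit-learn 1.2
name = "estimator" if "estimator" in BaggingClassifier().get_params() else "base_estimator"
clf = BaggingClassifier(**{name: DecisionTreeClassifier(min_samples_leaf=3, max_depth=4)},
                        random_state=43)

# Much smaller ensembles than the recipe's [320, 640] keep the sketch quick
params = {
    "n_estimators": [10, 20],
    "bootstrap_features": [True, False],
    name + "__criterion": ["gini", "entropy"],
}
gscv = GridSearchCV(estimator=clf, param_grid=params, cv=3)
gscv.fit(X_train, y_train)
preds = gscv.predict(X_test)
print(gscv.best_params_)
print("test accuracy:", gscv.score(X_test, y_test))
```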
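Report the results, including the best parameters found by the grid search:

    # context is defined elsewhere in the notebook (not shown in this excerpt)
    sp = dl.plotting.Subplotter(2, 2, context)
    html = ch9util.report_rain(preds, y_test, gscv.best_params_, sp.ax)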
Plot a validation curve for the number of estimators, from 16 to 1024 trees:

    ntrees = 2 ** np.arange(4, 11)
    ch9util.plot_validation(sp.next_ax(), gscv.best_estimator_,
                            X_train, y_train, 'n_estimators', ntrees)
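The book's plot_validation helper is not shown here. As a sketch of the kind of computation a validation curve involves, scikit-learn's validation_curve function scores a model across a range of one hyperparameter. The synthetic data, the smaller range of 2 to 32 trees, and cv=3 are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import validation_curve

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
# With no base estimator given, BaggingClassifier bags decision trees
clf = BaggingClassifier(random_state=43)

ntrees = 2 ** np.arange(1, 6)  # 2, 4, ..., 32 (the recipe goes up to 1024)
train_scores, test_scores = validation_curve(
    clf, X, y, param_name="n_estimators", param_range=ntrees, cv=3)

# Each row holds the per-fold scores for one value of n_estimators
for n, tr, te in zip(ntrees, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"{n:3d} trees: train={tr:.3f}  cv={te:.3f}")
```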
Plot a validation curve for the max_samples parameter, from 16 to 8192 samples per estimator:

    nsamples = 2 ** np.arange(4, 14)
    ch9util.plot_validation(sp.next_ax(), gscv.best_estimator_,
                            X_train, y_train, 'max_samples', nsamples)
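To see what max_samples controls, the following sketch (synthetic data, assumed for illustration) fits a small bagging ensemble and inspects the estimators_samples_ attribute, which in current scikit-learn versions holds the row indices drawn for each estimator. An integer max_samples is an absolute number of draws per estimator; a float would be a fraction of the training rows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# max_samples=50: each of the 5 trees is trained on 50 rows drawn from the 200
bag = BaggingClassifier(n_estimators=5, max_samples=50, random_state=43).fit(X, y)

for i, rows in enumerate(bag.estimators_samples_):
    # With bootstrap=True (the default) rows are drawn with replacement, so duplicates occur
    print(f"estimator {i}: {len(rows)} draws, {len(set(rows))} distinct rows")
```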
Plot the learning curve and display the report together with the plots:

    ch9util.plot_learn_curve(sp.next_ax(), gscv.best_estimator_, X_train, y_train)
    HTML(html + sp.exit())
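The plot_learn_curve helper is also part of the book's code bundle. A comparable computation is available through scikit-learn's learning_curve function, which refits the model on growing subsets of each training fold. The synthetic data, the 10-tree ensemble, and the five training sizes below are assumptions for this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
clf = BaggingClassifier(n_estimators=10, random_state=43)

# Train on 20%, 40%, ..., 100% of each training fold and score on the held-out fold
sizes, train_scores, test_scores = learning_curve(
    clf, X, y, train_sizes=np.linspace(0.2, 1.0, 5), cv=3)

for n, tr, te in zip(sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"{n:3d} rows: train={tr:.3f}  cv={te:.3f}")
```

A large, persistent gap between the training and cross-validation scores as the training size grows would suggest overfitting, which is what bagging aims to reduce.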
Refer to the following screenshot for the end result:
The documentation for BaggingClassifier is at http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html (retrieved November 2015).