Using gradient boosting

Scikit-learn also implements gradient boosting for regression and classification. Both are included in the ensemble package, as GradientBoostingRegressor and GradientBoostingClassifier, respectively. The two classes store the training error at each boosting step in the train_score_ attribute of the fitted object. Here, we present an example for the diabetes regression dataset. Training and validation follow the scikit-learn standard, using the fit and predict functions. Apart from the ensemble size (n_estimators), we also specify the learning rate, which is passed to the GradientBoostingRegressor constructor through the learning_rate parameter:

# --- SECTION 1 ---
# Libraries and data loading
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn import metrics
import numpy as np
diabetes = load_diabetes()
train_size = 400
train_x, train_y = diabetes.data[:train_size], diabetes.target[:train_size]
test_x, test_y = diabetes.data[train_size:], diabetes.target[train_size:]
np.random.seed(123456)

# --- SECTION 2 ---
# Create the ensemble
ensemble_size = 200
learning_rate = 0.1
ensemble = GradientBoostingRegressor(n_estimators=ensemble_size,
                                     learning_rate=learning_rate)

# --- SECTION 3 ---
# Evaluate the ensemble
ensemble.fit(train_x, train_y)
predictions = ensemble.predict(test_x)

# --- SECTION 4 ---
# Print the metrics
r2 = metrics.r2_score(test_y, predictions)
mse = metrics.mean_squared_error(test_y, predictions)
print('Gradient Boosting:')
print('R-squared: %.2f' % r2)
print('MSE: %.2f' % mse)

The ensemble achieves an R-squared of 0.44 and an MSE of 3,092. Furthermore, if we use matplotlib to plot ensemble.train_score_, we can see that diminishing returns set in after around 20 base learners. If we analyze the errors further, by calculating the improvements (the difference in error between consecutive base learners), we see that after 25 base learners there are cases where adding a base learner worsens the performance.
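
As a minimal sketch, and assuming matplotlib is available and the ensemble from the listing above has already been fitted, the errors and their step-to-step differences can be plotted as follows:

import matplotlib.pyplot as plt
import numpy as np

# train_score_ holds the in-sample loss after each boosting step
errors = ensemble.train_score_
# Step-to-step differences: how the loss changes as each base learner is added
improvements = np.diff(errors)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6))
ax1.plot(errors)
ax1.set_title('Ensemble errors')
ax1.set_ylabel('Training loss')
ax2.plot(improvements)
ax2.set_title('Ensemble differences')
ax2.set_xlabel('Base learner index')
ax2.set_ylabel('Change in loss')
plt.tight_layout()
plt.show()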

Although, on average, the performance continues to improve, there is no significant gain after 50 base learners. Thus, we repeat the experiment with ensemble_size = 50, yielding an R-squared of 0.61 and an MSE of 2,152:

Errors and differences for gradient boosting regression
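
The repeated experiment could look as follows; this is a sketch that assumes the imports, data split, and learning_rate from the first listing are still in scope:

# --- Repeat the experiment with a smaller ensemble ---
ensemble_size = 50
ensemble = GradientBoostingRegressor(n_estimators=ensemble_size,
                                     learning_rate=learning_rate)
ensemble.fit(train_x, train_y)
predictions = ensemble.predict(test_x)

print('Gradient Boosting (50 base learners):')
print('R-squared: %.2f' % metrics.r2_score(test_y, predictions))
print('MSE: %.2f' % metrics.mean_squared_error(test_y, predictions))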

For the classification example, we use the hand-written digit classification dataset. Again, we define the n_estimators and learning_rate parameters:

# --- SECTION 1 ---
# Libraries and data loading
import numpy as np

from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn import metrics


digits = load_digits()

train_size = 1500
train_x, train_y = digits.data[:train_size], digits.target[:train_size]
test_x, test_y = digits.data[train_size:], digits.target[train_size:]

np.random.seed(123456)
# --- SECTION 2 ---
# Create the ensemble
ensemble_size = 200
learning_rate = 0.1
ensemble = GradientBoostingClassifier(n_estimators=ensemble_size,
                                      learning_rate=learning_rate)

# --- SECTION 3 ---
# Train the ensemble
ensemble.fit(train_x, train_y)

# --- SECTION 4 ---
# Evaluate the ensemble
ensemble_predictions = ensemble.predict(test_x)

ensemble_acc = metrics.accuracy_score(test_y, ensemble_predictions)

# --- SECTION 5 ---
# Print the accuracy
print('Boosting: %.2f' % ensemble_acc)

The accuracy achieved with the specific ensemble size is 89%. By plotting the errors and their differences, we see that there are again diminishing returns, but there are no cases where performance significantly drops. Thus, we do not expect a predictive performance improvement by reducing the ensemble size.
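
As a quick check, and assuming the fitted classifier from the listing above, the step-to-step differences of train_score_ can be inspected to confirm that no step increases the training loss:

# Differences between consecutive training-loss values; a positive value
# would mean that adding a base learner increased the training loss
improvements = np.diff(ensemble.train_score_)
print('Largest single-step change: %.4f' % improvements.max())
print('Steps that increased the loss: %d' % (improvements > 0).sum())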
