We will present a simple regression example with XGBoost, using the diabetes dataset. As we will show, its usage is straightforward and similar to that of the scikit-learn estimators. XGBoost implements regression with XGBRegressor. The constructor has a fairly large number of parameters, all of which are well documented in the official documentation. In our example, we will use the n_estimators, n_jobs, max_depth, and learning_rate parameters. Following scikit-learn's conventions, they define the ensemble size, the number of parallel processes, the maximum tree depth, and the learning rate, respectively:
# --- SECTION 1 ---
# Libraries and data loading
from sklearn.datasets import load_diabetes
from xgboost import XGBRegressor
from sklearn import metrics
import numpy as np
diabetes = load_diabetes()
train_size = 400
train_x, train_y = diabetes.data[:train_size], diabetes.target[:train_size]
test_x, test_y = diabetes.data[train_size:], diabetes.target[train_size:]
np.random.seed(123456)
# --- SECTION 2 ---
# Create the ensemble
ensemble_size = 200
ensemble = XGBRegressor(n_estimators=ensemble_size, n_jobs=4,
                        max_depth=1, learning_rate=0.1,
                        objective='reg:squarederror')
The rest of the code trains the ensemble on the training data and evaluates it on the test set, and is similar to the previous examples:
# --- SECTION 3 ---
# Evaluate the ensemble
ensemble.fit(train_x, train_y)
predictions = ensemble.predict(test_x)
# --- SECTION 4 ---
# Print the metrics
r2 = metrics.r2_score(test_y, predictions)
mse = metrics.mean_squared_error(test_y, predictions)
print('Gradient Boosting:')
print('R-squared: %.2f' % r2)
print('MSE: %.2f' % mse)
XGBoost achieves an R-squared of 0.65 and an MSE of 1932.9, the best performance of all the boosting methods we have tested and implemented in this chapter. Furthermore, we did not fine-tune any of its parameters, which further demonstrates its modeling power.
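Although the out-of-the-box results are already strong, there is usually additional performance to be gained from tuning. As XGBRegressor follows the scikit-learn estimator API, it can be plugged directly into utilities such as GridSearchCV. The following is a minimal sketch of how such a search could look; the grid values are illustrative choices on our part, not recommendations from the library:
# --- Optional: hyperparameter tuning (illustrative sketch) ---
# XGBRegressor is scikit-learn compatible, so GridSearchCV can
# search over its parameters. The grid below is an example set
# of values, not a recommended configuration.
from sklearn.model_selection import GridSearchCV
param_grid = {'n_estimators': [100, 200, 400],
              'max_depth': [1, 2, 3],
              'learning_rate': [0.05, 0.1, 0.2]}
search = GridSearchCV(XGBRegressor(objective='reg:squarederror',
                                   n_jobs=4),
                      param_grid,
                      scoring='r2',  # select the best R-squared
                      cv=5)
search.fit(train_x, train_y)
print('Best parameters: %s' % search.best_params_)
print('Best CV R-squared: %.2f' % search.best_score_)
The best cross-validated combination can then be refit on the full training set and evaluated on the test data, exactly as before.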