Bagging

In this section, we will classify the dataset using bagging. As we have previously shown, decision trees with a maximum depth of five are optimal; thus, we will use these trees for our bagging example.

We would like to optimize the ensemble's size. We will generate validation curves for the original train set by testing sizes in the range of [5, 30]. The curves are depicted in the following graph:

Validation curves for the original train set, for various ensemble sizes

We observe that variance is minimized for an ensemble size of 10, thus we will utilize ensembles of size 10.
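The following is a minimal sketch of how such validation curves can be generated with scikit-learn's validation_curve; the five-fold cross-validation, the F1 scoring metric, and the step of 5 between candidate sizes are assumptions rather than the exact procedure used to produce the graph above:

# Sketch: validation curves over the ensemble size (n_estimators)
import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, validation_curve

np.random.seed(123456)
data = pd.read_csv('creditcard.csv')
x_train, x_test, y_train, y_test = train_test_split(
    data.drop('Class', axis=1).values, data.Class.values, test_size=0.3)

# Candidate ensemble sizes in the range [5, 30] (step of 5 is an assumption)
param_range = [5, 10, 15, 20, 25, 30]
train_scores, test_scores = validation_curve(
    BaggingClassifier(base_estimator=DecisionTreeClassifier(max_depth=5)),
    x_train, y_train,
    param_name='n_estimators', param_range=param_range,
    cv=5, scoring='f1', n_jobs=-1)

# The mean cross-validated score indicates average performance, while the
# standard deviation across folds indicates the variance of each size
for size, scores in zip(param_range, test_scores):
    print('Size: %d, mean F1: %.3f, std: %.3f'
          % (size, scores.mean(), scores.std()))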

The following code loads the data and libraries (Section 1), splits the data into train and test sets, and fits and evaluates the ensemble on the original dataset (Section 2) and the reduced-features dataset (Section 3):

# --- SECTION 1 ---
# Libraries and data loading
import numpy as np
import pandas as pd

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

np.random.seed(123456)
data = pd.read_csv('creditcard.csv')
# Scale the Time and Amount features
data.Time = (data.Time - data.Time.min()) / data.Time.std()
data.Amount = (data.Amount - data.Amount.mean()) / data.Amount.std()
# Train-test split of 70%-30%
x_train, x_test, y_train, y_test = train_test_split(
    data.drop('Class', axis=1).values, data.Class.values, test_size=0.3)

After creating our train and test splits, we train and evaluate our ensemble on the original dataset, as well as on the reduced-features dataset, as follows:

# --- SECTION 2 ---
# Ensemble evaluation
ensemble = BaggingClassifier(n_estimators=10,
                             base_estimator=DecisionTreeClassifier(max_depth=5))
ensemble.fit(x_train, y_train)
print('Bagging f1', metrics.f1_score(y_test, ensemble.predict(x_test)))
print('Bagging recall', metrics.recall_score(y_test, ensemble.predict(x_test)))
# --- SECTION 3 ---
# Filter features according to their correlation to the target
np.random.seed(123456)
threshold = 0.1
correlations = data.corr()['Class'].drop('Class')
fs = list(correlations[(abs(correlations)>threshold)].index.values)
fs.append('Class')
data = data[fs]
x_train, x_test, y_train, y_test = train_test_split(
    data.drop('Class', axis=1).values, data.Class.values, test_size=0.3)
ensemble = BaggingClassifier(n_estimators=10,
                             base_estimator=DecisionTreeClassifier(max_depth=5))
ensemble.fit(x_train, y_train)

print('Bagging f1', metrics.f1_score(y_test, ensemble.predict(x_test)))
print('Bagging recall', metrics.recall_score(y_test, ensemble.predict(x_test)))

Using bagging ensembles with trees of a maximum depth of 5 and 10 trees per ensemble, we are able to achieve the following F1 and recall scores. Bagging outperforms both stacking and voting on both datasets and on all metrics, with one exception: the F1 score for the original dataset is slightly worse than stacking's (0.843 compared to 0.844):

Dataset    Metric    Value
Original   F1        0.843
           Recall    0.787
Filtered   F1        0.831
           Recall    0.794

Bagging performance for the original and filtered datasets

Although we have concluded that a maximum depth of 5 is optimal for a single decision tree, it does restrict the diversity of the individual trees in the ensemble. By increasing the maximum depth to 8, we are able to achieve an F1 score of 0.864 and a recall score of 0.816 on the filtered dataset, the best performance so far.
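The following is a minimal sketch of this deeper-tree variant; it assumes the imports from Section 1 and the filtered train/test split from Section 3 are already in memory, and only the trees' maximum depth changes from 5 to 8:

# Bagging with deeper base trees (max_depth=8) on the filtered dataset
ensemble = BaggingClassifier(n_estimators=10,
                             base_estimator=DecisionTreeClassifier(max_depth=8))
ensemble.fit(x_train, y_train)
predictions = ensemble.predict(x_test)
print('Bagging f1', metrics.f1_score(y_test, predictions))
print('Bagging recall', metrics.recall_score(y_test, predictions))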

Nonetheless, performance on the original dataset suffers. This confirms that the features we removed were indeed noise: the deeper trees are now able to model in-sample noise, and thus their out-of-sample performance suffers:

Dataset    Metric    Value
Original   F1        0.840
           Recall    0.772
Filtered   F1        0.864
           Recall    0.816

Bagging performance for the original and filtered datasets with a maximum depth of 8
