Boosting

As we move on, we will start to utilize generative methods. The first generative method we will experiment with is boosting, and we will first try to classify the dataset using AdaBoost. As AdaBoost re-weights (and, in some implementations, resamples) the dataset based on misclassifications, we expect it to handle our imbalanced dataset relatively well.

First, we must decide on the ensemble's size. We generate validation curves over a range of ensemble sizes, depicted as follows:

Validation curves of various ensemble sizes for AdaBoost

As we can observe, 70 base learners provide the best trade-off between bias and variance. As such, we will proceed with ensembles of size 70.
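The following is a minimal sketch of how such curves can be generated with scikit-learn's validation_curve; the exact range of ensemble sizes swept here is an assumption, as is the use of F1 as the scoring metric:

# Sketch: validation curves over n_estimators for AdaBoost
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import validation_curve

data = pd.read_csv('creditcard.csv')
x, y = data.drop('Class', axis=1).values, data.Class.values

# Assumed sweep of ensemble sizes around the chosen value of 70
param_range = [10, 30, 50, 70, 90, 110]
train_scores, test_scores = validation_curve(
    AdaBoostClassifier(), x, y,
    param_name='n_estimators', param_range=param_range,
    cv=5, scoring='f1')
for size, tr, te in zip(param_range, train_scores.mean(axis=1),
                        test_scores.mean(axis=1)):
    print('n_estimators=%3d, train f1=%.3f, cv f1=%.3f' % (size, tr, te))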

The following code handles the library imports and data loading for AdaBoost:

# --- SECTION 1 ---
# Libraries and data loading
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

np.random.seed(123456)
data = pd.read_csv('creditcard.csv')
# Shift Time to start at zero and scale it; standardize Amount
data.Time = (data.Time-data.Time.min())/data.Time.std()
data.Amount = (data.Amount-data.Amount.mean())/data.Amount.std()
# Train-test split of 70%-30%
x_train, x_test, y_train, y_test = train_test_split(
    data.drop('Class', axis=1).values, data.Class.values, test_size=0.3)

We then train and evaluate our ensemble, using 70 estimators and a learning rate of 1.0:

# --- SECTION 2 ---
# Ensemble evaluation
ensemble = AdaBoostClassifier(n_estimators=70, learning_rate=1.0)
ensemble.fit(x_train, y_train)
predictions = ensemble.predict(x_test)
print('AdaBoost f1', metrics.f1_score(y_test, predictions))
print('AdaBoost recall', metrics.recall_score(y_test, predictions))

We then reduce the number of features by keeping only those whose absolute correlation with the target exceeds a threshold. Finally, we repeat the training and evaluation procedure:

# --- SECTION 3 ---
# Filter features according to their correlation to the target
np.random.seed(123456)
threshold = 0.1
# Keep features whose absolute correlation with Class exceeds the threshold
correlations = data.corr()['Class'].drop('Class')
fs = list(correlations[(abs(correlations) > threshold)].index.values)
fs.append('Class')
data = data[fs]
x_train, x_test, y_train, y_test = train_test_split(
    data.drop('Class', axis=1).values, data.Class.values, test_size=0.3)
ensemble = AdaBoostClassifier(n_estimators=70, learning_rate=1.0)
ensemble.fit(x_train, y_train)
predictions = ensemble.predict(x_test)
print('AdaBoost f1', metrics.f1_score(y_test, predictions))
print('AdaBoost recall', metrics.recall_score(y_test, predictions))

The results are depicted in the following table. As is evident, AdaBoost does not perform as well as our previous models:

Dataset    Metric    Value
Original   F1        0.778
           Recall    0.721
Filtered   F1        0.794
           Recall    0.721

Performance of AdaBoost

We can try to increase the learning rate to 1.3, which seems to improve overall performance. If we further increase it to 1.4, we notice a drop in performance. If, at this learning rate, we also increase the number of base learners to 80, we notice an increase in performance for the filtered dataset, while the original dataset seems to trade recall for F1 performance. The results are listed in the following tables (a sketch that reproduces this sweep follows them):

Dataset    Metric    Value
Original   F1        0.788
           Recall    0.765
Filtered   F1        0.815
           Recall    0.743

Performance of AdaBoost, learning_rate=1.3

Dataset    Metric    Value
Original   F1        0.800
           Recall    0.765
Filtered   F1        0.800
           Recall    0.735

Performance of AdaBoost, learning_rate=1.4

Dataset    Metric    Value
Original   F1        0.805
           Recall    0.757
Filtered   F1        0.805
           Recall    0.743

Performance of AdaBoost, learning_rate=1.4, ensemble_size=80
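
These experiments amount to a small manual sweep over learning_rate and n_estimators. A minimal sketch of how the sweep could be automated is shown below; it assumes the train and test arrays from SECTION 1, and the reporting format is our own:

# Sketch: sweep learning_rate and ensemble size as in the tables above
# (assumes x_train, x_test, y_train, y_test and metrics from SECTION 1/2)
for lr in [1.0, 1.3, 1.4]:
    for size in [70, 80]:
        ensemble = AdaBoostClassifier(n_estimators=size, learning_rate=lr)
        ensemble.fit(x_train, y_train)
        predictions = ensemble.predict(x_test)
        print('lr=%.1f, size=%d, f1=%.3f, recall=%.3f' % (
              lr, size,
              metrics.f1_score(y_test, predictions),
              metrics.recall_score(y_test, predictions)))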

We can, in fact, observe a Pareto front of F1 and recall, which is directly linked to the learning rate and the number of base learners in the ensemble. The front is depicted in the following graph:

Pareto front of F1 and Recall for AdaBoost
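
A configuration lies on this front if no other configuration achieves both higher F1 and higher recall. As an illustration, a small helper for extracting the non-dominated points from recorded (F1, recall) pairs might look as follows; the sample points are the original-dataset entries from the tables above:

# Sketch: extract the Pareto front from recorded (f1, recall) pairs
def pareto_front(points):
    # Keep a point only if no other point matches or beats it on both metrics
    front = []
    for f1, rec in points:
        dominated = any(o_f1 >= f1 and o_rec >= rec and (o_f1, o_rec) != (f1, rec)
                        for o_f1, o_rec in points)
        if not dominated:
            front.append((f1, rec))
    return front

# Original-dataset (F1, recall) entries from the tables above
results = [(0.778, 0.721), (0.788, 0.765), (0.800, 0.765), (0.805, 0.757)]
print(pareto_front(results))  # [(0.800, 0.765), (0.805, 0.757)]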