We can also try stacking the base learners, instead of using voting. First, we will stack a single decision tree with a maximum depth of five, a Naive Bayes classifier, and a logistic regression. As the meta-learner, we will use a logistic regression.
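Before turning to our dataset, it may help to see what a stacking fit/predict cycle looks like in isolation. The sketch below uses scikit-learn's built-in `StackingClassifier` on synthetic data as a stand-in for our custom `Stacking` class (an assumption: the two interfaces differ, though both train a meta-learner on the base learners' out-of-fold predictions):

```python
# A minimal stacking sketch on synthetic data, using scikit-learn's
# StackingClassifier as a stand-in for the custom Stacking class.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

stack = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier(max_depth=5)),
                ('nb', GaussianNB()),
                ('lr', LogisticRegression())],
    final_estimator=LogisticRegression(),  # the meta-learner
    cv=5)  # meta-features come from out-of-fold predictions
stack.fit(x_tr, y_tr)
acc = stack.score(x_te, y_te)
```

The `cv` parameter controls how the out-of-fold meta-features are generated; this prevents the meta-learner from fitting to predictions the base learners made on their own training data.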
The following code loads the required libraries and data, then trains and evaluates the ensemble on the original and filtered datasets. We first load the required libraries and data, and create the train and test splits:
# --- SECTION 1 ---
# Libraries and data loading
import numpy as np
import pandas as pd
from stacking_classifier import Stacking
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn import metrics
np.random.seed(123456)
data = pd.read_csv('creditcard.csv')
data.Time = (data.Time-data.Time.min())/data.Time.std()
data.Amount = (data.Amount-data.Amount.mean())/data.Amount.std()
# Train-test split of 70%-30%
x_train, x_test, y_train, y_test = train_test_split(
    data.drop('Class', axis=1).values, data.Class.values, test_size=0.3)
After creating our train and test splits, we train and evaluate our ensemble on the original dataset, as well as on a reduced-features dataset, as follows:
# --- SECTION 2 ---
# Ensemble evaluation
base_classifiers = [DecisionTreeClassifier(max_depth=5),
                    GaussianNB(),
                    LogisticRegression()]
ensemble = Stacking(learner_levels=[base_classifiers,
                                    [LogisticRegression()]])
ensemble.fit(x_train, y_train)
# Predict once and reuse the predictions for both metrics
predictions = ensemble.predict(x_test)
print('Stacking f1', metrics.f1_score(y_test, predictions))
print('Stacking recall', metrics.recall_score(y_test, predictions))
# --- SECTION 3 ---
# Filter features according to their correlation to the target
np.random.seed(123456)
threshold = 0.1
correlations = data.corr()['Class'].drop('Class')
fs = list(correlations[(abs(correlations) > threshold)].index.values)
fs.append('Class')
data = data[fs]
x_train, x_test, y_train, y_test = train_test_split(
    data.drop('Class', axis=1).values, data.Class.values, test_size=0.3)
ensemble = Stacking(learner_levels=[base_classifiers,
                                    [LogisticRegression()]])
ensemble.fit(x_train, y_train)
predictions = ensemble.predict(x_test)
print('Stacking f1', metrics.f1_score(y_test, predictions))
print('Stacking recall', metrics.recall_score(y_test, predictions))
As the following table shows, the ensemble achieves a slightly better F1 score on the original dataset, but a worse recall score, compared to the voting ensemble with the same base learners:
| Dataset  | Metric | Value |
|----------|--------|-------|
| Original | F1     | 0.823 |
| Original | Recall | 0.750 |
| Filtered | F1     | 0.828 |
| Filtered | Recall | 0.794 |
We can further experiment with different base learners. Adding two decision trees with maximum depths of three and eight, respectively (the same setup as the second voting ensemble), stacking exhibits the same behavior: it outperforms voting on the F1 score and underperforms on recall for the original dataset. On the filtered dataset, the performance remains on par with voting. Finally, we experiment with a second level of base learners, consisting of a decision tree with depth two and a linear support vector machine, which performs worse than the five base learners' setup. The results for the five base learners are as follows:
| Dataset  | Metric | Value |
|----------|--------|-------|
| Original | F1     | 0.844 |
| Original | Recall | 0.757 |
| Filtered | F1     | 0.828 |
| Filtered | Recall | 0.794 |
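The two follow-up setups described above can be sketched as runnable code. Since our custom `Stacking` class may not be at hand, the sketch below approximates them on synthetic data with nested scikit-learn `StackingClassifier` instances (an approximation: scikit-learn supports one meta-level per instance, so the extra level of base learners is expressed as a stacked `final_estimator`):

```python
# Sketch of the two extended setups, approximated with scikit-learn's
# StackingClassifier instead of the custom Stacking class.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=1)

# Setup 1: five base learners (adds trees of depth 3 and 8),
# one logistic regression meta-learner.
five_base = [('dt5', DecisionTreeClassifier(max_depth=5)),
             ('dt3', DecisionTreeClassifier(max_depth=3)),
             ('dt8', DecisionTreeClassifier(max_depth=8)),
             ('nb', GaussianNB()),
             ('lr', LogisticRegression())]
flat = StackingClassifier(estimators=five_base,
                          final_estimator=LogisticRegression())
flat.fit(X, y)

# Setup 2: a second level of base learners. The first-level outputs
# feed a depth-2 tree and a linear SVM, whose outputs in turn feed
# the logistic regression meta-learner.
second_level = StackingClassifier(
    estimators=[('dt2', DecisionTreeClassifier(max_depth=2)),
                ('svm', LinearSVC())],
    final_estimator=LogisticRegression())
deep = StackingClassifier(estimators=five_base,
                          final_estimator=second_level)
deep.fit(X, y)
```

Each added level introduces another round of fitting on the previous level's predictions, which increases both training cost and the risk of overfitting, consistent with the weaker results reported below for the deeper ensemble.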
The following table depicts the results for the stacking ensemble with an additional level of base learners. As is evident, it performs worse than the original ensemble.
| Dataset  | Metric | Value |
|----------|--------|-------|
| Original | F1     | 0.827 |
| Original | Recall | 0.757 |
| Filtered | F1     | 0.827 |
| Filtered | Recall | 0.772 |