Custom hard voting implementation

In order to implement a custom hard voting solution, we will use three base learners: a Perceptron (a neural network with a single neuron), a Support Vector Machine (SVM), and a k-Nearest Neighbors classifier. These are contained in the sklearn.linear_model, sklearn.svm, and sklearn.neighbors packages, respectively. Furthermore, we will use the argmax function from NumPy, which returns the index of the highest-valued element of an array (or array-like structure). Finally, accuracy_score will calculate the accuracy of each classifier on our test data:

# --- SECTION 1 ---
# Import the required libraries
from sklearn import datasets, linear_model, svm, neighbors
from sklearn.metrics import accuracy_score
from numpy import argmax
# Load the dataset
breast_cancer = datasets.load_breast_cancer()
x, y = breast_cancer.data, breast_cancer.target

We then instantiate our base learners. We hand-picked their hyperparameters to ensure that they are diverse in order to produce a well-performing ensemble. As breast_cancer is a classification dataset, we use SVC, the classification version of SVM, along with KNeighborsClassifier and Perceptron. Furthermore, we set the random state of Perceptron to 0 in order to ensure the reproducibility of our example:

# --- SECTION 2 ---
# Instantiate the learners (classifiers)
learner_1 = neighbors.KNeighborsClassifier(n_neighbors=5)
learner_2 = linear_model.Perceptron(tol=1e-2, random_state=0)
learner_3 = svm.SVC(gamma=0.001)

We split the data into train and test sets, using the last 100 instances as our test set, and train our base learners on the train set:

# --- SECTION 3 ---
# Split the train and test samples
test_samples = 100
x_train, y_train = x[:-test_samples], y[:-test_samples]
x_test, y_test = x[-test_samples:], y[-test_samples:]

# Fit learners with the train data
learner_1.fit(x_train, y_train)
learner_2.fit(x_train, y_train)
learner_3.fit(x_train, y_train)

By storing each base learner's predictions in predictions_1, predictions_2, and predictions_3, we can further analyze and combine them into our ensemble. Note that each classifier is trained individually and that each one produces its predictions for the test data independently. As mentioned in Chapter 2, Getting Started with Ensemble Learning, this is the main characteristic of non-generative ensemble methods:

# --- SECTION 4 ---
# Each learner predicts the classes of the test data
predictions_1 = learner_1.predict(x_test)
predictions_2 = learner_2.predict(x_test)
predictions_3 = learner_3.predict(x_test)

Following the predictions, we combine the predictions of each base learner for each test instance. The hard_predictions list will contain the ensemble's predictions (output). By iterating over every test sample with for i in range(test_samples), we count the total number of votes that each class has received from the three base learners. As the dataset contains only two classes, we need a list of two elements: counts = [0 for _ in range(2)]. In # --- SECTION 4 ---, we stored each base learner's predictions in an array. Each element of those arrays contains the index of the instance's predicted class (in our case, 0 or 1). Thus, we increase the value of the corresponding element, counts[predictions_1[i]], by one to count the base learner's vote. Then, argmax(counts) returns the index of the class with the highest number of votes:

# --- SECTION 5 ---
# We combine the predictions with hard voting
hard_predictions = []
# For each predicted sample
for i in range(test_samples):
    # Count the votes for each class
    counts = [0 for _ in range(2)]
    counts[predictions_1[i]] += 1
    counts[predictions_2[i]] += 1
    counts[predictions_3[i]] += 1
    # Find the class with the most votes
    final = argmax(counts)
    # Add the class to the final predictions
    hard_predictions.append(final)
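The counting loop above can also be expressed more compactly with NumPy. The following sketch (not part of the original walkthrough; the majority_vote helper is our own illustrative name) stacks the three prediction arrays and takes the per-sample majority vote with numpy.bincount and numpy.argmax:

```python
import numpy as np

def majority_vote(*prediction_arrays):
    """Return the per-sample majority class across several
    equally weighted 1-D prediction arrays (a sketch)."""
    # Shape: (n_learners, n_samples)
    stacked = np.stack(prediction_arrays)
    # For each sample (column), count votes per class and
    # pick the class with the most votes
    return np.array([np.argmax(np.bincount(column))
                     for column in stacked.T])

# Example with dummy predictions from three hypothetical learners
p1 = np.array([0, 1, 1, 0])
p2 = np.array([0, 1, 0, 0])
p3 = np.array([1, 1, 1, 1])
print(majority_vote(p1, p2, p3))  # [0 1 1 0]
```

Like argmax on the counts list, np.argmax breaks ties in favor of the lower class index.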

Finally, we calculate the accuracy of the individual base learners as well as the ensemble with accuracy_score, and print them on screen:

# --- SECTION 6 ---
# Accuracies of base learners
print('L1:', accuracy_score(y_test, predictions_1))
print('L2:', accuracy_score(y_test, predictions_2))
print('L3:', accuracy_score(y_test, predictions_3))
# Accuracy of hard voting
print('-'*30)
print('Hard Voting:', accuracy_score(y_test, hard_predictions))

The final output is as follows:

L1: 0.94
L2: 0.93
L3: 0.88
------------------------------
Hard Voting: 0.95
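For comparison, scikit-learn ships a ready-made hard voting implementation in its ensemble module. The sketch below (an alternative to the custom loop, not part of the original walkthrough) reproduces the same ensemble with VotingClassifier and voting='hard'; the estimator names 'knn', 'prc', and 'svm' are arbitrary labels of our choosing:

```python
from sklearn import datasets, linear_model, svm, neighbors
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Load and split the data as before
breast_cancer = datasets.load_breast_cancer()
x, y = breast_cancer.data, breast_cancer.target
test_samples = 100
x_train, y_train = x[:-test_samples], y[:-test_samples]
x_test, y_test = x[-test_samples:], y[-test_samples:]

# Combine the same three base learners with hard (majority) voting
voting = VotingClassifier(
    estimators=[
        ('knn', neighbors.KNeighborsClassifier(n_neighbors=5)),
        ('prc', linear_model.Perceptron(tol=1e-2, random_state=0)),
        ('svm', svm.SVC(gamma=0.001))],
    voting='hard')
voting.fit(x_train, y_train)
print('Hard Voting:', accuracy_score(y_test, voting.predict(x_test)))
```

VotingClassifier handles the vote counting internally, so it is the more convenient choice in practice; the custom loop above is useful for understanding what happens under the hood.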