Soft voting implementation

Scikit-learn's implementation allows for soft voting as well. The only requirement is that the base learners implement the predict_proba method. In our example, Perceptron does not implement the method at all, while SVC only produces probabilities when it is passed the probability=True argument. With these limitations in mind, we swap our Perceptron for a Naive Bayes classifier, implemented in the sklearn.naive_bayes package.
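
To see the limitation in action, the following is a minimal sketch (the exact error message may vary across scikit-learn versions): Perceptron exposes no predict_proba at all, while SVC refuses to produce probabilities unless probability=True is set:

# Perceptron has no predict_proba method at all
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer

x, y = load_breast_cancer(return_X_y=True)
print(hasattr(Perceptron(), 'predict_proba'))  # False

# SVC is fitted with the default probability=False, so asking
# for probabilities raises an AttributeError
svc = SVC().fit(x, y)
try:
    svc.predict_proba(x)
except AttributeError as error:
    print(error)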

To actually use soft voting, the VotingClassifier object must be initialized with the voting='soft' argument. Apart from the changes mentioned here, the rest of the code remains the same. Load the libraries and datasets, and produce a train/test split as follows:

# --- SECTION 1 ---
# Import the required libraries
from sklearn import datasets, naive_bayes, svm, neighbors
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score
# Load the dataset
breast_cancer = datasets.load_breast_cancer()
x, y = breast_cancer.data, breast_cancer.target

# Split the train and test samples
test_samples = 100
x_train, y_train = x[:-test_samples], y[:-test_samples]
x_test, y_test = x[-test_samples:], y[-test_samples:]
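
As a side note, the split above simply keeps the last 100 samples for testing. A common alternative, sketched here (shuffling and stratification will change the exact accuracies reported later), is scikit-learn's train_test_split utility:

# Alternative split with scikit-learn's train_test_split
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=100, stratify=y, random_state=0)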

Next, we instantiate the base learners and the voting classifier. We use a Gaussian Naive Bayes classifier, implemented as GaussianNB. Note that we pass probability=True to the SVC constructor, in order for the SVC object to be able to produce probabilities:

# --- SECTION 2 ---
# Instantiate the learners (classifiers)
learner_1 = neighbors.KNeighborsClassifier(n_neighbors=5)
learner_2 = naive_bayes.GaussianNB()
learner_3 = svm.SVC(gamma=0.001, probability=True)

# --- SECTION 3 ---
# Instantiate the voting classifier
voting = VotingClassifier([('KNN', learner_1),
                           ('NB', learner_2),
                           ('SVM', learner_3)],
                          voting='soft')
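
Soft voting also lets us weight the base learners unequally. VotingClassifier accepts a weights argument that scales each learner's predicted probabilities before they are averaged; the following sketch (the weights are illustrative, not tuned) counts the SVM's probabilities twice as heavily:

# Illustrative only: an ensemble that weighs the SVM's probabilities
# twice as heavily as those of the other two learners
weighted_voting = VotingClassifier([('KNN', learner_1),
                                    ('NB', learner_2),
                                    ('SVM', learner_3)],
                                   voting='soft',
                                   weights=[1, 1, 2])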

We fit both the VotingClassifier and the individual learners. We want to analyze our results and, as mentioned earlier, the classifier will not fit the objects that we pass as arguments, but will instead clone them. Thus, we have to manually fit our learners, as follows:

# --- SECTION 4 ---
# Fit classifier with the training data
voting.fit(x_train, y_train)
learner_1.fit(x_train, y_train)
learner_2.fit(x_train, y_train)
learner_3.fit(x_train, y_train)
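
Alternatively, instead of fitting separate copies ourselves, we can retrieve the fitted clones from the ensemble itself through its named_estimators_ attribute; a brief sketch:

# The fitted clones are accessible under the names we gave them
fitted_knn = voting.named_estimators_['KNN']
print(fitted_knn.score(x_test, y_test))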

We predict the test set's targets using both the voting ensemble and the individual learners:

# --- SECTION 5 ---
# Predict the most probable class
soft_predictions = voting.predict(x_test)

# --- SECTION 6 ---
# Get the base learner predictions
predictions_1 = learner_1.predict(x_test)
predictions_2 = learner_2.predict(x_test)
predictions_3 = learner_3.predict(x_test)
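
Since all three learners now expose predict_proba, we can also reproduce the ensemble's decision by hand: soft voting averages the predicted class probabilities and selects the most probable class. A minimal sketch, assuming the default equal weights:

# Average the base learners' class probabilities and pick the most
# probable class; this should agree with the ensemble, although SVC's
# probability estimates involve internal cross-validation, so tiny
# discrepancies are possible unless random_state is fixed
import numpy as np
averaged = np.mean([learner_1.predict_proba(x_test),
                    learner_2.predict_proba(x_test),
                    learner_3.predict_proba(x_test)], axis=0)
manual_predictions = np.argmax(averaged, axis=1)
print('Agreement:', np.mean(manual_predictions == soft_predictions))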

Finally, we print the accuracy of each base learner and the soft voting ensemble's accuracy:

# --- SECTION 7 ---
# Accuracies of base learners
print('L1:', accuracy_score(y_test, predictions_1))
print('L2:', accuracy_score(y_test, predictions_2))
print('L3:', accuracy_score(y_test, predictions_3))
# Accuracy of soft voting
print('-'*30)
print('Soft Voting:', accuracy_score(y_test, soft_predictions))

The final output is as follows:

L1: 0.94
L2: 0.96
L3: 0.88
------------------------------
Soft Voting: 0.94