Analyzing our results using Python

The final accuracy achieved is 1% better than the best of the three classifiers (the k-Nearest Neighbors (k-NN) classifier). We can visualize the learners' errors in order to examine why the ensemble performs in this specific way.

First, we import matplotlib and select the seaborn-paper plotting style with mpl.style.use('seaborn-paper'):

# --- SECTION 1 ---
# Import the required libraries
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.style.use('seaborn-paper')

Then, we calculate the errors by subtracting our prediction from the actual target. Thus, we get a -1 each time the learner predicts a positive (1) when the true class is negative (0), and a 1 when it predicts a negative (0) while the true class is positive (1). If the prediction is correct, we get a zero (0):

# --- SECTION 2 ---
# Calculate the errors
errors_1 = y_test - predictions_1
errors_2 = y_test - predictions_2
errors_3 = y_test - predictions_3

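To make the encoding concrete, the following is a minimal sketch on hypothetical target and prediction arrays (not the actual test set): a -1 marks a false positive, a 1 marks a false negative, and a 0 marks a correct prediction.

```python
import numpy as np

# Hypothetical targets and predictions to illustrate the error encoding
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([1, 1, 0, 0])

errors = y_true - y_pred
print(errors.tolist())  # [-1, 0, 1, 0]: false positive, correct, false negative, correct
```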
For each base learner, we plot the instances where it has predicted the wrong class. Our aim is to scatter plot two lists: the x list will contain the instance number, and the y list the type of error. With plt.scatter, we can specify the coordinates of our points using these lists, as well as how the points are depicted. This is important in order to ensure that we can simultaneously visualize all the errors of the classifiers, as well as the relationship between them.

The default shape for each point is a circle. By specifying the marker parameter, we can alter this shape. Furthermore, with the s parameter, we can specify the marker's size. Thus, the first learner (k-NN) will have a round shape of size 120, the second learner (Perceptron) will have an x shape of size 60, and the third learner (SVM) will have a round shape of size 20. The if not errors_*[i] == 0 guard ensures that we will not store correctly classified instances:

# --- SECTION 3 ---
# Discard correct predictions and plot each learner's errors
x = []
y = []
for i in range(len(errors_1)):
    if not errors_1[i] == 0:
        x.append(i)
        y.append(errors_1[i])
plt.scatter(x, y, s=120, label='Learner 1 Errors')

x = []
y = []
for i in range(len(errors_2)):
    if not errors_2[i] == 0:
        x.append(i)
        y.append(errors_2[i])
plt.scatter(x, y, marker='x', s=60, label='Learner 2 Errors')

x = []
y = []
for i in range(len(errors_3)):
    if not errors_3[i] == 0:
        x.append(i)
        y.append(errors_3[i])
plt.scatter(x, y, s=20, label='Learner 3 Errors')
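As a side note, each of these loops can be condensed with NumPy, since the error vectors are NumPy arrays when y_test and the predictions are. The following sketch shows the equivalent computation for the first learner, using a hypothetical error vector; the plt.scatter call would then be unchanged:

```python
import numpy as np

# Hypothetical error vector standing in for errors_1
errors_1 = np.array([0, -1, 0, 0, 1, 0])

x = np.nonzero(errors_1)[0]  # indices of misclassified samples
y = errors_1[x]              # corresponding error types (-1 or 1)
print(x.tolist(), y.tolist())  # [1, 4] [-1, 1]
```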

Finally, we specify the figure's title and axis labels, and plot the legend:

plt.title('Learner errors')
plt.xlabel('Test sample')
plt.ylabel('Error')
plt.legend()
plt.show()

As the following figure shows, there are five samples where at least two learners predict the wrong class. These are the five cases out of the 100 that the ensemble misclassifies, since the most-voted class is wrong, thus producing a 95% accuracy. In all other cases, two out of three learners predict the correct class, so the ensemble predicts the correct class, as it is the most voted:
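The hard-voting logic described here can be sketched as follows, using hypothetical prediction arrays for the three base learners: the ensemble outputs whichever class at least two learners agree on.

```python
import numpy as np

# Hypothetical binary predictions from three base learners on six samples
p1 = np.array([1, 0, 1, 1, 0, 1])
p2 = np.array([1, 0, 0, 1, 0, 0])
p3 = np.array([0, 0, 1, 0, 1, 0])

# Majority (hard) vote: the class predicted by at least two of the three
ensemble = (p1 + p2 + p3 >= 2).astype(int)
print(ensemble.tolist())  # [1, 0, 1, 1, 0, 0]
```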

Learner errors on the test set