Classifying the data with a normal Bayes classifier

We will then use the same procedure as in earlier chapters to train a normal Bayes classifier. Wait, why not a Naive Bayes classifier? Well, it turns out OpenCV doesn't really provide a true Naive Bayes classifier. Instead, it comes with a Bayesian classifier that doesn't necessarily expect features to be independent, but rather expects the data to be clustered into Gaussian blobs. This is exactly the kind of dataset we created earlier!
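If you are starting from a fresh session, the variables X, y, X_train, y_train, X_test, and y_test used below can be recreated along the following lines. This is only a sketch of the kind of Gaussian-blob dataset described above; the exact parameters (100 samples, two centers, the random seeds, and the 90/10 split) are assumptions made here for illustration, not necessarily the values used earlier:

import numpy as np
from sklearn import datasets, model_selection

# two Gaussian blobs, 100 points in total (assumed parameters)
X, y = datasets.make_blobs(n_samples=100, centers=2, random_state=42)
# OpenCV expects 32-bit floating-point features and 32-bit integer labels
X = X.astype(np.float32)
y = y.astype(np.int32)
# hold out 10% of the points for testing (assumed split)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.1, random_state=42)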

By following these steps, you will learn how to build, train, and evaluate a normal Bayes classifier:

  1. We can create a new classifier using the following function:
In [5]: import cv2
... model_norm = cv2.ml.NormalBayesClassifier_create()
  2. Then, training is done via the train method:
In [6]: model_norm.train(X_train, cv2.ml.ROW_SAMPLE, y_train)
Out[6]: True
  3. Once the classifier has been trained successfully, it will return True. We go through the motions of predicting and scoring the classifier, just like we have done a million times before:
In [7]: _, y_pred = model_norm.predict(X_test)
In [8]: from sklearn import metrics
... metrics.accuracy_score(y_test, y_pred)
Out[8]: 1.0
  4. Even better, we can reuse the plotting function from the last chapter to inspect the decision boundary! If you recall, the idea was to create a mesh grid that would encompass all data points and then classify every point on the grid. The mesh grid is created via the NumPy function of the same name:
In [9]: def plot_decision_boundary(model, X_test, y_test):
...         # create a mesh to plot in
...         h = 0.02  # step size in mesh
...         x_min, x_max = X_test[:, 0].min() - 1, X_test[:, 0].max() + 1
...         y_min, y_max = X_test[:, 1].min() - 1, X_test[:, 1].max() + 1
...         xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
...                              np.arange(y_min, y_max, h))
  5. The meshgrid function will return two floating-point matrices, xx and yy, that contain the x and y coordinates of every point on the grid. We can flatten these matrices into column vectors using the ravel function and stack them to form a new matrix, X_hypo:
...         X_hypo = np.column_stack((xx.ravel().astype(np.float32),
...                                   yy.ravel().astype(np.float32)))
  6. X_hypo now contains all x values in X_hypo[:, 0] and all y values in X_hypo[:, 1]. This is a format that the predict function can understand:
...         ret = model.predict(X_hypo)
  7. However, we want to be able to use models from both OpenCV and scikit-learn. The difference between the two is that OpenCV returns multiple variables (a Boolean flag indicating success/failure and the predicted target labels), whereas scikit-learn returns only the predicted target labels. Hence, we can check whether the ret output is a tuple, in which case we know we're dealing with OpenCV and store the second element of the tuple (ret[1]). Otherwise, we are dealing with scikit-learn and don't need to index into ret (we will try this out with a scikit-learn model right after the plot):
...         if isinstance(ret, tuple):
...             zz = ret[1]
...         else:
...             zz = ret
...         zz = zz.reshape(xx.shape)
  8. All that's left to do is to create a contour plot where zz indicates the color of every point on the grid. On top of that, we plot the data points using our trusty scatter plot:
...         plt.contourf(xx, yy, zz, cmap=plt.cm.coolwarm, alpha=0.8)
...         plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, s=200)
  9. We call the function by passing a model (model_norm), a feature matrix (X), and a target label vector (y):
In [10]: plot_decision_boundary(model_norm, X, y)

The output looks like this:

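As a quick check of the scikit-learn branch of plot_decision_boundary, the same kind of plot can be produced with a scikit-learn model. The sketch below uses GaussianNB (a true naive Bayes classifier) purely as an illustration, and assumes the X_train, y_train, X, and y from before:

from sklearn.naive_bayes import GaussianNB

# scikit-learn's predict returns only the labels, so the isinstance(ret, tuple)
# check in plot_decision_boundary falls through to the else branch
model_naive = GaussianNB()
model_naive.fit(X_train, y_train)
plot_decision_boundary(model_naive, X, y)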
So far, so good. The interesting part is that a Bayesian classifier also returns the probability with which each data point has been classified:

In [11]: ret, y_pred, y_proba = model_norm.predictProb(X_test)

The function returns a Boolean flag (True for success and False for failure), the predicted target labels (y_pred), and the conditional probabilities (y_proba). Here, y_proba is an N x 2 matrix that indicates, for every one of the N data points, the probability with which it was classified as either class 0 or class 1:

In [12]: y_proba.round(2)
Out[12]: array([[ 0.15000001, 0.05 ],
[ 0.08 , 0. ],
[ 0. , 0.27000001],
[ 0. , 0.13 ],
[ 0. , 0. ],
[ 0.18000001, 1.88 ],
[ 0. , 0. ],
[ 0. , 1.88 ],
[ 0. , 0. ],
[ 0. , 0. ]], dtype=float32)

This means that, for the first data point (top row), the probability of it belonging to class 0 (that is, p(C0|X)) is 0.15 (or 15%). Similarly, the probability of it belonging to class 1 is p(C1|X) = 0.05.

The reason why some of the rows show values greater than 1 is that OpenCV does not really return probability values. True probabilities are always between 0 and 1, and each row in the preceding matrix would add up to 1. Instead, what is being reported is a likelihood, which is basically the numerator of Bayes' rule, p(C|M) = p(M|C) p(C) / p(M), that is, the product p(M|C) p(C). The denominator, p(M), is the same for both classes, so it does not change the ranking and does not need to be computed. All we need to know is that 0.15 > 0.05 (top row); hence, the data point most likely belongs to class 0.
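If you prefer numbers that actually behave like probabilities, you can normalize each row of y_proba so it sums to 1 (dividing by the row sum plays the role of the denominator p(M)). The following is only a sketch; leaving the all-zero rows at zero is a choice made here, not something OpenCV prescribes:

import numpy as np

# divide every row of the likelihoods by its sum; all-zero rows stay zero
row_sums = y_proba.sum(axis=1, keepdims=True)
y_proba_norm = np.divide(y_proba, row_sums,
                         out=np.zeros_like(y_proba),
                         where=row_sums > 0)
# each nonzero row now sums to 1 and can be read as p(C0|X) and p(C1|X)
print(y_proba_norm.round(2))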