Training a normal Bayes classifier

From here on out, things work (almost) the way they always have. We can use scikit-learn to split the data into training and test sets (let's reserve 20% of all data points for testing):

In [13]: from sklearn import model_selection as ms
... X_train, X_test, y_train, y_test = ms.train_test_split(
... X, y, test_size=0.2, random_state=42
... )

We can instantiate a new normal Bayes classifier with OpenCV:

In [14]: import cv2
... model_norm = cv2.ml.NormalBayesClassifier_create()

However, OpenCV does not know about sparse matrices (at least its Python interface does not). If we were to pass X_train and y_train to the train function as we did earlier, OpenCV would complain that the data matrix is not a NumPy array. Converting the full sparse matrix into a regular (dense) NumPy array, however, would likely make you run out of memory. A possible workaround is therefore to train the OpenCV classifier on only a subset of data points (say, 1,000) and features (say, 300):

In [15]: import numpy as np
... X_train_small = X_train[:1000, :300].toarray().astype(np.float32)
... y_train_small = y_train[:1000].astype(np.float32)

Then, it becomes possible to train the OpenCV classifier (although this might take a while):

In [16]: model_norm.train(X_train_small, cv2.ml.ROW_SAMPLE, y_train_small)
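As a quick sketch (not part of the original listing), we can then ask the trained classifier for predictions on the test set. The names X_test_small and y_pred below are only illustrative; note that the test data must be sliced to the same first 300 features and converted to a dense float32 array, just like the training data, and that the resulting accuracy will be limited because we trained on only 1,000 samples:

In [17]: # slice the test data the same way as the training data
...      X_test_small = X_test[:, :300].toarray().astype(np.float32)
...      # predict returns a tuple (retval, results); results holds the labels
...      _, y_pred = model_norm.predict(X_test_small)
...      # fraction of test samples whose predicted label matches the truth
...      np.mean(y_pred.ravel() == np.asarray(y_test).ravel())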