The following steps will help you build a Naive Bayes classifier:
- We can compare the result to that of a true Naive Bayes classifier by turning to scikit-learn:
In [13]: from sklearn import naive_bayes
... model_naive = naive_bayes.GaussianNB()
- As usual, training the classifier is done via the fit method:
In [14]: model_naive.fit(X_train, y_train)
Out[14]: GaussianNB(priors=None)
- Scoring the classifier is built in:
In [15]: model_naive.score(X_test, y_test)
Out[15]: 1.0
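The fit-and-score steps above can be reproduced end to end on synthetic data. The following is a minimal sketch; the `make_blobs` dataset is an assumption standing in for the chapter's toy data, not the book's exact setup:

```python
import numpy as np
from sklearn import datasets, model_selection, naive_bayes

# Two well-separated clusters stand in for the chapter's toy data (assumption)
X, y = datasets.make_blobs(n_samples=100, centers=2,
                           cluster_std=1.0, random_state=42)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.1, random_state=42)

model_naive = naive_bayes.GaussianNB()
model_naive.fit(X_train, y_train)          # training via the fit method
score = model_naive.score(X_test, y_test)  # mean accuracy on the test set
```

On data this cleanly separated, `score` comes out at or near 1.0, matching the perfect score seen above.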
- Again a perfect score! However, in contrast to OpenCV, this classifier's predict_proba method returns true probability values: every entry lies between 0 and 1, and each row sums to 1:
In [16]: yprob = model_naive.predict_proba(X_test)
... yprob.round(2)
Out[16]: array([[ 0., 1.],
                [ 0., 1.],
                [ 0., 1.],
                [ 1., 0.],
                [ 1., 0.],
                [ 1., 0.],
                [ 0., 1.],
                [ 0., 1.],
                [ 1., 0.],
                [ 1., 0.]])
You might have noticed something else: this classifier has absolutely no doubt about the target label of each and every data point. It's all or nothing.
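The two probability properties mentioned above can be checked directly. This sketch assumes a GaussianNB fitted on hypothetical blob data (an assumption, not the book's dataset) and verifies that every probability lies in [0, 1] and that each row sums to 1:

```python
import numpy as np
from sklearn import datasets, naive_bayes

# Hypothetical two-class data (assumption, not the book's dataset)
X, y = datasets.make_blobs(n_samples=100, centers=2, random_state=42)
model = naive_bayes.GaussianNB().fit(X, y)

yprob = model.predict_proba(X)
in_range = np.all((yprob >= 0) & (yprob <= 1))        # all entries in [0, 1]
rows_sum_to_one = np.allclose(yprob.sum(axis=1), 1.0)  # rows are distributions
```

On well-separated data the per-class Gaussian likelihoods differ by many orders of magnitude, which is why the rounded rows collapse to 0 and 1.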
- The decision boundary returned by the Naive Bayes classifier looks slightly different, but for the purpose of this exercise it can be considered equivalent to the previous one:
In [17]: plot_decision_boundary(model_naive, X, y)
The output looks like this:
The preceding screenshot shows the decision boundary of the Naive Bayes classifier.
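The plot_decision_boundary helper is defined earlier in the book; as a reminder, a minimal sketch of such a helper might look like the following (the mesh step size h and the matplotlib styling are assumptions):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render off-screen
import matplotlib.pyplot as plt
from sklearn import datasets, naive_bayes

def plot_decision_boundary(model, X, y, h=0.1):
    # Evaluate the classifier on a dense grid covering the data range
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    # Color the grid by predicted class and overlay the data points
    plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=30)
    return Z

# Hypothetical two-class data (assumption, not the book's dataset)
X, y = datasets.make_blobs(n_samples=100, centers=2, random_state=42)
model = naive_bayes.GaussianNB().fit(X, y)
Z = plot_decision_boundary(model, X, y)
```

The boundary GaussianNB draws here is quadratic in general, since each class gets its own per-feature variance.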