The receiver operating characteristic (ROC) curve is a plot of the recall (10.3), also known as the true positive rate (TPR), against the false positive rate (FPR) of a binary classifier. The FPR is given by the following equation:

FPR = FP / (FP + TN)

Here, FP is the number of false positives and TN is the number of true negatives.
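As a minimal sketch of these two rates, the following snippet (with made-up labels and predictions) computes the FPR and the recall from a scikit-learn confusion matrix:

```python
import numpy as np
from sklearn import metrics

# Hypothetical true labels and hard predictions for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

# scikit-learn lays the confusion matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()

fpr = fp / (fp + tn)  # false positive rate
tpr = tp / (tp + fn)  # recall / true positive rate

print(fpr, tpr)  # 0.25 0.75
```

Sweeping a decision threshold over probabilistic scores produces a series of (FPR, TPR) pairs, and those pairs traced out together form the ROC curve.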
In this recipe, we will plot the ROC for the various classifiers we used in Chapter 9, Ensemble Learning and Dimensionality Reduction. Also, we will plot the curve associated with random guessing and the ideal curve. Obviously, we want to beat the baseline and get as close as possible to the ideal curve.
The area under the curve (AUC, ROC AUC, or AUROC) is another evaluation metric that summarizes the ROC curve in a single number. Because it is a scalar, AUC is convenient for comparing models, but it provides less information than the full ROC curve.
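For instance, given predicted probabilities for the positive class (toy values here, not from the rain dataset), scikit-learn computes the AUC directly:

```python
import numpy as np
from sklearn import metrics

# Hypothetical true labels and predicted positive-class probabilities.
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# AUC equals the probability that a randomly chosen positive example
# receives a higher score than a randomly chosen negative example.
auc = metrics.roc_auc_score(y_true, y_scores)
print(auc)  # 0.75
```

One of the four positive/negative pairs is ranked incorrectly (0.4 > 0.35), so the AUC is 3/4 = 0.75; a perfect ranking would give 1.0 and random guessing about 0.5.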
from sklearn import metrics
import numpy as np
import ch10util
import dautil as dl
from IPython.display import HTML
y_test = np.load('rain_y_test.npy')
roc_aucs = [metrics.roc_auc_score(y_test, preds)
            for preds in ch10util.rain_preds()]
sp = dl.plotting.Subplotter(2, 1, context)
ch10util.plot_bars(sp.ax, roc_aucs)
sp.label()
cp = dl.plotting.CyclePlotter(sp.next_ax())

for preds, label in zip(ch10util.rain_preds(), ch10util.rain_labels()):
    fpr, tpr, _ = metrics.roc_curve(y_test, preds, pos_label=True)
    cp.plot(fpr, tpr, label=label)

fpr, tpr, _ = metrics.roc_curve(y_test, y_test)
sp.ax.plot(fpr, tpr, 'k', lw=4, label='Ideal')
sp.ax.plot(np.linspace(0, 1), np.linspace(0, 1), '--', label='Baseline')
sp.label()
sp.fig.text(0, 1, ch10util.classifiers())
HTML(sp.exit())
Refer to the following screenshot for the end result:
The code is in the roc_auc.ipynb file in this book's code bundle.
The roc_auc_score() function documented at http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html (retrieved November 2015)