Sensitivity, specificity, and area under the curve

Area under the curve (AUC) applies to binary classification problems, and it represents the probability that the model will rank a randomly chosen positive instance higher than a randomly chosen negative one. In order to define it, we must first define sensitivity and specificity:

  • Sensitivity (True Positive Rate): Sensitivity is the percentage of positive instances correctly predicted as positive, relative to all positive instances. It is calculated as Sensitivity = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.

  • Specificity (True Negative Rate): Specificity is the percentage of negative instances correctly predicted as negative, relative to all negative instances. It is calculated as Specificity = TN / (TN + FP), where TN is the number of true negatives and FP is the number of false positives; the False Positive Rate is 1 - specificity. Both rates are illustrated in the sketch after this list.
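
As a minimal sketch of the two formulas above (assuming NumPy is available; y_true and y_pred are invented toy labels, not data from the text), the four confusion-matrix counts give both rates directly:

```python
import numpy as np

# Hypothetical ground-truth labels and model predictions (1 = Yes, 0 = No)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives

sensitivity = tp / (tp + fn)  # TP / (TP + FN)
specificity = tn / (tn + fp)  # TN / (TN + FP)

print(f"Sensitivity: {sensitivity:.2f}")  # 0.75 for this toy data
print(f"Specificity: {specificity:.2f}")  # 0.75 for this toy data
```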

By sweeping the decision threshold at fixed intervals (for example, in 0.05 increments) and computing sensitivity and (1-specificity) at each threshold, we can see how the model behaves. The threshold is applied to the model's output probability for each instance: at a threshold of 0.05, every instance whose estimated probability of belonging to the Yes class is at least 0.05 is predicted as positive; we then re-compute at a threshold of 0.1, and so on. Plotting the resulting (1-specificity, sensitivity) pairs produces the curve depicted here:

Receiver operating characteristic curve

The straight diagonal line represents a random model, which has an equal probability of ranking any instance correctly or incorrectly. The orange line (the ROC curve) traces the model's sensitivity against (1-specificity) across all thresholds. If the ROC curve falls below the diagonal, the model performs worse than a random, uninformed model. The AUC is the area under this ROC curve: 1.0 for a model that ranks every positive above every negative, and 0.5 for a random one.
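
To make the threshold sweep concrete, here is a hedged sketch of the procedure described above; y_score stands in for the model's estimated Yes-class probabilities and is invented for illustration. Each pass classifies every instance scoring at or above the threshold as positive and computes one (1-specificity, sensitivity) point of the ROC curve:

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.8, 0.2, 0.6, 0.1, 0.3, 0.7])

for threshold in np.arange(0.0, 1.05, 0.05):
    # Predict Yes for every instance at or above the current threshold
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    print(f"t={threshold:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting the printed (FPR, TPR) pairs, from (1, 1) at threshold 0 down to (0, 0) at threshold 1, traces the orange curve in the figure.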

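The AUC itself can be computed with scikit-learn (assuming it is installed), and the sketch below also verifies the ranking interpretation from the start of this section: the area under the curve equals the fraction of positive-negative pairs in which the positive instance receives the higher score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.8, 0.2, 0.6, 0.1, 0.3, 0.7])

auc = roc_auc_score(y_true, y_score)

# Direct pairwise check: how often is a positive ranked above a negative?
# (Ties would count as half; there are none in this toy data.)
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
ranking_prob = np.mean([(p > n) + 0.5 * (p == n) for p in pos for n in neg])

print(f"AUC:      {auc:.4f}")           # 0.9375
print(f"Pairwise: {ranking_prob:.4f}")  # 0.9375 -- identical
```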