Sensitivity, specificity, and area under the curve

Area under the curve (AUC) applies to binary classification problems, and it represents the probability that the model will rank a randomly chosen positive instance higher than a randomly chosen negative one. In order to define it, we must first define sensitivity and specificity:

  • Sensitivity (True Positive Rate): Sensitivity is the percentage of positive instances correctly predicted as positive, relative to all positive instances. It is calculated as Sensitivity = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.

  • Specificity (True Negative Rate): Specificity is the percentage of negative instances correctly predicted as negative, relative to all negative instances. It is calculated as Specificity = TN / (TN + FP), where TN is the number of true negatives and FP is the number of false positives; the False Positive Rate is 1 - specificity. Both rates are illustrated in the sketch after this list.
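
As a minimal sketch of the two formulas above (assuming NumPy is available; y_true and y_pred are invented toy labels, not data from the text), the four confusion-matrix counts give both rates directly:

```python
import numpy as np

# Hypothetical ground-truth labels and model predictions (1 = Yes, 0 = No)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives

sensitivity = tp / (tp + fn)  # TP / (TP + FN)
specificity = tn / (tn + fp)  # TN / (TN + FP)

print(f"Sensitivity: {sensitivity:.2f}")  # 0.75 for this toy data
print(f"Specificity: {specificity:.2f}")  # 0.75 for this toy data
```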

By sweeping the decision threshold at fixed intervals (for example, in 0.05 increments) and computing sensitivity and (1-specificity) at each threshold, we can see how the model behaves. The threshold is applied to the model's output probability for each instance: at a threshold of 0.05, every instance whose estimated probability of belonging to the Yes class is at least 0.05 is predicted as positive; we then re-compute at a threshold of 0.1, and so on. Plotting the resulting (1-specificity, sensitivity) pairs produces the curve depicted here:

Receiver operating characteristic curve

The straight diagonal line represents a random model, which has an equal probability of ranking any instance correctly or incorrectly. The orange line (the ROC curve) traces the model's sensitivity against (1-specificity) across all thresholds. If the ROC curve falls below the diagonal, the model performs worse than a random, uninformed model. The AUC is the area under this ROC curve: 1.0 for a model that ranks every positive above every negative, and 0.5 for a random one.
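
To make the threshold sweep concrete, here is a hedged sketch of the procedure described above; y_score stands in for the model's estimated Yes-class probabilities and is invented for illustration. Each pass classifies every instance scoring at or above the threshold as positive and computes one (1-specificity, sensitivity) point of the ROC curve:

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.8, 0.2, 0.6, 0.1, 0.3, 0.7])

for threshold in np.arange(0.0, 1.05, 0.05):
    # Predict Yes for every instance at or above the current threshold
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    print(f"t={threshold:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting the printed (FPR, TPR) pairs, from (1, 1) at threshold 0 down to (0, 0) at threshold 1, traces the orange curve in the figure.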

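The AUC itself can be computed with scikit-learn (assuming it is installed), and the sketch below also verifies the ranking interpretation from the start of this section: the area under the curve equals the fraction of positive-negative pairs in which the positive instance receives the higher score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.8, 0.2, 0.6, 0.1, 0.3, 0.7])

auc = roc_auc_score(y_true, y_score)

# Direct pairwise check: how often is a positive ranked above a negative?
# (Ties would count as half; there are none in this toy data.)
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
ranking_prob = np.mean([(p > n) + 0.5 * (p == n) for p in pos for n in neg])

print(f"AUC:      {auc:.4f}")           # 0.9375
print(f"Pairwise: {ranking_prob:.4f}")  # 0.9375 -- identical
```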