Classification metrics

If the label is discrete, the prediction problem is called classification. In general, the target takes exactly one of a finite set of values for each record (although multivalued targets are possible, particularly for the text classification problems to be considered in Chapter 6, Working with Unstructured Data).

If the discrete values are ordered and the ordering makes sense, such as Worse, Bad, Good, the discrete labels can be cast to integer or double values, and the problem reduces to regression (a record falling between Worse and Bad is certainly farther from Good than one labeled Bad).
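As a minimal sketch of this cast, assuming the illustrative label names above, an ordinal mapping to doubles might look as follows:

```scala
// A minimal sketch (label names and values are illustrative): casting
// ordered labels to doubles so the problem can be treated as regression.
val ordinal = Map("Worse" -> 0.0, "Bad" -> 1.0, "Good" -> 2.0)
val labels  = Seq("Bad", "Good", "Worse")
val numeric = labels.map(ordinal)   // Seq(1.0, 2.0, 0.0)
```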

A generic metric to optimize is the misclassification rate, the fraction of records for which the predicted label differs from the true label:

$$\text{misclassification rate} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{I}\left(\hat{y}_i \neq y_i\right)$$

Here, $N$ is the number of records, $\hat{y}_i$ is the predicted label for record $i$, $y_i$ is its true label, and $\mathbb{I}$ is the indicator function.
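The following sketch computes the misclassification rate for a handful of made-up labels and predictions (the label values are illustrative, not from any dataset in this book):

```scala
// A minimal sketch: the misclassification rate is the fraction of
// records where the predicted label differs from the true label.
val labels      = Seq("Good", "Bad", "Good", "Worse", "Good")
val predictions = Seq("Good", "Good", "Good", "Worse", "Bad")

val errors = predictions.zip(labels).count { case (p, y) => p != y }
val misclassificationRate = errors.toDouble / labels.size   // 2.0 / 5 = 0.4
```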

However, if the algorithm can predict the distribution of possible values for the target, a more general metric, such as the KL divergence or the Manhattan distance between distributions, can be used.

KL divergence is a measure of the information lost when probability distribution $Q$ is used to approximate probability distribution $P$:

$$D_{KL}(P \parallel Q) = \sum_{i} P(i)\,\ln\frac{P(i)}{Q(i)}$$
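A minimal sketch of this computation for discrete distributions follows; the two distributions are made up for illustration, and terms with $P(i) = 0$ are dropped by the usual convention:

```scala
// KL divergence D(P || Q) for discrete distributions over the same
// support; terms with P(i) = 0 contribute nothing.
def klDivergence(p: Seq[Double], q: Seq[Double]): Double =
  p.zip(q).collect { case (pi, qi) if pi > 0.0 => pi * math.log(pi / qi) }.sum

val p = Seq(0.7, 0.2, 0.1)      // true class distribution P
val q = Seq(0.5, 0.3, 0.2)      // predicted distribution Q approximating P
val loss = klDivergence(p, q)   // information lost when Q stands in for P
```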

It is closely related to the entropy gain split criterion used in decision tree induction: the entropy gain of a split is the weighted sum, over all leaf nodes, of the KL divergences of the leaf probability distributions from the node probability distribution, as the sketch below illustrates.
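The following sketch checks this identity numerically on illustrative class distributions (the node distribution is the weight-averaged mixture of the leaf distributions, as it would be for a real split):

```scala
// A minimal sketch (counts are illustrative): the entropy gain of a
// split equals the weighted sum of KL divergences of each leaf
// distribution from the node distribution.
def entropy(p: Seq[Double]): Double =
  p.collect { case pi if pi > 0.0 => -pi * math.log(pi) }.sum

def klDivergence(p: Seq[Double], q: Seq[Double]): Double =
  p.zip(q).collect { case (pi, qi) if pi > 0.0 => pi * math.log(pi / qi) }.sum

val node   = Seq(0.5, 0.5)               // class distribution at the node
val leaves = Seq((Seq(0.9, 0.1), 0.5),   // (leaf distribution, leaf weight)
                 (Seq(0.1, 0.9), 0.5))

val gain  = entropy(node) - leaves.map { case (p, w) => w * entropy(p) }.sum
val klSum = leaves.map { case (p, w) => w * klDivergence(p, node) }.sum
// gain and klSum agree up to floating-point error
```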
