Bias–variance trade-off and learning curve

Non-linear classifiers are often more powerful than linear classifiers for text classification problems, but that does not mean a non-linear classifier is the right answer to every text classification problem. No single learning algorithm is optimal across all problems, which makes algorithm selection a pivotal part of any modeling exercise. Moreover, the complexity of a model is not determined solely by whether the classifier is linear or non-linear; other aspects of the modeling process, such as feature selection and regularization, also contribute to it.

The error in a learning model can be broadly categorized into irreducible and reducible components. Irreducible error is caused by inherent variability in the system, and little can be done about it. Reducible error is the component that can be minimized to improve prediction accuracy. To improve the model-fitting process and build highly accurate models, it is important to understand bias and variance, the error sources that give rise to them, and the trade-off between them. In a classification model, bias and variance are the reducible error components that can prevent the algorithm from approximating well on unseen test data. The bias-variance trade-off refers to the difficulty of reducing both of these errors at the same time, and it is one reason there is no universally optimal learning algorithm. Ideally, a learning method would both capture the regularities in the training data and generalize well to unobserved data; because of the bias-variance trade-off, doing both perfectly at the same time is, in practice, not achievable.

High bias typically implies a simpler model that barely fits, or under-fits, the data and misses important regularities in the training set. High variance, on the other hand, means the model captures the regularities of the training data very closely: it fits the training data well but does not generalize to unobserved instances, which is the typical case of over-fitting. Low-bias models usually carry a lot of complexity, which lets them fit the training data well; but despite that complexity, they may still predict poorly on unseen instances, because the extra flexibility also fits the noise in the training data.
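To make the contrast concrete, the following is a minimal sketch (not taken from the source text) that compares a simple linear classifier with an unpruned decision tree on synthetic data; scikit-learn, the generated dataset, and the chosen hyperparameters are all assumptions made purely for illustration:

```python
# Hedged illustration: a high-bias linear model versus a high-variance
# non-linear model. Dataset and parameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Simple linear classifier: may under-fit (higher bias, lower variance).
linear = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Unpruned decision tree: fits the training data almost perfectly,
# but often generalizes worse (lower bias, higher variance).
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

for name, model in [("logistic regression", linear), ("deep tree", tree)]:
    print(name,
          "train acc:", round(model.score(X_train, y_train), 3),
          "test acc:", round(model.score(X_test, y_test), 3))
```

A large gap between training and test accuracy for the more complex model is the usual symptom of over-fitting, while similar but mediocre scores on both sets point to under-fitting.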

Bias-variance decomposition

As we learnt in the previous section, the error of a learning model has two components: a reducible and an irreducible one. The expected error on any unobserved instance $x$ can be decomposed as:

$$\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right] = \mathrm{Bias}\left[\hat{f}(x)\right]^2 + \mathrm{Var}\left[\hat{f}(x)\right] + \sigma^2$$

Where:

$$\mathrm{Bias}\left[\hat{f}(x)\right] = \mathbb{E}\left[\hat{f}(x)\right] - f(x)$$

And:

$$\mathrm{Var}\left[\hat{f}(x)\right] = \mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}\left[\hat{f}(x)\right]\right)^2\right]$$

Here $f$ is the true underlying function, $\hat{f}$ is the hypothesis learnt from the training data, $y$ is the observed target, and $\sigma^2$ is the variance of the irreducible noise.
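The decomposition above follows from the additive-noise view of the data, $y = f(x) + \epsilon$ with $\mathbb{E}[\epsilon] = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2$; a brief sketch of the expansion, using the same symbols as before, is:

$$
\begin{aligned}
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  &= \mathbb{E}\big[(f(x) + \epsilon - \hat{f}(x))^2\big] \\
  &= \big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2
   + \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]
   + \mathbb{E}[\epsilon^2] \\
  &= \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2
\end{aligned}
$$

The cross terms vanish because the noise $\epsilon$ has zero mean and is independent of the training data used to fit $\hat{f}(x)$.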

If we sample our training data multiple times, we will usually learn a different hypothesis each time, due to the underlying randomness in the samples; the resulting learners will therefore produce a range of predictions for any given instance. Bias measures how far the average of these predictions is from the true value: high bias means the average prediction is substantially far from the expected or true value. Bias tends to decrease as complexity is added to the model.

The error due to variance measures the variability in predictions caused by multiple realizations of the model. Suppose we repeat the model-building process on fresh samples of the training data and observe, for a specific instance, how the predictions vary across all the resulting models. The error due to variance captures how far what is actually learnt from a particular dataset deviates from what is learnt on average across datasets. It decreases as the complexity of the model decreases, as shown in the following diagram:

Figure: Bias-variance decomposition (error due to bias and error due to variance as functions of model complexity)
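The following sketch (not from the source text) carries out this procedure directly: it repeatedly draws a fresh training sample, refits models of increasing complexity, and estimates the squared bias and the variance of the predictions at a single instance. NumPy, the synthetic sine target, and all numeric settings are assumptions chosen only for illustration:

```python
# Hedged sketch of estimating bias and variance empirically by refitting a
# model on many realizations of the training data (all settings illustrative).
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(2 * np.pi * x)           # true underlying function

sigma = 0.2                                 # std. dev. of irreducible noise
x0 = 0.3                                    # the specific instance we examine
n_train, n_repeats = 30, 500

for degree in (1, 3, 6):                    # model complexity = polynomial degree
    preds = []
    for _ in range(n_repeats):
        # Fresh realization of the training data.
        x = rng.uniform(0, 1, n_train)
        y = f(x) + rng.normal(0, sigma, n_train)
        coeffs = np.polyfit(x, y, degree)   # fit one model per realization
        preds.append(np.polyval(coeffs, x0))
    preds = np.array(preds)
    bias_sq = (preds.mean() - f(x0)) ** 2   # (average prediction - true value)^2
    variance = preds.var()                  # spread of predictions around their mean
    print(f"degree {degree}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```

With settings like these, the simplest model typically shows a large squared bias and a small variance, while the most flexible model shows the opposite, which is exactly the trade-off the diagram depicts.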