Evaluating models

Although there are various metrics that indicate a model's performance, it is important to carefully set up the testing environment. One of the most important steps is to split the dataset into two parts: one part is used by the algorithm to generate a model, while the other is used to assess that model. These are usually called the train set and the test set.

The train set is available to the algorithm, which uses it to generate and optimize a model according to some cost function. Once the algorithm is finished, the produced model is tested on the test set in order to assess its predictive ability on unseen data. While the algorithm may produce a model that performs well on the train set (in-sample performance), it may not be able to generalize and perform as well on the test set (out-of-sample performance). This can be attributed to many factors, which are covered in the next chapter. Some of these problems can be tackled with the use of ensembles. Nonetheless, if the algorithm is presented with low-quality data, there is little that can be done to improve out-of-sample performance.
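
As a minimal sketch of this split, the following Python snippet uses scikit-learn's train_test_split to hold out part of the data and then compares in-sample and out-of-sample accuracy. The choice of dataset (breast cancer) and model (a decision tree) here is purely illustrative and not taken from the text.

# Illustrative train/test split with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)

# Hold out 10% of the data as the test set; the remaining 90% is the train set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)  # the algorithm only ever sees the train set

train_acc = accuracy_score(y_train, model.predict(X_train))  # in-sample performance
test_acc = accuracy_score(y_test, model.predict(X_test))     # out-of-sample performance
print('In-sample accuracy: %.3f' % train_acc)
print('Out-of-sample accuracy: %.3f' % test_acc)

A large gap between the two printed accuracies is a typical sign that the model does not generalize well beyond the data it was trained on.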

In order to obtain a fair estimate, we sometimes repeat this procedure, splitting the dataset into fixed-size train and test sets, say, 90% train and 10% test, using a different 10% as the test set each time, until every instance has served as test data exactly once. This is called K-fold cross-validation. In the case of a 90% to 10% split, it is called 10-fold cross-validation, because we need to perform it 10 times in order to get an estimate for the whole dataset.
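
The snippet below sketches 10-fold cross-validation with scikit-learn's KFold, reusing the same illustrative dataset and model as above (again, these particular choices are assumptions for demonstration only). Each fold trains on 90% of the data and tests on the remaining 10%.

# Illustrative 10-fold cross-validation with scikit-learn.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)

kf = KFold(n_splits=10, shuffle=True, random_state=42)
fold_scores = []
for train_idx, test_idx in kf.split(X):
    model = DecisionTreeClassifier(random_state=42)
    model.fit(X[train_idx], y[train_idx])      # train on 9 folds (90% of the data)
    preds = model.predict(X[test_idx])         # test on the held-out fold (10%)
    fold_scores.append(accuracy_score(y[test_idx], preds))

# After 10 iterations, every instance has been used exactly once as test data.
print('Mean 10-fold accuracy: %.3f' % np.mean(fold_scores))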
