Although grid search with cross-validation makes for a much more robust model selection procedure, you might have noticed that we still split the data into training and validation sets only once. As a result, our results might still depend too much on that single split of the data.
Instead of splitting the data into training and validation sets once, we can go a step further and use multiple splits for cross-validation. This will result in what is known as nested cross-validation, and the process is illustrated in the following diagram:
In nested cross-validation, an outer loop around the grid search box repeatedly splits the data into training and test sets. For each of these splits, a grid search is run on the training part (which the grid search itself splits into training and validation folds), and it reports back the best parameter settings it found. Then, for each outer split, we compute a test score using those best settings.
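The procedure above can be sketched concisely in scikit-learn by passing a `GridSearchCV` object to `cross_val_score`. The following is a minimal illustration; the choice of the iris dataset, an `SVC` model, and this particular parameter grid are assumptions made purely for the example:

```python
# Minimal sketch of nested cross-validation with scikit-learn.
# The dataset (iris), model (SVC), and parameter grid are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

iris = load_iris()
param_grid = {'C': [0.01, 0.1, 1, 10], 'gamma': [0.01, 0.1, 1, 10]}

# Inner loop: on each training portion, GridSearchCV runs its own
# cross-validation to pick the best parameter settings.
grid_search = GridSearchCV(SVC(), param_grid, cv=5)

# Outer loop: cross_val_score repeatedly splits the data into training
# and test sets, refits the grid search on each training set, and
# scores the resulting best model on the held-out test set.
scores = cross_val_score(grid_search, iris.data, iris.target, cv=5)
print("Nested cross-validation scores:", scores)
print("Mean score:", scores.mean())
```

Note that the result is a list of test scores, not a single model: nested cross-validation estimates how well the whole tuning procedure generalizes, rather than producing a final model to deploy.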
Now that we know how to find the best parameters of a model, let's take a closer look at the different evaluation metrics that we can use to score a model.