Typical challenges in model tuning

After reading the following discussion, you might think that this process is difficult, and you'd be right. In fact, because it is hard to determine which model parameters are optimal, practitioners often jump to more complex learning algorithms before experimenting properly with simpler options whose parameters have been well tuned. As we've already discussed, machine learning involves a lot of experimentation, and tuning the internal knobs of a learning algorithm, commonly referred to as hyperparameters, is just as important at every stage, from model building through prediction to deployment.

Technically, running a learning algorithm over a training dataset with different hyperparameter settings will result in different models and, of course, different performance characteristics. According to Oracle developers, it is not recommended to begin tuning without first establishing clear objectives, since you cannot succeed without a definition of success.

Consequently, we are typically interested in selecting the best-performing model trained on the training dataset, so we need a way to estimate each model's performance in order to rank the candidates against each other. Going one step beyond fine-tuning a single algorithm, we are usually not experimenting with only the one algorithm that we think would be the best solution under the given circumstances. More often than not, we want to compare different algorithms to each other, often in terms of both predictive and computational performance.
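As a brief illustration of that kind of comparison (the dataset and the two estimators below are assumptions chosen for the sketch, not taken from the text), one could rank candidate algorithms by cross-validated accuracy and training time using scikit-learn in Python:

# Compare two candidate algorithms on predictive and computational performance.
# The dataset and the estimators are illustrative assumptions.
import time
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

candidates = [
    ("LogisticRegression", LogisticRegression(max_iter=5000)),
    ("RandomForest", RandomForestClassifier(n_estimators=100, random_state=42)),
]

for name, model in candidates:
    start = time.time()
    scores = cross_val_score(model, X, y, cv=5)  # predictive performance
    elapsed = time.time() - start                # computational performance
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}, time = {elapsed:.2f} s")

Such a quick comparison is only a starting point; each candidate would still need its own hyperparameter tuning, which is exactly the topic of the rest of this section.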

A very basic question often arises about parameter tuning with grid search and random search, since many machine learning methods have parameters that must be tuned using one of these approaches. For example, according to Wei Chu et al. (A General Formulation for Support Vector Machines, Proceedings of the 9th International Conference on Neural Information Processing (ICONIP '02), Vol. 5, 18-22 Nov. 2002), the standard formulation of SVMs is as follows:

Figure 1: The standard formula for SVM

Now, suppose we need to tune the model parameter C, and we want to do so conveniently. It's clear from the equation that tuning C also involves other quantities, such as the slack variables ξ_i and the weight vector w, where the regularization parameter C > 0 and ||w|| denotes the norm of w. On the right-hand side, the norm term acts as the stabilizer (regularizer), while the sum over the slack variables is the empirical loss term, which depends on the target function f(x_i). In standard SVMs (whether the linear SVM or other variants), this regularized functional is minimized by solving a convex quadratic optimization problem. Because the problem is convex, solving it guarantees a unique global minimum, which yields the best predictive performance attainable for the chosen hyperparameters. Therefore, the whole process is more or less an optimization problem.
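For reference, a common way to write this soft-margin formulation in the literature (a reconstruction consistent with the description above, not a verbatim copy of Figure 1) is the following, in LaTeX notation:

\[
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w\rVert^{2} \;+\; C\sum_{i=1}^{n}\xi_{i}
\qquad \text{subject to} \qquad
y_{i}\,f(x_{i}) \ge 1-\xi_{i},\;\; \xi_{i}\ge 0,\;\; f(x_{i}) = w^{\top}x_{i}+b,
\]

where \(\tfrac{1}{2}\lVert w\rVert^{2}\) is the stabilizer (regularizer) and \(C\sum_{i}\xi_{i}\) is the empirical loss term.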

In summary, Figure 2, The model tuning process, consideration, and workflow, shows the tuning process and its considerations as a workflow:

Figure 2: The model tuning process, consideration, and workflow

Now, if you are given the raw dataset, you will most probably do the pre-processing and split it into training and test sets. Then, to tune the hyperparameter C, you first need to split the training set further into a validation training set and a validation test set.

After that, you might try tuning the parameters using the validation training set and validation test set. Then use the best parameters you've got and retrain the model on the complete training set. Now you can perform the testing on the test set as the final step.

Up to this point your approach seems to be okay, but which of the following two options do you think is better, on average?

  • Would it be better to use the final model from validation, which was trained on the validation training set, for the final testing?
  • Or would it be better to retrain the model on the entire training set with the best parameters found by the grid or random search? Although the parameters were not optimized for this set, we have the final training data in this case.

Are you intending to go for option 1, because the parameters were already optimized on that training set (that is, the validation training set)? Or are you intending to go for option 2 because, although the parameters were not optimized for the full training set, you have the final training data in this case? We suggest you go for option 2, but only if you trust your validation setup. The reason is that you performed cross-validation (CV) to identify the most general parameter setup, the best model, or whatever else you were trying to optimize; these findings should then be applied to the entire training set and tested exactly once on the test set.
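To make option 2 concrete, here is a minimal sketch in Python with scikit-learn; the dataset, estimator, and parameter grid are illustrative assumptions rather than anything prescribed by the text. Cross-validated grid search on the training set plays the role of the validation training/validation test splits, the best C is then used to retrain on the complete training set, and the held-out test set is touched exactly once:

# Option 2 in practice: tune C via cross-validated grid search on the training
# set, retrain with the best parameters on the full training set, test once.
# The dataset, estimator, and grid are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Split the raw data into a training set and a held-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Grid search with 5-fold CV: each fold acts as a validation training set
# and a validation test set for every candidate value of C.
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)

# Retrain with the best parameters on the complete training set (option 2).
best_model = SVC(kernel="linear", **search.best_params_).fit(X_train, y_train)

# The test set is used exactly once, for the final evaluation.
print("Test accuracy:", best_model.score(X_test, y_test))

Note that GridSearchCV with its default refit=True already refits the best estimator on the whole training set, so search.best_estimator_ would give an equivalent model; the explicit retraining step above is shown only to mirror option 2 step by step.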

Well, suppose you went for option 2; now a second challenge emerges: how do we estimate the performance of a machine learning model? Technically, you might argue that we should feed the training data to our learning algorithm so that it learns the optimal model. Then we could predict the labels of the test set. Thirdly, we count the number of wrong predictions on the test dataset to compute the model's error rate.
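Continuing the hedged sketch from above (the names best_model, X_test, and y_test are carried over from that assumed example), the error rate is simply the fraction of wrong predictions on the test set:

# Error rate = fraction of wrong predictions on the held-out test set.
y_pred = best_model.predict(X_test)
error_rate = (y_pred != y_test).mean()
print("Test error rate:", error_rate)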

That's it? Not so fast, my friend! Depending on our goal, unfortunately, estimating the performance of that model is not that simple. Maybe we should address the previous question from another angle: why do we care about performance estimation at all? Well, ideally, the estimated performance of a model tells us how well it performs on unseen data; making predictions on future data is often the main problem we want to solve in applications of machine learning or in the development of novel algorithms.

Finally, there are several other challenges, depending on the data structure, problem type, problem domain, and use case, that you will come across and need to address once you start practicing more.
