We touched on ways to avoid overfitting when discussing the pros and cons of algorithms in the last practice. Here we summarize them formally:
- Cross-validation, a good habit that we have built over all of the chapters in this book.
- Regularization. It adds penalty terms to the training objective to reduce the error caused by fitting the model too closely to the given training set.
- Simplification, if possible. The more complex the model is, the higher the chance of overfitting. Complex models include a tree or forest with excessive depth, a linear regression with a high-degree polynomial transformation, and an SVM with a complicated kernel.
- Ensemble learning, combining a collection of weak models to form a stronger one.
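
As a rough illustration of the first two items, the following minimal sketch combines k-fold cross-validation with an L2 (ridge) penalty on a one-feature linear model without an intercept. It uses only the Python standard library. The helper names (`fit_ridge`, `mse`, `cross_val_mse`), the synthetic data, and the candidate `alpha` values are assumptions made for this example and are not taken from the book's code.

```python
import random


def fit_ridge(xs, ys, alpha):
    """Closed-form ridge fit for y ~ w * x (one feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # The penalty alpha is added to the denominator, shrinking w toward zero.
    return sxy / (sxx + alpha)


def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_val_mse(xs, ys, alpha, k=5):
    """Average held-out MSE over k folds."""
    n = len(xs)
    scores = []
    for fold in range(k):
        # Every k-th sample, starting at index `fold`, is held out for testing.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        test_x = [xs[i] for i in sorted(test_idx)]
        test_y = [ys[i] for i in sorted(test_idx)]
        w = fit_ridge(train_x, train_y, alpha)
        scores.append(mse(test_x, test_y, w))
    return sum(scores) / k


# Hypothetical synthetic data: a noisy line with true slope 2.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(20)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

# Compare training error with cross-validated error for several penalties.
for alpha in (0.0, 1.0, 10.0):
    w = fit_ridge(xs, ys, alpha)
    print(f"alpha={alpha:5.1f}  w={w:.3f}  "
          f"train_mse={mse(xs, ys, w):.3f}  "
          f"cv_mse={cross_val_mse(xs, ys, alpha):.3f}")
```

The training error can only grow as `alpha` increases, because the unpenalized fit already minimizes it. The cross-validated error is the number to compare when choosing `alpha`, since it is measured on samples the model did not see during fitting.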