How to test on the holdout set

Finally, we would like to evaluate the best model's performance on the holdout set that we excluded from the GridSearchCV exercise. It contains the last six months of the sample period (through February 2018; see the notebook for details). The following code yields a generalization performance estimate in the form of an AUC score of 0.6622:

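from sklearn.metrics import roc_auc_score

best_model = gridsearch_result.best_estimator_  # available when refit=True (the default)
preds = best_model.predict(test_feature_data)  # hard class labels, not probabilities
roc_auc_score(y_true=test_target, y_score=preds)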
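
The code above passes the output of predict(), which is hard class labels, to roc_auc_score. The following minimal sketch contrasts this with AUC computed from predict_proba scores. It also shows one way to carve out the last six months of a date-indexed sample as a holdout set. The data is synthetic and generated purely for illustration: the date range, the five features, and the label rule are assumptions, not the notebook's data. With hard labels, the ROC curve has a single operating point, so the AUC reduces to the balanced accuracy at the default decision threshold. The probability-based AUC instead scores the model's full ranking of observations.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Hypothetical synthetic data standing in for the notebook's features/target
rng = np.random.default_rng(seed=42)
dates = pd.date_range("2014-01-01", "2018-02-28", freq="D")
X = pd.DataFrame(rng.normal(size=(len(dates), 5)), index=dates,
                 columns=[f"feature_{i}" for i in range(5)])
# Binary label driven noisily by the first two features (an assumption)
latent = X["feature_0"] - 0.5 * X["feature_1"] + rng.normal(scale=1.5, size=len(dates))
y = (latent > 0).astype(int)

# Time-based split: the last six months of the sample form the holdout set
cutoff = dates[-1] - pd.DateOffset(months=6)
train_mask = X.index <= cutoff
X_train, y_train = X[train_mask], y[train_mask]
X_test, y_test = X[~train_mask], y[~train_mask]

model = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# AUC from hard labels: one ROC operating point, equal to balanced accuracy
labels = model.predict(X_test)
auc_labels = roc_auc_score(y_true=y_test, y_score=labels)
bal_acc = balanced_accuracy_score(y_test, labels)

# AUC from positive-class probabilities: uses the full ranking of scores
auc_proba = roc_auc_score(y_true=y_test, y_score=model.predict_proba(X_test)[:, 1])

print(f"AUC (hard labels):   {auc_labels:.4f}")
print(f"Balanced accuracy:   {bal_acc:.4f}")
print(f"AUC (probabilities): {auc_proba:.4f}")
```

The printed values depend entirely on the synthetic data and say nothing about the 0.6622 reported above. The point is that the label-based AUC matches the balanced accuracy, while the probability-based AUC reflects how well the model ranks the holdout observations.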

The main drawback of the sklearn gradient boosting implementation is its limited computational speed, which makes it difficult to try out different hyperparameter settings quickly. In the next section, we will see that several optimized implementations have emerged over the last few years. They significantly reduce the time required to train even large-scale models and have greatly broadened the range of applications for this highly effective algorithm.

