Optimizing the hyperparameters

There are probably plenty of other features we could add, but let's now shift our attention to the model itself. Until now, we have used the model's default, static parameters, restricting only its max_depth parameter to an arbitrary value. Let's try to fine-tune those parameters. If done properly, this process can add a few percentage points to the model's accuracy, and sometimes even a small gain in performance metrics can be a game-changer.

To do this, we'll use RandomizedSearchCV—another wrapper around the concept of cross-validation, but this time one that iterates over the parameters of the model, trying to find the optimal ones. A simpler approach, called GridSearchCV, takes a finite set of parameter values, builds every possible combination, and runs them all iteratively, using, essentially, a brute-force approach (we'll sketch what that looks like right after the following list).

Randomized search, on the other hand, takes parameter distributions and draws random samples from them. It has two advantages over grid search:

  • Randomized search can find parameter values you didn't explicitly specify (for example, some very specific ratio value).
  • It usually converges faster than grid search.
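
To make the contrast concrete, here is what a plain grid search over a small, hand-picked set of values could look like. This is only a sketch: the grid values are illustrative, and model1 stands for the decision tree classifier we trained earlier.

from sklearn.model_selection import GridSearchCV

# every combination of these values is tried: 4 depths x 2 criteria = 8 candidates
param_grid = {"max_depth": [5, 10, 15, 20],
              "criterion": ["gini", "entropy"]}

gs = GridSearchCV(model1, param_grid=param_grid, cv=4)
# gs.fit(data2[cols2], data2['result_num'])  # same interface as a regular model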

Let's take a look at how it works:

  1. First, we need to import RandomizedSearchCV and the randint distribution from scipy:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint as sp_randint
  2. Now, we'll declare a parameter space to search for a better combination. Here, each key represents a model parameter, and the value defines the options to search among. The randint function lets us specify the boundaries of the range from which random integer values will be drawn:
param_dist = {"max_depth": sp_randint(5, 20),
              "max_features": sp_randint(1, len(cols2)),
              "min_samples_split": sp_randint(2, 11),
              "criterion": ["gini", "entropy"]}
  3. Finally, having this parameter space, we can run our randomized search:
rs = RandomizedSearchCV(
    model1,
    param_distributions=param_dist,
    cv=4, iid=False,  # the iid argument was removed in newer scikit-learn versions; omit it there
    random_state=2019,
    n_iter=50  # number of random parameter combinations to sample
)

In line with the general philosophy of scikit-learn, RandomizedSearchCV behaves as if it were a model—it has fit and predict methods. Under the hood, it iterates over parameter candidates, averaging their scores over the cross-validation folds. As a result, it can return both the best score and the best corresponding estimator—the one that got the highest score on average. Consider the following code:

>>> rs.fit(data2[cols2], data2['result_num'])
>>> rs.best_score_
0.5613636363636363

>>> rs.best_estimator_
DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=5,
                       max_features=3, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=8,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=2019, splitter='best')
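
The search also keeps statistics for every sampled candidate, not just the winner. As an optional sketch (assuming pandas is imported as pd, as elsewhere in the book), we can load its cv_results_ attribute into a dataframe and inspect the top candidates by their score averaged over the folds:

import pandas as pd

results = pd.DataFrame(rs.cv_results_)
# mean_test_score is the score averaged over the 4 cross-validation folds
results.sort_values('mean_test_score', ascending=False)[
    ['mean_test_score', 'std_test_score', 'params']].head()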

As you can see, the process was indeed able to tweak the parameters, finding the configuration that performs best on average, and it improved our accuracy by ~2%. Let's generate a diagram for the resulting model:

# Imports needed for this snippet (first shown in the previous chapter);
# io.StringIO assumes Python 3
from io import StringIO
from sklearn.tree import export_graphviz
from IPython.display import Image
import pydotplus

dot_data = StringIO()  # in-memory object that emulates a file

export_graphviz(rs.best_estimator_, out_file=dot_data,
                filled=True, rounded=True,
                special_characters=True, feature_names=cols2)

graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())

Here, the code is similar to the one we ran in the previous chapter—we emulate a file with an in-memory object, generate a diagram, render it with pydotplus, and inject it into the notebook.

And here is the resultant diagram:

As you can see, the difference in guns, the infantry ratio, and the difference in tanks pop up all over the diagram—those are the main features the model makes use of.
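
If the diagram becomes too large to inspect visually, the same information can be read off numerically from the tree's feature_importances_ attribute. This is a small optional sketch, not part of the original code:

import pandas as pd

importances = pd.Series(rs.best_estimator_.feature_importances_, index=cols2)
importances.sort_values(ascending=False)  # features the tree never splits on get 0.0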

Did it use our leader features? Only one! The only commander who made it into the model is Leonid Govorov—which is quite interesting. As we mentioned in the previous chapter, correlation is not causation, especially given the complex causal relationships between the events and the imperfections in our data—but it is still a useful insight to spur a discussion or direct further research. What context are we completely missing? Is it true that artillery (guns), on average, is more important than tanks or planes? Do those features play an equally important role in the different theaters of war?

The visual representation of the decision tree allowed us to understand the logic of the model and navigate the data more effectively. The model, in this case, works as an objective analytical tool. As a result, we were able to generate quite a few questions and hypotheses about the nature of the data and the underlying historical events.

So far, we have been working with simple models, which are easy to interpret. However, these models are usually not as good at predictions, so why don't we try something more complex and performant, now that we know our dataset pretty well?
