The joblib Memory class is a utility class that caches function or method results to disk. We create a Memory object by specifying a caching directory; we can then decorate the functions to cache, or specify methods to cache in a class constructor. Optionally, we can list arguments that the cache should ignore. By default, the Memory class invalidates the cache whenever the function code is modified or the input values change. Of course, you can also clear the cache manually by moving or deleting the cache directories and files.
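The decorator workflow described above can be sketched as follows; the `square()` function is a hypothetical example, and the cache directory is a throwaway temporary directory:

```python
from tempfile import mkdtemp
from joblib import Memory

# Create a Memory object backed by a temporary caching directory.
cachedir = mkdtemp()
memory = Memory(cachedir, verbose=0)

# Decorate the function whose results we want cached to disk.
@memory.cache
def square(x):
    return x ** 2

print(square(3))  # computed on the first call, then written to the cache
print(square(3))  # loaded from the disk cache on repeated calls

# Remove the cache programmatically instead of deleting files by hand.
memory.clear(warn=False)
```

Changing the body of `square()` would invalidate the cached results, matching the default behavior described above.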
In this recipe, I describe how to reuse a scikit-learn regressor or classifier. The naïve approach is to store the fitted object with the standard Python pickle module or with joblib. However, in most cases, it is better to store the hyperparameters of the estimator.
We will use the ExtraTreesRegressor class as the estimator. Extra trees (extremely randomized trees) are a variation of the random forest algorithm, which is covered in the Learning with random forests recipe.
# GridSearchCV moved from sklearn.grid_search to
# sklearn.model_selection in scikit-learn 0.18
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import ExtraTreesRegressor
import ch9util
from tempfile import mkdtemp
import os
import joblib

X_train, X_test, y_train, y_test = ch9util.temp_split()
# min_samples_split must be at least 2 in recent scikit-learn versions
params = {'min_samples_split': [2, 3],
          'bootstrap': [True, False],
          'min_samples_leaf': [3, 4]}

gscv = GridSearchCV(ExtraTreesRegressor(random_state=41),
                    param_grid=params, cv=5)
gscv.fit(X_train, y_train)
preds = gscv.predict(X_test)

dir = mkdtemp()
pkl = os.path.join(dir, 'params.pkl')
joblib.dump(gscv.best_params_, pkl)
params = joblib.load(pkl)
print('Best params', gscv.best_params_)
print('From pkl', params)

est = ExtraTreesRegressor(random_state=41)
est.set_params(**params)
est.fit(X_train, y_train)
preds2 = est.predict(X_test)
print('Max diff', (preds - preds2).max())
Refer to the following screenshot for the end result:
The code is in the reusing_models.py file in this book's code bundle.
The Memory class documentation at https://pythonhosted.org/joblib/memory.html (retrieved November 2015)