Creating a stacking ensemble

To create our stacking ensemble, we will use three dense networks, with embeddings consisting of 5, 10, and 15 features, as base learners. We will train all networks on the original train set and use them to make predictions on the test set. Furthermore, we will train a Bayesian ridge regression as the meta learner. To train the regression, we will use all but the last 1,000 samples of the test set. Finally, we will evaluate the stacking ensemble on those last 1,000 samples.
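The code in this section assumes that the train and test sets, as well as the n_users and n_movies counts, are already available from earlier in the chapter. As a reminder, a minimal sketch of that setup could look as follows (the ratings.csv path, the random seed, and the 10,000-sample test size are illustrative assumptions, not part of this section's code):

import numpy as np
import pandas as pd

# Hypothetical setup: load the ratings and re-map the user and movie
# indices to consecutive integers, so they can index Embedding layers
data = pd.read_csv('ratings.csv')  # assumed file path
data.userId = data.userId.astype('category').cat.codes.values
data.movieId = data.movieId.astype('category').cat.codes.values
n_users = data.userId.nunique()
n_movies = data.movieId.nunique()

# Shuffle and split into train and test sets (sizes are an assumption)
data = data.sample(frac=1.0, random_state=42)
train, test = data[:-10000], data[-10000:]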

First, we will create a function that creates and trains a dense network with n embedding features, as well as a function that accepts a model as input and returns its predictions on the test set:

# Imports required for this section
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn import metrics
from keras.layers import Concatenate, Dense, Embedding, Flatten, Input
from keras.models import Model

def create_model(n_features=5):
    fts = n_features

    # Movie part. Input accepts the movie's index as input
    # and passes it to the Embedding layer. Finally,
    # Flatten transforms Embedding's output to a
    # one-dimensional tensor.
    movie_in = Input(shape=[1], name="Movie")
    mov_embed = Embedding(n_movies, fts, name="Movie_Embed")(movie_in)
    flat_movie = Flatten(name="FlattenM")(mov_embed)

    # Repeat for the user.
    user_in = Input(shape=[1], name="User")
    user_embed = Embedding(n_users, fts, name="User_Embed")(user_in)
    flat_user = Flatten(name="FlattenU")(user_embed)

    # Concatenate the Embedding layers and feed them
    # to the Dense part of the network
    concat = Concatenate()([flat_movie, flat_user])
    dense_1 = Dense(128)(concat)
    dense_2 = Dense(32)(dense_1)
    out = Dense(1)(dense_2)

    # Create and compile the model
    model = Model([user_in, movie_in], out)
    model.compile('adam', 'mean_squared_error')
    # Train the model
    model.fit([train.userId, train.movieId], train.rating, epochs=10, verbose=1)

    return model

def predictions(model):
    preds = model.predict([test.userId, test.movieId])
    return preds

We will then create and train our base learners, instantiate the meta learner, and use the base learners to predict on the test set. We combine all three models' predictions into a single array:

# Create the base learners and instantiate the meta learner
model5 = create_model(5)
model10 = create_model(10)
model15 = create_model(15)
meta_learner = BayesianRidge()

# Predict on the test set
preds5 = predictions(model5)
preds10 = predictions(model10)
preds15 = predictions(model15)
# Create a single array with the predictions
preds = np.stack([preds5, preds10, preds15], axis=-1).reshape(-1, 3)
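It is worth noting why the reshape is needed: each network's predict call returns a column vector of shape (n_samples, 1), so np.stack along the last axis produces an (n_samples, 1, 3) array, which reshape(-1, 3) flattens into the (n_samples, 3) matrix of meta-features the meta learner expects. A quick check with dummy arrays illustrates this:

import numpy as np

# Dummy stand-ins for the three base learners' predictions
a, b, c = np.ones((4, 1)), 2 * np.ones((4, 1)), 3 * np.ones((4, 1))
stacked = np.stack([a, b, c], axis=-1)   # shape: (4, 1, 3)
meta_features = stacked.reshape(-1, 3)   # shape: (4, 3), one column per learner
print(meta_features.shape)               # prints (4, 3)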

Finally, we train the meta learner on all but the last 1,000 test samples and evaluate the base learners, as well as the whole ensemble, on these last 1,000 samples:

# Fit the meta learner on all but the last 1000 test samples
meta_learner.fit(preds[:-1000], test.rating[:-1000])

# Evaluate the base learners and the meta learner on the last
# 1000 test samples
print('Base Learner 5 Features')
print(metrics.mean_squared_error(test.rating[-1000:], preds5[-1000:]))
print('Base Learner 10 Features')
print(metrics.mean_squared_error(test.rating[-1000:], preds10[-1000:]))
print('Base Learner 15 Features')
print(metrics.mean_squared_error(test.rating[-1000:], preds15[-1000:]))
print('Ensemble')
print(metrics.mean_squared_error(test.rating[-1000:], meta_learner.predict(preds[-1000:])))

The results are depicted in the following table. As is evident, the ensemble outperforms the individual base learners on unseen data, achieving a lower MSE than any of them:

Model               MSE
Base Learner 5      0.7609
Base Learner 10     0.7727
Base Learner 15     0.7639
Ensemble            0.7596

Results for individual base learners and the ensemble
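Once the base and meta learners are trained, the ensemble can be queried end to end. The following helper is a sketch rather than part of the original code (the ensemble_predict name is hypothetical); it simply repeats the stack-and-reshape step for new user and movie indices:

def ensemble_predict(user_ids, movie_ids):
    # Collect each base learner's predictions and build the meta-features
    base_preds = [m.predict([user_ids, movie_ids])
                  for m in (model5, model10, model15)]
    features = np.stack(base_preds, axis=-1).reshape(-1, 3)
    # The meta learner combines them into the final rating estimate
    return meta_learner.predict(features)

# For example, the ensemble's predictions for the last 1,000 test samples:
# ensemble_predict(test.userId[-1000:], test.movieId[-1000:])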