Bayesian deep learning

We now have a whole set of models that can make forecasts on time series. But are the point estimates these models give sensible, or are they just random guesses? How certain is the model? Most classic probabilistic modeling techniques, such as Kalman filters, can give confidence intervals for their predictions, whereas a regular deep learning model only outputs a point estimate. The field of Bayesian deep learning combines Bayesian approaches with deep learning to enable models to express uncertainty.

The key idea in Bayesian deep learning is that there is inherent uncertainty in the model. One way to express this is to learn a mean and a standard deviation for each weight instead of a single weight value. However, this roughly doubles the number of parameters required, so the approach did not catch on. A simpler hack that allows us to turn regular deep networks into Bayesian deep networks is to keep dropout active at prediction time and then make multiple predictions.
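To make the "mean and standard deviation per weight" idea concrete, here is a minimal NumPy sketch. It is not taken from this section's code: the single linear neuron and all parameter values (`w_mean`, `w_std`, `b_mean`, `b_std`) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "learned" parameters of one linear neuron: each weight is
# described by a mean and a standard deviation instead of a single value.
w_mean, w_std = 0.8, 0.1
b_mean, b_std = 0.0, 0.1

x = np.linspace(-5, 5, 11)          # inputs to predict for; x[5] is 0

# Draw 1,000 weight samples and make one prediction per sample.
w = rng.normal(w_mean, w_std, size=(1000, 1))
b = rng.normal(b_mean, b_std, size=(1000, 1))
preds = w * x + b                   # shape (1000, 11) via broadcasting

pred_mean = preds.mean(axis=0)      # point estimate per input
pred_std = preds.std(axis=0)        # uncertainty per input
```

Because every prediction uses a different weight sample, `pred_std` gives a spread for each input. Note that this needs two numbers per weight, which is the parameter overhead mentioned above.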

In this section, we will be using a simpler dataset than before. Our X values are 20 random values between -5 and 5, and our y values are just the sine function applied to these values.

We start by running the following code:

import numpy as np

X = np.random.rand(20, 1) * 10 - 5  # 20 random inputs in [-5, 5)
y = np.sin(X)                       # targets: sine of each input

Our neural network is relatively straightforward, too. Note that Keras does not allow us to make a dropout layer the first layer, so we need to add a Dense layer that just passes the input value through. We can achieve this with the following code:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

model = Sequential()

model.add(Dense(1,input_dim = 1))
model.add(Dropout(0.05))

model.add(Dense(20))
model.add(Activation('relu'))
model.add(Dropout(0.05))

model.add(Dense(20))
model.add(Activation('relu'))
model.add(Dropout(0.05))

model.add(Dense(20))
model.add(Activation('sigmoid'))

model.add(Dense(1))
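As a quick sanity check on how small this network is, the following sketch counts its trainable parameters by hand. It assumes the layer sizes shown above and relies on the fact that a Dense layer has one weight per input-unit pair plus one bias per unit, while Dropout and Activation layers have no trainable parameters.

```python
# (inputs, units) for each Dense layer in the model above
dense_layers = [(1, 1), (1, 20), (20, 20), (20, 20), (20, 1)]

# A Dense layer has one weight per input-unit pair plus one bias per unit.
params_per_layer = [n_in * n_out + n_out for n_in, n_out in dense_layers]
total_params = sum(params_per_layer)

print(params_per_layer)  # [2, 40, 420, 420, 21]
print(total_params)      # 903
```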

To fit this function, we need a relatively low learning rate, so we import the Keras vanilla stochastic gradient descent optimizer in order to set the learning rate there. We then train the model for 10,000 epochs. Since we are not interested in the training logs, we set verbose to 0, which makes the model train "quietly."

We do this by running the following:

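from keras.optimizers import SGD
# Plain SGD with a low learning rate; newer Keras releases call this argument learning_rate
model.compile(loss='mse', optimizer=SGD(lr=0.01))
# Train quietly (verbose=0) for 10,000 epochs in batches of 10
model.fit(X, y, epochs=10000, batch_size=10, verbose=0)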

We want to test our model over a larger range of values, so we create a test dataset with 200 values ranging from -10 to 10 in 0.1 intervals. We can create the test data by running the following code:

X_test = np.arange(-10,10,0.1)
X_test = np.expand_dims(X_test,-1)
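The NumPy documentation advises using `np.linspace` when the step is not an integer, because the length of an `np.arange` result can be affected by floating-point rounding. As a hedged alternative, the sketch below builds an equivalent grid with an explicit point count; the name `X_test_alt` is made up here and is not used elsewhere in this section.

```python
import numpy as np

# 200 evenly spaced points starting at -10 with a spacing of 0.1 (10 is excluded).
X_test_alt = np.linspace(-10, 10, 200, endpoint=False)
X_test_alt = np.expand_dims(X_test_alt, -1)   # shape (200, 1), one feature per row
```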

And now comes the magic trick! Using keras.backend, we can pass settings to TensorFlow, which runs the operations in the background. We use the backend to set the learning phase parameter to 1. This makes TensorFlow believe that we are training, and so it will apply dropout. We then make 100 predictions for our test data. Because dropout removes different units on every pass, the 100 predictions differ, and together they form an empirical distribution for the y value at every instance of X.

Note: For this example to work, you have to load the backend, clear the session, and set the learning phase before defining and training the model, as the training process will leave the setting in the TensorFlow graph. You can also save the trained model, clear the session, and reload the model. See the code for this section for a working implementation.

To start this process, we first run the following (as the note explains, these lines belong before the model is defined and trained):

import keras.backend as K
K.clear_session()
K.set_learning_phase(1)
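To see why the learning phase matters, here is a minimal NumPy sketch of a dropout layer with a training flag. It is an illustration under assumptions, not Keras's implementation: the `dropout` function is hypothetical, and the rate of 0.5 is exaggerated compared with the 0.05 used in our model so that the effect is easy to see.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: active only when `training` is True."""
    if not training:
        return x                              # inference: pass values through unchanged
    keep = rng.random(x.shape) >= rate        # keep each unit with probability 1 - rate
    return x * keep / (1.0 - rate)            # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
activations = np.ones((1, 20))                # stand-in for one hidden layer's output

inference_runs = [dropout(activations, 0.5, False, rng).sum() for _ in range(100)]
training_runs = [dropout(activations, 0.5, True, rng).sum() for _ in range(100)]

print(np.std(inference_runs))  # 0.0: every pass is identical
print(np.std(training_runs))   # greater than 0: passes differ
```

With dropout switched off, repeated predictions are identical and tell us nothing about uncertainty. With it switched on, every pass uses a slightly different network. In newer Keras versions, the same effect is typically achieved by calling the model with `training=True` rather than through the backend.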

And now we can obtain our distributions with the following code:

probs = []
for i in range(100):
    out = model.predict(X_test)
    probs.append(out)

Next, we can calculate the mean and standard deviation of these distributions at every test point:

p = np.array(probs)

mean = p.mean(axis=0)
std = p.std(axis=0)
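The choice of `axis=0` is what turns 100 separate predictions into one mean and one standard deviation per test point. The following sketch shows the shapes involved; `fake_probs` is a made-up stand-in filled with random numbers in place of real model output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for 100 calls to model.predict(X_test): each result has shape (200, 1).
fake_probs = [rng.normal(size=(200, 1)) for _ in range(100)]

p = np.array(fake_probs)
print(p.shape)                # (100, 200, 1): (passes, test points, outputs)

mean = p.mean(axis=0)         # average over the 100 passes
std = p.std(axis=0)           # spread over the 100 passes
print(mean.shape, std.shape)  # (200, 1) (200, 1)
```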

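Finally, we plot the model's mean prediction together with bands of half, one, and two standard deviations on either side of it, so the bands are one, two, and four standard deviations wide (corresponding to different shades of blue):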

import matplotlib.pyplot as plt

plt.figure(figsize=(10,7))
plt.plot(X_test,mean,c='blue')

lower_bound = mean - std * 0.5
upper_bound =  mean + std * 0.5
plt.fill_between(X_test.flatten(),upper_bound.flatten(),lower_bound.flatten(),alpha=0.25, facecolor='blue')

lower_bound = mean - std
upper_bound =  mean + std
plt.fill_between(X_test.flatten(),upper_bound.flatten(),lower_bound.flatten(),alpha=0.25, facecolor='blue')

lower_bound = mean - std * 2
upper_bound =  mean + std * 2
plt.fill_between(X_test.flatten(),upper_bound.flatten(),lower_bound.flatten(),alpha=0.25, facecolor='blue')

plt.scatter(X,y,c='black')

As a result of running this code, we will see the following graph:

Predictions with uncertainty bands

As you can see, the model is relatively confident around areas where it had data and becomes less and less confident the further away it gets from the data points.
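To get a feel for what the three bands mean, we can compute how much probability each would cover if the predictions at a given X were normally distributed. That is an assumption made here for illustration only; dropout samples need not be normal. The helper `normal_coverage` is hypothetical and uses only the standard library.

```python
import math

def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

# The plot uses bands of 0.5, 1, and 2 standard deviations on either side of the mean.
for k in (0.5, 1, 2):
    print(f"mean +/- {k} std covers about {normal_coverage(k):.1%}")
```

Under that assumption, the bands cover roughly 38%, 68%, and 95% of the predictions, respectively.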

Getting uncertainty estimates from our model increases the value we can get from it. It also helps us improve the model if we can detect where it is overconfident or underconfident. Right now, Bayesian deep learning is only in its infancy, and we will certainly see many advances in the next few years.
