Stacking RBMs to create a deep belief network

An RBM is a neural network with just two layers: the input, or visible, layer and a hidden layer of latent features. However, it is possible to add further hidden layers and an output layer. When this is done by stacking RBMs, the resulting model is referred to as a deep belief network. In this way, deep belief networks resemble other deep learning architectures. In a deep belief network, each hidden layer is fully connected to the layer before it, meaning that every hidden unit learns from the entire output of the previous layer.

The first layer is a typical RBM, where latent features are calculated from the input units. The next hidden layer then learns latent features from the previous hidden layer's output. This, in turn, can feed an output layer for classification tasks.

Implementing a deep belief network uses a syntax similar to the one we used to train the RBM. Before we begin, let's perform a quick check of the latent feature space from the RBM we just trained. To compute the latent features and print a sample of them, we use the following code:

train_latent_features <- rbm.up(rbm1, as.matrix(train_x))
test_latent_features <- rbm.up(rbm1, as.matrix(test_x))
head(train_latent_features[, 1:6])   # print a sample of the hidden-unit activations

In the preceding code, we use the rbm.up function to generate a matrix of latent features from the model that we just fit. The rbm.up function takes an RBM model and a matrix of visible units as input and outputs a matrix of hidden-unit activations. The reverse is also possible: the rbm.down function takes a matrix of hidden units as input and outputs the corresponding visible units. Running the preceding code prints a sample of the hidden-unit activations to the console.
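As a quick illustration of the reverse mapping, a minimal sketch, reusing the rbm1 model and the latent feature matrix computed above, might look like this:

# Map the hidden-unit activations back to the visible space with rbm.down;
# the result approximates (reconstructs) the original input features.
train_reconstruction <- rbm.down(rbm1, train_latent_features)
dim(train_reconstruction)   # same dimensions as as.matrix(train_x)

Comparing a few rows of this reconstruction against the original input is a simple sanity check that the RBM has captured useful structure.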

We can see variance in the feature space of this first layer. To prepare for the next step, think of this matrix as the input to another RBM that learns further features (a hand-rolled version of this idea is sketched after the training call below). In this way, we can code our deep belief network using syntax almost identical to the syntax used for training the RBM. The exception is the hidden argument: rather than a single value representing the number of units in one hidden layer, we now pass a vector of values representing the number of units in each successive hidden layer. For our deep belief network, we will start with 100 units, just like in our RBM.

We will then reduce this to 50 units in the next layer and 10 units in the layer after that. The other difference is that we now have a target variable. While an RBM is an unsupervised, generative model, we can use our deep belief network to perform a classification task. We train our deep belief network using the following code:

dbn <- dbn.dnn.train(x = as.matrix(train_x), y = train_y, hidden = c(100, 50, 10), cd = 1, numepochs = 5)
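For intuition about what the stacking inside dbn.dnn.train is doing, here is a minimal hand-rolled sketch, assuming the rbm.train settings used earlier in the chapter; it illustrates the layer-by-layer idea rather than the package's internal implementation:

# Train a second RBM on the latent features produced by the first RBM,
# then project those features up again to obtain second-layer features.
rbm2 <- rbm.train(train_latent_features, hidden = 50, numepochs = 5, cd = 1)
second_layer_features <- rbm.up(rbm2, train_latent_features)

Roughly speaking, dbn.dnn.train repeats this layer-wise step for each entry of the hidden vector and then trains the supervised network on top.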

With the deep belief network trained, we can now make predictions with our model. We perform this prediction task much as we would for most machine learning models. In this case, we pass our trained network to the nn.predict function to predict whether each new test message should be classified as spam or a legitimate text. We make predictions on the test data using the following code:

predictions <- nn.predict(dbn, as.matrix(test_x))

We now have probability values that indicate how likely each message is to be spam. The probabilities fall within a fairly narrow range; however, we can still use them. Let's choose a cut point in the probabilities and assign 1 to values above the threshold, signifying that the message is predicted to be spam, while everything at or below the cut point receives a value of 0. After drawing this dividing line and creating a vector of binary values, we can build a confusion matrix to see how well our model performed. We create our binary predictions and evaluate the model by running the following code:

pred_class <- if_else(predictions > 0.3, 1, 0)   # if_else() comes from the dplyr package
table(test_y, pred_class)

After running the preceding code, we will see a confusion matrix of actual versus predicted classes printed to the console.
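If a single summary number is useful, a minimal sketch using the same pred_class and test_y objects computes overall accuracy, assuming test_y is coded as 0/1 in the same way as train_y:

# Overall accuracy: the proportion of test messages whose predicted class
# matches the actual label (assumes test_y is a 0/1 vector).
mean(pred_class == test_y)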

As we can see, even this very simple implementation of a deep belief network performs fairly well. From here, additional modifications can be made to the number of hidden layers, the number of units in those layers, the output activation function, the learning rate, momentum, and dropout, along with the number of contrastive divergence steps and the number of training epochs, or rounds.
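As a sketch of what such tuning might look like, the call below spells out several of these arguments explicitly, assuming the argument names documented for dbn.dnn.train in the deepnet package; the specific values shown are illustrative rather than recommended settings:

# A more fully specified training call with illustrative (not tuned) values.
dbn_tuned <- dbn.dnn.train(
  x = as.matrix(train_x),
  y = train_y,
  hidden = c(100, 50, 10),   # three hidden layers
  activationfun = "sigm",    # hidden-layer activation
  output = "sigm",           # output activation
  learningrate = 0.5,
  momentum = 0.5,
  hidden_dropout = 0.2,      # dropout on hidden units
  visible_dropout = 0,       # dropout on the input layer
  cd = 1,                    # contrastive divergence steps
  numepochs = 10
)

The resulting model can be evaluated with nn.predict and a confusion matrix exactly as before, which makes it straightforward to compare tuning choices side by side.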
