So far, we have seen the visible and hidden layers of RBMs, but we have not yet seen how they learn features. Each node in the visible layer takes in a single feature from the dataset to be learned from. This data is then passed from the visible layer to the hidden layer through weights and biases:
The preceding visualization of an RBM shows the movement of a single data point through the graph and through a single hidden node. The visible layer has four nodes, representing the four columns of the original data. Each arrow represents a single feature of the data point moving through one of the four visible nodes in the first layer of the RBM. Each feature value is multiplied by the weight associated with that feature, and the products are added together. This calculation can also be written as a dot product between an input vector of data and a weight vector. The resulting weighted sum is added to a bias variable and sent through an activation function (the sigmoid function is a popular choice). The result is stored in a variable called a.
As an example in Python, this code shows how a single data point (inputs) is multiplied by our weights vector and combined with the bias variable to create the activated variable, a:
import numpy as np
import math
# sigmoidal function
def activation(x):
    return 1 / (1 + math.exp(-x))
inputs = np.array([1, 2, 3, 4])
weights = np.array([0.2, 0.324, 0.1, .001])
bias = 1.5
a = activation(np.dot(inputs.T, weights) + bias)
print(a)
0.9341341524806636
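To make the link between the weighted sum and the dot product concrete, the following minimal sketch adds up the products one feature at a time and checks the total against np.dot. The variable name weighted_sum is introduced here for illustration only; the inputs and weights are the same ones used above:

```python
import numpy as np

inputs = np.array([1, 2, 3, 4])
weights = np.array([0.2, 0.324, 0.1, .001])

# multiply each feature by its weight and add the products one at a time
weighted_sum = 0.0
for feature, weight in zip(inputs, weights):
    weighted_sum += feature * weight

# the dot product performs the same calculation in a single call
assert np.isclose(weighted_sum, np.dot(inputs, weights))
```

Adding the bias of 1.5 to this sum and passing the result through the activation function gives the value of a shown previously.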
In a real RBM, each of the visible nodes is connected to each of the hidden nodes, and it looks something like this:
Because the inputs from each visible node are passed to every hidden node, an RBM can be described as a symmetrical bipartite graph. The symmetrical part comes from the fact that every visible node is connected to every hidden node. Bipartite means the graph has two parts (layers), with connections running only between the two layers and never within one.
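Since every visible node feeds every hidden node, the single weight vector from the earlier example becomes a weight matrix with one column per hidden node. The following is a minimal sketch of that idea, not a full RBM: the choice of three hidden nodes, the random weight initialization, and the names n_visible, n_hidden, hidden_bias, and hidden_activations are all assumptions made for illustration:

```python
import numpy as np

def activation(x):
    # sigmoidal function, applied element-wise to an array
    return 1 / (1 + np.exp(-x))

inputs = np.array([1, 2, 3, 4])  # one data point, one value per visible node

# hypothetical sizes: four visible nodes, three hidden nodes
n_visible, n_hidden = 4, 3

# illustrative random weights; column j holds the weights into hidden node j
rng = np.random.RandomState(0)
weights = rng.normal(scale=0.1, size=(n_visible, n_hidden))
hidden_bias = np.zeros(n_hidden)  # one bias per hidden node

# a single matrix product computes the weighted sum for every hidden node
hidden_activations = activation(np.dot(inputs, weights) + hidden_bias)
print(hidden_activations.shape)
```

Each entry of hidden_activations is computed exactly as a was in the single-node example, using the corresponding column of the weight matrix.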