We use the RBM class created earlier in the recipe Restricted Boltzmann Machine, with one change: we no longer need to reconstruct the image after training. Instead, our stacked RBMs will only forward-pass the data up to the final MLP layer of the DBN. This is achieved by removing the reconstruct() function from the class and replacing it with an rbm_output() function:
def rbm_output(self, X):
    # Forward pass only: return the hidden-unit probabilities for input X
    x = tf.nn.sigmoid(tf.matmul(self._X, self._W) + self._c)
    return self.session.run(x, feed_dict={self._X: X})
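Conceptually, rbm_output() just applies a sigmoid layer per RBM; a stack of trained RBMs forward-passes data by chaining these outputs. The following NumPy sketch illustrates this (rbm_forward, and the random weights standing in for trained self._W and self._c, are illustrative assumptions, not part of the recipe's RBM class):

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, matching tf.nn.sigmoid
    return 1.0 / (1.0 + np.exp(-z))

def rbm_forward(X, weights, biases):
    """Propagate X through a stack of trained RBMs, layer by layer.
    weights[i] and biases[i] play the role of self._W and self._c
    of the i-th RBM in the stack."""
    h = X
    for W, c in zip(weights, biases):
        h = sigmoid(h @ W + c)  # each layer's output feeds the next RBM
    return h

# Toy stack with 16 -> 8 -> 4 units; random weights stand in for trained ones
rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 8)), rng.standard_normal((8, 4))]
biases = [np.zeros(8), np.zeros(4)]
out = rbm_forward(rng.random((5, 16)), weights, biases)
```

The output of the last RBM in the stack is what gets fed into the MLP layer of the DBN.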
For the data, we consider the Kaggle Face Emotion Recognition data, available at https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge. The data description given there is:
train.csv contains two columns, "emotion" and "pixels". The "emotion" column contains a numeric code ranging from 0 to 6, inclusive, for the emotion that is present in the image. The "pixels" column contains a string surrounded in quotes for each image. The contents of this string are space-separated pixel values in row major order. test.csv contains only the "pixels" column and your task is to predict the emotion column.
The training set consists of 28,709 examples. The public test set used for the leaderboard consists of 3,589 examples. The final test set, which was used to determine the winner of the competition, consists of another 3,589 examples.
This dataset was prepared by Pierre-Luc Carrier and Aaron Courville, as part of an ongoing research project. They have graciously provided the workshop organizers with a preliminary version of their dataset to use for this contest.
The complete data is in one .csv file called fer2013.csv. We separate out training, validation, and test data from this:
import numpy as np
import pandas as pd

data = pd.read_csv('data/fer2013.csv')
tr_data = data[data.Usage == "Training"]
test_data = data[data.Usage == "PublicTest"]

# Randomly hold out roughly 20% of the training rows for validation
mask = np.random.rand(len(tr_data)) < 0.8
train_data = tr_data[mask]
val_data = tr_data[~mask]
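The same split logic can be checked on a small stand-in DataFrame (the toy frame below is invented for illustration; the real fer2013.csv has about 35,887 rows with the same three columns):

```python
import numpy as np
import pandas as pd

# Toy stand-in for fer2013.csv with the same columns
toy = pd.DataFrame({
    "emotion": np.zeros(100, dtype=int),
    "pixels": ["0 1 2 3"] * 100,
    "Usage": ["Training"] * 80 + ["PublicTest"] * 20,
})
tr_data = toy[toy.Usage == "Training"]
test_data = toy[toy.Usage == "PublicTest"]

np.random.seed(42)  # make the random 80/20 split reproducible
mask = np.random.rand(len(tr_data)) < 0.8
train_data = tr_data[mask]
val_data = tr_data[~mask]
```

Because the mask is Boolean, every training row lands in exactly one of train_data or val_data.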
We need to preprocess the data, that is, separate the pixels from the emotion labels. For this we write two functions: dense_to_one_hot(), which performs one-hot encoding of the labels, and preprocess_data(), which separates out the individual pixels of each image into an array. With the help of these two functions, we generate the input features and labels of the training, validation, and test datasets:
def dense_to_one_hot(labels_dense, num_classes):
    num_labels = labels_dense.shape[0]
    index_offset = np.arange(num_labels) * num_classes
    labels_one_hot = np.zeros((num_labels, num_classes))
    # Flat-index trick: set row i, column labels_dense[i] to 1
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot
def preprocess_data(dataframe):
    # Each image is a space-separated string of 48x48 = 2304 pixel values
    pixels_values = dataframe.pixels.str.split(" ").tolist()
    pixels_values = pd.DataFrame(pixels_values, dtype=int)
    images = pixels_values.values
    images = images.astype(np.float32)
    images = np.multiply(images, 1.0 / 255.0)  # scale pixels to [0, 1]
    labels_flat = dataframe["emotion"].values.ravel()
    labels_count = np.unique(labels_flat).shape[0]
    labels = dense_to_one_hot(labels_flat, labels_count)
    labels = labels.astype(np.uint8)
    return images, labels
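The flat-index trick inside dense_to_one_hot() is worth a quick standalone check (the function is repeated here so the snippet runs on its own; the sample labels are made up):

```python
import numpy as np

def dense_to_one_hot(labels_dense, num_classes):
    num_labels = labels_dense.shape[0]
    # Offset of the start of each row in the flattened matrix
    index_offset = np.arange(num_labels) * num_classes
    labels_one_hot = np.zeros((num_labels, num_classes))
    # Row i gets a 1 at flat position i * num_classes + labels_dense[i]
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot

# Three sample labels drawn from the seven emotion codes 0..6
labels = np.array([0, 3, 6])
one_hot = dense_to_one_hot(labels, 7)
```

Each row of the result contains a single 1 in the column given by the corresponding label.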
Using the functions defined in the preceding code, we get the data in the format required for training. We build the emotion-detection DBN on principles similar to those used for MNIST in this paper: https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf.