Summary

In this chapter, we have seen two of the most powerful techniques at the core of many practical deep learning implementations: autoencoders and restricted Boltzmann machines (RBMs).

For both of them, we started with a shallow example with a single hidden layer, and we explored how we can stack such layers together to form a deep neural network able to automatically learn high-level, hierarchical features without requiring explicit human knowledge.

They both serve similar purposes, but there is one substantial difference between them.

Autoencoders can be seen as a compression filter: they compress the data so as to preserve only its most informative part and can deterministically reconstruct an approximation of the original data. Autoencoders are an elegant solution for dimensionality reduction and non-linear compression that bypasses the limitations of the principal component analysis (PCA) technique. One advantage of autoencoders is that they can be used as a preprocessing step for further classification tasks, where the output of each hidden layer is one of the possible levels of informative representation of the data, or a denoised and recovered version of it. Another great advantage is that the reconstruction error can be exploited as a measure of how dissimilar a single point is from the rest of the group. Such a technique is widely used for anomaly detection problems, where the relationships between what we observe and the internal representations are constant and deterministic. If the relationships vary over time, or depend on an observable dimension, we can group the data and train different networks in order to be adaptive; but once trained, the network assumes that those relationships are not affected by random variations.
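As a rough illustration of the anomaly detection use just described, the following is a minimal sketch, assuming Keras is available; the placeholder data, layer sizes, training settings, and percentile threshold are illustrative assumptions rather than values taken from this chapter:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Placeholder data: replace with real, normalized feature vectors
X = np.random.rand(1000, 20).astype('float32')
input_dim = X.shape[1]

# A small stacked autoencoder: two encoding layers, two decoding layers
autoencoder = Sequential([
    Dense(16, activation='relu', input_shape=(input_dim,)),
    Dense(8, activation='relu'),             # compressed representation
    Dense(16, activation='relu'),
    Dense(input_dim, activation='sigmoid')   # reconstruction of the input
])
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(X, X, epochs=20, batch_size=64, verbose=0)

# Reconstruction error per point: the larger it is, the more the point
# deviates from what the network has learned to consider "normal"
errors = np.mean((X - autoencoder.predict(X)) ** 2, axis=1)
anomalies = np.where(errors > np.percentile(errors, 99))[0]  # illustrative threshold
```

The same trained encoder layers could also be reused as a preprocessing step for a downstream classifier, as discussed above.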

On the other hand, an RBM uses a stochastic approach to sample and adjust weights in order to minimize the reconstruction error. The intuition is that there exist some visible random variables and some hidden latent attributes, and the goal is to find how the two sets are connected to each other. To give an example, in the case of movie ratings, we can have some hidden attributes, such as film genre, and some random observations, such as the ratings and/or reviews. In such a topology, we can also see the bias terms as a way of adjusting for the different inherent popularities of each movie. If we asked our users which movies they like from a set made up of Harry Potter, Avatar, The Lord of the Rings, Gladiator, and Titanic, we might get a resulting network where two of the latent units represent science fiction movies and Oscar-winning movies:

Example of a possible RBM, where only the links with a weight significantly different from 0 are drawn.

Although attributes such as science fiction and Oscar-winning are deterministic (effectively, they are attributes of the movie itself), the users' ratings are influenced by them in a probabilistic way. The learned weights are the parameters that characterize the probability distribution of a movie's rating (for example, Harry Potter with five stars), given that the user likes a particular genre (for example, science fiction).
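To make the movie example more concrete, here is a minimal NumPy sketch of training a binary RBM with one step of contrastive divergence (CD-1); the made-up user data, the choice of two hidden units, the learning rate, and the number of epochs are illustrative assumptions, not values from the chapter:

```python
import numpy as np

rng = np.random.RandomState(0)
n_visible, n_hidden = 5, 2            # 5 movies, 2 latent attributes
W = 0.1 * rng.randn(n_visible, n_hidden)
b_v = np.zeros(n_visible)             # visible biases (per-movie popularity)
b_h = np.zeros(n_hidden)              # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Each row: which of the 5 movies a (made-up) user likes
V = np.array([[1, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0]], dtype=float)

lr = 0.1
for epoch in range(1000):
    # Positive phase: sample hidden units given the observed data
    p_h = sigmoid(V @ W + b_h)
    h = (p_h > rng.rand(*p_h.shape)).astype(float)
    # Negative phase: one step of Gibbs sampling (CD-1)
    p_v = sigmoid(h @ W.T + b_v)
    p_h_recon = sigmoid(p_v @ W + b_h)
    # Move the weights toward the data statistics and away from the reconstruction
    W += lr * (V.T @ p_h - p_v.T @ p_h_recon) / len(V)
    b_v += lr * (V - p_v).mean(axis=0)
    b_h += lr * (p_h - p_h_recon).mean(axis=0)

# After training, the hidden activations give each user's latent "attribute" profile
print(sigmoid(V @ W + b_h))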

In such a scenario, where the relationships are not deterministic, we would prefer an RBM over an autoencoder.

In conclusion, unsupervised feature learning is a very powerful methodology for enriching feature engineering with the minimum of required knowledge and human interaction.

According to a few benchmarks ([Lee, Pham and Ng, 2009] and [Le, Zhou and Ng, 2011]) performed in order to measure the accuracy of different feature learning techniques, unsupervised feature learning was shown to improve accuracy with respect to the state of the art at the time.

There are a few open challenges, though. If you do have some domain knowledge, it is always good not to discard it. We could embed that knowledge in the form of priors during the initialization step, where we might handcraft the network topology and initial state accordingly.

Moreover, since neural networks are already hard to explain and are mostly treated as black boxes, having an understanding of at least the input features could help. In unsupervised feature learning, we want to consume raw data directly; hence, understanding how the model works becomes even harder.

We will not address those issues in this book. We believe that it is too early to draw conclusions, and that further evolutions of deep learning, and of the way people and businesses approach these applications, will converge towards a steady level of trustworthiness.
