In Machine Learning (ML), the so-called curse of dimensionality refers to a progressive decline in performance as the dimensionality of the input space grows, often to hundreds or thousands of dimensions, a problem that does not occur in low-dimensional settings such as three-dimensional space. It arises because the number of samples needed to obtain a sufficient sampling of the input space increases exponentially with the number of dimensions. To overcome this problem, several optimizing networks have been developed.
The first is the autoencoder network. Autoencoders are designed and trained to transform an input pattern into itself, so that in the presence of a degraded or incomplete version of an input pattern, it is possible to recover the original pattern. An autoencoder is a Neural Network (NN) trained to produce output data that matches the data presented at its input, while the hidden layer stores a compressed representation of that data.
The second is the Boltzmann Machine (see Chapter 3, Feed-Forward Neural Networks with TensorFlow, for more details). This type of network consists of a visible input/output layer and one hidden layer. The connections between the visible layer and the hidden one are non-directional, so data can travel in both directions, visible-hidden and hidden-visible, and the neuronal units can be fully or partially connected.
Autoencoders can be compared with Principal Component Analysis (PCA) (refer to https://en.wikipedia.org/wiki/Principal_component_analysis), which is used to represent a given input using fewer dimensions than originally present. However, in this chapter, we'll focus only on autoencoders.
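To make the PCA comparison concrete, here is a minimal sketch, assuming randomly generated data and using only NumPy's SVD: like an autoencoder, PCA maps each input to a lower-dimensional code and reconstructs an approximation of the input from that code. The dimensions (500 samples, 100 features, 10 components) are arbitrary illustrative choices.

```python
import numpy as np

# Hypothetical data: 500 samples with 100 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))

X_centered = X - X.mean(axis=0)
# SVD of the centered data; rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 10
codes = X_centered @ Vt[:k].T            # "encode": 100 -> 10 dimensions
X_rec = codes @ Vt[:k] + X.mean(axis=0)  # "decode": 10 -> 100 dimensions

print(codes.shape, X_rec.shape)          # (500, 10) (500, 100)
```

The encode/decode pair here is linear; an autoencoder with nonlinear activations can, in principle, learn richer compressions than this.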
In a nutshell, the following topics will be covered in this chapter:
Autoencoding is a data compression technique where the compression and decompression functions are data-specific, lossy, and learned automatically from samples rather than hand-crafted by humans. Additionally, in almost all contexts where the term autoencoder is used, the compression and decompression functions are implemented with NNs.
An autoencoder is a network with three or more layers, where the input and the output layers have the same number of neurons and the intermediate (hidden) layers have a lower number of neurons. The network is trained to reproduce in the output, for each input pattern, the same pattern of activity as in the input.
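This architecture can be sketched in a few lines of NumPy. The weights below are random placeholders (not trained), purely to show the shape of the network: input and output layers of the same size, with a narrower hidden layer in between; the layer sizes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_input = 8    # neurons in the input (and output) layer
n_hidden = 3   # fewer neurons in the hidden layer

rng = np.random.default_rng(42)
W1 = rng.normal(scale=0.1, size=(n_input, n_hidden))  # encoder weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_input))  # decoder weights
b2 = np.zeros(n_input)

x = rng.random(n_input)        # one input pattern
h = sigmoid(x @ W1 + b1)       # hidden (compressed) representation
x_hat = sigmoid(h @ W2 + b2)   # reconstruction, same size as the input

print(h.shape, x_hat.shape)    # (3,) (8,)
```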
The remarkable aspect of autoencoders is that, due to the lower number of neurons in the hidden layer, if the network can learn from examples and generalize to an acceptable extent, it performs data compression: the state of the hidden neurons provides, for each example, a compressed version of the common input/output representation.
In the first examples of such networks, in the mid-1980s, compression of simple images was obtained in this way. Interest in autoencoders has recently been revived by authors who developed an effective strategy for improving the learning process in this type of network (which is usually very slow and not always effective): a pre-learning procedure that provides a good initial configuration of the weights for the subsequent learning procedure.
Useful applications of autoencoders are data denoising and dimensionality reduction for data visualization. The following diagram shows how an autoencoder typically works—it reconstructs the received input through two phases: an encoding phase, which corresponds to a dimensional reduction for the original input, and a decoding phase, capable of reconstructing the original input from the encoded (compressed) representation:
As mentioned earlier, an autoencoder is an NN, as well as an unsupervised learning (feature learning) algorithm. Less technically, it tries to learn an approximation of the identity function. However, we can impose constraints on the network, such as fewer units in the hidden layer. In this way, an autoencoder can recover the original input from compressed, noisy, or corrupted data. The following diagram shows an autoencoder that consists of the narrow hidden layer between an encoder and a decoder:
In the preceding diagram, the hidden layer or the intermediate layer is also called the latent space representation of the input data. Now, suppose we have a set of unlabeled training examples {x^(1), x^(2), x^(3), ...}, where each x^(i) is a vector and x^(1) refers to the first training example.
An autoencoder NN is essentially an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs; that is, it uses y^(i) = x^(i).
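A minimal sketch of this training setup, assuming a tiny linear autoencoder and plain gradient descent in NumPy: the "labels" are simply the inputs themselves, y^(i) = x^(i), and backpropagation minimizes the reconstruction error. The data and layer sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))            # 200 unlabeled examples

W1 = rng.normal(scale=0.1, size=(6, 2))  # encoder: 6 -> 2
W2 = rng.normal(scale=0.1, size=(2, 6))  # decoder: 2 -> 6
lr = 0.1

def reconstruction_loss(X, W1, W2):
    # The target is the input itself: y = x.
    return np.mean((X @ W1 @ W2 - X) ** 2)

loss_before = reconstruction_loss(X, W1, W2)
for _ in range(300):
    H = X @ W1                           # hidden code
    G = 2 * (H @ W2 - X) / X.size        # gradient w.r.t. the reconstruction
    W1 -= lr * X.T @ (G @ W2.T)          # backpropagate into the encoder
    W2 -= lr * H.T @ G                   # backpropagate into the decoder
loss_after = reconstruction_loss(X, W1, W2)

print(loss_after < loss_before)          # True
```

The activations are kept linear here only to keep the gradient derivation short; a real autoencoder would typically use nonlinearities such as sigmoid or tanh.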
The autoencoder tries to learn a function h_{W,b}(x) ≈ x. In other words, it is trying to learn an approximation to the identity function, so as to produce an output x̂ that is similar to x. The identity function seems a particularly trivial function to try to learn, but by placing constraints on the network, such as limiting the number of hidden units, we can discover interesting features of the data:
As a concrete example, suppose the inputs x are the pixel intensity values of a 10 × 10 image (100 pixels), so n = 100, and there are s2 = 50 hidden units in layer L2. Since there are only 50 hidden units, the network is forced to learn a compressed representation of the input: given only the vector of hidden unit activations a^(2), it must try to reconstruct the 100-pixel input, that is, x1, x2, …, x100, from the 50 hidden units. The preceding diagram shows only 6 inputs feeding into layer 1 and exactly 6 units feeding out from layer 3.
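The dimensions in this example can be traced through a forward pass. As before, the weights are random placeholders rather than trained values; the point is only that 100 pixels are squeezed into 50 activations and back.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
image = rng.random((10, 10))      # a hypothetical 10 x 10 grayscale image
x = image.reshape(100)            # x1, ..., x100, so n = 100

W1 = rng.normal(scale=0.1, size=(100, 50))  # encoder weights
W2 = rng.normal(scale=0.1, size=(50, 100))  # decoder weights

a2 = sigmoid(x @ W1)              # the 50 hidden unit activations a^(2)
x_hat = sigmoid(a2 @ W2)          # reconstruct all 100 pixels from 50 units

print(a2.shape, x_hat.shape)      # (50,) (100,)
```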
A neuron is considered active (or firing) if its output value is close to 1, and inactive if its output value is close to 0. However, for simplicity, we assume that the neurons are inactive most of the time. This argument holds as long as we are using the sigmoid activation function. If you are using the tanh function as the activation function instead, then a neuron is inactive when it outputs a value close to -1.