Chapter 5. Optimizing TensorFlow Autoencoders

In Machine Learning (ML), the so-called curse of dimensionality refers to the progressive decline in performance that occurs as the dimensionality of the input space grows, often to hundreds or thousands of dimensions, and which does not arise in low-dimensional settings such as three-dimensional space. This happens because the number of samples needed to sample the input space adequately increases exponentially with the number of dimensions. To overcome this problem, some optimizing networks have been developed.

The first is the autoencoder network. Autoencoders are designed and trained to map an input pattern onto itself, so that even from a degraded or incomplete version of an input pattern it is possible to recover the original one. An autoencoder is a Neural Network (NN) trained to produce at its output the same data that was presented at its input, while the hidden layer stores a compressed representation of that data.

The second type of optimizing network is the Boltzmann Machine (see Chapter 3, Feed-Forward Neural Networks with TensorFlow, for more details). This type of network consists of a visible input/output layer and one hidden layer. The connections between the visible layer and the hidden one are non-directional, so data can travel in both directions, visible-to-hidden and hidden-to-visible, and the neuronal units can be fully or partially connected.

Autoencoders can be compared with Principal Component Analysis (PCA) (refer to https://en.wikipedia.org/wiki/Principal_component_analysis), which is used to represent a given input using fewer dimensions than originally present. However, in this chapter, we'll focus only on autoencoders.

In a nutshell, the following topics will be covered in this chapter:

  • How does an autoencoder work?
  • How to implement an autoencoder
  • Improving autoencoder robustness
  • Building denoising autoencoders
  • Convolutional autoencoders
  • Fraud analytics using autoencoders

How does an autoencoder work?

Autoencoding is a data compression technique where the compression and decompression functions are data-specific, lossy, and learned automatically from samples rather than hand-crafted by humans. Additionally, in almost all contexts where the term autoencoder is used, the compression and decompression functions are implemented with NNs.

An autoencoder is a network with three or more layers, where the input and output layers have the same number of neurons and the intermediate (hidden) layers have a lower number of neurons. The network is trained simply to reproduce at the output, for each piece of input data, the same pattern of activity that appears at the input.

The remarkable aspect of autoencoders is that, due to the lower number of neurons in the hidden layer, if the network can learn from examples and generalize to an acceptable extent, it performs data compression: for each example, the state of the hidden neurons provides a compressed version of the common input and output states.

In the first examples of such networks, in the mid-1980s, compression of simple images was obtained in this way. Interest in autoencoders has recently been revived by authors who developed an effective strategy for improving the learning process in this type of network (which is usually very slow and not always effective): a pre-training procedure that provides a good initial setting of the weights for the subsequent learning procedure.

Useful applications of autoencoders include data denoising and dimensionality reduction for data visualization. The following diagram shows how an autoencoder typically works: it reconstructs the received input through two phases, an encoding phase, which corresponds to a dimensionality reduction of the original input, and a decoding phase, which is capable of reconstructing the original input from the encoded (compressed) representation:


Figure 1: Encoder and decoder phases in an autoencoder
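
The two phases in Figure 1 map directly onto two layers of a network. The following is a minimal sketch using the tf.keras API; the 784-to-32 layer sizes are illustrative assumptions, not values taken from this chapter:

    import tensorflow as tf

    # Illustrative sizes only: a flattened 28 x 28 image (784 values)
    # compressed to a 32-dimensional code.
    input_dim, code_dim = 784, 32

    inputs = tf.keras.Input(shape=(input_dim,))
    # Encoding phase: dimensionality reduction of the original input
    code = tf.keras.layers.Dense(code_dim, activation='sigmoid', name='encoder')(inputs)
    # Decoding phase: reconstruction of the input from the compressed code
    outputs = tf.keras.layers.Dense(input_dim, activation='sigmoid', name='decoder')(code)

    autoencoder = tf.keras.Model(inputs, outputs)
    # A separate model that exposes only the encoding phase (the compressed code)
    encoder = tf.keras.Model(inputs, code)

Here the encoder model returns the compressed representation on its own, which is exactly what the decoding phase later reconstructs the input from.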

As mentioned earlier, an autoencoder is an NN, as well as an unsupervised learning (feature learning) algorithm. Less technically, it tries to learn an approximation of the identity function. However, we can impose constraints on the network, such as fewer units in the hidden layer. In this way, an autoencoder can reconstruct the original input from compressed, noisy, or corrupted data. The following diagram shows an autoencoder that consists of a narrow hidden layer between an encoder and a decoder:


Figure 2: An unsupervised autoencoder as a network for latent feature learning

In the preceding diagram, the hidden layer, or intermediate layer, is also called the latent space representation of the input data. Now, suppose we have a set of unlabeled training examples $\{x^{(1)}, x^{(2)}, x^{(3)}, \dots\}$, where $x^{(i)} \in \mathbb{R}^n$; each $x^{(i)}$ is a vector, and $x^{(1)}$ refers to the first training example.

An autoencoder NN is essentially an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs; that is, it uses $y^{(i)} = x^{(i)}$.

The autoencoder tries to learn a function $h_{W,b}(x) \approx x$. In other words, it is trying to learn an approximation to the identity function in order to produce an output $\hat{x}$ that is similar to $x$. The identity function seems a particularly trivial function to be trying to learn, but by placing constraints on the network, such as limiting the number of hidden units, we can discover interesting features of the data:


Figure 3: Learning an approximation of the identity function autoencoder
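
In practice, "output similar to the input" is made precise by minimizing a reconstruction error over the training set. A common choice is the squared-error cost shown below; the exact form is an assumption here, since this chapter has not yet fixed a specific loss function:

    $$ J(W, b) = \frac{1}{m} \sum_{i=1}^{m} \left\| h_{W,b}\left(x^{(i)}\right) - x^{(i)} \right\|^2 $$

Training then amounts to adjusting $W$ and $b$ by backpropagation so that the reconstruction $h_{W,b}(x^{(i)})$ stays as close as possible to each input $x^{(i)}$.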

As a concrete example, suppose the inputs x are the pixel intensity values of a 10 × 10 image (100 pixels), so $n = 100$, and there are $s_2 = 50$ hidden units in layer $L_2$, so that $y \in \mathbb{R}^{100}$. Since there are only 50 hidden units, the network is forced to learn a compressed representation of the input. Given only the vector of hidden unit activations $a^{(2)} \in \mathbb{R}^{50}$, it must try to reconstruct the 100-pixel input, that is, $x_1, x_2, \dots, x_{100}$, from the 50 hidden units. The preceding diagram shows only 6 inputs feeding into layer 1 and exactly 6 units feeding out from layer 3.
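
A sketch of that 100-to-50-to-100 setup, again written with the tf.keras API as an illustrative assumption (the random images below only stand in for real 10 × 10 pixel data):

    import numpy as np
    import tensorflow as tf

    # The concrete example above: n = 100 input pixels, s2 = 50 hidden units
    x = tf.keras.Input(shape=(100,))
    a2 = tf.keras.layers.Dense(50, activation='sigmoid')(x)       # hidden activations a(2)
    x_hat = tf.keras.layers.Dense(100, activation='sigmoid')(a2)  # reconstructed 100 pixels

    autoencoder = tf.keras.Model(x, x_hat)
    autoencoder.compile(optimizer='adam', loss='mse')

    # Illustrative stand-in data; the targets are the inputs themselves
    images = np.random.rand(1000, 100).astype('float32')
    autoencoder.fit(images, images, epochs=5, batch_size=32, verbose=0)

Note that the fit call passes the same array as both input and target, which is exactly the $y^{(i)} = x^{(i)}$ condition described earlier.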

A neuron is considered active (or firing) if its output value is close to 1, and inactive if its output value is close to 0. However, for simplicity, we assume that the neurons are inactive most of the time. This holds as long as we are talking about the sigmoid activation function. If you are using tanh as the activation function instead, a neuron is considered inactive when it outputs values close to -1.
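
The following short snippet (an illustrative check, not code from this chapter) makes that difference visible by evaluating both activation functions on the same pre-activation values:

    import tensorflow as tf

    z = tf.constant([-10.0, 0.0, 10.0])
    print(tf.sigmoid(z).numpy())  # ~[0.0, 0.5, 1.0]: an inactive sigmoid neuron outputs ~0
    print(tf.tanh(z).numpy())     # ~[-1.0, 0.0, 1.0]: an inactive tanh neuron outputs ~-1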
