Deep architectures

There is a great variety of deep neural architectures, with both feedforward and feedback signal flows, although they are typically feedforward. The main architectures include, but are not limited to, the following:

Convolutional neural network

In this architecture, the layers may have a multidimensional organization. Inspired by the visual cortex of animals, the layers are typically three-dimensional. In convolutional neural networks (CNNs), part of the signals of a preceding layer is fed into part of the neurons in the following layer. This architecture is feedforward and is well suited for image and sound recognition. The main feature that distinguishes this architecture from the multilayer perceptron is the partial connectivity between layers. Since not all neurons are relevant to a given neuron in the next layer, the connectivity is local and respects the spatial correlation between neurons. This prevents both long training times and overfitting, given that a fully connected MLP blows up the number of weights as the dimensions of the images grow, for example. In addition, the neurons in each layer are arranged in dimensions, typically three, being stacked in an array along width, height, and depth.
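
To give a sense of the scale involved, the following minimal sketch compares the weight counts of a fully connected layer and a convolutional one; the image and layer sizes are illustrative values chosen for this example, not figures from this book:

public class WeightCountExample {
  public static void main(String[] args) {
    // a 224x224 RGB image flattened into a single input vector
    int inputSize = 224 * 224 * 3;
    int hiddenNeurons = 1000;
    // fully connected: every input connected to every hidden neuron
    long fullyConnectedWeights = (long) inputSize * hiddenNeurons;

    // convolutional: 64 filters of size 5x5x3, shared across
    // all positions of the image
    long convolutionalWeights = (long) 5 * 5 * 3 * 64;

    System.out.println("Fully connected: " + fullyConnectedWeights); // 150,528,000
    System.out.println("Convolutional:   " + convolutionalWeights);  // 4,800
  }
}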

Long short-term memory: This is a recurrent type of neural network that always takes into account the last value of the hidden layer, in a fashion similar to a hidden Markov model (HMM). A long short-term memory (LSTM) network has LSTM units instead of traditional neurons, and these units implement operations such as storing and forgetting a value in order to control the signal flow in a deep network. This architecture is well suited to natural language processing, due to its capacity to retain information for a long time while receiving completely unstructured data such as audio or text. One way to train this type of network is the backpropagation through time (BPTT) algorithm, but there are also alternatives such as reinforcement learning or evolution strategies.
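
As a rough illustration of the store-and-forget operations mentioned above, here is a minimal scalar sketch of the standard LSTM update equations; real implementations use weight matrices and vectors, and all names here are illustrative:

public class LstmUnitSketch {
  // scalar weights for brevity; real LSTM units use matrices
  double wf, uf, bf, wi, ui, bi, wo, uo, bo, wc, uc, bc;
  double cellState, hiddenValue;

  static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
  }

  public void step(double input) {
    // forget gate: how much of the stored value to keep
    double forget = sigmoid(wf * input + uf * hiddenValue + bf);
    // input (store) gate: how much of the new candidate value to keep
    double store = sigmoid(wi * input + ui * hiddenValue + bi);
    // output gate: how much of the cell state to expose
    double output = sigmoid(wo * input + uo * hiddenValue + bo);
    double candidate = Math.tanh(wc * input + uc * hiddenValue + bc);

    cellState = forget * cellState + store * candidate;
    hiddenValue = output * Math.tanh(cellState); // fed back at the next time step
  }
}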

Deep belief network: Deep belief networks (DBNs) are probabilistic models whose layers are classified into visible and hidden. This is also a recurrent type of neural network, based on the restricted Boltzmann machine (RBM). A DBN is typically used as the first step in the training of a deep neural network (DNN), which is then further trained by other supervised algorithms such as backpropagation. In this architecture, each layer acts as a feature detector, abstracting new representations of the data. The visible layer acts both as an output and as an input, and the deepest hidden layer represents the highest level of abstraction. The applications of this architecture are typically the same as those of convolutional neural networks.
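
To give a taste of the RBM building block, the following is a minimal, illustrative sketch (not this book's code) of how a layer of stochastic binary hidden units is sampled from the visible layer, each hidden unit acting as a feature detector:

import java.util.Random;

public class RbmSketch {
  double[][] weights;   // connections between visible and hidden units
  double[] hiddenBias;
  Random random = new Random();

  RbmSketch(int visibleUnits, int hiddenUnits) {
    weights = new double[visibleUnits][hiddenUnits];
    hiddenBias = new double[hiddenUnits];
  }

  static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
  }

  // each hidden unit fires with a probability given by its activation
  public double[] sampleHidden(double[] visible) {
    double[] hidden = new double[hiddenBias.length];
    for (int j = 0; j < hidden.length; j++) {
      double activation = hiddenBias[j];
      for (int i = 0; i < visible.length; i++) {
        activation += visible[i] * weights[i][j];
      }
      hidden[j] = random.nextDouble() < sigmoid(activation) ? 1.0 : 0.0;
    }
    return hidden;
  }
}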

How to implement deep learning in Java

Because this book is introductory, we are not diving into further details of deep learning in this chapter. However, some code recommendations for a deep architecture are provided. Here is an example of how a convolutional neural network could be implemented: one needs to implement a class called ConvolutionalLayer to represent a multidimensional layer, and a CNN class for the convolutional neural network itself:

import java.util.ArrayList;
import java.util.Map;

public class ConvolutionalLayer extends NeuralLayer{
  int height, width, depth;
//…
  ArrayList<ArrayList<ArrayList<Neuron>>> neurons;
  Map<Neuron,Neuron> connections;
  ConvolutionalLayer previousLayer;

  //the calc method should take into account the mapping
  // between neurons from different layers
  @Override
  public void calc(){
    ArrayList<ArrayList<ArrayList<Double>>> inputs;
    // the neurons are arranged in three dimensions, so we iterate
    // over width, height, and depth
    for(ArrayList<ArrayList<Neuron>> plane : neurons){
      for(ArrayList<Neuron> row : plane){
        for(Neuron n : row){
          for(Neuron m : connections.keySet()){
            // here we get only the inputs that are connected to the neuron
          }
        }
      }
    }
  }

}

public class CNN extends NeuralNet{
  int depth;
  ArrayList<ConvolutionalLayer> layers;
//…
  @Override
  public void calc(){
    //here we perform the calculation for each layer,
    //taking into account the connections between layers
  }
}

In these classes, the neurons are organized in dimensions, and pruning methods are used to make the connections between the layers. Please see the files ConvolutionalLayer.java and CNN.java for further details.

Since the other architectures are recurrent, and this book does not cover recurrent neural networks (for simplicity, given its introductory scope), they are presented only for the reader's information. We suggest that the reader take a look at the references provided to find out more about these architectures.

Hybrid systems

In machine learning, and in the artificial intelligence field in general, there are many algorithms and techniques besides neural networks. Each technique has its strengths and drawbacks, and this inspires many researchers to combine them into a single structure. Neural networks belong to the connectionist approach to artificial intelligence, in which operations are performed on numerical and continuous values; other approaches include the cognitive (rule-based systems) and the evolutionary ones. The three approaches can be compared as follows:

Connectionist               Cognitive                       Evolutionary
-------------               ---------                       ------------
Numerical processing        Symbolic processing             Numerical and symbolic processing
Large network structures    Large rule bases and premises   Large quantity of solutions
Performance by statistics   Design by experts/statistics    Better solutions are produced every iteration
Highly sensitive to data    Highly sensitive to theory      Local minima proof

The main representative of connectionism is the neural network, which has many different architectures for a variety of purposes. Some neural networks, such as multilayer perceptrons (MLPs), are good at mapping nonlinear input-output behaviors, while others, such as self-organizing maps (SOMs), are good at finding patterns in the data. Some architectures, such as radial basis function (RBF) networks, combine multiple features in different steps of training and processing.

One motivation for using hybrid neural systems is shared with one of the foundations of deep learning: feature extraction. Tasks such as image recognition become very tough to deal with when the resolution is very high; however, if the data can be compacted or reduced, the processing becomes much simpler.

Combining multiple approaches to artificial intelligence is also interesting, although it makes the design more complex. In this context, let's review two strategies: neuro-fuzzy and neuro-genetic.

Tip

Considering that the concepts addressed in this chapter are advanced, we are not providing full code implementations; instead, we provide only a basic structural snippet on how to start implementing these concepts.

Neuro-fuzzy

Fuzzy logic is a type of rule-based processing, where every variable is converted to a symbolic value according to a membership function, and then the combination of all variables is queried against an IF-THEN rule database.

A membership function usually has a Gaussian bell shape, which tells us to what degree a given value is a member of the corresponding class. Let's take, for example, temperature, which may fall into three different classes (cold, normal, and warm). The membership value is higher the closer the temperature is to the center of a class's bell shape.
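
The following is a minimal sketch of such Gaussian membership functions in Java; the class centers and widths are illustrative values chosen for this example, not values from the book:

public class TemperatureMembership {
  // degree of membership of x in a class centered at 'center'
  static double gaussian(double x, double center, double sigma) {
    double d = (x - center) / sigma;
    return Math.exp(-0.5 * d * d); // 1.0 at the center, decaying with distance
  }

  public static void main(String[] args) {
    double temperature = 22.0; // degrees Celsius
    System.out.println("cold:   " + gaussian(temperature, 10.0, 5.0));
    System.out.println("normal: " + gaussian(temperature, 20.0, 5.0));
    System.out.println("warm:   " + gaussian(temperature, 30.0, 5.0));
  }
}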

Furthermore, the fuzzy processing finds which rules are fired by each input record and which output values are produced. A neuro-fuzzy architecture treats each input separately, so the first hidden layer has one set of neurons for each input, one neuron per membership function:

[Figure: neuro-fuzzy network architecture]

Tip

In this architecture, the training finds optimal weights only for the rule processing and for the weighted sum of the consequent parameters; the first hidden layer has no adjustable weights.

In the fuzzy logic architecture, experts define a rule database that may become huge as the number of variables increases. The neuro-fuzzy architecture releases the designer from defining the rules, letting this task be performed by the neural network. The training of a neuro-fuzzy network can be performed by gradient-type algorithms such as backpropagation, or by matrix algebra such as least squares, both in supervised mode. Neuro-fuzzy systems are suitable for the control of dynamic systems and for diagnostics.
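
As a rough sketch of the rule-processing stage, the snippet below assumes a Sugeno-type scheme, which is common in neuro-fuzzy systems (an assumption on our part, not this book's implementation): rule firing strengths computed with a product t-norm, followed by a weighted average of the rule consequents:

public class FuzzyRuleSketch {
  // firing strength of a rule combining two antecedent memberships
  static double fire(double membershipA, double membershipB) {
    return membershipA * membershipB; // product t-norm
  }

  // output: average of the rule consequents weighted by firing strength
  static double output(double[] strengths, double[] consequents) {
    double num = 0.0, den = 0.0;
    for (int k = 0; k < strengths.length; k++) {
      num += strengths[k] * consequents[k];
      den += strengths[k];
    }
    return num / den;
  }
}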

Neuro-genetic

In the evolutionary approach to artificial intelligence, one common strategy is the genetic algorithm. The name is inspired by natural evolution, in which the beings better adapted to the environment are the ones that produce new generations of better-adapted beings. In the computational intelligence field, the beings, or individuals, are candidate solutions or hypotheses that can solve an optimization problem. Supervised neural networks perform optimization, since there is an error measure that we want to minimize by adjusting the neural weights. While the training algorithms are able to find better weights by gradient methods, they often fall into local minima. Although mechanisms such as regularization and momentum may improve the results, once the weights fall into a local minimum, it is very unlikely that a better set of weights will be found; in this context, genetic algorithms are very good at escaping local minima.

Think of the neural weights as a genetic code (or DNA). If we generate a finite number of randomly generated weight sets and evaluate which of them produce the best results (the smallest errors, or some other performance measure), we can select the top N best weight sets and then apply genetic operations to them, such as reproduction (interchange of weights) and mutation (random change of weights).
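
A minimal sketch of one such generation is shown below; the method names, the selection of the top half, and the mutation settings are illustrative assumptions, not this book's code:

import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.function.ToDoubleFunction;

public class GeneticWeightSearch {
  static Random random = new Random();

  // produces the next generation of weight sets from the current population
  static double[][] nextGeneration(double[][] population,
                                   ToDoubleFunction<double[]> error) {
    // smaller error = better adapted individual
    Arrays.sort(population, Comparator.comparingDouble(error::applyAsDouble));
    int n = population.length;
    double[][] next = new double[n][];
    for (int k = 0; k < n; k++) {
      // reproduction: interchange weights between two of the top half
      double[] a = population[random.nextInt(n / 2)];
      double[] b = population[random.nextInt(n / 2)];
      double[] child = new double[a.length];
      for (int i = 0; i < child.length; i++) {
        child[i] = random.nextBoolean() ? a[i] : b[i];
        if (random.nextDouble() < 0.01) {
          child[i] += random.nextGaussian() * 0.1; // mutation: random change
        }
      }
      next[k] = child;
    }
    return next;
  }
}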

This process is repeated until some acceptable solution is found.

Another strategy is to apply genetic operations to the neural network parameters, such as the number of neurons, the learning rate, the activation functions, and so on. Considering that there is always a need to adjust parameters, or to train multiple times to make sure a good solution has been found, one may code all the parameters into a genetic code (a parameter set) and generate one neural network for each parameter set.
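
A minimal sketch of such a genetic code is shown below; the field names are illustrative assumptions:

public class NetworkGenome {
  int numberOfHiddenNeurons;
  double learningRate;
  int activationFunctionId; // e.g., 0 = sigmoid, 1 = tanh

  // a neural network is built and trained from each genome; its error on a
  // validation set serves as the fitness measure for selection
}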

The scheme of a genetic algorithm is shown in the following figure:

[Figure: scheme of a genetic algorithm]

Tip

Genetic algorithms are broadly used for many optimization problems, but in this book we stick to these two classes of problems: weight optimization and parameter optimization.
