Vanishing and exploding gradients

Training an RNN is hard because of two stability problems. Due to the feedback loop, the gradient can quickly diverge to infinity, or it can rapidly decay to zero. In both cases, as illustrated in the following figure, the network stops learning anything useful. The problem of exploding gradients can be tackled with a relatively simple technique known as gradient clipping. The problem of vanishing gradients is harder to solve, and it requires more complex RNN basic cells, such as Long Short-Term Memory (LSTM) or Gated Recurrent Units (GRUs). Let's first discuss exploding gradients and gradient clipping:

Examples of gradient
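To get an intuition for why the feedback loop causes this, consider the following minimal sketch (an illustration added here, not taken from the figure above): during backpropagation through time, the gradient is multiplied by the recurrent weight once per unrolled time step, so its magnitude grows or shrinks geometrically with the sequence length. The function name and the scalar weight are simplifications for illustration.

```python
# A minimal sketch (illustrative only): backpropagation through time
# multiplies the gradient by the recurrent weight at every unrolled step,
# so its magnitude changes geometrically with the sequence length.
def backprop_gradient_magnitude(recurrent_weight, num_steps, initial_grad=1.0):
    grad = initial_grad
    for _ in range(num_steps):
        grad *= recurrent_weight  # one multiplication per time step
    return grad

print(backprop_gradient_magnitude(1.5, 50))  # ~6.4e8   -> exploding gradient
print(backprop_gradient_magnitude(0.5, 50))  # ~8.9e-16 -> vanishing gradient
```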

Gradient clipping consists of imposing a maximum value on the gradient so that it cannot grow without bound. This approach, illustrated in the following figure, offers a simple and effective solution to the problem of exploding gradients:

An example of gradient clipping
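As a concrete illustration, here is a minimal NumPy sketch of clipping by norm (the helper name and the threshold value are assumptions made for this example): when the gradient norm exceeds a chosen threshold, the gradient is rescaled so that its norm equals the threshold.

```python
import numpy as np

# Minimal sketch of gradient clipping by norm (illustrative helper):
# rescale the gradient whenever its norm exceeds a chosen threshold.
def clip_gradient(gradient, max_norm=5.0):
    norm = np.linalg.norm(gradient)
    if norm > max_norm:
        gradient = gradient * (max_norm / norm)
    return gradient

g = np.array([30.0, 40.0])             # norm = 50, above the threshold
print(clip_gradient(g, max_norm=5.0))  # rescaled to norm 5: [3. 4.]
```

In Keras, a similar effect can typically be obtained by passing the clipnorm or clipvalue argument to an optimizer, rather than clipping the gradients by hand.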

Solving the problem of a vanishing gradient requires a more complex memory model, which allows the network to selectively forget previous states and remember only the ones that really matter. Considering the following figure, the input is written into a memory M with a probability p in [0, 1], which is multiplied to weight the input.

In a similar way, the output is read with a probability p in [0, 1], which is multiplied to weight the output. One more probability is used to decide what to remember and what to forget:

An example of memory cell
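The following is a minimal NumPy sketch of such a gated memory update (all names, weight shapes, and the sigmoid gating are assumptions made for illustration; real LSTM and GRU cells add more structure): write, read, and forget probabilities are computed from the input and used to weight what enters, leaves, and stays in the memory M.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Minimal sketch of a gated memory cell (illustrative names and shapes):
# three probabilities in [0, 1] weight what is written, read, and kept.
def gated_memory_step(x, memory, W_write, W_read, W_forget):
    p_write = sigmoid(W_write @ x)    # probability of writing the input
    p_read = sigmoid(W_read @ x)      # probability of reading the memory
    p_forget = sigmoid(W_forget @ x)  # probability of keeping the old memory
    memory = p_forget * memory + p_write * x  # forget, then write the weighted input
    output = p_read * memory                  # read the weighted memory
    return output, memory

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
memory = np.zeros(4)
W_write, W_read, W_forget = (rng.standard_normal((4, 4)) for _ in range(3))
output, memory = gated_memory_step(x, memory, W_write, W_read, W_forget)
```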