LSTM

For an LSTM, we compute a number of things:

  • An input gate, i
  • A forget gate, f
  • An output gate, o
  • A candidate internal state, g
  • An internal memory of the unit, c
  • A final internal state, s

These are defined by the following recursive equations (written here in a common formulation, since the original equations were not legible in this extraction):

  i = σ(x_t U^i + s_{t-1} W^i)
  f = σ(x_t U^f + s_{t-1} W^f)
  o = σ(x_t U^o + s_{t-1} W^o)
  g = tanh(x_t U^g + s_{t-1} W^g)
  c_t = c_{t-1} ∘ f + g ∘ i
  s_t = tanh(c_t) ∘ o

Where:

  • x_t is the input at time step t, and s_{t-1} is the state at the previous time step
  • U and W (one pair for each gate and for the candidate state g) are the input and recurrent weight matrices
  • σ is the logistic sigmoid function, and ∘ denotes element-wise multiplication

The first three equations describe the input, forget, and output gates. They are called gates because they decide which fraction of the newly computed candidate state is allowed through (the input gate), which fraction of the old memory is forgotten (the forget gate), and which fraction of the updated memory is passed on to the rest of the network (the output gate). Given the new observations, we might want to keep parts of the old memories and delete others, and decide how much of the new information to take into account; this update is computed in the memory variable, c. Finally, given this updated memory, we keep only the fraction of it that will be used by the other parts of the network, which gives the final state, s. Note that if the weights in the input gate are all set to 1, the weights in the forget gate are all set to 0, and the weights in the output gate are all set to 1, we almost recover a standard recurrent neural network, except that the computation of the state involves an extra tanh.
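
To make the update concrete, here is a minimal NumPy sketch of a single LSTM step following the equations above. It is an illustration under stated assumptions, not the book's code or any library's API: the names (lstm_step, U, W) are made up for this example, and bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, s_prev, c_prev, U, W):
    """One LSTM step. U and W are dicts holding the input and recurrent
    weight matrices for the gates 'i', 'f', 'o' and the candidate 'g'."""
    i = sigmoid(x_t @ U['i'] + s_prev @ W['i'])   # input gate
    f = sigmoid(x_t @ U['f'] + s_prev @ W['f'])   # forget gate
    o = sigmoid(x_t @ U['o'] + s_prev @ W['o'])   # output gate
    g = np.tanh(x_t @ U['g'] + s_prev @ W['g'])   # candidate internal state
    c = c_prev * f + g * i                        # internal memory update
    s = np.tanh(c) * o                            # final internal state
    return s, c

# Tiny usage example with random weights, unrolled over a few time steps
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
U = {k: rng.normal(size=(n_in, n_hid)) for k in 'ifog'}
W = {k: rng.normal(size=(n_hid, n_hid)) for k in 'ifog'}
s, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):
    x_t = rng.normal(size=n_in)
    s, c = lstm_step(x_t, s, c, U, W)
```

In practice, a deep learning framework's LSTM layer would be used instead of hand-written steps like this, but the per-step arithmetic is the same.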

There are many variations of the LSTM architecture that build on this basic model. Examples include the convolutional LSTM, the bi-directional LSTM, the peephole LSTM, and the LSTM with a forget gate. A great resource is Chris Olah's blog post on the topic: http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
