GRU

Gated recurrent units (GRUs) share a similar design philosophy to LSTM layers. They consist of:

  • A hidden state, h
  • An internal state, s
  • An update gate, z
  • A reset gate, r

The updates are given by the following recurrence relations, where x_t is the input at time t, σ is the logistic sigmoid, ∘ denotes element-wise multiplication, and U and W are the learned input and recurrent weight matrices:

  z = σ(x_t U^z + s_{t-1} W^z)
  r = σ(x_t U^r + s_{t-1} W^r)
  h = tanh(x_t U^h + (s_{t-1} ∘ r) W^h)
  s_t = (1 − z) ∘ h + z ∘ s_{t-1}

The reset gate, r, determines how to combine the new input with the previous memory, while the update gate, z, defines how much of the previous memory to keep for the next step. This lets the network forget information that is no longer useful and form connections with newer evidence. Unlike an LSTM, the resulting network has no memory cell (c) that is separate from the internal state, and the input and forget gates are merged into a single update gate. One more key difference is that there is no second non-linearity (the second tanh in the LSTM) when computing the final output.
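
To make the recurrence concrete, here is a minimal NumPy sketch of a single GRU step that follows the equations above. The function name gru_step and the parameter names (U_z, W_z, and so on) are illustrative choices rather than part of any library, and biases are omitted for brevity.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, s_prev, U_z, W_z, U_r, W_r, U_h, W_h):
    # Update gate: how much of the previous internal state to carry over
    z = sigmoid(x_t @ U_z + s_prev @ W_z)
    # Reset gate: how to combine the new input with the previous memory
    r = sigmoid(x_t @ U_r + s_prev @ W_r)
    # Candidate hidden state: the reset gate scales the previous state
    h = np.tanh(x_t @ U_h + (s_prev * r) @ W_h)
    # New internal state: note there is no second tanh here
    return (1.0 - z) * h + z * s_prev

Applying gru_step to each element of a sequence, feeding the returned state back in as s_prev, unrolls the full recurrence in time.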

So, which one should you use? The jury is still out: there is no conclusive evidence for preferring one architecture over the other. GRUs have fewer parameters, so they may be somewhat faster to train and may need less data to generalize well (since the weights are easier to estimate accurately). They are also relatively new, introduced in 2014, and so have not been explored as thoroughly as LSTMs.
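
The parameter saving is easy to quantify: an LSTM layer learns four weight blocks (input, forget, and output gates plus the cell candidate), while a GRU learns three (update gate, reset gate, and candidate state). The sketch below assumes illustrative sizes of 128 inputs and 256 hidden units, and counts one input matrix, one recurrent matrix, and one bias vector per block:

input_dim, hidden_dim = 128, 256                 # illustrative sizes, not from the text
per_block = (input_dim * hidden_dim              # input weights
             + hidden_dim * hidden_dim           # recurrent weights
             + hidden_dim)                       # bias
lstm_params = 4 * per_block                      # LSTM: 3 gates + cell candidate
gru_params = 3 * per_block                       # GRU: 2 gates + candidate state
print(lstm_params, gru_params, gru_params / lstm_params)  # the ratio is exactly 0.75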
