As we saw, the recursive structure of RNN and LSTM networks leads to problems with gradients: they either vanish or explode. One workaround is to introduce forget gates, which selectively discard some of the old information. This helps the network keep track of relevant information without destroying the gradients, and better preserve important data observed a long time ago.
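The mechanism behind a forget gate can be sketched in a few lines. This is a minimal scalar illustration, not a full LSTM: the weights (`w_f`, `b_f`, etc.) are made-up values for demonstration. The key point is that the cell-state update is additive, so the old state is scaled by the forget gate rather than repeatedly squashed through a nonlinearity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cell_update(x, c, w_f=1.0, b_f=1.0, w_i=1.0, b_i=0.0, w_g=1.0, b_g=0.0):
    """Scalar sketch of an LSTM-style cell-state update.

    The forget gate f decides what fraction of the old state c survives;
    the input gate i decides how much new content g is admitted.
    Hypothetical scalar weights, for illustration only.
    """
    f = sigmoid(w_f * x + b_f)    # forget gate: fraction of old state to keep
    i = sigmoid(w_i * x + b_i)    # input gate: fraction of new info to admit
    g = math.tanh(w_g * x + b_g)  # candidate new content
    return f * c + i * g          # additive update preserves the old signal

c = 0.5
c = cell_update(0.0, c)  # with x = 0, g = 0, so c shrinks but is not erased
```

Because the new state is `f * c + i * g` rather than a function applied on top of `c`, the gradient with respect to the old state is simply `f`, which the network can learn to keep close to 1 for information worth remembering.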
Both LSTM and GRU share the same design principle as plain recurrent neural networks: given an input, compute an output, and let a black box update the internal state. Keeping this shared interface in mind is crucial for understanding the bigger picture.
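That shared interface can be made concrete with a short sketch. The names below (`rnn_step`, `run_sequence`) and the scalar weights are hypothetical; the point is that a plain RNN cell, an LSTM cell, or a GRU cell could all be plugged into the same driver loop, since each maps `(input, state)` to `(output, new state)`.

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.8, b=0.0):
    """One step of a scalar vanilla RNN cell: h' = tanh(w_x*x + w_h*h + b).

    For a plain RNN the output and the new state coincide; an LSTM or GRU
    would return a different output but keep the same (x, h) -> (y, h')
    signature. Weights are made-up values for illustration.
    """
    h_new = math.tanh(w_x * x + w_h * h + b)
    return h_new, h_new

def run_sequence(step_fn, inputs, h0=0.0):
    """Drive any cell exposing the (input, state) -> (output, state) interface."""
    h, outputs = h0, []
    for x in inputs:
        y, h = step_fn(x, h)   # the cell is a black box to this loop
        outputs.append(y)
    return outputs, h

outputs, final_state = run_sequence(rnn_step, [1.0, 0.0, -1.0])
```

The driver loop never inspects how the state is updated; swapping the cell changes the dynamics but not the surrounding code, which is exactly the design principle the three architectures share.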