So, what if you wanted to build a machine that writes like a dead author? Or understands that a pop in the price of a stock two weeks ago might mean the stock will pop again today? For sequence prediction tasks where key information appears early in the sequence, say at step t+1, but is needed to make an accurate prediction at step t+250, vanilla RNNs struggle. This is where LSTM (and, for some tasks, GRU) networks come into the picture. Instead of a simple cell, each unit contains several conditional mini neural networks, called gates, each of which learns whether to carry information forward across timesteps. We will now discuss each of these variations in detail.
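Before that, here is a minimal sketch of a single LSTM step in NumPy to make the gating idea concrete. The names (`lstm_step`, `Wf`, `bi`, and so on) and dimensions are illustrative assumptions, not any library's API: three sigmoid gates decide what the cell's memory forgets, stores, and exposes at each timestep.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM timestep: three sigmoid gates control the cell state."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x])  # previous hidden state + current input

    f = sigmoid(Wf @ z + bf)         # forget gate: keep or drop old memory
    i = sigmoid(Wi @ z + bi)         # input gate: admit new information
    o = sigmoid(Wo @ z + bo)         # output gate: how much memory to reveal
    c_tilde = np.tanh(Wc @ z + bc)   # candidate cell state

    c = f * c_prev + i * c_tilde     # updated long-term memory
    h = o * np.tanh(c)               # new hidden state (short-term output)
    return h, c

# Toy dimensions with randomly initialized weights (illustrative only)
rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
params = [rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for _ in range(4)] + \
         [np.zeros(n_hid) for _ in range(4)]

h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(250):                 # information from t=0 can persist in c
    x = rng.normal(size=n_in)
    h, c = lstm_step(x, h, c, params)
```

Note that the cell state `c` is updated additively rather than being squashed through a nonlinearity at every step, which is what lets information (and gradients) survive across a span like t+1 to t+250.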