RNNs, and LSTM models in particular, are often difficult to understand. Time series prediction is a useful application of RNNs because the data contain temporal dependencies, and time series data is abundantly available online. In this section, we will see an example of using an LSTM to handle time series data. Our LSTM network will predict the future number of airline passengers.
The dataset that I will be using contains monthly totals of international airline passengers from 1949 to 1960. The dataset can be downloaded from https://datamarket.com/data/set/22u3/international-airlinepassengers- monthly-totals-in#!ds=22u3&display=line. The following screenshot shows the metadata of the international airline passengers dataset:
You can download the data by choosing the Export tab and then selecting CSV (,) in the Export group. You will have to edit the CSV file manually to remove the header line, as well as the additional footer line. I have downloaded the data and saved it in a file named international-airline-passengers.csv. The following graph plots the time series data:
Now let's load the original dataset and look at some facts about it. First, we load the time series as follows (see time_series_preprocessor.py):
import csv
import numpy as np
Here is the definition of load_series(), a user-defined function that loads the time series and normalizes it:
def load_series(filename, series_idx=1):
    try:
        with open(filename) as csvfile:
            csvreader = csv.reader(csvfile)
            # Convert to a NumPy array so the arithmetic below is element-wise
            data = np.array([float(row[series_idx]) for row in csvreader if len(row) > 0])
            # Standardize to zero mean and unit standard deviation
            normalized_data = (data - np.mean(data)) / np.std(data)
        return normalized_data
    except IOError:
        print("Error occurred")
        return None
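Because load_series() standardizes the values, everything the network later predicts is in normalized units rather than passenger counts. The following sketch shows the same z-score transform and how keeping the mean and standard deviation lets you map values back. The helper names normalize() and denormalize() are hypothetical (they are not part of time_series_preprocessor.py), and the numbers are made up for illustration:

```python
import numpy as np

def normalize(values):
    """Return z-scored values plus the statistics needed to undo the scaling."""
    values = np.asarray(values, dtype=float)
    mean, std = values.mean(), values.std()
    return (values - mean) / std, mean, std

def denormalize(normalized, mean, std):
    """Map z-scored values back to the original units."""
    return np.asarray(normalized) * std + mean

# Made-up numbers for illustration only
raw = [100.0, 120.0, 140.0, 130.0, 110.0]
scaled, mean, std = normalize(raw)
restored = denormalize(scaled, mean, std)
```

After the transform, scaled has zero mean and unit standard deviation, and restored matches raw again.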
Now let's invoke the preceding function to load the time series and print it (issue $ python3 plot_time_series.py on Terminal):
import csv
import numpy as np
import matplotlib.pyplot as plt
import time_series_preprocessor as tsp
timeseries = tsp.load_series('international-airline-passengers.csv')
print(timeseries)
The following is the output of the preceding code:
>>> [-1.40777884 -1.35759023 -1.24048348 -1.26557778 -1.33249593 -1.21538918 -1.10664719 -1.10664719 -1.20702441 -1.34922546 -1.47469699 -1.35759023 ….. 2.85825285 2.72441656 1.9046693 1.5115252 0.91762667 1.26894693]
Next, we print the shape of the series:
print(np.shape(timeseries))
>>> (144,)
That means there are 144 entries in the time series. Let's plot the time series:
plt.figure()
# Label the line so that plt.legend() has an entry to show
plt.plot(timeseries, label='Normalized passengers')
plt.title('Normalized time series')
plt.xlabel('ID')
plt.ylabel('Normalized value')
plt.legend(loc='upper left')
plt.show()
The output of the preceding code is a line plot of the normalized time series.
Once we have loaded the time series dataset, the next task is to prepare the training set. Since we will be evaluating the model multiple times to predict future values, we split the data into a training set and a test set. To be more specific, the split_data() function divides the dataset chronologically into two components according to percent_train; we will use 75% for training and 25% for testing:
def split_data(data, percent_train):
    num_rows = len(data)
    train_data, test_data = [], []
    for idx, row in enumerate(data):
        if idx < num_rows * percent_train:
            train_data.append(row)
        else:
            test_data.append(row)
    return train_data, test_data
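As a quick check of the arithmetic, here is a sketch that produces the same boundary with slicing instead of a loop. The name split_by_slicing() is hypothetical and the series is a stand-in for the 144 monthly values; this is an alternative illustration, not the code used in the rest of this section:

```python
import math

def split_by_slicing(data, percent_train):
    """Chronological split with the same boundary as `idx < len(data) * percent_train`."""
    cut = math.ceil(len(data) * percent_train)
    return data[:cut], data[cut:]

series = list(range(144))          # stand-in for the 144 monthly values
train, test = split_by_slicing(series, 0.75)
print(len(train), len(test))       # 108 36
```

The split is chronological, so the test set is simply the final stretch of the series. No shuffling is applied, which matters for time series because the model should be evaluated on values that come after the ones it was trained on.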
Once we have our dataset ready, we can train the predictor by loading the data in an acceptable format. For this step, I have written a Python script called TimeSeriesPredictor.py, which starts by importing the necessary libraries and modules (issue the $ python3 TimeSeriesPredictor.py command on Terminal to run this script):
import numpy as np
import tensorflow as tf
from tensorflow.python.ops import rnn, rnn_cell
import time_series_preprocessor as tsp
import matplotlib.pyplot as plt
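Next, we define the hyperparameters for the LSTM network (tune them accordingly):
input_dim = 1   # one value per time step
seq_size = 5    # number of time steps in each input window
hidden_dim = 5  # number of hidden units in the LSTM cell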
We now define the weight and bias variables of the output layer and the input placeholders:
W_out = tf.get_variable("W_out", shape=[hidden_dim, 1], dtype=tf.float32, initializer=None, regularizer=None, trainable=True, collections=None)
b_out = tf.get_variable("b_out", shape=[1], dtype=tf.float32, initializer=None, regularizer=None, trainable=True, collections=None)
x = tf.placeholder(tf.float32, [None, seq_size, input_dim])
y = tf.placeholder(tf.float32, [None, seq_size])
The next task is to construct the LSTM network. The following function, LSTM_Model(), takes no arguments; instead, it uses the three tensors defined previously:
x: Inputs of size [batch_size, seq_size, input_dim]
W_out: A matrix of fully-connected output layer weights
b_out: A vector of fully-connected output layer biases
Now let's see the definition of the function:
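def LSTM_Model():
    cell = rnn_cell.BasicLSTMCell(hidden_dim)
    # outputs has shape [batch_size, seq_size, hidden_dim]
    outputs, states = rnn.dynamic_rnn(cell, x, dtype=tf.float32)
    num_examples = tf.shape(x)[0]
    # Repeat W_out once per example so that matmul is applied batch-wise
    W_repeated = tf.tile(tf.expand_dims(W_out, 0), [num_examples, 1, 1])
    out = tf.matmul(outputs, W_repeated) + b_out
    # Remove all dimensions of size 1
    out = tf.squeeze(out)
    return out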
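To make the tensor shapes in the output layer easier to follow, the next sketch repeats the tile, matmul, and squeeze steps in plain NumPy. The random array stands in for the LSTM outputs, and the batch size of 4 is an arbitrary choice for illustration:

```python
import numpy as np

batch_size, seq_size, hidden_dim = 4, 5, 5

# Stand-in for the outputs of dynamic_rnn: [batch, seq, hidden]
outputs = np.random.randn(batch_size, seq_size, hidden_dim)
W_out = np.random.randn(hidden_dim, 1)
b_out = np.zeros(1)

# Tile the weight matrix so there is one copy per example: [batch, hidden, 1]
W_repeated = np.tile(W_out[np.newaxis, :, :], (batch_size, 1, 1))

# Batched matrix product: [batch, seq, hidden] @ [batch, hidden, 1] -> [batch, seq, 1]
out = np.matmul(outputs, W_repeated) + b_out
squeezed = np.squeeze(out)        # [batch, seq]

# With a single example, squeeze also drops the batch axis
single = np.squeeze(out[:1])      # [seq]
print(out.shape, squeezed.shape, single.shape)   # (4, 5, 1) (4, 5) (5,)
```

Note the last line: because squeeze removes every axis of size 1, feeding a single window returns a vector of length seq_size rather than a 1 x seq_size matrix.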
Additionally, we create three empty lists to store the training loss, test loss, and the step:
train_loss = []
test_loss = []
step_list = []
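The next function, called trainNetwork(), is used to train the LSTM network. It checks the test error every 100 steps and stops once that error has failed to improve for three consecutive checks: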
def trainNetwork(train_x, train_y, test_x, test_y):
    with tf.Session() as sess:
        tf.get_variable_scope().reuse_variables()
        sess.run(tf.global_variables_initializer())
        max_patience = 3
        patience = max_patience
        min_test_err = float('inf')
        step = 0
        while patience > 0:
            _, train_err = sess.run([train_op, cost], feed_dict={x: train_x, y: train_y})
            if step % 100 == 0:
                test_err = sess.run(cost, feed_dict={x: test_x, y: test_y})
                print('step: {} train err: {} test err: {}'.format(step, train_err, test_err))
                train_loss.append(train_err)
                test_loss.append(test_err)
                step_list.append(step)
                if test_err < min_test_err:
                    min_test_err = test_err
                    patience = max_patience
                else:
                    patience -= 1
            step += 1
        save_path = saver.save(sess, 'model.ckpt')
        print('Model saved to {}'.format(save_path))
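The stopping rule in trainNetwork() is a simple form of early stopping. The following sketch isolates that logic without TensorFlow so that it can be traced by hand. The function name steps_until_stop() is hypothetical and the error values are made up:

```python
def steps_until_stop(test_errors, max_patience=3):
    """Return the number of checks performed and the best error seen.

    `test_errors` is the sequence of test errors observed at successive checks.
    """
    patience = max_patience
    best = float('inf')
    checks = 0
    for err in test_errors:
        checks += 1
        if err < best:
            best = err
            patience = max_patience   # improvement: reset the counter
        else:
            patience -= 1             # no improvement: use up one unit of patience
        if patience == 0:
            break
    return checks, best

# Made-up error values: two improvements, then three checks without one
checks, best = steps_until_stop([0.9, 0.5, 0.6, 0.55, 0.7, 0.4])
print(checks, best)   # 5 0.5
```

In this made-up run, the final value of 0.4 is never reached because patience runs out first. That is the trade-off of a small max_patience: training stops quickly, but it may stop before a later improvement.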
The next task is to define the cost function and the optimizer, and to instantiate train_op:
cost = tf.reduce_mean(tf.square(LSTM_Model() - y))
train_op = tf.train.AdamOptimizer(learning_rate=0.003).minimize(cost)
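Written out, this cost is the mean squared error over every window and every time step. In the notation below, which is introduced here only for illustration, N is the number of windows in the batch, T is seq_size, and the hatted term is the model output:

```latex
\text{cost} = \frac{1}{N\,T} \sum_{n=1}^{N} \sum_{t=1}^{T} \left( \hat{y}_{n,t} - y_{n,t} \right)^{2}
```

Because the data were standardized, this error is measured in squared standard deviations of the original series, not in squared passenger counts.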
Additionally, we have an auxiliary op for saving the model:
saver = tf.train.Saver()
Now that we have created the model, the next function, called testLSTM(), is used to test the prediction power of the model on the test set:
def testLSTM(sess, test_x):
    tf.get_variable_scope().reuse_variables()
    saver.restore(sess, 'model.ckpt')
    output = sess.run(LSTM_Model(), feed_dict={x: test_x})
    return output
To plot the predicted results, we have a function called plot_results(). Its definition is as follows:
def plot_results(train_x, predictions, actual, filename):
    plt.figure()
    num_train = len(train_x)
    plt.plot(list(range(num_train)), train_x, color='b', label='training data')
    plt.plot(list(range(num_train, num_train + len(predictions))), predictions, color='r', label='predicted')
    plt.plot(list(range(num_train, num_train + len(actual))), actual, color='g', label='test data')
    plt.legend()
    if filename is not None:
        plt.savefig(filename)
    else:
        plt.show()
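To evaluate the model, we have a function called main() that invokes the preceding functions to create and train the LSTM network. The workflow of the code is as follows: it loads and normalizes the series, splits it into training and test data, builds input windows of length seq_size with targets shifted by one step, trains the network, and then plots two sets of predictions. Let's see the definition of the function: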
def main():
    data = tsp.load_series('international-airline-passengers.csv')
    train_data, actual_vals = tsp.split_data(data=data, percent_train=0.75)
    # Build input windows and targets shifted one step ahead
    train_x, train_y = [], []
    for i in range(len(train_data) - seq_size - 1):
        train_x.append(np.expand_dims(train_data[i:i+seq_size], axis=1).tolist())
        train_y.append(train_data[i+1:i+seq_size+1])
    test_x, test_y = [], []
    for i in range(len(actual_vals) - seq_size - 1):
        test_x.append(np.expand_dims(actual_vals[i:i+seq_size], axis=1).tolist())
        test_y.append(actual_vals[i+1:i+seq_size+1])
    trainNetwork(train_x, train_y, test_x, test_y)
    with tf.Session() as sess:
        predicted_vals = testLSTM(sess, test_x)[:,0]
        # Plot the predictions made when the model is given ground truth values
        plot_results(train_data, predicted_vals, actual_vals, 'ground_truth_predition.png')
        prev_seq = train_x[-1]
        predicted_vals = []
        for i in range(1000):
            next_seq = testLSTM(sess, [prev_seq])
            predicted_vals.append(next_seq[-1])
            # Slide the window: drop the oldest value and append the new prediction
            prev_seq = np.vstack((prev_seq[1:], next_seq[-1]))
        # Plot the predictions made when only the training data was given
        plot_results(train_data, predicted_vals, actual_vals, 'prediction_on_train_set.png')
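The windowing loops in main() are easier to follow on a tiny series. The following sketch wraps the same loop in a hypothetical helper, make_windows(), and applies it to the stand-in values 0 to 9:

```python
import numpy as np

def make_windows(series, seq_size):
    """Build (input, target) pairs where each target is the input shifted by one step."""
    xs, ys = [], []
    for i in range(len(series) - seq_size - 1):
        xs.append(np.expand_dims(series[i:i + seq_size], axis=1).tolist())
        ys.append(series[i + 1:i + seq_size + 1])
    return xs, ys

series = list(range(10))           # stand-in values 0..9
xs, ys = make_windows(series, seq_size=5)
print(len(xs))                     # 4
print(xs[0])                       # [[0], [1], [2], [3], [4]]
print(ys[0])                       # [1, 2, 3, 4, 5]
```

Each input is a column of seq_size values, which matches the [None, seq_size, input_dim] placeholder, and each target is a flat list of seq_size values. With the same loop bound, 108 training values yield 102 windows and 36 test values yield 30.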
Finally, we call the main() function to perform the training. Once the training is completed, it plots the prediction results of the model against the ground truth values. The first graph (saved as ground_truth_predition.png) shows the case where the model is given the true test values as input.
The next graph (saved as prediction_on_train_set.png) shows the prediction results where only the training data was given. This procedure has less information available, but it still did a good job of matching the trends in the data:
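When only the training data is given, each prediction is fed back in as the newest input, so errors can accumulate from step to step. The following sketch shows that feedback loop on its own. The names rollout() and toy_predictor() are hypothetical, and toy_predictor() is a trivial stand-in for the trained LSTM so that the example runs without TensorFlow:

```python
import numpy as np

def rollout(predict_fn, seed_window, num_steps):
    """Forecast num_steps values, feeding each prediction back in as input."""
    window = np.asarray(seed_window, dtype=float)   # shape [seq_size, 1]
    forecasts = []
    for _ in range(num_steps):
        next_seq = predict_fn(window)               # shape [seq_size]
        forecasts.append(float(next_seq[-1]))       # keep only the newest value
        # Drop the oldest input and append the new prediction
        window = np.vstack((window[1:], next_seq[-1]))
    return forecasts

def toy_predictor(window):
    """Stand-in model: shifts the window by one step and adds 1 to the last value."""
    flat = window[:, 0]
    return np.append(flat[1:], flat[-1] + 1.0)

print(rollout(toy_predictor, [[1.0], [2.0], [3.0], [4.0], [5.0]], 3))   # [6.0, 7.0, 8.0]
```

With a real model in place of toy_predictor(), the forecasts after the first seq_size steps depend entirely on earlier predictions, which is why this procedure has less information available than feeding in ground truth values.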
The following method helps us plot the training and the test error:
def plot_error():
    # Plot training loss over time
    plt.plot(step_list, train_loss, 'r--', label='LSTM training loss per iteration', linewidth=4)
    plt.title('LSTM training loss per iteration')
    plt.xlabel('Iteration')
    plt.ylabel('Training loss')
    plt.legend(loc='upper right')
    plt.show()
    # Plot test loss over time
    plt.plot(step_list, test_loss, 'r--', label='LSTM test loss per iteration', linewidth=4)
    plt.title('LSTM test loss per iteration')
    plt.xlabel('Iteration')
    plt.ylabel('Test loss')
    plt.legend(loc='upper left')
    plt.show()
Now we call the preceding method as follows:
plot_error()
We can use a time series predictor to reproduce realistic fluctuations in data. Now you can prepare your own dataset and try some other predictive analytics. The next example is about sentiment analysis on product and movie review datasets. We will also see how to develop a more complex RNN using an LSTM network.