How to do it...

Here is how we proceed with the recipe:

  1. Clone the following GitHub repository. This is a project that encourages users to experiment with the seq2seq neural network architecture:
git clone https://github.com/guillaume-chevalier/seq2seq-signal-prediction.git
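Before running the code, make sure the Python dependencies it imports are available (requests, numpy, matplotlib, and TensorFlow; the code relies on tf.contrib, so a 1.x TensorFlow release is assumed). A possible setup, with unpinned versions, is:
cd seq2seq-signal-prediction
pip install requests numpy matplotlib "tensorflow<2"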
  2. Given the preceding repository, consider the following functions, which load and normalize the historical Bitcoin value in USD and EUR. They are defined in datasets.py. Training and testing data are separated according to the 80/20 rule, so the test data consists of the most recent 20 percent of the historical Bitcoin values. Every example contains 40 data points of USD and then EUR data in the feature axis/dimension. Data is normalized according to the mean and standard deviation. The function generate_x_y_data_v4 generates random samples of size batch_size from the training data (or, respectively, the testing data):
import requests
import numpy as np

def loadCurrency(curr, window_size):
    """
    Return the historical data for the USD or EUR Bitcoin value.
    This is done with a web API call.
    curr = "USD" | "EUR"
    """
    # For more info on the URL call, it is inspired by:
    # https://github.com/Levino/coindesk-api-node
    r = requests.get(
        "http://api.coindesk.com/v1/bpi/historical/close.json?start=2010-07-17&end=2017-03-03&currency={}".format(
            curr
        )
    )
    data = r.json()
    time_to_values = sorted(data["bpi"].items())
    values = [val for key, val in time_to_values]
    kept_values = values[1000:]

    X = []
    Y = []
    for i in range(len(kept_values) - window_size * 2):
        X.append(kept_values[i:i + window_size])
        Y.append(kept_values[i + window_size:i + window_size * 2])
    # To be able to concat on inner dimension later on:
    X = np.expand_dims(X, axis=2)
    Y = np.expand_dims(Y, axis=2)
    return X, Y
def normalize(X, Y=None):
    """
    Normalise X and Y according to the mean and standard
    deviation of the X values only.
    """
    # It would be possible to normalize with the last value rather than the mean, such as:
    # lasts = np.expand_dims(X[:, -1, :], axis=1)
    # assert (lasts[:, :] == X[:, -1, :]).all(), "{}, {}, {}. {}".format(lasts[:, :].shape, X[:, -1, :].shape, lasts[:, :], X[:, -1, :])
    mean = np.expand_dims(np.average(X, axis=1) + 0.00001, axis=1)
    stddev = np.expand_dims(np.std(X, axis=1) + 0.00001, axis=1)
    # print(mean.shape, stddev.shape)
    # print(X.shape, Y.shape)
    X = X - mean
    X = X / (2.5 * stddev)
    if Y is not None:
        assert Y.shape == X.shape, (Y.shape, X.shape)
        Y = Y - mean
        Y = Y / (2.5 * stddev)
        return X, Y
    return X
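Note that mean and stddev are computed per example along the time axis (axis=1), so every window is centered and rescaled independently. The following toy check is not part of the repository; it is a minimal sketch assuming the normalize function above has been defined:
# Hypothetical sanity check: two windows of 4 time steps, 1 feature each.
toy = np.arange(8, dtype=np.float64).reshape(2, 4, 1)
toy_norm = normalize(toy)      # Y=None, so only the normalized X is returned
print(toy_norm.mean(axis=1))   # ~0 for each window: centered per example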

def fetch_batch_size_random(X, Y, batch_size):
    """
    Returns randomly an aligned batch_size of X and Y among all examples.
    The external dimension of X and Y must be the batch size
    (e.g.: 1 column = 1 example).
    X and Y can be N-dimensional.
    """
    assert X.shape == Y.shape, (X.shape, Y.shape)
    idxes = np.random.randint(X.shape[0], size=batch_size)
    X_out = np.array(X[idxes]).transpose((1, 0, 2))
    Y_out = np.array(Y[idxes]).transpose((1, 0, 2))
    return X_out, Y_out
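The transpose converts the sampled batch from batch-major, (batch_size, seq_length, dim), to time-major, (seq_length, batch_size, dim), which is the layout the seq2seq placeholders defined later expect. A quick hedged check (it reuses loadCurrency, and therefore repeats its API call):
X_usd, Y_usd = loadCurrency("USD", window_size=40)
print(X_usd.shape)  # (n_examples, 40, 1): batch-major
x_b, y_b = fetch_batch_size_random(X_usd, Y_usd, batch_size=3)
print(x_b.shape)    # (40, 3, 1): (seq_length, batch_size, dim)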
X_train = []
Y_train = []
X_test = []
Y_test = []

def generate_x_y_data_v4(isTrain, batch_size):
    """
    Return financial data for Bitcoin.
    Features are USD and EUR, in the internal dimension.
    We normalize X and Y data according to X only, so as not to
    spoil the predictions we ask for.
    For every window (of seq_length values), Y is the prediction
    following X.
    Train and test data are separated according to the 80/20
    rule.
    Therefore, the 20 percent of test data are the most
    recent historical Bitcoin values. Every example in X contains
    40 points of USD and then EUR data in the feature axis/dimension.
    It is to be noted that the returned X and Y have the same shape
    and are in a tuple.
    """
    # 40 past values for the encoder, 40 future values for the decoder's predictions.
    seq_length = 40

    global Y_train
    global X_train
    global X_test
    global Y_test
    # First load, with memoization:
    if len(Y_test) == 0:
        # API call:
        X_usd, Y_usd = loadCurrency("USD",
                                    window_size=seq_length)
        X_eur, Y_eur = loadCurrency("EUR",
                                    window_size=seq_length)
        # All data, aligned:
        X = np.concatenate((X_usd, X_eur), axis=2)
        Y = np.concatenate((Y_usd, Y_eur), axis=2)
        X, Y = normalize(X, Y)
        # Split 80-20:
        X_train = X[:int(len(X) * 0.8)]
        Y_train = Y[:int(len(Y) * 0.8)]
        X_test = X[int(len(X) * 0.8):]
        Y_test = Y[int(len(Y) * 0.8):]
    if isTrain:
        return fetch_batch_size_random(X_train, Y_train, batch_size)
    else:
        return fetch_batch_size_random(X_test, Y_test, batch_size)
  3. Generate the training and testing data, and define a number of hyperparameters such as batch_size, hidden_dim (the number of hidden neurons in the RNN), and layers_stacked_count (the number of stacked recurrent cells). In addition, define a number of parameters for fine-tuning the optimizer, such as the learning rate, the number of iterations, lr_decay for the optimizer's simulated annealing, the optimizer's momentum, and L2 regularization for avoiding overfitting. Note that the GitHub repository uses the defaults batch_size = 5 and nb_iters = 150, but I've obtained better results with batch_size = 1000 and nb_iters = 100000:
from datasets import generate_x_y_data_v4
generate_x_y_data = generate_x_y_data_v4

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

sample_x, sample_y = generate_x_y_data(isTrain=True, batch_size=3)
print("Dimensions of the dataset for 3 X and 3 Y training examples: ")
print(sample_x.shape)
print(sample_y.shape)
print("(seq_length, batch_size, output_dim)")
print(sample_x, sample_y)

# Internal neural network parameters
seq_length = sample_x.shape[0]  # Time series will have the same past and future (to be predicted) length.
batch_size = 5  # Low value used for live demo purposes - 100 and 1000 would be possible too, crank that up!
output_dim = input_dim = sample_x.shape[-1]  # Output dimension (e.g.: multiple signals at once, tied in time)
hidden_dim = 12  # Count of hidden neurons in the recurrent units.
layers_stacked_count = 2  # Number of stacked recurrent cells, on the neural depth axis.

# Optimizer:
learning_rate = 0.007  # Small lr helps not to diverge during training.
nb_iters = 150  # How many times we perform a training step (therefore how many times we show a batch).
lr_decay = 0.92  # default: 0.9 . Simulated annealing.
momentum = 0.5  # default: 0.0 . Momentum technique in weights update.
lambda_l2_reg = 0.003  # L2 regularization of weights - avoids overfitting.
  4. Define the network as an encoder-decoder made up of basic GRU cells. The network is made up of layers_stacked_count = 2 stacked RNNs, and we will visualize it with TensorBoard. Note that hidden_dim = 12 is the number of hidden neurons in each recurrent unit:
tf.nn.seq2seq = tf.contrib.legacy_seq2seq
tf.nn.rnn_cell = tf.contrib.rnn
tf.nn.rnn_cell.GRUCell = tf.contrib.rnn.GRUCell
tf.reset_default_graph()
# sess.close()
sess = tf.InteractiveSession()

with tf.variable_scope('Seq2seq'):
    # Encoder: inputs
    enc_inp = [
        tf.placeholder(tf.float32, shape=(None, input_dim), name="inp_{}".format(t))
        for t in range(seq_length)
    ]
    # Decoder: expected outputs
    expected_sparse_output = [
        tf.placeholder(tf.float32, shape=(None, output_dim), name="expected_sparse_output_{}".format(t))
        for t in range(seq_length)
    ]

    # Give a "GO" token to the decoder.
    # Note what is appended after the "GO" token: the encoder inputs
    # shifted by one time step ("+ enc_inp[:-1]").
    dec_inp = [tf.zeros_like(enc_inp[0], dtype=np.float32, name="GO")] + enc_inp[:-1]

    # Create `layers_stacked_count` stacked RNNs (GRU cells here).
    cells = []
    for i in range(layers_stacked_count):
        with tf.variable_scope('RNN_{}'.format(i)):
            cells.append(tf.nn.rnn_cell.GRUCell(hidden_dim))
            # cells.append(tf.nn.rnn_cell.BasicLSTMCell(...))
    cell = tf.nn.rnn_cell.MultiRNNCell(cells)

    # For reshaping the input and output dimensions of the seq2seq RNN:
    w_in = tf.Variable(tf.random_normal([input_dim, hidden_dim]))
    b_in = tf.Variable(tf.random_normal([hidden_dim], mean=1.0))
    w_out = tf.Variable(tf.random_normal([hidden_dim, output_dim]))
    b_out = tf.Variable(tf.random_normal([output_dim]))

    reshaped_inputs = [tf.nn.relu(tf.matmul(i, w_in) + b_in) for i in enc_inp]

    # Here, the encoder and the decoder use the same cell; HOWEVER,
    # the weights aren't shared among the encoder and decoder - we have two
    # sets of weights created under the hood according to that function's definition.
    dec_outputs, dec_memory = tf.nn.seq2seq.basic_rnn_seq2seq(
        enc_inp,
        dec_inp,
        cell
    )

    output_scale_factor = tf.Variable(1.0, name="Output_ScaleFactor")
    # Final outputs: with a linear rescaling similar to batch norm,
    # but without the "norm" part of batch normalization.
    reshaped_outputs = [output_scale_factor * (tf.matmul(i, w_out) + b_out) for i in dec_outputs]

# Merge all the summaries and write them out to /tmp/bitcoin_logs (by default).
# Note: no scalar summaries are defined here, so merge_all() returns None;
# the FileWriter still records the graph for TensorBoard.
merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter('/tmp/bitcoin_logs', sess.graph)
  5. Now let's run TensorBoard and visualize the network, which consists of an RNN encoder and an RNN decoder. Once TensorBoard is running, point a browser at http://localhost:6006 (its default port):
tensorboard --logdir=/tmp/bitcoin_logs

The following is the flow of the code:

An example of code for Bitcoin value prediction, as seen in TensorBoard
  6. Now let's define the loss function as an L2 loss with regularization, to avoid overfitting and achieve better generalization. The chosen optimizer is RMSProp, with the learning_rate, decay, and momentum values defined in step 3:
# Training loss and optimizer
with tf.variable_scope('Loss'):
    # L2 loss
    output_loss = 0
    for _y, _Y in zip(reshaped_outputs, expected_sparse_output):
        output_loss += tf.reduce_mean(tf.nn.l2_loss(_y - _Y))

    # L2 regularization (to avoid overfitting and to have a better generalization capacity)
    reg_loss = 0
    for tf_var in tf.trainable_variables():
        if not ("Bias" in tf_var.name or "Output_" in tf_var.name):
            reg_loss += tf.reduce_mean(tf.nn.l2_loss(tf_var))

    loss = output_loss + lambda_l2_reg * reg_loss

with tf.variable_scope('Optimizer'):
    optimizer = tf.train.RMSPropOptimizer(learning_rate, decay=lr_decay, momentum=momentum)
    train_op = optimizer.minimize(loss)
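In equation form, the objective minimized above is (note that tf.nn.l2_loss(x) returns the scalar sum(x**2)/2, so the surrounding tf.reduce_mean is effectively a no-op here):

$$\mathrm{loss} \;=\; \sum_{t=1}^{T} \tfrac{1}{2}\,\lVert \hat{y}_t - y_t \rVert_2^2 \;+\; \lambda_{\mathrm{l2}} \sum_{W} \tfrac{1}{2}\,\lVert W \rVert_2^2$$

where the second sum runs over all trainable weights except the biases and the output scale factor, which the name filter above excludes.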
  7. Prepare for training in batches by generating the training data and running the optimizer on batch_size examples from the dataset. Similarly, prepare for testing by generating test data on batch_size examples from the dataset. Training runs for nb_iters+1 iterations, and every tenth iteration is used to test the results:
def train_batch(batch_size):
    """
    Training step that optimizes the weights,
    provided some batch_size X and Y examples from the dataset.
    """
    X, Y = generate_x_y_data(isTrain=True, batch_size=batch_size)
    feed_dict = {enc_inp[t]: X[t] for t in range(len(enc_inp))}
    feed_dict.update({expected_sparse_output[t]: Y[t] for t in range(len(expected_sparse_output))})
    _, loss_t = sess.run([train_op, loss], feed_dict)
    return loss_t

def test_batch(batch_size):
    """
    Test step; does NOT optimize. Weights are frozen by not
    running sess.run on the train_op.
    """
    X, Y = generate_x_y_data(isTrain=False, batch_size=batch_size)
    feed_dict = {enc_inp[t]: X[t] for t in range(len(enc_inp))}
    feed_dict.update({expected_sparse_output[t]: Y[t] for t in range(len(expected_sparse_output))})
    loss_t = sess.run([loss], feed_dict)
    return loss_t[0]

# Training
train_losses = []
test_losses = []
sess.run(tf.global_variables_initializer())

for t in range(nb_iters + 1):
    train_loss = train_batch(batch_size)
    train_losses.append(train_loss)
    if t % 10 == 0:
        # Test
        test_loss = test_batch(batch_size)
        test_losses.append(test_loss)
        print("Step {}/{}, train loss: {}, TEST loss: {}".format(t, nb_iters, train_loss, test_loss))

print("Final train loss: {}, TEST loss: {}".format(train_loss, test_loss))
  8. Visualize the nb_predictions = 5 results. Predictions are shown in yellow; the seen past values are blue dots and the true future values are blue x marks. Note that the prediction starts at the last blue dot of the past window and, visually, you can observe that even this simple model is pretty accurate:
# Test
nb_predictions = 5
print("Let's visualize {} predictions with our signals:".format(nb_predictions))
X, Y = generate_x_y_data(isTrain=False, batch_size=nb_predictions)
feed_dict = {enc_inp[t]: X[t] for t in range(seq_length)}
outputs = np.array(sess.run([reshaped_outputs], feed_dict)[0])

for j in range(nb_predictions):
    plt.figure(figsize=(12, 3))
    for k in range(output_dim):
        past = X[:, j, k]
        expected = Y[:, j, k]
        pred = outputs[:, j, k]
        label1 = "Seen (past) values" if k == 0 else "_nolegend_"
        label2 = "True future values" if k == 0 else "_nolegend_"
        label3 = "Predictions" if k == 0 else "_nolegend_"
        plt.plot(range(len(past)), past, "o--b", label=label1)
        plt.plot(range(len(past), len(expected) + len(past)), expected, "x--b", label=label2)
        plt.plot(range(len(past), len(pred) + len(past)), pred, "o--y", label=label3)
    plt.legend(loc='best')
    plt.title("Predictions vs. true values")
    plt.show()

We get the results as follows:

An example of Bitcoin value prediction
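As an optional numeric complement to the plots, you could average the squared error over the nb_predictions test windows sampled above; this is a hedged addition (not part of the recipe's code), expressed in normalized units:
# Mean squared error of the predictions shown above (normalized units).
mse = np.mean((outputs - Y) ** 2)
print("Test MSE over {} windows: {}".format(nb_predictions, mse))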