Breaking down the code

As we progress through the book, we will focus only on the important sections of code: the sections that help us understand a concept or how a method is implemented. This makes it all the more important for you to open up the code and at least peruse it on your own. In the next exercise, we take a look at the important sections of the sample code:

  1. Open Chapter_4_1.py and scroll down to the comment Vectorize the data, as follows:
# Vectorize the data.
input_texts = []
target_texts = []
input_characters = set()
target_characters = set()
with open(data_path, 'r', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text = line.split('\t')
    # We use "tab" as the "start sequence" character
    # for the targets, and "\n" as "end sequence" character.
    target_text = '\t' + target_text + '\n'
    input_texts.append(input_text)
    target_texts.append(target_text)
    for char in input_text:
        if char not in input_characters:
            input_characters.add(char)
    for char in target_text:
        if char not in target_characters:
            target_characters.add(char)

input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])

print('Number of samples:', len(input_texts))
print('Number of unique input tokens:', num_encoder_tokens)
print('Number of unique output tokens:', num_decoder_tokens)
print('Max sequence length for inputs:', max_encoder_seq_length)
print('Max sequence length for outputs:', max_decoder_seq_length)

  2. This section of code loads the training data and breaks it into the sets of characters used for vectorization. Note how the num_encoder_tokens and num_decoder_tokens parameters set here depend on the number of unique characters in each set, not on the number of samples. Finally, the maximum encoder and decoder sequence lengths are set to the length of the longest input and target texts, respectively.
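The input_token_index and target_token_index dictionaries used by the reverse lookups later in this section are built from these same character sets. A minimal sketch of that step, assuming the script follows the standard Keras sequence-to-sequence recipe:

# Map every character to a unique integer index.
input_token_index = dict(
    [(char, i) for i, char in enumerate(input_characters)])
target_token_index = dict(
    [(char, i) for i, char in enumerate(target_characters)])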
  3. Next, we want to take a look at the vectorization of the input data. Vectorization converts each input and target text into the numeric tensors the model trains on, and it is the memory-intensive part of the preparation. When we align this data, we want to keep the responses, or targets, one step ahead of the original input. This subtle offset allows our sequence-learning LSTM layers to predict the next patterns in the sequence. A diagram of how this works follows:
Sequence-to-sequence model
  4. In the diagram, we can see how the start of the text HELLO is translated one step behind the response phrase SALUT (hello in French). Pay attention to how this works in the preceding code and in the vectorization sketch that follows.
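To make the one-step offset concrete, here is a minimal sketch of the vectorization step, assuming the one-hot character encoding used by the standard Keras sequence-to-sequence recipe; note how decoder_target_data is filled one time step ahead of decoder_input_data:

import numpy as np

# Each sample becomes a (time step, character) one-hot matrix.
encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
    dtype='float32')
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')

for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.
    for t, char in enumerate(target_text):
        decoder_input_data[i, t, target_token_index[char]] = 1.
        if t > 0:
            # The target is ahead of the decoder input by one time step
            # and does not include the start ("\t") character.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.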
  5. We then build the layers that will map to our network model with the following code:
# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
# Save model
model.save('s2s.h5')
  6. Note how we are creating encoder and decoder inputs along with decoder outputs. This code builds and trains the model and then saves it for later use in inference. We use the term inference to mean that a model is inferring, or generating, an answer or response to some input. A diagram of this sequence-to-sequence model in layer architecture follows, along with a short note on reloading the saved model:
Encoder/decoder inference model
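Because the trained model is saved to s2s.h5, it can be reloaded later for inference without retraining. A minimal sketch of this, using Keras's standard load_model call:

from keras.models import load_model

# Reload the trained sequence-to-sequence model that was saved above.
model = load_model('s2s.h5')
model.summary()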
  7. This model is quite complex, and there is a lot going on here. We have just covered the first part of the model. Next, we need to cover the building of the thought vector and the generation of the sampling models. The final code to do this follows:
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())
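Each reverse lookup simply inverts its token index dictionary, mapping integer indices back to characters. As a quick, hypothetical check (assuming the character 'a' appears in the input texts):

# Look up a character's index, then map the index back to the character.
index_for_a = input_token_index['a']
print(reverse_input_char_index[index_for_a])  # prints: a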

Look over the preceding code and see whether you can understand the structure. We are still missing a critical piece of the puzzle, which we will cover in the next section.
