Chapter 1. Packets of thought (NLP overview)
Figure 1.2. Token sorting tray
Figure 1.3. Chatbot recirculating (recurrent) pipeline
Figure 1.4. Example layers for an NLP pipeline
Figure 1.5. 2D IQ of some natural language processing systems
Chapter 2. Build your vocabulary (word tokenization)
Chapter 3. Math with words (TF-IDF vectors)
Chapter 4. Finding meaning in word counts (semantic analysis)
Figure 4.1. 3D vectors for a thought experiment about six words about pets and NYC
Figure 4.2. 3D scatter plot (point cloud) of your TF-IDF vectors
Figure 4.3. Term-document matrix reconstruction accuracy decreases as you ignore more dimensions.
Figure 4.4. Looking up from below the “belly” at the point cloud for a real object
Figure 4.5. Head-to-head horse point clouds upside-down
Figure 4.6. Semantic search accuracy deteriorates at around 12-D.
Chapter 5. Baby steps with neural networks (perceptrons and backpropagation)
Figure 5.3. A perceptron and a biological neuron
Equation 5.1. Threshold activation function
Figure 5.4. Linearly separable data
Figure 5.5. Nonlinearly separable data
Equation 5.2. Error between truth and prediction
Equation 5.3. Cost function you want to minimize
Figure 5.6. Neural net with hidden weights
Figure 5.7. Fully connected neural net
Equation 5.4. Sigmoid function
Equation 5.5. Mean squared error
Equation 5.7. Error derivative
Equation 5.8. Derivative of the previous layer
Chapter 6. Reasoning with word vectors (Word2vec)
Figure 6.1. Geometry of Word2vec math
Equation 6.1. Compute the answer to the soccer team question
Equation 6.2. Distance between the singular and plural versions of a word
Figure 6.2. Word vectors for ten US cities projected onto a 2D map
Figure 6.3. Training input and output example for the skip-gram approach
Equation 6.3. Example 3D vector
Equation 6.4. Example 3D vector after softmax
Figure 6.4. Network example for the skip-gram training
Figure 6.5. Conversion of one-hot vector to word vector
Figure 6.6. Training input and output example for the CBOW approach
Figure 6.7. CBOW Word2vec network
Equation 6.5. Bigram scoring function
Equation 6.6. Subsampling probability in Mikolov’s Word2vec paper
Equation 6.7. Subsampling probability in Mikolov’s Word2vec code
Figure 6.8. Google News Word2vec 300-D vectors projected onto a 2D map using PCA
Figure 6.10. Doc2vec training uses an additional document vector as input.
Chapter 7. Getting words in order with convolutional neural networks (CNNs)
Figure 7.1. Fully connected neural net
Figure 7.2. Window convolving over function
Figure 7.3. Small telephone pole image
Figure 7.4. Pixel values for the telephone pole image
Figure 7.5. Convolutional neural net step
Figure. Sum of the gradients for a filter weight
Chapter 8. Loopy (recurrent) neural networks (RNNs)
Figure 8.1. 1D convolution with embeddings
Figure 8.2. Text into a feedforward network
Figure 8.3. Recurrent neural net
Figure 8.4. Unrolled recurrent neural net
Figure 8.5. Detailed recurrent neural net at time step t = 0
Figure 8.6. Detailed recurrent neural net at time step t = 1
Figure 8.7. Data into convolutional network
Figure 8.8. Data fed into a recurrent network
Figure 8.9. Only last output matters here
Figure 8.10. Backpropagation through time
Figure 8.11. All outputs matter here
Figure 8.12. Multiple outputs and backpropagation through time
Chapter 9. Improving retention with long short-term memory networks
Figure 9.1. LSTM network and its memory
Figure 9.2. Unrolled LSTM network and its memory
Figure 9.3. LSTM layer at time step t
Figure 9.5. First stop—the forget gate
Figure 9.7. Forget gate application
Figure 9.9. Update/output gate
Figure 9.10. Next word prediction
Figure 9.11. Next character prediction
Chapter 10. Sequence-to-sequence models and attention
Figure 10.1. Limitations of language modeling
Figure 10.2. Encoder-decoder sandwich with thought vector meat
Figure 10.3. Unrolled encoder-decoder
Figure 10.4. Next word prediction
Figure 10.5. Input and target sequence before preprocessing
Figure 10.6. Input and target sequence after preprocessing
Figure 10.8. LSTM states used in the sequence-to-sequence encoder
Chapter 11. Information extraction (named entity extraction and question answering)
Chapter 12. Getting chatty (dialog engines)
Figure 12.1. Chatbot techniques used for some example applications
Figure 12.2. Managing state (context)
Figure 12.3. Advantages and disadvantages of four chatbot approaches
Chapter 13. Scaling up (optimization, parallelization, and batch processing)
Figure 13.1. Comparison between a CPU and GPU
Figure 13.2. Matrix multiplication where each row multiplication can be parallelized on a GPU
Figure 13.3. Loading the training data without a generator function
Figure 13.4. Loading the training data with a generator function
Figure 13.5. Visualize Word2vec embeddings with TensorBoard.
Appendix C. Vectors and matrices (linear algebra fundamentals)
Appendix D. Machine learning tools and techniques
Figure D.1. Overfit on training samples
Figure D.2. Underfit on training samples
Appendix E. Setting up your AWS GPU
Figure E.1. AWS Management Console
Figure E.2. Creating a new AWS instance
Figure E.3. Selecting an AWS Machine Image
Figure E.4. Cost overview for the machine image and the available instance types in your AWS region
Figure E.5. Choosing your instance type
Figure E.6. Adding storage to your instance
Figure E.7. Adding persistent storage to your instance
Figure E.8. Reviewing your instance setup before launching
Figure E.9. Creating a new instance key (or downloading an existing one)
Figure E.10. AWS launch confirmation
Figure E.11. EC2 Dashboard showing the newly created instance
Figure E.12. Confirmation request to exchange ssh credentials
Figure E.13. Welcome screen after a successful login
Figure E.14. Activating your pre-installed Keras environment
Appendix F. Locality sensitive hashing