List of Figures

Chapter 1. Packets of thought (NLP overview)

Figure 1.1. Kinds of automata

Figure 1.2. Token sorting tray

Figure 1.3. Chatbot recirculating (recurrent) pipeline

Figure 1.4. Example layers for an NLP pipeline

Figure 1.5. 2D IQ of some natural language processing systems

Chapter 2. Build your vocabulary (word tokenization)

Figure 2.1. Tokenized phrase

Figure 2.2. Tokenized phrase

Figure 2.3. Tokenized phrase

Chapter 3. Math with words (TF-IDF vectors)

Figure 3.1. 2D vectors

Figure 3.2. 2D term frequency vectors

Figure 3.3. 2D thetas

Figure 3.4. City population distribution

Chapter 4. Finding meaning in word counts (semantic analysis)

Figure 4.1. 3D vectors for a thought experiment about six words about pets and NYC

Figure 4.2. 3D scatter plot (point cloud) of your TF-IDF vectors

Figure 4.3. Term-document matrix reconstruction accuracy decreases as you ignore more dimensions.

Figure 4.4. Looking up from below the “belly” at the point cloud for a real object

Figure 4.5. Head-to-head horse point clouds upside-down

Figure 4.6. Semantic search accuracy deteriorates at around 12-D.

Chapter 5. Baby steps with neural networks (perceptrons and backpropagation)

Figure 5.1. Neuron cell

Figure 5.2. Basic perceptron

Figure 5.3. A perceptron and a biological neuron

Equation 5.1. Threshold activation function

Figure 5.4. Linearly separable data

Figure 5.5. Nonlinearly separable data

Equation 5.2. Error between truth and prediction

Equation 5.3. Cost function you want to minimize

Figure 5.6. Neural net with hidden weights

Figure 5.7. Fully connected neural net

Equation 5.4. Sigmoid function

Equation 5.5. Mean squared error

Equation 5.6. Chain rule

Equation 5.7. Error derivative

Equation 5.8. Derivative of the previous layer

Figure 5.8. Convex error curve

Figure 5.9. Nonconvex error curve

Chapter 6. Reasoning with word vectors (Word2vec)

Figure 6.1. Geometry of Word2vec math

Equation 6.1. Compute the answer to the soccer team question

Equation 6.2. Distance between the singular and plural versions of a word

Figure 6.2. Word vectors for ten US cities projected onto a 2D map

Figure 6.3. Training input and output example for the skip-gram approach

Equation 6.3. Example 3D vector

Equation 6.4. Example 3D vector after softmax

Figure 6.4. Network example for the skip-gram training

Figure 6.5. Conversion of one-hot vector to word vector

Figure 6.6. Training input and output example for the CBOW approach

Figure 6.7. CBOW Word2vec network

Equation 6.5. Bigram scoring function

Equation 6.6. Subsampling probability in Mikolov’s Word2vec paper

Equation 6.7. Subsampling probability in Mikolov’s Word2vec code

Figure 6.8. Google News Word2vec 300-D vectors projected onto a 2D map using PCA

Figure 6.9. Decoder rings (left: Hubert Berberich (HubiB) (https://commons.wikimedia.org/wiki/File:CipherDisk2000.jpg), CipherDisk2000, marked as public domain, more details on Wikimedia Commons: https://commons.wikimedia.org/wiki/Template:PD-self; middle: Cory Doctorow (https://www.flickr.com/photos/doctorow/2817314740/in/photostream/), Crypto wedding-ring 2, https://creativecommons.org/licenses/by-sa/2.0/legalcode; right: Sobebunny (https://commons.wikimedia.org/wiki/File:Captain-midnight-decoder.jpg), Captain-midnight-decoder, https://creativecommons.org/licenses/by-sa/3.0/legalcode)

Figure 6.10. Doc2vec training uses an additional document vector as input.

Chapter 7. Getting words in order with convolutional neural networks (CNNs)

Figure 7.1. Fully connected neural net

Figure 7.2. Window convolving over function

Figure 7.3. Small telephone pole image

Figure 7.4. Pixel values for the telephone pole image

Figure 7.5. Convolutional neural net step

Figure 7.6. Convolution

Figure. Sum of the gradients for a filter weight

Figure 7.7. 1D convolution

Figure 7.8. 1D convolution with embeddings

Figure 7.9. Pooling layers

Chapter 8. Loopy (recurrent) neural networks (RNNs)

Figure 8.1. 1D convolution with embeddings

Figure 8.2. Text into a feedforward network

Figure 8.3. Recurrent neural net

Figure 8.4. Unrolled recurrent neural net

Figure 8.5. Detailed recurrent neural net at time step t = 0

Figure 8.6. Detailed recurrent neural net at time step t = 1

Figure 8.7. Data into convolutional network

Figure 8.8. Data fed into a recurrent network

Figure 8.9. Only last output matters here

Figure 8.10. Backpropagation through time

Figure 8.11. All outputs matter here

Figure 8.12. Multiple outputs and backpropagation through time

Figure 8.13. Bidirectional recurrent neural net

Chapter 9. Improving retention with long short-term memory networks

Figure 9.1. LSTM network and its memory

Figure 9.2. Unrolled LSTM network and its memory

Figure 9.3. LSTM layer at time step t

Figure 9.4. LSTM layer inputs

Figure 9.5. First stop—the forget gate

Figure 9.6. Forget gate

Figure 9.7. Forget gate application

Figure 9.8. Candidate gate

Figure 9.9. Update/output gate

Figure 9.10. Next word prediction

Figure 9.11. Next character prediction

Figure 9.12. Last character prediction only

Figure 9.13. Stacked LSTM

Chapter 10. Sequence-to-sequence models and attention

Figure 10.1. Limitations of language modeling

Figure 10.2. Encoder-decoder sandwich with thought vector meat

Figure 10.3. Unrolled encoder-decoder

Figure 10.4. Next word prediction

Figure 10.5. Input and target sequence before preprocessing

Figure 10.6. Input and target sequence after preprocessing

Figure 10.7. Thought encoder

Figure 10.8. LSTM states used in the sequence-to-sequence encoder

Figure 10.9. Thought decoder

Figure 10.10. Bucketing applied to target sequences

Figure 10.11. Overview of the attention mechanism

Chapter 11. Information extraction (named entity extraction and question answering)

Figure 11.1. Stanislav knowledge graph

Figure 11.2. The Pascagoula people

Chapter 12. Getting chatty (dialog engines)

Figure 12.1. Chatbot techniques used for some example applications

Figure 12.2. Managing state (context)

Figure 12.3. Advantages and disadvantages of four chatbot approaches

Chapter 13. Scaling up (optimization, parallelization, and batch processing)

Figure 13.1. Comparison between a CPU and GPU

Figure 13.2. Matrix multiplication where each row multiplication can be parallelized on a GPU

Figure 13.3. Loading the training data without a generator function

Figure 13.4. Loading the training data with a generator function

Figure 13.5. Visualize Word2vec embeddings with TensorBoard.

Appendix C. Vectors and matrices (linear algebra fundamentals)

Figure C.1. Measuring Euclidean distance

Appendix D. Machine learning tools and techniques

Figure D.1. Overfit on training samples

Figure D.2. Underfit on training samples

Figure D.3. K-fold cross-validation

Figure. L1 regularization

Figure. L2 regularization

Figure D.4. The entries in the leftmost column are examples from the original MNIST; the other columns are all affine transformations of the data included in affNIST

Appendix E. Setting up your AWS GPU

Figure E.1. AWS Management Console

Figure E.2. Creating a new AWS instance

Figure E.3. Selecting an AWS Machine Image

Figure E.4. Cost overview for the machine image and the available instance types in your AWS region

Figure E.5. Choosing your instance type

Figure E.6. Adding storage to your instance

Figure E.7. Adding persistent storage to your instance

Figure E.8. Reviewing your instance setup before launching

Figure E.9. Creating a new instance key (or downloading an existing one)

Figure E.10. AWS launch confirmation

Figure E.11. EC2 Dashboard showing the newly created instance

Figure E.12. Confirmation request to exchange ssh credentials

Figure E.13. Welcome screen after a successful login

Figure E.14. Activating your pre-installed Keras environment

Figure E.15. AWS Billing Dashboard

Figure E.16. AWS Budget Console

Appendix F. Locality sensitive hashing

Figure F.1. Semantic search with LSHash

Figure F.2. Scatter matrix of four topics for tweets
