List of Listings

Chapter 2. Build your vocabulary (word tokenization)

Listing 2.1. Example Monticello sentence split into tokens

Listing 2.2. One-hot vector sequence for the Monticello sentence

Listing 2.3. Prettier one-hot vectors

Listing 2.4. Construct a DataFrame of bag-of-words vectors

Listing 2.5. Example dot product calculation

Listing 2.6. Overlap of word counts for two bag-of-words vectors

Listing 2.7. Tokenize the Monticello sentence with a regular expression

Listing 2.8. NLTK list of stop words

Listing 2.9. NLTK list of stop words

Chapter 3. Math with words (TF-IDF vectors)

Listing 3.1. Compute cosine similarity in Python

Chapter 4. Finding meaning in word counts (semantic analysis)

Listing 4.1. The SMS spam dataset

Listing 4.2. Topic-word matrix for LSA on 16 short sentences about cats, dogs, and NYC

Listing 4.3. U (m × p)

Listing 4.4. S (p × p)

Listing 4.5. Vᵀ (p × n)

Listing 4.6. Term-document matrix reconstruction error

Listing 4.7. Pairwise distances available in sklearn

Chapter 5. Baby steps with neural networks (perceptrons and backpropagation)

Listing 5.1. OR problem setup

Listing 5.2. Perceptron random guessing

Listing 5.3. Perceptron learning

Listing 5.4. XOR Keras network

Listing 5.5. Fit model to the XOR training set

Listing 5.6. Save the trained model

Chapter 6. Reasoning with word vectors (Word2vec)

Listing 6.1. Compute nessvector

Listing 6.2. Parameters to control Word2vec model training

Listing 6.3. Instantiating a Word2vec model

Listing 6.4. Loading a saved Word2vec model

Listing 6.5. Load a pretrained Word2vec model using nlpia

Listing 6.6. Examine Word2vec vocabulary frequencies

Listing 6.7. Distance between “Illinois” and “Illini”

Listing 6.8. Some US city data

Listing 6.9. Some US state data

Listing 6.10. Augment city word vectors with US state word vectors

Listing 6.11. Bubble chart of US cities

Listing 6.12. Bubble plot of US city word vectors

Listing 6.13. Train your own document and word vectors

Chapter 7. Getting words in order with convolutional neural networks (CNNs)

Listing 7.1. Keras network with one convolution layer

Listing 7.2. Import your Keras convolution tools

Listing 7.3. Preprocessor to load your documents

Listing 7.4. Vectorizer and tokenizer

Listing 7.5. Target labels

Listing 7.6. Train/test split

Listing 7.7. CNN parameters

Listing 7.8. Padding and truncating your token sequence

Listing 7.9. Gathering your augmented and truncated data

Listing 7.10. Construct a 1D CNN

Listing 7.11. Fully connected layer with dropout

Listing 7.12. Funnel

Listing 7.13. Compile the CNN

Listing 7.14. Output layer for categorical variable (word)

Listing 7.15. Training a CNN

Listing 7.16. Save your hard work

Listing 7.17. Loading a saved model

Listing 7.18. Test example

Listing 7.19. Prediction

Chapter 8. Loopy (recurrent) neural networks (RNNs)

Listing 8.1. Import all the things

Listing 8.2. Data preprocessor

Listing 8.3. Data tokenizer + vectorizer

Listing 8.4. Target unzipper

Listing 8.5. Load and prepare your data

Listing 8.6. Initialize your network parameters

Listing 8.7. Load your test and training data

Listing 8.8. Initialize an empty Keras network

Listing 8.9. Add a recurrent layer

Listing 8.10. Add a dropout layer

Listing 8.11. Compile your recurrent network

Listing 8.12. Train and save your model

Listing 8.13. Model parameters

Listing 8.14. Build a larger network

Listing 8.15. Train your larger network

Listing 8.16. Crummy weather sentiment

Listing 8.17. Build a Bidirectional recurrent network

Chapter 9. Improving retention with long short-term memory networks

Listing 9.1. LSTM layer in Keras

Listing 9.2. Load and prepare the IMDB data

Listing 9.3. Build a Keras LSTM network

Listing 9.4. Fit your LSTM model

Listing 9.5. Save it for later

Listing 9.6. Reload your LSTM model

Listing 9.7. Use the model to predict on a sample

Listing 9.8. Optimize the thought vector size

Listing 9.9. Optimize LSTM hyperparameters

Listing 9.10. A more optimally sized LSTM

Listing 9.11. Train a smaller LSTM

Listing 9.12. Prepare the data

Listing 9.13. Calculate the average sample length

Listing 9.14. Prepare the strings for a character-based model

Listing 9.15. Pad and truncate characters

Listing 9.16. Character-based model “vocabulary”

Listing 9.17. One-hot encoder for characters

Listing 9.18. Load and preprocess the IMDB data

Listing 9.19. Split dataset for training (80%) and testing (20%)

Listing 9.20. Build a character-based LSTM

Listing 9.21. Train a character-based LSTM

Listing 9.22. And save it for later

Listing 9.23. Import the Project Gutenberg dataset

Listing 9.24. Preprocess Shakespeare plays

Listing 9.25. Assemble a training set

Listing 9.26. One-hot encode the training examples

Listing 9.27. Assemble a character-based LSTM model for generating text

Listing 9.28. Train your Shakespearean chatbot

Listing 9.29. Sampler to generate character sequences

Listing 9.30. Generate three texts with three diversity levels

Listing 9.31. Gated recurrent units in Keras

Listing 9.32. Two LSTM layers

Chapter 10. Sequence-to-sequence models and attention

Listing 10.1. Thought encoder in Keras

Listing 10.2. Thought decoder in Keras

Listing 10.3. Keras functional API (Model())

Listing 10.4. Train a sequence-to-sequence model in Keras

Listing 10.5. Decoder for generating text using the generic Keras Model

Listing 10.6. Sequence generator for random thoughts

Listing 10.7. Simple decoder—next word prediction

Listing 10.8. Build character sequence-to-sequence training set

Listing 10.9. Character sequence-to-sequence model parameters

Listing 10.10. Construct character sequence encoder-decoder training set

Listing 10.11. Construct and train a character sequence encoder-decoder network

Listing 10.12. Construct response generator model

Listing 10.13. Build a character-based translator

Chapter 11. Information extraction (named entity extraction and question answering)

Listing 11.1. Pattern hardcoded in Python

Listing 11.2. Brittle pattern-matching example

Listing 11.3. Regular expression for GPS coordinates

Listing 11.4. Regular expression for US dates

Listing 11.5. Structuring extracted dates

Listing 11.6. Basic context maintenance

Listing 11.7. Regular expression for European dates

Listing 11.8. Recognizing years

Listing 11.9. Recognizing month words with regular expressions

Listing 11.10. Combining information extraction regular expressions

Listing 11.11. Validating dates

Listing 11.12. POS tagging with spaCy

Listing 11.13. Visualize a dependency tree

Listing 11.14. Helper functions for spaCy tagged strings

Listing 11.15. Example spaCy POS pattern

Listing 11.16. Creating a POS pattern matcher with spaCy

Listing 11.17. Using a POS pattern matcher

Listing 11.18. Combining multiple patterns for a more robust pattern matcher

Chapter 12. Getting chatty (dialog engines)

Listing 12.1. nlpia/book/examples/greeting.v2.aiml

Listing 12.2. nlpia/nlpia/data/greeting_step1.aiml

Listing 12.3. nlpia/book/examples/ch12.py

Listing 12.4. nlpia/nlpia/book/examples/ch12.py

Listing 12.5. nlpia/data/greeting_step2.aiml

Listing 12.6. nlpia/nlpia/book/examples/ch12.py

Listing 12.7. nlpia/nlpia/data/greeting_step3.aiml

Listing 12.8. nlpia/nlpia/book/examples/ch12.py

Listing 12.9. ch12_retrieval.py

Listing 12.10. ch12_retrieval.py

Listing 12.11. ch12_retrieval.py

Listing 12.12. ch12_retrieval.py

Listing 12.13. ch12_retrieval.py

Listing 12.14. ch12_retrieval.py

Listing 12.15. ch12_retrieval.py

Listing 12.16. ch12_retrieval.py

Listing 12.17. ch12_chatterbot.sql

Chapter 13. Scaling up (optimization, parallelization, and batch processing)

Listing 13.1. Load word2vec vectors

Listing 13.2. Initialize 300D AnnoyIndex

Listing 13.3. Add each word vector to the AnnoyIndex

Listing 13.4. Build Euclidean distance index with 15 trees

Listing 13.5. Find Harry_Potter neighbors with AnnoyIndex

Listing 13.6. Top Harry_Potter neighbors with gensim.KeyedVectors index

Listing 13.7. Build a cosine distance index

Listing 13.8. Build a cosine distance index

Listing 13.9. Harry_Potter neighbors in a cosine distance world

Listing 13.10. Search results accuracy for top 10

Listing 13.11. MinMaxScaler for low-dimensional vectors

Listing 13.12. Error message if your training data exceeds the GPU’s memory

Listing 13.13. Generator for improved RAM efficiency

Listing 13.14. Convert an embedding into a TensorBoard projection

Appendix A. Your NLP tools

Listing A.1. Install Anaconda3

Listing A.2. Install nlpia source with conda

Listing A.3. Install developer tools with apt

Listing A.4. Install brew

Listing A.5. Install developer tools

Listing A.6. bash_profile

Appendix B. Playful Python and regular expressions

Listing B.1. Regex OR symbol

Listing B.2. Regex grouping parentheses

Appendix C. Vectors and matrices (linear algebra fundamentals)

Listing C.1. Create a vector

Listing C.2. Vector difference

Listing C.3. Cosine distance

Appendix D. Machine learning tools and techniques

Listing D.1. A dropout layer in Keras reduces overfitting

Listing D.2. BatchNormalization

Listing D.3. Count what the model got right

Listing D.4. Count what the model got wrong

Listing D.5. Confusion matrix

Listing D.6. Precision

Listing D.7. Recall

Listing D.8. RMSE

Listing D.9. Correlation

Appendix E. Setting up your AWS GPU

Listing E.1. $HOME/.ssh/config

Appendix F. Locality sensitive hashing

Listing F.1. Explore high-dimensional space

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset