Part three focuses on text data and introduces state-of-the-art unsupervised learning techniques to extract high-quality signals from this key source of alternative data.
Chapter 13, Working with Text Data, demonstrates how to convert text data into a numerical format and applies the classification algorithms from part two to sentiment analysis on large datasets.
Chapter 14, Topic Modeling, applies Bayesian unsupervised learning to extract latent topics that can summarize a large number of documents and offer more effective ways to explore text data or to use topics as features for a classification model. The chapter demonstrates how to apply this technique to earnings call transcripts sourced in Chapter 3, Alternative Data for Finance, and to annual reports filed with the Securities and Exchange Commission (SEC).
Chapter 15, Word Embeddings, uses neural networks to learn state-of-the-art language features in the form of word vectors that capture semantic context much better than traditional text features. These vectors represent a very promising avenue for extracting trading signals from text data.