Model training and evaluation

For illustration, we create a document-term matrix that keeps terms appearing in between 0.5% and 50% of documents, which yields around 1,560 features. Training a 15-topic model with 25 passes over the corpus takes a bit over two minutes on a four-core i7.
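The document-frequency filter can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the notebook's code. It assumes simple lowercase whitespace tokenization. `build_dtm` is a hypothetical helper that represents each row of the matrix as a `Counter`. scikit-learn's `CountVectorizer` exposes the same filter through its `min_df` and `max_df` parameters when they are given as proportions.

```python
from collections import Counter


def build_dtm(docs, min_df=0.005, max_df=0.5):
    """Hypothetical helper: build a sparse document-term matrix (one Counter
    per document), keeping only terms whose document frequency, as a share
    of all documents, lies in [min_df, max_df]."""
    # Assumption: naive lowercase whitespace tokenization
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(t for t, count in df.items()
                   if min_df <= count / n_docs <= max_df)
    keep = set(vocab)
    # Each row counts occurrences of the retained vocabulary terms only
    dtm = [Counter(t for t in tokens if t in keep) for tokens in tokenized]
    return vocab, dtm


docs = ["revenue grew this quarter", "revenue fell this quarter",
        "trial results this quarter", "supply chain this quarter"]
vocab, dtm = build_dtm(docs, min_df=0.25, max_df=0.5)
print(vocab)
```

On this toy corpus, "this" and "quarter" occur in every document, so they exceed the 50% ceiling and are dropped. The same mechanism removes boilerplate terms from earnings call transcripts.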

The top 10 words per topic identify several distinct themes that range from obvious financial information to clinical trials (topic 4) and supply chain issues (topic 12).
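Selecting the top words per topic amounts to sorting each row of the topic-word probability matrix. Below is a minimal sketch that assumes the matrix is a plain list of rows aligned with a vocabulary list. `top_words` is a hypothetical helper. With gensim, `LdaModel.show_topic(topicid, topn=10)` returns these terms along with their probabilities.

```python
def top_words(topic_word, vocab, n=10):
    """Hypothetical helper: return the n highest-probability terms per topic.

    topic_word: one row per topic, each a list of term probabilities
    aligned with vocab (an assumed input format)."""
    result = []
    for row in topic_word:
        # Rank term indices by descending probability and keep the first n
        ranked = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)
        result.append([vocab[j] for j in ranked[:n]])
    return result


vocab = ["revenue", "margin", "trial", "fda", "supplier"]
topic_word = [[0.50, 0.30, 0.10, 0.05, 0.05],
              [0.05, 0.05, 0.40, 0.45, 0.05]]
print(top_words(topic_word, vocab, n=2))
```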

Using pyLDAvis' relevance metric with λ = 0.6, which weights a term's probability within the topic against its lift (that probability divided by the term's overall frequency in the corpus), makes topic definitions more intuitive, as illustrated for topic 14 about sales performance:

 Sales performance for Topic 14
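The relevance score that pyLDAvis uses to rank terms, as defined by Sievert and Shirley, is λ·log p(w|k) + (1 − λ)·log(p(w|k) / p(w)). With λ = 1, terms are ranked purely by their probability within the topic. With λ = 0, they are ranked purely by lift. The sketch below computes the score directly. The helper names and the toy probabilities are hypothetical and serve only to show how λ = 0.6 demotes generic terms.

```python
import math


def relevance(p_w_given_k, p_w, lam=0.6):
    """Relevance of term w for topic k: lam * log p(w|k) + (1 - lam) * log lift."""
    return lam * math.log(p_w_given_k) + (1 - lam) * math.log(p_w_given_k / p_w)


def rank_by_relevance(topic_probs, corpus_probs, lam=0.6, n=10):
    """Hypothetical helper: rank a topic's terms by relevance, highest first."""
    scores = {w: relevance(p, corpus_probs[w], lam) for w, p in topic_probs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy values: "quarter" is common everywhere, "backlog" is rare but topic-specific
topic_probs = {"quarter": 0.10, "sales": 0.08, "backlog": 0.03}
corpus_probs = {"quarter": 0.09, "sales": 0.02, "backlog": 0.002}
for lam in (1.0, 0.6, 0.0):
    print(lam, rank_by_relevance(topic_probs, corpus_probs, lam))
```

At λ = 1, the generic term "quarter" ranks first. At λ = 0.6, it drops to last because its lift is close to one.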

The notebook also illustrates how to look up documents by their topic association. An analyst can then review the relevant statements for nuances, apply sentiment analysis to further process the topic-specific text, or assign labels derived from market prices.
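A topic-based lookup can be sketched as a filter over per-document topic distributions. This sketch assumes the distributions are available as a dictionary that maps document IDs to `{topic_id: share}` mappings. In gensim, `LdaModel.get_document_topics(bow)` returns such `(topic_id, probability)` pairs for each document. `docs_for_topic`, the 0.5 threshold, and the document IDs are hypothetical choices made for illustration.

```python
def docs_for_topic(doc_topics, topic, min_share=0.5):
    """Hypothetical helper: return (doc_id, share) pairs for documents whose
    weight on `topic` is at least min_share, strongest association first."""
    hits = [(doc_id, dist.get(topic, 0.0))
            for doc_id, dist in doc_topics.items()
            if dist.get(topic, 0.0) >= min_share]
    # Sort by topic share so the most representative documents come first
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


doc_topics = {
    "call_a": {14: 0.70, 4: 0.30},
    "call_b": {4: 0.90, 14: 0.10},
    "call_c": {14: 0.55, 12: 0.45},
}
print(docs_for_topic(doc_topics, 14))
```

The matching statements can then be passed to a sentiment model or joined with subsequent returns for labeling.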
