F.3. “Like” prediction

Figure F.2 is what a collection of tweets looks like in hyperspace. These are the 2D shadows of 100D tweet topic vectors (points) from latent semantic analysis of those tweets. Most of the marks represent tweets that were liked at least once; a minority of marks are for tweets that received zero likes.

Figure F.2. Scatter matrix of four topics for tweets

An LDA (linear discriminant analysis) model fit to these topic vectors will classify the tweets correctly 80% of the time. However, like your SMS dataset, your tweet dataset is also very imbalanced, so that accuracy figure is misleading: a model can score well simply by favoring the majority class. Predicting the likability of new tweets using this model isn’t likely to be very accurate. You should probably only use LSA, LDA, and LDiA language models for classification problems where variance maximization (class separability) is helpful:

  • Semantic search
  • Sentiment analysis
  • Spam detection
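
The imbalance warning above can be made concrete with a small sketch. The label counts here are made up for illustration (80 liked tweets and 20 unliked ones) and are not the real counts from the tweet dataset, and the helper names `accuracy`, `recall_per_class`, and `balanced_accuracy` are hypothetical. The sketch compares plain accuracy with balanced accuracy (the mean of the per-class recalls) for a "model" that ignores its input and always predicts the majority class.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its examples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# ASSUMPTION: invented labels, 1 = liked at least once, 0 = zero likes.
y_true = [1] * 80 + [0] * 20

# A baseline that never looks at the tweet and always predicts "liked".
y_majority = [1] * len(y_true)

print(accuracy(y_true, y_majority))           # 0.8
print(recall_per_class(y_true, y_majority))   # {1: 1.0, 0: 0.0}
print(balanced_accuracy(y_true, y_majority))  # 0.5
```

With these invented counts the do-nothing baseline reaches 80% accuracy while never identifying a single unliked tweet. On an imbalanced dataset, it is worth comparing any classifier against such a baseline and checking per-class recall before trusting an accuracy score.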

For more subtle discrimination between texts, the kind that relies on generalizing from similarities in semantic content, you’ll want the most sophisticated NLP tools in your toolbox. Use LSTM deep learning models and t-SNE dimension reduction techniques to solve difficult problems such as

  • Human reaction prediction (tweet likability)
  • Machine translation
  • Natural language generation