Sentiment Analysis with Word Embeddings

In this chapter, we turn to the problem of sentiment analysis. Sentiment analysis is an umbrella term for a number of techniques for determining how a speaker feels about a certain topic or piece of content.

The most basic form of sentiment analysis is polarity detection. Given a document or text string (for instance, a tweet, a review, or a comment on a social network), the aim is to determine whether the author feels good, bad, or neutral about the item or topic in question.

At first glance, this problem might seem trivial: a lookup table of positive and negative words, plus a simple count of how often each appears, should do, right? Not so fast. Here are a few examples of why this is tricky:

  • Their decadent desserts made me hate myself
  • You should try this place if you love cold food
  • Disliking cake is not really my thing

What can we see in these examples?

  • Negative terms used in a possibly positive sense
  • Positive terms used sarcastically
  • Two negative terms that imply something positive

Note that we have not even touched on spelling mistakes, neologisms, or the use of multiple languages, to name just a few of the potential issues in real-life situations.
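To see the lookup-table idea fail in practice, here is a minimal Python sketch of naive word counting applied to the three example sentences. The tiny lexicon (`POSITIVE`, `NEGATIVE`) and the function name `naive_polarity` are hypothetical, chosen only for illustration; real sentiment lexicons are far larger, but the failure mode is the same.

```python
import re

# Hypothetical toy lexicon, for illustration only (not a real sentiment lexicon).
POSITIVE = {"love", "like", "good", "great"}
NEGATIVE = {"hate", "dislike", "disliking", "bad", "not"}


def naive_polarity(text: str) -> str:
    """Label text by counting lexicon hits: positive count minus negative count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


examples = [
    "Their decadent desserts made me hate myself",
    "You should try this place if you love cold food",
    "Disliking cake is not really my thing",
]

for sentence in examples:
    print(f"{naive_polarity(sentence):>8}  {sentence}")
```

With this lexicon the sketch labels the three sentences negative, positive, and negative, respectively: the opposite of what their authors most plausibly meant, for exactly the reasons listed above.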

As you can see, sentiment analysis is a very complicated task, and we will merely scratch the surface. 

Humans are the yardstick in sentiment analysis: the accuracy of a sentiment analysis system is measured by how often it agrees with human judgement. And how reliable is human judgement itself? Unfortunately, not very. Some researchers report disagreement rates between human annotators as high as 20%. Since humans do not agree among themselves on that 20%, even a perfect sentiment analysis system would agree with any given human on only about 80% of the classifications. An accuracy of 70%, although seemingly unimpressive, is therefore not far from that ceiling. But beware!

These figures are not directly comparable, because a computer has difficulty identifying subtleties that humans pick up far more easily, such as sarcasm, jokes, or the shifts in meaning seen in the earlier examples.
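The agreement argument is simple arithmetic, sketched here in Python. The annotator labels are made-up toy data (two hypothetical annotators agreeing on 4 of 5 items), and `percent_agreement` is a hypothetical helper that computes raw observed agreement; the 80% ceiling and the 70% accuracy are the figures from the text. Dividing accuracy by the ceiling is only one rough way to put the two numbers side by side, not a standard metric.

```python
def percent_agreement(labels_a, labels_b):
    """Raw observed agreement: fraction of items given the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical toy annotations from two human annotators (not real data).
annotator_1 = ["pos", "neg", "neu", "pos", "neg"]
annotator_2 = ["pos", "neg", "pos", "pos", "neg"]

human_ceiling = percent_agreement(annotator_1, annotator_2)  # 4 of 5 items agree
system_accuracy = 0.70  # figure quoted in the text

print(f"Human agreement (ceiling): {human_ceiling:.0%}")
print(f"System accuracy:           {system_accuracy:.0%}")
# Rough normalization (an assumption, not a standard metric):
# the share of the humanly achievable agreement that the system reaches.
print(f"Share of ceiling reached:  {system_accuracy / human_ceiling:.1%}")
```

On these numbers the system reaches 87.5% of the agreement humans achieve among themselves, which is why 70% is less disappointing than it first looks, subject to the caveat above about subtleties such as sarcasm.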

Nonetheless, individuals and organizations seem increasingly aware of the importance of their online presence. One sign of this is the mushrooming of agencies and professionals specializing in social media monitoring. This has in turn drawn more attention to the field from researchers in both academia and industry, and we believe significant advances in the area are likely.

Our focus, as stated elsewhere in the book, is to introduce you to the algorithms in the simplest way possible. We will use a well-known dataset that shares many of the properties of the real-life datasets you will encounter in practice.