7

SENTIMENT ANALYSIS

Sentiment analysis is a special form of vocabulary analysis that focuses exclusively on the emotional content of text. No matter how carefully written, every document contains telltale clues about its author's emotional state and views toward the subject discussed. Two newspapers covering a sporting event, one based in the winner's hometown and one in the loser's, will report the same factual score but will almost certainly differ in their portrayals of it. The winning team's paper will likely use a higher density of positive words to describe the game, while the losing team's paper will use a greater percentage of negative words. Similarly, a political speech motivating the public to vote in an upcoming referendum will contain very energetic words prompting the audience to action. Sentiment analysis is designed to quantify such latent emotional content by using precompiled lexicons to score the overall emotional charge of a given text. Numerous lexicons are available, ranging from simple concepts like tone and energy to more complex ones like the ability to conceptualize.

Examining Emotions

Evolution

Inferring emotional content from text has a long history in the field of content analysis, with practitioners as early as the 1950s discussing the “isomorphic relation between behavioral states and quantitative properties of lexical content” (De Sola Pool, 1959, p. 90). Underlying the study of emotional content in language is the hypothesis that “the more units of content there are in a language sample about an emotion, the greater the intensity of that emotion in the speaker at the time he uttered the content” (De Sola Pool, 1959, p. 90). In other words, an author's words reflect not only the concepts she wishes to express, but also subconsciously say a great deal about her emotional state at the time she wrote them. Psychologists were the first to explore the field in detail, with Heise's 1965 lexicon of 1,000 words measured along dimensions of Evaluation, Potency, and Activation, followed a few years later by the Gottschalk and Gleser scales (1969), which today measure everything from inward and outward hostility to achievement strivings and human relations (http://www.gb-software.com/).

One of the most extensively tested sentiment databases is the Dictionary of Affect in Language (DAL), developed in 1984 (Sweeney and Whissell) and continuously refined over the last two decades. There are three dimensions of sentiment measured by the DAL: Evaluation, Activation, and Imagery. Evaluation is the emotional polarity of a word, whether it is positive or negative and the degree to which it falls in either category. The word love, for example, has a very strong positive score, while like has a lesser positive score. Activation refers to the energy level suggested by a word, such as the difference between arrest and detain, or promote and accept. Campaign speeches and other motivational text tend to make heavy use of highly active wording to encourage the audience to take action on a particular cause (Lazarevska, Sholl, and Young, 2005). Imagery measures the ability of the audience to internalize or conceptualize a particular idea. Words which are readily visualized, such as embrace, tend to cater to a broader audience than more abstract terms like accommodate. The current DAL contains nearly 10,000 entries and will match 90 percent of the words in a typical English document (Whissell, n.d.).

The Dictionary of Affect in Language has found application in a wide range of content analysis contexts, including presidential campaign speeches and magazine advertisements (Crandon and Lombardi, 2005), news media headlines (Fournier, Dewson, and Whissell, 1986) and even song lyrics by Paul McCartney and John Lennon, offering a biographical sketch of their lives through their emotions (Whissell and Dewson, 1986).

Evaluation

One of the first questions asked about automated sentiment analysis is how it compares with human-coded sentiment scores. While modern automatic text categorization techniques allow computers to achieve very high accuracy in bulk classification of large text archives, analyzing the emotional content of a document would appear to be a more nuanced task. A 1997 study from the Wharton School of Business compared several human and machine-based sentiment analysis methods in an examination of hostile conflicts among American expatriate and Chinese managers in China (Doucet and Jehn, 1997). In this case, 76 American managers in Sino-American joint ventures were asked to describe both intracultural and intercultural conflicts. The responses were transcribed into textual form for analysis. Human analysts judged each transcription and compiled a list of conflict terms, placing each conflict on a scale along several measures, such as whether it represented a battle or fight, whether it was about policies or authority, or whether it related to scheduling problems. The authors also used an electronic thesaurus to compile a list of all words related to their core themes of conflict, scoring transcriptions by keyword matches against these categories. Finally, they applied the Dictionary of Affect in Language to examine the responses.

The researchers found that intercultural conflicts were characterized by passive language (embarrassed, incompetent), while intracultural conflicts were more active and aggressive, a situation ideally suited for machine sentiment analysis. Most importantly, they discovered that all of the techniques, including the human coding, resulted in the same final conclusions, but each selected very different vocabulary on which to base their conclusions. While their focus was on exploring interoffice hostility in the context of culture, the authors provide a strong validation for the use of automated techniques. Using human judges to score documents along a set of emotional scales requires significant expenditure of both money and time, while purely automated techniques like DAL can scale to archives of nearly unlimited size with minimal investment.

Unlike automated techniques, human coders often exhibit slight variations in their coding output. One coder might score the word anxious as being very negative, while another might score it as only somewhat negative or neutral. Given that automated sentiment tools are based on datasets compiled by human coders, reliability tests of those underlying coding datasets are critical to vouching for the overall accuracy of the system. The Dictionary of Affect in Language has been subjected to a rigorous battery of reliability tests several times in its life (Whissell, Fournier, Pelland, Weir, and Makarec, 1986), with reliability reaching as high as 87 percent in some dimensions. Data collection is quite extensive in the DAL, with words being scored as many as eight times each, and values merged into a final score, generating some 186,000 total scorings for the corpus.

Analytical Resolution: Documents versus Objects

The most common application of sentiment analysis is in scoring the tone of entire documents, but it can also be applied to specific objects within a larger text. In its document-level form, each word in the document is checked against the sentiment lexicon and assigned a score along each measured dimension. These scores are averaged together to arrive at a mean score for the entire document. Measuring emotional content at the document level yields an aggregated emotional picture of the document as a whole.
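The document-level procedure described above can be sketched in a few lines of code. This is a minimal illustration only: the tiny lexicon and its dimension values are invented for the example, whereas a real system would load thousands of human-scored entries such as the DAL's.

```python
# Minimal sketch of document-level sentiment scoring.
# The lexicon below is invented for illustration, not real DAL data.
LEXICON = {
    "love": {"evaluation": 2.8, "activation": 2.0},
    "like": {"evaluation": 2.0, "activation": 1.5},
    "hate": {"evaluation": 1.1, "activation": 2.4},
}

def score_document(text, lexicon=LEXICON):
    """Average the lexicon scores of all matched words in the text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    matched = [lexicon[w] for w in words if w in lexicon]
    if not matched:
        return None  # no lexicon words found in this document
    dims = matched[0].keys()
    return {d: sum(m[d] for m in matched) / len(matched) for d in dims}

print(score_document("I love the plot but hate the ending."))
```

Note that unmatched words simply fall out of the average; the DAL's roughly 90 percent coverage of typical English text is what makes this kind of averaging meaningful in practice.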

Object-level sentiment analysis has gained in popularity with the explosive popularity of consumer feedback web sites and blogs. Rather than measuring the overall sentiment of the entire document, semantic processing is performed to link sentiment words to the objects they describe. A review of a digital camera might be positive overall toward a particular model, but contain very positive remarks about its lens and very negative remarks about its battery life. Separating these emotional associations allows manufacturers to rapidly understand how consumers “in the wild” are using their products.
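A crude sketch of the object-level idea, assuming a made-up opinion lexicon and a fixed list of product aspects, is to attach each opinion word to the nearest aspect term in the same sentence; real systems use full semantic parsing rather than word distance.

```python
# Sketch of object-level sentiment: attribute each opinion word to the
# nearest aspect mention in the same sentence. Both word lists are
# invented for illustration.
OPINIONS = {"sharp": 1, "excellent": 1, "poor": -1, "short": -1}
ASPECTS = {"lens", "battery", "screen"}

def aspect_sentiment(review):
    scores = {}
    for sentence in review.lower().split("."):
        words = [w.strip(",!?") for w in sentence.split()]
        aspect_positions = [(i, w) for i, w in enumerate(words) if w in ASPECTS]
        if not aspect_positions:
            continue
        for i, w in enumerate(words):
            if w in OPINIONS:
                # attribute the opinion to the closest aspect mention
                _, nearest = min(aspect_positions, key=lambda p: abs(p[0] - i))
                scores[nearest] = scores.get(nearest, 0) + OPINIONS[w]
    return scores

print(aspect_sentiment("The lens is sharp. Battery life is poor and short."))
```

Separating the per-aspect scores this way is what lets an overall-positive review still register the negative remarks about battery life.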

Hand-crafted versus Automatically Generated Lexicons

Most sentiment analyzers rely on hand-built lexicons of emotionally charged words, constructed from elaborate surveying processes involving large teams of trained human coders. These lexicons provide a strong foundation for out-of-the-box sentiment mining and represent a cross-section of emotional terms from many genres. Yet, creating a new lexicon to capture a novel emotional dimension requires having a large team of humans read through the entire list of English words, selecting words they feel have the greatest bearing on the desired dimension. The tremendous cost and reliability complexities involved have led to considerable research on the automatic construction of sentiment dictionaries.

One approach uses automatic text categorization tools, building a classifier based on a large sample of documents representing the desired emotion, such as positivity. This still requires human coding to select the training documents. A limitation to using a classifier is that it can only indicate a document as being in that category or not, rather than representing a strength score as with a traditional sentiment algorithm. For example, a categorization system would only be able to indicate that a document is “overall positive” or “overall negative,” rather than indicating how negative or positive it was. A more sophisticated approach is to compile a small list of seed terms and find other words that tend to be highly correlated with them. Correlated terms must be manually reviewed, as some will be legitimate tonal words, while others will simply reflect idiosyncrasies of the sample content. Thesauri and semantic taxonomies like WordNet can also be used to identify synonyms and other words semantically related to the seed terms. More advanced approaches build models of the grammatical contexts in which tone words most commonly appear, in order to identify candidates.
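The seed-and-correlation approach can be sketched as follows. The toy corpus, seed words, and threshold are invented for the example, and as the text notes, the resulting candidates would still need manual review.

```python
# Sketch of expanding a sentiment lexicon from seed terms by
# document co-occurrence. Corpus and seeds are toy examples.
from collections import Counter

SEEDS = {"good", "great"}

corpus = [
    "a good and delightful film with a great cast",
    "a great and delightful story",
    "a dull plot and a dull script",
    "good acting and a delightful score",
]

def expand_seeds(corpus, seeds, min_cooccur=2):
    """Return words appearing alongside a seed in at least
    min_cooccur documents."""
    cooccur = Counter()
    for doc in corpus:
        words = set(doc.split())
        if words & seeds:  # document contains at least one seed
            cooccur.update(words - seeds)
    return {w for w, n in cooccur.items() if n >= min_cooccur}

print(expand_seeds(corpus, SEEDS))
```

The output surfaces the genuinely tonal word "delightful", but also function words like "a" and "and" that merely reflect idiosyncrasies of the sample, illustrating exactly why correlated candidates must be reviewed by hand.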

More recently, Google demonstrated a technique for using massive-scale network analysis to automatically construct a tonal lexicon from a set of initial seed terms (Velikovich, Blair-Goldensohn, Hannan, and McDonald, 2010). Using a training corpus of four billion web pages, they connected the more than 20 million phrases contained in those pages based on co-occurrence, resulting in a massive network capturing language use on the web. Starting with a set of positive and negative seed terms, they used network analysis techniques to trace the transitive connectivity of the seeds through the network, resulting in a final lexicon of 178,000 emotional terms. Of particular interest, the resulting lexicon includes common misspellings and colloquial terminology like “lol” and “cooooool” that are frequently used in social media.
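The general idea of tracing seed polarity transitively through a co-occurrence network can be sketched with a toy graph and a simple decayed propagation; this is not Velikovich et al.'s actual algorithm, which used weighted best-path propagation over a web-scale phrase graph.

```python
# Sketch of propagating seed sentiment through a word co-occurrence
# graph. The graph and seed scores are toy examples.
GRAPH = {
    "good": {"great", "cool"},
    "great": {"good", "fantastic"},
    "cool": {"good", "cooooool"},
    "bad": {"awful"},
    "awful": {"bad", "terrible"},
    "fantastic": {"great"},
    "cooooool": {"cool"},
    "terrible": {"awful"},
}

def propagate(graph, seeds, decay=0.5, hops=3):
    """Give each reachable word the seed score decayed by path length."""
    scores = dict(seeds)
    frontier = dict(seeds)
    for _ in range(hops):
        nxt = {}
        for word, score in frontier.items():
            for nb in graph.get(word, ()):
                cand = score * decay
                # keep the strongest signal seen for each word
                if abs(cand) > abs(scores.get(nb, 0.0)):
                    scores[nb] = nxt[nb] = cand
        frontier = nxt
    return scores

lexicon = propagate(GRAPH, {"good": 1.0, "bad": -1.0})
print(lexicon["cooooool"], lexicon["terrible"])  # 0.25 -0.25
```

Even in this toy form, the colloquial spelling "cooooool" acquires a positive score purely through its network neighborhood, mirroring how the web-scale lexicon picked up social media vocabulary no hand-built dictionary contains.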

Other Sentiment Scales

Sentiment is often thought of simply as positive and negative, but in reality a wide range of lexicons exists to codify various perceptual aspects of language. For example, a 2005 study (Lazarevska, Sholl, and Young) explored the prevalence of conceptual complexity, self-confidence, control over events, need for power, in-group bias, distrust of others, and task orientation in transcripts of speeches given by world leaders, terrorists, and terror-sponsoring states (as defined by the US Department of State and the FBI). The authors found strong correlations among language use in each category, developing a quasi-psychological model of terrorist speech along these measures.

Of direct commercial relevance, a 2010 paper presented an algorithm capable of predicting macro-scale movements in the US stock market by measuring six different moods in global Twitter posts: calmness, alertness, sureness, vitalness, kindness, and happiness (Bollen, Mao, and Zeng, 2010). While it might have been expected that tone would best track stock movements, the authors found that in actuality the level of calm language had the greatest predictive power. This is an important lesson, emphasizing that when using more sophisticated content analysis methods, a wide range of variables should be tested, even those that might not be suggested by existing theory.

Limitations

Of course, any purely automated technique has certain limitations that must be taken into account when applying it to a content analysis project. Sentiment mining treats incoming text as a so-called bag of words, ignoring ordering and structural constructs and generating its scores based purely on word frequency. In reality, of course, sentiment-related factors like persuasion may derive significant effect from the “subtle manipulation of order, context, and sequence” (De Sola Pool, 1959, p. 22). Concepts or emotional wording may be ordered in a document in such a way as to prime the reader toward a specific emotional charge. Computer-based sentiment mining will not catch such ordering, which is highly dependent on the way the human mind processes information. Yet even the most carefully ordered persuasive document must rely on select vocabulary that emphasizes persuasive tendencies in its audience, and that vocabulary is therefore subject to machine processing. Sarcasm is a particularly confounding issue for automated sentiment analysis, as a deeper semantic understanding of the concepts discussed is often required to accurately discern situations in which sarcasm is being employed. Most document collections contain only occasional uses of sarcasm, however, so this should not be a major complication for the average project.
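The bag-of-words assumption can be demonstrated directly: reordering a document leaves its word counts, and therefore any purely frequency-based sentiment score, unchanged, even when the reordering reverses the meaning. A hypothetical example:

```python
# Two sentences with opposite meanings but identical bags of words.
from collections import Counter

a = "the film was good not bad at all"
b = "the film was bad not good at all"

# Identical word counts -> identical frequency-based sentiment scores,
# despite the reversed meaning.
print(Counter(a.split()) == Counter(b.split()))  # True
```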

Measuring Language Rather Than Worldview

A critical concept to understand when making use of sentiment tools is that they are only capable of measuring the language used to express a topic, rather than the emotion of the underlying topic being discussed. The reason for this is that the emotional charge of an event is highly dependent on the worldview of the reader. For example, a news report of a company closing due to bankruptcy would likely be viewed as a very negative event by its former employees, while a competitor's management might view the closing as a positive development due to decreased competition. Similarly, in war coverage, one country's advance comes at the other's loss. Human coders, by virtue of bringing their own backgrounds and personal beliefs to bear, will often blend their view of the topics being discussed in a document with the language used to describe it. Computers, on the other hand, have no background knowledge outside of their sentiment lexicons and thus must evaluate a text based purely on its linguistic choice. This is an important distinction, as an analysis of war coverage might wish to examine whether one side or the other is being portrayed as winning. Automated tone coding can only report on whether the events are being linguistically portrayed in a positive or negative light. Rather than being a limitation, the machine's strict linguistic focus can actually benefit a content analysis, in that human coders are often unable to fully set aside their personal beliefs that can bias their interpretation of a text.

Chapter in Summary

Sentiment analysis is a form of vocabulary analysis that uses lexicons to measure the latent emotional charge of a body of text. Numerous lexicons exist, scoring text on dimensions from positivity to anxiousness, and a rich body of validation studies verifies the ability of automated techniques to measure even complex characteristics like emotion. Historically, most sentiment lexicons were compiled by hand, but automated approaches have been developed that can construct an entire tone lexicon from just a few seed words. Tone scoring can be applied to an entire document or windowed to measure only the tone of a particular actor in the text. Unlike human coders, automated tone scoring does not consider whether a concept itself is positive or negative, only how it is portrayed linguistically in the given text.
