Chapter 13

SOMA: The Smart Social Customer Relationship Management Tool

Handling Semantic Variability of Emotion Analysis With Hybrid Technologies

L. Dini,a; A. Bittar,a; C. Robin,a; F. Segond,b; M. Montaner,c
a Holmes Semantic Solutions, Gières, France (www.ho2s.com)
b Viseo Technologies, Grenoble, France (www.viseo.com)
c Strategic Attention Management, Girona, Spain (www.blueroominnovation.com)

Abstract

SOMA is a smart social customer relationship management tool for companies aiming to monitor and automatically deal with customers’ complaints and interactions in social networks. Emotion detection is crucial for analyzing customers’ messages and empathizing with them. Part of the project consists in highlighting, in a social network corpus, how linguistic phenomena such as negation, intentional contexts, indirect speech, and rhetorical devices, and pragmatic phenomena such as genre, personal style, domain, and context have a dramatic impact on the process of detecting opinions and emotions in social networks. We describe our hybrid approach to emotion detection, which combines statistics and rules: a statistical approach is used to solve classic bag-of-words problems, and a symbolic rule–based approach is used to increase detection precision. Finally, we propose a new operational evaluation method grounded more in real-world problem solving than in traditional gold standard annotation approaches.

Keywords

SOMA; Smart social customer relationship management; CRM; Sentiment analysis; Machine learning; Symbolic approach

Acknowledgments

This work was partially funded by the Eurostars program, project SOMA number E!9202 (2015–17).

1 Introduction

Sentiment and emotion are key elements for companies seeking to understand their customers so as to improve customer relationship management, to adapt both their services and their products, and to optimize marketing strategies. In recent years social media have become a new channel of communication between companies and customers and, therefore, the most appropriate place to analyze customer satisfaction and work to improve it.

SOMA (for “Smart Media Management for Social Customer Attention”) is an innovative R&D project cofunded by Eurostars aiming to create a corporate tool that allows companies to understand and interact efficiently with their customers through social media (eg, Facebook, Twitter, LinkedIn). The main objectives of the SOMA project are threefold. First, to monitor and analyze from a strategic point of view what is happening in the different social media channels of the company (eg, evaluate whether customers use social media as a channel to complain, ask questions, give opinions, etc., or analyze the impact of a marketing campaign on customer sentiment). Second, to define a common corporate communication strategy across social media channels by means of suggested or automatic actions (eg, create critical alerts when complaints from influencers arrive or suggest cross-selling actions when congratulations or positive opinions arrive). Third, to automatically incorporate all this valuable social information into the company’s customer relationship management system, which is usually totally disconnected from social media. In other words, SOMA merges unstructured and structured information to improve customer satisfaction. The unstructured information is that contained in social networks that relates to customers’ sentiments and emotions.

In this chapter we focus on emotion detection. Unlike opinion analysis, emotion analysis cannot be reduced to a categorization into positive, negative, and neutral, as the expression of emotion is often much more nuanced and subtle than this. In what follows we first build a corpus. We then perform two tasks using a hybrid approach: classifying emotions into different types (eg, anger, disgust, fear, surprise) and distinguishing between emotional and nonemotional texts (not all social media posts express emotion). Finally, we present the first evaluation results that we obtained.

2 Definition of Sentiment and Emotion Mining

In recent years, so-called emotional marketing has become a key factor of success for many business-to-consumer companies, especially global ones. Simply put, emotional marketing has the basic goal of convincing customers that a brand or a product is not just a brand or a product, but a kind of “friend.” Any emotional marketing strategy starts with an in-depth understanding of customers’ emotional or motivational drivers, and this is achieved either via open question surveys or by analysis of emotional reactions on social media. As mentioned already, sentiment and opinion analysis consists in “classifying a piece of text into positive versus negative classes,” while emotion analysis consists in “multiclassifying” text into different types of categories (anger, disgust, fear, happiness, sadness, and surprise).

Opinions can map to a multidimensional space of emotions with different activation levels, and this certainly gives more insights compared with traditional sentiment analysis techniques.

This chapter has the goal of validating the feasibility of automatic emotion analysis on social media. It should be noted that standard sentiment analysis, intended as opinion detection (positive vs. negative), possibly ranked on an intensity scale, is inappropriate for our purposes, both practically and conceptually. Practically, the fact that a customer expresses, for instance, a positive opinion has no impact on an emotional evaluation: the sentences “I love this cup” and “This cup is very good” have an identical opinion value but definitely a different emotional tonality. Conceptually, whereas opinions are usually expressed in a quite explicit way, the language of emotions is much more difficult to interpret as it relies on indirect signals and implicit knowledge (eg, “Harley isn’t just a motorcycle, it’s a lifestyle”).

3 Previous Work

Much research has been performed on sentiment analysis in the past decade. Wiebe et al. [1] performed an ambitious annotation effort, producing a 10,000-sentence corpus of world news texts marked up for a range of sentiment-related phenomena, although the typology of emotions focused on polarity rather than actual emotion types.

Alm et al. [2] created a corpus of children’s tales where each sentence (1580 in total) was annotated with one of the six emotions of Ekman [3] or a neutral value. They also describe a supervised machine learning system that detects emotional versus neutral sentences and detects emotion polarity within a sentence. Although this work captures actual emotion types, rather than just polarity, the type of texts is not compatible with the domain of corporate customer relationship management or social media.

Pak and Paroubek [4] created a corpus of tweets for sentiment analysis, but again focused only on emotion polarity rather than capturing the values of emotion types—the information that is crucial to our project.

Vo and Collier [5] created a corpus of Japanese tweets annotated according to an ontology of five emotions (similar to the Ekman typology [3]) and a system to automatically detect the prevalent emotion in a tweet, achieving a global F score of 64.7. This work is similar to our current project in that it deals with emotion types, although the language (Japanese) is not pertinent to our work.

Finally, the numerous SemEval campaigns that have included various tasks on emotion analysis have almost exclusively focused on the assignment of a polarity value to detected emotions. An exception is the 2007 campaign [6], Task 14: Affective Text, in which participants were required to detect and classify emotions and/or determine their polarity in a corpus of manually annotated news headlines extracted from news websites. In the annotated corpus, each headline is annotated with a numeric interval (0–100) that indicates, for each of the six emotions (anger, disgust, fear, joy, sadness, surprise), its strength in the headline. The six human annotators who prepared the corpus were instructed to “select the appropriate emotions for each headline on the basis of the presence of words or phrases with emotional content, as well as the overall feeling invoked by the headline.” The nature of this task is highly subjective, relying heavily on each annotator’s interpretation of a given headline and his or her emotional response. Furthermore, requesting annotation over such a fine-grained interval for each emotion leaves even further room for disagreement. This high degree of subjectivity is reflected in the reported interannotator agreement (Pearson correlation coefficient) for the task, which “is not particularly high” (an average agreement of 53.67 across the six emotions). The resulting “gold standard” corpus for this task is, therefore, not a reliable yardstick against which to evaluate system performance. Indeed, evaluation results for the participating systems were relatively low (F scores ranging from 16.00 to 42.43).

To the best of our knowledge, no suitable corpus for the task exists, so we created a new corpus of emotion-annotated tweets for the future evaluation of the SOMA system. This corpus of tweets represents the social media component of the global SOMA corpus to be created for use in the project.

4 A Silver Standard Corpus for Emotion Classification in Tweets

The first corpus, which we will call the “Emotion Tweet Corpus for Classification” (ETCC), is a “silver standard” in which each tweet is classified with a single emotion. In creating this corpus, we relied on the basic premise that some Twitter users, when expressing emotions, also tag their message with emotional hashtags. On the basis of this assumption we constructed a corpus in which tweets are classified according to the six emotional classes used in SemEval 2007 (anger, disgust, fear, joy, sadness, and surprise) [6]. The choice of hashtag to be associated with each emotion (one hashtag per emotion) was very important, as the hashtags needed to be common enough to allow the retrieval of a significant number of tweets, and to be unambiguous: for instance, for the emotion surprise we could not use the hashtag #surprise, as it is semantically highly ambiguous (it occurs as an interjection, Surprise!, as a noun meaning something that surprises, as an act of surprising, etc.). Instead, we opted for the unambiguous hashtag #astonished. The six emotion hashtags we used were #angry, #astonished, #disgusted, #happy, #sadness, and #scared. The corpus collection phase was made much easier by the fact that since November 2014 [7] Twitter has provided a search interface that does not emphasize recency and allows the retrieval of tweets dating back to 2006: emotional hashtags could thus be used as search keywords, and the necessary number of tweets for each emotion (20,000 per emotion) was collected with the approach described in [8].

We then performed some filtering to remove inappropriate tweets (formal filtering). We eliminated non-English tweets by specifying the language in the search query. We also removed tweets that were not composed of text (eg, by filtering out tweets that had a higher proportion of hashtags than other tokens). Tweets containing links to multimedia content were also filtered out, as in general the emotional hashtag in such tweets relates to the indicated media rather than the textual content of the tweet. After manual inspection we noticed that the number of tweets containing an emotional tag but no emotional text was still high. We therefore applied a further filter (affect filter) based on WordNet-Affect [9]: we indexed all content with Lucene and ran a fuzzy search, selecting only tweets containing an emotional word from the WordNet-Affect lexicon. Figures for the resulting corpus are given in Table 13.1.
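For concreteness, the formal filter can be approximated in a few lines. The following Python sketch is illustrative, not the project code; the multimedia host patterns and the function name are assumptions made for the example.

```python
import re

# Hosts assumed to indicate multimedia content (illustrative list only).
MULTIMEDIA_LINK = re.compile(r"pic\.twitter\.com|instagram\.com|youtu\.?be|vine\.co")

def passes_formal_filter(tweet: str) -> bool:
    """Sketch of the formal filter: reject tweets that are mostly hashtags
    or that link to multimedia content."""
    tokens = tweet.split()
    hashtags = [t for t in tokens if t.startswith("#")]
    others = [t for t in tokens if not t.startswith("#")]
    if len(hashtags) > len(others):      # more hashtags than regular tokens
        return False
    if MULTIMEDIA_LINK.search(tweet):    # hashtag likely refers to the media
        return False
    return True
```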

Table 13.1

Figures for the Corpus of Emotion-Classified Tweets

SemEval 2007 Emotion | Hashtag     | After Formal Filtering | After Affect Filtering
Anger                | #angry      |  8738                  | 5105
Surprise             | #astonished | 16,970                 | 8635
Disgust              | #disgusted  | 14,508                 | 9084
Joy                  | #happy      |  3574                  | 2009
Sadness              | #sadness    |  3364                  | 1724
Fear                 | #scared     | 10,525                 | 5750


Finally, all hashtags appearing at the end of a tweet were removed, and hashtags that occurred within a tweet (ie, before the end of the text) had their hash sign removed, as in such cases they are often used in place of regular words. For example, after this step, tweet 1 below becomes tweet 2:

1. #MoodSwings are a #symptom of being #Bipolar. If you’re #scared, #sad, #paranoid, or #suicidal, There’s help here: http://ow.ly/tByb6.

2. MoodSwings are a symptom of being Bipolar. If you’re #scared, #sad, #paranoid, or #suicidal, There’s help here: http://ow.ly/tByb6. → MoodSwings are a symptom of being Bipolar. If you’re scared, sad, paranoid, or suicidal, There’s help here: http://ow.ly/tByb6.
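This clean-up step is straightforward to approximate with two regular expressions. The sketch below is ours and purely illustrative; the function name is an assumption.

```python
import re

def normalize_hashtags(tweet: str) -> str:
    """Sketch of the hashtag clean-up: drop hashtags at the end of the
    tweet; keep internal hashtags as plain words (hash sign removed)."""
    # Remove any run of hashtags at the very end of the tweet.
    tweet = re.sub(r"(?:\s*#\w+)+\s*$", "", tweet)
    # Hashtags occurring before the end stand in for regular words:
    # keep the word, remove the hash sign.
    return re.sub(r"#(\w+)", r"\1", tweet)

print(normalize_hashtags("Great service today! #happy #satisfied"))
# -> "Great service today!"
```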

Although the corpus was not created via a full manual annotation (whence its “silver” status), the criteria used for retrieval and selection of the texts were anchored in the actual textual forms of the tweets, as opposed to our relying on highly subjective annotator judgments, as was the case in SemEval 2007.

5 General System

In our work we have been pursuing and comparing two approaches: a symbolic approach to emotion analysis and one based on machine learning. Our results show that while the task of classifying emotions under a closed-world assumption (each text has an emotional value) is relatively easy, the real challenge is to distinguish emotional from nonemotional text. We also show that a statistical approach outperforms a symbolic one in emotion classification, while the opposite is true when information filtering is considered.

Concretely, this work focused on two different tasks. The first is the categorization of tweets according to the emotion they express, a single emotion per tweet among a set of six basic emotions. As we explained in detail in the previous section, currently available corpora do not, to our knowledge, provide appropriate data for evaluation of the automation of this task, and a major objective of this work was to make up for this shortfall. The second task was to distinguish tweets that express an emotion from those that do not. We also created a corpus as evaluation data for this task.

5.1 Hybrid Operable Platform for Language Management and Extensible Semantics

The Hybrid Operable Platform for Language Management and Extensible Semantics (HOLMES) [10] is a natural language processing platform developed by Holmes Semantic Solutions. The main assumption behind the design of HOLMES is that the combination of different technologies (statistical and machine learning methods and symbolic or rule-based methods) is indispensable to achieve superior performance in generic text mining and information extraction tasks. HOLMES is based on a flexible processing model (similar to that of Stanford CoreNLP) in which different annotators are arranged in a pipeline and each annotator has access to the annotations added in all previous stages of processing. The general strategy adopted in HOLMES is to introduce into the pipeline pairs of annotators with comparable functionalities, one based on (most often supervised) machine learning and the other based on symbolic methods. The role of the linguist then becomes to correct the output of the statistical model on the basis of appropriate rules. For example, HOLMES pairs a conditional random field (CRF)–based named entity recognition module [11] with a correction module based on TokensRegex [12], a stochastic part-of-speech tagger with a linear pattern-matching rule component, and a MaltParser-based model for dependency parsing with a graph transformation–based component for detecting and correcting parsing errors and for performing semantic analysis, which we describe below.
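This processing model can be pictured with a short sketch. The Python code below is purely illustrative of the pipeline idea (annotators sharing one annotation map, a statistical stage followed by a symbolic correction stage); the class and function names are not the actual HOLMES API.

```python
from typing import Callable, Dict, List

Annotation = Dict[str, object]  # one annotation map shared along the pipeline

class Pipeline:
    """Illustrative HOLMES-style pipeline: annotators run in sequence and
    each one may read every annotation added by its predecessors."""

    def __init__(self) -> None:
        self.annotators: List[Callable[[Annotation], None]] = []

    def add(self, annotator: Callable[[Annotation], None]) -> "Pipeline":
        self.annotators.append(annotator)
        return self

    def run(self, text: str) -> Annotation:
        ann: Annotation = {"text": text}
        for annotate in self.annotators:
            annotate(ann)  # later stages see all earlier annotations
        return ann

def statistical_ner(ann: Annotation) -> None:
    # Stand-in for a CRF-based named entity recognizer.
    ann["entities"] = []

def rule_based_correction(ann: Annotation) -> None:
    # Stand-in for a linguist-written rule layer that corrects model output.
    pass

pipeline = Pipeline().add(statistical_ner).add(rule_based_correction)
```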

As outlined in [13], a prevailing trend in computational semantics is to suppose that a graph is an optimal representation of semantic structures. HOLMES’s semantic layer (which includes the sentiment analysis module) is based on this assumption. Basically, predicates (in a first-order logic sense) are represented by arcs connecting nodes, which correspond to entities (tokens) detected in the text and are enriched with specific semantic information. For example, the sentence “John ate an apple yesterday” has the semantic graph representation shown in Fig. 13.1.

Fig. 13.1 Semantic graph for the sentence “John ate an apple yesterday.”

Given that HOLMES performs syntactic analysis using a dependency graph output, it was natural to conceive the process of semantic analysis as graph transformation. HOLMES makes use of the Stanford Semgrex engine [14] to transform the dependency structure into a semantic graph. An in-house formal grammar was developed to provide linguists with a user-friendly interface for encoding the syntax-to-semantics transformations.
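To give a flavor of this transformation, the sketch below maps Stanford-style dependency triples for the sentence of Fig. 13.1 onto semantic arcs. The semantic role labels (AGENT, PATIENT, TIME) and the Python formulation are our illustrative assumptions; the real system expresses such rules in its in-house grammar compiled over Semgrex.

```python
# Dependency triples for "John ate an apple yesterday" (Stanford-style).
dependencies = [
    ("ate", "nsubj", "John"),
    ("ate", "dobj", "apple"),
    ("ate", "tmod", "yesterday"),
]

# Hypothetical syntax-to-semantics mapping; the labels of Fig. 13.1 may
# differ, and the actual rules live in the in-house transformation grammar.
DEP_TO_SEM = {"nsubj": "AGENT", "dobj": "PATIENT", "tmod": "TIME"}

semantic_graph = [
    (head, DEP_TO_SEM[rel], dependent)
    for head, rel, dependent in dependencies
    if rel in DEP_TO_SEM
]
# [('ate', 'AGENT', 'John'), ('ate', 'PATIENT', 'apple'),
#  ('ate', 'TIME', 'yesterday')]
```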

5.2 The Machine Learning Approach

The first test we ran, on the ETCC, was to use a classifier to discern the emotion expressed in each tweet. The corpus was split via random sampling into 80% training and 20% test. We used a multiclass linear classifier associated with a quasi-Newton minimizer [15], under the Stanford natural language processing implementation. We paid particular attention to the feature selection process, and after several tests the best results were obtained with the following set of features:

 Word: The sequence of characters composing the word as they appear in the text.

 Lemma: Lemma and part-of-speech tag, as resulting from part-of-speech disambiguation.

 Noun phrase: We use the output of the dependency grammar to produce all possible well-formed noun phrases from the input text. Noun phrases are passed to the classifier both as sequences of word forms and as sequences of lemmas.

 Dependencies: A certain subset of grammatical dependencies is passed to the classifier as a set of triples, for instance (verb, SUBJ, noun), (verb, OBJ, noun), (noun, MOD, adj), etc., where each part of speech is replaced by the relevant lemma (eg, (hate, SUBJ, i), (have, OBJ, money)). As the grammar we use produces Stanford-style dependencies [16], the dependency features are close to a semantic representation.

For each tweet the classifier assigns a probability for each emotion (the total probability mass being 1), and each tweet is assigned the emotion with the highest probability.
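A rough equivalent of this setup can be sketched with scikit-learn, whose logistic regression trained with the L-BFGS solver (a quasi-Newton method) stands in for the Stanford classifier used here. The dictionary keys in featurize are hypothetical placeholders for the output of the preprocessing pipeline; this is a sketch, not the project configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def featurize(analysis: dict) -> str:
    """Flatten one analyzed tweet into a space-separated bag of features:
    word forms, lemmas, noun phrases, and dependency triples."""
    feats = list(analysis["words"]) + list(analysis["lemmas"])
    feats += ["NP=" + np.replace(" ", "_") for np in analysis["noun_phrases"]]
    feats += ["DEP=%s:%s:%s" % t for t in analysis["dependencies"]]
    return " ".join(feats)

# L-BFGS is a quasi-Newton minimizer, so this approximates the multiclass
# linear classifier configuration described above.
model = make_pipeline(
    CountVectorizer(tokenizer=str.split, token_pattern=None),
    LogisticRegression(solver="lbfgs", max_iter=1000),
)
# model.fit(train_docs, train_labels)
# model.predict_proba(test_docs) yields one probability per emotion,
# summing to 1; the highest-probability emotion is assigned to the tweet.
```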

5.3 The Symbolic Approach

Our symbolic approach to emotion annotation uses our in-house system, Senti-Miner, developed within the company over several years [17, 18]. Senti-Miner is based on the HOLMES platform. Processing consists of three main stages that integrate into the usual preprocessing pipeline (sentence detection, tokenization, part-of-speech tagging, morphological analysis and lemmatization, dependency parsing). These stages are lexical tagging, token-based regular expression annotation, and dependency graph transformation. Each of these is described below:

 Lexical tagging with gazetteers: The main lexical resource used in Senti-Miner is a gazetteer of emotions (1577 lemmas) automatically extracted from the WordNet-Affect database. A mapping between the WordNet-Affect emotions and the six basic emotions used for this experiment was established. Classes of emotions that did not have a coherent mapping were discarded.
Classes that had multiple mappings were split. The resulting gazetteer used for this experiment contains 1302 lemmas. Furthermore, a separate gazetteer of Internet slang terms and their corresponding emotions (eg, LOL = JOY, WTF = SURPRISE), containing 416 entries, was also used.
A third gazetteer is used to disambiguate lemmas that are emotion words only for certain parts of speech. For example, “like” is an emotion as a verb, but not as a preposition; “close” is an emotion as an adjective or adverb, but not as a verb; etc. This gazetteer contains 1547 emotion lemmas with their possible parts of speech.
After part-of-speech tagging, all lexical items in the input text that have a lemma in one of the emotion gazetteers are tagged with their corresponding emotion and possible parts of speech. An example of output at this stage is shown in Fig. 13.2 (note that no disambiguation has occurred at this stage); a schematic sketch of the lookup follows the figure.

Fig. 13.2 Example output after lexical tagging.
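A minimal sketch of the gazetteer lookup might look as follows; the entries shown are tiny stand-ins for the actual WordNet-Affect-derived resources, and the data structures are our assumptions.

```python
# Tiny stand-ins for the emotion and Internet-slang gazetteers described above.
EMOTION_GAZETTEER = {"angry": "ANGER", "scared": "FEAR", "happy": "JOY"}
SLANG_GAZETTEER = {"lol": "JOY", "wtf": "SURPRISE"}

def tag_emotions(tokens):
    """tokens: (lemma, pos) pairs after part-of-speech tagging.
    Returns (lemma, pos, emotion-or-None). No disambiguation happens here;
    that is left to the later rule-based stages."""
    tagged = []
    for lemma, pos in tokens:
        key = lemma.lower()
        emotion = EMOTION_GAZETTEER.get(key) or SLANG_GAZETTEER.get(key)
        tagged.append((lemma, pos, emotion))
    return tagged
```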

 Token-based regular expression annotation (cf. Stanford TokensRegex [12]): After lexical tagging, a set of token-based grammar rules is first applied to correct emotion annotations on the basis of possible parts of speech. For example, the preposition “like” in “John eats like a pig” is not an emotion, and this is taken into account. A further set of rules is used to process certain multiword expressions that can be handled by a regular grammar without a deep syntactic analysis. These rules remove emotions in certain contexts; for example, “close minded” (sic), “with respect to,” etc. An example of output after this stage is shown in Fig. 13.3; a schematic sketch of such rules follows the figure.

Fig. 13.3 Example output after token-based regular expression annotation.
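Schematically, these correction rules amount to pattern matching over tagged tokens, as in the sketch below (plain Python rather than actual TokensRegex syntax; the part-of-speech tags and constraint tables are illustrative assumptions).

```python
# Lemmas that are emotion words only for certain parts of speech.
POS_CONSTRAINTS = {"like": {"VERB"}, "close": {"ADJ", "ADV"}}
NON_EMOTIONAL_MULTIWORDS = [("with", "respect", "to"), ("close", "minded")]

def correct_emotions(tagged):
    """tagged: (form, pos, emotion-or-None) triples from lexical tagging."""
    out = list(tagged)
    # Rule 1: drop the emotion when the part of speech is not allowed,
    # eg, "like" as a preposition in "John eats like a pig".
    for i, (form, pos, emotion) in enumerate(out):
        allowed = POS_CONSTRAINTS.get(form.lower())
        if emotion and allowed and pos not in allowed:
            out[i] = (form, pos, None)
    # Rule 2: drop emotions inside known multiword expressions.
    forms = [form.lower() for form, _, _ in out]
    for expr in NON_EMOTIONAL_MULTIWORDS:
        n = len(expr)
        for i in range(len(forms) - n + 1):
            if tuple(forms[i:i + n]) == expr:
                for j in range(i, i + n):
                    form, pos, _ = out[j]
                    out[j] = (form, pos, None)
    return out
```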

 Dependency graph transformation grammar (using the Stanford Semgrex engine [14]): After dependency parsing, the final step is the sequential application of a set of graph transformation grammars that mark relations between emotion words and their arguments. These grammars have access to all annotations added during previous processing. Some rules remove emotions in particular contexts, for example, within the scope of a modal operator (eg, “You would be astonished,” “You should be happy,” etc.), in common interjections (eg, “Good night!,” “Happy Birthday/New Year/Anniversary,” “Merry Christmas!”), and in certain expressions (eg, “You have got to be kidding me”). Other rules add an EMOTION relation between an emotion word and its syntactic argument. For example:

 “John is angry”—EMOTION(“angry,” “John”).

 “This is a frightening book”—EMOTION(“frightening,” “book”).

 “John has sympathy for Mary”—EMOTION(“sympathy,” “John”).

 “John’s sympathy for Mary”—EMOTION(“sympathy,” “John”).

Furthermore, our grammar assigns one of two relations to indicate the status of the experiencer with respect to the emotion (causative or stative):

1. John is a shy person.

2. This film impresses me.

For example, in sentence 1, the grammar marks EXPERIENCER_STAT(“shy,” “John”), indicating that “shy” is a state of its subject, while in sentence 2 the grammar assigns EXPERIENCER_CAUSE(“film,” “impresses”), indicating that the subject of the emotion word “impresses” is the cause of the emotion. Although these two relations are output by our system, they were not used for the purposes of the current experiments. Example output of graph transformation is shown in Fig. 13.4.
The annotated relations, aside from the two just mentioned, mark the presence of an emotion in the final output. The final emotion is assigned to a given tweet according to the number of occurrences found. If all detected emotions occur in equal numbers, the first one (from left to right) is assigned; this heuristic is sketched after Fig. 13.4.

Fig. 13.4 Example output after dependency graph transformation.
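The final assignment heuristic just described can be stated compactly. The sketch below assumes the detected emotion labels arrive in left-to-right text order; the function name is ours.

```python
from collections import Counter

def assign_tweet_emotion(detected):
    """detected: emotion labels in left-to-right order of occurrence.
    The most frequent emotion wins; ties go to the leftmost occurrence."""
    if not detected:
        return None  # no emotion relation found in the tweet
    counts = Counter(detected)
    best = max(counts.values())
    for emotion in detected:  # left-to-right tie-break
        if counts[emotion] == best:
            return emotion

assert assign_tweet_emotion(["JOY", "FEAR", "JOY"]) == "JOY"
assert assign_tweet_emotion(["FEAR", "JOY"]) == "FEAR"  # tie: leftmost wins
```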

6 Results and Evaluation

6.1 Tweet Emotion Detection

For this task we determined a baseline against which to gauge the performance of our classifiers by calculating precision, recall, and F score for each emotion in the ETCC (see Section 4) according to the simple presence or absence of the appropriate emotion hashtag in the tweet text (eg, “anger” present in an ANGER tweet was considered a true positive, “anger” absent from an ANGER tweet a false negative, and so on). Baseline figures are presented in Table 13.2.
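Concretely, this baseline reduces to a keyword-presence check per emotion class. The following sketch uses a hypothetical input format; the metric definitions themselves are standard.

```python
def baseline_scores(corpus, keyword, gold_label):
    """corpus: (text, label) pairs over all six classes. A tweet is
    'predicted' as gold_label iff the emotion keyword appears in its text."""
    tp = fp = fn = 0
    for text, label in corpus:
        predicted = keyword in text.lower()
        if predicted and label == gold_label:
            tp += 1
        elif predicted:
            fp += 1
        elif label == gold_label:
            fn += 1
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```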

Table 13.2

Baseline Evaluation Figures for Emotion Classification

Emotion  | Precision | Recall | F Score
Anger    | 0.96      | 0.37   | 0.53
Disgust  | 1.00      | 0.33   | 0.49
Fear     | 0.98      | 0.17   | 0.28
Joy      | 0.78      | 0.62   | 0.69
Sadness  | 0.99      | 0.32   | 0.48
Surprise | 0.98      | 0.28   | 0.43
Average  | 0.95      | 0.35   | 0.49


Evaluation results for the machine learning classification of tweets (see Table 13.3) on the ETCC show differing performance across emotion types, reflecting the amount of data available for training for each emotion (see Table 13.1). The classifier achieved an average improvement in the F score of 9 percentage points over the baseline.

Table 13.3

Evaluation of Emotion Classification of Tweets via the Machine Learning Classifier

Emotion  | Precision | Recall | F Score
Anger    | 0.53      | 0.46   | 0.49
Disgust  | 0.66      | 0.72   | 0.69
Fear     | 0.61      | 0.65   | 0.63
Joy      | 0.63      | 0.60   | 0.62
Sadness  | 0.54      | 0.37   | 0.44
Surprise | 0.62      | 0.61   | 0.62
Average  | 0.60      | 0.57   | 0.58


Evaluation results for the classification of tweets using the symbolic classifier are presented in Table 13.4. The figures are significantly lower than those obtained via machine learning (F score lower by 17 percentage points) and are also lower than the proposed baseline (F score lower by 8 percentage points). The relatively low performance of the symbolic classifier can be explained by the fact that the system was not developed for this particular type of corpus (it was initially developed to extract emotional responses—to products or brands, etc.—provided in user-generated feedback). Indeed, the symbolic classifier proves less robust when faced with texts from a domain different from that for which it was developed.

Table 13.4

Evaluation of Emotion Classification of Tweets via the Symbolic Classifier

Emotion  | Precision | Recall | F Score
Anger    | 0.75      | 0.33   | 0.46
Disgust  | 0.76      | 0.24   | 0.37
Fear     | 0.72      | 0.35   | 0.47
Joy      | 0.26      | 0.68   | 0.37
Sadness  | 0.24      | 0.37   | 0.29
Surprise | 0.84      | 0.37   | 0.52
Average  | 0.60      | 0.39   | 0.41


6.2 Tweet Relevance

To evaluate the performance of the classifier in detecting emotional tweets in the ETCC, we ran the classifier with different score thresholds for emotion detection. For example, with a threshold set at 0.4, at least one emotion must have a score above 0.4 for the tweet to be classified as emotional. The reasoning behind this is that for a tweet with no emotional content one would expect the classifier to attribute roughly equal scores to all emotions, whereas for an emotional tweet the scores should differ markedly. By varying the threshold score for emotion classification, we hoped to determine the optimal score for detecting the emotional relevance of a tweet. We did not include tweets with no emotions, as, in order to achieve balance in this domain-nonspecific corpus, such tweets would have to encompass an enormous range of varied data from all possible domains, which is obviously infeasible.
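The thresholding logic itself is simple; the following sketch assumes the classifier returns one probability per emotion, summing to 1, as described above.

```python
def is_emotional(scores, threshold=0.4):
    """scores: mapping from emotion label to classifier probability.
    The tweet counts as emotional iff at least one emotion scores
    above the threshold."""
    return max(scores.values()) > threshold

assert is_emotional({"JOY": 0.55, "FEAR": 0.15, "ANGER": 0.30})
assert not is_emotional({e: 1 / 6 for e in            # uniform scores
    ("ANGER", "DISGUST", "FEAR", "JOY", "SADNESS", "SURPRISE")})
```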

The graph in Fig. 13.5 shows the F-score results of the evaluation for each of the thresholds tested. As expected, lower score thresholds favored recall, with relatively low precision, while precision, although still mediocre, was better at higher thresholds. Best performance for the classifier was an F score of 0.26.

Fig. 13.5 F score for the detection of emotional versus nonemotional tweets with respect to the minimum score threshold.

As for the symbolic approach, always assigning the nonemotional category to tweets in the corpus provides the baseline figures shown in Table 13.5.

Table 13.5

Baseline Figures for Detecting Emotional Versus Nonemotional Tweets via Symbolic Methods

Class        | Precision | Recall | F Score
Emotional    | 0.00      | 0.00   | 0.00
Nonemotional | 0.91      | 1.00   | 0.95
Average      | 0.45      | 0.50   | 0.48


Evaluation figures for the symbolic methods for detecting emotional versus nonemotional tweets (Table 13.6) show a major improvement over the baseline (average F score of 0.72 vs. 0.48). The figures also show a major improvement over those obtained by the classifier (F score of 0.48 vs. 0.26 for detection of emotional tweets).

Table 13.6

Figures for Detecting Emotional Versus Nonemotional Tweets With Symbolic Methods

Class        | Precision | Recall | F Score
Emotional    | 0.54      | 0.43   | 0.48
Nonemotional | 0.94      | 0.96   | 0.95
Average      | 0.74      | 0.69   | 0.72


7 Conclusion

Companies know that customer satisfaction means business. Monitoring and analyzing customers in social media, making decisions on the basis of this strategic information, and analyzing the results of the marketing actions performed are nowadays essential, especially for business-to-consumer companies. SOMA will provide companies with an advanced tool to help them achieve these results by analyzing customers’ sentiments and emotions in social media with advanced techniques that allow deeper and more exhaustive analysis. Because of the nature of the corpus, our evaluation results for the task of emotion detection do not allow us to conclude that one type of method (symbolic or machine learning) is generally more successful than the other, although, in our particular case, the machine learning approach did perform better. In the task of detecting emotional versus nonemotional texts, which we evaluated on a gold standard corpus, the results are more reliable, and we may conclude that for this type of task, symbolic methods perform better.

In the SOMA project we also plan to use this work on emotion analysis to detect influencers in social networks (see Chapter 10 for more information on this topic).

References

[1] Wiebe J., Wilson T., Cardie C. Annotating expressions of opinions and emotions in language. Lang. Resour. Eval. 2005;39(2):165–210.

[2] Alm C., Roth D., Sproat R. Emotions from text: machine learning for text-based emotion prediction. In: Proceedings of HLT/EMNLP. 2005.

[3] Ekman P. Facial expression and emotion. Am. Psychol. 1993;48(4):384–392.

[4] Pak A., Paroubek P. Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010). 2010:1320–1326.

[5] Vo H., Collier N. Twitter emotion analysis in earthquake situations. Int. J. Comput. Linguist. Appl. 2013;4(1):159–173.

[6] Strapparava C., Mihalcea R. SemEval-2007 task 14: affective text. In: Proceedings of the Fourth International Workshop on the Semantic Evaluations (SemEval 2007). 2007:70–74.

[7] Y. Zhuang, Building a Complete Tweet Index, 2014. https://blog.twitter.com/2014/building-a-complete-tweet-index.

[8] T. Dickinson, Scraping Tweets Directly from Twitter, 2015. http://tomkdickinson.co.uk/2015/01/scraping-tweets-directly-from-twitters-search-page-part-1/.

[9] Strapparava C., Valitutti A. WordNet-Affect: an affective extension of WordNet. In: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004). 2004:1083–1086.

[10] Dini L., Bittar A., Ruhlmann M. Approches hybrides pour l’analyse de recettes de cuisine. In: Proceedings of Défi Fouille de Textes (DEFT), TALN-RECITAL 2013. 2013:53–65.

[11] Lafferty J., McCallum A., Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the International Conference on Machine Learning (ICML). 2001:282–289.

[12] Chang A., Manning C. TokensRegex: defining cascaded regular expressions over tokens. Stanford University; 2014.

[13] Sowa J. Conceptual graphs. Handbook of Knowledge Representation. Amsterdam, The Netherlands: Elsevier; 2008.

[14] Chambers N., Cer D., Grenager T., Hall D., Kiddon C., MacCartney B., de Marneffe M.-C., Ramage D., Yeh E., Manning C. Learning alignments and leveraging natural logic. In: Proceedings of the Workshop on Textual Entailment and Paraphrasing. 2007.

[15] Nocedal J., Wright S. Quasi-Newton methods. In: Numerical Optimization. New York: Springer; 2006:135–163.

[16] de Marneffe M.-C., Manning C. The Stanford typed dependencies representation. In: Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation. 2008.

[17] Maurel S., Curtoni P., Dini L. Classification d’opinions par méthodes symbolique, statistique et hybride. In: Proceedings of Défi Fouille de Textes. 2007:121–127.

[18] Bittar A., Dini L., Maurel S., Ruhlmann M. The dangerous myth of the star system. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014). 2014:2237–2241.

