Chapter 7

Irony, Sarcasm, and Sentiment Analysis

D.I. Hernández Farías (a,b); P. Rosso (a)
(a) Technical University of Valencia, Valencia, Spain
(b) University of Turin, Turin, Italy

Abstract

Irony and sarcasm are sophisticated forms of speech in which authors write the opposite of what they mean. They have been studied in linguistics, psychology, and cognitive science. While irony is often used to emphasize occurrences that deviate from the expected, sarcasm is commonly used to convey implicit criticism. However, the detection of irony and sarcasm is a complex task, even for humans. The difficulty in recognizing irony and sarcasm causes misunderstandings in everyday communication and poses problems for many natural language processing tasks, such as sentiment analysis. This is particularly challenging when one is dealing with social media messages, where the language is concise, informal, and ill-formed.

Keywords

Irony detection; Sarcasm detection; Figurative language processing; Social media; Twitter; User-generated content

Acknowledgments

This work was done in the framework of the SomEMBED MINECO TIN2015-71147-C2-1-P research project. The National Council for Science and Technology (CONACyT Mexico) funded the research of Delia Irazú Hernández Farias (grant no. 218109/313683 CVU-369616).

1 Introduction

Every day, people make judgments about their environment; this is an inherent human behavior. There are different ways to express our opinions, and one of the most interesting is through figurative language devices such as irony and sarcasm. These allow us to express ourselves in a particular way, using words not only in their most salient meaning but also in a creative and playful sense. The use of words or expressions with a meaning that differs from their literal interpretation is known as figurative language.

Irony and sarcasm are two interesting and strongly related concepts. People usually do not have a clear idea of what they are; nevertheless, from early childhood we begin to use them in our daily lives. They have been studied by different disciplines, such as linguistics, philosophy, psychology, psycholinguistics, cognitive science, and, more recently, computational linguistics. Each discipline has tried to define what they are, how they are produced, and why they are used.

These figurative devices give us the opportunity to explore the interaction between cognition and language. Broadly speaking, irony and sarcasm are figurative language devices that serve different communication purposes. The commonest definition of irony refers to an utterance by which the speaker expresses a meaning opposite to what is literally said. There are different theories that attempt to explain what irony is. Grice’s theory [1] points out that the speaker intentionally violates the “maxim of quality” (the speaker does not say what he or she believes to be false) when he or she expresses an ironic utterance. Some theories, such as the one described in [2], define irony beyond the literal sense of the words: for Wilson and Sperber [2] an ironic utterance is an “echoic mention” that alludes to some real or hypothetical proposition to demonstrate its absurdity. Attardo [3] considers an ironic utterance as a form of “relevant inappropriateness” in which the speaker relies on the ability of the listener to reject the literal meaning on the basis of the disparity between what is literally said and the context in which it is said. On the other hand, the “failed expectation” intention (ie, the speaker’s approval or disapproval of the entity or situation at hand) behind an ironic expression has been studied by Utsumi [4] and Kumon-Nakamura and Glucksberg [5].

Usually, irony is considered a broader term that also covers sarcasm [6, 7]. Irony may be positive (ie, noncritical), while sarcasm usually is not [8, 9]; sarcasm is commonly more aggressive and offensive than irony. In this work irony and sarcasm are treated as two different concepts.

Social media offer a face-saving way for people to express themselves, and they sometimes choose to use ironic or sarcastic utterances to communicate their attitude or evaluative judgment toward a particular target (eg, a public person, a product, a movie, or an event). The presence of ironic or sarcastic content in human communication may cause misunderstandings. Identification of this intention is not a trivial task even for humans: different cognitive processes are involved and knowledge of the environment is needed. For natural language processing tasks such as sentiment analysis, this kind of subjective user-generated content is a big challenge. In some cases the presence of ironic content plays a particular role: “polarity reversal.” This means, for instance, that an utterance seems to be positive but its real intention is negative (or vice versa).

Consider the following example, extracted from an ironic set of Amazon reviews collected by Filatova [10]: “I would recomend this book to friends who have insomnia or those who I absolutely despise.”1 For a sentiment analysis system that exploits the basic approach of counting the frequency of positive and negative terms to assign a polarity, this sentence could be considered positive (see the sketch below). The words recomend (recommend), book, and friends are positive terms, while insomnia and despise denote a negative sense. Therefore the sentence contains three positive terms and two negative terms, and it could be identified as positive. However, this review conveys a meaning far from positive: the author expresses a negative judgment against the book in an imaginative way. On the one hand, the author writes about recommending the book, which can be considered a positive aspect about the target (the book); on the other hand, the recommendation is addressed to “friends who have insomnia” or “those who I absolutely despise.” Thus the author’s hidden intention could be to state that the book is so boring as to induce sleep (even in those who have insomnia). Research in irony could not only improve the performance of sentiment analysis systems but could also help us to understand the cognitive processes involved and how humans process and produce utterances of this kind. After introducing the state of the art in irony and sarcasm detection, we investigate the impact that the use of these figurative language devices may have on sentiment analysis.
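To make this failure mode concrete, the following minimal sketch in Python applies term counting to the review above. The toy lexicon is our own illustrative assumption, not an excerpt of any published resource.

# Minimal sketch of lexicon-based polarity by term counting.
# The tiny lexicon below is an illustrative assumption, not a real resource.
POSITIVE = {"recomend", "recommend", "book", "friends"}
NEGATIVE = {"insomnia", "despise"}

def naive_polarity(text):
    # Lowercase and strip punctuation before matching lexicon entries.
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return "positive" if pos > neg else "negative" if neg > pos else "neutral"

review = ("I would recomend this book to friends who have insomnia "
          "or those who I absolutely despise.")
print(naive_polarity(review))  # -> "positive", although the real intent is negative

With three positive and two negative matches, the function labels the review positive, exactly the misclassification discussed above.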

This chapter is organized as follows. In Section 2 we describe the state of the art in irony and sarcasm detection. In Section 3 we address the impact that figurative language has on sentiment analysis. We analyze three shared tasks that have been recently organized. Section 4 discusses future trends and directions. In Section 5 we draw some conclusions.

2 Irony and Sarcasm Detection

Irony and sarcasm detection are considered special cases of text classification, where the main goal is to distinguish ironic (or sarcastic) texts from nonironic (or nonsarcastic) ones. To analyze figurative devices of this kind, it is necessary to consider not only the syntactic and lexical textual level (to extract salient features such as word position and punctuation marks) but also semantics (literal vs. nonliteral meaning of the words), pragmatics (words matching the appropriate context), and discourse analysis (the relation between the utterance at hand and the way in which it is expressed). However, the progress so far has been achieved mainly through syntactic, lexical, and shallow semantic features.

Dealing with social media texts is a challenging task. They have specific characteristics: they are informal and use ill-formed language. People express themselves in a face-saving way through unstructured content. Social media texts usually contain spelling mistakes, abbreviations, and slang. On Twitter, a text must be written in a maximum of 140 characters; therefore figurative language is expressed in a very concise manner, which poses an additional challenge. When people express their opinions through ironic or sarcastic utterances, they can choose how to use the language to achieve their communicative goals; there is no particular structure for constructing ironic or sarcastic utterances.

Thus the main objective of irony and sarcasm detection is to discover features that allow us to discriminate ironic (or sarcastic) texts from nonironic (or nonsarcastic) texts. The interest in irony and sarcasm detection in social media requires user-generated data that capture the real use of figurative language devices of this kind. As in most natural language processing tasks, the lack of corpora is an issue. There are two main approaches for ironic/sarcastic corpus construction: self-tagging and crowdsourcing. The first considers as positive instances those texts in which the author signals his or her intention with an explicit label (eg, the hashtags #irony and #sarcasm); in this case we therefore rely on the author’s own notion of what irony or sarcasm is. Crowdsourcing involves human annotators labeling the content as ironic (or sarcastic). The labeling process is usually done without any strict definition or guidelines; it is therefore a subjective task, where the agreement between annotators is often very low. In this way it is possible to obtain potentially ironic and sarcastic texts produced by people in social media; a minimal sketch of the self-tagging approach is shown below.
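The following Python fragment is a minimal sketch of self-tagged corpus construction, under the assumption that tweets arrive as plain strings (a real collection would come from the Twitter API): tweets carrying an explicit hashtag become positive instances, and the tag is stripped so it cannot leak into the features.

# Sketch of self-tagged corpus construction: tweets carrying an explicit
# hashtag are taken as positive (ironic/sarcastic) instances; the tag is
# removed so a classifier cannot simply memorize it. The tweet list is a
# stand-in for data retrieved from the Twitter API.
import re

LABELS = re.compile(r"#(irony|ironic|sarcasm|sarcastic)\b", re.IGNORECASE)

def split_self_tagged(tweets):
    positive, negative = [], []
    for tweet in tweets:
        if LABELS.search(tweet):
            positive.append(LABELS.sub("", tweet).strip())
        else:
            negative.append(tweet)
    return positive, negative

pos, neg = split_self_tagged([
    "Great, another Monday. #sarcasm",
    "Looking forward to the weekend!",
])
print(pos)  # ['Great, another Monday.']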

For computational linguistics purposes, irony and sarcasm are often considered synonyms. The following subsections describe some of the approaches proposed to address irony and sarcasm detection: the first focuses on work in which irony was treated as an umbrella term, while the second focuses on research in which sarcasm was considered a distinct concept.

2.1 Irony Detection

One of the first studies in irony detection was by Carvalho et al. [11]. They worked on the identification of a set of surface patterns to identify ironic sentences in a Portuguese online newspaper; the most relevant features were the use of punctuation marks and emoticons. Veale and Hao [12] conducted an experiment by harvesting the web, looking for a commonly used framing device for linguistic irony: the simile (two queries, “as * as *” and “about as * as *”, were used to retrieve snippets from the web). They analyzed a very large corpus to identify characteristics of ironic comparisons, and presented a set of rules to classify a simile as ironic or nonironic.
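The sketch below illustrates the kind of pattern matching behind those simile queries. The regular expression is our own approximation of the “as * as *” frame, applied to plain text rather than to web search results, and it captures only the first word of each slot.

# Approximation of the "as * as *" simile frame used as a web query by
# Veale and Hao, here implemented as a regular expression over plain text.
# Only the first word of the property and of the vehicle is captured.
import re

SIMILE = re.compile(r"\b(?:about )?as (\w+) as (?:a |an |the )?(\w+)", re.IGNORECASE)

for text in ["He is about as subtle as a freight train.",
             "The room was as quiet as a library."]:
    match = SIMILE.search(text)
    if match:
        prop, vehicle = match.groups()
        print(f"property={prop!r}, vehicle={vehicle!r}")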

Reyes et al. [13] analyzed tweets tagged with the hashtags #irony and #humor to identify textual features for distinguishing between them. They proposed a model that includes structural, morphosyntactic, semantic, and psychological features. Additionally, they considered the polarity expressed in a tweet using the Macquarie Semantic Orientation Lexicon.2 They experimented with different feature sets and a decision tree classifier, obtaining encouraging results (F measure of approximately 0.80).

Afterward, Reyes et al. [14] collected a corpus composed of 40,000 tweets, relying on the “self-tagged” approach. Four different hashtags were selected: #irony, #education, #politics, and #humor. Their model is organized according to four types of conceptual features—signatures (such as punctuation marks, emoticons, and discursive terms), unexpectedness (opposition, incongruency, and inconsistency in a text), style (recurring sequences of textual elements), and emotional scenarios (elements that symbolize sentiment, attitude, feeling, and mood)—by exploiting the Dictionary of Affect in Language (DAL).3 They addressed the problem as a binary classification task, distinguishing ironic tweets from nonironic tweets by using naïve Bayes and decision tree classifiers. They achieved an average F measure of 0.70.

Barbieri and Saggion [15] proposed a model to detect irony using lexical features, such as the frequency of rare and common terms, punctuation marks, emoticons, synonyms, adjectives, and positive and negative terms. They compared their approach with that of Reyes et al. [14] on the same corpus using a decision tree, and obtained slightly better results than those previously reported. They concluded that rare words, synonyms, and punctuation marks seem to be the most discriminating features. Hernández-Farías et al. [16] described an approach for irony detection that uses a set of surface text properties enriched with sentiment analysis features. They exploited two widely applied sentiment analysis lexicons: Hu&Liu4 and AFINN.5 They experimented with the same dataset used in [14, 15]. Their proposal was evaluated with a set of classifiers comprising naïve Bayes, decision tree, support vector machine (SVM), multilayer perceptron, and logistic regression classifiers. The proposed model improved on the previous results (F measure of approximately 0.79); the features related to sentiment analysis were the most relevant.

Buschmeier et al. [17] presented a classification approach using the Amazon review corpus collected by Filatova [10], which contains both ironic and nonironic reviews annotated by Mechanical Turk crowdsourcing. They proposed a model that takes into account features such as n-grams, punctuation marks, interjections, emoticons, and the star rating of each review (a feature particular to Amazon reviews that, according to the authors, helps to achieve good performance in the task). They experimented with a set of classifiers (naïve Bayes, logistic regression, decision tree, random forest, and SVM classifiers), achieving an F-measure rate of 0.74.

Wallace et al. [18] approached irony detection using contextual features, specifically by combining noun phrases and sentiment extracted from comments. They proposed exploiting information regarding the conversational threads to which comments belong. Their approach capitalizes on the intuition that members of different user communities are likely to be sarcastic about different things. A dataset of comments posted on Reddit6 was used.7

Karoui et al. [19] recently presented an approach to separate ironic from nonironic tweets written in French. They proposed a two-stage model: first, they addressed irony detection as a binary classification problem; then the misclassified instances were processed by an algorithm that tried to correct them by querying Google to check the veracity of tweets containing negation. They represented each tweet with a vector composed of six groups of features: surface (such as punctuation marks, emoticons, and uppercase letters), sentiment (positive and negative words), sentiment shifter (positive and negative words in the scope of an intensifier), shifter (presence of an intensifier, a negation word, or reported speech verbs), opposition (sentiment opposition or contrast between a subjective and an objective proposition), and internal contextual (the presence/absence of personal pronouns, topic keywords, and named entities). The authors experimented with an SVM as a classifier, achieving an F measure of 0.87.

To sum up, several approaches have been proposed to detect irony as a classification task. Many of the features employed have already been used in tasks related to sentiment analysis, such as polarity classification. The ironic intention is captured mainly through surface features such as punctuation marks and emoticons. Lexical cues of this kind have been shown to be useful for distinguishing ironic content, especially in tweets; this may confirm, in some way, users’ need to add textual markers to compensate for the absence of paralinguistic cues. Moreover, many authors point out the importance of capturing the inherent incongruity in ironic utterances. To achieve this goal, the presence of opposite polarities (positive and negative words) and the use of semantically unrelated terms (synonyms and antonyms) have been considered in many approaches. Both kinds of features seem to be relevant for distinguishing ironic from nonironic utterances. Decision trees are among the classifiers that produced the best results. A consolidated sketch of some of these surface and sentiment features follows.
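The sketch below computes a few of the cues that recur across these systems: punctuation marks, emoticons, uppercase words, and the co-occurrence of opposite polarities. The tiny polarity word lists are illustrative assumptions; a real system would draw on lexicons such as Hu&Liu or AFINN.

# Sketch of surface and sentiment features that recur in irony detection
# systems. The small polarity lists are illustrative stand-ins for full
# lexicons such as Hu&Liu or AFINN.
import re

POS_WORDS = {"love", "great", "adore", "wonderful"}
NEG_WORDS = {"late", "boring", "awful", "hate"}
EMOTICON = re.compile(r"[;:]-?[)(DP]")

def irony_features(tweet):
    tokens = [t.strip(".,!?").lower() for t in tweet.split()]
    pos = sum(t in POS_WORDS for t in tokens)
    neg = sum(t in NEG_WORDS for t in tokens)
    return {
        "exclamations": tweet.count("!"),
        "question_marks": tweet.count("?"),
        "ellipsis": tweet.count("..."),
        "emoticons": len(EMOTICON.findall(tweet)),
        "uppercase_words": sum(w.isupper() and len(w) > 1 for w in tweet.split()),
        # Co-occurrence of both polarities hints at the incongruity that
        # many approaches try to capture.
        "polarity_clash": int(pos > 0 and neg > 0),
    }

print(irony_features("I just LOVE it when my flight is late... :)"))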

2.2 Sarcasm Detection

To determine whether specific lexical factors (eg, the use of certain parts of speech or punctuation marks) play a role in sarcasm detection, Kreuz and Caucci [20] asked some college students to read excerpts from paragraphs that originally contained the phrase “said sarcastically” (removed before the task). The participants were able to distinguish sarcastic from nonsarcastic utterances. This work is a key reference on the influence that lexical factors can have in the analysis of social media content.

One of the first approaches that considered the #sarcasm hashtag as an indicator of sarcastic content was developed by Davidov et al. [21]. They introduced a semisupervised algorithm for sarcasm detection that considers frequent words, punctuation marks, and syntactic patterns as features. They collected a dataset from both Amazon and Twitter; their results seem promising, with F measures close to 0.80.

González-Ibáñez et al. [22] performed an experiment on two datasets: a set of self-tagged tweets and a manually annotated set. They considered as sarcastic instances a set of self-tagged tweets containing the #sarcasm or #sarcastic hashtag, and as nonsarcastic instances some positive and negative tweets (retrieved with hashtags such as #happy, #joy, and #lucky and #sadness, #angry, and #frustrated, respectively). As features they considered interjections and emoticons, as well as resources such as LIWC8 and WordNet-Affect.9 They attempted to distinguish between sarcastic, positive, and negative tweets, applying an SVM and logistic regression as classifiers. Their reported results cover both datasets; the overall accuracy rate was around 0.57. They suggested that their results demonstrate the difficulty of sarcasm detection for both humans and machine learning methods.

According to Riloff et al. [23], a common form of sarcasm on Twitter consists of a positive sentiment contrasting with a negative situation (eg, absolutely adore it when my bus is late #sarcasm). The goal of their research was to recognize sarcastic instances containing this pattern.10 They presented a bootstrapping algorithm that automatically learns phrases corresponding to negative situations. As sarcastic instances for the learning process, tweets that contained a sarcasm hashtag were retrieved. From the bootstrapping process they collected positive sentiment verb phrases, predicative expressions, and negative situation phrases. They also performed binary classification experiments using an SVM classifier, with a set of features containing not only their list of phrases but also n-grams and three sentiment and subjectivity lexicons (Hu&Liu, AFINN, and MPQA11). The best result (F measure of 0.51) was achieved by a hybrid approach in which a tweet is considered sarcastic if either it contains a contrast (according to their list of phrases) or it is identified as such by the SVM (with unigram and bigram features).
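A minimal sketch of the contrast idea follows: a tweet is flagged when a positive sentiment phrase is followed, within a short window, by a phrase from a list of negative situations. The phrase lists here are tiny illustrative assumptions; Riloff et al. learned theirs automatically through bootstrapping.

# Sketch of the "positive sentiment contrasting with a negative situation"
# rule. The phrase lists are illustrative; Riloff et al. learned theirs
# automatically with a bootstrapping algorithm.
POSITIVE_PHRASES = ["absolutely adore", "love", "can't wait"]
NEGATIVE_SITUATIONS = ["bus is late", "being ignored", "working weekends"]

def contrast_rule(tweet):
    text = tweet.lower()
    for pos in POSITIVE_PHRASES:
        start = text.find(pos)
        if start == -1:
            continue
        for situation in NEGATIVE_SITUATIONS:
            idx = text.find(situation, start + len(pos))
            # Require the negative situation to follow the positive phrase
            # within a short window of characters.
            if idx != -1 and idx - (start + len(pos)) <= 30:
                return True
    return False

print(contrast_rule("absolutely adore it when my bus is late"))  # True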

Wang [24] presented a study to identify similarities and distinctions between irony and sarcasm. The study consisted of a quantitative sentiment analysis and a qualitative content analysis. A set of sarcastic and ironic tweets collected by the self-tagging approach was used. She found that sarcastic tweets were more positive than ironic ones.

Barbieri et al. [25] studied the differences between ironic and sarcastic tweets. They addressed the problem as a binary classification task between tweets tagged with the #irony and #sarcasm hashtags. Their system is similar to the one presented in [15] for irony detection; they included two new features in their model: whether a tweet contains a URL and whether it contains named entities. The model was evaluated with a decision tree as a classifier. They obtained an F measure of 0.62; this result emphasizes the difficulty of distinguishing between irony and sarcasm. Barbieri et al. mentioned the two most relevant features for distinguishing between ironic and sarcastic tweets: the use of adverbs (more intense ones in sarcastic samples) and the sentiment value (sarcastic tweets contain more positive words than ironic tweets).

Fersini et al. [26] addressed sarcasm detection by introducing an ensemble approach (Bayesian model averaging). As features they used emoticons, punctuation marks, onomatopoeic expressions, part-of-speech labels, and a bag of words. They collected a set of tweets using the #sarcasm and #sarcastic hashtags, and three annotators were then asked to determine the presence of sarcastic content in the tweets. They also evaluated the ensemble method on the corpus presented in [14]. Their results, around 0.8 in F-measure terms on both corpora, seem to indicate that this strategy outperforms those that use traditional classifiers.

Rajadesingan et al. [27] developed a framework for sarcasm detection that uses a behavioral modeling approach. It defines criteria for determining whether a tweet is sarcastic by leveraging behavioral traits (using some of the user’s past tweets) and textual-content features (such as punctuation marks, uppercase words, and parts of speech). Rajadesingan et al. collected tweets containing the #sarcasm and #not hashtags as sarcastic instances; as negative instances, the last 80 tweets from each sarcastic sample’s author were retrieved. A binary classification task was performed between the sarcastic and nonsarcastic instances with decision tree, logistic regression, and SVM classifiers. Their results seem to be good, reaching rates above 0.70 in terms of accuracy.

A similar approach is that of Bamman and Smith [28], who stated that modeling the relationship between a sarcastic tweet and the author’s past tweets can improve accuracy. They presented experiments to discern sarcasm by using features derived not only from the local context of the message itself (such as the words in the tweet and their parts of speech) but also from information about the author, the relationship with his or her audience, and the immediate communicative context they both share (such as salient historical terms and topics and profile information). For evaluation purposes, all tweets with #sarcasm or #sarcastic in the GardenHose sample of tweets in the period from August 2013 to July 2014 were used as sarcastic instances, while for the nonsarcastic instances the 3200 most recent tweets from each “sarcastic author” (ie, a user who posted a tweet labeled with #sarcasm or #sarcastic in the subset) were retrieved. A binary logistic regression was employed as the classifier, achieving an accuracy of 0.851. A simple illustration of an author-context feature of this kind is given in the sketch below.
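The following sketch is one simple way to derive an author-context feature: comparing the words of a new tweet with the author’s historical term distribution, so that wording that is unusual for that author can be flagged. This is our own simplification; the published systems used much richer author, audience, and environment features.

# Sketch of a simple author-context feature: how unusual the words of a
# new tweet are with respect to the author's past tweets. This is a
# simplification of the richer author/audience features in the literature.
from collections import Counter

def author_unusualness(new_tweet, past_tweets):
    history = Counter(w.lower() for t in past_tweets for w in t.split())
    total = sum(history.values()) or 1
    words = [w.lower() for w in new_tweet.split()]
    # Average historical relative frequency of the tweet's words; a low
    # value means the author rarely talks this way.
    return sum(history[w] / total for w in words) / max(len(words), 1)

past = ["compiling the release build", "fixed the parser bug",
        "benchmarking the new index"]
print(author_unusualness("oh I just LOVE meetings", past))  # 0.0: very unusual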

To sum up, there is a substantial body of work focused on sarcasm detection. Whether irony and sarcasm should be considered similar linguistic phenomena remains a controversial issue. Almost the same features used for irony detection have been employed for sarcasm detection; among the most widely applied are punctuation marks and part-of-speech labels. Logistic regression and SVMs have been the classifiers most used for sarcasm detection. Recent approaches to sarcasm detection consider information beyond the text itself, exploiting contextual information and information about the user.

3 Figurative Language and Sentiment Analysis

In recent years, the interest in understanding the role of irony and sarcasm in sentiment analysis has given rise to different evaluation campaigns. Their main objective is not to identify ironic or sarcastic content but to develop systems that are able to correctly classify the polarity of figurative language in social media texts. The presence of figurative language devices such as irony and sarcasm usually causes a polarity reversal. Irony and sarcasm detection is a necessary and important component of a sentiment analysis system, because the performance of the latter is affected by the performance of the former. Maynard and Greenwood [29] performed an experiment to measure the effect of sarcasm on the polarity of tweets and proposed a set of rules to improve the accuracy of sentiment analysis when sarcasm is present; a minimal sketch of such a reversal rule follows.
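The sketch below shows how a sentiment pipeline might incorporate a polarity reversal rule: the polarity produced by a base classifier is inverted whenever a separate detector flags the text as sarcastic. Both components are stubs standing in for trained models, and the published rule sets are more nuanced than this.

# Sketch of polarity reversal: invert the output of a base sentiment
# classifier whenever a (separate) sarcasm detector fires. Both components
# are stubs standing in for trained models.
def base_polarity(text):           # stub: a real system would use a classifier
    return "positive" if "love" in text.lower() else "negative"

def is_sarcastic(text):            # stub: a real system would use a detector
    return "#sarcasm" in text.lower()

def sarcasm_aware_polarity(text):
    polarity = base_polarity(text)
    if is_sarcastic(text):
        # Polarity reversal: the apparent sentiment is flipped.
        polarity = "negative" if polarity == "positive" else "positive"
    return polarity

print(sarcasm_aware_polarity("I love waiting in line #sarcasm"))  # negative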

In the following, three different evaluation campaigns are introduced. In Section 3.1 we describe a pilot subtask to identify ironic content. A sentiment classification task in Twitter for both sarcastic and nonsarcastic social media text is presented in Section 3.2. Finally, a recent sentiment analysis task wholly dedicated to figurative language in Twitter is described in Section 3.3.

3.1 Sentiment Polarity Classification at Evalita 2014

In the context of Evalita12 2014, the sentiment polarity classification task [30] was organized. Its main focus was sentiment classification at the message level for Italian tweets. The task was divided into three independent subtasks: (1) subjectivity classification, (2) polarity classification, and (3) irony detection. Participants were provided with a dataset composed of a collection of 6448 tweets in Italian (70% for training and 30% for testing) derived from two existing corpora: SENTI-TUT [31] and TWITA [32]. Each tweet in the dataset was labeled according to subjectivity (subjective or objective), polarity (positive, negative, neutral, or mixed), and the presence of ironic content. The systems were evaluated by means of the F measure for each subtask. Eleven teams participated in the sentiment polarity classification task (further information about each system can be found in [33]). Table 7.1 summarizes the results obtained13 by the teams that participated in the irony detection task.

Table 7.1

Sentiment Polarity Classification Task Results in F-Measure Terms

Team          Task 1   Task 2   Task 3
UNIBA2930     0.71     0.67     –
UNITOR        0.68     0.62     0.57
IRADABE       0.67     0.63     0.54
SVMSLU        0.58     0.60     0.53
ITGETARUNS    0.52     0.51     0.49
Mind          0.59     0.53     0.47
fbkshelldkm   0.55     0.56     0.47
UPFtaln       0.64     0.60     0.46
Baseline      0.40     0.37     0.44


All the participants outperformed the established baseline. The F-measure rates for both subjectivity and polarity classification were near 0.70, while on subtask 3 the values were below 0.60; this confirms the difficulty of the ironic content–related subtask. The best ranked team on the first two subtasks (UNIBA2930 [34]) did not participate in the irony detection subtask (see Table 7.1). No system was developed to specifically address the irony detection subtask.

Most systems used supervised learning, and the SVM algorithm was the most popular. A further challenge in this task was the lack of Italian resources and natural language processing tools (such as tokenizers and part-of-speech taggers); however, some systems (eg, UNIBA2930 and IRADABE) translated some of the resources available for English into Italian. For classification purposes a variety of features were used, such as a bag of words, punctuation marks, emoticons, and Twitter language markers (such as hashtags and mentions). UNITOR [35], the best ranked system in irony detection, proposed an “ironic vector” that captures the presence of features such as punctuation marks, emoticons, a bag of words, and an Italian sentiment analysis resource called Sentix14 to train an SVM classifier. IRADABE [36] exploited two different sets of features: textual features (eg, n-grams, emoticons, parts of speech, and uppercase words) and information extracted from in-house Italian versions of English resources such as AFINN, SentiWordNet,15 Hu&Liu, DAL, and temporal compression and counterfactuality terms,16 together with an SVM classifier. The SVMSLU [37] system addressed the problem using an SVM to classify binary vectors of tokens together with punctuation marks, hashtags, and retweet marks. In ITGETARUNS [38] a set of linguistic rules was defined to classify the tweets; the author considered markers such as intensifiers, diminishers, and modal verbs. The Mind system [39] is based on multilayer Bayesian ensemble learning; the authors addressed the task within a hierarchical framework. If a given sentence is detected as ironic, its positive or negative polarity is reversed; if the sentence is ironic but its polarity has been classified as mixed, it is switched to negative. The system takes into account only a vector of terms with Boolean weights; no additional information was added. A description of the fbkshelldkm system is not available in the proceedings of the task.

Finally, the UPFtaln [40] system addressed the task with a decision tree classifier. This approach is similar to the one presented in [15] for irony detection; the main difference is the use of Italian resources: Italian WordNet 1.6,17 Sentix, and the CoLFIS corpus.18

3.2 Sentiment Analysis in Twitter at SemEval 2014 and 2015

In recent years, as part of SemEval,19 a task on sentiment analysis in Twitter has been organized [41–43]. The participating systems were required to assign one of the following labels: positive, negative, or objective (neutral). The organizers provided two datasets20 for training and testing, composed of social media texts, mainly from Twitter.

In both 2014 and 2015 the participating systems were also evaluated on a subset of sarcastic tweets. In 2014 a small set of tweets containing #sarcasm was added to the test set, whereas in 2015 a set of tweets was manually labeled as “sarcastic” by human annotators. Table 7.2 shows the seven best performing systems among the 44 participating systems.

Table 7.2

Sentiment Analysis Task in F-Measure Terms for Both Regular and Sarcastic Tweets in the 2014 Edition of SemEval

System        Twitter 2014   Sarcasm 2014
TeamX         70.96          56.50
coooolll      70.14          46.66
RTRGO         69.95          47.09
NRC-Canada    69.85          58.16
TUGAS         69.00          52.87
CISUC_KIS     67.95          55.49
SAIL          67.77          57.26

The results obtained by the best ranked teams in the 2015 edition are shown in Table 7.3. The overall drop in the F measure between regular and sarcastic tweets is slightly smaller than in 2014. From the tables it can be seen that there is a considerable drop in performance when the systems are evaluated on sarcastic tweets. Generally, sentiment analysis systems produce good results for regular content, but when the same systems are evaluated on sarcastic content their overall performance suffers. None of the proposed approaches directly tried to capture the sarcastic intention. All systems addressed the task with a supervised approach, taking into account features widely applied in sentiment analysis tasks, such as a bag of words, part-of-speech tags, and punctuation marks.

Table 7.3

Best Results in the Sentiment Analysis Task in F-Measure Terms for Both Regular and Sarcastic Tweets in the 2015 Edition of SemEval

System        Twitter 2015   Sarcasm 2015
Webis         64.84          53.59
unitn         64.59          55.01
lsislif       64.27          46.00
INESC-ID      64.17          64.91
Splusplus     63.73          60.99
wxiaoac       63.00          52.22
IOA           62.62          65.77

Some of the systems used well-known resources such as AFINN, Hu&Liu, and SentiWordNet. A more detailed description of the shared task and the participating systems can be found in [41, 42].

3.3 Sentiment Analysis of Figurative Language in Twitter at SemEval 2015

Task 11 at SemEval 201521 was the first sentiment analysis task addressing figurative language devices such as irony, sarcasm, and metaphor. The goal of the task was not to directly detect any of these devices but to perform sentiment analysis on a fine-grained scale ranging from −5 (very negative) to +5 (very positive). Since irony and sarcasm are typically used to criticize or to mock, and thus skew the perception of sentiment toward the negative, it is not enough for a system to simply determine whether the sentiment of a given tweet is positive or negative [44]. The participants were asked to determine the degree to which a sentiment was communicated rather than to assign a more general score (as in the previously described tasks).
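Systems in this task were compared by the cosine similarity between the vector of their predicted scores and the vector of gold-standard scores over the test tweets. A minimal sketch of that measure follows; the score vectors are made up for illustration.

# Sketch of the cosine similarity measure used to rank systems: the cosine
# of the angle between the vector of predicted scores and the vector of
# gold scores over all test tweets. The score vectors below are made up.
import math

def cosine_similarity(predicted, gold):
    dot = sum(p * g for p, g in zip(predicted, gold))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_g = math.sqrt(sum(g * g for g in gold))
    return dot / (norm_p * norm_g)

gold = [-4, -3, 2, -5, 1]   # fine-grained scores in [-5, +5]
pred = [-3, -3, 1, -4, 0]
print(round(cosine_similarity(pred, gold), 3))  # close to 1.0 = close to gold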

A corpus composed of three subsets of tweets was supplied to the participants: trial (1025 tweets), training (8000), and test (4000). The corpus construction involved crowdsourcing as well as tweets explicitly tagged with hashtags such as #irony, #sarcasm, #not, and #yeahright or containing words commonly associated with metaphor (eg, “literally” and “virtually”). Further information can be found in [44].

Fifteen teams participated in the task on sentiment analysis of figurative language.22 Table 7.4 shows the results of the seven best ranked systems according to the overall cosine similarity measure.

Table 7.4

Best Results in the Task on Sentiment Analysis of Figurative Language in Twitter (Cosine Similarity Measure)

Team        All     Irony   Sarcasm
ClaC        0.758   0.904   0.892
UPF-taln    0.711   0.873   0.903
LLT_PolyU   0.687   0.918   0.896
EliRF       0.658   0.905   0.904
LT3         0.658   0.897   0.891
ValenTo     0.634   0.901   0.895
HLT         0.630   0.907   0.887


The best ranked system, ClaC [45], showed robustness across different sentiment analysis related tasks [46].23 ClaC is based on a pipeline framework that groups different phases, from preprocessing to polarity induction. It exploits resources such as NRC-lexicon,24 Hu&Liu, and MPQA; in addition, the authors developed a new resource called Gezi (for more details, see [45, 46]). The main difference between their submissions for the two tasks was the machine learning algorithm used for polarity assignment: an SVM for the regular task and M5P (a decision tree regressor) for figurative language tweets. Nevertheless, ClaC did not achieve the best performance for either ironic or sarcastic tweets in the figurative language task. The UPF-taln [47] system presented an extended approach that considered frequent, rare, positive, and negative words and also exploited a bag of words as features. To assign the polarity degree, the authors used a regression algorithm (random subspace with M5P). Their system achieved second place in the overall ranking.

Two similar and effective approaches were those proposed by LLT_PolyU [48] and EliRF [49], which scored the best results on the ironic and the sarcastic tweets, respectively. Both considered as features n-grams, negation scope windows, and sentiment resources (LLT_PolyU exploited Hu&Liu, MPQA, AFINN, and SentiWordNet, while EliRF used Pattern,25 AFINN, Hu&Liu, NRC-lexicon, and SentiWordNet). In both systems, regression models (REPTree in LLT_PolyU and a regression SVM in EliRF) were used to calculate the polarity value.

The LT3 [50] and ValenTo [51] systems included in their sets of features the presence of punctuation marks, emoticons, and hashtags. To capture potential clues of figurative content in tweets, LT3 took advantage of features that detect changes in the narrative as well as contrasting, contradictory, and polysemic words; an SVM classifier was used to determine the polarity value of tweets. The ValenTo system exploits sentiment analysis resources (such as AFINN, Hu&Liu, General Inquirer,26 and SentiWordNet) as well as resources containing emotional and psycholinguistic information (ANEW,27 DAL, SenticNet,28 and LIWC). In addition, a feature that reverses the polarity valence of a tweet when it contains a sarcastic intention was considered; a linear regression model was used to assign the polarity value. Finally, the HLT system [44] used an SVM approach together with lexical features such as negation, intensifiers, and some markers of amusement and irony.

4 Future Trends and Directions

Irony detection and sarcasm detection have so far been addressed as text classification tasks, with salient features such as lexical marks used to characterize ironic and sarcastic utterances. As figurative language devices, irony and sarcasm need to be studied beyond the scope of the textual content of the utterance. In this regard, both the context in which utterances are expressed and common knowledge should be considered to identify the real intention behind an ironic or sarcastic expression. There have been some attempts to take advantage of this kind of information: Wallace et al. [18] exploited contextual information from the forum where a comment was posted, and information about users who wrote sarcastic tweets (such as their past tweets) was considered by Rajadesingan et al. [27] and Bamman and Smith [28] to distinguish between sarcastic and nonsarcastic tweets.

Moreover, it is necessary to consider how affective and emotional content is implicitly embedded in irony and sarcasm. Some work in the literature has already started to exploit affective information by using sentiment and affective lexica such as DAL [14, 16], AFINN and Hu&Liu [16], and SentiWordNet [15, 25].

With regard to the impact of irony and sarcasm on sentiment analysis, before the polarity of an utterance is determined it would be helpful to identify whether the utterance expresses an ironic or sarcastic intention. Further investigation is needed to develop approaches that can efficiently identify ironic and sarcastic content and thus avoid misclassification of the polarity score of a subjective text.

5 Conclusions

People communicate their ideas in complex ways. Figurative language devices such as irony and sarcasm are often used to express evaluative judgments in an unconventional way. Irony and sarcasm are concepts that are difficult to define; however, they are frequently used in social media, and user-generated content of this kind represents a big challenge. The progress achieved so far in irony and sarcasm detection has mainly come from exploiting the syntactic, lexical, and semantic levels of natural language processing. Similar approaches have been proposed to address the task as binary classification. Currently, the biggest effort concerns the identification of the most salient features that allow one to determine when the intended content of an utterance is ironic or sarcastic.

From the sentiment analysis perspective, the presence of irony and sarcasm affects the performance of the task. As we pointed out, state-of-the-art systems generally obtain good results when dealing with regular content, but when they are evaluated on ironic or sarcastic content their overall performance suffers. Therefore robust sentiment analysis systems will need to recognize when human communications in social media make use of figurative language devices such as irony and sarcasm.

References

[1] Grice H.P. Logic and conversation. In: Cole P., Morgan J.L., eds. Syntax and Semantics: Vol. 3: Speech Acts. San Diego, CA: Academic Press; 1975:41–58.

[2] Wilson D., Sperber D. On verbal irony. Lingua. 1992;87(1–2):53–76.

[3] Attardo S. Irony as relevant inappropriateness. J. Pragmat. 2000;32(6):793–826.

[4] Utsumi A. Verbal irony as implicit display of ironic environment: distinguishing ironic utterances from nonirony. J. Pragmat. 2000;32(12):1777–1806.

[5] Kumon-Nakamura S., Glucksberg S. How about another piece of pie: the allusional pretense theory of discourse irony. J. Exp. Psychol. Gen. 1995;124(1):3.

[6] Gibbs R.W. Irony in talk among friends. Metaphor Symbol. 2000;15(1–2):5–27.

[7] Whalen J.M., Pexman P.M., Gill A.J., Nowson S. Verbal irony use in personal blogs. Behav. Inform. Technol. 2013;32(6):560–569.

[8] Alba-Juez L., Attardo S. The evaluative palette of verbal irony. Eval. Context Pragmat. Beyond New Ser. 2014;242:93–116.

[9] Giora R., Attardo S. Irony. In: Encyclopedia of Humor Studies. Thousand Oaks, CA: Sage; 2014:397–401.

[10] Filatova E. Irony and sarcasm: corpus generation and analysis using crowdsourcing. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation. 2012:392–398.

[11] Carvalho P., Sarmento L., Silva M.J., de Oliveira E. Clues for detecting irony in user-generated contents: Oh.!! it’s “so easy” ;-). In: Proceedings of the First International Conference on Information Knowledge Management Workshop on Topic-Sentiment Analysis for Mass Opinion. 2009:53–56.

[12] Veale T., Hao Y. Detecting ironic intent in creative comparisons. In: Proceedings of the 19th European Conference on Artificial Intelligence. IOS Press; 2010:765–770.

[13] Reyes A., Rosso P., Buscaldi D. From humor recognition to irony detection: the figurative language of social media. Data Knowl. Eng. 2012;74:1–12 (special issue on Applications of Natural Language to Information Systems).

[14] Reyes A., Rosso P., Veale T. A multidimensional approach for detecting irony in Twitter. Lang. Resour. Eval. 2013;47(1):239–268.

[15] Barbieri F., Saggion H. Modelling irony in Twitter. In: Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics. 2014:56–64.

[16] Hernández Farías I., Benedí J.-M., Rosso P. Applying basic features from sentiment analysis for automatic irony detection. In: Paredes R., Cardoso J.S., Pardo X.M., eds. Pattern Recognition and Image Analysis. LNCS, vol. 9117. Springer International Publishing; 2015:337–344.

[17] Buschmeier K., Cimiano P., Klinger R. An impact analysis of features in a classification approach to irony detection in product reviews. In: Proceedings of the Fifth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2014:42–49.

[18] Wallace B.C., Choe D.K., Charniak E. Sparse, contextually informed models for irony detection: exploiting user communities, entities and sentiment. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Vol. 1: Long Papers). Beijing, China; 2015:1035–1044.

[19] Karoui J., Benamara F., Moriceau V., Aussenac-Gilles N., Hadrich-Belguith L. Towards a contextual pragmatic model to detect irony in tweets. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015:644–650.

[20] Kreuz R.J., Caucci G.M. Lexical influences on the perception of sarcasm. In: Proceedings of the Workshop on Computational Approaches to Figurative Language. 2007:1–4.

[21] Davidov D., Tsur O., Rappoport A. Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning. 2010:107–116.

[22] González-Ibáñez R., Muresan S., Wacholder N. Identifying sarcasm in Twitter: a closer look. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011:581–586.

[23] Riloff E., Qadir A., Surve P., Silva L.D., Gilbert N., Huang R. Sarcasm as contrast between a positive sentiment and negative situation. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013:704–714.

[24] Wang A.P. #irony or #sarcasm—a quantitative and qualitative study based on Twitter. In: Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation. National Chengchi University; 2013:349–356.

[25] Barbieri F., Saggion H., Ronzano F. Modelling sarcasm in Twitter, a novel approach. In: Proceedings of the Fifth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2014:50–58.

[26] Fersini E., Pozzi F.A., Messina E. Detecting irony and sarcasm in microblogs: the role of expressive signals and ensemble classifiers. In: Proceedings of the IEEE International Conference on Data Science and Advanced Analytics. 2015.

[27] Rajadesingan A., Zafarani R., Liu H. Sarcasm detection on Twitter: a behavioral modeling approach. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. 2015:97–106.

[28] Bamman D., Smith N.A. Contextualized sarcasm detection on Twitter. In: Proceedings of the Ninth International Conference on Web and Social Media. 2015:574–577.

[29] Maynard D., Greenwood M. Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation. 2014:4238–4243.

[30] V. Basile, A. Bolioli, M. Nissim, V. Patti, P. Rosso, Overview of the Evalita 2014 SENTIment POLarity classification task, in: Proceedings of the First Italian Conference on Computational Linguistics and the Fourth International Workshop EVALITA 2014, pp. 50–57.

[31] Bosco C., Patti V., Bolioli A. Developing corpora for sentiment analysis: the case of irony and Senti-TUT. IEEE Intell. Syst. 2013;28(2):55–63.

[32] Basile V., Nissim M. Sentiment analysis on Italian tweets. In: Proceedings of the Fourth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2013:100–107.

[33] Basili R., Lenci A., Magnini B. Proceedings of the First Italian Conference on Computational Linguistics and the Fourth International Workshop EVALITA 2014. Pisa, Italy: Pisa University Press; 2014.

[34] P. Basile, N. Novielli, UNIBA at EVALITA2014-SENTIPOLC Task: predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features, in: Proceedings of the First Italian Conference on Computational Linguistics and the Fourth International Workshop EVALITA 2014, pp. 58–63.

[35] G. Castellucci, D. Croce, D. De Cao, R. Basili, A multiple kernel approach for Twitter sentiment analysis in Italian, in: Proceedings of the First Italian Conference on Computational Linguistics and the Fourth International Workshop EVALITA 2014, pp. 98–103.

[36] I. Hernandez-Farias, D. Buscaldi, B. Priego-Sánchez, IRADABE: adapting English lexicons to the Italian sentiment polarity classification task, in: Proceedings of the First Italian Conference on Computational Linguistics and the Fourth International Workshop EVALITA 2014, pp. 75–81.

[37] A. Anisimovich, Self-evaluating workflow for language-independent sentiment analysis, in: Proceedings of the First Italian Conference on Computational Linguistics and the Fourth International Workshop EVALITA 2014, pp. 108–111.

[38] R. Delmonte, ITGETARUNS a linguistic rule-based system for pragmatic text processing, in: Proceedings of the First Italian Conference on Computational Linguistics and the Fourth International Workshop EVALITA 2014, pp. 64–69.

[39] E. Fersini, E. Messina, F.A. Pozzi, Subjectivity, polarity and irony detection: a multi-layer approach, in: Proceedings of the First Italian Conference on Computational Linguistics and the Fourth International Workshop EVALITA 2014, pp. 70–74.

[40] F. Barbieri, R. Francesco, S. Horacio, Relying on intrinsic word features to characterize subjectivity, polarity and irony of tweets, in: Proceedings of the First Italian Conference on Computational Linguistics and the Fourth International Workshop EVALITA 2014, pp. 104–107.

[41] Rosenthal S., Ritter A., Nakov P., Stoyanov V. SemEval-2014 Task 9: sentiment analysis in Twitter. In: Proceedings of the Eighth International Workshop on Semantic Evaluation (SemEval 2014). Dublin, Ireland: ACL and Dublin City University; 2014:73–80.

[42] Rosenthal S., Nakov P., Kiritchenko S., Mohammad S., Ritter A., Stoyanov V. SemEval-2015 Task 10: sentiment analysis in Twitter. In: Proceedings of the Ninth International Workshop on Semantic Evaluation. 2015:451–463.

[43] Nakov P., Zesch T., Cer D., Jurgens D. Proceedings of the 9th International Workshop on Semantic Evaluation. Denver, CO: Association for Computational Linguistics; 2015.

[44] Ghosh A., Li G., Veale T., Rosso P., Shutova E., Barnden J., Reyes A. SemEval-2015 Task 11: sentiment analysis of figurative language in Twitter. In: Proceedings of the Ninth International Workshop on Semantic Evaluation. 2015:470–478.

[45] Özdemir C., Bergler S. CLaC-SentiPipe: SemEval2015 Subtasks 10 B,E, and Task 11. In: Proceedings of the Ninth International Workshop on Semantic Evaluation. 2015:479–485.

[46] Özdemir C., Bergler S. A comparative study of different sentiment lexica for sentiment analysis of tweets. In: Proceedings of the International Conference Recent Advances in Natural Language Processing. INCOMA Ltd. Shoumen; 2015.

[47] Barbieri F., Ronzano F., Saggion H. UPF-taln: SemEval 2015 tasks 10 and 11. Sentiment analysis of literal and figurative language in Twitter. In: Proceedings of the Ninth International Workshop on Semantic Evaluation. 2015:704–708.

[48] Xu H., Santus E., Laszlo A., Huang C.-R. LLT-PolyU: identifying sentiment intensity in ironic tweets. In: Proceedings of the Ninth International Workshop on Semantic Evaluation. 2015:673–678.

[49] Giménez M., Pla F., Hurtado L.-F. ELiRF: A SVM approach for SA tasks in Twitter at SemEval-2015. In: Proceedings of the Ninth International Workshop on Semantic Evaluation. 2015:574–581.

[50] Van Hee C., Lefever E., Hoste V. LT3: sentiment analysis of figurative tweets: piece of cake #notreally. In: Proceedings of the Ninth International Workshop on Semantic Evaluation. 2015:684–688.

[51] Hernández Farías D.I., Sulis E., Patti V., Ruffo G., Bosco C. ValenTo: sentiment analysis of figurative language tweets with irony and sarcasm. In: Proceedings of the Ninth International Workshop on Semantic Evaluation. 2015:694–698.


1 In this ironic utterance two examples of misspelling in social media texts can be noted. The author writes “recomend” instead of “recommend” and “who” rather than “whom”.

2 http://www.saifmohammad.com/Release/MSOL-June15-09.txt

3 http://www.cs.columbia.edu/~julia/papers/dict_of_affect/

4 The resource is freely available: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon.

5 The resource is freely available: http://github.com/abromberg/sentiment_analysis/blob/master/AFINN/AFINN-111.txt.

6 http://www.reddit.com

7 Particularly comments posted on two pairs of polarized user communities (or subreddits) were selected: progressive and conservative subreddits (related to the US political spectrum) and atheism and Christianity subreddits.

8 http://www.liwc.net

9 http://wndomains.fbk.eu/wnaffect.html

10 To identify “stereotypically” perceived negative situations is per se a big challenge.

11 http://mpqa.cs.pitt.edu/lexicons/subj_lexicon/

12 Evalita is an initiative devoted to the evaluation of natural language processing and speech tools for Italian: http://www.evalita.it/.

13 For each task, two runs could be submitted: constrained (with use of only the training data provided) and unconstrained (with use of additional data for training). Table 7.1 presents the results for the constrained run; only three teams (UNIBA2930, UNITOR, and IRADABE) participated in both the constrained run and the unconstrained run on the three subtasks.

14 http://wikis.fu-berlin.de/pages/viewpage.action?pageId=671548598

15 http://sentiwordnet.isti.cnr.it/

16 The last three resources were previously used for irony detection in English by Reyes et al. [14].

17 http://multiwordnet.fbk.eu/english/home.php

18 http://linguistica.sns.it/CoLFIS/Home.htm

19 SemEval is an ongoing series of evaluations of computational semantic analysis systems.

20 More details about it can be found in [41, 42].

21 http://alt.qcri.org/semeval2015/

22 Some systems such as Clac, UPF-taln, and EliRF participated also in the related task on sentiment analysis in Twitter at SemEval 2015.

23 ClaC had the ninth best performance for both regular and sarcastic tweets in the task on sentiment analysis in Twitter [42].

24 http://www.saifmohammad.com/WebPages/ResearchInterests.html

25 http://www.clips.ua.ac.be/pattern

26 http://www.wjh.harvard.edu/~inquirer/

27 http://csea.phhp.ufl.edu/media/anewmessage.html

28 http://sentic.net/
