Chapter 4

Linked Data Models for Sentiment and Emotion Analysis in Social Networks

C.A. Iglesias (a); J.F. Sánchez-Rada (a); G. Vulcu (b); P. Buitelaar (b)
(a) Technical University of Madrid, Madrid, Spain
(b) National University of Ireland, Galway, Ireland

Abstract

Language resource interoperability is still a major challenge in sentiment analysis. One of the current trends for solving this issue is the adoption of a linked data perspective for semantically modeling, interlinking, and publishing lexical and other linguistic resources. This chapter contributes to the development of the linguistic linked open data through a linked data model for sentiment and emotion analysis in social networks that is based on two vocabularies: Marl and Onyx for sentiment and emotion modeling respectively. These vocabularies are used for (1) affective corpus annotation, (2) affective lexicon annotation, and (3) sentiment and emotion services interoperability. Several aspects of the solution are discussed, such as the transformation of legacy resources, the generation of domain-specific sentiment lexicons, and the benefits of interlinking language resources for sentiment analysis with other resources such as WordNet or DBpedia.

Keywords

Linguistic linked open data; Ontology; Corpus annotation; Marl; Onyx; Lemon; Emotion analysis; Sentiment analysis

Acknowledgments

This work was funded by the European Union Horizon 2020 program under grant agreement no. 644632 MixedEmotions, by Science Foundation Ireland under Grant No. SFI/12/RC/2289 (Insight Centre for Data Analytics) and by the Spanish Ministry of Economy under the Project SEMOLA (Grant TEC2015-68284-R).

1 Introduction

Sentiment analysis is now an established field of research and a growing industry [1]. However, language resources for sentiment analysis are being developed by individual companies or research organizations and are normally not shared, with the exception of a few publicly available resources such as WordNet-Affect [2] and SentiWordNet [3].

Domain-specific resources for multiple languages are potentially valuable but not shared, sometimes because of intellectual property and license considerations, but often because of technical reasons, including interoperability. Several initiatives have addressed interoperability of language resources since the late 1980s, such as the Text Encoding Initiative [4], but there is not yet a widely accepted global solution for integrating and combining heterogeneous linguistic resources from different sources [5]. In this respect the data interoperability problem has been addressed by linked data technologies, which have gained wide acceptance. Linked data [6] refers to best practices and technologies for publishing, sharing, and connecting structured data on the web. This approach has been followed by the linking open data (LOD) project, a grassroots community effort supported by the World Wide Web Consortium (W3C) whose aim is to bootstrap the Web of Data by identifying existing datasets available under open licenses, converting them to Resource Description Framework (RDF) format following the linked data principles, and publishing them on the web. The data cloud that originated from this initiative is known as the LOD cloud. Several communities such as the Open Linguistics Working Group [5] proposed the idea of adopting linked data principles for representing, sharing, and publishing open linguistic resources with the aim of developing a subcloud of the LOD cloud of linguistic resources, known as the linguistic linked open data (LLOD) cloud [7].

In addition, the use of linked data for modeling linguistic resources provides a clear path to their semantic annotation and linking with semantic resources of the Web of Data. This is especially important for making sense of social media streams, whose semantic interpretation is particularly challenging because they are strongly interconnected, temporal, noisy, short, and full of slang [8]. Moreover, several authors [9] have shown that the use of semantics in sentiment analysis outperforms semantics-free methods. Thus the availability of semantically annotated linguistic resources is crucial to the development of the field of sentiment analysis.

This chapter presents our contribution to the development of the sentiment LLOD cloud through a linked data model for sentiment and emotion analysis in social networks that is based on two vocabularies: Marl and Onyx for sentiment and emotion modeling respectively as described in Sections 2 and 3. The rest of the chapter illustrates how these vocabularies are used for annotation of sentiment language resources as well as services (Fig. 4.1). Section 4 describes the generation of a sentiment-linked data corpus based on the use of Marl, Onyx, and Natural Language Processing Interchange Format (NIF), which allows the representation of text with unique URLs. Section 5 describes how sentiment lexicons can be expressed with Marl, Onyx, and the Lexicon Model for Ontologies (lemon), which provides lexical information and differentiates between different domains and senses of a word. Section 6 introduces sentiment analysis interoperability through the use of a Representational State Transfer NIF application programming interface in combination with Marl and Onyx. Section 7 illustrates the benefits of this linguistic linked data approach in the generation of a domain-specific sentiment lexicon on a big linked data platform. Finally, we present our conclusions in Section 8.

Fig. 4.1 Overview of the vocabularies used for modeling affective language resources and services. NIF, Natural Language Processing Interchange Format; REST, Representational State Transfer.

2 Marl: A Vocabulary for Sentiment Annotation

The Marl ontology [10, 11] is a vocabulary designed for annotation and description of subjective opinions expressed in text. The goals of the Marl ontology are to (1) enable the publishing of raw data on opinions and sentiments, (2) deliver a schema that allows opinions coming from different systems (polarity, topics, and features) to be compared, and (3) interconnect opinions by linking them to contextual information expressed with other popular ontologies or specialized domain ontologies.

The Marl ontology is aligned with the W3C PROV Ontology (PROV-O) [12] to support provenance modeling. Provenance is information about the entities, activities, and people involved in producing a piece of data or thing, which can be used to form an assessment of its quality, reliability, and trustworthiness. The main concepts of PROV-O are entities, activities, and agents. Entities are physical or digital assets, such as web pages, spell checkers or, in our case, dictionaries or analysis services. Provenance records describe the provenance of entities, and an entity’s provenance can refer to other entities. For example, a dictionary is an entity whose provenance refers to other entities such as lexical entries. Activities are how entities come into existence. For example, starting from a webpage, a sentiment analysis activity creates an opinion entity describing the opinions extracted from that webpage. Finally, agents are responsible for the activities and can be a person, a piece of software, an organization, or other entities. The Marl ontology has been aligned with the PROV-O so that provenance of language resources can be tracked and shared.

The Marl ontology consists of four main classes as shown in Fig. 4.2: SentimentAnalysis, Opinion, AggregatedOpinion, and Polarity. They are related as follows: a SentimentAnalysis instance is an Activity that analyzes a source text to detect the Polarity of the Opinion expressed in that text. The main features of the extracted opinion are the polarity (hasPolarity), the polarity value (polarityValue) or strength, whose range is defined between minimum (minPolarityValue) and maximum (maxPolarityValue) values, and the described entity (describesObject) and feature (describesFeature) of that opinion. When several opinions are equivalent, we can opt to aggregate them into an AggregatedOpinion, which, in addition to the properties we have already mentioned, provides the number of users that the aggregated opinion represents (opinionCount). The polarity of opinions is represented by the Polarity base class. The Marl ontology comes with three instances of Polarity: Positive, Negative, and Neutral. Table 4.1 contains a comprehensive list of the properties associated with each of these classes.

Fig. 4.2 Class diagram of the Marl ontology.

Table 4.1

Main Properties in the Marl Ontology

SentimentAnalysis
Property | Description
source | Site or source from which the opinion was extracted
algorithm | Algorithm that was used in the analysis. Useful to group opinions by extraction algorithm and compare them
minPolarityValue | Lower limit for polarity values in the opinions extracted via this analysis activity
maxPolarityValue | Upper limit for polarity values in the opinions extracted via this analysis activity

Opinion
Property | Description
describesObject | The object the opinion refers to
describesObjectPart | Part of the object the opinion refers to
describesFeature | Aspect of the object or part that the user is giving an opinion of
hasPolarity | Polarity of the opinion itself, to be chosen from the available Polarity individuals
polarityValue | Degree of the polarity in the range [minPolarityValue, maxPolarityValue]
algorithmConfidence | Rating the analysis algorithm has given to this particular result. Can be interpreted as the accuracy or trustworthiness of the information
extractedFrom | Original source text or resource from which the opinion was extracted
opinionText | Part of the source that was used in the sentiment analysis
domain | Context domain of the result

AggregatedOpinion
Property | Description
opinionCount | The number of individual opinions this AggregatedOpinion represents

The specification contains the full description of all properties: http://www.gsi.dit.upm.es/ontologies/marl.
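Because each analysis declares its own polarity range, opinions from different systems can be compared once their values are mapped onto a common range. The following minimal sketch illustrates the idea in Python; the `Opinion` class and the `rescale` helper are illustrative names that are not part of Marl, and the linear mapping is only one possible normalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """Illustrative stand-in for a marl:Opinion; fields mirror Table 4.1."""
    polarity_value: float      # marl:polarityValue
    min_polarity_value: float  # marl:minPolarityValue of the analysis
    max_polarity_value: float  # marl:maxPolarityValue of the analysis


def rescale(opinion: Opinion, new_min: float = -1.0, new_max: float = 1.0) -> Opinion:
    """Linearly map the polarity value onto [new_min, new_max] (an assumption)."""
    span = opinion.max_polarity_value - opinion.min_polarity_value
    if span <= 0:
        raise ValueError("maxPolarityValue must be greater than minPolarityValue")
    fraction = (opinion.polarity_value - opinion.min_polarity_value) / span
    return Opinion(new_min + fraction * (new_max - new_min), new_min, new_max)


# A 4-star rating on a 1-5 scale and a 0.75 score on a 0-1 scale
stars = rescale(Opinion(4.0, 1.0, 5.0))
score = rescale(Opinion(0.75, 0.0, 1.0))
print(stars.polarity_value, score.polarity_value)  # 0.5 0.5
```

After rescaling, both opinions carry the same polarity value, which is what makes a side-by-side comparison meaningful.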

3 Onyx: A Vocabulary for Emotion Annotation

The notion of emotion in psychological models is a subject of debate in current research. Scherer [13, 14] classifies the existing approaches for describing emotions into three categories: dimensional models, categorical models, and appraisal theories.

Dimensional models use one or more dimensions to distinguish emotional states. The most widely accepted dimensions are arousal (or activation), valence (or pleasure), and dominance (or control). The dimension valence represents the pleasantness-unpleasantness dimension, and allows one to distinguish between positive and negative emotions. The dimension arousal refers to the intensity of the emotional activation, ranging from excited to calm. The dimension dominance represents the controlling and dominant nature of the emotion, ranging from dominant to submissive. Unidimensional models use the arousal or valence dimension. Multidimensional models use more than one dimension to characterize emotional states. There are several popular multidimensional models: two-dimensional models (valence, arousal) by Plutchik and Russell and three-dimensional models, such as the valence, arousal, and dominance model.

Categorical models, also known as discrete emotion models, suggest that there exists an innate set of basic emotions. The best known model is Ekman’s “big six”: anger, disgust, fear, happiness, sadness, and surprise.

Appraisal theories assume that emotions are elicited by a cognitive evaluation (appraisal) of situations and events that cause specific reactions in people, allowing for a large number of highly differentiated emotions. The result of the appraisal will generally have an impact on motivation, which changes the tendency for action.

Onyx [15] is a vocabulary that models emotions and the emotion analysis process itself. It can be used to represent the results of an emotion analysis service or the lexical resources involved (eg, corpora and lexicons). This vocabulary can be used to connect results from different providers and applications, even when different models of emotions are used.

At its core, the ontology has three main classes as shown in Fig. 4.3: EmotionAnalysis, EmotionSet, and Emotion. In a standard emotion analysis these three classes are related as follows: an EmotionAnalysis is run on a source (generally text, eg, a status update), and the result is represented as one or more EmotionSet instances that each contain one or more Emotion instances. The model of emotions in Onyx is very generic to reflect the lack of consensus on modeling and categorizing emotions. An advantage of this approach is that the representation and psychological emotion models are decoupled. Emotions have been described in a number of different ways, by means of categories (eg, “happiness”).

Fig. 4.3 Class diagram of the Onyx ontology. IRI, Internationalized Resource Identifier; RDF, Resource Description Framework.

The EmotionAnalysis instance contains information about the source (eg, dataset) from which the information was taken, the algorithm used to process it, and the emotion model followed (eg, Plutchik’s categories). Additionally, it can make use of provenance to specify the agent in charge of the analysis, the resources used (eg, dictionaries), and other useful information.

An EmotionSet contains a group of emotions found in the text or one of its parts. As such, it contains information about the original text (extractedFrom), the exact excerpt that contains the emotion or emotions (emotionText), the person who showed the emotions (sioc:has_creator), the entity that the emotion is related to (describesObject), the concrete part of that object it refers to (describesObjectPart), the feature about that part or object that triggers the emotion (describesFeature), and the domain detected. All these properties are straightforward, but a comment about the domain property is necessary. Different emotions could have different interpretations in different contexts (eg, fear is positive when referred to a thriller but negative when it comes to cars and safety). When several EmotionSet instances are related, an AggregatedEmotionSet can be created that links to all of them. AggregatedEmotionSet is a subclass of EmotionSet that contains additional information about the original EmotionSet instances it aggregates. For instance, we could aggregate all the emotions related to a particular movie, or all the emotions shown by a particular user, and still be able to trace back to the original individual emotions.

Onyx’s Emotion model includes EmotionCategory, which is a specific category of emotion (eg, “sadness,” although more than one could be specified), linked through the hasEmotionCategory property; the emotion intensity via hasEmotionIntensity; action tendencies related to this emotion, or actions that are triggered by the emotion; appraisals and dimensions. Lastly, specific appraisals, dimensions, and action tendencies can be defined by extending Appraisal, Dimension, and ActionTendency, whose value should be a float number.
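The relationship between the three core classes can be sketched with plain Python data classes. This is only an illustration under stated assumptions: the class and field names mirror the Onyx terms mentioned above, the `emotion_sets` list is a convenience link introduced for the sketch, and the source URI, algorithm name, and model label are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Emotion:
    category: str                      # onyx:hasEmotionCategory
    intensity: Optional[float] = None  # onyx:hasEmotionIntensity


@dataclass
class EmotionSet:
    extracted_from: str                                    # onyx:extractedFrom
    emotions: List[Emotion] = field(default_factory=list)  # onyx:hasEmotion

    def strongest(self) -> Optional[Emotion]:
        """Return the emotion with the highest intensity, if any has one."""
        rated = [e for e in self.emotions if e.intensity is not None]
        return max(rated, key=lambda e: e.intensity) if rated else None


@dataclass
class EmotionAnalysis:
    source: str              # onyx:source
    algorithm: str           # onyx:algorithm
    uses_emotion_model: str  # onyx:usesEmotionModel
    # Convenience link for this sketch only; not an Onyx property
    emotion_sets: List[EmotionSet] = field(default_factory=list)


# Hypothetical analysis of one status update, using two of Ekman's categories
status = "What a great surprise party!"
analysis = EmotionAnalysis(
    source="http://example.com/status/1",
    algorithm="example-classifier",
    uses_emotion_model="big6",
)
analysis.emotion_sets.append(
    EmotionSet(status, [Emotion("happiness", 0.8), Emotion("surprise", 0.6)])
)
print(analysis.emotion_sets[0].strongest().category)  # happiness
```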

On top of that generic model we have included two different models: the WordNet-Affect taxonomy, and the Emotion Markup Language (EmotionML) vocabularies for categories, dimensions, and appraisals, which are detailed in Section 3.1.

Although emotional models and categories differ in how they classify or quantify emotions, they describe different aspects of the same complex emotion phenomenon [16]. Hence there are equivalence relationships between different categories or emotions in different models. To state such equivalence between emotion categories in Onyx one can use the properties defined in SKOS [17] such as skos:exactMatch or skos:closeMatch. This approach falls short when one is dealing with dimensional emotional theories or complex category theories.

Within a single model it is also possible that two separate emotions, when found simultaneously, imply a third one. For instance, “thinking of the awful things I’ve done makes me want to cry” might reveal sadness and disgust, which together might be interpreted as remorse. Some representations would refer to remorse as a complex emotion. Onyx purposely does not include the notion of complex emotions. It follows the same approach as EmotionML in this respect, as the Human-Machine Interaction Network on Emotion Annotation and Representation Language [18] included this distinction between simple and complex emotions but it was not included in the EmotionML specification. This simplifies the ontology and avoids discussion about the definition of complex emotions, since there are several possible definitions of a complex emotion, and different levels of emotions (eg, the hourglass of emotions model). One possible way to deal with such a situation is to add an AggregatedEmotion that represents remorse to the EmotionSet, linking it to the primary emotions with the aggregatesEmotion property.

Table 4.2 contains a comprehensive list of the properties associated with each of these classes. To group all the attributes that correspond to a specific emotion model, we created the EmotionModel class. Each EmotionModel will be linked to the different categories it contains (hasEmotionCategory), the Appraisal or Dimension instances it introduces (through hasAppraisal and hasDimension), etc. Having a formal representation of the categories and dimensions proves very useful when one is dealing with heterogeneous datasets in emotion analysis. In addition to being necessary to interpret the results, this information can be used to filter out results and for automation.

Table 4.2

Main Properties in the Onyx Ontology

EmotionAnalysis
Property | Description
source | Identifies the source of the user-generated content
algorithm | Emotion analysis algorithm that was used
usesEmotionModel | Link to the emotion model used, which defines the categories, dimensions, appraisals, etc.

EmotionSet
Property | Description
domain | The specific domain in which the emotion analysis was carried out
algorithmConfidence | Numeric value that represents the predicted accuracy of the result, as given by the algorithm in use
extractedFrom | Text or resource that was subject to the analysis
hasEmotion | An emotion that is shown by the EmotionSet. An EmotionSet may contain several emotions

Emotion
Property | Description
hasEmotionCategory | The type of emotion, defined by an instance of the emotion ontology as specified in the corresponding EmotionAnalysis
hasEmotionIntensity | Degree of intensity of the emotion
emotionText | Fragment of the EmotionSet’s source that contained emotion information

The specification contains the full description of all properties: http://www.gsi.dit.upm.es/ontologies/onyx.

3.1 Onyx Extensibility: Vocabularies

Annotators can define their own ad hoc models and categories, but the linked data approach dictates that vocabularies and entities should be reused when appropriate. To facilitate this reuse, we offer several EmotionModel vocabularies that can be used with Onyx. As of this writing, we have modeled the quite extensive WordNet-Affect taxonomy as an EmotionModel, to be used as the reference for categorical representation. We also ported the main vocabularies defined for EmotionML [19], and created a model based on the hourglass of emotions model [20]. A list of vocabularies with a detailed explanation is publicly available [21].

3.2 Emotion Markup Language

EmotionML does not include any emotion vocabulary in itself. However, the Multimodal Interaction Working Group released a series of vocabularies that cover the most frequently used models of emotions [19]. Users have to define their own vocabularies or reuse one of the existing ones. We have developed a tool that generates an EmotionModel from a vocabulary definition, including all its dimension, category, appraisal, or action tendency entries. Using this tool, we have processed the vocabularies released by the Multimodal Interaction Working Group. EmotionML has four types of vocabularies, according to the type of characteristic of the emotion phenomenon they represent: emotion categories, emotion dimensions, appraisals, and action tendencies. If an emotion model addresses several of these characteristics, there will be an independent vocabulary for each of them. In Onyx, instead of following this approach, we opted for adding all characteristics in the same model. This results in cleaner uniform resource identifiers (URIs) and helps represent the emotion model as a whole (Listing 1).

Listing 1 Excerpt of a number of models from Emotion Markup Language in Onyx

4 Linked Data Corpus Creation for Sentiment Analysis

Some of the most valuable linguistic resources for sentiment analysis are labeled corpora for subjectivity, sentiment, and emotion analysis. Labeled corpora are used in a number of sentiment analysis tasks, such as polarity detection or sentiment lexicon generation. Nevertheless, there is not yet a widely accepted format for annotating sentiment corpora as illustrated in the following samples of a sentiment corpus (Listings 2 and 3) and an emotion corpus (Listing 4). Some additional problems we face are the different range of polarities in different datasets, as well as the use of different emotion models depending on the application field.

Listing 2 Example of a labeled review in the electronics domain [22]
Listing 3 Example of a labeled review in the restaurants domain [23]
Listing 4 Example of emotion-labeled tweets with use of two different annotation schemes [24, 25]

In this chapter we present how a sentiment corpus and an emotion corpus can be annotated through the use of NIF together with Marl and Onyx respectively. NIF 2.0 [26] is an RDF/OWL-based format that aims to achieve interoperability between natural language processing tools, language resources, and annotations. NIF provides a way to identify strings using URIs and annotate them with an ontology [27]. Strings are identified by URIs following a URI scheme based on RFC5147 [28], which standardizes fragment identifiers for text. For a given document we want to annotate with NIF, the URI consists of the URI of the document itself, a separator (#), and the character offsets of the beginning and end of the string we want to annotate. Offsets in NIF count the gaps between characters: zero is the position before the first character of the document, and the end offset is the position immediately after the last character of the referenced string. After URIs have been assigned to meaningful strings of the corpus, these URIs can be annotated with the NIF core ontology, which provides classes and properties to describe the relationships between substrings, text, documents, and their URI schemes. NIF can be extended with vocabularies, such as Marl or Onyx, for specific purposes.
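The offset-based URI scheme can be sketched in a few lines of Python. The helper name `nif_uri` and the document URI are hypothetical; the `#char=begin,end` fragment follows RFC 5147, and Python string indices are used here as character offsets, which is an assumption that holds as long as offsets are counted in Unicode code points.

```python
def nif_uri(document_uri: str, text: str, substring: str) -> str:
    """Return an offset-based URI for the first occurrence of substring."""
    begin = text.index(substring)  # gap before the first character
    end = begin + len(substring)   # gap after the last character
    return f"{document_uri}#char={begin},{end}"


text = "The service is correct"
print(nif_uri("http://example.com/doc/1", text, "service"))
# http://example.com/doc/1#char=4,11
print(nif_uri("http://example.com/doc/1", text, text))
# http://example.com/doc/1#char=0,22
```

Because the end offset points to the gap after the last character, the length of the annotated string is simply the difference between the two offsets.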

4.1 Sentiment Corpus

In this section we illustrate how a sentiment corpus can be annotated with Marl and NIF. We will annotate a review corpus following the sentiment tasks identified in the aspect-based sentiment analysis task of the challenge at SemEval 2014 [23], which was originally annotated as shown in Listing 3. The first step for annotating a corpus is to publish the corpus at a public URI. On the basis of NIF, offset-based URIs are assigned to every meaningful string of the corpus. Let us start by annotating the review “I love their fajitas, their pico de gallo is not bad, and the service is correct” as given in Listing 5. First, we annotate that the review has an opinion (opinion/1) in the offset (0,80), which in this case matches the full review. Then we annotate that this opinion is positive and record the provenance of the sentiment analysis task (analysis/1). Finally, we annotate and link two meaningful terms with entities of the DBpedia knowledge base. Since NIF follows an offset-based URI fragment scheme, single-word terms (eg, fajitas) as well as multiword terms (eg, pico de gallo) can be annotated. Since these two aspect terms have an associated polarity, we annotate that the opinion opinion/1 is an AggregatedOpinion that includes two opinions (opinion/2 and opinion/3) about the two aspect terms previously annotated.
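The annotation steps just described can be approximated with a short Python sketch that emits triples with prefixed names. It is an illustration under several assumptions rather than a reproduction of the original listing: the corpus URI is hypothetical, the DBpedia resources and the polarities of the two aspect terms are assumed, and the properties nif:referenceContext, itsrdf:taIdentRef, prov:wasGeneratedBy, and marl:aggregatesOpinion are taken from the respective specifications rather than from this chapter.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

REVIEW = ("I love their fajitas, their pico de gallo is not bad, "
          "and the service is correct")
BASE = "http://example.com/review/1"  # hypothetical corpus URI

# (aspect term, DBpedia resource, polarity); resources and polarities are assumed
ASPECT_TERMS = [
    ("fajitas", "dbr:Fajita", "marl:Positive"),
    ("pico de gallo", "dbr:Pico_de_gallo", "marl:Positive"),
]


def string_uri(begin: int, end: int) -> str:
    """Offset-based URI for REVIEW[begin:end]."""
    return f"<{BASE}#char={begin},{end}>"


def annotate_review() -> List[Triple]:
    """Build the opinion, provenance, and entity-linking triples for REVIEW."""
    context = string_uri(0, len(REVIEW))  # the whole review, offsets (0,80)
    triples: List[Triple] = [
        (context, "rdf:type", "nif:Context"),
        ("analysis/1", "rdf:type", "marl:SentimentAnalysis"),
        ("opinion/1", "rdf:type", "marl:AggregatedOpinion"),
        ("opinion/1", "marl:extractedFrom", context),
        ("opinion/1", "marl:hasPolarity", "marl:Positive"),
        ("opinion/1", "prov:wasGeneratedBy", "analysis/1"),
    ]
    for number, (term, entity, polarity) in enumerate(ASPECT_TERMS, start=2):
        begin = REVIEW.index(term)
        term_uri = string_uri(begin, begin + len(term))
        opinion = f"opinion/{number}"
        triples += [
            (term_uri, "nif:referenceContext", context),
            (term_uri, "itsrdf:taIdentRef", entity),  # link to DBpedia
            (opinion, "rdf:type", "marl:Opinion"),
            (opinion, "marl:extractedFrom", term_uri),
            (opinion, "marl:describesObject", entity),
            (opinion, "marl:hasPolarity", polarity),
            ("opinion/1", "marl:aggregatesOpinion", opinion),
        ]
    return triples


for subject, predicate, obj in annotate_review():
    print(subject, predicate, obj, ".")
```

The multiword term pico de gallo receives a single URI (offsets 28 to 41), which shows why an offset-based scheme handles single-word and multiword terms uniformly.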

Listing 5 Example of aspect term extraction and aspect term polarity detection

Aspect-based sentiment analysis identifies the aspects of a given target entity and the sentiment expressed toward each aspect. Aspect categories (eg, food, price) identify coarser features than aspect terms, and they do not necessarily occur as terms in a given sentence. In our example, the terms fajitas and pico de gallo refer to the same aspect category food, while the term service refers to the aspect category service. In Listing 6 we annotate aspect categories and introduce the use of the property marl:describesFeature for this purpose.

Listing 6 Annotation of aspect categories and their polarity

4.2 Emotion Corpus

An emotion corpus can be annotated in the same way as a sentiment corpus but with use of the Onyx vocabulary instead of the Marl vocabulary. There are two main differences for annotating an emotion corpus. First, a sentiment analysis provides a certain polarity for a given string, while an emotion analysis can identify several emotions with different intensities. Second, the same corpus can be annotated on the basis of different emotion models. In Listing 7 we illustrate how the previous example can be annotated as a result of an emotion analysis.

Listing 7 Example of an annotated emotion corpus

5 Linked Data Lexicon Creation for Sentiment Analysis

Sentiment lexicons are lists of words with an associated a priori polarity or emotion. Some of the most popular sentiment lexicons are the NRC Emotion Lexicon [29], the MPQA Subjectivity lexicon [30], SentiWordNet [31], the Sentiment140 NRC Twitter Sentiment Lexicon [32], and Affective Norms for English Words (ANEW) [33], which are shown in Listing 8.

Listing 8 Example of sentiment lexicons [29–33]. ANEW, Affective Norms for English Words

As shown, the available lexicons follow different formats and have different properties, and their values are not normalized. In addition to the interoperability problem, a major shortcoming of the available representations is that the polarity is assigned to a term, independently of its context. For example, the adjective cold is assigned a single polarity (eg, positive), which applies to cold beer but equally to cold pizza. In this section we illustrate how the lemon lexicon model can be used together with Marl and Onyx to publish and interlink sentiment and emotion lexicons. Lemon builds on previous work on standards for the representation of lexical resources (that is, the Lexical Markup Framework [34]) but extends the underlying formal model and provides a native integration of lexicons with domain ontologies. The lemon model is described in detail in the lemon cookbook [35] and supports the linking of a computational lexical resource with the semantic information defined in an ontology. Lemon defines a set of basic aspects of lexical entries, including morphosyntactic variants and normalizations. Lexical entries can be linked to semantic information through lexical sense objects. In addition, lemon has a number of modules that allow further modeling. Currently defined modules are linguistic description, phrase structure, morphology, syntax and mapping, and variation.

5.1 Sentiment Lexicon

Sentiment lexicons can be expressed with lemon, which has been extended with two properties of Marl, marl:polarityValue and marl:hasPolarity. Polarities can be defined within a particular context of the lexical entry. For example, for the Spanish word susto (meaning “fright”), we can assign a positive polarity in horror movies but a negative polarity in children’s movies, as illustrated in Listing 9.
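The effect of attaching polarity to a context rather than to the bare term can be illustrated with a small Python sketch. The `LexicalEntry` class is a simplified, hypothetical stand-in: in a lemon lexicon the polarity would be attached to a lexical sense, whereas here each context is reduced to a plain label.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LexicalEntry:
    written_form: str
    language: str
    # Context label -> Marl polarity individual (one item per lexical sense)
    polarity_by_context: Dict[str, str] = field(default_factory=dict)

    def polarity(self, context: str) -> Optional[str]:
        """Return the polarity for a context, or None if the context is unknown."""
        return self.polarity_by_context.get(context)


susto = LexicalEntry(
    "susto",
    "es",
    {"horror movies": "marl:Positive", "children's movies": "marl:Negative"},
)
print(susto.polarity("horror movies"))      # marl:Positive
print(susto.polarity("children's movies"))  # marl:Negative
```

A context-free lexicon would have to pick one of the two polarities for susto, which is exactly the shortcoming that sense-level annotation avoids.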

Listing 9 Example of a sentiment lexicon entry

5.2 Emotion Lexicon

Lexical entries expressed with lemon can also be annotated with Onyx. Suitable emotion models should be selected depending on the application domain. An annotated example of the Affective Norms for English Words lexicon is shown in Listing 10, with use of the emotion model emoml:pad.

Listing 10 Example of an emotion lexicon entry

6 Sentiment and Emotion Analysis Services

NIF defines an input and output format for Representational State Transfer web services in the NIF 2.0 public application programming interface specification [36]. In this way, natural language processing tools and services can interoperate seamlessly. This specification defines a set of parameters that should be supported by NIF-compliant services, which have been extended for sentiment and emotion services as shown in Table 4.3.

Table 4.3

Parameters of an Emotion or Sentiment Analysis Service Compliant With Natural Language Processing Interchange Format

Parameter | Description
input (i) | Serialized data (ie, the text or other formats, depending on informat)
informat (f) | Format in which the input is provided: turtle (default), text, or json-ld
outformat (o) | Format in which the output is serialized: turtle (default), text, or json-ld
prefix (p) | Prefix used to create and parse URIs
emodel (e) | Emotion model in which the output is serialized (eg, WordNet-Affect, PAD)
minpolarity (min) | Minimum polarity value of the sentiment analysis
maxpolarity (max) | Maximum polarity value of the sentiment analysis
language (l) | Language of the sentiment or emotion analysis
domain (d) | Domain of the sentiment or emotion analysis

PAD, pleasure, arousal, dominance emotion model; URI, uniform resource identifier.
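As a minimal sketch of how a client could assemble a request from the parameters in Table 4.3, the following Python code builds a query string with the standard library. The endpoint URL, the helper name `build_request`, the chosen defaults, and the value passed for emodel are assumptions made for illustration; a given service may expect different identifiers.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Long parameter names as listed in Table 4.3
SUPPORTED = {
    "input", "informat", "outformat", "prefix", "emodel",
    "minpolarity", "maxpolarity", "language", "domain",
}


def build_request(endpoint: str, text: str, **params: object) -> str:
    """Return a GET URL for a NIF-style analysis service (hypothetical endpoint)."""
    unknown = set(params) - SUPPORTED
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # Defaults chosen for this sketch: plain-text input, JSON-LD output
    query = {"input": text, "informat": "text", "outformat": "json-ld"}
    query.update(params)
    return f"{endpoint}?{urlencode(query)}"


url = build_request(
    "http://example.com/api",  # hypothetical endpoint
    "I love their fajitas",
    language="en",
    emodel="PAD",              # assumed identifier for the PAD model
)
print(url)
print(parse_qs(urlsplit(url).query)["input"])  # ['I love their fajitas']
```

Rejecting unknown parameter names early keeps typos from being silently ignored by the remote service.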

The open source project Senpy [37] provides a reference implementation for sentiment and emotion services that is extendable through plugins. A number of wrappers have been developed for popular services such as Sentiment140 [38] to test service interoperability. In addition, in the context of the European projects EuroSentiment and MixedEmotions, this application programming interface has been implemented by several vendors.

The example in Listing 11 shows how a sentiment service is invoked and how the output of that service is annotated with use of NIF together with Marl and Onyx.

Listing 11 Example of an emotion service output

7 Case Study: Generation of a Domain-Specific Sentiment Lexicon

This section presents a practical case study [39] of the use of the previously introduced technologies for the generation of a domain-specific sentiment lexicon from legacy language resources and enrichment with semantics and additional linguistic information from resources such as DBpedia and BabelNet. The language resources adaptation pipeline consists of four main steps highlighted by dark boxes in Fig. 4.4: (1) the corpus conversion step normalizes the different language resources to a common schema based on Marl and NIF; (2) the semantic analysis step extracts the domain-specific entity classes and named entities and identifies links between these entities and concepts from the LLOD cloud; (3) the sentiment analysis step extracts contextual sentiments and identifies SentiWordNet synsets corresponding to these contextual sentiment words; and (4) the lexicon generator step uses the results of the previous steps, enhances them with multilingual and morphosyntactic information, and converts the results into a lexicon based on the lemon and Marl formats. Different language resources are processed with variations of the given adaptation pipeline. For example, the domain-specific English review corpora are processed with the pipeline described in Fig. 4.4, while the sentiment annotated dictionaries are converted to the lemon/Marl format in the lexicon generator step.

Fig. 4.4 Pipeline for domain-specific sentiment lexicon generation. NIF, Natural Language Processing Interchange Format.
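
The four steps of the pipeline can be sketched as a chain of plain Python functions. This is a toy illustration of the data flow only, not the implementation used in the case study [39]: all function names, the `Record` type, the keyword-based matching, and the sample reviews are assumptions, and the sketch omits the linking to the LLOD cloud, the SentiWordNet synset identification, and the final serialization to lemon and Marl.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Common schema produced by the corpus conversion step (hypothetical)."""
    text: str
    polarity: float
    entities: list = field(default_factory=list)
    sentiment_words: list = field(default_factory=list)


def convert_corpus(raw_reviews):
    """Step 1: normalize legacy (text, 1-5 rating) pairs to a common schema."""
    return [Record(text=t, polarity=(rating - 3) / 2) for t, rating in raw_reviews]


def semantic_analysis(records, domain_entities):
    """Step 2: detect domain entities (here by naive keyword lookup)."""
    for r in records:
        r.entities = [e for e in domain_entities if e in r.text.lower()]
    return records


def sentiment_analysis(records, sentiment_words):
    """Step 3: keep sentiment words that occur together with an entity."""
    for r in records:
        if r.entities:
            tokens = r.text.lower().split()
            r.sentiment_words = [w for w in sentiment_words if w in tokens]
    return records


def generate_lexicon(records):
    """Step 4: aggregate into (entity, word) -> mean polarity entries."""
    scores = {}
    for r in records:
        for e in r.entities:
            for w in r.sentiment_words:
                scores.setdefault((e, w), []).append(r.polarity)
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


# Invented sample data, for illustration only.
reviews = [("The battery is long lasting", 5),
           ("Long delay before the screen responds", 1)]
records = convert_corpus(reviews)
records = semantic_analysis(records, ["battery", "screen"])
records = sentiment_analysis(records, ["long"])
lexicon = generate_lexicon(records)
print(lexicon)
```

In this toy run the word "long" receives a positive score next to "battery" and a negative one next to "screen", which is the kind of domain dependence that motivates generating the lexicon from domain-specific corpora.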

Once the sentiment corpora have been converted and the domain-specific sentiment lexicon has been generated, we can use SPARQL to explore it as shown in Listing 12.

Listing 12 Example of SPARQL queries for sentiment lexicons
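
A query of this kind might look like the sketch below, which asks for the ten most positive lexical entries and prepares a SPARQL protocol request for it in Python. It assumes that the lexicon attaches `marl:hasPolarity` and `marl:polarityValue` to lemon senses; the endpoint `http://example.org/sparql` and the helper `build_request_url` are hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Ten most positive entries of a lemon/Marl lexicon (assumed modeling:
# polarity is attached to the lemon sense of each lexical entry).
QUERY = """
PREFIX lemon: <http://lemon-model.net/lemon#>
PREFIX marl:  <http://www.gsi.dit.upm.es/ontologies/marl/ns#>

SELECT ?word ?polarityValue
WHERE {
  ?entry a lemon:LexicalEntry ;
         lemon:canonicalForm/lemon:writtenRep ?word ;
         lemon:sense ?sense .
  ?sense marl:hasPolarity marl:Positive ;
         marl:polarityValue ?polarityValue .
}
ORDER BY DESC(?polarityValue)
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Return a SPARQL protocol GET URL; nothing is sent over the network."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint; replace with the address of a real triple store.
url = build_request_url("http://example.org/sparql", QUERY)
print(url)
```

Replacing `marl:Positive` with `marl:Negative`, or filtering on `?polarityValue`, adapts the same pattern to other explorations of the lexicon.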

8 Conclusions

This chapter has provided an introduction to the potential and advantages of a linked data approach to sentiment linguistic resources, which helps overcome the interoperability problem and takes advantage of the distributed annotation nature of the linked data cloud. The chapter has introduced the main concepts and technologies for annotating sentiment and emotion corpora, lexicons, and services, and has shown how lexical and semantic resources can be linked.

The technologies presented in this chapter have been evaluated in a number of research projects. The research projects TrendMiner [40] and FinancialTwitterTracker [41] have used Marl to annotate the opinion detected in financial news and the social network Twitter. Both projects allow users to visualize time-based sentiment and activity on a particular topic of interest and to compare them visually with the time series of a financial instrument. The research project MixedEmotions [42] is applying both Marl and Onyx for annotation of sentiment and emotion in three business cases: (1) annotation of phone calls in a call center for quality assurance; (2) annotation of emotion and sentiment in news and social networks for a brand monitoring system; and (3) annotation of emotion and sentiment in social networks for a social TV system.

All that said, much remains to be done. On the one hand, linked data technologies are still not widely used by the computational linguistics and natural language processing communities. On the other hand, there is an increasing number of available datasets in the LLOD, and an ongoing community effort supports this approach, including the W3C Community Group on Linked Data Models for Sentiment and Emotion Analysis, the W3C Best Practices on Multilingual Linked Open Data Community Group, the W3C Ontology-Lexica Community Group, the W3C Linked Data for Language Technology Community Group, and the Open Linguistics Working Group of the Open Knowledge Foundation.

Despite these concerns, the increasing popularity of linked data technology and the availability of tools and resources are tremendous incentives for its adoption. Regarding the sentiment analysis community, the W3C Community Group on Linked Data Models for Sentiment and Emotion Analysis provides a suitable forum for fostering the adoption of linked data practices and the generation of interoperable sentiment language resources and services.

This chapter has presented the current directions in the publication of language resources for sentiment analysis as linked data. One of the future directions is the harmonization of metadata of existing language resources, as proposed by the Linghub project [43], which collects and harmonizes metadata from some of the existing language resources and publishes the records as linked data. Another important aspect is the development of business models for sentiment language resources. One of the first initiatives in this respect is the Eurosentiment project [44], which proposed a multilingual resource pool supported by a subscription business model. Moreover, given the scarcity of sentiment language resources for many languages, it is necessary to investigate automatic machine translation of these resources [45] and to avoid the negative effect of low quality in these translations [46]. Finally, since the content shared in social networks is not only textual, multimodal sentiment analysis [47] aims at alleviating the ambiguity of text with the fusion of sentiment analysis of textual, visual, and audio modalities. This requires research on semantic models for annotating sentiment and emotion in multimodal resources [42].

References

[1] Liu B. Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 2012;5(1):1–167.

[2] Strapparava C., Valitutti A. WordNet-Affect: an affective extension of WordNet. In: Proceedings of LREC. 1083–1086. 2004;4.

[3] Esuli A., Sebastiani F. Sentiwordnet: a publicly available lexical resource for opinion mining. In: Proceedings of LREC. 417–422. 2006;6.

[4] Ide N., Veronis J. Text Encoding Initiative: Background and Contexts. New York, NY: Springer Science & Business Media; 1995;29.

[5] Chiarcos C. Linguistic linked open data (LLOD)-building the cloud. In: Proceedings of the Joint Workshop on NLP&LOD and SWAIE: Semantic Web, Linked Open Data and Information Extraction. 2013:1.

[6] Bizer C., Heath T., Berners-Lee T. Linked data-the story so far. Semant. Serv. Interop. Web Appl. Emerg. Concepts. 2009;205–227.

[7] Chiarcos C., Hellmann S., Nordhoff S. Towards a linguistic linked open data cloud: the open linguistics working group. TAL. 2011;52(3):245–275.

[8] Bontcheva K., Rout D. Making sense of social media streams through semantics: a survey. Semant. Web. 2012;1:1–31.

[9] Saif H., He Y., Alani H. Alleviating data sparsity for twitter sentiment analysis. In: MSM2012 Workshop Proceedings. 2012:2–9.

[10] Westerski A., Iglesias C.A., Tapia F. Linked opinions: describing sentiments on the structured web of data. In: Proceedings of the Fourth International Workshop on Social Data on the Web (SDoW2011). 2011:21–32.

[11] Sánchez-Rada J.F., Westerski A. Marl ontology specification. Grupo de Sistemas Inteligentes, Universidad Politécnica de Madrid; 2013. Available at http://www.gsi.dit.upm.es/ontologies/marl/.

[12] Lebo T., Sahoo S., McGuinness D., Belhajjame K., Cheney J., Corsar D., Garijo D., Soiland-Reyes S., Zednik S., Zhao J. Prov-o: the prov ontology. W3C Recommendation; 2013.

[13] Scherer K.R. Psychological models of emotion. Neuropsychol. Emot. 2000;137(3):137–162.

[14] Scherer K.R. On the rationality of emotions: or, When are emotions rational? Soc. Sci. Inform. 2011;50(3–4):330–350.

[15] Sánchez-Rada J.F., Iglesias C.A. Onyx: a linked data approach to emotion representation. Inform. Process. Manage. 2016;52(1):99–114.

[16] Schröder M. The SEMAINE API: towards a standards-based framework for building emotion-oriented systems. Adv. Hum. Comput. Interact. 2010;2010(319406):21.

[17] Miles A., Bechhofer S. SKOS simple knowledge organization system reference. W3C; 2009 W3C Recommendation.

[18] HUMAINE Association. HUMAINE Emotion Annotation and Representation Language (EARL): Proposal. HUMAINE Association; 2006.

[19] Ashimura K., Baggia P., Burkhardt F., Oltramari A., Peter C., Zovato E. EmotionML vocabularies. W3C; 2012.

[20] Cambria E., Livingstone A., Hussain A. The hourglass of emotions. In: Cognitive Behavioural Systems. Springer; 144–157. 2012;7403.

[21] Sánchez-Rada J.F., Iglesias C.A. Onyx specification. Universidad Politécnica de Madrid; 2013.

[22] Cruz F.L., Troyano J.A., Enríquez F., Ortega F.J., Vallejo C.G. ‘Long autonomy or long delay?’ The importance of domain in opinion mining. Expert Syst. Appl. 2013;40(8):3174–3184.

[23] Pontiki M., Papageorgiou H., Galanis D., Androutsopoulos I., Pavlopoulos J., Manandhar S. Semeval-2014 task 4: aspect based sentiment analysis. In: Proceedings of the Eighth International Workshop on Semantic Evaluation (SemEval 2014). 2014:27–35.

[24] Balabantaray R., Mohammad M., Sharma N. Multi-class twitter emotion classification: a new approach. Int. J. Appl. Inform. Syst. 2012;4(1):48–53.

[25] Mohammad S.M. #Emotional tweets. In: Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM). Association for Computational Linguistics; 2012:246–255.

[26] Hellmann S. Integrating natural language processing (NLP) and language resources using linked data. PhD thesis, Universität Leipzig; 2013.

[27] Brümmer M., Ackermann M., Dojchinovski M. Guidelines for linked data corpus creation using NIF. Draft Community Group Report, September 29, 2015. W3C Community Group; 2015.

[28] Wilde E., Duerst M. URI fragment identifiers for the text/plain media type, Internet Engineering Task Force. 2008.

[29] Mohammad S.M., Kiritchenko S., Zhu X. NRC-Canada: building the state-of-the-art in sentiment analysis of tweets. In: Proceedings of the Second Joint Conference on Lexical and Computational Semantics. 321–327. 2013;2.

[30] Wilson T., Wiebe J., Hoffmann P. Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics; 2005:347–354.

[31] Baccianella S., Esuli A., Sebastiani F. SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: LREC. 2200–2204. 2010;10.

[32] Kiritchenko S., Zhu X., Mohammad S.M. Sentiment analysis of short informal texts. J. Artif. Intell. Res. (JAIR). 2014;50:723–762.

[33] Bradley M.M., Lang P.J. Affective norms for English words (ANEW): instruction manual and affective ratings. Technical report C-1, The Center for Research in Psychophysiology, University of Florida; 1999.

[34] Francopoulo G. LMF Lexical Markup Framework. New York: John Wiley & Sons; 2013.

[35] McCrae J., Aguado-de Cea G., Buitelaar P., Cimiano P., Declerck T., Gómez-Pérez A., Gracia J., Hollink L., Montiel-Ponsoda E., Spohr D. Interchanging lexical resources on the Semantic Web. Lang. Resour. Eval. 2012;46(4):701–719.

[36] Hellmann S., Lehmann J., Auer S., Brümmer M. Integrating NLP using linked data. In: The Semantic Web-ISWC 2013. New York: Springer; 2013:98–113.

[37] Coronado M., Sánchez-Rada F., Iglesias C.A. D7.5 evaluation and assessment of language resource pool. Eurosentiment Project; 2014.

[38] Go A., Bhayani R., Huang L. Twitter sentiment classification using distant supervision. In: CS224N Project Report, Stanford 1. 2009:12.

[39] Vulcu G., Buitelaar P., Negi S., Pereira B., Arcan M., Coughlan B., Sánchez-Rada J.F., Iglesias C.A. Generating linked-data based domain-specific sentiment lexicons from legacy language and semantic resources. In: Proceedings of the Fifth International Workshop on Emotion, Social Signals, Sentiment and Linked Open Data, co-located with LREC 2014. Reykjavik, Iceland: LREC2014; 2014:6–9.

[40] Krieger H.-U., Declerck T. TMO—The Federated Ontology of the TrendMiner Project. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014). European Language Resources Association; 2014:4164–4171.

[41] Sánchez-Rada J.F., Torres M., Iglesias C.A., Maestre R., Peinado R. A Linked data approach to sentiment and emotion analysis of twitter in the financial domain. In: Proceedings of the Second International Workshop on Finance and Economics on the Semantic Web. 51–62. 2014;1240.

[42] Sánchez-Rada J.F., Iglesias C.A., Gil R. A linked data model for multimodal sentiment and emotion analysis. In: Proceedings of the Fourth Workshop on Linked Data in Linguistics: Resources and Applications. Beijing, China: Association for Computational Linguistics; 2015:11–19.

[43] McCrae J.P., Cimiano P. Linghub: a linked data based portal supporting the discovery of language resources. In: Filipowska A., Verborgh R., Polleres A., eds. SEMANTiCS (Posters & Demos). 88–91. 2015;1481.

[44] Sánchez-Rada J.F., Vulcu G., Iglesias C.A., Buitelaar P. EUROSENTIMENT: linked data sentiment analysis. In: Proceedings of the 13th International Semantic Web Conference ISWC. 2014:145–148.

[45] Vulcu G., Buitelaar P., Negi S., Pereira B., Arcan M., Coughlan B., Sánchez-Rada J.F., Iglesias C.A. Generating linked-data based domain-specific sentiment lexicons from legacy language and semantic resources. In: Proceedings of the Fifth International Workshop on Emotion, Social Signals, Sentiment and Linked Open Data (ES3LOD), co-located with LREC 2014. Reykjavik, Iceland; 2014:6–9.

[46] Klinger R., Cimiano P. Instance selection improves cross-lingual model training for fine-grained sentiment analysis. In: Proceedings of the Nineteenth Conference on Computational Natural Language Learning. Beijing, China: Association for Computational Linguistics; 2015:153–163.

[47] Morency L.-P., Mihalcea R., Doshi P. Towards multimodal sentiment analysis: harvesting opinions from the web. In: Proceedings of the 13th International Conference on Multimodal Interfaces. New York, NY: ACM; 2011:169–176.
