Chapter 16

Conclusion and Future Directions

F.A. Pozzi (a); E. Fersini (b); E. Messina (b); B. Liu (c)
a SAS Institute Srl, Milan, Italy
b University of Milano-Bicocca, Milan, Italy
c University of Illinois at Chicago, Chicago, IL, United States

Abstract

In this chapter, we draw conclusions on the challenges detailed in the previous chapters and discuss potential future directions toward the next generation of sentiment analysis systems.

Keywords

Sentiment analysis; Social networks; Future directions; Irony and sarcasm detection; Suggestion mining; Opinion spam detection; Opinion leader detection; Sentiment visualization

After reading this book, your feeling is probably that sentiment analysis is highly challenging. And it is. Although the research community has attempted many subproblems and proposed a large number of solutions, none of the subproblems has been completely solved. Nevertheless, the past decade has seen significant progress in both research and applications of sentiment analysis, as the large number of start-ups and established companies offering sentiment analysis services shows. There is a real and huge need in industry for such services because every business wants to know how consumers perceive its products and services, and those of its competitors (ie, competitive intelligence). In the past, people asked their friends for advice and opinions on all kinds of topics, such as which restaurant is the best in the city or whom to vote for in the next election. Nowadays, consumers want to know the opinions and experiences of other users on the web before purchasing products or services. Governments and private organizations are also showing strong interest in gauging public opinion about their policies and their public image. These practical needs and the technical challenges will keep the sentiment analysis field vibrant and lively for years to come. Building on what has been done so far, we believe that two research directions are particularly promising.

First, there are many opportunities to design novel machine learning algorithms able to learn from large volumes of textual data and to extract domain-specific knowledge for decision-making purposes. In particular, because social networks are by nature networked environments, incorporating both content and relationship information will likely be the core contribution of the next generation of sentiment analysis approaches.

Second, the next generation of sentiment analysis systems should enable us to see the full spectrum of the problem. We believe that a holistic or integrated approach is likely to succeed if it can deal with all subproblems at the same time (eg, polarity, irony and sarcasm, opinion spam, and opinion leaders), because their interactions can help solve each individual subproblem. We are optimistic that the problem will be solved satisfactorily soon for widespread practical applications, even though a fully automated and accurate solution is not yet within reach. In the meantime, we do believe that it is possible to devise effective semiautomated solutions. The key is to fully understand the whole range of issues and pitfalls, cleverly manage them, and determine which portions can be done automatically and which portions need human assistance. In the continuum between the fully manual solution and the fully automated solution, we can push more and more toward automation.

Nevertheless, much remains to be done in most of the topics discussed in this book. Linked data technologies are still not widely used by the computational linguistics and natural language processing communities. On the other hand, the number of available datasets is increasing, and an ongoing community effort supports this approach. Despite the limited adoption so far, the growing popularity of linked data technology and the availability of tools and resources are strong incentives for its adoption. Within the sentiment analysis community, the World Wide Web Consortium Linked Data Models for Sentiment and Emotion Analysis Community Group provides a suitable forum for fostering the adoption of linked data practices and the generation of interoperable sentiment language resources and services. One future direction is the development of business models for sentiment language resources.

Irony and sarcasm detection has mainly been addressed as a text classification task, in which salient features such as lexical marks are used to characterize ironic and sarcastic utterances. As figurative language devices, however, irony and sarcasm need to be studied beyond the scope of the textual content of the utterance. In this regard, both the context in which utterances are expressed and common knowledge could be considered to identify the real intention behind an ironic or sarcastic expression. A few attempts have been made to exploit this kind of information. In addition, it is necessary to consider how affective and emotional content is implicitly embedded in irony and sarcasm. Some work in the literature has already started exploiting affective information by using sentiment and affective lexica. Further investigation is needed to develop approaches that efficiently identify polarity shifts and polarity reversals in ironic and sarcastic content.

Because research on suggestion mining is still limited, many aspects of the problem remain unexplored. Context-based features have not yet been employed for the classification task in suggestion mining, and the available datasets are still inadequate to train robust classifiers. Once larger datasets become available, deep learning methods could be developed effectively for this task. Finally, from a computational linguistics perspective, the study of suggestions deals with a variety of linguistic phenomena, especially mood and modality. Although there is a considerable amount of linguistic research on modality, there have been very few investigations of computational approaches for detecting linguistic moods in text. A focus on suggestion mining could also revive interest in both semantic and machine learning approaches to mood and modality analysis.

Although many algorithms have been proposed to detect fake reviews, there is still a long way to go before we can weed out opinion spamming activities. Many interesting research directions have been barely explored or not explored at all. For example, it would be interesting to compare reviews of the same product across different websites to discover abnormalities, such as similar reviews (in content or rating) that are written at the same time by people with similar user IDs and from the same (or similar) IP addresses. Another important future direction is to leverage language inconsistency: to suit different products and to stress personal experiences, fake reviewers may write things in different reviews that are mutually inconsistent or contrary to social norms. Abnormal web usage should also be studied further. Web servers record almost everything that a person does on a website, and these records could be valuable for detecting fake reviews.
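The cross-site comparison described above can be sketched as a simple pairwise filter. The following is a minimal illustration under stated assumptions, not a published detection method: the `Review` record layout, the word-set Jaccard similarity for content, the `difflib` ratio for user-ID similarity, the "same first three IPv4 octets" rule for IP similarity, and all thresholds are hypothetical choices made only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class Review:
    # Hypothetical record layout; real sites expose different fields.
    site: str
    product_id: str
    user_id: str
    text: str
    rating: int
    timestamp: datetime
    ip: str


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def same_subnet(ip_a: str, ip_b: str) -> bool:
    """Assumption: IPv4 addresses sharing the first three octets are 'similar'."""
    return ip_a.split(".")[:3] == ip_b.split(".")[:3]


def suspicious_pairs(reviews, text_thr=0.7, id_thr=0.8,
                     window=timedelta(hours=24)):
    """Flag pairs of reviews of the same product on different sites that are
    similar in content or rating, close in time, written by similar user IDs,
    and posted from similar IP addresses. All thresholds are illustrative."""
    by_product = defaultdict(list)
    for r in reviews:
        by_product[r.product_id].append(r)

    flagged = []
    for group in by_product.values():
        for r1, r2 in combinations(group, 2):
            if r1.site == r2.site:
                continue  # only compare across websites
            similar_content = (jaccard(r1.text, r2.text) >= text_thr
                               or r1.rating == r2.rating)
            close_in_time = abs(r1.timestamp - r2.timestamp) <= window
            similar_ids = SequenceMatcher(None, r1.user_id,
                                          r2.user_id).ratio() >= id_thr
            if (similar_content and close_in_time and similar_ids
                    and same_subnet(r1.ip, r2.ip)):
                flagged.append((r1, r2))
    return flagged


reviews = [
    Review("siteA", "p1", "john_doe1", "Great phone, battery lasts all day",
           5, datetime(2016, 1, 1, 10), "10.0.0.5"),
    Review("siteB", "p1", "john_doe2", "Great phone, battery lasts all day!",
           5, datetime(2016, 1, 1, 12), "10.0.0.9"),
    Review("siteC", "p1", "alice", "Screen cracked after a week",
           1, datetime(2016, 3, 1), "192.168.1.1"),
]
result = suspicious_pairs(reviews)
```

Grouping by product before pairing keeps the comparison within each product; for very large collections, a real system would likely replace exhaustive pairing with blocking or locality-sensitive hashing.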

Opinion leader detection should take exogenous variables into account. Although solutions based on network structure and diffusion models offer a high degree of generality, many recent methods have shifted toward estimating individuals' intrinsic features by considering the content they produce, their observable relations, and their behavior on the net. A potentially promising development would be to correlate the models with external events that might influence the behavior of individuals, or to use several data sources to extract information on the same individual (eg, across multiple social networks). Analyzing multimedia content in addition to text could also provide a better assessment of an individual's characteristics.

A final concern relates to the modalities of sentiment summarization and visualization. When the results of sentiment analysis tasks are presented to an end user, the associated level of uncertainty should be taken into account, because uncertain results shown as certain may lead to incorrect conclusions. To tackle this problem, it is important to convey the degree of uncertainty to the user as auxiliary information. In this way, users can decide how confident they should be in the conclusions they draw from the data. Although several techniques for visualizing uncertainty have been proposed in the general information visualization literature, they have only recently been applied to opinion visualization. A common way to encode uncertainty is to add error bars representing confidence intervals. Unfortunately, recent research shows that error bars have many drawbacks from a perceptual perspective. Newer techniques such as gradient plots (which use the transparency of bars to encode uncertainty) and violin plots (which use width) could be considered as alternatives to error bars. For text visualization, other visual attributes, such as the hue or saturation of the background color, can be used to encode uncertainty. Moreover, opinion visualization must cope with massive amounts of data, and most of the visualizations discussed would be inadequate for very large volumes of raw opinion data. In such situations, data reduction methods such as filtering, sampling, and aggregation could be considered.
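Aggregation and uncertainty encoding can be combined: raw opinions are reduced to one summary per entity, and each summary carries an interval that a visualization could render as an error bar, a transparency gradient, or a background saturation. The sketch below assumes binary polarity labels (`"positive"` versus anything else) and uses the Wilson score interval for a binomial proportion; both the label scheme and the choice of interval are our assumptions, not something the text prescribes.

```python
import math
from collections import Counter, defaultdict


def wilson_interval(positives: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%).
    With no observations, the proportion is completely uncertain: [0, 1]."""
    if n == 0:
        return (0.0, 1.0)
    p = positives / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def aggregate_sentiment(records):
    """Reduce raw (entity, label) records to one summary per entity:
    the share of positive opinions plus its uncertainty interval."""
    counts = defaultdict(Counter)
    for entity, label in records:
        counts[entity][label] += 1
    summary = {}
    for entity, c in counts.items():
        n = sum(c.values())
        pos = c["positive"]
        low, high = wilson_interval(pos, n)
        summary[entity] = {"n": n, "share": pos / n, "low": low, "high": high}
    return summary


records = ([("battery", "positive")] * 8 + [("battery", "negative")] * 2
           + [("screen", "positive")])
summary = aggregate_sentiment(records)
```

Both aspects in this toy input have a majority of positive opinions, but the single opinion about "screen" yields a much wider interval than the ten opinions about "battery". This is exactly the difference that would be lost if the two shares were shown as equally certain.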

All that said, what made sentiment analysis a trending topic is the evolution of the Web from read-only to read-write. This evolution produced enthusiastic users who interact with each other and share information through social networks. Despite recent and significant progress, sentiment analysis is still finding its own voice as a new interdisciplinary field. Computer scientists, linguists, and social scientists can make major contributions to the field and to society, which is why we invited them to contribute to this book. Future sentiment analysis systems need broader and deeper common and commonsense knowledge bases. More complete knowledge should be combined with reasoning methods inspired by human thought and grounded in sociological and psychological theories. This will lead to a better understanding of natural language opinions and will help bridge the gap between (unstructured) multimodal information and (structured) machine-processable data. Blending scientific theories of emotions and sentiment with the practical engineering goals of analyzing natural language text could lead to more human-inspired approaches to the design of intelligent opinion-mining systems capable of handling semantic knowledge, detecting sarcasm, making analogies, and learning new affective knowledge to finally detect, perceive, and feel emotions.
