Chapter 12

Sentiment Analysis With SpagoBI

I. Iennaco (a); L. Pernigotti (b); S. Scamuzzo (c)
(a) Engineering Ingegneria Informatica, Turin, Italy
(b) Engineering Ingegneria Informatica, Bologna, Italy
(c) Engineering Ingegneria Informatica, Assago (MI), Italy

Abstract

SpagoBI is an entirely open source business intelligence suite, supported and governed by Engineering Group, which covers all the typical analytical areas required in business intelligence projects, adding innovative tools and solutions. It offers a wide range of analytical tools to perform reporting, multidimensional analysis, key performance indicator management, visualization through charts, dashboards, and cockpits, ad hoc querying and reporting, location intelligence, network analysis, and data mining. The product is constantly evolving in different directions, including the development of vertical modules addressing specific functionalities, such as simulation scenarios, performance management, and social network analysis. Sentiment analysis in the context of the SpagoBI social network analysis engine relies on a learning method allowing one to assign to each message a specific polarity by applying specific algorithms. This chapter illustrates the SpagoBI social network analysis engine, focusing specifically on the sentiment analysis algorithm and providing examples of its use in some use cases.

Keywords

SpagoBI; Reporting; Multidimensional analysis; Key performance indicator management; Dashboards; Cockpits; Social network analysis

1 Introduction to SpagoBI

SpagoBI is an open source business intelligence suite, developed and governed by Engineering Group, covering all areas of business intelligence with a set of analytical capabilities and cross-domain functionalities.

The suite balances open source flexibility with industry-grade software quality. It is released under an Open Source Initiative–approved license, namely the Mozilla Public License version 2.0. Everyone can study, use, modify, and distribute modified copies of SpagoBI under the terms of its license. The software can be downloaded from the forge of the OW2 consortium, a nonprofit and independent open source community.

Different kinds of documents can be built and different kinds of tasks can be performed with SpagoBI:

 reports

 charts

 dashboards and cockpits

 online analytical processing analysis on cubes

 ad hoc querying and reporting

 location intelligence

 data mining

 network analysis

 what-if simulations

All SpagoBI documents can be linked to each other, if needed, and can be combined in the context of interactive cockpits. Data can come from different sources, including data warehouses, SQL and NoSQL databases, services, and files. In this way the final user has a coherent and secure environment in which to perform analysis on different sets of data by using the most appropriate tool for their specific requirements.

Different analytical scenarios can be implemented with SpagoBI; in this chapter we will focus in particular on how to leverage the SpagoBI tools to support sentiment analysis on data collected from social networks. The approach that we describe is generic enough to be applied to other usage scenarios as well. It relies on a data collection component, an engine that implements sentiment analysis algorithms by using statistical functions, and several visualization tools that present the results in effective ways.

2 Social Network Analysis With SpagoBI

2.1 Main Purpose

SpagoBI provides a dedicated tool for listening to and monitoring social networks (eg, Twitter). The end user can perform analysis on real-time data, such as streams of tweets, as well as on historical data from a specific time period.

The main purpose of the SpagoBI Social Network Analysis module is to monitor tweets corresponding to particular keywords or accounts and to analyze them to provide the end user with the most meaningful information. This ranges from basic information (ie, number and trend of tweets and retweets, top tweets and top influencers, etc.) to more advanced analysis, such as topic extraction and sentiment classification of tweets’ texts derived from statistical algorithms written in R.

In the next sections we describe the main features of the SpagoBI Social Network Analysis module, with a particular focus on its sentiment analysis tool.

2.2 Features

The SpagoBI Social Network Analysis module allows the end user to search for tweets without calling the Twitter Search application programming interface directly, because the calls are managed by the module. The end user needs only to register their own app on Twitter to obtain the consumerKey, consumerSecret, accessToken, and accessTokenSecret required to configure the SpagoBI Social Network Analysis module. The end user can then search for tweets that contain the specified hashtags or keywords simply by completing the Twitter Search Form.

In the form it is possible to choose between two search modalities:

 streaming: Tweets that contain the hashtags or keywords specified in the form are downloaded in real time, from the moment the search is activated until the end user decides to stop it. Because of limitations imposed by Twitter, only one Twitter search can be active at a time.

 historical: Indexed tweets associated with the keywords specified by the end user are downloaded. The search retrieves tweets published between the moment the search begins and 6–9 days earlier. Moreover, it is possible to specify the temporal range and to schedule the search so that it is repeated.

In the form the end user can also specify resources and accounts to monitor:

 Twitter accounts: The module can monitor up to three Twitter accounts and show the trend of their number of followers over time.

 bitly shortened links: By specifying up to three bitly shortened links in the search form, the end user can monitor the number of clicks over time.

 SpagoBI documents: The end user can choose up to three SpagoBI documents, such as reports, charts, or online analytical processing cubes (see Section 1 for the full list of SpagoBI documents), to visualize and refresh the data updated up to the date of the last executed search.

The different searches created are shown in tables in the Twitter Search Form and the end user can execute different actions on them, as detailed below:

 continuous scanning table: This table shows the streaming search results:

 Start/stop. According to the search status, the end user can start or stop it. Starting a search will stop other active searches.

 Delete. Remove the search and the associated resources.

 Analyze. If the streaming search has been stopped, the end user can access the tab to visualize the results of the search.

 timely scanning table: This table shows the historical search results:

 Delete: Remove the search and the associated resources.

 Analyze: Once the results have finished downloading, the end user can visualize them by clicking on this button.

 Scheduler: If the search is scheduled, the end user can stop it so that it will not be repeated.
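The rule that starting a streaming search stops any other active one can be illustrated with a small sketch. The class and method names below are hypothetical and are not part of SpagoBI; the code only models the behavior described for the continuous scanning table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class StreamingSearch:
    """Hypothetical model of one row in the continuous scanning table."""
    keywords: List[str]
    active: bool = False


@dataclass
class SearchRegistry:
    """Hypothetical registry that keeps at most one streaming search active."""
    searches: List[StreamingSearch] = field(default_factory=list)

    def start(self, search: StreamingSearch) -> None:
        # Starting a search stops every other active search first.
        for other in self.searches:
            other.active = False
        if search not in self.searches:
            self.searches.append(search)
        search.active = True

    def stop(self, search: StreamingSearch) -> None:
        search.active = False

    def can_analyze(self, search: StreamingSearch) -> bool:
        # Results can be analyzed only once the search has been stopped.
        return not search.active

    def delete(self, search: StreamingSearch) -> None:
        # Removing a search also drops it from the table.
        self.searches.remove(search)
```

With this model, starting a second search leaves the first one stopped and therefore available for analysis.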

Once the search has been stopped, the end user can visualize the results of his search in the Twitter Results tabs. In these tabs the results are divided into areas of interest and shown through different tools such as graphs, timelines, and maps.

The first tab is the Summary tab, which shows general information about the downloaded tweets. It highlights the total number of results, the total number of Twitter users, and their diffusion in the social network. In particular, the end user can visualize:

 tweets timeline: In this plot the end user can visualize a graphical representation of the trend of the tweets and retweets over time according to their publication date.

 tweets origin: Through pie charts it is possible to access information about the nature of the tweets (ie, if they are tweets, retweets, or replies) and the device used to publish the tweets.

 top and recent tweets: The end user can read the most relevant tweets and the most recent ones.
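The aggregations behind the timeline and the origin pie chart can be sketched as follows. This is only an illustration: the record layout (an ISO timestamp plus a kind label), the function names, and the sample records are assumptions and do not reflect the module's internal data model.

```python
from collections import Counter
from datetime import datetime


def daily_timeline(records):
    """Count records per (publication date, kind) pair."""
    counts = Counter()
    for timestamp, kind in records:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        counts[(day, kind)] += 1
    return counts


def origin_shares(records):
    """Return the fraction of each kind (tweet, retweet, reply)."""
    kinds = Counter(kind for _, kind in records)
    total = sum(kinds.values())
    return {kind: n / total for kind, n in kinds.items()} if total else {}


# Made-up sample records: (ISO timestamp, kind)
sample = [
    ("2014-10-31T18:02:00", "tweet"),
    ("2014-11-01T09:15:00", "tweet"),
    ("2014-11-01T09:40:00", "retweet"),
    ("2014-11-01T11:05:00", "reply"),
]
timeline = daily_timeline(sample)
shares = origin_shares(sample)
```

The per-day counts feed a timeline plot, while the shares map directly to the slices of a pie chart.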

In the Topic tab, word clouds with hashtags and topics are shown. The topics are retrieved by means of a text mining algorithm.

The Network tab contains information based on the users who published the tweets downloaded by the search. The end user in this tab can visualize:

 top influencers: The users are ordered according to their number of followers, and a description of their Twitter profile is shown.

 mentions: A word cloud shows the mentions, with the size of each mention reflecting its number of occurrences in the downloaded tweets.

 users’ interactions graph and map: The interactions between two users such as retweets or replies are shown in a network and in a map.
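As a rough sketch of the first two views, influencers can be ranked by follower count and mentions can be counted to obtain word cloud weights. The function names, the regular expression, and the input shapes are assumptions made for illustration only.

```python
import re
from collections import Counter

# Assumed pattern for a mention: "@" followed by word characters.
MENTION = re.compile(r"@(\w+)")


def top_influencers(followers, limit=3):
    """Rank user names by follower count, highest first."""
    return sorted(followers, key=followers.get, reverse=True)[:limit]


def mention_weights(texts):
    """Count how often each account is mentioned (case-insensitive)."""
    return Counter(
        name.lower() for text in texts for name in MENTION.findall(text)
    )
```

The counts returned by `mention_weights` can be mapped to font sizes so that frequently mentioned accounts appear larger in the cloud.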

The end user can also find information about the geographical distribution of the tweets in the Distribution tab. For information over time about the number of clicks on the bitly resources and the followers of the monitored accounts, the end user can select the Impact tab. Finally, the ROI tab shows the SpagoBI document associated with the Twitter search, updated to the last monitoring.

In addition to the graphs, visualizations, and information provided by the tabs described above, the SpagoBI Social Network Analysis module performs sentiment analysis on the downloaded tweets, classifying the polarity of each tweet as positive, negative, or neutral. In the Sentiment tab the user can see the percentage and number of positive (green), neutral (yellow), and negative (red) tweets and the distribution of the sentiment across the different topics of the tweets. This distribution is displayed in a radar graph and bar charts.
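The figures shown in the Sentiment tab amount to simple counts over labeled tweets. The sketch below assumes that each tweet has already been assigned a polarity label and, for the second function, a topic; the function names and input shapes are hypothetical.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")


def sentiment_summary(labels):
    """Return {label: (count, percentage)} for a list of polarity labels."""
    counts = Counter(labels)
    total = len(labels)
    return {
        label: (counts[label], 100.0 * counts[label] / total if total else 0.0)
        for label in LABELS
    }


def sentiment_by_topic(tagged):
    """Return {topic: {label: count}} from (topic, label) pairs."""
    table = defaultdict(Counter)
    for topic, label in tagged:
        table[topic][label] += 1
    return {
        topic: {label: counts[label] for label in LABELS}
        for topic, counts in table.items()
    }
```

The per-topic table is the kind of data a radar graph or grouped bar chart would plot, with one axis or group per topic.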

2.3 Use Case

Among all the possible applications, social network listening can be a powerful customer intelligence tool. It allows you to improve your market positioning and build stronger brand loyalty, but also to check what people are saying about your competitors or stem any negative social buzz.

A typical use case for the SpagoBI Social Network Analysis module is marketing campaign monitoring and analysis. As an example, we consider the launch of a new SpagoBI release. This event is usually associated with a marketing campaign, performed also on the main social networks.

Let us suppose that the launch of the new release is scheduled for November 1 and you want to monitor the impact of your marketing campaign on Twitter. You would like to define a continuous scanning search starting from October 28, a few days before the official launch, just to check the actual impact compared with the previous days.

Let us also suppose that you are using two accounts—@SpagoBI and @spagoworld—to promote your campaign, that you are always using the bitly URL http://bit.ly/1uCUO7X to redirect people to the website with all the event-related information, and that you have launched a new hashtag (#spagobi5) to refer to the new release.

You can then define a new online monitoring search in the SpagoBI Social Network Analysis module using:

 “spagobi” and “spagobi5” (your new hashtag) as keywords

 @SpagoBI and @spagoworld as accounts to monitor

 http://bit.ly/1uCUO7X as a resource to monitor

As explained in the previous section, this will allow you to check the number of followers of the monitored accounts and the number of clicks on the monitored resource. You can also schedule the account and resource monitoring to start 2 or 3 days after the monitoring of tweets. This can be useful because it is often important to monitor tweets from the very beginning of the campaign, whereas the impact on the number of followers of the corresponding accounts may become visible only a few days later.

Once monitoring has been scheduled, the Summary results of your campaign monitoring will look like the ones shown in Fig. 12.1 plus additional information on the tweet sources and nature and two lists of the top and most recent tweets.

Fig. 12.1 Summary tab results—upper part.

As shown in Fig. 12.1, in our case there is a high peak immediately after November 1: the number of tweets is almost four times the corresponding amount of the previous days. This could mean that the launch campaign has been very effective and has generated a good response in the social network.
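A comparison of this kind reduces to dividing the peak count by the average of the preceding days. The daily counts used in the test of this sketch are made up for illustration and are not read from the figure.

```python
from statistics import mean


def peak_ratio(baseline_counts, peak_count):
    """Ratio between a peak day and the mean of the preceding days."""
    return peak_count / mean(baseline_counts)
```

For example, a baseline averaging 50 tweets per day followed by a day with 195 tweets gives a ratio just below four.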

Exploring the other sections, you can check the most discussed topics in the corresponding tab. For instance, you would like to know which of the new features of SpagoBI are the ones most discussed in the network. You can also compare the retrieved topics with the most used hashtags: sometimes hashtags correspond to the main topic of the tweet, other times they are just used to refer to other discussions or popular trends.

In the Network tab, shown in Fig. 12.2, you can investigate how the different accounts tweeting about your campaign interact with each other.

Fig. 12.2 Network tab results: top influencers, top accounts, and network interactions (left) and geographical distribution of interactions (right).

For instance, when running a marketing campaign, it is very important to know who the top influencers are. On one hand, you can directly address them with the main information that you want to spread over the network; on the other hand, you need to keep a careful eye on them: if they are not happy with your campaign, they could generate a negative social buzz. The user interaction graph, shown on the lower right in Fig. 12.2, can provide insights into the level of clustering of the network. A network split into different subgraphs (clusters) could suggest an in-depth analysis aimed at uncovering differences between the subgraphs: this knowledge can then be used to plan different customized subcampaigns directed at the different subgroups.
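One simple way to detect whether an interaction network splits into separate subgraphs is to compute its connected components. The sketch below is an illustration under that assumption; the text does not say how the module itself derives clusters, and the function name and input format are hypothetical.

```python
from collections import defaultdict


def interaction_clusters(interactions):
    """Group accounts into connected components of an undirected graph.

    `interactions` is an iterable of (user_a, user_b) pairs, one per
    retweet or reply between two accounts.
    """
    neighbours = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collecting every account reachable from start.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components
```

Each returned set is a group of accounts that interact only among themselves, which is a natural starting point for planning a dedicated subcampaign.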

Geolocalized information can be recovered from the user interaction map located in the lower part of the Network tab (shown on the right in Fig. 12.2) and from the Distribution tab. Here you can check if the area where your customers are located has been effectively reached by your campaign, and you can identify new potential marketing areas among the interested ones.

Finally, the Sentiment tab is where you can monitor the polarity of the tweets related to your campaign, as shown in Fig. 12.3. This is one of the most important pieces of information when monitoring a marketing campaign: it allows you to understand the network’s general feeling about the new release and to quickly identify potential negative elements that require intervention. Indeed, in this tab the polarity information is also split according to the main topics: you can therefore discover immediately what people are happy or unhappy about and tune your campaign plan accordingly.

Fig. 12.3 Sentiment tab results—upper part.

3 Algorithms Used

The SpagoBI Social Network Analysis module supports sentiment analysis to automatically detect the polarity of a tweet concerning a specific topic. To perform this task, the module adopts a supervised algorithm that labels each tweet as positive, negative, or neutral. The specific method used is the naïve Bayes classification algorithm.
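For reference, the naïve Bayes decision rule is commonly written as follows, where $c$ ranges over the three classes and $w_1, \dots, w_n$ are the words of the tweet. This is the textbook formulation; which exact variant the module uses is not stated here.

```latex
\hat{c} \;=\; \arg\max_{c \,\in\, \{\text{positive},\,\text{negative},\,\text{neutral}\}}
\; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The "naïve" assumption is that the words are conditionally independent given the class, which is what allows the joint probability to be factored into the product above.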

In machine learning, naïve Bayes classification algorithms are a family of probabilistic classifiers based on the Bayes theorem. In the specific case of sentiment analysis the goal is to classify texts or documents into the positive, negative, or neutral class. The implementation of the classifier is characterized by three phases:

 training phase: In this phase the system is trained on a training set of previously classified texts to estimate, for each class (positive, negative, or neutral), the probability of observing a given word in a text assigned to that class.

 testing phase: The algorithm is tested to calculate the accuracy of the method.

 classification phase: In this phase the input of the algorithm is a set of unclassified texts, and the algorithm determines whether each one is positive, negative, or neutral.
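The three phases can be sketched with a small classifier written from scratch. This is not the module's implementation: it is a Python illustration that assumes the multinomial variant with add-one smoothing and whitespace tokenization, and the class and method names are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Multinomial naive Bayes with add-one smoothing (assumed variant)."""

    def train(self, labelled_texts):
        # Training phase: count documents per class and words per class.
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in labelled_texts:
            words = text.split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def classify(self, text):
        # Classification phase: pick the class with the highest log score,
        # log P(c) + sum of log P(w | c). Words never seen in training
        # are skipped.
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)
            denominator = (
                sum(self.word_counts[label].values()) + len(self.vocabulary)
            )
            for word in text.split():
                if word in self.vocabulary:
                    count = self.word_counts[label][word]
                    score += math.log((count + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def accuracy(self, labelled_texts):
        # Testing phase: fraction of held-out texts classified correctly.
        hits = sum(
            self.classify(text) == label for text, label in labelled_texts
        )
        return hits / len(labelled_texts)
```

Working with log probabilities avoids numerical underflow on longer texts, and the smoothing keeps a single unseen word-class pair from zeroing out an entire class score.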

A preprocessing phase precedes the steps described. In this phase, extra spaces, URLs, tags, and numbers are deleted, all the words are set to lowercase, and the emoticons are replaced by placeholders so they can be used in the classification.
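A minimal sketch of such a preprocessing step is shown below. The emoticon table, the placeholder names, and the reading of "tags" as mentions and hashtags are assumptions made for illustration; the actual rules used by the module are not listed in this chapter.

```python
import re

# Assumed emoticon-to-placeholder table (hypothetical placeholder names).
EMOTICONS = {
    ":-)": "emo_pos",
    ":)": "emo_pos",
    ":-(": "emo_neg",
    ":(": "emo_neg",
}


def preprocess(tweet):
    """Clean a tweet before classification."""
    text = tweet
    # Replace emoticons first so that they survive the later cleaning steps.
    for emoticon, placeholder in EMOTICONS.items():
        text = text.replace(emoticon, f" {placeholder} ")
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)       # remove mentions and hashtags
    text = re.sub(r"\b\d+\b", " ", text)       # remove standalone numbers
    text = text.lower()
    return " ".join(text.split())              # collapse whitespace
```

Replacing emoticons with word-like placeholders lets the classifier treat them as ordinary tokens carrying sentiment information.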

The algorithm used is implemented in the R language. R is a programming language and software environment for statistical computing that is integrated into SpagoBI. The SpagoBI Social Network Analysis module uses R to perform sentiment analysis and topic modeling. Advanced users can also use R inside SpagoBI to perform their own analysis and visualize the results in SpagoBI.

4 Conclusion

In this chapter we illustrated how the SpagoBI open source business intelligence suite supports social network analysis through data mining algorithms and data visualization techniques. The main value of integrating social network analysis into a business intelligence suite lies in the mutual benefit: business intelligence brings to social network analysis several tools and techniques that make the results of the analysis easier to communicate effectively and attractively to final users, while social network analysis brings additional valuable data that supports effective analysis and better decisions.
