F. Pallavicini (a); P. Cipresso (b); F. Mantovani (a)
(a) University of Milano-Bicocca, Milan, Italy
(b) Applied Technology for Neuro-Psychology Lab, IRCCS Istituto Auxologico Italiano, Milan, Italy
The web is composed of a multitude of sources that differ widely from one another. An accurate understanding of these differences can help to define more efficient ways to analyze the rich information the sources contain. This chapter introduces the psychological and sociological processes underlying social network interactions, which will be discussed within the framework of relevant theoretical constructs and methods of analysis (with a special focus on social network analysis). The chapter will highlight the differences and specific features that characterize online social network dynamics and then point out how this understanding can be effectively integrated into traditional sentiment analysis methodological approaches to strengthen their reliability and validity.
Online communication; Computer-mediated communication; Social networks; Social media; Social network Analytics; Opinion mining; Facebook; Twitter
The exponential growth in the use of digital devices, together with ubiquitous online access, provides unprecedented ground for the constant connectivity of people and offers tremendous capabilities for publicly expressing opinions, attitudes, or reactions regarding many aspects of everyday human activities [1]. Social media, such as blogs, forums, and social network platforms (eg, Facebook, LinkedIn, Twitter, Instagram, YouTube), are quickly becoming an integral part of people’s lives, the virtual spaces where individuals share opinions and information every day and maintain and/or expand their relational network. The massive use of online social networks and the abundance of data collected through them have sharply increased the attention that the scientific and business communities pay to them [2–4]. Nowadays, the constant refinement of analytical tools is offering a richer array of opportunities to analyze these data for many different purposes [5]. Differences in features and characteristics of online social networks are reflected in the huge number of different statistics and metrics that it is possible to track and analyze. The most adopted metrics are numeric, relatively easy to obtain, and freely available, such as engagement and influence metrics [6]. However, metrics of this type are often defined as “vanity metrics,” since they do not interpret or contextualize the data collected.1,2 For this reason, other methods of analysis have been introduced. Among them, one of the most used is sentiment analysis (SA) [7], which is the analysis of the feelings (ie, opinions, emotions, and attitudes) behind the words using natural language processing tools. SA is considered a quality metric, which looks behind the numbers to understand how information about emotions and attitudes is conveyed in language [7].
Given the rising interest in the application of SA to data from online social networks, the research in this area has acknowledged the limitations coming from handling the complex characteristics of natural language (and related inferences) without considering the data collected through social networks as “networked data.” Most of the work in SA [8, 9] is based merely on textual information expressed in online posts and comments. Early approaches to overcome this important limitation are emerging in recent literature, trying, for example, to leverage information on friendship relationships between individuals, since connected users may be likelier to hold similar opinions3 [10, 11]. However, these features only approximate the rich relation structure encoded in an online social network. Among possible complementary analytical methods that are starting to be introduced in the analysis of data collected through online social networks, one of the most interesting is social network analysis (SNA), which, through a quantitative-relational approach, makes it possible to consider relational data (ie, existing connections and links between users on social networks). Within this context this chapter will first define online social networks and briefly describe their history, highlighting the differences and specific features that characterize them. Then the psychological and sociological processes underlying online social network interactions will be discussed within the framework of relevant theoretical constructs and methods of analysis (with special focus on SNA). Finally, the chapter will point out how this understanding can be effectively integrated into SA methodological approaches to empower their reliability and validity.
When analyzing the communication that takes place through online social networks, one must consider that communication follows specific rules and expectations of computer-mediated communication environments [12]. Different levels of virtuality and differences in terms of available repertoire of signaling systems significantly affect the ways that interlocutors communicate with each other across different media [13, 14]. In addition, depending on the specific online social networks, there are different possibilities for users to communicate with each other and to express themselves. On each social network platform, people have several possibilities to interact, and there are very different types of data that can be collected through them (eg, texts, videos, photos). There are some key differences among these sources, and an accurate understanding of what they are can help us to define more efficient ways to analyze the rich information they contain.
Digital media are defined as the set of media based on digital technologies that have common characteristics that differentiate them from the media that preceded them (radio, press, television, etc.) [15]. Among digital media, the use of social media (ie, the part of the services and online communication platforms based on the exploitation of social dynamics) has risen especially. The terms online social networks and social media are often confused with each other. However, there are several differences between them.4,5 The main one is that the term social media refers only to the web, while social network does not. The concept of a social network is much older than the advent of the World Wide Web and advanced technological devices; it results from historical studies conducted within sociology, and, in particular, sociometry; that is, the science that studies relationships between people from a quantitative point of view [16, 17]. However, when the term social network is considered referring to the features identified by Boyd and Ellison [13], the history of social networks, as will be discussed in the following paragraph, is much more recent.
An online social network could be considered the prototypical form of social media, and it can be defined as “a platform based on new media” that allows users to manage both their social network (organization, extension, exploration, and comparison) and their social identity (description and definition) [18]. According to Boyd and Ellison [13], there are three constitutive elements of a social network:
• the presence of a virtual space (forum) in which users can make and present their own profile; the profile must be accessible, at least in partial form, to all the users of the network;
• the possibility to create a network with other users with whom they can communicate;
• the possibility to analyze the characteristics of an individual’s own network, in particular the connections with other users [16, 17].
Social network sites, in particular, are defined as web services where people can (1) construct a public or semipublic profile within a bounded system, (2) define a list of users with whom they establish a connection, and (3) view their list of connections and those made by others within the system [13].
Most of the functions of online social networks have been available separately in the different tools that preceded them: the creation of networks was possible even with newsgroups, and the creation and sharing of personal content online was possible through websites and blogs. However, with reference to the presence of all the features identified by Boyd and Ellison [13], the first social network site was SixDegrees.com, designed in 1997 by Andrew Weinreich, born as an online dating website. SixDegrees.com allowed its users to create relationships only with people who were “friends of friends” (ie, who had a certain degree of connection between them). This innovative feature was aimed at preventing the sharing of false information and the presence of users with bad intentions, as happened on similar dating sites. The goal of this strategy was threefold:
• to give the possibility to verify the user’s profile information by asking one’s own friends;
• to obtain indirect information about people through the analysis of their social network;
• to increase the possibility that individuals engage with each other.
SixDegrees.com had more than 1 million users, and was active until 2001. Starting in the first decade of this century, many social networking sites were born, trying to capitalize on the winning ideas of SixDegrees.com. In these years, popular names, including Friendster, MySpace, Facebook, and YouTube, were created [19].
Facebook, which in a few years has become the most famous and most used online social network worldwide, was initially created by Mark Zuckerberg in 2004 as a tool to connect students at Harvard University. The idea was to develop a social network to support a closed community, creating the online version of the university yearbook. Facebook was then expanded to connect all US universities, and was eventually opened to everyone, spreading as a mass phenomenon in 2006, when it had 12 million users. From a strategic point of view, the winning move that allowed the rise of Facebook was primarily to progressively increase the relational and expressive opportunities of the service, through the introduction of applications including the profile page, groups, photos, notes, and events. Another important decision was to make the Facebook system able to fully cover the needs of the user, becoming an aggregator of information and services. This was possible thanks to the development of Facebook Platform, a set of procedures (an application programming interface) used to create apps that run inside Facebook. Another milestone in the history of online social networks is February 14, 2005: Chad Hurley, Steve Chen, and Jawed Karim, three young former PayPal employees, registered the domain name YouTube, and on April 23 of the same year the first video, Me at the zoo, was uploaded. Just 10 months after the official launch, the platform had already set a record: in 2006 YouTube had an average of 65,000 video uploads per day and 20 million unique accesses. The success did not go unnoticed, and in October 2006 YouTube was sold to Google for US$1.65 billion. Just 1 year after the founding of YouTube, in 2006, Twitter was created. Its creators, Jack Dorsey, Evan Williams, and Biz Stone, wanted to create a system that would allow people to communicate via SMS with a small number of friends.
Twitter was therefore developed as a microblogging system, a service for the exchange of information that allows users to send messages (ie, tweets) of up to 140 characters. As stated by its creators, Twitter can be defined as “a real-time information network that connects you to the latest information about what you find interesting” [20]. Since the main reason for the publication of a tweet is sharing, Twitter has developed a system to expand the target audience: the hashtag. This feature makes it easier to follow topics and threads of interest: if a word is preceded by the # (“hash”) symbol, clicking on it leads to the messages that contain the same hashtag.
Beyond the shared features identified by Boyd and Ellison [13], online social networks are very different in their features and in the expressive possibilities they offer to their users. According to this general definition, there are different types of online social networks that need to be distinguished further. There is currently no systematic and exhaustive categorization of the different social networks that is unanimously agreed on. However, it is possible to group them and organize their complexity according to their different features, including the types of user-generated content and the types of relationships that are allowed between users.
After having explored the differences between online social networks on the basis of these characteristics, we will present the most used statistics and metrics so as to provide an overview on methods of analysis and interpretation of data collected through them.
According to the types of user-generated content, online social networks can be divided into several categories [21], including:
• profile-based social networks: focused on the users and on their desire to express themselves and communicate with their contacts (eg, Facebook, MySpace); among social networks of this type, Facebook is the first resource concerning the social sphere of people (friends, family, etc.), where individuals share content especially about their private lives, personal interests, and activities;
• microblogging social networks: focused on the shared message, which has to be short and clear (eg, Twitter). Twitter is the most famous one, and is often described as a site of “amateur journalism” [20] where people share content especially about specific and current events and situations;
• content-based social networks: focused on the content posted by users (eg, YouTube, Flickr, Instagram).
Online social networks represent virtual places where people can meet and communicate with each other. On these platforms people tend to shape their contacts as they do offline [22, 23]. However, the possibilities of interaction between people in online social networks are linked to the specific characteristics of the platform used and, in particular, to the ways through which they allow contact between users. In particular, the type of relationship that people can have on a social network can be divided as follows [18]:
• Two way, or “friendship” (eg, Facebook): this allows users who are friends with each other to access their friends’ profiles, contact them directly through a private chat (ie, Messenger), read new messages on their bulletin board, explore their social network, and know their actions within the social network (ie, membership in groups, places visited, etc.). This mechanism allows the creation of a closed social network: only people accepted as friends can access it, no one is a total stranger, and anyone can be identified as a friend of someone else.
• “Star” (eg, Twitter): this clearly distinguishes between sender and receiver. The message issued can be general (ie, shared with all the receivers on the social network) or individual (ie, directed to a specific receiver). Through this mechanism a user can be both a sender and a receiver depending on the social network to which the user is connected. The mode of connection in a star relationship is open: most receivers (followers) have no other contact with the sender, apart from that in the social network. The model of communication is “from one to many”: individuals share content that could arouse interest, or, better, could be retweeted. Tweeting is not mandatory, and people can follow others according to their shared interests.
Differences in the features and characteristics of online social networks are reflected by the huge amount of different statistics and metrics that can be tracked and analyzed. Although the following list is not meant to be exhaustive, it provides the most relevant metrics and statistics adopted in research on online social networks, described according to specific data application and objects:
1. Engagement metrics: numerically quantify a phenomenon and the features that led to its spread. They include:
• Amplification metrics: computed by counting the number of shares for Facebook and retweets for Twitter. An analysis over time of metrics of this type provides feedback for assessing the content shared by a user within that user’s social network.
• Applause metrics: represent an approval rating from the audience of a particular content; it is expressed on Twitter, Facebook, and YouTube as “like.”
• Conversation rate: number of the conversations per post. On Facebook, YouTube, and LinkedIn they consist of comments, and on Twitter they consist of replies.
2. Influence metrics: analyze quantitatively users who participated in the conversations, and they include content per time (ie, a fundamental measure to define a phenomenon). This is defined as the ability to generate a multitude of content in a limited period of time.
3. Reach: number of unique individuals (accounts) that have been exposed to the content analyzed and who have had the opportunity to engage with it. If the same content is presented several times (shared by multiple friends or displayed on several social network platforms), the reach still counts each unique user only once. It is considered a measurement of potential impact: a significant increase in the value of impressions (see below), with equal reach, means that the people involved have had more opportunities to be exposed to the content and therefore to notice it.
4. Impression: the number of times that some content has had the opportunity to be seen within the social network platform, without taking into consideration the duplication of users (each user has the possibility to be exposed to the content multiple times on multiple devices and by multiple shares on the platform).
• Total audience: Total of people who have participated in a post regardless of the specific social network platform used.
• Number of unique users: one of the metrics most used to assess the efficacy of action on social networks. It is considered an index of users’ real engagement since it represents the ability of some online content to “activate” the social network. The actions that are examined are the number of tweets and retweets on Twitter, and the number of users who have commented on or “liked” posts on Facebook.
• Number of active/passive users: this is used primarily to understand phenomena related to specific events (eg, political elections, sport events). The number of active users on Twitter is defined as the number of users who tweet, while passive ones are users who limit themselves to mere retweets; on Facebook active users are considered individuals who comment on posts, while passive ones are users who only “like” posts.
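The distinction between reach and impressions, together with the engagement counts listed above, can be illustrated with a minimal Python sketch. The event log, the user names, and the `summarize` function are hypothetical and serve only as an illustration; real platforms expose these figures through their own analytics tools, and their exact definitions may differ.

```python
from collections import Counter

# Hypothetical log for a single post: (user_id, action) pairs.
# "view" marks an exposure; the other actions are interactions.
events = [
    ("ann", "view"), ("ann", "view"), ("bob", "view"),
    ("cat", "view"), ("cat", "view"), ("cat", "view"),
    ("ann", "like"), ("bob", "share"), ("cat", "comment"),
    ("cat", "like"),
]


def summarize(events):
    """Return basic metrics for one post from (user, action) pairs."""
    counts = Counter(action for _, action in events)
    viewers = {user for user, action in events if action == "view"}
    return {
        "impressions": counts["view"],      # every exposure is counted
        "reach": len(viewers),              # each unique account is counted once
        "amplification": counts["share"],   # shares or retweets
        "applause": counts["like"],         # likes
        "conversation": counts["comment"],  # comments or replies
    }


metrics = summarize(events)
print(metrics)
```

In this toy log three accounts generate six exposures, so impressions exceed reach: the same people saw the content more than once.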
From the point of view of psychology, the success of social networks is linked to their ability to meet basic and very different needs of people. As we will discuss in the following paragraphs, in particular, online social networks allow people to receive social support, to engage in and enjoy thinking, and to define identity [14, 24, 25]. On the one hand, people can use social networks to ask for and to offer social support; on the other hand, they can use social networks to describe themselves and to compare themselves with others, defining their social identity.
One of the main needs that social networks are able to satisfy is “the need to belong”: people need to be appreciated and to be socially accepted [26–28]. Social networks offer virtual places where individuals can present themselves, interact with others, and share and express opinions [29]. The need to belong can be understood on the basis of three basic needs that underlie an individual’s group-seeking behavior: (1) inclusion (ie, the need to belong to or include others in a circle of acquaintances), (2) affection (the need to love or be loved by others), and (3) control (the need to exert power over others or give power over the self to others) [30].
From a psychological point of view, another important reason that drives people to use social networks is the “need for cognition”; that is, the “individual’s tendency to engage in and enjoy effortful cognitive endeavors” [31]. The need for cognition is strongly linked to information seeking behavior and varies among individuals. It is related to an individual’s tendency to engage in and enjoy thinking [31], and it has a moderating effect on variables such as attitudes and purchase intentions, and also on the web [32].
By means of web communities, personal blogs, and especially social networks, every individual can easily create a space on the web to present himself or herself [33, 34], which means managing personal information, expressing attitudes and opinions, uploading private content, and taking and sharing photos. Through social network sites, people can shape and change their representation, reconstructing how they see themselves and others. Social networks allow individuals to decide how to present themselves to the people who make up their network (impression management). The main instrument for self-presentation is the personal profile, where individuals can describe themselves according to a number of characteristics (eg, interests, music) and share multimedia content such as photos and videos.
In this context, many studies have been conducted to understand the personality traits and the motivational factors associated with self-presentation activities on social networking sites such as Facebook, Twitter, or Instagram [13, 33, 35, 36]. For example, extraversion (typical of sociable, energetic, and enthusiastic people) was often found to predict high social network use, a high number of friends, and high engagement in networking activities [37, 38]; in contrast, neuroticism (typical of tense, irritable, and moody people) has been found to be associated with use of social networks for self-disclosure and belongingness-related motivations, since neurotic individuals tend to perceive social networks as safe and controllable places for self-expression [39–41]. Some recent approaches have focused on precise features of the phenomenon, such as personality correlates of preferences among different social networks [42], link creation [43], content of profile pictures [44], and status updates [45].
The study of social networks, as privileged places of exchange and activators of social processes, is intrinsically linked to sociology, the academic field that studies the nature of human communication, the effects of mass communication, and the connections between the social system and the mass [46, 47]. An interesting key to understanding the phenomena that take place on social networks is represented by the sociological theory of constructuralism [48].
The process by which people interact, exchange information, and consequently learn is the central component of Carley’s theory [49, 50], which describes how shared knowledge, representative of cultural forms, develops between individuals through social interaction. Constructuralism argues that through interaction and individual learning, the social network (who interacts with whom) and the knowledge network (who knows what) coevolve. This approach to the coevolution of knowledge and social relationships has considerable explanatory power over the dynamics of social networks [51, 52], and has proved to be an effective tool for social simulation [53, 54]. Three important concepts for understanding interactions between people on social networks are tie strength, homophily, and source credibility, which will be discussed next.
The properties of the linkage between individuals on social networks are critical to an understanding of the process of social influence through them. Online, as in real life, all the communications between people take place within a social relationship that may be categorized according to the closeness of the relationship between individuals [55, 56]. This concept is well represented by the sociological construct of tie strength; that is, “a multidimensional construct that represents the strength of the dyadic interpersonal relationships in the context of social networks” [55] and includes closeness, intimacy, support, and association [57]. The strength of the tie may range from strong to weak depending on the number and types of resources that are exchanged, the frequency of exchanges, and the intimacy of the exchanges between them [58]. Research suggests that tie strength affects information flows. Individuals in a strong tie relationship tend to interact more frequently and exchange more information than those in a weak tie relationship [59]. In addition, strong ties have greater influence than weaker ties on the behavior of receivers because of the frequency and perceived importance of social contact among strong-tie individuals [60].
Related to, but conceptually distinct from, tie strength is the construct of homophily [59]. Homophily can be defined as the extent to which pairs of individuals are similar in terms of certain attributes, such as age, gender, education, or lifestyle [61], and it explains group composition in terms of the similarity of members’ characteristics. The main homophily principle is that similarity predisposes individuals to a greater level of interpersonal attraction, trust, and understanding than would be expected among dissimilar individuals [62]. Thus individuals tend to affiliate with others who share similar interests or who are in a similar situation [63]. The stronger the social tie connecting two individuals, the more similar they tend to be [64, 65]. Tie strength, therefore, increases with homophily [61].
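As a rough illustration of how these two constructs could be operationalized, the following Python sketch scores homophily as the fraction of matching profile attributes and tie strength as the relative frequency of exchanges. Both operationalizations, as well as the profiles, the interaction counts, and the function names, are simplifying assumptions made for this example; the literature cited above treats tie strength as a multidimensional construct that also involves closeness, intimacy, and support.

```python
# Hypothetical user attributes and pairwise interaction counts.
profiles = {
    "ann": {"age_group": "18-24", "gender": "f", "education": "college"},
    "bob": {"age_group": "18-24", "gender": "m", "education": "college"},
    "cat": {"age_group": "45-54", "gender": "f", "education": "high school"},
}
interactions = {("ann", "bob"): 40, ("ann", "cat"): 5, ("bob", "cat"): 0}


def homophily(a, b):
    """Fraction of profile attributes on which two users match (0 to 1)."""
    keys = profiles[a].keys()
    return sum(profiles[a][k] == profiles[b][k] for k in keys) / len(keys)


def tie_strength(a, b):
    """Exchange count for the pair, scaled by the busiest pair (0 to 1)."""
    busiest = max(interactions.values())
    count = interactions.get((a, b), interactions.get((b, a), 0))
    return count / busiest if busiest else 0.0


for pair in interactions:
    print(pair, round(homophily(*pair), 2), round(tie_strength(*pair), 2))
```

In this toy data the most similar pair is also the one that interacts most, which mirrors the association between homophily and tie strength described above; the sketch does not test that association.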
Source credibility theory identifies source expertise and source bias as elements that affect the credibility of an information source [66, 67]. Source expertise refers to the perceived competence of the source providing the information. A source should be perceived as more credible when it (1) possesses greater expertise and (2) is less prone to bias. Source bias, also conceptualized as source trustworthiness, refers to the possible bias/incentives that may be reflected in the source’s information [68]. Whether or not a message sender is perceived as an “expert” (and thus of high credibility) is determined from an evaluation of the knowledge that person holds [69], as well as if—by virtue of his or her occupation, social training, or experience—that person is in a unique position. Reputation is thus key to allocation of a value to information [70]. In the online environment, such evaluations must be made from the relatively impersonal text-based resource exchange provided by actors in the site network. Knowledge of the individuals’ attributes and background is limited, and evaluation will take place in a reduced-cues or altered-cues environment.
As already pointed out, SA is one of the most used methods adopted to analyze data collected through online social networks [4, 71, 72]. This method, unlike purely numeric metrics, offers the possibility to investigate the opinions and attitudes expressed online by means of natural language processing tools [7]. However, one of the main problems with the interpretation of SA is that it does not allow one to consider data within the online network in which they have been collected. This is an important limitation since online social networks are characterized by definition by a highly relational nature [52].
Not considering the “network context” can lead to important misunderstandings in reading and interpreting collected data. For example, during the 2012 US presidential election, USA Today published in May and November a “sentiment score” of Obama and Romney, the candidates in the US presidential election of that year, based on Twitter data.6 The newspaper reported a major change in the sentiment index in both candidates, commenting on the data in the light of presidential results. However, after a more careful look at the data reported, it appeared that while the May survey had been made on a sample made especially by Obama supporters, the November survey had involved mainly Romney supporters. The difference in the sentiment scores was therefore not due to a change in public opinion toward the candidates but was simply due to differences in the sample selected for the survey. This mistake would not have occurred if the data had been considered within the context in which they had been collected.
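The arithmetic behind this kind of error can be made explicit with a short Python sketch. All the numbers are invented for illustration and are not the data published by the newspaper: the share of positive tweets within each group is held constant, and only the composition of the sample changes between the two surveys.

```python
# Hypothetical share of positive tweets about one candidate in each group.
# These rates are fixed: opinion inside each group does not change.
positive_rate = {"supporters": 0.9, "opponents": 0.2}


def sentiment_score(sample):
    """Share of positive tweets in a sample given as {group: n_tweets}."""
    total = sum(sample.values())
    return sum(positive_rate[g] * n for g, n in sample.items()) / total


# Same population, two differently composed samples (invented figures).
may_sample = {"supporters": 800, "opponents": 200}
november_sample = {"supporters": 200, "opponents": 800}

may_score = sentiment_score(may_sample)
november_score = sentiment_score(november_sample)
print(round(may_score, 2), round(november_score, 2))
```

The aggregate score drops sharply between the two samples even though no individual group has changed its opinion, which is why knowing who is in the sample matters for interpreting it.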
Among alternative analytical methods, or rather complementary analytical methods, that are being introduced in the analysis of data obtained through social networks, one of the most interesting is SNA [64, 73, 74]. SNA, through a quantitative-relational approach, makes it possible to consider data as “networked” (ie, considering existing connections and links between users). Models of SA and SNA have been successfully applied in various fields. However, what is still missing and has been little explored is a deep understanding of the ways in which the two can interact so as to increase the validity and the significance of the data collected through online social networks. We will now briefly introduce the main characteristics of SNA so as to discuss in the final part of this section how it could be effectively integrated in traditional SA approaches.
SNA basically consists of a series of mathematical and computational techniques that, using network and graph theories, can be used to understand the structure and the dynamics of real or artificial networks [64, 73, 74]. Most of the early work was conducted on data collected from individuals in particular social settings to study a specific phenomenon [16, 75]. Nowadays, the huge computational capacity of personal computers and the fact that people increasingly entertain relations on online social networks [23] have made SNA an important tool for psychologists and other social scientists to study interactions between people. SNA adopts a quantitative-relational approach: rather than relying on characteristics and attributes of individuals (eg, number of messages sent and received), it is based on relational data (or links, contacts, or ties) that characterize a group of people or a set of organizations of varying complexity (eg, families, groups of friends, associations). Relationships are represented by interactions of various kinds (friendship, money, flows of information). The potential of SNA is essentially twofold: the application of graph theory to relational data and, consequently, the description of the structure of the interaction through mathematical-algebraic indices [72, 76].
Social networks are generally represented through graphs, which have the advantage of making a clear and immediate picture of the social structure. Graphs are the mathematical structure of a sociogram, visually expressed as a network composed of connected nodes. Therefore graphs are the spatial representation of social relationships among individuals. Graphs are useful because they represent graphically the social relationships and above all provide a formal representation of them (see Fig. 2.1). Moreover, it is possible to calculate an index to describe specific structural dimensions, such as density, inclusion, and cohesion.
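Two of the structural indices mentioned above can be computed directly from a graph, as in the following Python sketch. It assumes the commonly used definitions for an undirected graph: density as the ratio of existing ties to possible ties, and inclusion as the proportion of nodes that have at least one tie. The graph and the function names are hypothetical, and cohesion is left out because its definition varies across authors.

```python
# Hypothetical undirected friendship graph as an adjacency mapping.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"cat"},
    "eve": set(),  # isolated node: no ties
}


def density(graph):
    """Existing ties divided by the n * (n - 1) / 2 possible ties."""
    n = len(graph)
    # Each undirected tie appears in two neighbor sets, hence the halving.
    ties = sum(len(neighbors) for neighbors in graph.values()) / 2
    possible = n * (n - 1) / 2
    return ties / possible if possible else 0.0


def inclusion(graph):
    """Proportion of nodes that have at least one tie."""
    connected = sum(1 for neighbors in graph.values() if neighbors)
    return connected / len(graph)


print(density(graph), inclusion(graph))
```

Here four ties out of ten possible ones give a density of 0.4, and four connected nodes out of five give an inclusion of 0.8.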
The properties of the links between individuals on online social networks are critical to an understanding of the process of social influence through them. This concept is well represented by the sociological construct of tie strength [55], discussed earlier, which captures the strength of dyadic interpersonal relationships in the context of social networks and has been found to affect information flows [59, 60]. Interestingly, studies conducted on SNA suggest one path forward: how one person will evaluate another can often be predicted from the network in which they are embedded. In online social network scenarios, specific features are associated with edges between two people, such as comments they made about each other or messages they exchanged. Such behavioral features may contain a strong sentiment signal, which is useful for predicting edge signs and may be used to fit a conventional sentiment model. However, a purely edge feature–based sentiment model cannot account for the network structure, since it treats edges as independent of each other.
Starting from these premises, recent studies have tried to jointly consider SA and social network analytics so as to give better predictions than either one can on its own [71, 72, 77–82]. West et al. [72], in particular, developed a graphical model that synthesizes network and linguistic information to make more and better predictions about both. Their model provides an example of how joint models of text and network structure can excel where their component parts cannot.
On the basis of another fundamental sociological principle, that of homophily [59], recent work has tried to include features of user connections to predict attitudes about political and social events using SNA methods and indices.
Thomas et al. [82], in particular, used party affiliation and mentions in speeches to predict voting patterns from the transcripts of US Congress floor debates. They showed that the integration of even very limited information regarding interdocument relationships can significantly increase the accuracy of support/opposition classification. Incorporating agreement information provides additional benefit only when the input documents are relatively difficult to classify individually.
Tan et al. [71] used information about relationships between users of Twitter (eg, follows and mentions) to improve SA to predict attitudes about political and social events. Working within a semisupervised framework, they proposed models that are induced either from the Twitter follower/followee network or from the network in Twitter formed by users referring to each other using @-mentions. Their results revealed that incorporating social network information can indeed lead to statistically significant sentiment-classification improvements over the performance of an approach based on support vector machines having access to only textual features. In more detail, they found that (1) the probability that two users share the same opinion is indeed correlated with whether they are connected in the social network and (2) use of graphical models incorporating social network information can lead to statistically significant improvements in user-level sentiment polarity classification with respect to an approach using only textual information.
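The first finding, that connected users tend to share opinions, can be checked on any labeled sample by comparing agreement rates for connected and unconnected pairs of users. The sketch below is only an illustration of that comparison, not the procedure of Tan et al.; the user names, labels, ties, and the function name are hypothetical.

```python
from itertools import combinations


def agreement_rates(labels, edges):
    """Return (rate among connected pairs, rate among unconnected pairs).

    labels maps each user to an opinion label; edges lists undirected ties.
    """
    connected = {frozenset(edge) for edge in edges}
    same = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for u, v in combinations(sorted(labels), 2):
        is_connected = frozenset((u, v)) in connected
        total[is_connected] += 1
        same[is_connected] += labels[u] == labels[v]

    def rate(key):
        return same[key] / total[key] if total[key] else float("nan")

    return rate(True), rate(False)


# Hypothetical users with a known opinion on one topic.
labels = {"u1": "pos", "u2": "pos", "u3": "neg", "u4": "neg", "u5": "pos"}
edges = [("u1", "u2"), ("u3", "u4"), ("u2", "u5"), ("u1", "u3")]

connected_rate, unconnected_rate = agreement_rates(labels, edges)
print(connected_rate, unconnected_rate)
```

A markedly higher rate among connected pairs is the kind of signal that motivates adding network information to a text-only classifier; a real analysis would also need a significance test on a much larger sample.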
Bermingham et al. [77] tried to combine SA and SNA to explore the potential for online violent radicalization. In particular, through a detailed analysis of a real YouTube dataset, they developed a model that synthesizes textual and social network information to jointly predict the polarity (positive or negative) of person-to-person evaluations. More specifically, they incorporated in their model both intranetwork measures (ie, centrality and betweenness) and whole-network analytics (ie, density and average communication speed). Adopting their dictionary-based polarity scoring method to assign positivity and negativity scores to YouTube profiles and comments, they were able to characterize users and groups of users by their sentiment toward a set of concepts that were of particular interest to jihadists.
Pozzi et al. [80] stated that considering friendship connections is a weak assumption for modeling homophily: online, as offline, two friends might not share the same opinion about a given topic. Starting from this criticism, they proposed an alternative way to represent homophily: approval relations, in which a user approves of another user's content (eg, by a “like” or a retweet). A semisupervised framework was used to estimate user polarities about a given topic by combining post content and weighted approval relations on microblogs (Twitter). The study showed that incorporating approval relations significantly outperformed the text-only approach, leading to significant improvements over the performance of complex supervised classifiers based only on textual features.
Related ideas were pursued by Ma et al. [79] and Hu et al. [78], who added terms to their models enforcing homophily between friends with regard to their preferences.
Ma et al. [79] proposed two social recommendation methods that use social information to improve the prediction accuracy of traditional recommender systems. More specifically, the social network information was used in the design of two social regularization terms to constrain the matrix factorization objective function. In addition, friends with dissimilar tastes were treated differently in the social regularization terms so as to represent the taste diversity of each user’s friends. The experimental analysis on two large datasets (one dataset contained a social friend network, while the other dataset contained a social trust network) showed that these proposed methods outperform other state-of-the-art algorithms.
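To make the idea of a social regularization term concrete, the sketch below computes a generic similarity-weighted penalty on the distance between the latent factor vectors of a user and those of each friend. It is an assumption-laden illustration of the general technique and not necessarily the exact terms used by Ma et al.; the factor values, similarity weights, and the parameter name beta are hypothetical.

```python
def social_regularizer(user_factors, friends, similarity, beta=1.0):
    """Similarity-weighted penalty on factor distance between friends.

    Computes (beta / 2) * sum over (user, friend) of
    similarity[(user, friend)] * squared distance of their factor vectors.
    A low similarity weight lets friends with dissimilar tastes stay apart.
    """
    total = 0.0
    for user, friend_list in friends.items():
        for friend in friend_list:
            squared_distance = sum(
                (a - b) ** 2
                for a, b in zip(user_factors[user], user_factors[friend])
            )
            total += similarity[(user, friend)] * squared_distance
    return 0.5 * beta * total


# Hypothetical two-dimensional latent factors for three users.
user_factors = {"u1": [1.0, 0.0], "u2": [1.0, 0.0], "u3": [0.0, 1.0]}
friends = {"u1": ["u2", "u3"]}
similarity = {("u1", "u2"): 0.9, ("u1", "u3"): 0.1}

penalty = social_regularizer(user_factors, friends, similarity)
print(penalty)
```

In a full recommender this penalty would be added to the usual rating reconstruction error and minimized jointly; here it is evaluated once for fixed factors.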
Similarly, Hu et al. [78] proposed a mathematical optimization formulation that incorporates the sociological theories of sentiment consistency and emotional contagion for sentiment classification. They used a method called a “sociological approach to handling noisy and short texts” (SANT), which extracted sentiment relations between tweets on the basis of social theories, and modeled the relations using a graph Laplacian matrix. They reported that the proposed method can utilize sentiment relations between messages to facilitate sentiment classification and effectively handle noisy Twitter data. An empirical study of two real-world Twitter datasets showed the superior performance of the adopted framework in handling noisy and short tweets, and SANT achieved consistent performance for different sizes of training data.
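The role of a graph Laplacian in such formulations can be illustrated with a small example. The sketch below builds the Laplacian L = D - A of a message-relation graph and evaluates the quadratic form s'Ls, which equals the sum of squared score differences over related messages. It is a generic illustration and does not reproduce the SANT objective; the adjacency matrix and scores are hypothetical.

```python
def laplacian(adjacency):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix
    with a zero diagonal (D holds the row sums of A)."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def smoothness_penalty(adjacency, scores):
    """Quadratic form s' L s: large when related messages get different scores."""
    lap = laplacian(adjacency)
    n = len(scores)
    return sum(scores[i] * lap[i][j] * scores[j] for i in range(n) for j in range(n))


# Hypothetical relations: message 0 ~ message 1, message 1 ~ message 2.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

consistent = smoothness_penalty(adjacency, [1.0, 1.0, 1.0])
conflicting = smoothness_penalty(adjacency, [1.0, 1.0, -1.0])
print(consistent, conflicting)
```

Adding such a penalty to a classification loss pushes related messages toward similar sentiment scores, which is how relational information can compensate for short and noisy texts.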
Finally, Speriosu et al. [81] explored the possibility of exploiting the Twitter follower graph to improve polarity classification, under the assumption that people influence one another or have shared affinities with regard to topics. More specifically, they proposed incorporating labels from a maximum entropy classifier, in combination with the Twitter follower graph. The user’s followers were used as separate features and combined with the content matrix. They constructed a graph that has users, tweets, word unigrams, word bigrams, hashtags, and emoticons as its nodes; users are connected on the basis of the Twitter follower graph to the tweets they created, and tweets are connected to the unigrams, bigrams, hashtags, and emoticons they contain. Speriosu et al. compared the label propagation approach with the noisily supervised classifier itself and with a standard lexicon-based method using positive/negative ratios on several datasets of tweets that had been annotated for polarity. They showed that a maximum entropy classifier trained with distant supervision works better than a lexicon-based ratio predictor, improving the accuracy for polarity classification from 58.1% to 62.9%. By using the predictions of that classifier in combination with a graph that incorporates tweets and lexical features, they obtained even better accuracy of 71.2%.
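The general idea of label propagation over such a heterogeneous graph can be sketched as follows. This is a plain clamped-averaging variant chosen for brevity and should not be read as the specific algorithm used in [81]; the node names, seed scores, and the function name are hypothetical.

```python
def propagate(neighbors, seeds, iterations=50):
    """Spread polarity scores from seed nodes to the rest of the graph.

    neighbors maps each node to its adjacent nodes; seeds maps some nodes
    to fixed scores in [-1, 1] (eg, classifier outputs). Seed scores stay
    clamped; every other node repeatedly takes the mean of its neighbors.
    """
    scores = {node: seeds.get(node, 0.0) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]
            elif adjacent:
                updated[node] = sum(scores[a] for a in adjacent) / len(adjacent)
            else:
                updated[node] = scores[node]
        scores = updated
    return scores


# Hypothetical graph: each user node is linked to the tweets they wrote.
neighbors = {
    "user_a": ["tweet_1", "tweet_2"],
    "user_b": ["tweet_3", "tweet_4"],
    "tweet_1": ["user_a"],
    "tweet_2": ["user_a"],
    "tweet_3": ["user_b"],
    "tweet_4": ["user_b"],
}
# Assumed classifier output: tweet_1 is positive, tweet_3 is negative.
seeds = {"tweet_1": 1.0, "tweet_3": -1.0}

scores = propagate(neighbors, seeds)
print(scores["tweet_2"] > 0, scores["tweet_4"] < 0)
```

Here the unlabeled tweets inherit the polarity of the other tweet by the same author; adding word, hashtag, and emoticon nodes would let polarity also flow through shared vocabulary.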
The increasing use of online social networks and the consequent large amount of available data have rapidly increased attention on the problem of how to analyze data of this type. As pointed out in this chapter, SNA is one of the most interesting methods now being introduced in the analysis of data collected through online social networks. This method, unlike other methods such as SA, allows one to consider data as “networked,” increasing their explanatory and predictive value. Recent studies on this issue have provided some interesting evidence for the benefits of using SNA combined with other methods in analyzing online social network data [71, 72, 77–82]. These can be summarized as follows:
• Use of social network analytics allows one to consider the structure of interactions and the roles that individual users play. In the specific context of online social networks, SNA makes it possible to compare different types of online communities or to monitor over time the structure of interactions and the roles that users have within the social platform.
• SNA makes it possible to condense the quantitative data resulting from automatic tracking (eg, the frequency with which individuals use social networks and their types of interests) for qualitative assessment. Social network metrics (such as centrality and neighborhood) allow one to go beyond mere numerical data and to assess the role and function of the users in the process of collaborative construction of knowledge on online social networks.
• Use of SNA combined with SA helps to disambiguate sentences. For example, a statement such as “I really hate beer but I love Heineken” is very difficult to interpret correctly unless the network where it was expressed is considered [72].