Chapter 15

Price-Sensitive Ripples and Chain Reactions

Tracking the Impact of Corporate Announcements With Real-Time Multidimensional Opinion Streaming

K. Moilanen; S. Pulman
TheySay Ltd., London, United Kingdom
University of Oxford, Oxford, United Kingdom

Abstract

Publicly quoted companies make official announcements and release potentially price-sensitive information at regular intervals. While the rich financial performance cues present in corporate announcements are of critical intrinsic importance to the market, it is equally important for companies that make announcements to be able to monitor the impact, ripple effects, and chain events triggered by downstream public reaction—subjective discussions, analyses, recommendations, predictions, and general speculation—among analysts, traders, investors, other companies, the press, and the wider public. In this chapter, we describe how deep, multidimensional opinion streaming, powered by a large-scale custom natural language processing pipeline for sentiment, affect, and irrealis analysis, is used to monitor, quantify, and estimate the impact of corporate announcements and other related events in real time. Operating across financial feeds, social media firehoses, blogs, forums, and news, the approach provides rich price-sensitive feedback and insights for corporations and the wider market audience alike.

Keywords

Real-time opinion streaming; Corporate announcements; Financial sentiment; Market sentiment; Compositional sentiment analysis; Affect analysis; Speculation analysis; Irrealis mood

Acknowledgments

We thank Simon Guest and David Morgan for their contribution to this chapter.

1 Introduction

Publicly quoted companies have rigorous regulatory requirements and fiduciary responsibilities to make official announcements and to release potentially price-sensitive information at regular intervals (eg, at the end of financial accounting periods) or when some material events or changes in a corporation’s circumstances and operational environments have occurred. Such announcements have to be released in an orderly manner so as not to leak any insider knowledge that may give a person or an organization an advantage in the market.

Formal financial announcements typically contain a full report including quarterly profit and loss figures and balance sheet performance, and often compare current performance with prior performance (eg, year-to-date performance). Corporate announcements are traditionally followed by a public conference for both analysts and the press in which senior management can be questioned further about the latest announcement. In essence, the audience in a public forum are scrutinizing whether the company is performing well or poorly, particularly against prior expectations. The expectations of the audience (and the market) are a function of what was promised by the company in previous financial announcements and any previously announced material contracts (sales or mergers and acquisitions) that have been secured and executed by the company. In particular, analysts, brokers, fund managers, and shareholders will be listening for any signals that can indicate (1) better-than-expected performance (indicating that the company is undervalued), (2) worse-than-expected performance (indicating that the company is overvalued), or (3) obfuscation (indicating potential threats or problems). On the basis of such assessment, investment decisions (to buy, sell, or hold) will be made and reports will be issued to downstream clients concerning their assessment and interpretation of the company. Depending on the level of optimism versus pessimism, a company may start to be seen in a positive or negative light. In addition, regular announcements by larger, higher-profile companies may attract further negative or positive press in the financial columns.

Many other emergent material announcements can also have a similar positive versus negative effect, such as major product releases, changes in organization structure (eg, CEOs resigning or being fired unexpectedly, unexpected senior management changes, new board announcements), mergers and acquisitions events and activities, sales (eg, winning or losing major contracts), legal events (eg, litigation threats, regulatory changes), and unforeseen crises and disasters (eg, the Volkswagen emissions scandal, car recalls, deepwater oil spills, civil war).

Regular announcements are still reported primarily through traditional news channels, which can be described as typically regulated, non-real-time, structural, formal, and factual, and their primary audience is mostly made up of professionals and organizations in and around the financial services industry (eg, analysts, traders, investors, companies).

Regular announcements are of critical intrinsic importance to the market because announcements often contain a great many price-sensitive financial performance cues—explicit as well as implicit references to the fitness, activities, circumstances, and prospects of a given company. Although such intrinsic upstream performance cues alone constitute a clear market-moving force, an even greater variety of performance cues are present in the extrinsic downstream public reaction to and conversation around corporate announcements that can create potentially very strong ripple effects in the market. In particular, rich streams of less formal and regulated, instantaneous, more rampant, less structural, less factual, more subjective, and more emotional reactions can be mined and gauged across a multitude of social media channels that can reach a much wider audience faster than any traditional public forums and conferences.

This chapter describes our approach to monitoring and quantifying in real time the impact of corporate announcements and other related events using deep, multidimensional opinion streaming that is powered by a large-scale natural language processing (NLP) pipeline and seeded with financial feeds, social media firehoses, blogs, forums, and news. Our main focus is on affective “soft” metrics derived from sentiment, emotion, speculation, intent, risk, and other related signals that augment traditional, more factual “hard” metrics such as raw volume or share price. The main goal of our streaming system is to provide feedback and rich price-sensitive insights for the end users who want to shed light on the following key questions:

1. Who is talking about the announcement and the company (when, where, how, and in what contexts)?

2. How do people feel about the announcement (company)?

3. How does the conversation around the announcement (company) evolve?

4. What drives sudden peaks (or troughs) in public opinion around the announcement (company)?

2 Architecture

2.1 Data Sources and Filters

The real-time opinion streams are derived from two main sources; namely, (1) a full, unthrottled Twitter firehose and (2) a pool of mixed streams from general news, blog, post, and forum sites that are polled approximately 3000 times a day. All raw data sources are filtered with multiple keyword and other content filters to make them maximally relevant to a specific corporate announcement for a specific end user. We employ simple Boolean keyword filters that rely on positive and suppressive negative matching as well as the ability to specify additional context anchors.
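A minimal sketch of such a Boolean filter, assuming simple word-boundary matching (the production filters are presumably considerably richer):

```python
import re

def make_filter(positive, negative=(), anchors=()):
    """Boolean keyword filter with positive matching, suppressive negative
    matching, and optional context anchors (illustrative sketch only)."""
    def hit(terms, text):
        return any(re.search(r"\b" + re.escape(t) + r"\b", text, re.I)
                   for t in terms)

    def accept(text):
        if not hit(positive, text):             # at least one positive keyword
            return False
        if negative and hit(negative, text):    # suppressive negative match
            return False
        if anchors and not hit(anchors, text):  # required context anchor
            return False
        return True

    return accept

f = make_filter(["Berkshire Hathaway"],
                negative=["job"],
                anchors=["earnings", "results"])
```

With this filter, a tweet mentioning the company and an anchor term passes, while mentions lacking an anchor or containing a suppressive term are dropped.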

The keyword-filtered streams vary dramatically in terms of their average keyword hit levels, reflecting corporations, events, and circumstances in the financial world. While announcements about large multinational corporations (eg, IBM, Berkshire Hathaway) are understandably voluminous and often bursty, smaller corporations (eg, Thundermin Resources) tend to be mentioned considerably less frequently, as can be seen in the raw keyword hit estimates in Table 15.1, for example. Such variance poses a challenge for real-time opinion streaming in that no single data filtering precision versus recall trade-off point will be able to cover all corporate announcements across all individual end users. In particular, low-volume corporate announcements are especially troublesome because their recall levels may be too low and because they may also reveal deficiencies in the underlying NLP analyses, both of which can have a devastating impact on the usability of the system.

Table 15.1

Sample Frequency Estimates of Raw Keyword Hit Rates on Google (https://www.google.com), Bing (http://www.bing.com/), and Topsy (accessed November 29, 2015; the service is no longer available)

Raw Keyword Query                    Hits         Search Engine

Berkshire Hathaway
  “Berkshire Hathaway”               14,600,000   Google (all time)
                                     5,320,000    Bing (all time)
                                     6188         Topsy (last 30 days)
  with +earnings +results +2015      331,000      Google (all time)
                                     325,000      Bing (all time)

Thundermin Resources
  “Thundermin Resources”             24,700       Google (all time)
                                     4140         Bing (all time)
                                     11           Topsy (last 30 days)
  with +earnings +results +2015      15,300       Google (all time)
                                     959          Bing (all time)


Typical content aggregation platforms attempt to remove or normalize duplicate documents and messages reposted and syndicated by upstream sources or social media users. Corporate announcements behave differently in this regard because repeated (and therefore amplified) content can constitute a highly relevant price-sensitive signal in itself. For this reason, we do not apply any deduplication filters.

Although Boolean keyword filters can generate sufficiently clean and focused topical data streams for most corporate announcements, they cannot eradicate all forms of irrelevant content—ostensibly accurate keyword matches in contexts which do not contain any useful information for the end user. To make the opinion streams even more relevant and focused, and to aid downstream opinion signal creation, we run as secondary filters three additional soft classifiers, the sensitivity of which the end user can control. These secondary filters, which all use standard supervised classifier architectures with trigram features, target (1) sarcasm and irony, (2) advertisements (cf. spam), and (3) humor.

Sarcasm is of particular importance in that it can skew sentiment classifiers’ predictions. While the noise caused by sarcasm is typically marginal or tolerable in high-volume opinion streams from formal news sources, it can have a detrimental impact on the sentiment predictions in low-volume opinion streams from social media sources. Although current sarcasm and irony detection performance levels are far from perfect [1–3], standard supervised classifiers can be useful as additional relevance filters, especially at high confidence levels. Advertisements that refer to a company directly or indirectly via references to products or services can in turn skew sentiment predictions toward positive sentiment. Positive spam is a particular problem with high-volume opinion streams tracking extremely popular companies (eg, Apple, Tesla, Visa). Humor generates yet another kind of noise in that references to a given company in humorous contexts may well contain relevant sentiment signals but they are axiomatically less important than those from factual contexts. Considering the limited capabilities of current humor detection approaches [4, 5], we similarly apply humor filtering only at high confidence levels.
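To make the idea of trigram-feature soft classifiers applied only at high confidence concrete, here is a self-contained Naive Bayes sketch with word and trigram features; the labels, training data, and threshold are illustrative assumptions, not the production models:

```python
import math
from collections import Counter

def features(text):
    toks = text.lower().split()
    return toks + [" ".join(toks[i:i + 3]) for i in range(len(toks) - 2)]

class TrigramFilter:
    """Minimal Naive Bayes stand-in for the supervised secondary filters
    (sarcasm/ads/humor); real models would use far richer training data."""

    def __init__(self):
        self.feat = {"keep": Counter(), "drop": Counter()}
        self.total = Counter()
        self.vocab = set()

    def train(self, text, label):
        fs = features(text)
        self.feat[label].update(fs)
        self.total[label] += len(fs)
        self.vocab.update(fs)

    def p_drop(self, text):
        """Posterior probability (uniform class prior) of filtering out."""
        v = len(self.vocab)
        logp = {}
        for label in ("keep", "drop"):
            lp = 0.0
            for f in features(text):
                lp += math.log((self.feat[label][f] + 1) /
                               (self.total[label] + v))
            logp[label] = lp
        m = max(logp.values())
        z = sum(math.exp(lp - m) for lp in logp.values())
        return math.exp(logp["drop"] - m) / z

    def accept(self, text, threshold=0.9):
        # Filter only at high confidence, as described in the text.
        return self.p_drop(text) < threshold
```

The high `threshold` mirrors the policy of applying these filters only when the classifier is very confident.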

2.2 Core Natural Language Processing and Opinion Metrics

We process each keyword-filtered text with an NLP pipeline with the aim of capturing and profiling as many direct and indirect references to a specific corporation and its announcement as possible. We first tokenize text with a custom tokenizer that supports (1) finite-state patterns for domain-specific multiword expressions to capture complex company names and symbol sequences (eg, “SPDR S&P MidCap 400”) that may not align fully with downstream syntactic parses (which typically reflect formal grammars), (2) generic complex emoji and emoticon sequences, and (3) hashtag-internal tokenization with character-level CKY chart parsing (eg, “#beatsstreet”). Deeper and richer tokenization, an often overlooked task, can boost downstream processing; for example, part-of-speech tagging, dependency parsing, sentiment analysis, and emotion tagging [6, 7].
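Hashtag-internal tokenization can be approximated with dynamic programming over a word list; the following is a simplified stand-in for the character-level CKY chart parsing mentioned above (the vocabulary here is a toy assumption):

```python
def segment_hashtag(tag, vocab):
    """Split a hashtag body (eg, '#beatsstreet') into known words using
    fewest-words dynamic programming; a simplified stand-in for
    character-level CKY chart parsing."""
    s = tag.lstrip("#").lower()
    n = len(s)
    best = [None] * (n + 1)   # best[i]: fewest-word split of s[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and s[j:i] in vocab:
                cand = best[j] + [s[j:i]]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand
    return best[n] if best[n] is not None else [s]
```

A full chart parser would additionally score competing segmentations; this sketch simply prefers the split with the fewest in-vocabulary words.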

We use a similar finite-state approach for sentence breaking. Our custom sentence breaker exploits rich resources optimized for the financial genre and accounts for peripheral hashtag, social media user ID, and other symbolic fields around core sentence content (eg, “$WSM $AXP $VOYA $NTAP $RRGB $WDC $KMX - all new 52-week lows - yep, it’s all about the oil! $SPY $QQQ $IWM $DIA”), again to reduce noise in downstream processing.

After tokenization and sentence breaking, each sentence is processed by a Hidden Markov model part-of-speech tagger and a (nonrecursive) noun phrase/verb group chunker. These preparsing components have been trained on (among other sources) text from the British National Corpus, with the tagset then being mapped so as to be compatible with that used in the Penn Treebank. The preparsing components have been further engineered over a number of years so as to perform accurately on the various kinds of phenomena found in naturally occurring text out in the wild beyond the limited gold standards used in academic experiments, including structural intricacies of language use in social media and complex constructs in the financial genre.

Part-of-speech-tagged and shallow-chunked sentences are then processed by a deterministic, incremental, dependency parser [8, 9]. Again, intensive feature engineering and generalization means that this parser performs quite accurately even when faced with text that is not very well behaved linguistically. The set of dependency relations used is compatible with those produced by the Stanford Parser [10] (and this is one possible output format) but internally they are further mapped to a richer set of dependencies based on the Cambridge Grammar of the English Language formalism and approach to the description of English [11].

2.3 Opinion Metrics

Using the output of the core NLP components described earlier, we generate, for each document, multidimensional opinion metrics for corporate announcements at multiple structural and syntactic levels that encompass (1) individual and aggregated noun phrases (cf. “entities,” terms, keywords, key phrases), (2) syntactic relations between noun phrases, (3) sentences, and (4) documents.

2.4 Indexing

Having generated the opinion metrics (see Section 3), we then index and time-stamp them for (1) each sentence, (2) each individual entity mention, and (3) the top n most salient sentiment relations between entities. Unlike partial aspect-level sentiment analysis approaches that are grounded in the interdependencies between opinion expressions and opinion targets and holders [12], we index all noun phrases detected in text. Exhaustive coverage is critical in our target domain because corporate announcements and unrestrained conversation around them involve not only explicit, subjective opinions (eg, “Many regard the company’s plan as overly optimistic”) (often with explicit opinion holders or targets) but also implicit sentiment in the form of references to states of affairs and events in the world that have positive or negative connotations (eg, “The company has appointed a new CEO”) that can axiomatically cover an extremely diverse range of topics, among which implicit opinion holders and targets are not uncommon. The ranking function that is used to determine which sentiment relations are indexed takes into account raw frequencies, syntactic salience estimates that favor specific grammatical roles, extreme positive/negative sentiment distributions, and other factors.
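The ranking function is not specified in detail; the following sketch shows one plausible shape, with the relation format, feature weights, and combination rule all being assumptions:

```python
def relation_score(rel, role_weight):
    """Illustrative score for deciding which sentiment relations to index:
    raw frequency, grammatical-role salience, and polarization of the
    positive/negative distribution."""
    salience = role_weight.get(rel["role"], 0.1)   # favored grammatical roles
    extremity = abs(rel["pos"] - rel["neg"])       # extreme distributions
    return rel["freq"] * salience * (1.0 + extremity)

def top_relations(rels, n, role_weight=None):
    role_weight = role_weight or {"nsubj": 1.0, "dobj": 0.8}
    return sorted(rels, key=lambda r: relation_score(r, role_weight),
                  reverse=True)[:n]
```

In practice the production function would combine more factors, but the shape (frequency × salience × extremity, top-n cut) conveys the idea.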

2.5 Real-Time Opinion Streaming

The resultant time-stamped data points are then aggregated into various temporal buckets at various resolution levels ranging from seconds to months that eventually render real-time opinion streams, each with multidimensional opinion metrics with which the end user can monitor specific corporate announcements and conversation around them. Typical practical use cases involve (1) querying the streams with a specific time window and resolution; (2) searching the streams with full-text search; (3) filtering the streams to focus on specific opinion metrics (and ranges and combinations thereof); (4) sorting and manipulating query results to detect and visualize extreme values, peaks, troughs, outliers, and interesting (salient or frequent) noun phrases; (5) comparing opinion streams; and (6) piping the streams into external trading or other predictive models. The end user can also set up alerts for specific keyword occurrences and thresholded triggers for the opinion metrics.
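The temporal bucketing step can be sketched as follows, assuming each data point is a (timestamp, positive score, negative score) triple; the bucket keys and field names are illustrative:

```python
from collections import defaultdict
from datetime import datetime

RESOLUTION = {"minute": "%Y-%m-%d %H:%M",
              "hour":   "%Y-%m-%d %H",
              "day":    "%Y-%m-%d"}

def bucketize(points, resolution):
    """Aggregate time-stamped opinion data points into temporal buckets at
    the chosen resolution (sketch; the point format is an assumption)."""
    fmt = RESOLUTION[resolution]
    buckets = defaultdict(lambda: {"n": 0, "pos": 0.0, "neg": 0.0})
    for ts, pos, neg in points:        # (datetime, pos score, neg score)
        b = buckets[ts.strftime(fmt)]
        b["n"] += 1
        b["pos"] += pos
        b["neg"] += neg
    return dict(buckets)
```

The same points can be bucketed at several resolutions in parallel, which is what lets the end user zoom between seconds and months.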

3 Multidimensional Opinion Metrics

In the context of corporate announcements, “sentiment” is an imprecise umbrella term for a very large number of both subjective and objective, and factual and nonfactual dimensions and topics that go beyond the typical distinction between positive versus negative evaluative polarity (cf. good vs. bad; favorable vs. unfavorable; desirable vs. undesirable; thumbs up vs. thumbs down; one star vs. five stars) that characterizes what generic subjectivity classification, sentiment analysis, and opinion mining approaches [13–16] assume and focus on. In practical use cases in and around the finance industry, “sentiment” is taken to refer to any circumstances, states of affairs, dependencies, and events in the world that ultimately lead to positive financial performance (ie, making money) versus negative financial performance (ie, losing money) in some form [17]. In other words, financial sentiment expressions do not need to express an opinion but can describe or refer to some fact and still have an implied evaluative orientation (unlike, say, pure opinion statements in movie reviews). Financial sentiment therefore covers a vast range of extralinguistic factors such as the general market mood (bullish vs. bearish); recommendations for stocks (buy vs. sell); general stock demand versus supply levels; product releases vs. product recalls; beating versus missing expectations in earnings results or general performance; mergers, acquisitions, and takeovers (which can be advantageous or disadvantageous to a company); securing versus losing contracts; senior management changes and employee layoffs (hiring vs. firing); litigation (guilty vs. not guilty); overall state of the economy (boom vs. recession); governmental regulations and policies (relaxed vs. tightened); environmental issues; catastrophes; overall corporate reputation and trust image; and scandals.

It is evident that generic sentiment analysis is not in itself enough to fully capture distinctions of the above kind. We therefore argue that sentiment analysis of corporate announcements is best viewed and approached as a multidimensional task that goes beyond vanilla opinion-oriented sentiment analysis. Our system accordingly profiles corporate announcements and conversation around them with multidimensional opinion metrics that emerge from (1) fine-grained multilevel sentiment analysis, (2) affect analysis, (3) irrealis analysis, (4) comparative analysis, and (5) topic tagging, which we describe next.

3.1 Fine-Grained Multilevel Sentiment

3.1.1 Compositional sentiment parsing

To capture and profile all topics, issues, and events related to a corporate announcement, exhaustive coverage is highly desirable. To achieve that, we parse each document with an exhaustive compositional multilevel sentiment parser [18] that assigns fine-grained sentiment distributions to all syntactic constituents in a sentence. The sentiment parser consults a sentiment grammar to compose sentiment recursively from word-level leaf nodes all the way to the sentence root, and hence generates, for each sentence, a stack of sentiment (sub)contexts with fine-grained distributions of all positive/neutral/negative sentiment detected in them. Each composition step has access to information about lexical prior polarities, lexical sentiment reversal potential (eg, none, decreasing), lexical sentiment ambiguity potential (eg, two-way or three-way ambiguous sentiment carriers such as crude, aggressive, or old), morphosyntactic features, and compositional tendencies pertaining to the commonest syntactic constructions in English (reflecting the Cambridge Grammar of the English Language grammar [11]).

Although compositional sentiment parsing is relatively robust toward lexical sentiment ambiguity, domain dependency effects, and incomplete or fragmentary syntactic parses, we disambiguate specific frequent sentiment-ambiguous words and constructions with syntactic dependency predicates before sentiment composition. For example, in the genre of corporate announcements, the three-way ambiguous adjective legal can be disambiguated and asserted as negative when it acts as a prehead modifier to a noun such as cost or challenge, while the three-way ambiguous verb beat can be asserted as positive when it hosts as its direct object complement a noun phrase headed by a noun such as analysts or street.
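A toy version of this dependency-predicate disambiguation, with an assumed (relation, head, dependent) triple format and a tiny illustrative rule set:

```python
# Illustrative rule tables (assumptions, not the production resources).
NEG_PREHEAD_NOUNS = {"cost", "costs", "challenge", "challenges"}
POS_BEAT_OBJECTS = {"analysts", "street", "estimates"}

def disambiguate(word, deps):
    """Disambiguate a sentiment-ambiguous word before composition, given
    dependency triples (relation, head, dependent)."""
    w = word.lower()
    if w == "legal":
        # negative when a prehead modifier of a noun such as cost/challenge
        for rel, head, dep in deps:
            if rel == "amod" and dep == "legal" and head in NEG_PREHEAD_NOUNS:
                return "negative"
    if w == "beat":
        # positive when its direct object is headed by analysts/street/...
        for rel, head, dep in deps:
            if rel == "dobj" and head == "beat" and dep in POS_BEAT_OBJECTS:
                return "positive"
    return "ambiguous"
```

When no predicate fires, the word stays ambiguous and is left for compositional sentiment parsing to resolve in context.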

3.1.2 Entity scoring

Once exhaustive compositional sentiment parses have been generated, we score each noun phrase using its weighted (sub)sentential sentiment distribution stack [19], and, for some specific predicators, lexical shallow-semantic entity frames [20, 21].
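As an illustration of the scoring step, the following sketch combines an entity's stack of weighted (sub)sentential sentiment distributions into one normalized distribution; the linear weighting is an assumption rather than the published scheme:

```python
def score_entity(stack):
    """Score one entity noun phrase from its stack of (sub)sentential
    sentiment distributions. Each item is (weight, pos, neu, neg)."""
    pos = neu = neg = 0.0
    for weight, p, u, n in stack:
        pos += weight * p
        neu += weight * u
        neg += weight * n
    total = pos + neu + neg
    if total == 0:
        return {"pos": 0.0, "neu": 1.0, "neg": 0.0}
    return {"pos": pos / total, "neu": neu / total, "neg": neg / total}
```

Here the innermost sentiment context would typically carry the largest weight, so local polarity dominates but wider sentential context still contributes.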

Example 15.1 illustrates how entities in various contexts are scored by the entity scorer:

(15.1)

a. Apple stock: 86.4% negative ∣ results: 56.6% negative ∣ quarter: 87.9% negative ∣ “Apple stock is pretty flat in the wake of the results, hovering around the $100 mark in after-hours trading - with so many cuts by analysts after supply chain warnings earlier in the quarter, was the bad news already priced in?”

b. fulfillment costs: 100% negative ∣ company: 54.7% negative, 45.3% positive ∣ logistics: 66% positive ∣ “To Jeremy’s point about fulfillment costs, that’s a big problem for the company as it has worked to improve its fulfillment and logistics in recent years.”

c. fourth-quarter 2015 adjusted earnings: 63% negative, 37% positive ∣ economic growth: 73.1% negative, 26.9% positive ∣ mining and oil and gas industries: 75.5% negative ∣ “Caterpillar’s fourth-quarter 2015 adjusted earnings declined 45% to 74 cents per share, reflecting weakening economic growth primarily in developing countries, and ongoing weakness in mining and oil and gas industries.”

3.1.3 Sentiment confidence estimation

Some end users and use cases require extremely high precision levels in the sentiment predictions even if that leads to reduced volume levels in the opinion streams. Increased precision can be achieved by suppressing sentiment predictions that the system regards as more difficult in some way. To enable the end user to threshold the system’s confidence in this regard, the sentiment parser provides a sentiment confidence estimate for each sentiment prediction it makes. In our case, confidence is not a typical probability measure against some underlying training data but rather a direct estimation of the structural complexity, sentiment ambiguity, and error potential of a piece of text.

The ambiguity indicators used by the parser detect (1) three-way-ambiguous and two-way-ambiguous sentiment carriers (eg, old, upturn), (2) reversal operators (eg, never, decreasing), (3) unknown out-of-vocabulary words, and (4) complexity and saturation measures targeting various grammatical, morphosyntactic, and lexical dimensions. Example 15.2 illustrates how sentiment confidence estimates can be used to rank sentiment predictions for the end user:

(15.2)

a. conf: 0.84 ∣ “Burge In Talks Over New Deal”

b. conf: 0.64 ∣ “Reforms for state-owned companies including CITIC Group and Sinopec may fuel a rerating of state-owned entities, HSBC adds but cautions “we think a rally might be underway due to the cheap valuations but this is more likely to be a short-term rebound than a new cyclical bull market.””

c. conf: 0.56 ∣ “Mr. Chu confirmed that he has no disagreement with the Board and there are no matters relating to his resignation that need to be brought to the attention of The Stock Exchange of Hong Kong Limited and the shareholders of the Company.”
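The confidence estimation described above can be sketched as a multiplicative decay over ambiguity indicators; the penalty constants and the length-based complexity proxy are illustrative assumptions:

```python
def sentiment_confidence(tokens, lexicon):
    """Heuristic confidence estimate in the spirit of Section 3.1.3: start
    at 1.0 and decay for each ambiguity indicator detected."""
    conf = 1.0
    for tok in (t.lower() for t in tokens):
        kind = lexicon.get(tok)
        if kind is None:
            conf *= 0.97      # unknown out-of-vocabulary word
        elif kind == "ambiguous":
            conf *= 0.85      # two-/three-way-ambiguous carrier (eg, old)
        elif kind == "reversal":
            conf *= 0.90      # reversal operator (eg, never, decreasing)
    # crude stand-in for the grammatical complexity/saturation measures
    conf *= max(0.5, 1.0 - 0.01 * max(0, len(tokens) - 20))
    return conf
```

Short, unambiguous sentences thus score near 1.0, while long sentences dense with ambiguous carriers and reversal operators decay toward the bottom of the ranking.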

3.1.4 Relational sentiment analysis

Exhaustive sentiment parses—fine-grained sentiment distributions assigned to all syntactic constituents and all entity noun phrases in a sentence—have an added benefit in that they can be used as useful proxies for intrasentential relational entity sentiment analysis. Because they consider entity noun phrase pairs that are bridged syntactically, relational sentiment distributions are subtly different from those assigned to individual entity noun phrases. Sentiment relations allow the end user to keep track of specific aspects and features of a given corporation. Consider the sample sentence in Example 15.3 (nonpronominal entities in head noun and genitive subject-determiner positions underlined):

(15.3)

“The stable outlook reflects our expectation that @COMPANY@’s operating performance will be stable over the next 12–24 months and its financial position will not significantly weaken despite difficulties in the Middle East.”

Exhaustive, pairwise relational sentiment analysis across the entities in the sentence can be used to determine that most aspects of the company that the sentence is about are strongly or mostly positive but also that some negativity is implied. The relation (outlook, expectation) is strongly positive, (outlook, position) is slightly less positive, and (performance, months) is strongly positive; at the same time, a moderate amount of negativity exists in the relation (outlook, Middle East), while (difficulties, Middle East) is strongly negative.

Instead of being presented with only one flat sentential sentiment score or scores for only some entities in the sentence, the end user can exploit deep sentiment relations of the above kind to formulate highly focused queries about a much greater number of aspects, facets, or features.

3.2 Multidimensional Affect

Sentiment analysis can provide the end user with a holistic assessment of positive versus negative aspects of a corporate announcement. It does not, however, provide any signals pertaining to basic (primary) versus complex (secondary) emotions that go beyond positive versus negative evaluative polarity. Of the most commonly agreed on basic emotion categories and dimensions [22, 23], two are particularly important in corporate announcements; namely, calmness versus agitation (cf. activity, potency, arousal) and fear. To enable the end user to detect emotion signals in the data streams, we enrich document-level and sentence-level sentiment predictions with multidimensional emotion scores that are provided by a shallow-compositional emotion tagger. The model provides fine-grained emotion scores along (1) bipolar (happy⇔sad; calm⇔uncalm; like⇔dislike) and (2) unipolar (anger; fear; shame; surprise) emotion dimensions [22, 23].

These seven dimensions are scored jointly and individually by a transition graph that is seeded with prior emotion weights assigned to each word. The weights are obtained from an emotion lexicon that specifies probabilistic weights for all likely emotion dimensions (or emotion senses) that a given word can have. The graph traverses through the text and considers all possible scopes and transitions across all emotion dimensions activated by the seeds detected in the text. The transitions are scored and disambiguated with weighted transducers that target specific emotion compositions that are specified in a calculus that has access to syntactic and linear temporal information. All emotion scores are provided as raw, unbounded values and as five normalized bands that mimic Kappa correlation levels—weak (0.0, 0.2), fair (0.2, 0.4), moderate (0.4, 0.6), strong (0.6, 0.8), and extreme (0.8, 1.0)—so that the end user can align the affect scores with other bounded opinion metrics, and threshold the signals at a less granular level.
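The mapping from raw, unbounded scores to the five normalized bands can be sketched as follows; the squashing function (tanh of the magnitude) is an assumption, but the band boundaries follow the Kappa-style levels above:

```python
import math

BANDS = [(0.2, "weak"), (0.4, "fair"), (0.6, "moderate"),
         (0.8, "strong"), (1.0, "extreme")]

def to_band(raw_score):
    """Map a raw, unbounded emotion score onto the five normalized bands
    (weak/fair/moderate/strong/extreme)."""
    x = math.tanh(abs(raw_score))   # squash magnitude into [0, 1)
    for upper, name in BANDS:
        if x <= upper:
            return name
    return "extreme"
```

Banding makes the emotion scores directly comparable with the system's other bounded opinion metrics and easy to threshold in alerts.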

Example 15.4 illustrates the kinds of multidimensional affect signals that the end user can detect:

(15.4)

a. calm: 0.428 ∣ happy: 0.673 ∣ like: 0.54 ∣ “As the company’s founder, Dorsey is likely to be afforded a respect and even reverence which an external appointment would need to work for.”

b. calm: −0.283 ∣ fear: 0.98 ∣ happy: −0.588 ∣ like: −1.062 ∣ sure: −0.444 ∣ “However, @COMPANY@’s management believe that it is not a sign of the company making any mistakes or losing out to competitors, but rather that of general global slowdown in the industry and natural changes in their business model.”

c. calm: 0.452 ∣ fear: 0.512 ∣ happy: −1.037 ∣ like: −0.39 ∣ sure: −0.451 ∣ surprise: 0.166 ∣ “Given the unexpected fall in @COMPANY@’s EBITDA and revenue in 2014 as compared to 2013, as well as the continuing bleak outlook for steel prices and seaborne iron ore price, there is worse to come in 2015…”

3.3 Irrealis Modality

The distinction between realis and irrealis mood plays a central role in the analysis of corporate announcements. Irrealis mood covers various forms of (1) speculation (cf. “forward-looking language” hedges, conditionals, wishes, predictions, forecasts, warnings), (2) intent (cf. plans), and (3) imperatives (commands), et cetera [24]. Irrealis signals allow the end user to explore, partition, and interpret sentiment and affect metrics in an even more flexible manner.

We capture multiple irrealis types at the sentence level using finite-state taggers with rich hand-crafted shallow-semantic patterns over words and part-of-speech sequences [25]. The taggers detect three higher-level irrealis types (speculation, intent, and risk), as well as more specific subtypes such as speculation.prediction, speculation.conditional, intent.buy, intent.expand (a business), risk.estimate (failing estimates), and risk.legal.
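A regex-based stand-in for the finite-state irrealis taggers, with a tiny illustrative pattern inventory (the production resources are hand-crafted and far richer):

```python
import re

# Illustrative subset of irrealis patterns (assumptions, not the real set).
IRREALIS_PATTERNS = [
    ("speculation.prediction",  r"\b(will|expect\w*|forecast\w*|outlook)\b"),
    ("speculation.conditional", r"\b(if|unless|provided that)\b"),
    ("intent.buy",              r"\bplans? to (buy|acquire)\b"),
    ("risk.legal",              r"\b(litigation|liabilit\w*|lawsuits?)\b"),
]

def tag_irrealis(sentence):
    """Tag a sentence with all irrealis (sub)types whose patterns fire."""
    s = sentence.lower()
    return sorted({tag for tag, pat in IRREALIS_PATTERNS if re.search(pat, s)})
```

A sentence can, and often does, receive several irrealis tags at once, which is what lets the end user slice sentiment by speculation, intent, and risk simultaneously.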

Examples of prototypical irrealis signals present in many corporate announcements are shown in Example 15.5, together with the tags assigned by the taggers:

(15.5)

a. “The stable outlook [speculation.prediction] reflects our expectation [speculation.prediction] that the @COMPANY@’s operating performance will be stable [speculation.prediction] over the next 12–24 months [speculation.future] and its financial position will not significantly weaken [speculation.prediction] despite high capital expenditure.”

b. “Liability claims [risk.legal] related to our products or our handling of hazardous materials could damage [speculation] our reputation and have a material adverse effect [risk] on our financial results.”

c. “The plan to merge [intent.merge] @COMPANY@ and @COMPANY@ should lead to [speculation.prediction] cost savings and will result in [speculation] a more balanced exposure in emerging markets.”

Although all irrealis expressions are highly informative and useful for the end user, the system needs to be particularly sensitive toward any and all negative signals and cues pertaining to uncertainties and risks as they are symptomatic of negative financial performance—one of the key signals analysts and the wider public look for in corporate announcements. To account for such interpretative sensitivity bias, we weight irrealis sentences that exhibit negative sentiment polarity by a very large amount in sentiment aggregation and suppress positive irrealis sentences by a moderate amount.
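This sensitivity bias can be sketched as a weighted aggregation; the multiplier values (4.0 boost, 0.7 suppression) are illustrative assumptions, not the production weights:

```python
def aggregate_with_bias(sentences, neg_boost=4.0, pos_damp=0.7):
    """Sentiment aggregation with the interpretative sensitivity bias:
    boost negative irrealis sentences, moderately suppress positive ones.
    Each item is (polarity, is_irrealis) with polarity in [-1, 1]."""
    total = weight_sum = 0.0
    for polarity, irrealis in sentences:
        w = 1.0
        if irrealis and polarity < 0:
            w = neg_boost     # risk/uncertainty cues weigh heavily
        elif irrealis and polarity > 0:
            w = pos_damp      # positive speculation counts for less
        total += w * polarity
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```

The effect is that a stream balanced between negative speculation and positive fact reads as net negative, mirroring how analysts interpret hedged bad news.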

3.4 Comparisons

Corporate announcements typically contain price-sensitive comparative information about a company, its past, its competitors, and the market. Unlike approaches that analyze sentiment and comparative constructions jointly [26] and hence pay less attention to noncomparative sentiment signals, we analyze the two dimensions separately so as to ensure exhaustive sentiment coverage and to offer more flexible queries for the end user.

We tag comparative sentences with finite-state taggers with rich hand-crafted shallow-semantic patterns over words and part-of-speech sequences. The taggers, which are functionally akin to the irrealis taggers described earlier, capture generic comparison expressions [27, 28], as well as more detailed comparison subtypes such as comparison.money, comparison.time, and comparison.evaluation. Example 15.6 illustrates sample comparative expressions detected by the taggers:

(15.6)

a. “EBITDA amounted to 60.3 million over the first nine months of 2014, up 3.2% compared with a year earlier [comparison.time].”

b. “Landing Jack Dorsey as the permanent CEO is by far the best [comparison.evaluation] choice Twitter can make right now.”

c. “Meanwhile, higher revenues [comparison.money] and lower net credit losses [comparison.money] were offset by higher operating expenses [comparison.money] and a higher effective tax rate [comparison.money].”
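A heavily simplified sketch of such a finite-state comparative tagger is shown below. The patterns are illustrative stand-ins covering only three subtypes; the production taggers use far richer hand-crafted shallow-semantic patterns over words and part-of-speech sequences:

```python
import re

# Illustrative surface patterns, one per comparison subtype (assumed).
COMPARISON_PATTERNS = [
    (re.compile(r"\b(compared with|compared to|versus|vs\.)\b.*"
                r"\b(year|month|quarter|earlier|ago)\b", re.I),
     "comparison.time"),
    (re.compile(r"\b(higher|lower|rising|falling)\b\s+\w*\s*"
                r"(revenue|revenues|costs|losses|expenses|rate)\b", re.I),
     "comparison.money"),
    (re.compile(r"\b(by far the )?(best|worst|better|worse)\b", re.I),
     "comparison.evaluation"),
]

def tag_comparisons(sentence):
    """Return the set of comparison subtype tags matched in a sentence."""
    return {tag for pattern, tag in COMPARISON_PATTERNS
            if pattern.search(sentence)}

tags = tag_comparisons("EBITDA was up 3.2% compared with a year earlier.")
```

Because the patterns are crisp, each subtype tag can be traced back to the exact surface expression that triggered it, which keeps the taggers easy to verify and extend.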

3.5 Topic Tagging

The system lastly tags entity mentions, sentences, and documents with topic tags that reflect some desired classification and aggregation criteria; for example, tax, m&a, board_of_directors, or legal_issues. The topic tags serve two main purposes: (1) to make it easier for the end user to query and filter the opinion streams at a higher conceptual or topical level and (2) to profile and summarize corporate announcements through aggregated sentiment and emotion predictions. Multiclass topic tagging is achieved with a classifier society that combines simple finite-state taggers with standard supervised learning architectures using trigram features. Because the actual topic tag sets used by different end users inherently vary, the system exposes user-definable resources for both classifier types.
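The classifier society can be caricatured as follows; the keyword rules, tag names, and the toy trigram-overlap scorer (a deliberately simple stand-in for a properly trained supervised model with word-trigram features) are all illustrative assumptions:

```python
import re
from collections import Counter

# (1) Finite-state component: user-definable keyword rules (illustrative).
KEYWORD_RULES = {
    "tax": re.compile(r"\b(tax|taxation|levy)\b", re.I),
    "m&a": re.compile(r"\b(merger|acquisition|takeover)\b", re.I),
    "board_of_directors": re.compile(r"\b(board|director|chairman)\b", re.I),
}

def keyword_tags(text):
    return {tag for tag, pat in KEYWORD_RULES.items() if pat.search(text)}

# (2) Supervised component: word-trigram features scored by overlap.
def trigrams(text):
    tokens = re.findall(r"\w+", text.lower())
    return Counter(zip(tokens, tokens[1:], tokens[2:]))

class TrigramTagger:
    def __init__(self, labelled_examples):
        # labelled_examples: iterable of (text, tag) pairs
        self.profiles = {}
        for text, tag in labelled_examples:
            self.profiles.setdefault(tag, Counter()).update(trigrams(text))

    def tag(self, text, threshold=1):
        feats = trigrams(text)
        return {tag for tag, profile in self.profiles.items()
                if sum((feats & profile).values()) >= threshold}

def topic_tags(text, trigram_tagger):
    # The "society": union the crisp keyword matches with the learned tags.
    return keyword_tags(text) | trigram_tagger.tag(text)

tagger = TrigramTagger([("the company agreed to acquire its rival", "m&a")])
tags = topic_tags("the board approved the plan to acquire its rival", tagger)
```

Keeping both components behind plain data resources (a rule table and a set of labelled examples) is what makes the tag sets user-definable: an end user can swap either resource without touching the tagging code.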

4 Discussion

Whenever a publicly quoted company makes an official announcement, a series of events is triggered in its audience of analysts, traders, investors, other companies, the press, and the wider public. Seen from the point of view of the market, upstream financial performance cues in announcements—explicit and implicit evidence about the fitness, activities, circumstances, and prospects of a given company—are obviously of critical intrinsic importance. Seen from the point of view of a company that makes an announcement, it is equally essential to be able to capture the even greater variety of performance cues that abound in the (typically more subjective) extrinsic downstream public reaction around the announcement in the form of formal and informal discussions, analyses, recommendations, predictions, and speculation. Accordingly, the main goal of our real-time opinion streaming system is to enable companies to track, understand, quantify, and react to the ripple effects and chain reactions that official announcements trigger.

The core analytical tasks required for such a system pose many computational and practical challenges. Considering the range of topics, issues, and events that will ultimately surface in the ensuing downstream public commentary, and the way they are interpreted, it is evident that standard sentiment analysis is not enough to fully support the task and that it has to be augmented with other opinion metrics that touch on affect, irrealis modality, comparisons, general topics, and other related dimensions in language. Even though the affective “soft” metrics that are derived from such analyses can be rich and highly informative, many end users use them as a powerful complement to typical factual “hard” metrics (eg, raw volume of mentions, share price, volatility, number of followers, and number of retweets) to yield even richer signals. Accordingly, real-time opinion streaming is best viewed and approached as a holistic, multifaceted problem to which sentiment analysis provides an essential but partial solution.

Unlike in academic experiments and laboratory conditions, only some challenges pertain to the actual intrinsic classification and scoring accuracy of the underlying sentiment, affect, and other related core NLP analyses. On the one hand, rather broad relative⁹ temporal changes tend to matter more than individual low-level data points to users who monitor high-volume and high-velocity opinion streams. Such streams can in this sense exploit the boost and protection (“statistical whitewash”) provided by large data distributions to mask many errors, anomalies, or biases in the underlying analyses (or in the raw streams themselves). On the other hand, low-volume opinion streams can expose individual data points, in which case intrinsic accuracy does matter greatly, as even a handful of incorrect or anomalous analyses can skew the opinion streams and decrease the end user’s confidence in the system.

This dichotomy means that an opinion streaming system needs to be able to support multiple users’ unique precision-versus-recall criteria, preferences, requirements, and resources, all of which can change frequently in demanding real-world conditions. Such flexibility introduces a number of complications to the design of a real-time system, especially around core classifier architectures, resources, and data stream filters. Consider, for example, the upstream keyword filters described in Section 2.1, which play a major role in governing the usability of the entire system. Despite their simplicity, Boolean filters offer a number of important practical benefits for the end user, which makes them preferable to more sophisticated data stream filtering methods that rely on supervised learning. Firstly, Boolean(-like) filtering rule definitions and resources from other systems, especially legacy ones, can be incorporated and reused easily. Secondly, most end users understand how to (re)configure and fine-tune them, which makes them easier and faster to debug, verify, maintain, and extend. Thirdly, because their matches are crisp, not probabilistic, Boolean filters are often easier to interpret than probabilistic predictions. Fourthly, they enable the end user to start opinion streams quickly without having to obtain training data for supervised learning, which is critical in the context of low-volume corporate announcements for which only a few training examples might be available (either initially or eventually). Lastly, some users’ desired keyword filtering criteria change so rapidly that compiling supervised training data simply becomes too cumbersome in practical terms.
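To illustrate why such filters are easy to reason about, a minimal Boolean filter evaluator might look like the sketch below; the nested-tuple query format and the company name are assumptions for the example, not the system's actual rule syntax:

```python
def matches(doc, query):
    """Evaluate a crisp Boolean query against lower-cased document text.

    A query is either a keyword string or a nested tuple of the form
    ("and" | "or" | "not", subquery, ...).
    """
    if isinstance(query, str):
        return query.lower() in doc.lower()
    op, *args = query
    if op == "and":
        return all(matches(doc, q) for q in args)
    if op == "or":
        return any(matches(doc, q) for q in args)
    if op == "not":
        return not matches(doc, args[0])
    raise ValueError(f"unknown operator: {op}")

# Keep documents mentioning the (hypothetical) company and its results,
# while excluding an unrelated sense of the name.
query = ("and", "acme", ("or", "results", "earnings"), ("not", "football"))
keep = matches("ACME posts record earnings for Q3", query)
```

Every accept/reject decision can be traced to a specific clause of the query, which is exactly the crisp interpretability that probabilistic stream filters struggle to offer.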

There are many similar important practical requirements for the core sentiment analysis architecture. Compositional approaches to sentiment [17, 29–32] offer, in general, a number of important benefits over flat classification methods in that (1) they are often relatively robust to ill-formed or ambiguous inputs due to compositional masking effects and basic support for scope phenomena; (2) non-learning-based compositional methods appear to be more resistant to domain dependency effects that can adversely affect flat approaches; (3) their predictions can be interpreted, verified, and debugged more easily and in a more structured manner (as full, exhaustive parse traces can be accessed), which is not the case with flat approaches; (4) their sentiment predictions can be fine-tuned directly to interpret sentiment in some desired manner (eg, stance, point of view) at multiple structural levels or for each individual user, which is, in general, harder to achieve with flat approaches without recourse to multiple models; (5) they can offer a uniform, linguistically sound, and coherent representation across multiple structural levels, again without recourse to multiple models; (6) they can deal with sentiment reversal phenomena in a more structured manner; and (7) their crisp, more deterministic output can, unlike statistical “black box” components, offer extra psychological, procedural, or legal reassurance to the end user that the system can in fact be controlled.

If sentiment (or some other form of) word sense disambiguation (see Section 3.1.1) is attempted, the availability of sufficient training data is likewise a major obstacle for supervised learning approaches to that task. Because there are multiple lexemes that need to be disambiguated, a very large amount of training data would be required collectively, even within a single domain. In this sense, more direct disambiguation mechanisms that target specific lexemes are, in practical terms, simply more convenient than those that rely on supervised learning.

Similarly, sentiment confidence estimates (see Section 3.1.3) can be obtained from vanilla probabilities stemming from supervised learning. Note, however, that a predictive probability score does not necessarily indicate how difficult a piece of text is with respect to sentiment ambiguity—unless a specific sentiment ambiguity scoring model has been trained for the said task. Moreover, simple probabilities run the risk of reflecting the distributions in their underlying training data too closely, which can lead to poorer generalization in practical cross-domain and cross-topic use.
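This caveat can be made concrete with a toy example in which normalized entropy (an assumed stand-in for a dedicated sentiment ambiguity model) distinguishes two predictions that a raw top-class probability treats identically:

```python
import math

def top_probability(dist):
    """The confidence signal a vanilla probabilistic classifier exposes."""
    return max(dist.values())

def ambiguity(dist):
    """Normalized entropy in [0, 1]: 0 = certain, 1 = maximally ambiguous."""
    probs = [p for p in dist.values() if p > 0]
    return sum(-p * math.log(p) for p in probs) / math.log(len(dist))

# Two predictions with the same top-class probability but different shapes.
polarized = {"pos": 0.5, "neg": 0.5, "neutral": 0.0}
diffuse = {"pos": 0.5, "neg": 0.25, "neutral": 0.25}
```

Both distributions report a top probability of 0.5, yet their entropy-based ambiguity scores differ, illustrating that a plain probability score alone does not convey how difficult or conflicted a piece of text actually is.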

Because the ultimate practical goal is to provide real-time feedback for the end user on a continuous basis, long(er)-term maintenance concerns cannot be ignored either. Any classifiers that exploit machine learning can become heavy in this regard as they introduce technical debt in various forms, such as interfaces (at data, component, and system levels), dependencies, feedback loops, changes in the external world, and system-level antipatterns in long-term use [33]. Although more direct non-learning-based approaches require maintenance as well, they tend to be easier and faster to update, verify, and maintain.

5 Conclusion

We have described a custom large-scale NLP pipeline that executes fine-grained multilevel compositional sentiment analysis, multidimensional affect analysis, irrealis modality detection, and other analyses that have been optimized to the financial domain. By combining sentiment analysis with other analytical dimensions, one can monitor, quantify, and estimate in real time the downstream ripple effects and chain events of corporate announcements and other related events across multiple data sources. The resultant affective “soft” metrics can provide rich price-sensitive feedback and information to companies and the wider market audience about how a company’s announcement is interpreted and viewed in different contexts, what the public mood around it is, and how downstream conversation around it develops.

References

[1] Bamman D., Smith N.A. Contextualized sarcasm detection on Twitter. In: Proceedings of the Ninth International Conference on Web and Social Media. 2015:574–577.

[2] Barbieri F., Saggion H. Modelling irony in Twitter. In: Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics. 2014:56–64.

[3] Reyes A., Rosso P., Veale T. A multidimensional approach for detecting irony in Twitter. Lang. Resour. Eval. 2013;47(1):239–268.

[4] Zhang R., Liu N. Recognizing humor on Twitter. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. 2014:889–898.

[5] Mihalcea R., Pulman S. Characterizing humour: an exploration of features in humorous texts. In: Computational Linguistics and Intelligent Text Processing. 2007:337–347.

[6] Qadir A., Riloff E. Learning emotion indicators from tweets: hashtags, hashtag patterns, and phrases. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014:1203–1209.

[7] Gimpel K., Schneider N., O’Connor B., Das D., Mills D., Eisenstein J., Heilman M., Yogatama D., Flanigan J., Smith N.A. Part-of-speech tagging for Twitter: annotation, features, and experiments. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 2011:42–47.

[8] Nivre J. Algorithms for deterministic incremental dependency parsing. Comput. Linguist. 2008;34(4):513–553.

[9] Zhang Y., Clark S. Syntactic processing using the generalized perceptron and beam search. Comput. Linguist. 2011;37(1):105–151.

[10] de Marneffe M.-C., MacCartney B., Manning C.D. Generating typed dependency parses from phrase structure parses. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation. 2006:449–454.

[11] Huddleston R., Pullum G.K. The Cambridge Grammar of the English Language. Cambridge, UK: Cambridge University Press; 2002.

[12] Qiu G., Liu B., Bu J., Chen C. Opinion word expansion and target extraction through double propagation. Comput. Linguist. 2011;37(1):9–27.

[13] Liu B. Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 2012;5(1):1–167.

[14] Liu B. Sentiment analysis and subjectivity. In: Indurkhya N., Damerau F.J., eds. Handbook of Natural Language Processing. second ed. Boca Raton, FL: CRC Press; 2010:627–666.

[15] Pang B., Lee L. Opinion mining and sentiment analysis. Found. Trends Inform. Retr. 2008;2(1–2):1–135.

[16] Wiebe J., Wilson T., Cardie C. Annotating expressions of opinions and emotions in language. Lang. Resour. Eval. 2005;39(2):165–210.

[17] Malo P., Sinha A., Korhonen P., Wallenius J., Takala P. Good debt or bad debt: detecting semantic orientations in economic texts. J. Assoc. Inform. Sci. Technol. 2014;65(4):782–796.

[18] Moilanen K., Pulman S. Sentiment composition. In: Proceedings of the International Conference RANLP-2007. 2007:378–382.

[19] Moilanen K., Pulman S. Multi-entity sentiment scoring. In: Proceedings of the International Conference RANLP-2009. 2009:258–263.

[20] Ruppenhofer J., Rehbein I. Semantic frames as an anchor representation for sentiment analysis. In: Proceedings of the Third Workshop in Computational Approaches to Subjectivity and Sentiment Analysis. 2012:104–109.

[21] Reschke K., Anand P. Extracting contextual evaluativity. In: Proceedings of the Ninth International Conference on Computational Semantics. 2011:370–374.

[22] Tracy J.L., Randles D. Four models of basic emotions: a review of Ekman and Cordaro, Izard, Levenson, and Panksepp and Watt. Emot. Rev. 2011;3(4):397–405.

[23] Scherer K.R. What are emotions? And how can they be measured? Soc. Sci. Inform. 2005;44(4):695–729.

[24] Morante R., Sporleder C. Modality and negation: an introduction to the special issue. Comput. Linguist. 2012;38(2):223–260.

[25] Velldal E., Øvrelid L., Read J., Oepen S. Speculation and negation: rules, rankers, and the role of syntax. Comput. Linguist. 2012;38(2):369–410.

[26] Ganapathibhotla M., Liu B. Mining opinions in comparative sentences. In: Proceedings of the 22nd International Conference on Computational Linguistics. 2008:241–248.

[27] Jindal N., Liu B. Mining comparative sentences and relations. In: Proceedings of 21st National Conference on Artificial Intelligence. 2006:1331–1336.

[28] Kessler W., Kuhn J. A corpus of comparisons in product reviews. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation. 2014:2242–2248.

[29] Dong L., Wei F., Liu S., Zhou M., Xu K. A statistical parsing framework for sentiment classification. Comput. Linguist. 2015;41(2):293–336.

[30] Socher R., Perelygin A., Wu J., Chuang J., Manning C.D., Ng A., Potts C. Recursive deep models for semantic compositionality over a sentiment Treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013:1631–1642.

[31] Polanyi L., Zaenen A. Contextual valence shifters. In: Exploring Attitude and Affect in Text: Theories and Applications. Papers from the 2004 AAAI Spring Symposium, Technical Report SS-04-07. 2006:106–111.

[32] Wilson T., Wiebe J., Hoffmann P. Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. 2005:347–354.

[33] Sculley D., Holt G., Golovin D., Davydov E., Phillips T., Ebner D., Chaudhary V., Young M., Crespo J.-F., Dennison D. Hidden technical debt in machine learning systems. In: Advances in Neural Information Processing Systems 28. 2015:2494–2502.


* https://www.cs.ox.ac.uk/people/stephen.pulman

1 Note that Twitter is gaining prominence as a primary channel for corporate announcements as well (see Justin Baer (2015). Goldman Sachs earnings are moving to Twitter. The Wall Street Journal. Accessed October 7, 2015. http://www.wsj.com/articles/goldman-sachs-earnings-are-moving-to-twitter-1444261919).

2 http://support.gnip.com/sources/twitter

3 http://stocktwits.com/coolkevs/message/47079864

4 http://www.natcorp.ox.ac.uk

5 http://www.cis.upenn.edu/~treebank/home.html

6 http://blogs.ft.com/tech-blog/liveblogs/2016-01-26

7 http://blogs.marketwatch.com/thetell/2016/01/28/amazon-earnings-expected-to-show-record-holiday-quarter-live-blog

8 http://europe.newsweek.com/twitters-new-ceo-five-challenges-facing-jack-dorsey-333956

9 Especially with bursty high-volume opinion streams, many end users merely monitor the relative ups (peaks) and downs (troughs) in the streams at a relatively superficial level until sudden relative changes (or alerts) catch their attention (colloquially referred to as ocular regression).
