7
Argument Mining Applications and Systems

There have been a large number of approaches to automated argument mining. The mainstream approach consists in applying machine learning (ML) techniques to manually annotated corpora to obtain a tool that automatically identifies argument structure in texts. In these cases, the corpus and its annotations determine the capabilities of the resulting tool. As an alternative to ML, some tools have been developed that rely on hand-crafted rules. For example, J. Kang and P. Saint-Dizier [KAN 14] present a discourse grammar implemented as a set of rules and constraints.

In this chapter, we present some systems that have been developed to automatically recognize the argumentative structure and the arguments in texts and to classify the detected arguments according to their type (e.g. counterarguments, rebuttals).

7.1. Application domains for argument mining

Argument mining serves the purpose of identifying why a speaker/writer holds a given opinion. Some argument mining systems aim at describing argumentation (i.e. simply discovering arguments and, possibly, the relationships between them), while others aim at evaluating arguments (i.e. determining whether they are sound or assessing their influence on decisions). The most obvious application for argument mining systems is the analysis of the argumentative structure of a text (whether it is written or transcribed from oral data) to find the conclusion and the supporting or attacking elements for it [LIP 15]. Such analyses can in turn serve practical applications, as we will see below.

Application domains for argument mining are varied. We list here some applications, contexts and fields that can benefit (or have benefited) from advances in argument mining. The aim is not to provide an exhaustive list but to present some possible fields of application and some tasks related to argument mining.

7.1.1. Opinion analysis augmented by argument mining

Knowing whether one is for or against something, or whether one likes or dislikes it, is the goal of opinion mining. But a deeper analysis may involve discovering why one holds an opinion. Argument mining tools can be applied to product reviews, for instance, to offer a company’s marketing department a way to learn the reasons why customers appreciate (or not) a product or service [SCH 12]. Another domain of application is deliberation or politics, where decision makers can learn more about citizens’ opinions and expectations.

7.1.2. Summarization

Summarization consists in reducing a text to its main argument components (or highlighting them) [BAR 16] (see also section 4.3). It can be useful in many domains such as the legal and medical fields or for grasping the gist of a debate.

7.1.3. Essays

Persuasive essays are texts in which an author takes a stand on a topic and tries to convince readers. They therefore contain many arguments, which makes them a natural choice for argument mining, all the more so because essays are structured texts [STA 17].

7.1.4. Dialogues

Dialogues are another application domain for argument mining: people sharing their opinions may use arguments to convince their interlocutor(s) and may successfully do so. Whether they take place online (e.g. e-debates, online comments or Tweets) or orally (e.g. debates or discussions), dialogues present an additional challenge for argument mining since they are less constrained and their arguments are often more interrelated than in written monological contexts [BUD 14a].

7.1.5. Scientific and news articles

Scientific articles are texts containing arguments, whether to defend the author’s position, to refute another author’s stance or to contrast several works. Scientific articles are above all argumentative since an author presenting her findings needs to explain why these findings are important. Discovering argumentation in scientific papers can help, for instance, to summarize findings [TEU 02].

News articles are argumentative texts too, in which an author presents facts and can provide arguments. As in scientific articles, the argumentation is usually clear and carefully constructed, since readers must be able to follow and understand the author’s stance and explanations.

7.1.6. The Web

The Web provides an immense source of arguments: people use web technologies to discuss topics, provide their opinions and comment on others’. This environment probably represents the biggest venue for argumentative natural language. Scholars have seen in this rich environment the opportunity to easily find arguments expressed in an unconstrained manner, even though it represents a huge challenge for argumentation mining [HAB 17].

7.1.7. Legal field

The development of argument mining systems was mainly initiated in the legal domain, no doubt because of its obvious argumentative character and rather constrained discourse, which somewhat eases the argument mining task [MOE 07]. Argument mining applications can be used to retrieve old cases, for example, in order to automatically find precedents for a current case.

7.1.8. Medical field

The medical field is attracting attention, for instance for detecting the relation between symptoms and diseases [BOJ 01]. Applying argument mining to the medical field can save time for healthcare practitioners, for instance, by providing them with a summary of a patient’s case or by automatically supplying them with a set of similar cases, which can help in establishing a protocol or rapidly diagnosing a condition.

7.1.9. Education

Argument mining can be adapted to the education field. Student essays, for instance, are usually structured texts; this provides the opportunity to easily detect argumentation. Indeed, to structure their essays, students use discourse connectives or even titles that can help to detect arguments and/or to determine the argumentative structures [FAU 14]. Applying argument mining systems to education may support the automatic correction and marking of students’ work.

7.2. Principles of argument mining systems

There are two main approaches to automatic argument mining: rule-based and corpus-based approaches. In rule-based systems, an expert writes rules to identify and interpret arguments. These systems tend to be accurate, especially if they target a limited domain, but have low coverage. On the other hand, corpus-based systems are inferred by ML from examples that have been analyzed by annotators. These systems can have low accuracy but tend to have broader coverage, especially if the number of examples is large enough or the strategies to generalize from examples are adequate. An automated approach, be it rule-based or ML-based, can be expected to perform well in cases where annotators reach higher agreement (see section 6.3.2).
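To make the contrast concrete, the short Python sketch below illustrates the rule-based style with two invented cue-phrase rules; real rule-based systems such as [KAN 14] rely on far richer grammars and constraints.

import re

# Two invented cue-phrase rules: each pattern signals an argumentative role.
RULES = [
    (re.compile(r"\b(because|since|as)\b", re.I), "premise marker"),
    (re.compile(r"\b(therefore|hence|thus|so)\b", re.I), "conclusion marker"),
]

def fire_rules(sentence):
    # Return the argumentative signals detected by the hand-crafted rules.
    return [label for pattern, label in RULES if pattern.search(sentence)]

print(fire_rules("The adjuvant is toxic, therefore the vaccine is toxic."))
# ['conclusion marker']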

As [LIP 16] summarizes, every argument mining system has its own granularity (i.e. the level of detail at which arguments are searched: sentences, paragraphs, etc.), genre (dialogues, news, etc.), target (e.g. detection of claims or of premise/conclusion relations) and goal (e.g. detection and classification). Just like manual analyses of arguments, argument mining systems encompass several interrelated tasks. Argument mining systems therefore tend to follow the same principles and carry out subtasks step by step. Most systems developed so far rely on a pipeline architecture, meaning that the original, unstructured texts are gradually processed to eventually produce a structured document showing the detected arguments and (possibly) their components. This pipeline is summarized in Figure 7.1 (see also section 5.4). The output generally takes the shape of an argument graph. Note that automatic analysis mostly fails on the same cases where human annotators show lower agreement. However, humans perform consistently better than automated procedures when knowledge, reasoning and common sense are involved and no shallow cues (such as discourse markers) are available.

Figure 7.1. The argument mining pipeline

We summarize here the different stages of an argument mining exercise, along with some ML and natural language processing (NLP) techniques that can be used to perform each task.
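Before detailing each stage, here is a minimal, self-contained Python sketch of the pipeline architecture of Figure 7.1. Every stage function is a naive stand-in invented for illustration: in a real system, each would be a trained model or a rule component.

from dataclasses import dataclass, field

ARG_CUES = {"because", "should", "must", "therefore", "however"}

def segment(text):
    # Stage 1 (sketch): split the raw text into candidate units on full stops.
    return [s.strip() for s in text.split(".") if s.strip()]

def is_argumentative(unit):
    # Stage 2 (sketch): flag units containing a predefined cue word (7.2.1).
    return bool(ARG_CUES & set(unit.lower().split()))

def label_unit(unit):
    # Stage 3 (sketch): toy claim/premise heuristic (7.2.2).
    return "premise" if "because" in unit.lower() else "claim"

def detect_relations(labeled):
    # Stage 4 (sketch): link each premise to the last seen claim (7.2.3).
    edges, last_claim = [], None
    for i, (_, label) in enumerate(labeled):
        if label == "claim":
            last_claim = i
        elif last_claim is not None:
            edges.append((i, last_claim, "support"))
    return edges

@dataclass
class ArgumentGraph:
    # Structured output: labeled units as nodes, relations as edges (7.2.5).
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

def mine_arguments(text):
    units = segment(text)
    labeled = [(u, label_unit(u)) for u in units if is_argumentative(u)]
    return ArgumentGraph(labeled, detect_relations(labeled))

print(mine_arguments("Vaccines should stay mandatory. They are safe because trials show it."))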

7.2.1. Argumentative discourse unit detection

The first typical stage of argument mining is to delimit text spans that could constitute claims. Then, sentences that have a debatable character must be identified. This step corresponds to the detection of argumentative discourse units (ADUs) during manual analyses of arguments; it is as complex for a computational system as for a human annotator to distinguish sentences with an argumentative function from non-argumentative ones (see section 6.1). For this reason, in some systems, input texts are segmented prior to being processed [STA 14]. Linguistic cues (e.g. scalar adjectives, verbs, adverbs or modals) can provide useful information to determine whether a text span is argumentative or not.

Classifiers are the most commonly used technique to distinguish argumentative units from non-argumentative ones. Classifiers usually use bag-of-words representations; roughly, if a word that has been predefined as typical of argumentative sentences is found in a text, the classifier will label the unit as argumentative (a minimal sketch of this approach is given after the list below). Argument mining systems can also rely on the following ML and NLP techniques:

  • text segmentation [CHO 00];
  • sentence classification [KIM 14];
  • question classification [ZHA 03].
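The snippet below is a minimal sketch of such a bag-of-words classifier using scikit-learn; the four training sentences and their labels are invented for the example, whereas a real system would be trained on a large annotated corpus.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data: 1 = argumentative unit, 0 = non-argumentative unit.
sentences = [
    "The vaccine is toxic because the adjuvant is toxic.",
    "We should ban this additive since it harms children.",
    "The meeting starts at 9 am.",
    "The report contains twelve pages.",
]
labels = [1, 1, 0, 0]

# Bag-of-words features feeding a naive Bayes classifier.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(sentences, labels)

print(classifier.predict(["This policy is wrong because it ignores the evidence."]))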

When the output of the argument mining process is an argument graph, ADUs take the form of nodes. See section 8.1 for more details on the detection of ADUs by argument mining systems.

7.2.2. Unit labeling

Once ADUs have been detected, the next step is the labeling of units, that is, determining the role each unit plays in the argumentation: at this stage, the argument components must be classified according to their type (claim, premise, conclusion, counterargument, etc.; a sketch of this task follows the list below). A few NLP and ML techniques can be applied to carry out this task:

  • sequence labeling [NGU 07];
  • named entity recognition [NAD 07].
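As a sketch of unit labeling cast as sequence labeling, the snippet below tags tokens with BIO labels marking claim spans; a simple per-token classifier over local context features stands in for a proper sequence model (such as a CRF), and the two training sentences are invented.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tokens, i):
    # Local context features for token i (a stand-in for richer feature sets).
    return {
        "word": tokens[i].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Invented toy data: BIO tags delimiting a claim span in each sentence.
train = [
    ("We must act now because time is short".split(),
     ["B-CLAIM", "I-CLAIM", "I-CLAIM", "I-CLAIM", "O", "O", "O", "O"]),
    ("Taxes should rise since deficits grow".split(),
     ["B-CLAIM", "I-CLAIM", "I-CLAIM", "O", "O", "O"]),
]

X = [token_features(toks, i) for toks, _ in train for i in range(len(toks))]
y = [tag for _, tags in train for tag in tags]

tagger = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
tagger.fit(X, y)

test = "We must leave because trains stop".split()
print(list(zip(test, tagger.predict([token_features(test, i) for i in range(len(test))]))))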

Just like during manual annotations (see section 6.1), the unit labeling task depends on the purpose of the mining process and on the underlying theory: one may not be interested in identifying warrants, for example, and only focus on the detection of premises and conclusions (e.g. for product reviews).

7.2.3. Argument structure detection

The following step in the argument mining pipeline is the detection of the argument structure. In other words, the system must represent the links between the previously extracted and labeled ADUs. At this stage, the system must also classify the links according to their detected type (conflict, rebuttal, cause, etc.). As argued in [PEL 13], two types of relations are usually involved here: causal ones (i.e. supports) and contrastive ones (i.e. conflicts).

While some links are easily detected by argument mining systems, others are much more challenging, not because the type of relation is hard to distinguish but because some syntactic constructions are inherently complex. Different NLP techniques are therefore applied. Let us take the following example:

The vaccine is toxic because the adjuvant is toxic.

If a system has correctly segmented the text into two argumentative segments (“the vaccine is toxic” and “the adjuvant is toxic”), it may easily identify the relation between them via discourse relation classification [LIN 09]: the connective because is a clear indication of causality. Let us now take the same example without the connective:

The vaccine is toxic. The adjuvant is toxic.

This time, the task of automatically identifying the relation between the segments may be trickier: the absence of an explicit marker of causality (or cue phrase) is challenging. A human annotator may analyze this pair without difficulty if s/he has sufficient knowledge to know that adjuvants are used to produce vaccines. For a system, however, acquiring such knowledge is hard. An additional technique, such as a semantic textual similarity classifier [ACH 08], must therefore be put in place.
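The snippet below sketches one crude fallback of this kind: measuring the lexical overlap between the two segments with TF-IDF vectors. A real semantic textual similarity classifier would go further and draw on semantic resources or word embeddings to bridge vocabulary gaps.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

segment_a = "The vaccine is toxic."
segment_b = "The adjuvant is toxic."

# Vectorize the pair and compare the two TF-IDF vectors.
vectors = TfidfVectorizer().fit_transform([segment_a, segment_b])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]

# A high score is weak evidence that the segments are related; a relation
# classifier could combine it with other cues (position, topic, etc.).
print(f"similarity = {score:.2f}")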

Situations in which argumentative support is not explicitly marked (as in the example above) are very frequent in texts. Conflicts, however, tend to be more clearly signalled, as it would be complicated for a reader to follow a text in which contrasting opinions are not explicitly marked. Hence, markers can be used as cues to automatically uncover conflicts. Nevertheless, the problem of detecting conflictual relations arises frequently in dialogues, where an opponent may not clearly state her disagreement (e.g. “I disagree with what you’ve just said”) and will rather state a new (opposing) claim. While such disagreements are understood straightforwardly by the participants in a discussion, automatically detecting them is a whole different story.

Here are some NLP techniques that are usually applied:

  • textual entailment [AND 10];
  • link prediction [GET 05];
  • detection of verb tenses for temporal ordering of events [MAN 03];
  • detection of word pairs in causal relations (e.g. fall-hurt) [CHA 06].

Other relations exist besides supports and conflicts. As an example, while some works consider that examples are supports for claims (hence the Argument from Example argument scheme), others deem them to be a different type of argumentative relation. Similarly, while restatements can be considered as sheer repetitions (as in [PEL 13], who put the original argumentative unit and its restatement in one single node in the analytical diagrams), they can also be seen as having another role in the argumentation. For example, restatements may add to the force of an argument [KON 16].

Again, some relations may be of no interest for some projects. As an example, conflict relations may be left aside if the goal is simply to detect the main claim in a news article.

7.2.4. Argument completion

When elements of an argument do not appear in a text, they may need to be reconstructed. For instance, reconstructing enthymemes or inducing implicit warrants may be necessary to obtain the most complete argument possible [RAJ 16, SAI 18]. Reconstructing implicit arguments is a challenging task, though this step is not mandatory (for example, when one is only interested in what the text explicitly presents). Automatic argument completion is also of interest when one wants to identify argument schemes: as we have seen in section 6.1.5, some elements of an argument scheme may not be explicitly stated but can be reconstructed.

7.2.5. Argument structure representation

After argument components have been detected – and possibly reconstructed – one may want to present the general argumentative structure of the analyzed text. The generated output is usually structured as a graph. The overall graphical representation obviously depends on the argument model (or framework) on which the system is based. If the model is only interested in premise/conclusion relationships, the formalization is more straightforward than in models that go beyond such a link and try to elicit rebuttals and warrants, for example (see section 6.2).

Results of the automatic annotation of arguments and argument structures can also be presented in XML (Extensible Markup Language), which is useful for exchanging data between different programs and offers the possibility to clearly represent structures (such as tree structures).
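As an illustration of such an XML export, the snippet below serializes a toy argument graph with Python's standard library; the element names (argument-graph, adu, edge) are invented for the example and do not follow any established interchange format such as AIF (discussed in section 7.4).

import xml.etree.ElementTree as ET

# A toy analysis: two labeled ADUs and one support relation between them.
nodes = [("a1", "claim", "The vaccine is toxic"),
         ("a2", "premise", "The adjuvant is toxic")]
edges = [("a2", "a1", "support")]

root = ET.Element("argument-graph")
for node_id, role, text in nodes:
    adu = ET.SubElement(root, "adu", id=node_id, role=role)
    adu.text = text
for source, target, relation in edges:
    ET.SubElement(root, "edge", source=source, target=target, type=relation)

print(ET.tostring(root, encoding="unicode"))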

7.3. Some existing systems for argument mining

In this section, we present some existing argument mining systems. To the best of our knowledge, no system automatically carries out all the tasks presented above: most systems focus on the detection of claims, but many intend to execute several ensuing tasks such as structure detection. Hitherto, argument mining systems deliver limited accuracy, but results tend to be better when the model and the domain of application are clearly delimited. As a result, ad hoc systems are often preferred. Indeed, when applying ML techniques to manually annotated corpora to obtain a tool that automatically identifies argument structure in texts, the corpus and the annotations totally determine the capabilities of the resulting tool. We list below some systems that have been developed to automatically detect arguments and/or argument structures.

7.3.1. Automatic detection of rhetorical relations

Such systems have been designed to automatically build RST trees – or equivalent structures (see section 6.1.8) – highlighting, for example, cause and contrast relations [MAR 99, MAR 02, BLA 07]. The main corpora used to train such systems are the RST Treebank and the Penn Discourse Treebank (PDTB) [MAR 02, HER 10, LIN 09]. Different methods (supervised or semi-supervised) have been applied across the literature, yielding varying results.

O. Biran and O. Rambow [BIR 11] rely on four different corpora (Wikipedia pages, the RST Treebank, blog threads from LiveJournal and Wikipedia discussion pages) across the different stages of development of their model for the identification of justifications. They use a naive Bayes classifier (a supervised ML method) and reach an F1-score of 39.35 on the Wikipedia discussion pages.

A. Peldszus and M. Stede [PEL 13] identify major limitations of these systems, notably that most works turn out to be oversimplistic because they only consider nucleus–satellite relations and do not identify more complex structures.

7.3.2. Argument zoning

Although argument zoning is not really argument mining, it can be considered one of the cornerstones of argument mining processes. In [TEU 99a], the author worked on scientific papers and proposed a way of automatically detecting zones within them and classifying them into seven categories:

  • Aim: the goal of the paper;
  • Textual: statements indicating the structure of the sections;
  • Background: generally accepted scientific background;
  • Own: statement describing the author’s work;
  • Contrast: when the author compares the current work with others’;
  • Basis: statements showing agreement with other works;
  • Other: when the author describes others’ works.

The author used a naive Bayes approach for the automatic classification, yielding very good results for the category Own (86%) but only 26% for the Contrast category.

7.3.3. Stance detection

In [GOT 14], the authors limit themselves to the prediction of stance, i.e. the attitude of an author/speaker toward a subject: the author of a text (here, posts in Debatepedia) can be for or against a given topic. The authors used sentiment lexicons and named entity recognition and achieved accuracy around 0.80. Although this work relates more to sentiment mining, it can be considered as a first step toward argument mining.

7.3.4. Argument mining for persuasive essays

In [STA 17], the authors propose a model for identifying argument components (major claims, claims and premises) and detecting argumentation structures in texts coming from an online forum where users provide corrections and feedback on other users’ research papers, essays or poetry. Their model also differentiates between support and attack relations. They obtain 95.2% of human performance for component identification, 87.9% for argumentation structure detection and 80.5% for support and attack relations. This pipeline model is one of the first approaches that allow identifying the global argument structure of texts.

7.3.5. Argument mining for web discourse

I. Habernal and I. Gurevych [HAB 17] have created experimental software for argumentation mining in user-generated web content. Their system is under a free license and available at http://kp.tu-darmstadt.de/data/argumentation-mining. The authors identify argument components in web discourse (with a large variety of registers and domains) using a structural support vector machine (SVM) and a sequence labeling approach, reaching an overall macro-F1 score of 0.251.

The authors of [CAB 12] worked on 19 topics from Debatepedia (online dialogical texts) and used existing research on textual entailment (the off-the-shelf EDITS system) and argumentation theory to extract arguments and evaluate their acceptability. They achieved an F1 score of 0.75.

7.3.6. Argument mining for social media

C. Llewellyn et al. [LLE 14] and M. Dusmanu et al. [DUS 17] have developed systems specifically for the domain of social media (here, Twitter). Social media texts are difficult to process with standard NLP tools: their language, in particular, deviates in many ways from standard usage.

In [LLE 14], the authors classified claims, counterclaims, verification inquiries and comments from tweets, using features such as unigrams, punctuation and part-of-speech (POS) tags with supervised ML techniques such as naive Bayes, SVM and decision trees. They obtain an agreement between human and automatic annotations ranging from κ = 0.26 to κ = 0.86.

For the distinction between factual and opinionated tweets, M. Dusmanu et al. [DUS 17] apply classification algorithms (logistic regression and random forests) and obtain an F1 measure of 0.78 for argument versus non-argument classification.

T. Goudas et al. [GOU 14] carry out claim/premise mining in social media texts. They identified argumentative sentences with an F1 score of 0.77, using a maximum entropy (ME) classifier. To identify premises, they used BIO encoding of tokens and achieved an F1 score of 0.42 using conditional random fields (CRFs).

7.3.7. Argument scheme classification and enthymemes reconstruction

V. Feng and G. Hirst [FEN 11] have built a system for the automatic classification of five argument schemes (as defined in [WAL 08]): practical reasoning, arguments from example, from consequence, from cause to effect and from verbal classification. The final goal of their approach is the reconstruction of enthymemes.

This work has made use of the Araucaria corpus, saving the authors from the burden of discarding non-argumentative units and identifying premises and conclusions. Their system attains up to 98% accuracy in differentiating between scheme pairs.

7.3.8. Argument classes and argument strength classification

J. Park and C. Cardie [PAR 14] used an SVM with features such as n-grams, POS tags and sentiment clue words to classify propositions from online user comments into three classes: unverifiable, verifiable non-experimental and verifiable experimental. The classification then provides an estimate of how adequately the arguments have been supported (i.e. whether the arguments are strong). In their work, they ignored non-argumentative texts and achieved a macro-F1 score of 68.99%.

7.3.9. Textcoop

Textcoop1 is a platform that uses grammars for discourse processing and automatic annotation of various linguistic features. It has been used to annotate different types and genres of texts such as procedural texts [SAI 12] and debates [BUD 14b], as well as opinionated texts such as user reviews [VIL 12]. The system allows the detection of conclusions (main claims), supports and argument strength. This rule-based system also allows detecting structures that go beyond the claim–premise relation, such as specialization, definition or circumstance [KAN 14].

Although the tool is not designed solely for argumentation, its use for the detection of argumentative structures and illocutionary forces (see section 6.2.3) has yielded satisfactory results: for example, [BUD 14b] indicates that the system correctly identified 85% of ADUs and 78% of illocutionary connections.

7.3.10. IBM debating technologies

The largest argument mining dataset to date is currently being developed at IBM Research2 (see also [AHA 14]). In [RIN 15], the authors developed a system for automatically detecting evidence in texts that supports a given claim. This task has several practical applications, for instance in decision support and persuasion enhancement.

7.3.11. Argument mining for legal texts

R. Mochales Palau and M.-F. Moens [MOC 09] have worked on the ECHR corpus (see section 6.5) and implemented ML techniques (rather than linguistic features) for the detection of arguments via a naive Bayes classifier, obtaining 73% accuracy. Their system also allows classifying argumentative units as either premises or conclusions with good results: 68.12% for the detection of premises and 74.07% for conclusions. Finally, their system permits the detection of argumentative structures (i.e. how argumentative units relate to each other), obtaining 60% accuracy.

7.4. Efficiency and limitations of existing argument mining systems

The first issue with argument mining is the inherent complexity of the task, which makes the development of such systems very difficult and time-consuming. Hence, most systems described in the literature are evaluated on small datasets; as a consequence, results must be judged cautiously.

Given the various existing approaches to argument mining and the different goals of the systems developed so far, it would be meaningless to try and compare them and their results. However, authors reporting on their systems’ results tend to agree on the tasks that yield satisfactory results and the ones that need further improvement. Thus, R. Mochales Palau and M.-F. Moens [MOC 09] report an accuracy of 73% for ADU detection and Budzynska et al. [BUD 14b] report an accuracy of 85%. The classification of units – as claims, premises or conclusions, for instance – yields very different results, as we have seen in section 7.3. The relation detection task, in turn, also returns divergent results: R. Mochales Palau and M.-F. Moens [MOC 09] obtain 60% while C. Stab and I. Gurevych [STA 17] reach 87.9%.

The difference in results between the models also comes from the fact that not all authors agree on the definition of the various argument components and use different techniques for automatically detecting them. Indeed, each argument mining system has been designed for a specific genre and goal. Furthermore, rule-based systems tend to be more accurate than corpus-based systems (which rely on ML techniques), but their domain of application has to be clearly delimited. To the best of our knowledge, no system has been constructed with a general-purpose spirit; this is understandable given the complexity of the task, but efforts in that direction have to be made. One such step would be sharing corpora and annotation schemes between systems. Different models already rely on the same corpora for different purposes, which shows that a corpus for argument mining tasks can be valuable to a wide community of researchers. A framework may allow this sharing and reuse of data and annotations: the Argument Interchange Format (AIF) [RAH 09], which was not primarily developed for argument mining purposes but may suit this task as well as different lines of research (see also section 6.2.3). Moreover, as M. Lippi and P. Torroni [LIP 16] emphasize, the development of the Internet and the myriad of arguments expressed online provide an unprecedented venue for argument mining; up to now, however, argument mining systems have only been tested on relatively small corpora, which raises the question of scalability. Finally, most argument mining efforts have been applied to English, but they can be ported to other languages or domains by annotating corpora in those languages (see also section 6.5.9).

It must also be noted that systems relying on a pipeline approach (such as [SAI 12] or [STA 17]) present another challenge, since errors arising during the first steps of the pipeline propagate to and influence the results of the following steps. For instance, if a unit has been wrongly annotated as a claim in the first stages, the resulting argument structure detection will be erroneous.

7.5. Conclusion

We have seen that argument mining closely mirrors argument annotation: manual annotations can serve to train, test and develop systems that automatically replicate the tasks. Such systems find applications in various domains such as the medical and legal fields or public deliberation.

While we have seen in Chapter 6 that manual analyses of arguments are highly challenging, automatic analysis is even more complex, yielding disparate and lower results. Indeed, the literature review presented in section 7.3 clearly shows that argument mining is still in its early stages, without widely accepted and established annotation schemes, approaches and evaluation methods. We believe, however, that the systems proposed so far will allow argument mining techniques to evolve and improve. Argument mining is indeed a very recent yet extremely attractive research area; while the first works related to argument mining appeared at the beginning of the 2010s (see, for instance, [MOC 09, MOC 11, SAI 12]), a large number of conferences and events dedicated to argument mining have started to attract scholars: the first ACL workshop on argument mining took place in 2014 and international conferences such as COMMA have since received many papers related to this topic.

The aim of this chapter was to provide an overview of the goals, applications and techniques for argument mining. The following chapter, instead, can serve as a concrete – yet brief – example of the argument mining task.

  1. The system is available from the authors upon request.
  2. http://researcher.watson.ibm.com/researcher/view_group.php?id=5443.