4
Offline Methods for Studying Language Comprehension

In this chapter, we begin the presentation of experimental methods that can be applied to the study of comprehension, focusing on so-called offline methods. These methods are typically used to observe the result of the comprehension process once it has been completed, but they do not provide access to the comprehension process itself; this is why they are called offline methods. We first discuss what are known as explicit tasks, in which participants are asked to consciously assess certain aspects of language, or certain linguistic stimuli. We then describe so-called implicit tasks, which assess the comprehension process indirectly. As we will see throughout this chapter, the distinction between explicit and implicit tasks is better described as a continuum than as two clearly distinct categories. Since language comprehension covers a broad range of processes, from word recognition to discourse comprehension, we do not deal with each of these areas separately. Instead, we give a general presentation of the different types of tasks that can be implemented and illustrate them with studies devoted to these different fields. We also show that different techniques can be used in parallel in order to collect indicators that tap complementary comprehension processes.

4.1. Explicit tasks

In an explicit task, participants must consciously use their language skills to judge stimuli, for example the grammaticality of a sentence. In this case, the purpose of the study is not concealed, as it explicitly deals with language comprehension. Explicit tasks often rely on the metalinguistic abilities of the individuals tested, that is, on their capacity to consciously reflect on language and its uses, and to report their linguistic knowledge or intuitions. This would be the case, for example, of a task requiring participants to judge whether a certain formulation of a speech act is polite or not.

We will present five types of explicit tasks: metalinguistic tasks, acceptability judgments, questionnaires, forced-choice preference tasks and comprehension tests. As we will see in the course of the chapter, explicit tasks have limitations because they crucially depend on the participants' linguistic competence and metalinguistic abilities. This limitation is particularly problematic when such tasks are applied to populations whose metalinguistic abilities or linguistic competence are limited, such as children or people suffering from language impairments.

4.1.1. Metalinguistic tasks

Along the continuum between explicit and implicit, metalinguistic tasks can be considered the most explicit ones. In this type of task, participants are asked to consciously reflect on language as an object of study rather than as a means of communication. In practice, metalinguistic tasks may deal with any aspect of language, such as phonological awareness, syntax or the comprehension of conversational implicatures. These tasks may be framed within different methodological paradigms, depending on the metalinguistic abilities investigated. To study phonological awareness, for example, one possibility would be to ask the participants to split words into syllables. To study syntax, one possibility would be to ask the participants to explain a grammatical rule accounting for the grammaticality or ungrammaticality of a series of sentences. Finally, to study the comprehension of conversational implicatures, one possibility would be to ask participants to explain the difference between literal meaning and contextually communicated meaning for a number of statements. In all cases, metalinguistic tasks rely on the ability of the individuals to reflect upon language and to report the product of their reflection.

Our first example of a metalinguistic task comes from the study by Colston and Gibbs (2002) on the comprehension of irony and metaphor. According to their hypothesis, understanding irony requires second-order inferences (of the type "X believes that Y believes that Z") about the interlocutor's intentions and beliefs. These are not necessary for understanding metaphor, which only requires first-order inferences ("X believes that Y"). In their study, participants had to read short scenarios and adopt the viewpoint of one of the characters. The last sentence of each scenario conveyed either a metaphor or irony. An example of such a scenario for the metaphorical condition involved a teacher talking about a student and ending the comment with "this one is really sharp", an expression describing somebody as quick-witted. In this case, through the use of a metaphor, the teacher was referring to a real quality of the student (the fact of being intelligent). In the ironic condition, the scenario changed to a teacher looking for a pair of scissors and – unable to find one that really worked – uttering "this one is really sharp". In this case, the teacher was describing as sharp an instrument that was in fact blunt, and the property was therefore mentioned ironically.

Colston and Gibbs (2002) used different methods for assessing the comprehension of metaphorical and ironic sentences; here, we will discuss the second experiment of their article, which used a metalinguistic task. After the presentation of each scenario, the participants (university students) had to indicate their agreement with five statements aimed at assessing different metalinguistic skills necessary to understand metaphors or irony. These statements targeted the comprehension of the speaker's intention, as in (1a) and (1b), the fact that this person did not actually think what they were saying, as in (2a) and (2b), the reference to this person's prior beliefs, as in (3a) and (3b), the existence of several possible beliefs, as in (4a) and (4b), and finally the fact that the ironic sentences made fun of previous beliefs, as in (5a) and (5b):

(1a) The teacher’s remark reflects her current belief that the student is smart. (metaphor).

(1b) The teacher’s remark reflects her current belief that the scissors are not sharp. (irony).

(2a) The teacher’s remark reflects the fact that she is only pretending that the student is a cutting instrument. (metaphor).

(2b) The teacher’s remark reflects the fact that she is only pretending that the scissors are sharp. (irony).

(3a) The teacher’s remark refers to her prior belief (meaning her belief about the student before the conversation) that the student should be smart. (metaphor).

(3b) The teacher’s remark refers to her prior belief (meaning her belief about the scissors before the conversation) that the scissors should be sharp. (irony).

(4a) The teacher’s remark reflects her multiple beliefs, in that she is both referring to her present belief that the student is smart and her prior belief that the student should be smart. (metaphor).

(4b) The teacher’s remark reflects her multiple beliefs, in that she is both referring to her present belief that the scissors are not sharp and her prior belief that the scissors should be sharp. (irony).

(5a) The reason that the teacher possibly refers to her prior belief that the student should be smart is to mock this expectation given that the student is smart. (metaphor).

(5b) The reason that the teacher possibly refers to her prior belief that the scissors should be sharp is to mock this expectation given that the scissors are not sharp. (irony).

The results indicated a similar level of agreement with the statement concerning the speaker's intention in the irony and metaphor conditions. For all the other statements, agreement was always higher in the ironic condition than in the metaphorical condition. According to the authors, these results show that people are aware of the essential difference between irony and metaphor, particularly with respect to the metalinguistic reasoning necessary for understanding irony.

Another example of how to implement a metalinguistic task comes from Borghi et al.'s (2016) study, which examined the meaning of abstract concepts. The aim of the study was to determine to what extent the comprehension of abstract concepts depends on linguistic and contextual elements. There are different ways of conceptualizing the representation of concrete and abstract concepts. A first approach is to consider that representations, and cognition in general, are built on the basis of abstract symbols. In such a conception, the sentence "Laura passes the yellow ball to John" would simply be a proposition based on the predicate TO PASS, the subject LAURA, the object BALL, modified by the adjective YELLOW, and the recipient JOHN. A second approach, following the perspective of embodied – or grounded – cognition, is to consider that representations and cognition are not only based on abstract symbols, but also on experience. From this perspective, the meaning of a linguistic stimulus emerges via a simulation process, based on experience, which builds a representation going beyond the stimulus itself. Understanding the sentence "Laura passes the yellow ball to John" could, for example, involve a simulation of Laura's movement when passing the ball. It could also activate the image of a tennis ball that matches the color of the ball in the sentence, or involve adopting Laura's perspective rather than John's.

Borghi et al. (2016) hypothesized that if comprehension was grounded in experience, then abstract concepts such as risk, danger and prevention should be understood differently by people having a different relationship with these notions in everyday life. To test this hypothesis, the researchers chose four groups of people with different theoretical and practical expertise in the field of safety and security (S&S) at the workplace. There were managers with first-class theoretical knowledge but no practical experience, security technicians with both theoretical and practical knowledge, trade union delegates specifically trained on safety issues, and workers lacking theoretical knowledge, but with everyday practical experience in these issues. The authors asked the participants to define the terms risk, danger and prevention, transcribed and then coded the definitions. For the coding phase, they made a distinction between four types of components which emerged from the replies: situational, introspective, taxonomic and attributive components. They then analyzed which types of components were related to the three concepts, as well as the types of components reported by the different groups of participants.

The results of the study showed that, for the three concepts, the participants mainly reported situational components, related to their own experience, supporting the hypothesis that the comprehension of an abstract concept is grounded in experience. By comparing the different definitions reported by the four groups, Borghi et al. (2016) found that workers provided the greatest number of components. Next, came the security technicians, the managers and then the union delegates. Experience seemed to play a role in the conceptualization of abstract concepts, since the participants with the most practical experience – the workers – offered the definitions containing the most components. By looking more closely at the types of components mentioned by each group, the researchers were also able to show that while all the groups mainly relied on situational components, the groups with the most practical experience offered more introspective elements than the others. The groups with the most theoretical experience, for their part, focused on taxonomic and attributive components. The results of this study support the hypothesis that representations and knowledge are not only symbolic, but, most importantly, are also grounded in experience.

The examples described above illustrate the advantage of metalinguistic tasks for accessing the conscious reflection processes associated with language comprehension. They also illustrate the fact that such tasks have two important prerequisites: that the people tested have conscious access to their intuitions about language, and that they have the ability to report such intuitions. Some populations, such as young children or people suffering from language impairments, cannot be tested with metalinguistic tasks. Furthermore, these tasks can quickly become complex. Going back to the example of Colston and Gibbs (2002), we can observe that the statements to be evaluated are complex sentences, some of which involve several subordinate clauses. In order to carry out this metalinguistic task, participants need good linguistic competence. This cannot be taken for granted, even in a population of adults, since there are substantial individual differences in language proficiency (Kidd et al. 2018; Zufferey and Gygax 2020).

4.1.2. Acceptability judgments

As their name implies, acceptability judgments are tasks in which people are asked to judge the acceptability of a sentence or a linguistic form. These judgments can be collected in different ways, either through binary evaluations (yes/no) or via an acceptability scale offering several levels of acceptability. This second option allows for more nuanced responses, by means of a scale comprising intermediate levels ranging from not at all acceptable to totally acceptable. In this case, the measurement corresponds to the perceived degree of acceptability. Instead of measuring the acceptability of an isolated stimulus, it is also possible to present two stimuli simultaneously and have the participants choose the more acceptable one, which allows two linguistic stimuli to be compared directly. A final way of collecting acceptability judgments is to present a reference stimulus associated with a certain degree of acceptability and to have the degree of acceptability of other stimuli evaluated in comparison with it. This method, called magnitude estimation, was borrowed from psychophysics by linguistics (Bard et al. 1996). Despite their differences, it is interesting to observe that these methods all seem to be informative in their own right (Weskott and Fanselow 2011).
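
To give a concrete idea of how the last of these procedures can be analyzed, the sketch below shows one common way of handling magnitude estimation data: each judgment is expressed relative to the reference stimulus and log-transformed before being averaged or compared. The item names, scores and normalization choice are ours, given for illustration only, and do not come from Bard et al. (1996).

```python
import math

reference_score = 100  # acceptability assigned to the reference sentence (the modulus)
raw_judgments = {      # invented raw scores from one participant
    "item_01": 150,    # judged 1.5 times as acceptable as the reference
    "item_02": 50,     # judged half as acceptable as the reference
    "item_03": 100,
}

# Express each judgment as the log of its ratio to the reference, so that
# participants who use the numeric scale very differently remain comparable.
normalized = {item: math.log(score / reference_score)
              for item, score in raw_judgments.items()}

for item, value in normalized.items():
    print(f"{item}: log ratio to the reference = {value:+.2f}")
```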

Acceptability judgments have mainly been used in the fields of syntax and semantics. Linguists, particularly in the Chomskyan tradition, consider that acceptability judgments shed light on the structure of the knowledge of language and represent a direct reflection of it (Chomsky 1986). However, certain studies using this method have relied on procedures that do not meet experimental methodological standards. These studies were often carried out on a small number of participants, often linguists themselves, with few stimuli and few response options, which permitted only basic analyses of the data. In the scientific literature, there is a lively debate on the merits of these studies and on the possibility of drawing reliable conclusions on the basis of only a few stimuli evaluated only by experts. Various studies have shown that the responses of naive subjects differ from those of experts, which calls the reliance on expert judgments into question (Gordon and Hendrick 1997; Dabrowska 2010).

When used in compliance with the principles of experimental methodology, by presenting numerous items to many naive subjects, acceptability judgments can provide quality information. Below, we will illustrate different ways of implementing a study based on such judgments.

An example of an acceptability task can be found in Zufferey et al. (2015b), who investigated the influence of L1 transfer effects on the comprehension of discourse connectives in a second language. As Zufferey et al. note, the use of connectives by second language learners often shows transfer-based errors in production. For example, French-speaking learners of English tend to misuse the connective if (si in French) to convey contrastive relations (instead of the connective while), a function that if cannot fulfill in English. This can be explained by the fact that in French, the connective si can convey a conditional relation, as in the sentence "si elle ne vient pas demain, je lui téléphonerai" (if she doesn't come tomorrow, I'll call her), or a contrastive relation, as in "si en Belgique ce groupe a beaucoup de succès, il est encore inconnu en France" (while this group is very successful in Belgium, it is still unknown in France). Likewise, it has been shown that Dutch-speaking learners of English tend to use the connective when for conveying conditional relations instead of the connective if. Again, this can be explained by L1 transfer effects.

In their study, Zufferey et al. (2015b) tested French-speaking and Dutch-speaking learners of English, as well as native English speakers. Sixteen sentences requiring the use of the connective if (6) were created, as well as 16 sentences requiring the use of the connective while (7). In order to build an incorrect version of each sentence, the connective if was replaced by when (8), and the connective while was replaced by if (9):

(6) The kids don’t look very tired today. If they don’t take a nap now, we can go out for a walk.

(7) The admission policy for foreign students is variable across universities. While in some of them all students can enroll, in others there is an entrance examination.

(8) The kids don’t look very tired today. When they don’t take a nap now, we can go out for a walk.

(9) The admission policy for foreign students is variable across universities. If in some of them all students can enroll, in others there is an entrance examination.

The study included a reading task, which we will not detail, as well as a sentence judgment task, which we will focus on. In this last task, the different sentences contained either the correct connective or the wrong one. In addition to these sentences, there were filler sentences, aiming to hide the purpose of the experimental manipulation. We will return to the notion of fillers in Chapter 6, but it is already useful to say that, in an experiment, the experimental material is hidden within non-experimental material so that the central manipulation cannot be detected. In this experiment, the filler sentences were sentences containing the connective when conveying a temporal relation (its correct use), sentences containing relative clauses introduced by a correct or an incorrect relative pronoun, as well as sentences containing obvious mistakes (such as subject–verb agreement errors), included in order to verify the participants' level of attention.

In the judgment task, the participants received the different experimental and filler items in writing and had to indicate for each sentence whether it was correct or not. If the sentence was judged as incorrect, the participants had to circle the mistake in order to check that the connective or the filler mistakes were the source of their response. The number of correct answers was then compared among the different conditions.
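
As an illustration, here is a minimal sketch of how the answers to such a judgment task could be scored, assuming that each trial records the participant's native language, the connective condition of the item, and whether the error was correctly identified. The data and labels are invented and are not taken from Zufferey et al. (2015b).

```python
from collections import defaultdict

trials = [  # one record per participant and item; values are invented
    {"l1": "French", "condition": "incorrect_if",   "detected": False},
    {"l1": "Dutch",  "condition": "incorrect_if",   "detected": True},
    {"l1": "French", "condition": "incorrect_when", "detected": True},
    {"l1": "Dutch",  "condition": "incorrect_when", "detected": False},
]

# Tally correct detections and trial counts per (L1, condition) cell.
counts = defaultdict(lambda: [0, 0])
for trial in trials:
    cell = (trial["l1"], trial["condition"])
    counts[cell][0] += int(trial["detected"])
    counts[cell][1] += 1

for (l1, condition), (hits, total) in sorted(counts.items()):
    print(f"{l1:6s} {condition:14s} detection rate = {hits / total:.2f}")
```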

The results showed that the incorrect use of when (conditional relation instead of temporal) was detected less often by Dutch speakers than by French or English speakers. Likewise, the incorrect use of if (contrastive relation instead of conditional) was detected less often by French speakers than by Dutch or English speakers. These results clearly support the hypothesis of a transfer effect in the ability of learners to detect the misuse of connectives in their second language. It is important to note that the results of the online task performed on the same material suggested that incorrect uses of connectives were detected at the reading stage, even if they were not necessarily consciously reported later. This indicates that it is often necessary to test comprehension in different ways, in order to obtain a more global picture of the processes involved and of the suitability of the different tasks for detecting them.

Acceptability judgments can also be collected using a variety of materials, combining linguistic and visual stimuli. For example, Coventry et al. (2001, experiment 1) studied the influence of certain components of a visual scene on the comprehension of spatial prepositions in English: over, under, above and below. Numerous studies have been carried out on spatial prepositions, in order to better grasp the relations underlying each of them (including, for example, Carlson-Radvansky and Radvansky (1996) and Logan and Sadler (1996)), showing, for instance, that above differs from over in that it denotes a higher position that is not in direct contact with the reference object.

It has also been shown that other variables influence the use of these spatial prepositions, such as the frame of reference (Levinson 1996) or the presence of a functional relationship between the elements described in the visual scene (Carlson-Radvansky and Radvansky 1996). In the study we will discuss here, Coventry et al. manipulated two variables: the geometric relationship between the different objects in the scene and their function.

The geometric relationship was operationalized as the position of the object in relation to the ground: canonical orientation (its usual position), an angle of 45° or an angle of 90°. The function of the object was manipulated in the following way: either an element needed for the object to fulfill its function was missing from the scene, or the object fulfilled its function, or it did not. The combination of the levels of each variable led to nine possible images for each scene, as illustrated in Figure 4.1 for a rain scene. There were eight different scenes in total, making a total of 72 possible images.

Figure 4.1. Examples of situations presented in Coventry et al. (2001): cartoon illustration of a man holding an umbrella in the rain, in nine different configurations
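
The material set follows directly from fully crossing the two variables within each scene, as the short sketch below illustrates. The condition labels and scene names are ours and merely stand in for the actual pictures used by Coventry et al. (2001).

```python
from itertools import product

orientations = ["canonical", "45_degrees", "90_degrees"]
functions = ["element_missing", "function_fulfilled", "function_not_fulfilled"]
scenes = [f"scene_{i}" for i in range(1, 9)]  # e.g. the rain scene

# Crossing the three orientations with the three function levels gives nine
# images per scene; with eight scenes this yields the full set of materials.
images = list(product(scenes, orientations, functions))
print(len(images))  # 8 x 3 x 3 = 72
```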

Each image was associated with two pairs of sentences to be evaluated: one pair containing the prepositions over and under, and another pair containing the prepositions above and below. Participants received a booklet containing the different images associated with the different pairs of sentences, presented in a random order. Their task was to assess the acceptability of each sentence for each image, on a scale ranging from 1 (totally unacceptable) to 7 (totally acceptable). The results showed that both independent variables played a role in the choice of the appropriate spatial preposition. As in previous studies, the acceptability of the different prepositions was highest when the object was in its canonical orientation, and decreased as the angle increased. With respect to object function, the acceptability of the prepositions was higher when the object fulfilled its function and lower when it did not. Furthermore, function influenced the acceptability of the prepositions only when the object's orientation was not canonical. Interestingly, the results also revealed that the two pairs of prepositions were not influenced in the same way by the variables studied. The pair over/under was more sensitive to the influence of function than the pair above/below, whereas the opposite held for the influence of the angle.

A particular case of acceptability judgment is the lexical decision task, in which participants have to decide whether a linguistic stimulus is a real word of their language or not. Lexical decision tasks are usually associated with online measurements, as the time required to make the decision is often recorded and analyzed. For this reason, this type of task will be presented in the next chapter, devoted to online measurements. However, lexical decision can also be used without measuring response times, simply to collect the participants' answers. For example, Lemhöfer and Broersma (2012) developed an instrument to test proficiency in English as a second language based on the answers to a lexical decision task comprising only 60 items. This test, LexTALE, was later adapted to other languages such as French (Brysbaert 2013), Dutch and German.
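
The sketch below shows, with invented items and answers, the kind of scoring that can be applied to an untimed lexical decision task: accuracy is computed separately for words and nonwords and then averaged, so that a tendency to answer "yes" does not inflate the score. This mirrors the logic of the LexTALE score as we understand it, but the exact formula and items should be taken from Lemhöfer and Broersma (2012).

```python
# Invented responses: True means the item was accepted as a word of the language.
responses = {
    "scornful": True,   # real word, correctly accepted
    "turmoil":  True,   # real word, correctly accepted
    "plaudate": True,   # invented nonword, wrongly accepted
    "mensible": False,  # invented nonword, correctly rejected
}
words = {"scornful", "turmoil"}
nonwords = {"plaudate", "mensible"}

word_acc = sum(responses[w] for w in words) / len(words)
nonword_acc = sum(not responses[n] for n in nonwords) / len(nonwords)
score = (word_acc + nonword_acc) / 2 * 100  # averaged percentage correct

print(f"words: {word_acc:.2f}, nonwords: {nonword_acc:.2f}, score: {score:.0f}%")
```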

In summary, acceptability judgments have several advantages. To begin with, they offer the possibility of studying forms that are never produced or cannot be found in a corpus. They are also easy to set up, since they can be carried out with simple means (paper and pencil tasks or online questionnaires). Finally, they can be combined with metalinguistic response justification tasks, in order to answer specific research questions. The main limitation of these tasks is that the quality of acceptability judgments is not always optimal. First, acceptability judgments may not reliably reflect the structure of the language, as participants are sometimes influenced by other factors, such as the overall meaning of the sentence. Therefore, conclusions drawn from acceptability judgments often have to be corroborated by other methods. We will see that this limitation also applies to other tasks that we will present later. In general, every method has its limitations and conclusions should be based on results drawn from several studies implementing different tasks. In the example by Zufferey et al. (2015b) presented above, combining the results of online and offline studies made it possible to hypothesize that learners can unconsciously detect mistakes that are not consciously reported when the task requires explicit reflection upon linguistic rules.

A different yet related point is that the validity of the measurement resulting from judgment tasks can be called into question. Indeed, it is possible that the participants do not share the same opinion as linguists about what is (grammatically) acceptable and what is not. Similarly, the acceptability of stimuli may not depend exclusively on the variable under investigation, but also on other aspects, such as the difficulty of processing them or their implausibility (Branigan and Pickering 2017). The validity of acceptability judgments has often been questioned, since they vary significantly from individual to individual, and even from one testing session to another (Gibson and Fedorenko 2010, 2013; Schütze 2016). A final limitation of acceptability judgments is that they require participants to have a certain metalinguistic competence. Thus, the limitations mentioned above concerning metalinguistic tasks also apply to these tasks.

4.1.3. Questionnaires

Rather than measuring the acceptability of a linguistic stimulus, it is possible to set up questionnaires that test comprehension more indirectly, for example how a statement was perceived or whether an argument was found convincing. This is the case, for example, of the study by Schumann et al. (2019) on the factors influencing the effectiveness of a particular type of fallacious argument called the straw man. The straw man fallacy consists in exaggerating the original argument advanced by an opponent, in order to present it as unacceptable and easily refute it. Schumann et al. identified three linguistic variables that could influence the acceptance of straw man fallacies, and tested them separately in three experiments. In each experiment, dialogues were presented, consisting of the intervention of a first speaker presenting a point of view followed by an argument (10), and then the response of a second speaker, which either contained a straw man fallacy (11) or did not:

(10) Barbara: it is essential to support young parents because having a child means a lot of financial charges.

(11) Alexander: let’s increase family allowances since it is only about money.

These fallacious arguments were constructed so as to test three variables of interest. Here, we will only present one of them by way of illustration. One of the hypotheses of the study was that a fallacious argument introduced by a connective like puisque (closely related to the English since), which is associated with subjectivity, might signal the fallacious character of the argument, compared to an argument introduced without such a connective. Following the presentation of each dialogue, the participants had to respond to four questions relating to the arguments presented, in order to assess – indirectly – the effect of the straw man features on the acceptance of such arguments. The first question aimed to evaluate whether the exaggeration characteristic of the straw man was detectable. The second question assessed whether the logical link was perceived to be deficient when the straw man argument was present. The third question assessed the participants' degree of agreement with the person who had employed a straw man argument. Finally, the fourth question evaluated the participants' degree of agreement with the first speaker's initial statement. The results notably revealed that straw man arguments were more readily accepted when they were not introduced by a connective than when they were preceded by the connective puisque. Therefore, some connectives such as puisque alert participants to the subjective and potentially fallacious nature of an argument. In a second set of experiments, the authors replicated these results in English with since, demonstrating that this effect is not specifically linked to one French connective (Schumann and Zufferey 2020).

The type of questionnaire used by Schumann et al. (2019) differs from the metalinguistic tasks described above in that language is treated as a vehicle for communication. The element of comprehension under investigation does not relate to the linguistic structure itself, but to the participants' perception of the merit of an argument. Questionnaires make it possible to build items specifically tailored to the questions of each study, and they are applicable to all areas of linguistics. However, when building a questionnaire, it is necessary to carefully consider many aspects, such as the number of questions, their wording, the answer options, etc. These aspects will not be developed in this book, but we offer some references tackling them at the end of the chapter.

4.1.4. Forced-choice preference tasks

Another means of explicitly assessing comprehension is the preference task, in which participants have to choose one of several answers on the basis of instructions relating to certain linguistic properties. These tasks can also take the form of matching tasks between linguistic stimuli and images, in which the stimuli can be words or sentences. In such tasks, participants are asked to choose the image, among several, that best represents the stimulus, or to choose the word or phrase that best suits a stimulus presented in the form of images.

For example, Colonna et al. (2012, experiment 1) studied anaphora resolution in French and German speakers, focusing specifically on the influence of subject or object topicalization. Previous studies had revealed a difference between French and German in the resolution of ambiguous pronouns, as in (12). While German speakers preferred to associate the pronoun with the first noun mentioned (the postman), French speakers matched it with the second noun (the street-sweeper):

(12) The postman met the street-sweeper before he went home.

In this study, the researchers hypothesized that these cross-linguistic differences in preference would also be found in ambiguous sentences such as (13). They also wished to find out whether these preferences could be modified by topicalizing either the subject (14) or the object (15) of the sentence. They therefore created items in German and in French, similar to the examples below, which they presented in three versions: an ambiguous version, a version with subject topicalization and a version with object topicalization. Every sentence was followed by a fragment containing the information provided in the subordinate clause, minus the subject, which the participants had to complete with the name they found appropriate (16):

(13) Peter slapped John when he was young.

(14) As for Peter, he slapped John when he was young.

(15) As for Peter, John slapped him when he was young.

(16) …was young.

The results of this study showed that in the ambiguous condition (13), the choice of the first referent (Peter) was much more frequent among German speakers (69.4%) than among French speakers (38.5%). This was expected, based on the results obtained with sentences like (12). Subject or object topicalization produced different results depending on the language of the participants. Among German speakers, subject topicalization did not increase the proportion of people choosing the first referent, which can be explained by the fact that the first referent is already the default choice in this language. Object topicalization, on the other hand, led to a decrease in the proportion of people choosing the first referent, indicating that topicalization can influence anaphora resolution, but only when it highlights a non-canonical order. Among French speakers, subject topicalization had no effect, and the object was still preferred as a referent. Object topicalization, on the other hand, increased the preference for the subject. Here again, we see the influence of topicalization. These results show that topicalization influences anaphora resolution, but that it is difficult to reach a general conclusion regarding this effect.
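
For this kind of forced-choice completion task, the analysis typically amounts to computing, for each language and each sentence version, the proportion of participants who completed the fragment with the first referent. The sketch below does this on invented records; the labels are ours, not those of Colonna et al. (2012).

```python
from collections import Counter

completions = [  # (language, sentence version, referent chosen); values invented
    ("German", "ambiguous", "first_referent"),
    ("German", "ambiguous", "second_referent"),
    ("French", "ambiguous", "second_referent"),
    ("French", "object_topicalization", "first_referent"),
]

totals = Counter((lang, cond) for lang, cond, ref in completions)
first_choices = Counter((lang, cond) for lang, cond, ref in completions
                        if ref == "first_referent")

for cell in sorted(totals):
    proportion = first_choices[cell] / totals[cell]
    print(f"{cell[0]:6s} {cell[1]:22s} first referent chosen: {proportion:.0%}")
```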

Preference tasks can be adapted to children or to populations with language impairments. It is possible to construct such tasks using non-linguistic stimuli and to ask the subjects to point to the image corresponding to their interpretation. In this case, we speak of pointing tasks. For example, Bernicot et al. (2007) were interested in the way children understand various forms of nonliteral language uses. In order to investigate this question, they examined children between 6 and 10 years old, dividing them into three groups: 6, 8 and 10 years old. They operationalized the difficulty of the nonliteral form by choosing to observe various forms requiring different types of inferences: indirect requests as in (17), idioms as in (18), semantic-inference implicatures as in (19) and conversational implicatures requiring a sarcastic inference as in (20):

(17) Cold is coming through the window. (Close the window.)

(18) Change your tune. (Change the subject.)

(19) Should I mow the lawn? The nephews are sleeping in their room. (No.)

(20) Should I open the umbrella? No, I really like getting sunburnt. (Yes.)

For every type of nonliteral form, the researchers created four short stories presented as images, which were shown to the children on a screen, in the form of a video game. The children had to choose the last picture of the story; the action represented in the chosen picture indicated whether the nonliteral form had been understood or not. For example, in the case of idioms, the image represented either a character changing the subject or a character changing a record. The data collected made it possible to quantify the children's comprehension of nonliteral uses of language and to analyze the influence of the variables age and type of nonliteral form on the number of correct answers given. The results showed that age influenced the comprehension of nonliteral forms, 10-year-olds giving more correct answers than 8-year-olds, who in turn gave more correct answers than 6-year-olds, except for indirect requests, for which all age groups obtained similar scores. At the same time, the difficulty of the inferences necessary for understanding the nonliteral form also influenced comprehension. Children gave more correct answers for semantic-inference conversational implicatures than for indirect requests, then for idioms, and, finally, for sarcastic-inference implicatures. This study also showed that mastery of each type of form is reached at a different age. At the age of 6, children generally understand semantic-inference implicatures and are close to mastering indirect requests; at the age of 8, they generally master indirect requests, and by the age of 10, they master idioms. Mastery of sarcastic-inference implicatures is not yet reached at the age of 10, and does not seem to appear until later in development.

However, these results do not provide all the information one might desire to collect about the acquisition of nonliteral forms by children. While they properly show to what extent children can understand these forms, it is not clear whether they understand the reason why these forms do not convey the literal meaning of what is stated. In order to answer this second question, researchers added a metalinguistic task at the end of each comprehension test, inviting the children to explain why they had chosen a certain image. The answers were then evaluated by researchers, and classified into three categories. The first category corresponded to irrelevant explanations which were simple descriptions or reformulations of what was happening in the chosen image. The second category corresponded to simple explanations, based on the context in which the nonliteral form appeared. The third category included elaborate explanations, in which the children proved their ability to distinguish what was said from what was meant in the nonliteral form. By analyzing the responses, researchers found that the understanding of nonliteral forms develops before the metalinguistic skills associated with such forms. In their study, children were first able to explain idioms (at 8 years old), then sarcastic-inference implicatures and semantic-inference conversational implicatures (at around 10 years old). However, none of the groups was able to explain indirect requests. To sum up, this study made it possible to show that the mastery of nonliteral forms was not related to the metalinguistic abilities displayed by children.

This example illustrates the fact that different methods offer different and complementary insights on the same process, and that each of them has advantages and disadvantages. Preference tasks have the advantage of not being based on metalinguistic abilities, unlike the tasks presented above. Indeed, respondents do not need to explain their choices and can simply follow their intuition. These tasks also make it possible to determine the interpretation these individuals prefer among the ones proposed to them. However, they must be combined with other methods for understanding the processes that underlie the comprehension of a stimulus.

4.1.5. Comprehension tests

The last type of explicit task we will present is the comprehension test. In these tests, linguistic stimuli – generally in the form of sentences or texts – are presented, followed by one or more questions relating to their content. From a formal point of view, this type of test is very similar to the questionnaires described above. However, we present it separately, since it aims to measure the comprehension of linguistic stimuli in a more explicit and in-depth manner than a questionnaire. Comprehension tests make it possible to collect two different types of indicators. Firstly, they can be used to determine what is inferred from sentences or texts. Secondly, they provide a way of measuring comprehension in terms of answer accuracy. In some cases, open-ended questions may also be used, in order to observe how responses vary depending on the variables investigated in the study.

Even though they were set aside for a while in favor of online tasks (see Chapter 5), offline comprehension tests are highly informative, in that they give access to the product of comprehension, that is, the mental representations resulting from linguistic stimuli (see Ferreira and Yang (2019) for an in-depth discussion of the difficulty of assessing comprehension). For example, comprehension tests have shown that the mental representations constructed while reading a sentence are not automatically accurate and complete, but are often simply good enough representations (Ferreira et al. 2002; Sanford and Graesser 2005). Indeed, various studies have shown that readers do not necessarily process every linguistic stimulus in depth. For example, the question "how many animals of each kind did Moses take on the ark?" is often answered with "two" (Erickson and Mattson 1981), without the respondent noticing that it is Noah, and not Moses, who is supposed to have built an ark.

In order to better understand the conditions that may lead to relatively superficial processing of sentences during comprehension, Ferreira (2003) examined various factors. She was interested in the influence of an unusual structure, cleft sentences, as well as the active or passive voice of the sentence. In the rest of this section, we will describe in detail the experiment testing this last variable, in which experimental items describing simple transitive events were presented. Every item was written either in the active or the passive voice, and each argument of the verb could appear either as the agent of the action or as its theme (the patient). A third of the items were symmetrical, that is, the relationship between the arguments was plausible in both directions, as in "The man kissed the woman/The woman kissed the man". Another third of the items were reversible, meaning that one arrangement was more plausible than the other, as in "The dog bit the child/The child bit the dog". The last third of the items were asymmetrical, that is, the inversion of the elements led to a loss of meaning, as in "The mouse ate the cheese/The cheese ate the mouse".

Ferreira (2003) asked university students to perform a comprehension task in which every item was presented orally. For each item, the participants had to indicate either the agent or the theme (the patient) of the sentence.

The analysis of the results showed that the participants gave more correct answers when they had to identify the agent of the sentence than when they had to identify its theme. Performance was also better when the sentences were in the active rather than in the passive voice. These results held for symmetrical, reversible and asymmetrical sentences alike. In the case of asymmetrical sentences, performance was also better when the sentences described plausible events. These results show that active sentences are easier to understand than passive ones, and that for the latter, it seems more difficult to map the syntactic form onto the thematic roles. In addition, these results show that when a sentence conveys improbable content (as in the case of the cheese eating the mouse), people rely on their knowledge of the world rather than on the content of the sentence itself in order to understand it. This can explain why discourse comprehension is not always optimal.

As we saw in the previous example, it is possible to build a comprehension test for investigating a specific research question by developing items that manipulate the variables of interest, and then asking questions relating to these items. In order to obtain a measure of general comprehension, it is also possible to turn to tests that have already been constructed and validated, that is, tested by other people and discussed in one or several scientific publications. Such tests come with norms against which the results obtained can be situated. There are many standardized tests, generally accessible directly from the people who developed them or through a test library. Examples of such tests are the Developmental Reading Assessment (Beaver and Carter 2019) or the Peabody Picture Vocabulary Test (Dunn and Dunn 2007).

4.2. Implicit tasks

We will now turn to implicit tasks for measuring comprehension offline. Even if some of the above-mentioned tasks can also be considered implicit to some degree, the tasks we present now are special in that (a) they do not directly ask the persons tested for an opinion, and (b) they try to access representations or mental processes that cannot necessarily be reached by means of explicit tasks. Implicit tasks make it possible to circumvent some limitations inherent in explicit tasks, such as their strong dependence on the (meta)linguistic abilities of the participants, and the difficulty of explicitly accessing certain processes, such as those underlying the organization of the mental lexicon. Implicit tasks are generally associated with the online study of comprehension, as we will see in the next chapter, but they can also be applied offline, in particular via action tasks, in which the behavior triggered by a linguistic stimulus is observed, or via recall and recognition tasks, as we will see below.

4.2.1. Action tasks

In action tasks, the participants have to perform an action on the basis of a linguistic stimulus. This action often involves playing with figurines in order to reconstruct a scene described orally or in writing. This type of task is particularly well suited to populations such as children and people with language impairments, since comprehension can be measured without the participants having to use language or provide metalinguistic explanations.

An example of such a task is presented in the study by Chan et al. (2010) on the acquisition of the canonical subject–verb–object (SVO) transitive word order in children. Chan et al. tested three groups of English-speaking children aged approximately 2 years, 2 years and 9 months, and 3 years and 5 months, using an intermodal preferential looking online task, which we will not describe here, and an act-out task. The material for the experiment comprised 10 pairs of plastic animals and six verbs, two of which were familiar (kick and push) and four invented (meek, pilk, gorp and tam), each describing a specific action. For example, the verb tam corresponded to the action of swinging. Before the task, one of the experimenters presented the child with the animal figurines and made sure that the child knew each animal well. Following this, the task itself began. It consisted of six trial sets (corresponding to the different verbs) and each trial set consisted of three phases. During the demonstration phase, the experimenter took a pair of animals and made one animal act on the other while saying "Look! This is VERB-ing!". It is important to observe that during the demonstration phase, the verbs were presented in isolation, the above sentence indicating neither subject nor object. During the training phase, the experimenter then gave the animals to the child and asked them to perform the same action, repeating "Yes, this is VERB-ing!". Then, the experimenter reversed the animals' roles, and the demonstration and training phases were repeated. The last phase was the testing phase, in which the experimenter gave two new animals to the child, said "Look, the A is VERB-ing the B! Can you do it?" and then waited for the child to perform the action.

The results of the study showed that the older the children, the higher the number of correct answers. The type of verb also played a role, as familiar verbs gave rise to a higher number of correct answers than invented verbs. Analyzing the results per age category, Chan et al. (2010) found that children over 3 years old obtained similar results for the two types of verbs and, most of the time, acted out the scene correctly. The 2-year-olds, on the other hand, mostly failed to act out the scene correctly (between 32% and 39% of correct answers) for both types of verbs. The intermediate group acted out the scene correctly 80% of the time with familiar verbs, and 63% of the time with invented verbs.

As we can see in this example, action tasks are implicit in the sense that participants are not directly asked to assess their understanding or to explain a linguistic stimulus. However, the instructions and the procedure are not necessarily implicit: the task of the previous study, as well as the explanations and encouragement by the experimenter, placed the emphasis on a very specific action.

It is nonetheless also possible to manipulate the implicit or explicit nature of the instructions in an action task. Kissine et al. (2015), for example, studied the comprehension of indirect requests by children with Autism Spectrum Disorder (ASD). Children with ASD have often been said to present a global pragmatic impairment. However, studies have shown that some pragmatic skills are preserved among children with ASD, which suggests that these skills might be related to two different processes: one based on theory of mind (the ability to draw inferences about the intentions of others) and the other based on contextual indicators, which can dispense with this type of inference. As indirect requests depend heavily on the context in which they are issued, Kissine et al. hypothesized that they should be understood even by people who lack functional theory of mind skills.

In their study, Kissine et al. (2015) tested children with ASD aged between 7 and 12 years, and 3-year-old neurotypical children. The control group was chosen so that the children would have skills similar to those of the ASD children in terms of linguistic development and theory of mind. The experiment in which the children took part was as follows. Every child sat in a quiet room with two experimenters, one of whom interacted with the child, while the other pretended to read a magazine. The first experimenter presented the child with four copies of Mr. Potato Head, a toy made up of a head attached to feet, which can be decorated with various elements such as a nose, eyes, glasses, a hat, etc. Once the figurine had been presented, the child could add the elements he or she wished. After a certain time, and depending on the elements already attached by the child, the first experimenter uttered the sentence "Oh, he doesn't have a hat/glasses!" In this context, this statement was an indirect request to add a hat or glasses to the toy, and adding the accessory mentioned in the request counted as a correct answer.

Besides, in order to verify that the child’s action corresponded to their comprehension of an indirect request and not to an automatic action based on a linguistic stimulus, two other target sentences were later presented in the experiment. Once the first Mr. Potato Head was assembled, the first experimenter invited the child to create a second one. During the assembly of the second toy, the second experimenter, ostensibly looking at her magazine, repeated the same sentence as the previous one, “Oh, he doesn’t have a hat/glasses!” At that time, the first experimenter moved near the second one, looked at the magazine and repeated the sentence again. In these two cases, the sentence should not be interpreted as an indirect request, since it was not directly addressed to the child.

Kissine et al. (2015) then coded the actions performed by the children following the different sentences and compared the results between the two groups of children. The children with ASD all responded correctly to the indirect request, whereas the children in the control group were less likely to do so. In addition, the children with ASD also behaved appropriately in response to the second and third occurrences of the target sentence, by not adding an accessory to their toy. The children in the control group had more difficulty not reacting when the sentence was spoken for the third time, which suggests that the task was more difficult for them due to their young age. These results support the idea that children with ASD understand indirect requests on the basis of contextual cues. They also invite a reassessment of previous results, based on metalinguistic tasks, which had suggested that these children had not yet acquired the comprehension of such requests. Once again, this study reveals that different tasks draw on different processes and abilities, and that it is therefore always advisable to combine various approaches.

In addition to not depending on metalinguistic skills, action tasks have the advantage of having good ecological validity, by making it possible to keep the experimental situation as close as possible to a daily-life situation, and to reduce the stress or the apprehension the participants could feel. However, they have an important limitation as to the cognitive skills they require. Since it is necessary to keep the stimulus in mind in order to prepare and to perform the action resulting from it, they can pose problems for populations with memory or executive function deficits, such as people with aphasia, for example. Their use in children can also be problematic for the same reasons. In the experiment conducted by Chan et al. (2010), for example, the results obtained in the action task did not show a difference between the conditions in 2-year-old children, whereas such differences did in fact emerge in the online task. As we can see once more, varying the methods seems to be the best solution for overcoming these limitations.

4.2.2. Recall tasks and recognition tasks

Recall tasks and recognition tasks provide access to the mental representations that people construct during language processing. They are based on the assumption that when something is understood on the basis of a linguistic stimulus, this element is encoded and stored in memory. Testing the participants' memory after they have read or listened to a linguistic stimulus therefore gives access to their linguistic representations, since the linguistic processing has already been carried out. In this type of task, not only are the correct answers interesting, but so are the errors made, since errors can reveal similarities between stimuli or shed light on the processes underlying comprehension, as we will see in the examples below.

A recognition or recall task generally takes place in three phases: (a) a learning phase, in which the linguistic items to be remembered are presented; (b) a break or another task, intended to "empty" short-term memory; and (c) a test phase, in which the previously presented linguistic items are shown again along with new items. During this last phase, the participants have to decide whether the items presented are the same as those presented in the first phase (recognition), or simply recall the items presented in the first phase (recall). For the recognition task, analyzing the results simply requires counting the number of correct answers; for the recall task, it is first necessary to decide which of the responses produced by the participants count as correct.
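
The following sketch makes this three-phase structure concrete for a recognition task, using invented item names. It separates two kinds of responses that a real analysis would want to distinguish: old items correctly recognized (hits) and new items wrongly accepted as old (false alarms), the latter being the kind of error that studies such as Bransford et al. (1972), discussed below, rely on.

```python
import random

studied = ["sentence_01", "sentence_02", "sentence_03"]  # (a) learning phase
new_items = ["sentence_04", "sentence_05"]               # added at test only

# (b) a break or an unrelated filler task would take place here

# (c) test phase: old and new items are mixed and judged one by one
test_list = studied + new_items
random.shuffle(test_list)

# Invented yes/no answers: True = "I saw/heard this item in the first phase"
answers = {item: random.choice([True, False]) for item in test_list}

hits = sum(answers[i] for i in studied)            # old items correctly recognized
false_alarms = sum(answers[i] for i in new_items)  # new items wrongly accepted
print(f"hit rate: {hits / len(studied):.2f}, "
      f"false alarm rate: {false_alarms / len(new_items):.2f}")
```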

A classic example of a recognition task can be found in the study by Bransford et al. (1972), which aimed at determining whether mental representations are exclusively representations of the propositional structure of sentences, or whether they contain information going beyond this structure. To do this, the authors created 14 scenarios for which it was possible to construct two different situations by manipulating a preposition and a pronoun. The best known example is the one presented below, featuring turtles, a fish and a log. For each scenario, a pair of sentences described the same situation, as in (21) and (22), and a pair of sentences described different situations, as in (23) and (24):

(21) Three turtles rested on a floating log, and a fish swam beneath them.

(22) Three turtles rested on a floating log, and a fish swam beneath it.

(23) Three turtles rested beside a floating log, and a fish swam beneath them.

(24) Three turtles rested beside a floating log, and a fish swam beneath it.

In the learning phase, a sentence relating to each scenario was presented orally, either (21) or (23). In the test phase, the researchers presented the sentences heard before as well as additional sentences, and the participants had to indicate which sentence had been presented before, along with their degree of certainty about their response. When sentence (21) had been presented in the first phase, (21) and (22) were presented in the test phase. When sentence (23) had been presented in the first phase, (23) and (24) were presented in the test phase. The key point of the manipulation was that the second sentence of each pair differed from the first at the propositional level, since the final pronoun was modified. This modification of the pronoun, however, changed the situation described only for the pair (23) and (24), not for the pair (21) and (22). If we build our representations on a purely propositional basis, we should generally be able to distinguish the sentences presented during the learning phase from those added at the test phase, and no difference should appear between the two types of pairs. On the other hand, if we build representations going beyond the text, we can expect more recognition errors between (21) and (22), which describe similar situations, than between (23) and (24), which describe different situations. The results corroborated the latter prediction, supporting the hypothesis that our mental representations go beyond the simple content of a text or discourse.

Another example, this time involving a recall task, is a classic study showing the importance of context for text comprehension. In this study, Bransford and Johnson (1972) had their participants listen to short texts like the one presented below and asked them to retain as much information as possible so that they could recall it afterwards:

“The procedure is actually quite simple. First, you arrange things into different groups depending on their makeup. Of course, one pile may be sufficient depending on how much there is to do. If you have to go somewhere else due to lack of facilities, that is the next step, otherwise you are pretty well set. It is important not to overdo things. That is, it is better to do too few things at once than too many. In the short run this may not seem important, but complications can easily arise. A mistake can be expensive as well. The manipulation of the appropriate mechanisms should be self-explanatory, and we need not dwell on it here. At first the whole procedure will seem complicated. Soon, however, it will become just another facet of life. It is difficult to foresee any end to the necessity for this task in the immediate future, but then one never can tell.” (Bransford and Johnson 1972, p. 722)

At first glance, this text is difficult to follow, and it is hard to remember it all at the end. Now imagine that, before hearing the passage, you had been told that it was about washing clothes. In that case, the text becomes much easier to understand, because your general knowledge helps you build a context in which to interpret the sentences as you hear them. In one experiment, Bransford and Johnson (1972, experiment 2) divided their participants into three groups. The first group received an indication of the context before listening to the passage, the second group received this information after listening to the passage, and the third group received no indication at all. As expected, receiving the context before listening to the text facilitated the correct recall of its elements.

These examples show that recognition and recall tasks provide implicit access to mental representations, as well as to the variables that can influence such representations. It must be noted, however, that the measurement obtained in such tasks depends not only on the comprehension process, but also on the processes involved in the task itself. For recall tasks in particular, it has been shown that the first and last pieces of information presented are generally better remembered (Potter and Levy 1969), a pattern known as primacy and recency effects. It is therefore essential to take this phenomenon into account when setting up the presentation order of the items, for instance by randomizing it (see Chapter 6 for more information).
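As a concrete illustration of this recommendation, here is a minimal sketch of per-participant randomization of the presentation order; the item labels and participant identifiers are hypothetical.

```python
import random

# Hypothetical list of stimulus identifiers (e.g. one per scenario).
items = [f"sentence_{i:02d}" for i in range(1, 15)]

def presentation_order(items, participant_id):
    """Return a reproducible random order of items for one participant."""
    rng = random.Random(participant_id)  # seed with the participant ID
    order = list(items)
    rng.shuffle(order)                   # each participant gets a different order
    return order

# With a different order per participant, primacy and recency advantages are
# spread across items instead of always favouring the same ones.
for participant_id in (1, 2):
    print(participant_id, presentation_order(items, participant_id))
```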

4.3. Conclusion

In this chapter, we have presented various offline methods that can be applied to the study of language comprehension. We have shown that these tasks lie on a continuum between explicit and implicit: explicit tasks test comprehension more directly, whereas implicit tasks do so only indirectly. We have also seen that explicit tasks are often based on the evaluation of linguistic stimuli (e.g. through specific questions) and that most of them call on specific metalinguistic abilities, which can hinder their use with certain populations. Implicit tasks provide a way of circumventing this problem by probing comprehension in a more roundabout way. Throughout this chapter, we have also argued that different methods can lead to different results, and that it is therefore necessary to combine methods in order to benefit from the advantages of each one while overcoming their limitations.

4.4. Revision questions and answer key

4.4.1. Questions

1) If you had to choose between a metalinguistic task and an action task, which method would, in your opinion, be the most suitable for studying the comprehension of syntactically complex sentences in people suffering from aphasia?
2) What is the difference between a recognition task and a recall task?
3) Two researchers want to set up an acceptability task. The first one wishes to use a binary YES/NO scale, whereas the second one wishes to use a scale from 0 (not at all acceptable) to 5 (completely acceptable). What arguments could each of them put forward?
4) Researchers wish to study the comprehension of irony in L2 with beginner learners. Which of the methods presented in this chapter do you find most appropriate?
5) What is the point of offline comprehension tests?
6) How could the comprehension test carried out by Ferreira (2003) (see section 4.1.5) be applied to preschool children?

4.4.2. Answer key

1) The different forms of aphasia are characterized by difficulties in producing language, which may be almost total in cases of global aphasia or limited to certain aspects of language production in other cases. Metalinguistic tasks require participants to consciously access their intuitions about language and to put them into words, and the deficits associated with aphasia can greatly interfere with these processes. For this reason, it may be more suitable to test such participants with action tasks, which do not require them to produce language. On the other hand, action tasks demand good memory abilities, such as working memory, which can also be affected in people with aphasia. It would therefore be necessary to make sure that these abilities are sufficiently preserved before implementing an experiment based on an action task.
2) Recognition and recall tasks both examine the mental representations developed during language processing by testing participants’ memory after a learning phase. The difference between the two methods lies in how memory is tested. In a recognition task, the stimuli from the learning phase are presented alongside new stimuli, and participants have to distinguish the stimuli already presented from the new ones. In a recall task, no stimulus is presented, and participants have to recall as many elements as possible.
3) The first researcher, who wishes to use a binary YES/NO scale, could argue that the different acceptability scales are ultimately equally informative. On this basis, the participants’ task could be simplified by offering them only two choices, acceptable and not acceptable. This researcher could add that it is difficult to evaluate a degree of acceptability, in the sense that a statement either contains a grammatical error or it does not, and is either semantically appropriate or not. Finally, they could question the usefulness of a six-point scale, since the differences between adjacent categories do not necessarily support firm conclusions.

The second researcher, who wishes to use a six-point scale, could argue that there may be different degrees of perceived acceptability, depending on the importance of the linguistic aspects manipulated in the experiment and on the participants’ intuitions. For example, a simple sentence containing a grammatical error could be judged completely unacceptable, whereas a complex sentence containing the same error could be judged only partially unacceptable (since proportionally more of its elements are correct). This researcher could also argue that reducing the participants’ response to YES/NO forces a binary choice when their judgments may in fact be more nuanced. As regards the number of points on the scale, the second researcher could concede the first researcher’s reservations about a six-point scale and propose a scale with more points, which could then be treated as a ratio scale in the analyses.

4) The answer depends first on the research question examined, which in this case is the comprehension of irony in a second language. An adequate task for this type of question could be a metalinguistic task, in which participants report their interpretation of an utterance. Another possibility is a preference task, in which participants show their comprehension of the irony of an utterance by choosing a response, as in the experiment by Bernicot et al. (2007) (see section 4.1.4). An action task based on the utterance could also be implemented. The second element to consider is the language in which the study is carried out. If it is conducted in the language the participants are learning, it must be borne in mind that they may find it difficult to express themselves in this language, and that their current knowledge of it may not yet allow them to understand or to convey complex ideas. In this case, the use of certain metalinguistic tasks is compromised, and it would be more appropriate to turn to preference or action tasks. If the experiment is run in the participants’ mother tongue, then their comprehension of irony could also be examined by means of metalinguistic tasks.
5) Offline comprehension tests provide access to the content of the mental representations that people build during language comprehension. In other words, they make it possible to observe which elements people retain and what they have really understood when processing a linguistic stimulus. Such tests make it possible to examine the influence of specific variables on comprehension. They contribute not only to the evaluation of comprehension skills in general, but also to the definition of groups of people with stronger or weaker comprehension skills, on the basis of norms validated by other researchers.
6) In her study, Ferreira (2003) examined the influence of the active or passive voice of a sentence on comprehension. Each sentence contained two elements, one appearing as the agent and the other as the theme. After each sentence, participants had to indicate either the agent or the theme of the sentence. In order to adapt this experiment to preschoolers, rather than asking them to name the agent or the theme, it might be more appropriate to implement a preference task. In this case, two images would be presented for each sentence, depicting its elements in an arrangement that either matches the sentence or does not. For example, for the sentence The cat is chased by the mouse, one image could show a cat being chased by a mouse, whereas the other could show a mouse being chased by a cat. Children would then simply indicate which image corresponds to the sentence. It would also be possible to implement an action task, by asking the children to act out the content of the sentences with the help of figurines.

4.5. Further reading

For a detailed presentation of acceptability judgments, see Schütze and Sprouse (2013). Rasinger (2010) and Wagner (2015) develop different aspects related to the creation and use of questionnaires. The book by Gonzalez-Marquez et al. (2007b) offers numerous examples of the application of the techniques discussed in this chapter to the field of cognitive linguistics. For a comparison of the pros and cons of offline and online measurements for testing comprehension, see Ferreira and Yang (2019).
