6
Lexical Stress in English Pronunciation

ANNE CUTLER

English lexical stress and its pronunciation implications

Not all languages have stress and not all languages that do have stress are alike. English is a lexical stress language, which means that in any English word with more than one syllable, the syllables will differ in their relative salience. Some syllables may serve as the locus for prominence-lending accents. Others can never be accented.

In the word language, for example, the first syllable is stressed: LANGuage (henceforth, upper case will denote a stressed syllable). If the word language receives a principal accent in a sentence, either by default (She studies languages) or to express contrast (Did you say language games or anguish games?), the expression of this accent will be on language’s first syllable. The second syllable of language is not a permissible location for such accentuation. Even if we contrive a case in which the second syllable by itself is involved in a contrast (What was the new password again: “language” or “languish”?), it is more natural to express this contrast by lengthening the final affricate/fricative than by making the second syllable stronger than the first. The stress pattern of an English polysyllabic word is as intrinsic to its phonological identity as the string of segments that make it up.1

This type of asymmetry across syllables distinguishes stress languages from languages that have no stress in their word phonology (for instance, many Asian languages). Within stress languages, being a lexical stress language means that stress can vary across syllable positions within words, and in principle can vary contrastively; this distinguishes lexical stress languages from fixed-stress languages (such as Polish or Finnish), where stress is assigned to the same syllable position in any word (the penultimate syllable in Polish; the initial syllable in Finnish).

The “in principle” qualification on contrastivity holds not only for English; in all lexical stress languages, minimal pairs of words varying only in stress are rare. English has only a few (INsight versus inCITE and FOREbear versus forBEAR, for example); such pairs require two successive syllables with full vowels, a configuration that is itself rare among English words. Stress alone is not a major source of inter-word contrast in English.

One way in which English does vary stress across words, however, is by the role stress plays in derivational morphology. Adding a derivational affix to an English word, and thus creating a morphologically related word of a different grammatical class, very often moves the location of the primary stress to a different syllable; we can adMIRE a BAron as a PERson who is aristoCRATic or express our admiRAtion for his baRONial ability to perSONify the arisTOCracy.

Rhythmically, English prefers to avoid successive stressed syllables, and alternation of stressed and unstressed syllables characterizes English speech. This preference for stress alternation, together with the fact that an English word may have only one primary stressed syllable but may have three, four, or more syllables in all, has an obvious implication: there are different levels of stress. Thus in admiration and aristocracy, with primary stress in each case on the third syllable, the first syllable bears a lesser level of stress (often referred to as “secondary stress”; see the metrical phonology literature, from Liberman and Prince 1977 on, for detailed analyses of relative prominence in English utterances).

Finally, English differs from some other lexical stress languages in how stress is realized. The salience difference between stressed and unstressed syllables is realized in several dimensions; stressed syllables are longer, can be louder and higher in pitch or contain more pitch movement than unstressed syllables, and the distribution of energy across the frequency spectrum may also differ, with more energy in higher-frequency regions in stressed syllables (for the classic references reporting these analyses, see Cutler 2005a). The difference between a stressed and an unstressed version of the same syllable can be clearly seen in Figure 6.1.


Figure 6.1 The verb perVERT (upper three panels) and the noun PERvert (lower three panels), which differ in stress, spoken by a male speaker of American English in the carrier sentence Say the word … again. The three display panels for each word are: top, a broad-band spectrogram; middle, a waveform display; below, a narrow-band spectrogram. Vertical lines in each panel indicate the onset and offset of the example word pervert. The figure is modelled on a figure created by Lehiste and Peterson (1959: 434). The stressed syllables (the second syllable of the verb, in the upper panels, and the first syllable of the noun, in the lower panels) are longer, louder, and higher in pitch than the unstressed versions of the same syllables (the first syllable of the verb, the second syllable of the noun). The length difference can be particularly well seen in the broad-band spectrogram, the loudness difference in the waveform, and the pitch difference in the narrow-band spectrogram, where the higher the fundamental frequency (pitch), the wider the spacing of its harmonics (the horizontal stripes in the figure).

All these dimensions are suprasegmental, in that a given sequence of segments retains its segmental identity even though it can be uttered in a shorter or longer realization, with higher or lower pitch, and so on (see Lehiste 1970 for a still unsurpassed account of suprasegmental dimensions). All lexical stress languages use such suprasegmental distinctions, but English also distinguishes stressed and unstressed syllables segmentally, in the patterning of vowels. In English, vowels may be full or reduced. Full vowels may be monophthongs (e.g., the vowels in Al, ill, eel) or diphthongs (as in aisle, oil, owl), but they all have full vowel quality. Reduced vowels are centralized, with schwa the most common such vowel (the second vowel in Alan or the first in alone). Any stressed syllable in English must contain a full vowel (e.g., the first vowel in language). Any syllable with a reduced vowel (e.g., language’s second syllable) cannot bear stress.

In this last feature, English obviously differs from lexical stress languages without reduced vowels in their phonology (e.g., Spanish); in such languages, suprasegmental distinctions are the only means available for marking stress. In English, the segmental reflection of stress is so important that linguists have observed that it is possible to regard English as a two-level prominence system: full vowels on one level, reduced vowels on the other (Bolinger 1981; Ladefoged 2006). This segmental feature is crucial to the functioning of stress, not only in the phonology but also in language users’ production and perception of words and sentences. As we shall see, its role in speech perception in particular entails that when a slip of the tongue or a non-native mispronunciation causes alteration of the patterning of full and reduced vowels, then recognition of the intended word is seriously hindered.

The perception of English lexical stress by native listeners

If lexical stress by itself rarely makes a crucial distinction between words, how important is it for recognizing words? The segmental building blocks of speech – vowels and consonants – certainly do distinguish minimal pairs of words. We need to identify all the sounds of creek to be sure that it is not freak, Greek, clique, croak, crack, creep, or crease. However, minimal pairs such as incite/insight occur so rarely in our listening experience that there would be little cost to the listener in ignoring the stress pattern and treating such pairs as accidental homophones, like sole/soul, rain/rein/reign, or medal/meddle. Languages do not avoid homophony – quite the reverse – in that new meanings tend not to be expressed with totally new phonological forms, but are by preference assigned to existing forms (web, tweet, cookies). This preference occurs across languages and putatively serves the interest of language users by reducing processing effort (Piantadosi, Tily, and Gibson 2012). Indeed, there is evidence from psycholinguistic laboratories that English words with a minimal stress pair do momentarily make the meanings of each member of the pair available in the listener’s mind (Cutler 1986; Small, Simon, and Goldberg 1988), just as happens with accidental homophones such as sole/soul (Grainger, Van Kang, and Segui 2001).

This by no means implies that stress is ignored by English listeners. The role of any phonological feature in speech perception is determined by its utility; listeners will make use of any speech information if it helps in speech recognition, and they will use it in the way it best helps. Vocabulary analyses show that there is indeed little advantage for English listeners in attending to the suprasegmental reflections of stress pattern over and above the segmental structure of speech, as this achieves only a relatively small reduction in the number of possible words to be considered (Cutler, Norris, and Sebastián-Gallés 2004; Cutler and Pasveer 2006; this English result contrasts significantly with the large reductions achieved when the same analyses are carried out for Spanish, Dutch, and German, all of which are lexical stress languages, but none of which have the strong segmental reflection of stress found in English).
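The logic of such vocabulary analyses can be sketched as a count of how many words share each form, computed once with and once without stress information. The sketch below is a minimal illustration under loudly stated assumptions: a hypothetical ten-word lexicon with rough, invented IPA syllabifications, in which full versus reduced vowels are already part of each segmental string and a primary-stress index stands in for the suprasegmental information. The numbers it produces reflect only this toy lexicon, not the results of the analyses cited above.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> (syllables, index of primary-stressed syllable).
# Rough IPA invented for illustration. Reduced vowels are written into the
# segmental string, so much of the stress information is already segmental.
LEXICON = {
    "insight":   (("ɪn", "saɪt"), 0),
    "incite":    (("ɪn", "saɪt"), 1),
    "language":  (("læŋ", "gwɪdʒ"), 0),
    "garbage":   (("gɑː", "bɪdʒ"), 0),
    "borrow":    (("bɒ", "rəʊ"), 0),
    "alone":     (("ə", "ləʊn"), 1),
    "pollution": (("pə", "luː", "ʃən"), 1),
    "acquire":   (("ə", "kwaɪə"), 1),
    "numbers":   (("nʌm", "bəz"), 0),
    "common":    (("kɒ", "mən"), 0),
}


def mean_candidate_set_size(lexicon, use_stress):
    """Average number of entries sharing each word's form.

    With use_stress=False a form is the segmental string alone; with
    use_stress=True the primary-stress index is added to the form.
    """
    def form(entry):
        syllables, stress = entry
        return (syllables, stress) if use_stress else syllables

    counts = Counter(form(entry) for entry in lexicon.values())
    sizes = [counts[form(entry)] for entry in lexicon.values()]
    return sum(sizes) / len(sizes)


print(mean_candidate_set_size(LEXICON, use_stress=False))  # segments only: 1.2
print(mean_candidate_set_size(LEXICON, use_stress=True))   # segments + stress: 1.0
```

In this toy lexicon only insight/incite are separated by the stress index alone, so adding it lowers the mean from 1.2 to 1.0; everything else is already distinguished by its segments, including its full and reduced vowels.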

Vocabulary analyses reveal, however, that there is a highly significant tendency for stress in English words to fall on the initial syllable, and this tendency is even greater in real speech samples (Cutler and Carter 1987).2 There is an obvious reason why the tendency is stronger in real speech: about a quarter of the vocabulary consists of words with unstressed initial syllables, but most of the words in this set have a relatively low frequency of occurrence (pollution, acquire, arithmetic). The higher-frequency words, i.e., the ones most often heard in real speech, are shorter and more likely to have just a single stressed syllable that is either the word-initial syllable (garbage, borrow, numbers) or the only syllable (trash, take, math). This pattern has a very important implication for listeners to English: it means that in any English utterance, a stressed syllable is highly likely to be the beginning of a new word. Since most unstressed syllables are reduced, it is even a reasonable bet that any syllable containing a full vowel is the beginning of a new word.

English listeners grasp this probability and act on it. Segmentation of speech signals into their component words is a nontrivial task for listeners, since speech signals are truly continuous – speakers run the words of their utterances together, they do not pause between them. Listeners, however, can only understand utterances by identifying the words that make them up, since many utterances are quite novel. Any highly predictive pattern, such as the English distribution of stress, is therefore going to prove quite useful.

Psycholinguistic experiments with a task called word-spotting, in which listeners detect any real word in a spoken nonsense sequence, provided the first demonstration of English listeners’ use of the pattern of full and reduced vowels in segmentation. The input in word-spotting consists of sequences such as obzel crinthish bookving and the like (in this case, only the third item contains a real word, namely book). A word spread over two syllables with a full vowel in each (e.g., send in sendibe [sɛndɑɪb]) proved very difficult to detect, but if the same word was spread over a syllable with a full vowel followed by a syllable with a reduced vowel (e.g., send in sendeb [sɛndəb]), it was much easier to spot (Cutler and Norris 1988): response times were faster in the latter case and miss rates were lower. In the former case, detection of the embedded word is hindered by the following full vowel, because that vowel has induced listeners to segment the sequence at the onset of the second syllable (sen - dibe). Listeners are acting on the strategy described above: any syllable with a full vowel is likely to be a new word. Consequently, detection of send requires that its components (sen, d) be reassembled across this segmentation point. No such delay affects detection of send in sendeb, because no segmentation occurs before a syllable that has a reduced vowel.
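The segmentation strategy these results point to can be stated as a simple rule: posit a word onset before every syllable that contains a full vowel. The following is a minimal Python sketch under an assumed input format (utterances already syllabified, each syllable tagged as full or reduced); the function name is hypothetical, and the sketch models only this heuristic, not the rest of word recognition.

```python
from typing import List, Tuple

# A syllable is (spelling, has_full_vowel); syllabification is assumed to be given.
Syllable = Tuple[str, bool]


def segment_at_full_vowels(syllables: List[Syllable]) -> List[str]:
    """Posit a word onset before every full-vowel syllable.

    A reduced-vowel syllable is attached to the chunk before it; if the input
    starts with a reduced syllable, that syllable opens the first chunk.
    """
    chunks: List[str] = []
    for spelling, full in syllables:
        if full or not chunks:
            chunks.append(spelling)
        else:
            chunks[-1] += spelling
    return chunks


# sendibe: the full vowel of the second syllable triggers a boundary,
# so "send" has to be reassembled across it.
print(segment_at_full_vowels([("sen", True), ("dibe", True)]))  # ['sen', 'dibe']
# sendeb: no boundary before the reduced syllable, so "send" stays in one chunk.
print(segment_at_full_vowels([("sen", True), ("deb", False)]))  # ['sendeb']
```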

Missegmentations of speech show exactly the same pattern. Listeners are far more likely to erroneously assume a stressed syllable to be word-initial and unstressed syllables to be word-internal than vice versa (Cutler and Butterfield 1992). In an experiment with very faint speech, unpredictable sequences such as conduct ascents uphill were reported, for instance, as the doctor sends her bill – every stressed syllable becoming word-initial here. In collections of natural slips of the ear the same pattern can be observed; thus the song line she’s a must to avoid was widely reported in the 1980s to have been heard as she’s a muscular boy, with the stressed last syllable taken as a new word, while the unstressed two syllables preceding it are taken as internal to another word. Jokes about misperception also rely on this natural pattern – an old joke, for instance, had a British Army field telephone communication Send reinforcements, we’re going to advance perceived as Send three-and-fourpence, we’re going to a dance. Once again, stressed syllables have been erroneously assumed to be the beginnings of new words.

This segmentation strategy works well for English and more than compensates for the fact that stress distinctions by themselves do not often distinguish between words. In fact the stress-based segmentation used by English listeners falls in line with strategies used for speech segmentation in other languages, which tend to exploit language-particular rhythmic characteristics. In French and Korean, rhythmic patterns (including poetic patterns) are syllable-based and so is listeners’ speech segmentation (Mehler et al. 1981; Kim, Davis, and Cutler 2008). In Japanese and Telugu, rhythm (again, poetic rhythm too) is based on the mora, a subsyllabic unit, and speech segmentation is mora-based too (Otake et al. 1993; Murty, Otake, and Cutler 2007). English, with its stress-based poetic forms and stress-based speech segmentation, further confirms the cross-language utility of speech rhythm for segmentation (see also Chapter 7 on rhythmic structure in this volume).

Given the acoustic reflections of stress described above, visible in Figure 6.1, English stressed syllables are, of course, more easily perceptible than unstressed syllables. They are easier to identify out of context than are unstressed syllables (Lieberman 1963) and speech distortions are more likely to be detected in stressed than in unstressed syllables (Cole, Jakimik, and Cooper 1978; Cole and Jakimik 1980; Browman 1978; Bond and Garnes 1980). Nonwords with initial stress can be repeated more rapidly than nonwords with final stress (Vitevitch et al. 1997; note that such nonwords are also rated to be more word-like, again indicating listeners’ sensitivity to the vocabulary probabilities).

However, there is a clear bias in how English listeners decide that a syllable is stress-bearing and hence likely to be word-initial; the primary cue is that the syllable contains a full vowel. Fear, Cutler, and Butterfield (1995) presented listeners with tokens of words such as audience, auditorium, audition, addition, in which the initial vowels had been exchanged between words. The participants rated cross-splicings among any of the first three of these as not significantly different from the original, unspliced tokens. Lower ratings were received only by cross-splicings involving an exchange between, for example, the initial vowel of addition (which is reduced) and the initial vowel of any of the other three words. This suggests that preserving the degree of stress (primary stress on the first syllable for audience and secondary stress for auditorium, an unstressed but full vowel for audition) is of relatively little importance compared to preserving the vowel quality (full versus reduced).

In other stress languages, suprasegmental cues to stress can be effectively used to distinguish between words. In Dutch, the first two syllables of OCtopus “octopus” and okTOber “October” differ only suprasegmentally (not in the vowels), and in Spanish, the first two syllables of PRINcipe “prince” and prinCIpio “beginning” likewise differ only suprasegmentally. In both these languages, auditory presentation of a two-syllable fragment (princi-, octo-) significantly assisted subsequent recognition of the matching complete word and significantly delayed subsequent recognition of the mismatching complete word – for example, recognition of principe was slower after hearing prinCI- than after a neutral control stimulus (Soto-Faraco, Sebastián-Gallés, and Cutler 2001; Donselaar, Koster, and Cutler 2005). This delay is important: it shows that the word mismatching the spoken input had been ruled out on the basis of the suprasegmental stress cues. Directly analogous experiments are in fact impossible in English, since the segmental reflections of stress mean that there are no pairs of the right kind in the vocabulary! In English, the second syllables of octopus and October are different, because the unstressed one – in octopus – has a reduced vowel, which is quite different from the stressed vowel in October’s second syllable. However, in some word pairs, the first two syllables differ not in where the stressed syllable is but in what degree of stress it carries; for instance, admi- from ADmiral has primary stress on the first syllable, while admi- from admiRAtion has secondary stress on the first syllable. In the Dutch and Spanish experiments, such fragments had also been used and had duly led to facilitation for match and delay for mismatch.

Cooper, Cutler, and Wales (2002) found a different pattern, however, for such pairs of English words: match facilitated recognition but, crucially, mismatch did not inhibit it, showing that here the suprasegmental information for stress had not been used to rule out the mismatching word. The delay found in Dutch and Spanish is thus not found in English.

We conclude, then, that for English listeners the most important reflections of their language’s stress patterning are the segmental ones. These are drawn on with great efficiency in parsing utterances and recognizing words. The suprasegmental concomitants of the stress variation, in contrast, are to a large degree actually ignored. Direct evidence for this comes from an experiment by Slowiaczek (1991) in which English listeners heard a sentence context (e.g., The friendly zookeeper fed the old) followed by a noise representing a stress pattern (DAdada or daDAda, for example). The listeners then judged whether a spoken word was the correct continuation of the sentence as signaled by the stress pattern. Slowiaczek found that listeners frequently ignored the stress pattern, for instance accepting gorilla as the continuation of this sentence even when the stress pattern had been DAdada, or accepting elephant when the stress pattern had been daDAda. They apparently attended to the meaning only (a contextually unlikely word such as analyst was rejected whether or not the stress pattern matched it).

Slowiaczek (1990) also found that purely suprasegmental mis-stressing of English words (e.g., switching secondary and primary stress, as in STAMpede for stamPEDE) did not affect how well noise-masked words were recognized. This was fully in line with the earlier studies, which had shown that the stress pattern did not help to discriminate minimal stress pairs (Cutler 1986; Small, Simon, and Goldberg 1988) and that mis-stressing English words did not inhibit recognition if no segmental change but only suprasegmental changes were made (Bond 1981; Bond and Small 1983; Cutler and Clifton 1984; see also the section below on mispronunciation of stress).

The English vocabulary does not offer much processing advantage for attention to suprasegmental information; English listeners, therefore, largely concentrate on the cues that do provide rapid recognition results, i.e., the segmental cues. Because English stress has segmental as well as suprasegmental realizations, and the segmental patterns are systematically related to the location of word boundaries, attending chiefly to segmental patterns still allows English listeners to use stress information in segmenting utterances into their component words.

The production of English lexical stress by native speakers

The perceptual evidence does not suggest that speakers adjust suprasegmental parameters separately while articulating English, nor that stress is computed on a word-by-word basis during speech production. Rather, the evidence from perception would be compatible with a view of speech production in which the segmental structure of a to-be-articulated word is retrieved from its stored representation in the mental lexicon, and the metrical pattern of the utterance as a whole is mapped as a consequence of the string of selected words. Exactly such a view is proposed by the leading psycholinguistic modelers of speech production (Levelt 1989; Levelt 1992; Shattuck-Hufnagel 1992; Levelt, Roelofs, and Meyer 1999).

Some relevant evidence comes from slips of the tongue: English native speakers do occasionally make slips in which stress is misplaced (Fromkin 1976; Cutler 1980a). However, it seems that such errors may be an unwanted side-effect of the derivational morphology of English! That is, the errors exhibit a very high likelihood of stress being assigned to a syllable that is appropriately stressed in a morphological relative of the target word. Some examples from published collections of stress errors are: hierARCHy, ecoNOmist, homogeNEous, cerTIFication. These four words should have received primary stress respectively on their first, second, third, and fourth syllables, but the stress has been misplaced. It has not been randomly misplaced, however; it has landed precisely on the syllable that bears it in the intended words’ relatives hierarchical, economics, homogeneity, and certificate respectively.

This pattern suggests, firstly, that words with a derivational morphological relationship are stored in proximity to one another in the speakers’ mental lexicon. This is as one would expect, given that the organization of a production lexicon serves a system in which meaning is activated first, to be encoded via word forms located in the lexical store. Secondly, the stress error facts suggest that the location of primary stress is represented in these stored forms in an abstract way: given the typical patterning of such derivationally related sets of English words, in many cases the mis-stressing led to a vowel change. Again, this makes sense: each word has its canonical segmental structure (sequence of vowels and consonants) represented in the lexicon, and since words may have more than one syllable with a full vowel, an abstract code is needed to indicate which syllable should receive primary stress. In a stress error, the marking assigned to a particular syllable in one word among a group of related entries has accidentally been applied to the same syllable in another word.
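The proposed representation can be made concrete with a small data structure. The sketch below is hypothetical and greatly simplified, not a model taken from the production literature cited here: each entry stores its syllables (spelled orthographically, standing in for the segmental structure) plus a single index marking the syllable that carries primary stress, and a stress error is modelled as applying a relative’s index to the same syllable position in the target. Because the syllables are spelled rather than transcribed, the sketch cannot show the vowel changes that accompany real errors.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LexicalEntry:
    word: str
    syllables: Tuple[str, ...]  # spelled syllables, a stand-in for segmental structure
    primary_stress: int         # abstract code: index of the primary-stressed syllable

    def render(self, stress_index: Optional[int] = None) -> str:
        """Spell the word with the stressed syllable in upper case."""
        idx = self.primary_stress if stress_index is None else stress_index
        return "".join(s.upper() if i == idx else s for i, s in enumerate(self.syllables))


def stress_error(target: LexicalEntry, relative: LexicalEntry) -> str:
    """Apply a morphological relative's stress code to the same position in the target."""
    return target.render(relative.primary_stress)


# Hypothetical entries; syllable divisions are chosen so positions line up across relatives.
economist = LexicalEntry("economist", ("e", "co", "no", "mist"), 1)
economics = LexicalEntry("economics", ("e", "co", "no", "mics"), 2)
certification = LexicalEntry("certification", ("cer", "tif", "i", "ca", "tion"), 3)
certificate = LexicalEntry("certificate", ("cer", "tif", "i", "cate"), 1)

print(economist.render())                        # eCOnomist
print(stress_error(economist, economics))        # ecoNOmist
print(stress_error(certification, certificate))  # cerTIFication
```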

In producing an utterance, then, speakers have to construct an overall smooth contour in which each of the selected words is appropriately uttered and, most importantly, in which the meaning of the utterance as a whole (for instance, the focal emphasis, the expression of a statement or of a question, and the relation of the words in the utterance to the ongoing discourse) is correctly captured. Pitch accents will be applied in accord with the choices driven by such discourse constraints (see Shattuck-Hufnagel and Turk 1996 for much relevant evidence). Remaining in the domain of lexical stress, where the pitch accents fall will be determined by the markings that, within any polysyllabic word, denote the location of primary stress. As already described, only a stressed syllable can be accented in a sentence.

There is considerable evidence that speakers plan a metrical structure for their utterance and that it is based on the alternating rhythm described in the first section above (see, for example, Cummins and Port 1998). English slips of the tongue in which a syllable is accidentally omitted or added tend to lead to a more regular rhythm than the correct utterance would have had (Cutler 1980b), a pattern that is also found in the way syllables are added by optional epenthesis in the rhythmically similar language Dutch (Kuijpers and Donselaar 1998). Experiments in which speakers are asked to read words from a screen or recall arbitrary word pairs have been shown to elicit faster responses when successive words have the same stress pattern (e.g., Roelofs and Meyer 1998 for Dutch and Colombo and Zevin 2009 for Italian); however, careful explorations with such tasks in English by Shaw (2012) have shown that the facilitation – in this language at least – is not due to activation of a stored template of the metrical pattern. Iambic words (detach, lapel, etc.) were read out more rapidly after any repeating stress sequence (iambic: belong, canal, forgive or trochaic: reckon, salad, fidget) than after any varying sequence (salad, belong, reckon or salad, reckon, belong). The facilitated production thus seems to arise here from the predictability of a repeating pattern for articulation. This argues against the metrical pattern of a word in an utterance being a template that is stored as a whole in the lexicon; instead, what is stored is, as suggested above, the segmental structure of the word, along with a code marking the position on which primary stress may fall. All other aspects of a word’s metrical realization in an utterance fall out of the word’s sequence of syllables containing full versus reduced vowels.

Mispronunciation of stress

Although the evidence from slips of the tongue suggests that stress errors will not occur very often (because they tend to involve multisyllabic derivationally complex words with derivationally complex relatives, and such words have a fairly low frequency of occurrence anyway), it is nevertheless interesting to consider what effects mis-stressing would have on the acoustic realization of a word and on how the word is perceived.

The first syllable of any polysyllabic word may be either stressed (with a full vowel) or unstressed (with a reduced vowel). If the correct pronunciation of the initial syllable has a reduced vowel, then a speaker who is mispronouncing has little option but to alter the vowel quality. Mispronouncing any stressed syllable can also involve changing the vowel (either to a reduced vowel or to any other and hence incorrect vowel). We saw that English listeners do not attend much to suprasegmental cues in recognizing words, but they do pay great attention to the pattern of strong and weak vowel realizations (especially in their lexical segmentation). Thus the kind of mispronunciation that alters vowel quality should be one that is highly likely to impede successful recognition of the word by native listeners, and repeated experimental demonstrations in the 1980s confirmed that this is indeed so. The results include:

  • Different kinds of phonetic distortion impact upon word recognition in differing ways, but the most disruptive type of distortion is changing a vowel, and particularly changing a vowel in a stressed syllable (Bond 1981).
  • Shadowing (repeating back) incoming speech is only disrupted by mis-stressing if the mis-stressing involves a change in vowel quality (Bond and Small 1983).
  • Semantic judgments on spoken words are also relatively unaffected by mis-stressing except when the misplacement leads to a vowel quality change (Cutler and Clifton 1984).
  • Any vowel quality change is equally disruptive; the number of distinctive features involved is irrelevant (Small and Squibb 1989).

The reason for this pattern is to be found in how spoken-word recognition works. When a speech signal reaches a listener’s ear, the words that are potentially contained in the incoming utterance automatically become available for consideration by the listener’s mind – a process known as lexical activation. The word “potentially” is important here; frequently it is the case that many more candidate words are fleetingly activated than the utterance actually contains. Consider the utterance: Many vacant shops were demolished. These five words present the listeners with a range of such fleeting possibilities: (a) the first word that is fully compatible with the incoming signal is actually men; (b) by the second syllable, many is also activated, but that second syllable could also combine with the third to make a word beginning eva-, i.e., the utterance might be men evade …; (c) the sequence of the reduced syllable -cant and the syllable shop could be can chop; (d) assuming that were is unstressed, then were plus the unstressed initial de- of demolished is a possible utterance of would a; (e) the stressed syllable of demolished could briefly activate words beginning with that syllable, such as molecule, mollify.

We are usually quite unaware of all such potentially present words in the speech we hear, and of their brief activation, as we rapidly and certainly settle on the correct interpretation of an utterance; but decades of research on spoken-word recognition have shown that this is indeed how this efficient process works (for more detail, see the review by McQueen 2007 or the relevant chapters in Cutler 2012). It is a process in which alternative interpretations of the signal compete with one another, in that the more support any one word receives from the signal, the less likely the other interpretations become. If a candidate word is mismatched by the input, the mismatch has immediate effect and the word is no longer a viable choice; in the above example, men evade becomes an impossible interpretation once the /k/ of many vac- arrives (for relevant spoken-word recognition evidence, see Vitevitch and Luce 1998 and Soto-Faraco, Sebastián-Gallés, and Cutler 2001). Interestingly, the effects of mismatch can be automatically modulated by the listener if background noise suggests that the signal might be unreliable (McQueen and Huettig 2012; Brouwer, Mitterer, and Huettig 2012), but the standard setting is that mismatch instantly counts against mismatched candidates.

Consider therefore what will happen when a word is mispronounced in any way: the input will activate a population of candidate words that may deviate from the set of candidates a correctly pronounced version would have activated. In the worst case, the intended word will not even be included in the activated set.

Because mismatch takes effect immediately, the intended word has the best chance of remaining in the candidate set if it is pronounced correctly from the beginning; the “safest” mispronunciation, so to speak, is therefore one right at the end of a word. This will lead to misrecognition only if the utterance happens to correspond to an existing word – as when speakers of languages with obligatory final devoicing mispronounce finally voiced English words that happen to have a finally unvoiced minimal pair (e.g., saying save as if it were safe or prize as if it were price). In many or even most cases, however, a final mispronunciation will not lead to misrecognition – the target word will have been recognized before the mispronunciation arrives (telephome and ostridge and splendith are fairly easy to reconstruct despite the final mispronunciations of place of articulation, voicing, and manner respectively). The very same mispronunciations in the word-initial position, in contrast – say, motable, jeeky, thrastic – make the words harder to reconstruct even when we see them in writing with the whole word available at once; even then, the wrong beginning throws us off. The spoken form, coming in over time rather than all at once, misleads us even more decisively. In the case of motable, the incoming speech signal could initially call up mow, moat, motor; the input jeeky may call up gee, jeep, jeans; and thrastic may call up three, thread, thrash. That is, the sets of lexical candidates will at first not even include notable, cheeky, or drastic, and the chance of finding them as the intended word depends, firstly, on the eventual realization that none of the activated word candidates actually matches the signal, followed, secondly, by a decision, perhaps by trial and error, that the offending mispronunciation is in the initial phoneme.
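The contrast between late and early mispronunciations can be illustrated with a deliberately simplified, cohort-style sketch in Python. It rests on several assumptions not made in the text above: letters stand in for phonemes, candidates are aligned only with the start of the input, and any mismatching symbol removes a candidate at once (the standard setting described earlier, ignoring modulation by noise). The mini-lexicon and the function names are hypothetical.

```python
from typing import List, Optional, Set

# Hypothetical mini-lexicon; spellings stand in for phonemic transcriptions.
LEXICON = ["telephone", "telepathy", "television", "tell",
           "notable", "note", "motor", "moat", "mow"]


def candidate_trace(heard: str, lexicon: List[str]) -> List[Set[str]]:
    """Candidate set after each successive input symbol.

    A word stays active only while everything heard so far is a prefix of it,
    so a single mismatching symbol removes it immediately.
    """
    return [{w for w in lexicon if w.startswith(heard[:t])}
            for t in range(1, len(heard) + 1)]


def unique_point(heard: str, target: str, lexicon: List[str]) -> Optional[int]:
    """1-based position at which the target is the only remaining candidate, if ever."""
    for t, cands in enumerate(candidate_trace(heard, lexicon), start=1):
        if cands == {target}:
            return t
    return None


# Final mispronunciation: "telephone" is the sole candidate after "teleph",
# before the erroneous "m" (position 8) arrives.
print(unique_point("telephome", "telephone", LEXICON))  # 6
# Initial mispronunciation: "notable" never enters the candidate set at all.
print(unique_point("motable", "notable", LEXICON))      # None
```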

Mis-stressing can cause similar difficulty for the listener whenever it affects the segments that make up the word, that is, whenever a vowel is changed. Mis-stressing will not cause difficulty if it involves suprasegmentals only, e.g., when secondary and primary stresses are interchanged; as the early research mentioned above has shown, mis-stressed words in which the vowels are unchanged (e.g., stampede pronounced as STAMpede) are recognized easily. However, such mis-stressing can only happen in words with two full vowels (like stampede), and, though words of this type can readily be found for experimental purposes, there are in fact not many of them and they do not occur often in real speech. Stress and vowel realization are so tightly interwoven in the English lexicon, and the lexicon is so strongly biased towards short words and towards words with initial stress, that the most common word type in the vocabulary is a bisyllable with a full vowel in the first syllable and a weak vowel in the second (e.g., common, vowel, second). Real speech actually contains a majority of monosyllables (where the possibility of mis-stressing does not arise), because the shortest words in the vocabulary are the ones used most frequently. As described in the section above on the production of English lexical stress by native speakers, the polysyllabic words in real speech conform even more strongly to the preferred patterns than does the vocabulary as a whole. In other words, where there is opportunity for mis-stressing in real speech, it is most likely to involve a word with stress on its initial syllable and a reduced vowel in its unstressed syllable(s). Thus, in most cases, mis-stressing will involve a vowel change and will therefore make the word hard for listeners to recognize.

Consider some examples and the consequent activated lexical candidates. Again, the rule holds that early effects of mis-stressing are more harmful to recognition than later effects. Common with stress shifted to the second syllable and a reduced first vowel could initially activate a large set of words with unstressed initial com-, such as commodity, commit, commercial, and so on. Mis-stressed borrow could similarly activate initially unstressed words such as barometer, baronial, or bereft. The intended word would not be among the listeners' cohort of initially activated lexical candidates. Moreover, English listeners' tendency to assume that stressed syllables are word-initial could result in temporary activation of word candidates beginning with the erroneously stressed second syllables -mon and -row, for example, monitor or rowing.

Analogous problems arise with a shift of stress in a word that, correctly spoken, would have a reduced vowel in the first syllable. Thus mis-stressed October would activate octopus, octave, octane (and for listeners from some dialect areas, such as the author’s own Australian English, auction, okra, and ocker as well). Mis-stressed addition will activate additive, addle, adder, or adamant. Once again, in each case the initially activated set of candidate words contains a misleading array of words unrelated to what the speaker intended to say.

Finally, serious confusion will also arise even with an error in which the stress is correctly assigned but a reduced vowel is produced as a full vowel: delay in which the first syllable is compatible with that of decent or dealer, number in which the second syllable is compatible with the beginning of burning or birthday. Once again, the English listener’s overlearned tendency to treat every full syllable as a potential word onset will result in two sets of lexical candidates where, with correct pronunciation, there should have been just one. Given the role that vowel reduction plays in stress realization, such mispronunciations are indeed errors of stress.
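The listener behaviour described here, treating every full syllable as a potential word onset, can be caricatured in a few lines. This is a toy illustration loosely in the spirit of the strong-syllable segmentation account (Cutler and Norris 1988), not an implementation of any published model. The assumptions are stated loudly: syllables are supplied already parsed, each is simply tagged as full or reduced, the labels are rough hypothetical spellings, and the helper names are invented for this sketch.

```python
# A syllable is (label, has_full_vowel); labels are rough, hypothetical spellings.


def onsets(syllables):
    """Positions where the listener starts a lexical search: the full-vowel syllables."""
    return [i for i, (_, full) in enumerate(syllables) if full]


def launched_searches(syllables):
    """One lexical search per full syllable, covering the input from that point on."""
    labels = [label for label, _ in syllables]
    return [".".join(labels[i:]) for i in onsets(syllables)]


correct = [("num", True), ("b@", False)]        # NUMber with a weak second vowel
mispronounced = [("num", True), ("ber", True)]  # second vowel left unreduced

print(launched_searches(correct))        # ['num.b@']
print(launched_searches(mispronounced))  # ['num.ber', 'ber']
```

With the correct weak second vowel only one candidate set is launched; the unreduced second vowel launches a second set beginning at ber, where candidates such as burning or birthday can compete.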

All such mis-stressings will, then, certainly delay recognition of the intended word. They need not rule it out; we do usually work out what people mean when they make a slip of the tongue, or when part of what they have said is inaudible. Indeed, mispronunciations of vowels are actually easier for listeners to recover from than mispronunciations of consonants (Cutler et al. 2000). This is because, in running speech, vowels are influenced by the consonants that abut them to a greater extent than consonants are influenced by adjacent vowels, and this asymmetry has led listeners to build up experience with having to alter initial decisions about vowels more often than initial decisions about consonants. (The ability to adjust decisions about vowels is also, of course, handy in dealing with speakers from other dialectal areas, given that, in English, vowels are the principal carriers of dialectal variation. Not all speakers of English have the same vowel in the first syllable of auction, okra, and octave; see the previous section on the production of English lexical stress by native listeners for far more on this topic.) Mis-stressing that includes mispronunciation of a vowel will activate an initial set of word candidates in which the intended word is not included, and further processing of the incoming speech will probably fail to produce a matching interpretation. The listener will have to reset the vowel interpretation and reanalyze; thus recognition will be delayed.

It is also significant that when native English speakers make slips of the tongue that shift stress, the result is most likely to activate a word that is very closely related to the intended word (certificate instead of certification, and so on). The effect will be to make some aspects of the relevant meaning accessible anyway, and reanalysis is likely to be far swifter in such a case.

Lexical stress and non-native use of English

Both the production and perception of English lexical stress can pose problems, directly or indirectly, for the non-native user. In speech production, non-native users whose native phonology has no distinctions of stress face the challenge of pronouncing English stress in a native-like manner. In fact, even learners whose native language has stress, but realizes it in a different way from English, can be challenged by this task, whether the native language in question has fixed stress placement or has lexical stress that is realized purely suprasegmentally (see, for example, Archibald 1997; Guion, Harada, and Clark 2004; Peperkamp and Dupoux 2002). Indeed, even when both languages have suprasegmental and segmental reflections of stress, they can differ in the relative strength of stress realization in each dimension, which can again complicate the acquisition of accurate pronunciation (Braun, Lemhöfer, and Mani 2011).

As the evidence summarized in the second section of this chapter makes clear, however, the most important production challenge that English lexical stress poses for the non-native user is actually a segmental one. English native listeners pay attention to whether vowels are full or reduced and use this information not only to identify words but also to segment running speech into its component words. The primary challenge, therefore, is to avoid uttering a full vowel where the target utterance requires a reduced vowel, since this (as laid out in the previous section on mispronunciation of stress) is exactly what will mislead native listeners and potentially cause them to make inappropriate assumptions about where word boundaries are located. (Thus if the word target is uttered with correctly placed stress on the initial syllable, but with the second syllable unreduced, so that it sounds like get, it is liable to be perceived as two words rather than one; the same will happen if the word utterance is correctly stressed on its first syllable but either its second or its third syllable is not reduced.) Non-native speakers of English from a variety of language backgrounds do indeed produce full vowels where reduced vowels would be called for (e.g., Fokes and Bond 1989; Zhang, Nissen, and Francis 2008), and native listeners' comprehension is indeed affected by this. Braun, Lemhöfer, and Mani (2011) had British English and Dutch talkers produce English words with an unstressed initial syllable, such as absurd and polite, and used these in a word recognition task like those of Soto-Faraco, Sebastián-Gallés, and Cutler (2001) and others described in the second section above. Auditory presentation of the initial syllables (e.g., ab-) of the native talkers' productions significantly assisted British English listeners' subsequent recognition of the matching complete words; the initial syllables from the Dutch talkers' productions (much less reduced than the native talkers' syllables) did not facilitate word recognition at all.

The stress production picture has another side, however, that is also shown by the evidence documented in the previous section; if a non-native user of English incorrectly assigns stress (without altering the pattern of full and reduced vowels), this may not even be noticed by native listeners, and in any case is unlikely to cause them comprehension problems. (Primary stress should fall on the first syllable of SUMmarise and on the third syllable of inforMAtion, but the evidence from the studies of mis-stressing suggests that listeners will also succeed in identifying summaRISE or INformation, with the correct vowels but misplaced primary stress location.)

In perception, non-native listeners will bring to speech input all the useful strategies that long experience with their native language has encouraged them to develop (Cutler 2012). These may or may not match the listening strategies encouraged by the probabilities of English; where they do not match, they will generate speech perception difficulty unless listeners can succeed in inhibiting their use. At the word recognition level, such perceptual problems fall into three principal groups: pseudo-homophony, spurious word activation, and temporary ambiguity.

Pseudo-homophones are words that are distinguished by some contrast that a non-native listener does not perceive: If English /r/ and /l/ cannot be distinguished, then wrap and lap become homophones. Pseudo-homophones are not a serious problem for the non-native listener (or indeed for native listeners processing non-native pronunciation), simply because, as discussed in the second section above, every language contains many homophones and all listeners have to be able to understand them by choosing the interpretation appropriate to the context. There is no way to understand the utterances It’s a mail and It’s a male except in relation to the discourse context. Given the extent of homophony in the English vocabulary, the number of homophones added by any one misperceived phonemic contrast is trivial (Cutler 2005b). Stress minimal pairs are especially rare; for a non-native listener who cannot hear a stress difference in INsight versus inCITE, these words will become homophones, but as we saw, they are effectively homophones for native listeners too (Cutler 1986; Small, Simon, and Goldberg 1988).
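The kind of count behind this argument (how many homophones a single misperceived contrast adds to a vocabulary) can be sketched as follows. The lexicon and its rough ASCII transcriptions are hypothetical and far too small to say anything about the real proportions, which the chapter attributes to Cutler (2005b); the sketch only shows how such a count could be computed, and `homophone_sets` is an invented helper.

```python
from collections import defaultdict

# Hypothetical rough ASCII transcriptions for a toy lexicon.
LEXICON = {
    "wrap": ("r", "a", "p"),
    "lap":  ("l", "a", "p"),
    "rock": ("r", "o", "k"),
    "lock": ("l", "o", "k"),
    "mail": ("m", "ei", "l"),
    "male": ("m", "ei", "l"),
    "cat":  ("k", "a", "t"),
}


def homophone_sets(lexicon, merge=None):
    """Groups of words whose transcriptions are identical once `merge` is applied.

    `merge` maps a phoneme to the category a listener hears it as, e.g.
    {"r": "l"} for a listener who does not distinguish /r/ and /l/.
    """
    merge = merge or {}
    groups = defaultdict(set)
    for word, phones in lexicon.items():
        heard = tuple(merge.get(p, p) for p in phones)
        groups[heard].add(word)
    return [words for words in groups.values() if len(words) > 1]


native = homophone_sets(LEXICON)
non_native = homophone_sets(LEXICON, merge={"r": "l"})
print(len(native), len(non_native))  # 1 3
```

Run over a full pronouncing dictionary, the difference between the two counts would estimate the extra homophony caused by one merger, to be weighed against the homophony every listener already copes with (mail and male here).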

Spurious lexical activation and prolonged ambiguity are more serious problems. The first occurs when embedded "phantom words" are activated for the non-native listener and produce competition that native listeners are not troubled by; to stay with the /r/-/l/ contrast, an example is competition from leg in regular. Such extra activation and competition has been abundantly demonstrated in non-native listening (Broersma 2012; Broersma and Cutler 2008, 2011). The second occurs when competition is resolved later for the non-native than for the native listener (e.g., register is distinguished from legislate only on the sixth phoneme, rather than on the first). This phenomenon has also been extensively documented (Cutler, Weber, and Otake 2006; Weber and Cutler 2004). Misperception of lexical stress by non-native users could in principle lead to similar increases in competition, for example if the listener's native-language expectations assume that stress placement is fixed and lexical candidates happen to match part of the input under that assumption (thus, while native listeners would segment that's likely to boomerang on you at the stressed boo-, an expectation of final stress might lead to activation of taboo and meringue). Such issues have not yet been investigated with the empirical techniques for examining lexical competition.
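Both effects can be made concrete by comparing transcriptions as heard with and without a merged contrast. In the sketch below the transcriptions are rough, hypothetical ASCII approximations, the helper names are invented, and a "phantom word" is reduced to the simplest case of a word matching the onset of a longer one; this illustrates the logic only, not the experimental measures used in the studies cited.

```python
# Hypothetical rough ASCII transcriptions.
REGISTER = ("r", "e", "dj", "i", "s", "t", "@")
LEGISLATE = ("l", "e", "dj", "i", "s", "l", "ei", "t")
REGULAR = ("r", "e", "g", "j", "u", "l", "@")
LEG = ("l", "e", "g")


def heard_as(phones, merge):
    """Transcription as perceived by a listener who merges some contrasts."""
    return tuple(merge.get(p, p) for p in phones)


def divergence_point(a, b, merge=None):
    """1-based position of the first phoneme at which a and b differ for this listener.

    Returns None if the two are indistinguishable throughout.
    """
    merge = merge or {}
    a, b = heard_as(a, merge), heard_as(b, merge)
    for i, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return i
    # One transcription is a prefix of the other: they part where the shorter one ends.
    return None if len(a) == len(b) else min(len(a), len(b)) + 1


def embeds_at_onset(word, carrier, merge=None):
    """True if `word` matches the beginning of `carrier` for this listener."""
    merge = merge or {}
    w, c = heard_as(word, merge), heard_as(carrier, merge)
    return c[:len(w)] == w


rl = {"r": "l"}  # a listener who hears /r/ as /l/
print(divergence_point(REGISTER, LEGISLATE))      # 1
print(divergence_point(REGISTER, LEGISLATE, rl))  # 6
print(embeds_at_onset(LEG, REGULAR), embeds_at_onset(LEG, REGULAR, rl))  # False True
```

For the native listener register and legislate part company at the first phoneme; with /r/ and /l/ merged they stay ambiguous until the sixth, and leg becomes a phantom competitor at the onset of regular.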

In perception as in production, however, the literature again suggests that there is a second side to the non-native stress story. A non-native user whose first language encourages attention to suprasegmental cues to stress could apply the fruits of this language experience to English; even though English listeners do not use such cues, English speakers certainly provide them (Cutler et al. 2007). Indeed, in judging the stress level of excised or cross-spliced English syllables, native speakers of Dutch (whose language requires attention to suprasegmental stress cues) consistently outperform native English listeners (Cooper, Cutler, and Wales 2002; Cutler 2009; Cutler et al. 2007). Although the English vocabulary does not deliver sufficient lexical payoff for native listeners to exploit the suprasegmental cues to stress, it is conceivable that non-native listeners who are able to use them could thereby derive some compensation for the competition increases caused by other listening shortcomings.

Conclusion

In phonology, lexical stress in English is encoded to a significant extent in the segmental patterning of a word. It does not act principally to distinguish one word from another, but it does provide highly useful cues to listeners as to where word boundaries are located in speech signals. In speech production, pronunciation of English lexical stress is thus a multi-dimensional exercise: the segmental sequence is produced along with a code for a primary stress location, which is used in computing the metrical pattern of the utterance as a whole. In speech perception, listeners attend primarily to the segmental sequence in identifying words and use the rhythmic patterning of full and reduced vowels to segment speech.

For the non-native speaker of English, the pronunciation patterns described in this chapter, and their perceptual consequences, potentially present both good news and bad. The good news is that stress errors that are purely suprasegmental may be uttered with impunity, as English listeners hardly attend to suprasegmental patterning. The bad news is that any stress error resulting in a mispronounced vowel – and most stress errors do have this effect – will throw the native listener into mis-segmentation and at least temporary lexical confusion.

Acknowledgments

Thanks to Janise Farrell for comments on an earlier draft of the text.

REFERENCES

  1. Archibald, J. 1997. The acquisition of English stress by speakers of non-accentual languages: lexical storage versus computation of stress. Linguistics 35: 167–181.
  2. Bolinger, D. 1981. Two Kinds of Vowels, Two Kinds of Rhythm, Indiana University Linguistics Club.
  3. Bond, Z.S. 1981. Listening to elliptic speech: pay attention to stressed vowels. Journal of Phonetics 9: 89–96.
  4. Bond, Z.S. and Garnes, S. 1980. Misperceptions of fluent speech. In: Perception and Production of Fluent Speech, R.A. Cole (ed.), 115–132. Hillsdale, NJ: Erlbaum.
  5. Bond, Z.S. and Small, L.H. 1983. Voicing, vowel and stress mispronunciations in continuous speech. Perception and Psychophysics 34: 470–474.
  6. Braun, B., Lemhöfer, K., and Mani, N. 2011. Perceiving unstressed vowels in foreign-accented English. Journal of the Acoustical Society of America 129: 376–387.
  7. Broersma, M. 2012. Increased lexical activation and reduced competition in second-language listening. Language and Cognitive Processes 27: 1205–1224.
  8. Broersma, M. and Cutler, A. 2008. Phantom word recognition in L2. System: An International Journal of Educational Technology and Applied Linguistics 36: 22–34.
  9. Broersma, M. and Cutler, A. 2011. Competition dynamics of second-language listening. Quarterly Journal of Experimental Psychology 64: 74–95.
  10. Brouwer, S., Mitterer, H., and Huettig, F. 2012. Speech reductions change the dynamics of competition during spoken word recognition. Language and Cognitive Processes 27: 539–571.
  11. Browman, C.P. 1978. Tip of the tongue and slip of the ear: implications for language processing. UCLA Working Papers in Phonetics 42: i–149.
  12. Cole, R.A. and Jakimik, J. 1980. How are syllables used to recognize words? Journal of the Acoustical Society of America 67: 965–970.
  13. Cole, R.A., Jakimik, J., and Cooper, W.E. 1978. Perceptibility of phonetic features in fluent speech. Journal of the Acoustical Society of America 64: 44–56.
  14. Colombo, L. and Zevin, J.D. 2009. Stress priming in reading and the selective modulation of lexical and sub-lexical pathways. PLoS ONE 4: e7219.
  15. Cooper, N., Cutler, A., and Wales, R. 2002. Constraints of lexical stress on lexical access in English: evidence from native and nonnative listeners. Language and Speech 45: 207–228.
  16. Cummins, F. and Port, R.F. 1998. Rhythmic constraints on stress timing in English. Journal of Phonetics 26: 145–171.
  17. Cutler, A. 1980a. Errors of stress and intonation. In: Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen and Hand, V.A. Fromkin (ed.), 67–80, New York: Academic Press.
  18. Cutler, A. 1980b. Syllable omission errors and isochrony. In: Temporal Variables in Speech, H.W. Dechert and M. Raupach (eds.), 183–190, The Hague: Mouton.
  19. Cutler, A. 1986. Forbear is a homophone: lexical prosody does not constrain lexical access. Language and Speech 29: 201–220.
  20. Cutler, A. 2005a. Lexical stress. In: The Handbook of Speech Perception, D.B. Pisoni and R.E. Remez (eds.), 264–289, Oxford: Blackwell.
  21. Cutler, A. 2005b. The lexical statistics of word recognition problems caused by L2 phonetic confusion. In: Proceedings of INTERSPEECH 2005, Lisbon, September, 413–416.
  22. Cutler, A. 2009. Greater sensitivity to prosodic goodness in non-native than in native listeners. Journal of the Acoustical Society of America 125: 3522–3525.
  23. Cutler, A. 2012. Native Listening: Language Experience and the Recognition of Spoken Words, Cambridge, MA: MIT Press.
  24. Cutler, A. and Butterfield, S. 1992. Rhythmic cues to speech segmentation: evidence from juncture misperception. Journal of Memory and Language 31: 218–236.
  25. Cutler, A. and Carter, D.M. 1987. The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language 2: 133–142.
  26. Cutler, A. and Clifton, C. 1984. The use of prosodic information in word recognition. In: Attention and Performance X: Control of Language Processes, H. Bouma and D.G. Bouwhuis (eds.), 183–196, Hillsdale, NJ: Erlbaum.
  27. Cutler, A. and Norris, D. 1988. The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance 14: 113–121.
  28. Cutler, A., Norris, D., and Sebastián-Gallés, N. 2004. Phonemic repertoire and similarity within the vocabulary. In: Proceedings of the 8th International Conference on Spoken Language Processing, S.H. Kim and M.Jin Bae (eds.), vol. 1, 65–68, Seoul: Sunjin Printing Co. (CD-ROM).
29. Cutler, A. and Pasveer, D. 2006. Explaining cross-linguistic differences in effects of lexical stress on spoken-word recognition. In: Proceedings of the 3rd International Conference on Speech Prosody, R. Hoffmann and H. Mixdorff (eds.), 237–240, Dresden: TUDpress.
  30. Cutler, A., Weber, A., and Otake, T. 2006. Asymmetric mapping from phonetic to lexical representations in second-language listening. Journal of Phonetics 34: 269–284.
31. Cutler, A., Sebastián-Gallés, N., Soler-Vilageliu, O., and Ooijen, B. van. 2000. Constraints of vowels and consonants on lexical selection: cross-linguistic comparisons. Memory and Cognition 28: 746–755.
  32. Cutler, A., Wales, R., Cooper, N., and Janssen, J. 2007. Dutch listeners’ use of suprasegmental cues to English stress. In: Proceedings of the Sixteenth International Congress of Phonetic Sciences, Saarbrücken, 1913–1916.
  33. Donselaar, W. van, Koster, M., and Cutler, A. 2005. Exploring the role of lexical stress in lexical recognition. Quarterly Journal of Experimental Psychology 58A: 251–273.
  34. Fear, B.D., Cutler, A., and Butterfield, S. 1995. The strong/weak syllable distinction in English. Journal of the Acoustical Society of America 97: 1893–1904.
  35. Fokes, J. and Bond, Z.S. 1989. The vowels of stressed and unstressed syllables in nonnative English. Language Learning 39: 341–373.
  36. Fromkin, V.A. 1976. Putting the emPHAsis on the wrong sylLABle. In: Studies in Stress and Accent, L. Hyman (ed.), 15–26, Los Angeles, CA: University of Southern California.
37. Grainger, J., Van Kang, M.N., and Segui, J. 2001. Cross-modal repetition priming of heterographic homophones. Memory and Cognition 29: 53–61.
  38. Guion, S.G., Harada, T., and Clark, J.J. 2004. Early and late Spanish–English bilinguals’ acquisition of English word stress patterns. Bilingualism: Language and Cognition 7: 207–226.
  39. Kim, J., Davis, C., and Cutler, A. 2008. Perceptual tests of rhythmic similarity: II. Syllable rhythm. Language and Speech 51: 342–358.
  40. Kuijpers, C. and Donselaar, W. van. 1998. The influence of rhythmic context on schwa epenthesis and schwa deletion. Language and Speech 41: 87–108.
  41. Ladefoged, P. 2006. A Course in Phonetics, Boston, MA: Thomson/Wadsworth.
  42. Lehiste, I. 1970. Suprasegmentals, Cambridge, MA: MIT Press.
  43. Lehiste, I. and Peterson, G. 1959. Vowel amplitude and phonemic stress in American English. Journal of the Acoustical Society of America 31: 428–435.
  44. Levelt, W.J.M. 1989. Speaking: From Intention to Articulation, Cambridge, MA: MIT Press.
  45. Levelt, W.J.M. 1992. Accessing words in speech production: Stages, processes and representations. Cognition 42: 1–22.
  46. Levelt, W.J.M., Roelofs, A., and Meyer, A.S. 1999. A theory of lexical access in speech production. Behavioral and Brain Sciences 22: 1–38.
  47. Lieberman, P. 1963. Some effects of semantic and grammatical context on the production and perception of speech. Language and Speech 6: 172–187.
  48. Liberman, M. and Prince, A. 1977. On stress and linguistic rhythm. Linguistic Inquiry 8: 249–336.
  49. McQueen, J.M. 2007. Eight questions about spoken-word recognition. In: The Oxford Handbook of Psycholinguistics, M.G. Gaskell (ed.), 37–53, Oxford: Oxford University Press.
  50. McQueen, J.M. and Huettig, F. 2012. Changing only the probability that spoken words will be distorted changes how they are recognized. Journal of the Acoustical Society of America 131: 509–517.
  51. Mehler, J., Dommergues, J.-Y., Frauenfelder, U., and Segui, J. 1981. The syllable's role in speech segmentation. Journal of Verbal Learning and Verbal Behavior 20: 298–305.
  52. Murty, L., Otake, T., and Cutler, A. 2007. Perceptual tests of rhythmic similarity: I. Mora rhythm. Language and Speech 50: 77–99.
  53. Otake, T., Hatano, G., Cutler, A., and Mehler, J. 1993. Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language 32: 358–378.
54. Peperkamp, S. and Dupoux, E. 2002. A typological study of stress "deafness". In: Papers in Laboratory Phonology VII, C. Gussenhoven and N.L. Warner (eds.), 203–240, Berlin: De Gruyter.
  55. Piantadosi, S.T., Tily, H., and Gibson, E. 2012. The communicative function of ambiguity in language. Cognition 122: 280–291.
  56. Roelofs, A. and Meyer, A.S. 1998. Metrical structure in planning the production of spoken words. Journal of Experimental Psychology: Learning, Memory, and Cognition 24: 922–939.
  57. Shattuck-Hufnagel, S. 1992. The role of word structure in segmental serial ordering. Cognition 42: 213–259.
  58. Shattuck-Hufnagel, S., Ostendorf, M., and Ross, K. 1994. Stress shift and early pitch accent placement in lexical items in American English. Journal of Phonetics 22: 357–388.
59. Shattuck-Hufnagel, S. and Turk, A.E. 1996. A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research 25: 193–247.
  60. Shaw, J.A. 2012. Metrical rhythm in speech planning: priming or predictability. In: Proceedings of the 14th Australasian International Conference on Speech Science and Technology (SST), Sydney, Australia.
  61. Slowiaczek, L.M. 1990. Effects of lexical stress in auditory word recognition. Language and Speech 33: 47–68.
  62. Slowiaczek, L.M. 1991. Stress and context in auditory word recognition. Journal of Psycholinguistic Research 20: 465–481.
  63. Small, L.H. and Squibb, K.D. 1989. Stressed vowel perception in word recognition. Perceptual and Motor Skills 68: 179–185.
  64. Small, L.H., Simon, S.D., and Goldberg, J.S. 1988. Lexical stress and lexical access: homographs versus nonhomographs. Perception and Psychophysics 44: 272–280.
  65. Soto-Faraco, S., Sebastián-Gallés, N., and Cutler, A. 2001. Segmental and suprasegmental mismatch in lexical access. Journal of Memory and Language 45: 412–432.
  66. Vitevitch, M.S. and Luce, P.A. 1998. When words compete: levels of processing in perception of spoken words. Psychological Science 9: 325–329.
  67. Vitevitch, M.S., Luce, P.A., Charles-Luce, J., and Kemmerer, D. 1997. Phonotactics and syllable stress: implications for the processing of spoken nonsense words. Language and Speech 40: 47–62.
  68. Weber, A. and Cutler, A. 2004. Lexical competition in non-native spoken-word recognition. Journal of Memory and Language 50: 1–25.
  69. Zhang, Y., Nissen, S.L., and Francis, A.L. 2008. Acoustic characteristics of English lexical stress produced by native Mandarin speakers. Journal of the Acoustical Society of America 123: 4498–4513.

NOTES
