MARILYN MAY VIHMAN
The earliest publications to address phonological development were diary studies by European scholars. These culminated in Jakobson’s attempt to build a grand model of the “universal and constant laws” that might govern the process (Jakobson 1949: 378). English played only a small part in these theoretical beginnings. However, in the past 40 years of intensive acquisition research inspired by Chomsky’s (1965) strong nativist claims, data from children acquiring English have heavily dominated the field. This makes it particularly interesting to ask what the specific characteristics of English phonology are from a developmental point of view, since English has implicitly served as a kind of general model for acquisition (see the “universal tendencies … or constraints” proposed by Smith (1973: 206) on the basis of his generative-rule-based study of his son Amahl’s acquisition of English).
Fortunately, cross-linguistic studies of both perceptual processing and early word production have become so much more common in the past 10 or 20 years that it is now possible to place the acquisition of English in a broader framework, in which the pervasive individual differences across children can be weighed against the typological evidence to identify those aspects of the ambient language that most clearly affect early infant language development. At the same time, such a framework allows us to separate out the “universal” elements (like those that concerned Jakobson, still embedded in markedness ideas today: see Kager 1999; Kager, Pater, and Zonneveld 2004, but with the advantage of a far more extensive database than was available earlier). It also allows us to consider the patterns of English in relation to perceptual and motoric aspects of infant development more generally.
English was long taken as the model “stress-timed” language, classically contrasted with the “syllable-timing” of languages like French or Spanish (Pike 1945; Abercrombie 1967). However, empirical studies have failed to identify any solid basis for this persistent two-way typology (see Dauer 1983). More recently, an approach to quantifying rhythm class along a continuum has been widely adopted instead (Ramus, Nespor, and Mehler 1999; Grabe and Low 2002; White and Mattys 2007 – but see now Arvaniti’s (2012) thorough-going questioning of these methods). English continues to serve as the most characteristic language at the stress-timing end of the continuum.
This characterization of English is relevant here because infants show, from birth, a sensitivity to native-language rhythms, grounded in their pre-natal auditory experience with the sound of speech as filtered through the amniotic surround in the last trimester, when the auditory system is complete (Lecanuet 1993). Newborns prefer the native language in experimental studies (Cooper and Aslin 1994; Mehler et al. 1988) and also distinguish non-native languages, but only if they differ in rhythm class (Nazzi, Bertoncini, and Mehler 1998). Importantly, it is not rhythmic differences alone that appear to support these infant responses but some combination of rhythm with other prosodic properties or with the characteristic phonotactic patterning of the language (Ramus 2002). The conclusion that we may draw from these studies is that the enduring characterization of English as “stress-timed” is best thought of as resulting not only from its lexically meaningful use of strong stress and concomitant vowel reduction but also from its inclusion of complex and varied syllables, with their codas, clusters, and diphthongs (Laver 1994: ch. 16.6). As we shall see, most of these elements are challenging to infant learners.
Along with the language-specific experience of rhythm, infants are well equipped, in the first months of life, to discriminate segmental contrasts. This can be seen as one of the biological foundations for language learning, although it is specific neither to language nor to humans (Jusczyk 1997; Vihman 2014). It is now well established, however, that this early ability fades quite rapidly with exposure to a particular language, resulting in infants already being more responsive by the end of the first year to the differences between phonemes contrasted in their own language than to unfamiliar contrasts (Werker and Tees 1984). The mechanism behind the phenomenon now known as “perceptual narrowing” (Lewkowicz 2011; Maurer and Werker 2014) remained unexplained for some 20 years. Since the mid-1990s, however, the importance of distributional or statistical learning in infancy has been intensively studied, mainly through the experimental use of artificial languages; this has led to the hypothesis that it is experience with the bimodal distribution of variants in the input (which results from the existence of a phonological contrast) that maintains in the infant listener the ability to discriminate, while essentially unimodal (or unstructured) distribution of phones not supported by phonological contrast does not (Maye, Werker, and Gerken 2002; see also Anderson, Morgan, and White 2003).
Dramatic perception-related advances have been shown experimentally to occur in the first year, especially from 6 to 9 months. At the earlier age, infants exposed to English show a familiarity preference for listening to their own language only when the contrasting language is prosodically distinct. Specifically, American infants attend longer to an English word list when contrasted with Norwegian but not when contrasted with Dutch; by 9 months American infants listen longer to English in comparison with Dutch as well, demonstrating an advance in familiarity with the segmental level of speech at which the differences between English and Dutch become apparent (Jusczyk et al. 1993). Both the earlier preferential response based on the prosody of English and the later response based on the common segmental patterns of English are presumably the outcome of distributional learning based on consistent exposure to input that demonstrates these ambient language characteristics.
Further advances suggest the same kind of implicit learning. At 9 but not at 6 months, infants learning English prefer to listen to the more common strong–weak or trochaic pattern of English disyllabic words rather than to the less common weak–strong or iambic pattern (Jusczyk, Cutler, and Redanz 1993) and to common rather than uncommon (but nevertheless permissible) phonotactic sequences (Jusczyk, Luce, and Charles-Luce 1994). Similarly, by 9 months infants distinguish commonly occurring within-word consonant clusters from those that occur only between words, at the same time demonstrating an expectation of word-initial stress that leads them to associate within-word clusters with the typical English strong–weak lexical pattern (Mattys et al. 1999). Further evidence of the effect of the dominant trochaic pattern is seen in word-learning experiments in which infants familiarized with trochaic nonwords recognize these in (or “segment” them out from) a short passage by age 7.5 months, whereas familiarization with iambic nonwords leads to segmentation only in infants three months older (Jusczyk, Houston, and Newsome 1999).
It has so far proven impossible to replicate this last study with infants as young as 7.5 months exposed to other languages. The ability to segment (unfamiliar) disyllabic words trained in the laboratory has been demonstrated for Dutch only by 9 months (Kuijpers et al. 1998) and for French only considerably later – by 12–16 months (Nazzi et al. 2006), although monosyllables familiarized in the laboratory are segmented by 8 months in English (Jusczyk and Aslin 1995), French (Gout 2001, as cited in Nazzi et al. 2006) and German (Höhle and Weissenborn 2003). Most strikingly, even infants exposed to British rather than American English have proven unable to recognize trained disyllabic words in passages in two British labs (DePaolis et al. 2012). Since in this case differences between the languages, or rather dialects, would seem insufficient to account for the failure to replicate the findings of Jusczyk, Houston, and Newsome (1999), the explanation seems likely to involve differences in the extent of prosodic modulation or “exaggeration” in speech to infants in the two cultures (see Fernald et al. 1989), an account that receives further support from the fact that infants exposed to Canadian French – where North American cultural preferences for highly modulated “baby talk” may also be seen – show the familiarization effect for disyllables as early as American infants (Polka and Sundara 2012).
The experimental studies reviewed above provide clear evidence of advances, over the first year, in familiarity with the prosodic and segmental patterns of the ambient language. However, infants also begin to gain familiarity with the form of particular lexical items over this period. The very first word forms to be recognized, not surprisingly, are those that refer to the central characters in the infant’s life – the infant himself (Mandel, Jusczyk, and Pisoni 1995) and his caretakers (Bortfeld et al. 2005). The evidence from American studies places knowledge of such names as early as four to six months although British studies have been unable to replicate the findings (Vihman and Keren-Portnoy 2013a).
In a separate line of study, infants have been found to show a robust ability to recognize untrained words familiar from everyday exposure and presented in word lists by 11 months, but not earlier (Vihman et al. 2004 (with British infants)). Manipulations of the forms of these common words have established that infant recognition is based particularly on the shape of the accented syllable – in English, on the word-initial consonant specifically (Vihman et al. 2004). In a follow-up study DePaolis, Vihman, and Keren-Portnoy (2014) found that the same common words could be recognized when embedded in sentences – i.e., could be segmented, without familiarization in the lab – only one month later, at 12 months. These studies test a different aspect of infant learning. Rather than demonstrating advances in implicit familiarity with the sound of the native language, they establish long-term infant memory for words that recur from one day to the next, in the routinized situations of the child’s life. Thus it is not surprising that infants succeed in these studies a bit later – but it is also worth noting that the findings have been successfully replicated, with infants of the same age, wherever they have been tested (i.e., for isolated words: in Dutch, Swingley 2005; Italian, Vihman and Majorano 2014; and American English, DePaolis, Keren-Portnoy, and Vihman 2010; see also Vihman et al. 2007 for a replication using both Event-Related Potentials and the behavioral Head-turn Preference Procedure, testing cross-sectional groups of British infants at 9, 10, 11, and 12 months).
One aspect of development over the first year that we have not yet considered is production. It is striking that many of the changes we report above occur between 6 and 9 months – an age range that closely resembles that usually cited for the emergence of the first adult-like syllables, or “canonical babbling”, in most typically developing infants (6–8 months in Oller 2000). These facts are likely to be related. Production of speech-like syllables provides the infant with cross-modal familiarity (internal, or proprioceptive, as well as external, or auditory, and, in the case of labials at least, also visual) with sound patterns that necessarily also occur in input speech, although the match will in most cases be only approximate (and will differ in characteristic ways between the adult male and female voices and that of the infant him- or herself; for a model of the way in which this difference may be overcome to allow recognition of the match see Callan et al. 2000). The cross-modal experience should be a particularly potent aid to the infant in beginning to recognize words in the longer sequences to which he or she is primarily exposed (Bahrick, Lickliter, and Flom 2004) – i.e., in the segmentation task with which so many studies have been concerned.
Two recent studies were designed to test the proposal that infants’ own vocal production influences the way they process speech. In both cases, infants were recorded in multiple home sessions until they showed frequent and stable use of one or more consonants (British English, 18 infants: DePaolis, Vihman, and Keren-Portnoy 2011; Italian, 30 infants: Majorano, Vihman, and DePaolis 2014; for a differently designed study of infants learning British English or Welsh, with similar results, see DePaolis, Vihman, and Nakai 2013). They were then tested in the lab with nonwords that featured a stop consonant that the infant was consistently producing (disregarding differences in voicing, which are not well controlled at this age), one that the infant was not yet producing with any regularity and a fricative pair (/s, z/) that none of the infants had in repertoire. The findings were the same in the two studies: infants who had achieved consistent production of only a single consonant preferred to listen to nonwords featuring that consonant, whereas infants with good production experience of at least two different consonants showed a significant preference for the unknown stop pair (the groups showed similar interest in the fricative pair, which was unrelated to production experience). The findings, though seemingly paradoxical, can be interpreted in terms of the hypothesis of a matching process between infant vocal production and input speech (an “articulatory filter” in Vihman 1993). Once a single adult-like consonant is part of an infant’s regular production repertoire, that consonant, or more likely the syllables in which it occurs, gains particular salience. 
However, at the point when two or more such consonants are in repertoire, the infant begins to generalize, gaining a stronger sense of phonological possibilities and a concomitant interest in (or responsiveness to) the unfamiliar sounds (see Hunter and Ames 1988 for a general model of shifts in infant attention from what is familiar to what is novel, and Vihman, DePaolis, and Keren-Portnoy 2014, for further discussion of these findings).
Efforts to find ambient language effects through adult listeners making judgments as to infants’ origins based on their babble have proven largely ineffective (Engstrand, Williams, and Lacerda 2003). However, close analysis of infant vocalizations provides good evidence of such effects already in the prelinguistic period. As could be expected, based on the findings from experimental studies of perceptual processing, prosodic aspects of the language of exposure are the first to be expressed in infant production. Whalen, Levitt, and Wang (1991) identified more rising pitch contours in French infants’ reduplicated babbling than in that of English infants (age range 6 to 12 months). This agrees with Kent and Murray (1982), who also reported primarily falling contours for their American subjects over the first year. A study of the vowels produced by five 10-month-olds each exposed to British English, French, Arabic, and Cantonese showed subtle differences within the low and central vowel space typical of this age, reflecting the patterning of vowels in the adult languages (Boysson-Bardies et al. 1989): English infants tended to produce more front vowels, in agreement with Kent and Murray (1982), reflecting the relatively high incidence of front vowels in adult English. With regard to consonants, a core set is consistently identified in the babbling of infants exposed to any language (Locke 1983), primarily stops and nasals, glottals, and glides. However, consonants also show an effect of ambient language influence as early as 10 months. Based on four groups of five infants each learning English, French, Japanese, and Swedish, for example, Boysson-Bardies and Vihman (1991) reported significantly more use of labials in English and French than in the other two languages.
What was suggested already in the first decade of audio-recorded observation of infant production (Oller et al. 1976) was later confirmed in studies of 10–20 infants acquiring American English: babbling practice is directly related to the first word forms of any given infant (Vihman et al. 1985; Vihman, Ferguson and Elbert 1986; McCune and Vihman 2001). Thus the tendencies that we see in babbling – in which a limited production repertoire constrains the range of possible ambient-language effects – are also seen in the first words, which tend to be not only similar to babbling but, at the same time, relatively accurate replicas of their adult models. The accuracy of the first word forms, first noted by Ferguson and Farwell (1975), can be explained in terms of the articulatory-filter proposal mentioned above. Practice, through babbling, with certain vocal patterns leads to deeper knowledge of those patterns, which accordingly become particularly salient to the infant in input speech. Given repeated exposure to certain high-frequency lexical items, the first words that an infant attempts are likely to be unconsciously “selected” from among those that match the sounds he or she is already able to make. The result is not only continuity with babble but also highly constrained first-word targets and relative accuracy in first word production. To illustrate these latter points the first 5–6 words of the 17 monolingual English-learners included in Appendix I in Menn and Vihman (2011) are reproduced here (see the Appendix in this chapter).
We can draw on this sample – a mix of diary and observational studies, with child ages ranging from 9 to 20 months (mean 12 months) – to gain a more concrete idea of the starting point for the acquisition of the English phonological system. About half (41) of the 83 first words attempted are monosyllables, with 40 disyllables and one instance each of banana and patty-cake, both produced as disyllables; this compares with a cross-linguistic mean of 32% monosyllables over all words attempted by the 48 children (Menn and Vihman 2011; the American children actually produce slightly more of the words as monosyllables: 0.55). For comparison, a mean of 0.69 of the content words produced by five American mothers in speech to their 12-month-olds were monosyllables (Vihman et al. 1994a: the mothers produced 0.23 disyllables and 0.08 longer words). Thus exposure to English input leads the American children to attempt and produce more monosyllabic words than is “universally” typical of first-word production.
The onsets in the Appendix are single consonants in all but eight of the targets (disregarding glottal stop), with just five words – uh-oh (3 occurrences), up (2), all-gone, all done, and Edgar – accounting for the remainder. Of the 75 onset consonants, all but 14 are stops, nasals, /h/, or glides (0.81 altogether), in accordance with the phonetic tendencies of babbling. A few words account for the exceptions (that/there (3), juice, light, and see (2 each) and five others). For comparison, the mothers sampled in Vihman et al. 1994a produced 0.56 initial stops and 0.11 nasals (0.23 fricatives or affricates, 0.17 liquids). The children’s own forms match the target onset consonant in all but 20 words (disregarding both voicing changes and onset-vowel insertions), which include all of those with fricative, affricate, or liquid onsets. Finally, the single most commonly targeted onset consonant is /b/ (21 words), but coronal onsets are slightly more commonly targeted than labials (33 coronals, or 0.49, excluding the seven h-initial words; 31 labials, or 0.46); the velars are underrepresented, at 0.06 (compare the mothers’ sample, with 0.34 initial labials, 0.43 coronals, and 0.22 velars). On the other hand, the labials match the targets in the child forms (except banana, reduced to its final syllables), while the coronals have varied outcomes.
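The proportions reported in this paragraph can be checked with a few lines of arithmetic. The Python sketch below is an illustration only. It uses the counts given above and assumes that the place-of-articulation proportions are computed over the 68 onsets that remain once the seven h-initial words are set aside; the velar count is not stated in the text and is inferred here by subtraction.

```python
# Counts taken from the paragraph above; variable names are illustrative only.
onset_consonants = 75        # targets with a single onset consonant
late_class_onsets = 14       # fricative, affricate, or liquid onsets
h_initial = 7                # h-initial words, excluded from the place counts

# Share of onsets that are stops, nasals, /h/, or glides.
early_class_share = round((onset_consonants - late_class_onsets) / onset_consonants, 2)

# Place of articulation, computed over the non-/h/ onsets (an assumption).
place_total = onset_consonants - h_initial
coronal, labial = 33, 31
velar = place_total - coronal - labial   # inferred, not reported in the text

place_shares = {
    "coronal": round(coronal / place_total, 2),
    "labial": round(labial / place_total, 2),
    "velar": round(velar / place_total, 2),
}

print(early_class_share)
print(place_shares)
```

On these assumptions the rounded values agree with the figures cited in the text for the early consonant classes and for coronal, labial, and velar onsets.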
Only four target words have onset clusters (block, cracker, quack-quack, squirrel); the children generally reduce the cluster to a stop, although the child attempting cracker variously produces it with [p-], [kw-], [w-], and [k-]. Including the mid-vowel off-glides as well as /ɑɪ/ and /ɑʊ/ (/ɔɪ/ is never targeted), 30 target words have diphthongs (12 different words). Of these, only four are never produced with a diphthong (bow-wow, Jacob, nose, uh-oh). Thus clusters are clearly more challenging than diphthongs.
Finally, consider two more aspects of these first words. Only 25 words with codas are targeted (0.30), compared with a mean of 0.67 of input content words with codas in English (Vihman et al. 1994b); of these, only four are produced with a coda consonant, all of them sibilants (box, bus, juice, shoes). Besides these four words, which all include stop onset as well as fricative coda, only six words have more than one true consonant (i.e., excluding combinations with glide or glottal) within a single word (cracker, dog, doggie, Jacob, put on, and thank you).
This then is the point of departure for acquisition of the English phonological system. The first words are close to their targets in length and in onset consonant. Child “selection” or bias in attempting words is apparent in the predominance of one- and two-syllable targets, although English content words provide relatively few challenging long words in any case (as compared with Italian, Japanese, or Spanish, for example). Similarly, the predominance of stop and nasal onsets in the words targeted seems to reflect infant preferences. There is also a bias in favor of /b/ and a clear advantage for labials in production. Clusters tend not to be targeted but diphthongs pose no apparent obstacle, although they are produced where required less than two-thirds of the time. Words with codas are undertargeted and the coda is seldom produced when needed. The additional, perhaps less obvious, difficulty for first-word production is presented by the need to remember, plan, and articulate two (or more) different consonants in a single-word production; this is seldom achieved at this point.
We have suggested that most learning, in the prelinguistic period, is implicit, distributional (based on gaining familiarity with the prosodic and segmental patterns most frequently heard), and procedural (developing motoric routines that underlie repeated production of particular sounds and sequences of sounds). To learn to produce word forms in appropriate situations of use, however, the infant must draw on explicit learning as well – learning with attention and, eventually, with intention, often in dyadic interaction. This is the foundation for phonology, since the construction of a phonological system depends on word learning (Vihman and Keren-Portnoy 2013b).
Once a child has begun to produce a few words, he or she is in a position to learn from his or her own output – a small but highly familiar “database”. As more new words are learned the infant’s knowledge of the sound system continues to grow (and the receptive vocabulary is generally much larger than the expressive vocabulary); however, the child’s repertoire of production plans – particularly for different consonants – grows far more slowly. Accordingly, we frequently see the child settle on a small number of prosodic structures or patterns that have been called “word templates”; this provides a “holding pattern”, while the child’s motoric and planning skills – and their memory for word forms – improve. The templates differ by child but show a “family resemblance” within language groups, so that we can look for characteristic patterns used by children acquiring English (Vihman, in press).
Early word templates for three of the children included in the Appendix have been described longitudinally, over the period in which their favored templates first developed. We draw here on those descriptions, which show considerable inter-child variability, and then compare these children with others, including two children whose prosodic structures have been described at a more advanced lexical point.
Vihman and Velleman (1989) detail the emergence of a template that seems to have been designed to allow Molly to produce codas, which she targeted frequently but nevertheless found difficult to produce. The study covers five months, from her first spontaneous use of four words in a 30-minute session (the start of established word use: the “four-word-point”, or 4wp, at 10 months) to a cumulative vocabulary of over 70 words (35 words in the session). Both stop and nasal codas were attempted, through a sequence of identifiable stages – presystematic production, experimentation, and emergence of a predominant pattern or template. For both coda types, this first involved the addition of a support vowel (e.g., bang [pan:ə], clock [kak:ɪ̥], both at 1;1) and later the restructuring of target forms to fit the template (e.g., Nicky [ɪn:i], glasses [kak:ʰi̥], both at 1;3).
Vihman et al. (1994b) recount the emergence over 6 to 8 months of templates in two children. Timmy produces CV and CVCV forms almost exclusively, sometimes with the addition of a nontarget onset vowel. His range of consonant use in word forms grows very gradually over the period from 10 months (4wp: [ba] only, with variants including both voiced and voiceless bilabial fricatives) to 16 months, when Timmy has nine consonants and the three corner vowels, variously used with [b, t, k, m, n]. By this time Timmy is producing a limited number of variegated disyllables (e.g., cookie [kaki], goodbye [gaba]), some of them reflecting restructuring, similar to what we saw in Molly (Simon [nama], [nimi], coffee [kuki], good boy [kibi]).
The second child, Alice, showed unusually high use of the glide [j] in her babbling and first words. The word pattern she developed began with a high proportion of use of front-rising diphthongs (10 months), which was then paralleled by a preference for the disyllabic sequence <(C)VCi>, with palatalization often affecting the medial consonant (e.g., blanket [baji], dolly, daddy [daji], and, by 16 months, belly [vei], bunny [buɲ:i, beiŋji], and shiny [ta:ji], along with such more radically restructured forms as elephant [ʔ ɪᴊɨ, ʔai:njʌ], flowers [pa:ji], iron [ʔaɪŋ:, ãɪ̃ji], lady [jeiji, ijei], and mommy [ma:ɲi, əma:ɲi]). Notice the focus here on targets ending in -i, a common English pattern for disyllables, especially in speech to children.
The most commonly reported pattern is probably consonant harmony (less used in English than in a more rhythmically regular language such as Finnish, however: Vihman and Wauquier, in press). An example is seen in Menn’s (1971) diary study of her son Danny, whose first words appeared at 16 months and who developed, by 25 months, a strong harmonized <CVC> template (e.g., bread [bʌb], jeep [bip], dog [gɔg]). In contrast, Jaeger (1997) reports her daughter’s more unusual use, by the time she had some 100 words (age 23 months), of a front–back consonant melody used in both one- and two-syllable words, sometimes with metathesis to achieve the favored structure (e.g., butter [pʌtu], cheek [tikʰ], frog [pakʰ], but also David [pita], kite [taɪk], and sheep [piç]; see also Vihman and Croft 2007).
Two classic studies of templates in children acquiring English illustrate additional patterns. Waterson (1971) describes several different “schemas” or prosodic structures into which her son organized his word forms at a time when he had some 150 words in use (aged 17 to 19 months). These include monosyllables with sibilant coda (brush [byʃ], dish [dɪʃ], fetch, fish [ɪʃ], vest [ʊʃ]) and disyllables with reduplication or harmony (another [ɲaɲa], finger [ɲɪ:ɲɪ]; biscuit [be:be:], Bobby [bæbu:]). Priestly (1977) describes his son’s four-month use of a <CVjVC> pattern (another “melody”), at age 22 months, when he had well over 100 words. Here again some forms were relatively similar to the target (peanut [pijat], carrot [kajat]) while others freely restructured target forms (chocolate [kajak], flannel [fajan], rhinoceros [rajan]).
Harmony and melody patterns alike provide the child with support for planning as well as memory, in that a set frame with variable elements is more accessible for both purposes than a set of open choices. The very idiosyncrasy of child templates makes it difficult to generalize from them, but these child “solutions” to the problem of remembering and producing a growing set of forms give us a good idea of what constitutes a challenge. As we see from these few examples, some templates address the problem of codas, others that of changing vowels or place or manner of consonants across the word; some deal with more than one of these issues.
To obtain “norms” indicating the consonant use to be expected at different ages, early studies used single-word naming tests based on picture presentation (e.g., Sander 1972). As Stoel-Gammon (1987) pointed out, this is not generally successful in arriving at an idea of two-year-old phonology, since many children of that age are resistant to testing and those able and willing to participate may not be representative of the age group as a whole. Accordingly, Stoel-Gammon used recordings of spontaneous speech to obtain data from a relatively large group of American children. Out of 34 participants all but one produced more than 10 different adult-based words in the session and were accordingly included in the study. The transcripts used for analysis were based on a maximum of 50 words; these variably reflected from 20 to 112 different word types (mean 36). This thus corresponds roughly to the period of word template use described above, although not all children necessarily make use of them (Vihman 2014: Appendix 3). Stoel-Gammon reports three analyses of her data: (i) word shapes produced, (ii) inventories of initial and final consonants, and (iii) accuracy (using Shriberg and Kwiatkowski’s (1982) percent consonants correct [PCC] measure).
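The logic of a percent-consonants-correct score can be shown with a minimal Python sketch. This is an illustration under simplifying assumptions, not Shriberg and Kwiatkowski’s full procedure, whose sampling and scoring conventions are not reproduced here. The function name is hypothetical, the alignment of target and produced consonants is assumed to have been done by hand, and the three forms scored are Danny’s harmonized forms cited earlier from Menn (1971), used purely as convenient examples.

```python
def percent_consonants_correct(pairs):
    """Return the percentage of target consonants produced correctly.

    `pairs` is a list of (target, produced) consonant symbols, one entry per
    target consonant; `produced` is None when the consonant was omitted.
    """
    if not pairs:
        raise ValueError("no target consonants to score")
    correct = sum(1 for target, produced in pairs if produced == target)
    return 100.0 * correct / len(pairs)


# Hand-aligned consonants for three child forms (alignment is an assumption).
sample = [
    ("b", "b"), ("r", None), ("d", "b"),   # bread [bʌb]
    ("dʒ", "b"), ("p", "p"),               # jeep [bip]
    ("d", "g"), ("g", "g"),                # dog [gɔg]
]

pcc = percent_consonants_correct(sample)
print(f"{pcc:.1f}% of {len(sample)} target consonants correct")
```

In this toy sample three of the seven target consonants are matched. A score of this kind says nothing about which consonants are in the child’s inventory, which is why it is reported alongside the word-shape and inventory analyses rather than in place of them.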
Although rhythm is an important factor from the earliest period of speech processing, target-like rhythm is not arrived at in production until after some years of language use. Allen and Hawkins (1980) observed that the speech of children acquiring English tends to sound syllable-timed at age one or two, due to the relatively slow speech rate and children’s tendency, in the first months of speech production, to give full weight to each syllable and to produce peripheral rather than central vowels, even in unstressed syllables. Allen and Hawkins note the difficulty of assessing the development of phonological rhythm, given the myriad factors that enter into it – the various functions of phonetic duration and the mix of phonological, lexical, syntactic, and stylistic constraints on stress. However, the recent advances in measuring rhythm cross-linguistically mentioned above have led to corresponding advances in developmental accounts: Two recent studies compare children acquiring British English with children acquiring languages with more syllable-timed characteristics.
Mok (2011, 2013) investigated five children each acquiring English and Cantonese as monolinguals (as well as a group of bilingual children) at ages 3 and 2;6 years, respectively. The younger monolingual English children already had significantly more variability in overall utterance duration, more variability in successive syllables, and a lower proportion of vocalic intervals than the monolingual Cantonese, based on recordings of spontaneous speech; the differences in successive syllables reached significance only at age 3. As suggested in our opening discussion of rhythm, the impression of English as stress-timed depends in part on syllable structure. At 2;6 the simpler syllable types CV and CVC accounted for 71% of the syllables produced. Altogether, the five English-speaking children produced a mean of 6 syllables with clusters in any position (and attempted 10 such syllables); clusters were most commonly attempted and produced in the final position (CVCC). The monolingual English-speaking children also produced longer stressed than unstressed syllables in utterance-medial trochaic words.
Payne et al. (2012) investigated the speech of monolingual children acquiring English, Spanish, and Catalan, three each at ages two, four, and six years. They derived measures of the relative proportion of both vocalic and consonantal intervals from acoustic analyses of semi-structured conversations (based on pictured action scenes). They found differences by ambient language, even in the youngest children, with the English-learners producing a lower proportion of vocalic intervals already at age two. However, the variability in consonant intervals, which should be lower in more syllable-timed speech, proved to be higher in the children than in the adults overall and to decrease over time, even in English, despite the fact that the range of syllable types is greater in adult English than in the two Romance languages and should thus also increase developmentally over time. The reason for this somewhat paradoxical finding is that the relatively simple open syllables of the early years are accompanied by high variability in phonetic consonant duration, due to poor motor control. With increasing age and lexical knowledge the children gain phonetic mastery while at the same time making phonological advances (i.e., increased use of codas and clusters in all word positions), which leads to a more adult-like level of consonantal variability as well as to sharper cross-linguistic differences. Based on both of these studies, then, we can conclude that mastering the rhythmic pattern of English takes at least as long as achieving accurate segmental production.
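Interval-based measures of this type can be illustrated with the proportion of vocalic intervals (%V) and the standard deviation of consonantal interval durations (ΔC), as proposed by Ramus, Nespor, and Mehler (1999). The sketch below is hedged accordingly: the function names and duration values are hypothetical, the use of the population standard deviation is an assumption, and the code is not claimed to match the exact measures computed by Payne et al.

```python
from statistics import pstdev


def percent_v(vocalic, consonantal):
    """Share of total utterance duration taken up by vocalic intervals (%V)."""
    total = sum(vocalic) + sum(consonantal)
    return 100 * sum(vocalic) / total


def delta_c(consonantal):
    """Standard deviation of consonantal interval durations (Delta-C).

    Assumption: population standard deviation; a sample SD is also possible.
    """
    return pstdev(consonantal)


# Hypothetical interval durations in milliseconds (not real data).
vocalic_intervals = [120, 80, 100]
consonantal_intervals = [60, 140, 100]

print(percent_v(vocalic_intervals, consonantal_intervals))
print(delta_c(consonantal_intervals))
```

A lower %V together with a higher ΔC is the profile usually associated with the stress-timed end of the continuum. The point made above is that unstable consonant durations in early speech can raise ΔC independently of syllable structure.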
What then shall we say of English as the language on which to base our understanding of phonological development in general? From the point of view of perceptual processing, English is readily accessible. Its strong lexical stress facilitates segmentation into words (contrast French, with its phonological phrase-based accent, for example) and words are basic to phonological learning.
However, from the point of view of production English seems to be relatively difficult. Although it has more monosyllables than most European languages, a learner advantage, it also has a relatively high proportion of diphthongs, clusters, and, relatedly, syllable types (although clusters are far more common in Slavic languages, for example, and may accordingly be produced more often, if not more accurately, at an early point in lexical development in children learning those languages; see Szreder 2013). We have indicated that stops, nasals, glottals, and glides are produced early in English, as in other languages. We should add that the interdentals, voiced fricatives, and rhotic approximant are typically the last consonants to be acquired. Furthermore, the production of the full range of consonant clusters is seen only after most English consonants have begun to be accurately produced and the characteristic rhythm of English is achieved only by about age six.
In fact, no one language provides an ideal “model” of acquisition. The starting point is similar for children everywhere, given the biological foundations in prenatal exposure, ancient perceptual capacities, and slow motoric advances, but different ambient languages channel these capacities in different ways even before word use begins to appear.
Voiceless symbols indicate stops perceived as having short-lag VOT; voiced symbols indicate pre-voicing; a raised [ʰ] indicates long lag.
Alice (Vihman, Velleman, and McCune 1994): American English, 9–10 months | ||
[beɪbi] | baby | [pɛpɛː], [tɛɪtiː] |
[dædi] | daddy | [dæ] |
[hɑɪ] | hi | [hɑːiː], [ʔɑːjɛ], [hɑɪje], [haɪjʌ]… |
[mɑmi] | mommy | [mːɑnːə] |
[noʊ] | no | [nj] |
Daniel (Danny) (Menn 1971): American English, 20 months | ||
[gʊbɑɪ], [bɑɪbɑɪ] | goodbye, byebye | [bæbæ, baba, gægæ] |
[hɛloʊ] | hello | [hwoʊ] |
[hɑɪ] | hi | [hæ, haɪ] |
[noʊ] | no | [ono, no, nu] |
[noʊz] | nose | [o] |
[skwəɹl̩] | squirrel | [gæ, goʊ] |
Daniel (Stoel-Gammon and Dunn 1985): American English, 12 months | ||
[bənænæ] | banana | [nænæ] |
[lɑɪt] | light | [ai], [dai] |
[ʔʌʔoʊ] | uh-oh | [ʔʌʔo] |
[wəsðæt] | what’s that | [wəsæ] |
Deborah (Vihman, in press): American English, 10 months | ||
[bæ:] | baa | [bæ:] |
[beɪbi] | baby | [be], [pipe], [bebe] |
[hɑɪ], [hɑɪjʌ] | hi, hiya | [hai], [ai], [haie], [aie], [e:], [a:] |
[mʌŋki] | monkey | [mam:ɛ] |
[ʌʔ:oʊ] | uh-oh | [ʌʔ:ɛ] |
Emily (Vihman, in press): American English, 13 months | ||
[bæ:bæ:], [bɑʊwɑʊ] | baa, bow-wow | [pæpæ], [bæbæ], [ʔapɪæ], [pæ:] |
[bi:dz] | beads | [bi], [pʰi] |
[dædi] | daddy | [tæ], [hadatɛ] |
[ʌp] | up | [ʌp], [ʌpə], [ʌpije], [æb] |
Jacob (Menn 1976): American English, 13 months | ||
[noʊ] | no | [nʌ:::], [ŋɛʌ] |
[dʒeɪkəb] | Jacob | [dikʌ], [dɛikʌ], [gɛikʌ], [æku], [dɛikʌ], [æku] |
[θæŋkju:] | thankyou | [didʌ], [dɪdɛjdi], [ɫɛjʌ], [daˈza], [di], [daˈdʌ], [bɛ], [dɜt], [gɑdu], etc. |
[ðɛɹ] | there | [dɑ], [dʌm], [dʌh], [dɛ], [dæ] |
[tʰoʊst] | toast | [dœʌ] |
Joan (Velten 1943): American English, 11–12 months | ||
[bæŋ], [bɑɾəl] | bang, bottle | [ba] |
[bʌs], [bɑks] | bus, box | [bas] |
[pʊɾɔn] | put on | [baza], [ba:za] |
[ðæt] | that | [za] |
[ʌp] | up | [ap] |
Jonah (Vihman 1996: App. B): American English, 13 months | ||
[bɑɾəl] | bottle | [bwɪdʊ] |
[bɑʊwɑʊ] | bow(wow) | [bɑʔ], [bʊɑ], [bæ] |
[ɛdgəɹ] | Edgar (dog’s name) | [dɑdɑ] |
[noʊ] | no | [ənæ::] |
Jonathan (Braine 1974): American English, 15 months | ||
[hɑɪ] | hi | [ʔai] |
[dʒus] | juice | [du] |
[noʊ] | no | [do] |
[si:] | see | [di] |
[ðæt], [ðɛɹ] | that, there | [dæ, dʌ, da, dɛ] |
Leslie (Ferguson, Peizer, and Weeks 1973): American English, 11 months | ||
[dædi] | daddy | [dædæ] |
[dɔgi] | doggie | [gaga] |
[mami] | mommy | [mama] |
[pæti], [pædi] | patty(-cake) | [bæbæ] |
Molly (Vihman and Velleman 1989): American English, 10–11 months | ||
[beɪbi] | baby | [bæpæ] |
[kɹ̥ækəɹ] | cracker | [pɑkæ], [kwɑ], [wæhk], [pækwɑ], [kʌk] |
[mu::] | moo | [meʔje] |
[nɑɪʔnɑɪt] | night-night | [hʌn:ʌ], [noʊnæ] |
Sarah (Stoel-Gammon and Dunn 1985): American English, 11 months | ||
[beɪbi] | baby | [bebi] |
[bɑɪbɑi] | byebye | [baɪbaɪ] |
[dɑgi] | doggie | [dɔgi] |
[dʒu:s] | juice | [dus] |
[mɑmɑ] | mama | [mama] |
Sean (Vihman and Kunnari 2006): American English, 12 months | ||
[ɑlgɔn] | allgone | [ɔdæ:] |
[buː] | boo | [pʊ] |
[dɑg] | dog | [tak] |
[tɪk] | tick | [tɛh], [tɪʔ], [tɪ̥], [tʊ̥t] |
[wʊf] | woof | [wʊ], [ʔʊʔ], [ʔoʊ] |
[skwəɹəl] | squirrel | [gæ, goʊ] |
T. (Ferguson and Farwell 1975): American English, 11 months | ||
[dædi] | daddy | [dæji, dæi] |
[dɑg] | dog | [dɔ] |
[hɑɪ] | hi | [ai], [hai] |
[si:] | see | [hi] |
Timmy (Vihman, Velleman, and McCune 1994): American English, 11 months | ||
[bɔl] | ball | [pæ], [bæ], [ʔəpæ], [ab:a] |
[blɑk] | block | [əpʰə], [ʔʌβæ], [pæ] |
[kʰɑɹ] | car | [kɑə], [ɑk:ɑh] |
[kʰɪɾi] | kitty | [kʰə̥], [kʰɑ̥], [kɑkɑ], [ʔukɑ]… |
[kʰwæʔkʰwæk] | quack-quack | [kʰə̥], [kʰɑ], [kʰɑhkɑh], [gaga]… |
Tomos (Vihman, in press b): UK English, 17 months | ||
[bædʒə] | Badger | [babm̩:], [bʌbm̩] |
[bæŋ] | bang | [ba], [bæ], [baʊ], [da] |
[haɪja] | hiya | [jaja], [dajæ:] |
[nəʊ] | no | [na], [næ], [nə] |
[ta] | ta ‘thank you’ | [ba], [pa], [ba:], [ɹa:] |
Will (Stoel-Gammon and Dunn 1985): American English, 12 months | ||
[ɑldʌn] | all done | [dada], [ada] |
[dɑʊn] | down | [dæ], [dʌ], [dau] |
[lɑɪt] | light | [di] |
[ʃu:z] | shoes | [tsis, θiz] |
[ʌʔ:oʊ] | uh-oh | [ʔʌʔo], [hʌho] |