Summary of empirical evidence of phonetic category formation.
This chapter reviews theories and research on phonetic category formation in bilingual children. Investigating phonetic categories provides a way to address one of the longstanding theoretical issues in bilingualism, namely whether bilingual children possess one or two linguistic systems while learning their respective languages. In this chapter, the theoretical background on phonetic categories in bilingual adults and children is reviewed. Then, empirical evidence on phonetic categories in bilingual children is summarized. Finally, a developmental model of phonetic category formation in simultaneous and sequential bilingual children is proposed. According to the model, detailed phonetic categories do not form across the board, and bilingual children may invoke multi-dimensional representations of phonetic categories.
- phonetic category
- stop consonants
- bilingual children
Over the last 30 years, the organization of phonetic systems in bilingual speakers has been extensively examined (see  for more information). Phonetic category formation refers to the processes by which bilingual or second language learners come to distinguish the phonetic details of shared phonemes in each language. Most studies have dealt with adult speakers who learn a second language (L2) after they have fully acquired their first language (L1). These studies mainly focused on how the influence of one language on the other depends on the learner’s age of exposure to the L2. The current chapter deals with the phonetic characteristics of sounds produced by bilingual children. Bilingual children differ from adult bilinguals or L2 learners in that both of their language systems are still developing. Thus, it is of interest to examine bilingual children to understand how phonetic categories develop and are organized across languages and how the L1 and L2 systems interact with each other. Two further questions arise in investigating phonetic categories in bilingual children. First, currently available studies of phonetic development in bilingual children have mainly focused on stop consonant production; there is limited evidence on whether the phonetic characteristics of other categories (e.g., vowels) pattern like those of stops. Second, children who are exposed to two languages are either simultaneous or sequential bilinguals, and whether the phonetic categories of these two groups show similar characteristics has not been well examined either. In order to address these questions, I first outline the currently dominant theoretical models of phonetic category formation in bilinguals. Then, a comprehensive review of the existing literature on phonetic categories in bilingual children is provided. Finally, a model of the development of phonetic category formation is proposed. Directions for future research on phonetic category formation in bilingual children are also suggested.
2. Theories of bilingual speech acquisition
2.1. Speech learning model for phonetic systems in adult bilingual and L2 speakers
Flege offers the Speech Learning Model (SLM) to account for how individuals learn to produce the vowels and consonants of their L2. The aim of the model is to explain the production limitations of experienced L2 learners by focusing on perceptual aspects rather than motoric constraints. Thus, SLM attributes difficulty in producing a certain L2 phoneme to a perceptual limitation in discerning the sound, not to a production difficulty. A basic assumption in Flege’s model is that phonetic elements of the L1 and L2 are related to each other at the level of allophones, and that the language-specific aspects of speech sounds are stored in long-term representations called phonetic categories. Since perception plays an important role in the establishment of phonetic categories, if bilingual speakers are able to perceive phonetic differences between L1 and L2 sounds, then a new phonetic category can be established for the L2 sound. The likelihood of establishing a new category increases with the degree of dissimilarity between an L2 sound and its closest L1 sound [1, 3].
Flege further hypothesized that a single phonetic category is used to process similar L1 and L2 sounds due to equivalence classification. If sounds in the L1 and L2 are perceptually linked, then their perceived similarity may block category formation through what Flege refers to as the “mechanism of equivalence classification”. Through this process, phonetic category assimilation may occur. Flege and Eefting examined the voice-onset-time (VOT) values of Spanish and English stop consonants as produced by Spanish-English bilinguals. They noted that Spanish-English bilinguals produced English stop consonants with VOT values resembling those seen in Spanish, suggesting that phonetic category formation was blocked by the similarity of the stop consonants across the two languages. Flege also predicts two circumstances in which bilingual productions may differ from those of monolinguals: first, a bilingual’s category may be deflected away from the L1 category in order to maintain phonetic contrast between categories sharing a common L1-L2 phonological space; second, a bilingual’s representations may be based on features different from those of monolinguals. Bohn and Flege investigated the production of German and English vowels by adult German learners of English. They noted that these bilingual speakers produced vowels in such a way that contrasts were maintained within the individual’s phonological space. During this process, phonetic category dissimilation may take place.
2.2. Linguistic system models in bilingual children
While SLM was developed to explain speech learning in adult bilinguals and L2 learners, the linguistic system models address language acquisition in bilingual children. The focus of these models is whether bilingual children develop one or two linguistic systems in learning their respective languages. The one-system model, known as the Unitary Language System (ULS) hypothesis, was originally proposed by Volterra and Taeschner, and the two-system model, known as the Dual Language System (DLS) hypothesis, was posited by Genesee. Under the ULS model, during early language development, bilingual children take the input received from both languages and combine the information into a single language system. As the language acquisition process continues, bilingual children develop more advanced linguistic skills and undergo a differentiation process. It is during this process that these children distinguish between the languages and achieve bilingual status. The DLS hypothesis stands as an alternative to the ULS hypothesis. It posits that children establish two separate linguistic systems from the beginning of the language acquisition process. Under this model, children receive dual language input and separate this information into two distinct language systems. These children do not undergo a period in which their linguistic systems are merged. They have separate linguistic systems from the onset of acquisition; thus children are always considered to be bilinguals under the DLS hypothesis. Since the ULS and DLS hypotheses are significant for understanding bilingual children, each hypothesis is discussed in more detail below.
2.2.1. Unitary Language System hypothesis
Under the ULS hypothesis, Volterra and Taeschner  claim that between infancy and the age of 3, children progress through three stages in order to become bilingual. The first stage of language acquisition in bilingual children shares many similarities with the language development of monolingual children. As children receive language input from both languages, they organize the information into one system. Volterra and Taeschner provide evidence for this by noting a lack of translation equivalents during the early stages of language development. Children receiving dual language input appear to avoid learning words in both languages that share the same meaning. Volterra and Taeschner developed three stages based on a study conducted with two Italian-German bilingual sisters and from data taken from Leopold . Speech samples from these three subjects were taken between 1 year and 2 months (1.2) and 3.9. Their parents indicated that they used the one parent, one language policy and thus only spoke to the children in their native languages. Data obtained during their study seemed to suggest that children do in fact learn translation equivalents between languages. Volterra and Taeschner refuted this idea by suggesting that word meanings have contextual ties which influence the child’s use of a word; thus they would not be considered a translation equivalent. During the second stage of language acquisition, the child is able to differentiate between the lexicons of each language but still continues to apply the same syntactic rules to both languages. Evidence for this stage of language development in bilingual children is seen in the presence of translation equivalents. The child’s language now indicates that he or she has words in both languages with equivalent or corresponding meanings. The presence of translation equivalents indicates that the child is able to distinguish lexical items of one language from the other, and sort them by language. 
Despite the distinction made between lexical items of each language, the incorporation of grammatical components from one language into the other continues to suggest a unified system. During the third stage, language acquisition in the bilingual child is complete. Both the lexical and syntactic linguistic systems are differentiated. Volterra and Taeschner found that the children from the Italian-German study and the Leopold study distinguished and applied the appropriate syntactic rules of each respective language as early as 3.9. It is at this stage that children become bilingual.
2.2.2. Dual Language System hypothesis
Paradis and Genesee argue that bilingual children may acquire separate linguistic systems, and they refine the DLS hypothesis by further categorizing these systems as autonomous (no interaction between the two language systems) or interdependent (interaction between the linguistic systems). If the linguistic systems are formed autonomously, then we would expect a bilingual’s acquisition of each language to mirror that of a monolingual speaker of each respective language. However, if the two linguistic systems interact during language acquisition, we would expect to see one of three processes in each language: transfer, acceleration or delay. Transfer occurs when bilinguals incorporate grammatical elements of one language into the grammar of the other language. Acceleration happens when grammatical properties emerge earlier in bilinguals than in typically developing monolinguals. Delay is the process in which the burden of simultaneous language acquisition slows the grammatical development of bilinguals compared with monolinguals.
3. Empirical evidence of phonetic category formation in bilingual children
This section reviews research examining phonetic categories of bilingual children. Investigation of phonetic categories in bilingual children began in the early 1980s and has continued to the present, although only a limited number of studies have been conducted. This section reviews only studies of normally developing bilingual children and adolescents that focus on speech production. If a bilingual study examined only one language without addressing the theoretical question (e.g., one vs. two systems or how one language influences the other language), the study is not included in this review [e.g., 11, 12]. The studies that met the inclusion criteria are summarized in terms of languages, sound category, age of the bilingual children, type of bilingual children, etc. (see Table 1).
| Study | Ages (years; months) | Type of bilingual | Number of bilingual participants | Type of study | Sound category | Monolingual control group | Two systems |
|---|---|---|---|---|---|---|---|
| Konefal and Fokes | 4, 7, 10 | Sequential | 3 | Case study | Stop | No | Unknown |
| Deuchar and Clark | 1.7, 1.11, 2.3 | Simultaneous | 1 | Case study | Stop | No, parents’ input speech | Yes |
| Yavas | 2nd graders | Sequential | 10 | Group | Voiceless stop | No | Yes |
| Fabiano-Smith and Bunta | 3 | Sequential/simultaneous | 8 | Group | Voiceless stop | Yes | No |
| Muru and Lee | 5–6, 10 | Sequential | 32 | Group | Stop | No | Yes |
| Baker and Trofimovich | 10, 16 and adults | Sequential | 40 | Group | Vowel | Yes | Yes |
| Lee and Iverson | 5, 10 | Simultaneous | 30 | Group | Stop | Yes | Yes |
| Lee and Iverson | 5, 10 | Simultaneous | 40 | Group | Vowel | Yes | Yes |
| Lee and Iverson | 3 | Simultaneous | 12 | Group | Stop/vowel | Yes | Yes/no |
| Johnson and Wilson | 2.10, 4.8 | Simultaneous | 2 | Case study | Stop | No, parents’ input speech | Yes/no |
| Harada | 6, 8, 10 | Sequential | 15 | Group | Voiceless stop | Yes | Yes |
| Watson | 5, 6, 8, 10 | Simultaneous | 20 | Case study | Stop | No | Yes |
| Mack | 10 | Simultaneous | 1 | Case study | Stop | Yes | Yes |
| Yang et al. | 3.7–5.3 | Sequential | 1 | Case study | Vowel | No | Unknown |
| Yang and Fox | 5 | Sequential | 15 | Group | Vowel | Yes | Yes |
| Khattab | 5, 7, 10 | Simultaneous | 3 | Case study | Stop | Yes | Yes/no |
| Whitworth | 9.11, 12.5 | Simultaneous | 2 | Case study | Stop | No | Yes |
| Kehoe et al. | 2–3 | Simultaneous | 4 | Case study | Stop | Yes | Yes/no |
| Simon | 3 | Sequential | 1 | Case study | Stop | No | Unknown |
It is necessary to define bilingual children before each study is discussed. Bilingual children are commonly categorized as simultaneous or sequential, but the age criteria used to define each group vary across researchers. For example, Padilla and Lindholm apply the term simultaneous bilingual to individuals who have acquired two languages at the same time and have generally received an equal amount of exposure and input from each language from birth. Genesee et al. apply the term to individuals who have been exposed to their L2 within the first year of life, while McLaughlin and Hamers and Blanc consider simultaneous bilingual children to be those who acquired the L2 before the L1 was established. Under the definitions of Padilla and Lindholm and of Genesee et al., certain bilingual children such as Korean-English bilinguals are always categorized as sequential or consecutive, because most children in that group start to be consistently exposed to English only when they are enrolled in English-speaking daycare centres, preschools or kindergartens (unless one of the parents is English-speaking). This may lead to considerable heterogeneity among sequential bilingual children. Lee and Iverson argued that it is necessary to identify when the L1 is established in order to determine bilingual status as simultaneous versus sequential. In other words, the identification should be based on a solid developmental milestone rather than an arbitrary age. In this chapter, following Hamers and Blanc, I consider simultaneous bilingual children to be those who first learned L1, and then L2, before 5–6 years of age, because a child’s sound system is not fully developed until 7 years of age. Even if a child is exposed to L2 before age 5 or 6, he or she must be exposed to both L1 and L2 for a substantial period to count as a simultaneous bilingual. If a study tests 3-year-old bilingual children who had been exposed to L2 for less than 1–2 years, these children are considered sequential bilinguals.
To my knowledge, the earliest studies examining stop production in bilingual children were conducted by Bond et al. and Konefal and Fokes. Two of the three children in Konefal and Fokes were also included in Bond et al. when they were younger. Thus, only Konefal and Fokes’s results are discussed here. Konefal and Fokes examined three female Spanish-English children who were born in a Spanish-speaking country and moved to the US. These children were 4, 7 and 10 years of age. The exact duration of their English exposure is uncertain, but these children had been exposed to English for approximately 3 years. Both English and Spanish stops were examined. English and Spanish both have voiced and voiceless stops, but their acoustic features (e.g., VOT) differ between the languages. VOT refers to the temporal interval between the release of the stop closure and the onset of voicing of the following vowel. English voiced stops are produced with short lag VOT whereas Spanish voiced stops are produced with voicing lead. English voiceless stops are produced with long lag VOT while Spanish voiceless stops are produced with short lag VOT. Since the 10-year-old girl had a language disorder, only the results of the other two children are discussed here. The authors found that the 4- and 7-year-old children produced English and Spanish voiced stops and voiceless stops differently. The 7-year-old child was able to produce Spanish voiced stops with voicing lead, but the 4-year-old child was not. These studies mainly focused on comparing normal and disordered children, without direct comparisons between English and Spanish phonetic categories. It is therefore not certain whether the bilingual children distinguished stops across English and Spanish.
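The VOT definition above can be made concrete with a small sketch. It computes VOT as voicing onset minus stop release and labels the result as voicing lead, short lag or long lag. The `StopToken` type, the function names and the 30 ms short-lag boundary are assumptions introduced for illustration only; they are not taken from any of the studies reviewed, and actual category boundaries vary by language and place of articulation.

```python
from dataclasses import dataclass

# Illustrative boundary between short lag and long lag, in milliseconds.
# This cut-off is an assumption for the sketch, not a value reported in
# the studies reviewed here.
SHORT_LAG_MAX_MS = 30.0


@dataclass
class StopToken:
    """Hypothetical record of one measured stop consonant."""
    burst_release_s: float   # time of the stop release (seconds)
    voicing_onset_s: float   # time at which voicing begins (seconds)


def vot_ms(token: StopToken) -> float:
    """Return VOT in milliseconds: voicing onset minus stop release.

    A negative value means voicing began before the release.
    """
    return (token.voicing_onset_s - token.burst_release_s) * 1000.0


def vot_category(vot: float) -> str:
    """Label a VOT value as voicing lead, short lag or long lag."""
    if vot < 0:
        return "voicing lead"
    if vot <= SHORT_LAG_MAX_MS:
        return "short lag"
    return "long lag"


if __name__ == "__main__":
    # Made-up timings, chosen only to exercise each category.
    examples = {
        "prevoiced stop": StopToken(0.500, 0.420),
        "short-lag stop": StopToken(0.500, 0.512),
        "long-lag stop": StopToken(0.500, 0.570),
    }
    for label, token in examples.items():
        value = vot_ms(token)
        print(f"{label}: {value:.1f} ms -> {vot_category(value)}")
```

Under this scheme, a Spanish-like voiced stop would receive a negative VOT, while English voiced and Spanish voiceless stops would both fall in the short-lag range, which is why cross-language comparisons within a child are informative.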
Watson examined stop consonants of 5-, 6-, 8- and 10-year-old French-English bilingual children. Five children were recruited for each age group. Unlike the children in Bond et al. and Konefal and Fokes, all children were well-balanced between English and French. Similar to Spanish, French stops are produced with either voicing lead (voiced stops) or short lag (voiceless stops). Watson found that the 5-year-old children had established the voiced-voiceless contrast only for English, not for French. Three of the five children did not show the voiced-voiceless contrast. However, by the age of 6, the bilingual children had developed the voiced-voiceless distinction for each language. Watson also reported that VOT values decrease as age increases in bilingual children. Due to the small number of participants per age group (five children), no statistical analyses were conducted. Although Watson examined both English and French within each bilingual child, he did not systematically compare English with French stop categories. The main interest was whether bilingual children demonstrate the voiced-voiceless distinction in each language. Furthermore, the bilingual children’s stop productions were not compared with those of monolingual English- or French-speaking children. Despite these limitations, Watson concluded that bilingual children can and do master two separate patterns.
In the 1990s, still only a limited number of studies examined bilingual children’s production characteristics. Unlike previous studies, however, these studies used control data from monolingual counterparts or input speech as a comparison for the bilingual children’s speech. Mack examined stops produced by a 10-year-old French-English bilingual child, a monolingual English-speaking child and a monolingual French-speaking child. Her aim was to investigate the extent to which the two languages of a bilingual are interdependent or influence each other. Mack found that the French-English bilingual child produced English voiced stops similarly to the English monolingual child; however, French voiced stops produced by the bilingual child differed from those of the monolingual French child. The French voiced stops were produced with short lag VOT like English stops, exhibiting transfer from English into French. In terms of voiceless stops, the bilingual child’s English voiceless stops were produced with much longer VOT than the monolingual child’s, but within a normal range. Although the author did not specify the mechanism for the longer VOT in this child, it may be explained as a dissimilation effect that maximizes the difference between English and French voiceless stops. The bilingual child’s French voiceless stops were produced with a longer VOT than the French monolingual child’s, and these VOT values were not within a normal range. Mack claimed that the bilingual child showed some degree of independence between the phonetic systems of his two languages, in that he demonstrated a distinction between the VOTs of his English and French voiceless stops; but there was also evidence of L2 influence on L1.
Deuchar and Clark examined a younger bilingual child in order to investigate early acquisition of the voicing contrast in the child’s two languages. This child was exposed to both English and Spanish relatively equally from birth by a Spanish-speaking father and an English-Spanish bilingual mother. Deuchar and Clark collected VOT measurements of utterance-initial stops in both English and Spanish productions at three ages 4 months apart: 1.7, 1.11 and 2.3. This study differed from previous research in that it also analysed the speech that served as the Spanish and English input for the child, thus allowing for an additional layer of comparative analysis not typically seen in other studies. The authors found a lack of a voicing system in both English and Spanish at age 1.11, and the establishment of a clear voicing system in English at age 2.3 but only the beginnings of a similar system in Spanish. The Spanish data did not reflect the caregivers’ voicing contrasts but rather a progression towards an English adult-like voicing contrast. Interestingly, an analysis of the parents’ productions in Spanish revealed that the lag measurements were comparable to those of the child at age 2.3. When English and Spanish stops were compared within the child, voiceless stops were significantly different from each other by 2.3, but voiced stops were not. Deuchar and Clark claim that “at least, there is not a single, unified English/Spanish system” (p. 363). The child may have acquired English stop pairs earlier than Spanish ones because of the greater difference in lag between English voiced and voiceless stops. Although the authors included speech input as a comparison, data from age equivalent monolingual children are still needed to fully understand a bilingual child’s phonetic category formation.
In the 2000s, more and more researchers examined phonetic category development in bilingual children. Small case studies were mainly conducted at first; studies with relatively larger numbers of bilingual children followed. Unlike previous studies, studies during this era examined vowels in addition to stops. Khattab tested three English-Arabic bilingual children (aged 5, 7 and 10) and age equivalent English- or Arabic-speaking children. The three bilingual children were siblings raised in a city in the UK. Both parents were native Arabic speakers. Arabic was spoken to the children at home, but all three bilingual children were English-dominant. Arabic stops fall into two categories: stops with voicing lead and stops with short lag, similar to Spanish and French. The author found that the 5-year-old bilingual child distinguished only voiceless stops across languages; she produced similar VOT for Arabic and English voiced stops. Arabic voiced stops were produced with short lag instead of voicing lead. The two older children had acquired distinct VOT patterns for both voiced and voiceless stops, but the patterns did not always mirror those of their monolingual counterparts. The oldest child failed to produce the Arabic voiced stops with voicing lead, suggesting an interaction effect of English on Arabic.
Another small-scale study, with a different language pair, was conducted by Johnson and Wilson. They examined two Japanese-English bilingual children whose ages were 2.10 and 4.8. They were sisters who lived in a bilingual family in Japan. When the children were 2.11 and 1.1, they moved to Canada. Both children had been exposed to relatively equal amounts of English and Japanese at home based on the one parent, one language principle. Both Japanese and English stops were examined using VOT. Japanese stops are similar to Spanish, French and Arabic stops in that voiced stops are produced with voicing lead whereas voiceless stops are produced with short lag VOT. Similar to Deuchar and Clark, the parents’ input speech was collected for comparison, along with VOT values from the existing literature for English and from Homma [29, 30] for Japanese. The authors found that both children differentiated voiced and voiceless stops in each language. English voiced stops were produced with short lag whereas English voiceless stops were produced with long lag. Neither of the bilingual children produced Japanese voiced stops with voicing lead. Both bilingual children produced Japanese voiceless stops with long lag, which may reflect an influence from English. In short, the younger child produced similar English and Japanese stops in both the voiced and voiceless categories; the older child produced English voiceless stops with longer VOT than Japanese voiceless stops. Although the authors did not specify the underlying mechanism for the longer VOT, it may be considered a dissimilation process that maximizes the distinction between English and Japanese voiceless stop categories.
Kehoe et al. examined another language group of bilingual children, namely four Spanish-German bilingual children aged 2.0–3.0. The voicing contrast and VOT values of German and Spanish differ in a way similar to those of English and Spanish. The bilingual children’s VOT production was compared with that of three German children and with previous findings in the literature for Spanish. They found three patterns of VOT development. First, two bilingual children showed a delay in the phonetic realization of voicing; these children did not acquire the German voicing contrast. Second, one child showed a transfer effect in that he produced German voiced stops with voicing lead (Spanish-like) whereas he produced Spanish voiceless stops with long lag VOT (German-like). Third, one child did not demonstrate any cross-language influence. By age 3, none of the German-Spanish bilingual children had acquired Spanish voiced stops. In terms of cross-language comparisons, two children distinguished German and Spanish voiceless stops; however, the other two children did not make such distinctions.
While previous studies mainly focused on stop productions, a limited number of studies began investigating vowel production in bilingual children. Whitworth examined vowel length and VOT acquisition in two German-English bilingual children, aged 9.11 and 12.5. Both children were exposed to both languages from birth based on the one parent, one language approach. The mother spoke only German whereas the father spoke only English to the children. English was the language used when the children attended school and communicated with their friends. Thus, these children were English-dominant. The 9-year-old child had an English accent when he spoke German, while the 12-year-old’s German was native-like with a northern standard German accent. Both German and English stops are produced with short lag VOT for voiced stops and long lag VOT for voiceless stops, with a small difference in VOT values within each category. The author found that the younger child distinguished German and English voiceless stops, but not voiced stops, whereas the older child differentiated both voiced and voiceless stops across the two languages. The author argued that the results seem to support the two linguistic systems hypothesis. However, the VOT patterns these children showed differed from those of English and German. For example, although the younger child distinguished English and German voiceless stops, English VOT was shorter than German VOT, which is the reverse of the expected pattern. Similarly, the older child produced longer VOT for German voiced stops than for English voiced stops, which also appears to be a reverse pattern. One of the major criticisms of this VOT study is that the author did not control for place of articulation. It is well known that velar stops are produced with the longest VOT among both voiced and voiceless stops. However, the author did not provide any information on how many tokens were included for each place of articulation.
In addition to VOT, the length of tense and lax vowels was also examined in Whitworth’s study. According to the author, German lax vowels are approximately half as long as German tense vowels. English tense vowels are one-third longer than English lax vowels. Both children produced English tense vowels significantly longer than German tense vowels; however, they did not differentiate English and German lax vowels. Although the author claimed that the younger child did distinguish them, the difference was marginal (p < .06).
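To compare the two duration relations quoted above on a common scale, both can be expressed as a lax-to-tense duration ratio. The sketch below assumes that "one-third longer" means tense = lax × (1 + 1/3); the function name is hypothetical, and the ratios are rough targets implied by the text, not measurements from the study.

```python
from fractions import Fraction


def lax_to_tense_from_tense_excess(tense_excess_over_lax: Fraction) -> Fraction:
    """Convert "tense is X longer than lax" into a lax/tense ratio.

    Assumes tense = lax * (1 + X), so lax / tense = 1 / (1 + X).
    """
    return 1 / (1 + tense_excess_over_lax)


# German: lax vowels are approximately half as long as tense vowels.
german_lax_to_tense = Fraction(1, 2)

# English: tense vowels are one-third longer than lax vowels (assumed reading).
english_lax_to_tense = lax_to_tense_from_tense_excess(Fraction(1, 3))

if __name__ == "__main__":
    print("German lax/tense ratio:", german_lax_to_tense)
    print("English lax/tense ratio:", english_lax_to_tense)
```

On this reading, the tense-lax duration contrast is larger in German (lax about half of tense) than in English (lax about three-quarters of tense), which is the cross-language difference a bilingual child would need to reproduce.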
So far, most studies were limited to case studies with a small number of participants. A larger group study was conducted by Baker and Trofimovich to investigate how phonetic vowel representations differ between long and short exposure durations within each age group. In this study, Baker and Trofimovich included four groups of Korean-English bilingual speakers. All participants were born in Korea and moved to the US at various ages. Two groups were adults with either shorter (M = 0.9 year) or longer (M = 6.9 years) exposure to English. The other two groups were older children. One of the children’s groups was aged 10.2 years with 1.3 years of exposure; the other group was aged 16.9 years with 8 years of exposure to English. The authors found that the earlier the exposure to two languages, the more likely a bilingual is to produce distinct acoustic realizations of L1 and L2 sounds. For example, bilingual children with longer exposure distinguished English /ɪ/ from /i/, /æ/ from /ɛ/ and /u/ from /ʊ/ better than children with shorter exposure. They also found an L2 transfer effect on L1 in that the Korean /u/ was centralized in bilingual children with longer exposure. Age equivalent monolingual children were also recruited for comparison with the bilingual children with early exposure. These bilingual children produced English /i/, /u/ and /ɛ/ similarly to monolingual children, but they differed from monolingual children in their production of English /ɪ/, /ʊ/ and /æ/, which they produced with higher F1 values than monolingual children. The authors suggested that the bilingual children with longer exposure demonstrated some evidence of L1 vowel influence on L2 vowels. Baker and Trofimovich conducted the first well-designed group study to provide important findings on how bilinguals organize their phonetic systems and on the complex interactions between L1 and L2.
However, because these bilingual children were first exposed to English at a relatively late age, it is not certain whether the same language influence patterns appear in bilingual children who acquire both languages at a young age.
Yavas examined older Spanish-English sequential bilingual children (ten 2nd graders). These children, who lived in Florida, US, were monolingual Spanish speakers until age 5; they then started learning English in kindergarten and had been exposed to English for 2–3 years. Unlike previous studies, Yavas used mixed sentences to elicit Spanish and English stops, for example, “Pon el papel on the table”. Only voiceless stops were elicited in both languages. Yavas did not conduct any statistical analysis; only a qualitative description of each individual child was provided. The author reported that the data from these Spanish-English consecutive bilingual children supported the heterogeneity of bilinguals. One bilingual child’s stop production was similar to that of monolinguals; this child manifested a totally separate system for English and Spanish. Four of the bilingual children showed separate systems for the two languages, with variations. For instance, one child produced bilabial English stops with shorter VOT, but within an acceptable range, and another child differentiated the languages at one place of articulation but not at the others. Only one child did not differentiate the two systems at all. Yavas concluded that the bilingual children showed unique and specific linguistic patterns. Yavas collected Spanish stops from mixed sentences but English stops from English-only sentences. It is not certain whether such a method yields accurate production results. Also, Yavas examined only voiceless stops in these older children. It would be more useful if both voiced and voiceless stops were examined. In fact, whether voiced Spanish stops are influenced by English would be of interest.
Harada examined VOT produced by 15 English-Japanese bilingual children in a Japanese immersion program in the US. The bilingual children were from grade 1 (age 6), grade 3 (age 8) or grade 5 (age 10). The children’s primary language was English, and they started to learn Japanese after enrolling in the immersion program. Thus, these children are categorized as sequential or consecutive bilinguals. This study also included 5 English-Japanese bilingual adults, 10 monolingual Japanese children, 5 monolingual Japanese adults and 5 monolingual English adults. However, no monolingual English-speaking children were included. In addition, five English-Japanese bilingual teachers in the immersion program participated. Only English and Japanese voiceless stops were examined. Harada found that the bilingual children produced Japanese voiceless stops with significantly longer VOT values than the monolingual Japanese children and the immersion teachers. In the within-group comparison, the bilingual children’s Japanese stops were produced with significantly shorter VOT than their English voiceless stops. These results indicated that the bilingual children made a phonetic distinction between Japanese and English, although their VOT values differed from those of monolinguals.
In the 2010s, more comprehensive studies examining phonetic category formation were conducted. Each study employed a relatively large number of children and compared bilingual children’s speech with that of monolingual counterparts. These recent studies also examined a variety of bilingual language groups, such as Korean-English, Chinese-English and Dutch-English bilingual children. In addition, these studies attempted to evaluate SLM in bilingual children.
Lee and Iverson [2, 19, 38] conducted a series of studies examining phonetic category formation in Korean-English bilingual children. First, Lee and Iverson examined the phonetic representation of Korean and English stops produced by 5- and 10-year-old Korean-English bilingual children. The bilingual children’s stop productions were compared to those of age-equivalent English- and Korean-speaking children. They had two research questions. First, when do Korean-English bilingual children establish fully independent phonetic systems for each language? Second, what kind of mechanisms (assimilation or dissimilation) do bilingual children employ in their developmental process? Each age and language group was composed of 15 children; a total of 90 children participated in this study. Investigating Korean-English bilingual children was of interest because Korean stops show a three-way laryngeal contrast and are distinguished by vowel-onset fundamental frequency (hereafter fo) in addition to VOT. Unlike bilingual children whose languages have only a voiced and voiceless distinction for the stop category, Korean-English bilingual children may have difficulty differentiating phonetic categories of stops because of the complexity of the Korean system.
Lee and Iverson reported that Korean-English bilingual children were able to make phoneme distinctions within each language. Both age groups of bilingual children clearly produced all English and Korean stops. When the authors compared English and Korean stops produced by 10-year-old Korean-English bilingual children, all possible comparisons were significantly different in terms of either VOT or fo values, indicating that 10-year-old Korean-English bilingual children had established fully distinctive stop categories across the two languages, as monolingual English- or Korean-speaking children did. However, 5-year-old bilingual children did not distinguish stop categories across languages when they fell in the same VOT region, although these stop pairs were fully distinctive in monolingual children. For example, English voiced and Korean fortis stops are both produced with short lag VOT; when compared, Korean fortis stops were produced with significantly higher fo values than English voiced stops. Similarly, English voiceless stops and Korean lenis and aspirated stops are all produced with long lag VOT. Korean lenis stops are produced with lower fo than English voiceless stops, whereas Korean aspirated stops are produced with longer VOT than English voiceless stops. These stop pairs were significantly different between the two 5-year-old monolingual groups, but not within the 5-year-old bilingual children.
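The comparison logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ analysis: the 30 ms boundary between short lag and long lag, the function and class names, and all VOT and fo values are hypothetical placeholders chosen only to show which cross-language stop pairs share a VOT region and therefore need a second cue (fo or finer VOT differences) to be kept apart.

```python
from dataclasses import dataclass

# Hypothetical boundary (ms) between short-lag and long-lag VOT.
# This is an illustrative assumption, not a value from the studies reviewed.
SHORT_LAG_MAX_MS = 30.0


def vot_region(vot_ms: float) -> str:
    """Assign a VOT value to one of the three classic VOT regions."""
    if vot_ms < 0:
        return "voicing lead"
    if vot_ms <= SHORT_LAG_MAX_MS:
        return "short lag"
    return "long lag"


@dataclass(frozen=True)
class StopCategory:
    language: str
    name: str
    vot_ms: float  # placeholder mean VOT, invented for illustration
    fo_hz: float   # placeholder mean vowel-onset fo, invented for illustration


# Invented placeholder values; only the relative ordering follows the text.
ENGLISH = [
    StopCategory("English", "voiced", 15.0, 250.0),
    StopCategory("English", "voiceless", 80.0, 270.0),
]
KOREAN = [
    StopCategory("Korean", "fortis", 12.0, 290.0),
    StopCategory("Korean", "lenis", 75.0, 240.0),
    StopCategory("Korean", "aspirated", 110.0, 280.0),
]


def same_region_pairs(first, second):
    """List cross-language pairs whose VOT values fall in the same region."""
    pairs = []
    for a in first:
        for b in second:
            region = vot_region(a.vot_ms)
            if region == vot_region(b.vot_ms):
                pairs.append((a.name, b.name, region))
    return pairs
```

Calling `same_region_pairs(ENGLISH, KOREAN)` on these placeholder values returns the three pairs named in the text: English voiced with Korean fortis in the short lag region, and English voiceless with Korean lenis and with Korean aspirated in the long lag region.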
When stop production was compared between bilingual and monolingual children, 10-year-old bilingual children showed longer VOT for Korean lenis and aspirated stops than monolingual Korean children. The bilingual children also produced shorter VOTs for English stops than monolingual English-speaking children. In addition, the bilingual children showed different fo values from monolingual children: they produced lower fo for Korean aspirated stops. These results were interpreted as showing that Korean-English bilingual children employed both assimilation and dissimilation, depending on age. Dissimilation took place through the production of VOT longer than that of monolingual children in order to maximally distinguish all stops within the long lag region. Although a merged category was not found, 10-year-old bilingual children produced lower fo, indicating that the lower fo of English may influence their fo for Korean stops.
Fabiano-Smith and Bunta examined Spanish and English voiceless stops produced by eight 3–4-year-old Spanish-English bilingual children. Some bilingual children had recently arrived in the US, while the parents of other bilingual children had grown up in an English-speaking community. Regardless, the bilingual children attended a bilingual preschool where both languages were used and the language of the classroom alternated each day. Thus, both simultaneous bilingual children and child L2 learners were included in this study. The bilingual speech was compared to that of eight monolingual Spanish-speaking and eight monolingual English-speaking children. Only bilabial and velar voiceless stops were examined. The authors found that although English and Spanish VOT values were significantly different between monolingual English- and monolingual Spanish-speaking children, these values were not significantly different in bilingual children. In terms of between-group comparisons, English VOT values produced by bilingual children were significantly different from those of monolingual English-speaking children; however, their Spanish VOT values were not different from those of monolingual Spanish-speaking children. The authors suggested that the results of this study provide evidence to support Flege’s claim of equivalence classification, that is, that L1 may trigger assimilation of the L2 segmental category. This study provided important information that monolingual English- and Spanish-speaking children did distinguish voiceless stops across languages. However, the authors did not test whether monolingual children distinguish English and Spanish voiced stops; the study would have been more comprehensive if voiced stops had been tested.
Muru and Lee examined VOT produced by 5–6-year-old and 10-year-old Spanish-English bilingual children. These children were raised in Spanish-speaking homes and started to learn English at English-speaking daycare centres. Thus, these children were categorized as sequential bilingual children. The authors did not include monolingual counterparts, so only the Spanish and English VOT values produced by the bilingual children were compared. The authors found that the 5–6-year-old Spanish-English bilingual children made a distinction between English and Spanish only for voiceless stops, not for voiced stops. On the other hand, 10-year-old Spanish-English bilingual children were able to distinguish both voiced and voiceless stops across English and Spanish. One exception was that no significant difference was found between English and Spanish voiced stops at the velar place of articulation. This study was a good extension of Fabiano-Smith and Bunta’s study in that older Spanish-English bilingual children were examined. It seems that phonetic categories are not yet established between English and Spanish at 3 years of age; however, phonetic representation develops as children grow older, and distinctive phonetic categories for voiceless stops first emerge at 5 years of age. Finally, phonetic category formation for English and Spanish stops is established at 10 years of age. These results were similar to those for Korean-English bilingual children, confirming that phonetic category formation is fully established by 10 years of age, but not by 5 years of age.
In another study, Lee and Iverson examined English and Korean vowels produced by 5- and 10-year-old Korean-English bilingual children. In their previous study, Lee and Iverson found that phonetic categories for stops were established in 10-year-old Korean-English bilingual children, but not in 5-year-old children. The goal of this study was to determine when phonetic category formation takes place for vowel production and whether it is similar to stop production. The same cohort of Korean-English bilingual children as in Lee and Iverson’s stop study participated in this vowel study. Thus, all participant characteristics were the same. Unlike Baker and Trofimovich’s study, which involved Korean-English bilingual children who learned English after they had fully acquired Korean, the Korean-English bilingual children in Lee and Iverson’s study had been exposed to both English and Korean for at least 2 years (5-year-olds) or 5 years (10-year-olds). First and second formant frequencies (F1 and F2) were measured. When bilingual and monolingual children were compared, English vowels were similar between the two groups except for a few vowels, whereas F2 values of Korean vowels /u/ and /o/ were significantly higher in bilingual children than in monolingual Korean children, indicating an influence of English on Korean vowels. When English and Korean vowels were compared within bilingual children, the vowels were grouped into four groups: high-front /i, ɪ, e/; non high-front /ɛ, æ/; high-back /u, ʊ, o/ and non high-back /ʌ, ɑ, ɔ/. The results showed that high-front vowels were distinguished based on F2 values except for Korean /i/ and English /i/. In terms of non high-front vowels, English /ɛ/ and Korean /ɛ/ were similar to each other, but they were different from English /æ/. All high-back vowels were produced fully distinctively; none of the F1 and F2 values overlapped.
F1 or F2 values of non high-back vowels were also significantly different except for English /ɔ/ and Korean /ʌ/. The authors claimed that detailed phonetic categories across languages are not formed holistically in an across-the-board fashion. Just as vowel acquisition typically precedes stop acquisition in monolingual children, phonetic category formation takes place earlier in vowels than in stops in bilingual children. The authors also found limited evidence of assimilation and dissimilation. The higher F2 of Korean /u/ was interpreted as evidence of assimilation, in that the centralized English /u/ influences the Korean /u/. This finding parallels that of Flege, who reported /u/ produced with a higher F2 than is characteristic of native French. The authors also found evidence of dissimilation in that 10-year-old Korean-English bilingual children produced the vowel /æ/ with higher F1 than found among monolingual English-speaking children. Korean-English bilingual children may exaggeratedly lower the tongue in the production of /æ/ to maximally distinguish it from the vowel /ɛ/.
Recently, Lee and Iverson examined when phonetic categories of stops emerge in 3-year-old Korean-English bilingual children and whether phonetic category formation takes place similarly in two different sound categories. The bilingual children were exposed to both Korean and English languages from birth to 18 months. The authors examined both English and Korean stops as well as front vowels produced by 12 bilingual, 15 monolingual Korean and 15 monolingual English-speaking children. VOT and fo values of English and Korean stops and F1 and F2 values of English and Korean vowels were measured. The study found that monolingual and bilingual children produced English or Korean vowel phonemes distinctively. When English and Korean were compared, neither monolingual nor bilingual children distinguished any stop categories within the same VOT region; neither VOT nor fo differed across English and Korean stops. However, the bilingual and monolingual children produced stops differently in that the bilingual children produced higher fo values for English voiceless stops. While stops were not produced distinctively across languages by either monolingual or bilingual children, both groups produced English and Korean vowels significantly differently for the Korean /i/ and English /ɪ/ pair as well as the Korean /ɛ/ and English /æ/ pair. When vowels were compared between monolingual and bilingual children, no group differences were found in either language, indicating that Korean and English vowels produced by the 3-year-old bilingual children were similar to those of monolingual children. The authors concluded that phonetic categories in 3-year-old children develop without much interaction between the two languages in simultaneous bilingual children exposed to two languages at an early age.
Lee further examined VOT values produced by 3-year-old sequential Korean-English bilingual children. These children had been exposed to both languages for only 6–8 months and had very limited English language abilities when the study was conducted. The author found that these bilingual children showed some evidence of distinguishing English and Korean stops in that English voiced and Korean fortis stops were produced differently: Korean fortis stops were produced with higher fo than English voiced stops. However, the Korean lenis and aspirated stops and the English stops were not significantly different from each other. Since these children had fully acquired Korean stops by the time they were exposed to English, the phonetic distinction between Korean fortis and English voiced stops may have been salient to them. Although the sequential Korean-English bilingual children distinguished English voiced and Korean fortis stops, the other consonants were not distinguished from each other, suggesting that phonetic category formation in stop production was not yet complete in these children. Since the author did not compare the bilingual children with monolingual counterparts, it is not certain whether sequential bilingual children’s stop production is similar to or different from that of monolingual children.
Two further studies, each examining a 3-year-old sequential bilingual child, were conducted by Simon and by Yang. Simon reported a longitudinal case study examining the acquisition of English and Dutch stops. Dutch voiced and voiceless stops are produced with voicing lead and short lag, respectively, which is similar to Spanish. Recordings were made in 11 sessions, beginning 3 months after the child’s first exposure to English and continuing until age 4.0. The author found that the bilingual child successfully mastered the English contrast within a 7-month period, but the child’s L1 system showed changes. The percentage of Dutch voiced stops produced with voicing lead decreased 30% by the final session, suggesting an influence of L2 on L1. Yang longitudinally examined a Chinese-English bilingual child’s vowel production over a 20-month period. Recording began when the child started to attend an English language preschool at age 3.7, and approximately one recording session was made each month until age 5.2. The author found three phases of vowel development. During the initial phase, several broad L1 categories were clustered near the three L1 corner vowels (/i, u, a/). Then, the child began to contrast individual vowels in L2, with great production variation. Finally, the child’s vowel system stabilized, with reduced within-category variation. The acoustic vowel spaces of English and Chinese were compared during the period. While the Chinese vowel space was relatively stable, the child’s English vowel space showed substantial changes in both size and shape. Because these two studies did not compare stop or vowel segments between the two languages, it is not certain whether these children showed distinctive phonetic categories across languages.
Yang and Fox further examined Chinese and English vowels produced by 5–6-year-old Chinese-English bilingual children as a group. Fifteen bilingual children participated; the children were divided into two groups depending on their English language proficiency. The authors found that although there was no significant difference in vowel formant frequencies among the three groups (monolingual English-speaking children and the two groups of bilingual children), bilingual children with low English proficiency showed greater variation and slight positional changes. Furthermore, the bilingual children with high English proficiency showed better separation among the vowel categories, similar to that of the monolingual English-speaking children, whereas the bilingual children with low English proficiency showed greater overlap for most vowel pairs than the other groups. In addition, shared vowels of English and Chinese were compared. The authors reported that no significant difference was found between English and Chinese /i/ produced by monolinguals, but the other shared vowels were fully separated from each other. The two groups of bilingual children showed similar production patterns. The authors concluded that L2 vowel systems in the bilingual children with low English proficiency were strongly influenced by their L1. The bilingual children produced L2 vowels in a near-native manner, but some L2 features were transferred to L1 vowels, suggesting an assimilation process taking place during L1 acquisition. Table 1 shows a summary of empirical evidence of phonetic category formation.
4. Developmental model of phonetic category formation in bilingual children
Based on the findings of previous work on phonetic category formation, I propose a model called the “development model of phonetic category formation” for both simultaneous and sequential bilingual children. In this model, I argue that phonetic category formation continues to evolve during development rather than emerging all at once in both types of bilingual children. Figures 1 and 2 show schematic representations of phonetic category development for stop and vowel categories in simultaneous and sequential bilingual children. The direction of the arrows shows the language transfer effect. As can be seen in Figure 1, in simultaneous bilingual children at 3–4 years of age, phonetic categories for L1 and L2 stops are not distinguished at all, regardless of language type. Thus, the two circles representing L1 and L2 overlap. The size of each circle denotes the development of the stop system in that language. Whether the stop system of each language is fully developed depends on the sound system of that language. For example, 3-year-old simultaneous Korean-English bilingual children were able to produce both English and Korean stop phonemes distinctively within a language, whereas Spanish-English, Japanese-English or Spanish-German bilingual children were not able to produce Spanish or Japanese voiced stops, which fall in the voicing lead category. This finding is similar to previous research reporting that monolingual children have difficulty acquiring voicing lead stops. The language influence effect also varied depending on the languages involved. While Korean-English bilingual children did not show much interaction, other studies reported bidirectional interaction or a unidirectional influence of L2 on L1. At 5–6 years of age, the status of phonetic categories for stops across languages remains unchanged.
Lee and Iverson reported that Korean-English bilingual children did not distinguish English and Korean stops across languages, and neither Khattab nor Watson reported that bilingual children distinguished stop categories at five years of age. These children still failed to produce voiced stops with lead voicing when one of their languages has voicing lead stops. An L2 influence on L1 still exists at this age. Phonetic categories for stops, however, are fully established at age 10 or older in simultaneous bilingual children. Interaction effects between L1 and L2 were also noted at these ages. The interaction may be unidirectional, in that L2 influences L1 [19, 24, 26], or bidirectional. It is not certain why Whitworth found a bidirectional influence with these children. Further studies are needed to verify this aspect.
While phonetic categories for stops are not fully established until 10 years of age, those for vowels seem to develop earlier. At 3–4 years of age, simultaneous bilingual children produced vowels of both languages distinctively, with limited interaction effects. The vowel systems remain fully separated at 5 and 10 years of age; however, the L2 starts to influence the L1 vowel system at 5 years of age. The developmental model of vowel category formation relies heavily on data from Korean-English bilingual children [2, 38]. Since limited evidence is available on vowel production by simultaneous bilingual children, further studies are warranted to verify this observation.
It seems that phonetic category formation in sequential bilingual children develops similarly to that in simultaneous bilingual children, but some differences are also observed. At 3–4 years of age, sequential bilingual children did not manifest fully distinctive phonetic categories for stops, similar to simultaneous bilingual children. While no transfer effect was observed in simultaneous bilingual children, sequential bilingual children showed a strong effect of L1 on L2. Similar to simultaneous bilingual children, sequential bilingual children did not manifest fully distinctive phonetic categories for stops at 5–6 years of age. Although voiceless stops were distinguished from each other, voiced stops across languages remain undistinguished at this age. As at 3–4 years of age, a unidirectional L1 influence on L2 exists at this age [40, 44]. Phonetic categories for stops, however, are fully formed at 10 years of age or older in sequential bilingual children. There was also an L1 influence on L2 at this age. It is interesting to observe an L1 influence on L2 stops in sequential bilingual children, because L2 typically influences L1 in simultaneous bilingual children. It is not certain why this happens. It may be because sequential bilingual children have fully developed the stop system of their L1, which may then affect the stops of the L2, which is not yet fully developed.
Vowel category formation in sequential bilingual children also showed a pattern similar to that in simultaneous bilingual children. At 3–4 years of age, a sequential bilingual child showed separation of two vowel systems after a short exposure to L2, suggesting that this child tended to distinguish the two systems, although there was an influence of L2 on L1. However, this finding was based on a single bilingual child without direct comparisons between the two languages. Further studies are warranted to confirm this finding. At 5–6 years of age, sequential bilingual children continue to manifest two systems. L1 production is also influenced by L2 at this age. The distinctive vowel categories remain separate at 10 years of age. Unlike younger sequential bilingual children, 10-year-old sequential bilingual children showed either a bidirectional influence (children with longer exposure duration) or an L1 influence on L2 (children with shorter exposure duration). L1 influence on L2 vowels was not observed in research with bilingual children, but the effect is commonly found in adult L2 learners. These differences may suggest that phonetic category formation and the interaction between L1 and L2 differ between child and adult bilingual speakers. In short, phonetic category formation in bilingual children is established progressively, using multi-dimensional representations for each sound category, and continues to evolve during development. Interaction between L1 and L2 varies depending on the type of bilingualism.
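As a compact restatement, the stages of the proposed model can be written as a small lookup table. The Python sketch below is illustrative only: the names `MODEL`, `AGE_BANDS` and `first_distinct_band` are hypothetical, the three age bands are a simplification of the ages discussed above, and each `True` or `False` records only whether this section describes cross-language categories as distinct at that stage. It omits the direction of L1 and L2 influence and the caveat that some entries (notably sequential vowels at 3–4 years) rest on a single case study.

```python
from typing import Optional

# Age bands used in this section; a simplification for illustration.
AGE_BANDS = ["3-4", "5-6", "10+"]

# True = cross-language categories described as distinct at that stage.
# This encodes the summary given in the text, not new data.
MODEL = {
    ("simultaneous", "stops"): {"3-4": False, "5-6": False, "10+": True},
    ("simultaneous", "vowels"): {"3-4": True, "5-6": True, "10+": True},
    # Sequential stops at 5-6: voiceless stops are distinguished but voiced
    # stops are not, so the category set as a whole is marked not distinct.
    ("sequential", "stops"): {"3-4": False, "5-6": False, "10+": True},
    ("sequential", "vowels"): {"3-4": True, "5-6": True, "10+": True},
}


def first_distinct_band(bilingual_type: str, sound_class: str) -> Optional[str]:
    """Return the earliest age band at which categories are marked distinct."""
    stages = MODEL[(bilingual_type, sound_class)]
    for band in AGE_BANDS:
        if stages[band]:
            return band
    return None
```

For example, `first_distinct_band("simultaneous", "stops")` returns `"10+"`, while `first_distinct_band("simultaneous", "vowels")` returns `"3-4"`, which mirrors the claim that category formation does not proceed across the board.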
5. Limitations and directions for future research
The developmental model proposed in this chapter is based on current empirical evidence. Some of the studies are single case studies without rigorous statistical analysis. Thus, this model should continue to develop as more empirical findings become available. Future studies should consider the following aspects when examining phonetic category formation in bilingual children. First, more group studies are needed. Among 20 studies examining phonetic category formation in bilingual children, only half employed group comparisons. To build a more solid theoretical model of phonetic category formation, findings should be based on group studies. Second, when studies examine phonetic category formation in bilingual children, it is necessary to include monolingual control groups for each language. Without understanding the phonetic development of monolingual children, it is not certain whether a pattern shown in bilingual children is a natural developmental consequence or a bilingual effect. For example, several studies reported that bilingual children acquiring a language with voicing lead stops often produced voiced stops with short lag VOT instead of voicing lead. It is not certain whether such production is attributable to the fact that these children are acquiring two languages or to the influence of one language on the other. Third, compared to studies examining stops, vowel studies are relatively limited. Only vowel data from Korean-English and Chinese-English bilingual children are currently available. In addition, no study of fricatives or other consonants has been conducted. Thus, future studies are warranted to examine vowels and other consonants in simultaneous and sequential bilingual children. Fourth, although recent studies examined more diverse language pairs, the range of language pairs studied is still limited. Some of these language pairs are similar in that stops are categorized as either voiced or voiceless.
Only Korean, whose stop system differs from those of the other languages, has been examined. Future studies may examine simpler or more complex stop or vowel systems in order to fully understand how bilingual children manifest distinctive phonetic categories when they are acquiring different language systems.