In tonal languages (e.g., Chinese and Thai), the meaning of a word cannot be defined solely by consonants and vowels; it also depends on a lexical tone, which is realized as a pitch pattern. The pitch patterns associated with Mandarin lexical tones are used to distinguish lexical meaning. Mandarin Chinese has four lexical tones, and tones 1-4 can be described phonetically as high level, high rising, low rising, and high falling pitch patterns, respectively. The syllable /ma/ in Chinese, for example, can stand for “mother” [tone 1], “hemp” [tone 2], “horse” [tone 3], or “scold” [tone 4]. Previous studies have shown that native speakers of tone languages are highly sensitive to changes in lexical tones regardless of whether they focus their attention on the stimuli (Tsang et al., 2011; Ren et al., 2009). The pitch patterns associated with Mandarin intonation, by contrast, may serve a variety of linguistic functions, conveying attitudinal, discoursal, or grammatical meanings (Cruttenden, 1997; Pell, 2006). A cross-linguistic (Chinese and English) study showed that pitch contours associated with intonation are processed predominantly in the right hemisphere, whereas pitch contours associated with tones are processed in the left hemisphere by Chinese listeners only (Gandour et al., 2003).
The neurophysiological study of the processing of tone and intonation can provide valuable insight into the nature of pitch pattern perception. Unlike in non-tone languages such as English, lexical tone and intonation in Chinese are both signaled primarily by changes in fundamental frequency (
The issue of whether attention is needed during speech perception has prompted a large amount of research, including work on the influence of attention on audiovisual speech perception (Astheimer & Sanders, 2009; Navarra et al., 2010) and on the role of selective attention in speech perception (Astheimer & Sanders, 2009; 2012). There is evidence both for the view that the audiovisual integration of speech is modulated by attention and for the view that it is an automatic process (Navarra et al., 2010; Astheimer & Sanders, 2009; Jones & Munhall, 1997). Concerning the role of attention in speech comprehension, Andersen et al. (2009) demonstrated that temporally selective attention may allow preferential processing of highly relevant acoustic information, such as word-initial segments, during normal speech perception. In a subsequent study, Andersen et al. (2012) examined the use of temporally selective attention in 3- to 5-year-old children and found that, like adults, preschool-aged children modulate temporally selective attention to preferentially process the initial portions of words in continuous speech. By directly comparing the effects of attention on different speech stimuli, Hugdahl et al. (2003) revealed that attention to speech sounds may recruit stronger neuronal activation than when the same stimulus is processed in the absence of attention. Previous results have shown that the cognitive processing of many aspects of language, such as semantic, syntactic, and pitch information, takes place, as indexed by the mismatch negativity (MMN), regardless of whether subjects focus their attention on the linguistic stimuli; nevertheless, the size of the MMN can be modulated by the level of attention (Pulvermüller & Shtyrov, 2003; 2006). The MMN is larger when subjects attend to the stimuli than when they are engaged in a distraction task.
Tone languages are advantageous for examining the nature of pitch pattern processing. In recent years, the functional asymmetry of the two cerebral hemispheres in the processing of prosodic information has received considerable attention. The left hemisphere has been thought to be dominant for language-related behaviours (Gandour et al., 2002; Klein, Zatorre, Milner, & Zhao, 2001) and the right hemisphere to be dominant for pitch-related behaviours (Warrier & Zatorre, 2004; Zatorre & Belin, 2001). However, which cues the brain uses to determine this division of labor is still a matter of debate. The functional hypothesis (Pell & Baum, 1997; Wong, 2002) states that the psychological functions of sounds determine which neural mechanisms are engaged during speech processing. Sounds that carry a greater linguistic load (e.g., lexical tone) are preferentially processed in the left hemisphere, while those that carry a lighter linguistic load (e.g., intonation) are preferentially processed in the right hemisphere. In contrast, the acoustic hypothesis (Zatorre & Belin, 2001; Zatorre, Belin, & Penhune, 2002) states that all pitch patterns are lateralized to the right hemisphere regardless of their psychological functions.
More recently, dynamic models that integrate the acoustic and functional hypotheses have been put forward, including a two-stage model and a more comprehensive model. The two-stage model (Luo et al., 2006) states that speech is initially processed as a general acoustic signal, lateralized to the right hemisphere, at a pre-attentive stage, and is then mapped onto a semantic representation, lateralized to the left hemisphere, at an attentive stage. This view is compatible with the notion put forward by Zatorre et al. (2002) that the left hemisphere lateralization of linguistic functions may arise from a slight initial advantage in decoding speech sounds. According to the more comprehensive model proposed by Gandour and his colleagues (Gandour et al., 2004; Tong et al., 2005), speech prosody perception is mediated primarily by the right hemisphere for complex sound analysis, while the left hemisphere is dominant when language processing is required. Moreover, both the left and right hemispheres have been found to contribute to the perception of pitch patterns (Pell, 2006; Xi, Zhang, Shu, Zhang, & Li, 2010). Prosodic speech information can be processed in either hemisphere depending on whether the prosodic cues are emotional or linguistic (Pell, 2006), and acoustic and linguistic information is processed in parallel at an early stage of speech perception (Xi et al., 2010).
Left hemisphere lateralization in the perception of lexical tones is supported by evidence from a number of studies, including dichotic-listening (Wang, Jongman, & Sereno, 2001) and functional imaging studies (Gandour et al., 2002; Klein, Zatorre, Milner, & Zhao, 2001). For example, when Thai and Chinese subjects were required to perform discrimination judgments of Thai tones, only Thai subjects displayed increased activation in the left inferior prefrontal cortex (Gandour et al., 2002). Similar hemispheric dominance was obtained in Chinese speakers when Chinese and English speakers were required to discriminate the pitch patterns in Chinese words (Klein, Zatorre, Milner, & Zhao, 2001). Nevertheless, because of the coarse temporal resolution of fMRI and PET, the studies mentioned above likely reveal brain activity aggregated over time rather than the early stages of auditory processing.
The specific aims of this study are to further investigate the neural mechanisms underlying the perception of linguistic pitch patterns by comparing the early pre-attentive processing of Mandarin tone and intonation, and to examine whether the pitch changes of the intonation associated with Mandarin tone 2 can be detected by native speakers of Chinese at the early pre-attentive stage. To this end, event-related potentials (ERPs) were combined with a source estimation technique, low-resolution electromagnetic tomography (LORETA). The ERP component of interest is the mismatch negativity (MMN), which peaks at about 100-250 ms after stimulus onset and is elicited by any discriminable change in auditory stimulation irrespective of subjects’ attention or task (Näätänen & Escera, 2000; Näätänen, Paavilainen, Rinne, & Alho, 2007; Näätänen, Tervaniemi, Sussman, Paavilainen, & Winkler, 2001). A new MMN paradigm, which allows one to obtain different MMNs in a short time, was applied in this study (Näätänen, Pakarinen, Rinne, & Takegata, 2004). The sources of the MMNs were estimated by LORETA, an approach that has been successfully used in studies of auditory processing to locate the sources of neural activity (Liu & Perfetti, 2003; Marco-Pallarés, Grau, & Ruffini, 2005).
2. Materials and methods
Thirteen graduate students (age range 21-25; six male, seven female) participated in this study as paid volunteers. All subjects were right-handed native speakers of Mandarin Chinese with no history of neurological or psychiatric impairment. Informed consent was obtained from all subjects.
Stimuli consisted of two meaningful auditory Chinese words that have the same consonant and vowel (/lai/) but different lexical tones: one pronounced in the high rising tone (tone 2) and the other in the high falling tone (tone 4). The syllable /lai4/ was pronounced in a declarative intonation, and the syllable /lai2/ was pronounced in either a declarative or an interrogative intonation. The standard stimulus was the syllable /lai2/ pronounced in a declarative intonation. Deviant stimuli differed from the standard in either intonation (/lai2/ in an interrogative intonation, the intonation deviant) or lexical tone (/lai4/, the lexical tone deviant).
A new passive auditory oddball paradigm (Näätänen, Pakarinen, Rinne, & Takegata, 2004) was applied to present the stimuli. In order to control for the effect of physical stimulus features and to obtain a relatively pure estimate of the memory network contribution indexed by the MMN (Pulvermüller & Shtyrov, 2006; Pulvermüller, Shtyrov, Ilmoniemi, & Marslen-Wilson, 2006), we created three sequences, one oddball sequence and two control sequences, to calculate the identity MMN. The oddball sequence, preceded by 15 standards, was a pseudorandom block of 1015 stimuli comprising standards (P = 0.8) and two deviants (P = 0.1 each). Each of the two control sequences, one per deviant, comprised 400 trials in which the deviant stimulus was presented alone (P = 1). The subjects were instructed to ignore the sounds from the headphones and to watch a silent movie during the experiment. The order of presentation of the three sequences was randomized across subjects.
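The composition of the oddball block can be illustrated with a short sketch. It assumes one reading of the description above: the 1015 stimuli are 15 leading standards followed by 1000 trials, of which 800 are standards and 100 are each deviant. The function name, the stimulus labels, the fixed seed, and the use of a plain shuffle (with no constraint on consecutive deviants) are illustrative assumptions rather than details reported for the experiment.

```python
import random


def make_oddball_sequence(n_lead_standards=15, n_trials=1000,
                          p_deviant=0.1, seed=0):
    """Return a list of stimulus labels for one oddball block.

    Assumption: the block is n_lead_standards standards followed by
    n_trials shuffled trials with the stated probabilities.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n_each_deviant = round(n_trials * p_deviant)   # 100 trials per deviant
    n_standard = n_trials - 2 * n_each_deviant     # 800 standards (P = 0.8)
    body = (["standard"] * n_standard
            + ["tone_deviant"] * n_each_deviant
            + ["intonation_deviant"] * n_each_deviant)
    rng.shuffle(body)  # plain shuffle; real designs often add constraints
    return ["standard"] * n_lead_standards + body


sequence = make_oddball_sequence()
print(len(sequence), sequence.count("tone_deviant"))
```

A control sequence under the same assumptions is simply 400 repetitions of a single deviant label.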
The auditory stimuli were pronounced in isolation by a trained female speaker and digitized at a sampling rate of 22,050 Hz. The stimuli were modified with Praat software (doing phonetics by computer, version 4.4.13, downloaded from www.praat.org) and normalized to 450 ms in duration, including 5 ms rise and fall times. The stimuli were presented binaurally at an intensity of 70 dB through headphones in a soundproof room with a stimulus onset asynchrony of 700 ms. The maximum fundamental frequencies of the two deviants were comparable. Fig. 1 shows the acoustic features of the experimental stimuli.
The EEG was recorded from 64 electrodes secured in an elastic cap (Neuroscan Inc.) with a sampling rate of 500 Hz and a band-pass of 0.05 to 40 Hz. The bilateral mastoids served as the reference and the GND electrode on the cap served as the ground. The horizontal and vertical electrooculograms were monitored by electrodes placed at the outer canthus of each eye and by electrodes above and below the left eye, respectively. All impedances were kept below 5 kΩ.
The raw EEG data were first corrected for eye-blink artifacts and band-pass filtered at 0.1-30 Hz. Trials with artifacts exceeding ±75 µV in any channel were excluded from averaging. Epochs were 600 ms long, including a 100 ms pre-stimulus baseline. ERPs elicited by deviants in the oddball sequence and by the identical stimuli in the control sequences were averaged separately across subjects and electrodes.
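The rejection and averaging steps can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the nested-list data layout, the function names, and the choice to apply the ±75 µV criterion after baseline correction are assumptions, and eye-blink correction and filtering are taken to have been done beforehand.

```python
from statistics import fmean

REJECT_UV = 75.0  # rejection threshold from the text (±75 µV)


def baseline_correct(channel, n_baseline):
    """Subtract the mean of the pre-stimulus samples from one channel."""
    offset = fmean(channel[:n_baseline])
    return [v - offset for v in channel]


def average_clean_epochs(epochs, n_baseline=50, reject_uv=REJECT_UV):
    """Average artifact-free epochs.

    epochs: list of trials; each trial is a list of channels; each channel
    is a list of voltages in µV. n_baseline=50 samples corresponds to the
    100 ms baseline at 500 Hz.
    Returns (erp, n_kept), where erp[channel][sample] is the mean voltage.
    """
    kept = []
    for trial in epochs:
        corrected = [baseline_correct(ch, n_baseline) for ch in trial]
        # Assumption: a trial is dropped if ANY channel exceeds the
        # threshold after baseline correction.
        if all(abs(v) <= reject_uv for ch in corrected for v in ch):
            kept.append(corrected)
    if not kept:
        raise ValueError("all epochs were rejected")
    n_channels, n_samples = len(kept[0]), len(kept[0][0])
    erp = [[fmean([trial[c][s] for trial in kept]) for s in range(n_samples)]
           for c in range(n_channels)]
    return erp, len(kept)
```

With a 600 ms epoch sampled at 500 Hz, each channel would hold 300 samples, the first 50 of which form the baseline.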
Although the MMN is generally obtained by subtracting the response to the standard from the response to the deviant stimulus, physical differences between the standard and deviant stimuli may influence the responses. In order to control the physical stimulus properties more stringently (Pulvermüller & Shtyrov, 2006; Pulvermüller, Shtyrov, Ilmoniemi, & Marslen-Wilson, 2006), we calculated the identity MMN by subtracting the ERP to the identical stimulus in the control sequence from the ERP to the deviant stimulus presented in the oddball sequence.
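In symbols, the identity MMN can be written as follows. The notation (the deviant index d, the time variable t, and the superscripts) is introduced here only for illustration and is not taken from the original analysis.

```latex
\mathrm{MMN}_{d}(t) = \mathrm{ERP}^{\text{oddball}}_{d}(t) - \mathrm{ERP}^{\text{control}}_{d}(t),
\qquad d \in \{\text{lexical tone}, \text{intonation}\}
```

Because the same physical stimulus d enters both terms, purely acoustic contributions should largely cancel, leaving the response attributable to the stimulus's role as a deviant.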
The grand average ERP waveforms were first analyzed by visual inspection, and the time window of the MMN was defined as 110-240 ms. The MMN amplitudes were measured as mean voltages in a 40 ms time window centered at the peak latency at electrode Cz, since the largest response was observed at Cz in the grand average waveform. Two different analyses of variance (ANOVA) were performed on the mean amplitudes. An original ANOVA on the original mean amplitudes was performed to assess the two MMNs (one for the lexical tone and the other for the intonation), with condition (lexical tone, intonation), type (deviant, the identical stimulus presented alone), and electrode (F3, F4, Fz, C3, C4, Cz, P3, P4, Pz) as independent factors. To compare the MMN elicited in the lexical tone condition with the MMN elicited in the intonation condition, a difference ANOVA was conducted on the difference waveforms, with condition (lexical tone, intonation), lobe (frontal, central, parietal), and hemisphere (left, right) as within-subject factors. The Greenhouse-Geisser adjustment was applied when the sphericity assumption was not satisfied.
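The amplitude measure described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function name and defaults are hypothetical, the peak is taken as the most negative sample of the difference waveform within 110-240 ms, and the epoch is assumed to start at -100 ms and to be sampled at 500 Hz.

```python
from statistics import fmean


def mmn_mean_amplitude(diff_wave, sfreq_hz=500, epoch_start_ms=-100,
                       search_ms=(110, 240), window_ms=40):
    """Return (peak latency in ms, mean amplitude around the peak).

    diff_wave: deviant-minus-control difference waveform at one electrode
    (e.g., Cz), one value per sample, in µV.
    """
    step_ms = 1000 / sfreq_hz
    times = [epoch_start_ms + i * step_ms for i in range(len(diff_wave))]
    # Candidate samples inside the MMN search window.
    candidates = [i for i, t in enumerate(times)
                  if search_ms[0] <= t <= search_ms[1]]
    # Assumption: the MMN peak is the most negative point in the window.
    peak = min(candidates, key=lambda i: diff_wave[i])
    half = window_ms / 2
    window = [v for v, t in zip(diff_wave, times)
              if abs(t - times[peak]) <= half]
    return times[peak], fmean(window)
```

At 500 Hz, the 40 ms window spans 21 samples (the peak sample plus 10 on each side).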
Low-resolution electromagnetic tomography (LORETA) was used to estimate the sources of the MMN elicited in the experiment. LORETA is a tomographic technique that helps to find the best of all possible solutions consistent with the scalp distribution (Pascual-Marqui, Michel, & Lehmann, 1994). The LORETA-KEY software (http://www.unizh.ch/keyinst/NewLORETA/LORETA01.htm) (Pascual-Marqui, Esslen, Kochi, & Lehmann, 2002) was used, and the results are illustrated in Talairach space (Talairach & Tournoux, 1988). We computed the LORETA solution at each time point covered by the MMN. The input to LORETA was the grand averaged ERP, sampled over the MMN window. The outputs were 3D maps of the activity value at each of 2,394 cortical pixels, based on the scalp distribution at each time point after subtraction of the averaged scalp distribution during the 100 ms prior to stimulus onset, which corresponds to the baseline. Pixels among the top 5% in activation value in each 3D map were treated as “active” pixels, to allow a focus on a reduced set of highly activated brain regions (Liu & Perfetti, 2003; Ren, Liu, & Han, 2009a; Ren, Yang, & Li, 2009b).
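The selection of “active” pixels can be illustrated with a small sketch that operates on a plain list of activation values, one per cortical pixel. It does not call LORETA-KEY; the function name is hypothetical, and rounding the 5% count down (119 of 2,394 pixels) is an assumption, since the rounding rule is not stated.

```python
def active_pixels(activation, fraction=0.05):
    """Return the sorted indices of the pixels with the highest activation.

    Assumption: the number of active pixels is the fraction of the total,
    rounded down, with at least one pixel kept.
    """
    n_active = max(1, int(len(activation) * fraction))
    # Rank pixel indices from highest to lowest activation value.
    ranked = sorted(range(len(activation)),
                    key=lambda i: activation[i], reverse=True)
    return sorted(ranked[:n_active])
```

Applied to each time point in the MMN window, this yields one reduced set of pixels per 3D map.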
Fig. 2 shows the grand average waveforms to the deviant stimuli in the oddball sequence (P=0.1) and to the identical stimuli in the control sequences (P=1). The deviant-minus-control difference waveforms are shown in Fig. 3.
The original three-way [condition×type×electrode] ANOVA revealed a significant main effect of condition (
A difference three-way [condition×lobe×hemisphere] ANOVA revealed a main effect of condition (
Given the latency and topography of the negative deflection in the difference waveform (see Fig. 2 and Fig. 3), we classified it as an MMN (Rinne et al., 2006). Since no MMN was elicited by the intonation contrast, only the source of the MMN for the lexical tone was analyzed. The local maximum of the MMN was located in the right middle temporal gyrus (BA 21, Talairach coordinates of the maximum: x = 53; y = 3; z = -13).
The present study examined the early cortical processing of linguistic pitch patterns by comparing the ERP responses to Mandarin tone and intonation. The results demonstrated that an MMN was elicited only by the lexical tone contrast; no MMN was obtained for the intonation contrast associated with Mandarin tone 2. Source estimation of the MMN showed that the highest activation underlying lexical tone processing was located in the right hemisphere, in the right middle temporal gyrus (BA 21).
A clear MMN was observed for the lexical tone contrast, and the highest activation of the MMN was located in the right temporal gyrus. This right hemisphere dominance for lexical tone in early pre-attentive processing converges with previous studies (Luo et al., 2006; Ren et al., 2009b). Ren et al. (2009b) demonstrated that the sources of the MMNs to both a Mandarin lexical tone and its hummed version were located in the right hemisphere in early pre-attentive processing. By comparing the early pre-attentive processing of Mandarin tones and consonants, Luo et al. (2006) found that Mandarin tones evoked a stronger pre-attentive response in the right hemisphere than in the left hemisphere. These results presumably reflect the role of the right hemisphere in acoustic processing and are compatible with the acoustic hypothesis (Zatorre & Belin, 2001) and the dynamic models (Zatorre et al., 2002; Gandour et al., 2004; Tong et al., 2005; Luo et al., 2006), but cannot be explained by the functional hypothesis, which predicts that lexical tones are preferentially processed in the left hemisphere (Pell & Baum, 1997; Wong, 2002). However, the functional hypothesis is supported by data from fMRI and PET studies (Hsieh et al., 2001; Klein et al., 2001; Gandour et al., 2002; 2003), which revealed left hemisphere dominance in native speakers during the perception of lexical tones and suggest that hemispheric lateralization is sensitive to the linguistic functions of pitch patterns and to language experience. Taken together, these findings seem to reflect a dynamic interaction between the two hemispheres and are compatible with the dynamic models of speech perception (Gandour et al., 2004; Luo et al., 2006; Zatorre, Belin, & Penhune, 2002). As Gandour et al. (2004) proposed, both acoustics and linguistics are necessary ingredients for developing a neurobiological model of speech prosody.
High-level linguistic processing might initially have developed from low-level acoustic processing (Zatorre et al., 2002; Luo et al., 2006).
The LORETA analysis located the source of the MMN to lexical tone in the right middle temporal gyrus (BA 21), one major source of the MMN (Luo et al., 2006; Näätänen et al., 2001). Besides the temporal lobe, other sources contribute to the MMN generators, such as the frontal lobe (Molholm, Martinez, Ritter, Javitt, & Foxe, 2005) and the parietal lobe (Levänen, Ahonen, Hari, McEvoy, & Sams, 1996; Marco-Pallarés, Grau, & Ruffini, 2005). It appears that the contribution of the MMN generators changes with the time point of the current sources and is feature dependent (Levänen et al., 1996; Molholm et al., 2005). In this study, the cortical locus reported as the MMN generator was the region with the highest level of activation across the time window of the MMN. Although the LORETA solution produces a “blurred-localized” image of a point source, it conserves the location of maximal activity and allows at least a discussion of asymmetric hemispheric involvement in pitch perception (Liu & Perfetti, 2003; Mulert et al., 2007; Ren et al., 2009a; 2009b).
The finding that no MMN was elicited by the intonation contrast (declarative vs. interrogative) demonstrates the perceptual difficulty that arises when intonation is combined with Mandarin tone 2. The MMN, an index of the brain's change-detection response to any change in auditory stimuli, can be used to determine discrimination accuracy, and it usually corresponds well with behavioural discrimination (Näätänen et al., 2007; 2012). The present results suggest that listeners cannot tease apart the two types of intonation at the pre-attentive processing stage. For the four Mandarin tones, the average fundamental frequency contours produced in isolation directly reflect the canonical forms of the tones (Xu, 1997). The F0 contour of tone 2 in isolation is rising in its phonological representation (Xu, 2005; Yuan, 2004) and resembles that of interrogative intonation. When tone 2 is at the end of a sentence, it is more difficult for native speakers of Chinese to identify the interrogative intonation (Ren et al., 2011; Yuan, 2004). Yuan (2004) proposed three mechanisms to explain the perception of interrogative intonation, namely the phrase curve mechanism, the strength mechanism, and the tone-dependent mechanism. Of these, the strength mechanism may conflict with the tone-dependent mechanism on Mandarin tone 2. This conflict likely leads to the perceptual difficulty of interrogative intonation on tone 2.
The perceptual difficulty of the intonation contrast shown in the present study also suggests an interaction between lexical tone and intonation. Gandour et al. (1997) demonstrated an interaction between lexical tone and intonation in Thai by analyzing the intonational characteristics of Thai sentences produced by normal and brain-damaged speakers at a conversational speaking rate. Most models of Mandarin intonation are framed in terms of contour interaction (Chao, 1968; Shen, 1992; Yuan, 2004). For example, Chao (1968) likened syllabic tone and sentence intonation to small ripples riding on large waves in the ocean and stated that they interact by addition. Based on perceptual and acoustic studies, Yuan (2004) proposed a tone-dependent mechanism for intonation perception, which flattens the falling slope of a falling tone (such as Mandarin tone 4) and steepens the rising slope of a rising tone (such as Mandarin tone 2). It can therefore be reasoned that the contrast between declarative and interrogative intonation may be more difficult or easier to detect depending on the tone. In a prior experiment, we found a clear MMN to the intonation contrast for Mandarin tone 4 (Ren et al., 2009b).
In summary, the present study demonstrated right hemispheric dominance for lexical tone in early pre-attentive processing, which is compatible with the acoustic hypothesis (Zatorre & Belin, 2001) and the dynamic models (Gandour et al., 2004; Tong et al., 2005; Luo et al., 2006), but cannot be accounted for by the functional hypothesis (Pell & Baum, 1997; Wong, 2002). Moreover, the current results provide clear evidence that listeners cannot tease apart declarative and interrogative intonation at the early stage of pre-attentive processing when the target carries Mandarin tone 2. However, how tone and intonation interact and how intonation is perceived remain to be determined, and these questions will be the focus of further experiments.
This research was supported by Grants from the National Natural Science Foundation of China (31100732 and 31271091) and Specialized Research Fund for the Doctoral Program of Higher Education (20112136120003).