Open access peer-reviewed chapter

Experimental Approaches to Socio‐Linguistics: Usage and Interpretation of Non‐Verbal and Verbal Expressions in Cross‐Cultural Communication

Written By

Xiaoming Jiang

Submitted: September 27th, 2016 Reviewed: May 24th, 2017 Published: July 5th, 2017

DOI: 10.5772/intechopen.69879



Social context shapes our behavior in interpersonal communication. In this chapter, I address how experimental psychology contributes to the study of socio-linguistic processes, focusing on nonverbal and verbal processing in cross-cultural and cross-linguistic communicative settings. A systematic review of the most recent empirical studies covers three topics: 1) the culturally universal and culturally specific encoding of emotion in speech, including the acoustic cues commonly involved in discriminating basic emotions in vocal expressions across languages and the cross-linguistic variation in such encoding; 2) how in-group and out-group status (e.g. inferred from a speaker's dialect or from familiarity with a language) modulates the encoding and decoding of speaker meaning; and 3) the impact of cultural orientation and cultural learning on the interpretation of social and affective meaning, focusing on how the immigration process shapes language use and comprehension. I highlight the value of combining research paradigms from experimental psychology with cognitive (neuro)science methods, such as electrophysiological recording and functional magnetic resonance imaging, to address these questions in cross-cultural communicative settings. The chapter concludes with future directions for studying the socio-cultural bases of language and the linguistic underpinnings of cultural behaviour.


  • brain
  • cross‐cultural communication
  • culture
  • dialect
  • in‐group advantage
  • non‐verbal behaviour
  • prosody
  • social norm

1. Introduction

Human language is deeply grounded in social interaction, and the use and understanding of various linguistic forms are socially constrained. One of the most intriguing questions that experimental socio‐linguists ask is how different aspects of society, including cultural norms, expectations and contexts, constrain the way language is used, and how language use in turn affects society. Among these societal factors, culture defines the commonalities shared within groups of individuals [1, 2], that is, the social conventions and behavioural habits learned through interaction with a collectivity of individuals [3]. Ekman and Friesen [4] characterize the relationship between culture and non‐verbal behaviour as follows: variations between cultures in the perceived significance of social interactions lie in their norms for decoding and displaying expressions (such as the Cultural Display Rules), which gives culture a critical role in language use. These display rules define appropriate and inappropriate expressions of speaker meaning [5] and dictate when, how and to whom individuals should express their meanings [6]. To maintain interpersonal harmony and avoid breakdowns in communication, individuals within a cultural group actively and automatically apply these norms to decode linguistic (verbal) and paralinguistic (non‐verbal) expressions [7, 8]. While socio‐linguistic research has historically relied heavily on descriptive and qualitative analysis (e.g. conversation analysis), a growing trend towards multi‐disciplinary and multi‐methodological approaches now seeks to quantify socio‐communicative phenomena [9–11]. This chapter addresses one of these research lines, discussing non‐verbal and verbal expressions from the perspectives of cross‐linguistic/cross‐cultural variation, inter‐group communicative mechanisms and individual differences.


2. Cross‐linguistic and cross‐cultural differences in language use and interpretation

Acoustic analysis, which measures the pitch, intensity, rhythm and speed of the speech signal, has been used to characterize the vocal correlates of non‐verbal expressions in prosody and the impact of language or culture on how vocal expressions are encoded. Pell et al. [12] compared acoustic measures of seven vocal expressions (surprise, happiness, anger, sadness, fear, disgust and neutrality) across four languages: Canadian‐English, German, Hindi and Arabic. ‘Pseudo‐utterances’1 were produced by professional actors who were given scripted scenarios to elicit the target expressions, and independent groups of native speakers of each language then rated how accurately these expressions were conveyed. Acoustic analysis of the perceptually valid expressions showed that mean fundamental frequency (how high the pitch is) and speech rate (how fast the speaker talks) dissociate different types of expressions across languages, with emotional expressions sounding lower and less dynamic in pitch and faster in speaking rate. The mean level and variation of these acoustic measures were crucial for predicting which specific expression was conveyed. These findings point to culturally universal mechanisms underlying the encoding of expressions.
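The two acoustic measures highlighted here can be summarized from a pitch track in a few lines. The sketch below is illustrative only: it assumes that f0 has already been extracted frame by frame, with unvoiced frames coded as 0, and it expresses speech rate in syllables per second. The function names and input values are hypothetical and are not taken from Pell et al. [12].

```python
from statistics import mean, stdev

def f0_summary(f0_track_hz):
    """Return mean and standard deviation of f0 (Hz) over voiced frames.

    Assumption: unvoiced frames are coded as 0 in the pitch track.
    """
    voiced = [f for f in f0_track_hz if f > 0]
    return mean(voiced), stdev(voiced)

def speech_rate(n_syllables, duration_s):
    """Speech rate in syllables per second (one common operationalization)."""
    return n_syllables / duration_s

# Hypothetical pitch track (Hz) for a short utterance; 0 = unvoiced frame.
track = [0, 210, 220, 230, 0, 200, 190, 0]
mean_f0, sd_f0 = f0_summary(track)
rate = speech_rate(n_syllables=10, duration_s=2.0)
print(f"mean f0 = {mean_f0:.1f} Hz, f0 SD = {sd_f0:.1f} Hz, rate = {rate:.1f} syll/s")
```

Mean f0 reflects how high the voice is and its standard deviation how dynamic it is; per‐utterance values like these could then be entered into a statistical model that predicts the expression type.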

Culture constrains the ways in which non‐verbal expressions are encoded in the voice and the face, and may consequently affect how the brain decodes expressions conveyed through multiple modalities [14]. To show this constraint on decoding, Liu et al. [13] recorded electroencephalograms (EEG) while native Chinese and Canadian‐English speakers judged the expression in pairs consisting of a pseudo‐utterance and a face. Participants were asked to focus on either the voice or the face, simulating interpersonal situations in which attention is deployed to different channels of communication. The expression in the voice (fear or sadness) was either congruent or incongruent with that of the face, forming a variant of the ‘Stroop’2 paradigm of experimental psychology, which is well suited to revealing cultural differences [15, 16]. Cultural differences emerged in both response accuracy and brain responses. Incongruent pairs led to reduced accuracy and an enhanced N400 (a negative‐going brain response peaking around 400 ms after the onset of the stimulus pair) relative to congruent pairs. For English speakers, these changes were larger when attention was directed to the voice than to the face; for Chinese speakers, they did not differ between instructions. These findings point to stronger interference from the face for English speakers, even when attention was directed away from that channel, consistent with earlier findings of heightened sensitivity to vocal cues in eastern cultures and to visual cues in western cultures [17].
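An N400 effect of this kind is usually quantified as the difference between conditions in mean ERP amplitude within a time window. The sketch below assumes one averaged waveform per condition, sampled at known time points, and a 300 to 500 ms window; the window, the sampling and the values are illustrative assumptions, not the parameters used by Liu et al. [13].

```python
def mean_amplitude(erp_uv, times_ms, start_ms=300, end_ms=500):
    """Mean voltage (microvolts) of an averaged ERP within [start_ms, end_ms]."""
    window = [v for v, t in zip(erp_uv, times_ms) if start_ms <= t <= end_ms]
    return sum(window) / len(window)

def n400_effect(incongruent_uv, congruent_uv, times_ms):
    """Incongruent minus congruent mean amplitude.

    A more negative value indicates a larger (enhanced) N400 for incongruent pairs.
    """
    return mean_amplitude(incongruent_uv, times_ms) - mean_amplitude(congruent_uv, times_ms)

# Hypothetical averaged waveforms sampled every 100 ms from stimulus onset.
times = [0, 100, 200, 300, 400, 500, 600, 700]
congruent = [0, 1, 2, -1, -2, -1, 0, 1]
incongruent = [0, 1, 2, -3, -5, -4, 0, 1]
effect = n400_effect(incongruent, congruent, times)
print(f"N400 effect: {effect:.2f} microvolts")
```

Computing this difference separately for voice‐focus and face‐focus blocks, and for each listener group, yields the kind of comparison reported above.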


3. Language use in and out of culture

The universality of decoding non‐verbal expressions across cultures and languages is demonstrated by listeners' preserved ability to recognize expressions from the tone of voice in a foreign language spoken in a very different culture [18]. In this study, expression‐laden utterances (e.g. ‘I ate a fish’) were produced by native American‐English speakers and presented both to English speakers and to Shuar‐Spanish bilinguals from Amazonian Ecuador. The Shuar listeners recognized the target expressions in the American‐English utterances above chance level, although their accuracy was slightly lower than that of the native speakers [18]. These findings suggest that individuals can communicate across distant cultural boundaries.

Thompson and Balkwill [19] reported a similar finding when they compared English speakers' recognition of expressions (anger, joy, sadness and fear) in sentences spoken in English, Chinese, German, Japanese and Tagalog. Listeners' accuracy was at least 30%, above the chance level of 25% for four response options. Average accuracy was highest for the native language, lower for German and Tagalog, and lowest for Chinese and Japanese, suggesting that cultural distance impedes listeners' ability to judge expression types. More importantly, acoustic features (mean and standard deviation of f0, and mean and standard deviation of intensity) were systematically associated with expression types across languages and cultures (e.g. higher mean f0 signalled joy and anger, whereas lower mean f0 signalled sadness and fear), demonstrating a culture‐universal mechanism for encoding non‐verbal expressions.
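Whether a given accuracy counts as ‘above chance’ depends not only on the chance level (25% with four response options) but also on the number of trials. One simple check, sketched here as an illustration rather than as the analysis used in [19], is an exact one‐sided binomial test; the trial counts below are hypothetical.

```python
from math import comb

def p_above_chance(n_correct, n_trials, chance):
    """Exact one-sided binomial p-value: P(X >= n_correct) if the listener were guessing."""
    return sum(
        comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )

# The same 30% accuracy on four-alternative trials, with two hypothetical trial counts.
for n_trials in (200, 400):
    n_correct = n_trials * 3 // 10  # 30% correct
    p = p_above_chance(n_correct, n_trials, chance=0.25)
    print(f"{n_correct}/{n_trials} correct: p = {p:.3f}")
```

With these hypothetical counts, 30% correct is not reliably above chance on 200 trials (p > .05) but is on 400 trials (p < .05), so reported accuracies are best read together with the number of trials behind them.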

Another study investigated the role of speech rate and silent pauses in the evaluation of expressions in speech samples across cultures [20]. The effect on listeners' ability to decode non‐verbal expressions was studied using a more primitive form of non‐verbal expression, namely vocalizations (or affect bursts), from their own and from an unfamiliar culture. One‐minute monologues with emotionally neutral content, containing vocalizations of emotion (e.g. laughter), were acted out by native Hungarian speakers, and the recordings were then acoustically shortened (by 21 or 50%) or lengthened (by 18 or 50% of the original length of the utterance) to create faster or slower speech rates. Hungarian (in‐group) and German (out‐group) listeners rated these samples for emotionality. German raters judged the Hungarian speakers to be sadder, angrier, less positive and more fearful than Hungarian raters did, suggesting that linguistic familiarity and/or acoustic parameters affect the decoding of expressions from the voice. In both cultural groups, slower speech rates and longer pauses increased ratings of sadness and fear and reduced ratings of happiness and positivity. A culturally universal mechanism may therefore explain the consistent role of durational parameters in expression perception.
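Because speech rate is inversely proportional to duration, the shortening and lengthening steps described here do not change the rate symmetrically. The sketch below assumes that the whole one‐minute signal, pauses included, is time‐scaled uniformly (the study's actual manipulation method is not described here) and converts each duration change into a rate multiplier.

```python
def rate_factor(duration_change):
    """Speech-rate multiplier for a proportional change in total duration.

    duration_change = -0.21 means 21% shorter; +0.18 means 18% longer.
    Assumes uniform time-scaling of the whole signal, pauses included.
    """
    return 1 / (1 + duration_change)

ORIGINAL_S = 60.0  # a one-minute monologue
for change in (-0.50, -0.21, 0.18, 0.50):
    new_length = ORIGINAL_S * (1 + change)
    print(f"{change:+.0%}: {new_length:.1f} s, speech rate x{rate_factor(change):.2f}")
```

Halving the duration doubles the rate, whereas lengthening it by 50% slows the rate by only one third (a factor of about 0.67).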

Culture‐specific mechanisms of decoding non‐verbal expressions show up as asymmetric recognition accuracy between cultures (e.g. better for one's own culture and worse for others'). Typically, listeners benefit when recognizing speaker meaning from those who share their cultural identity or group membership. Koeda et al. [21] examined the perception of non‐verbal vocal expressions from a native and a non‐native culture. Native Japanese and Canadian speakers judged the intensity (strength of the expression), valence (pleasantness of the expression) and arousal (level of excitement) of vocalizations produced by Canadian‐French speakers. Compared with Canadian listeners, Japanese listeners gave lower intensity ratings and less negative valence ratings for anger, disgust and fear expressions, and less positive ratings for pleasant expressions. These findings suggest that the ‘cultural display rules’ of an eastern culture impede the perception of the prosodic patterns of both positive and negative expressions from a western culture.

The impact of in‐group versus out‐group perception has also been evaluated when listeners judge vocalizations from their own and another culture. Native British‐English and Himba speakers (the Himba live in northern Namibia and speak a minority tribal language) judged five basic emotion expressions (anger, disgust, fear, sadness and surprise) and four positive, higher‐order social emotive expressions (achievement, amusement, sensual pleasure and relief) produced by members of their own group and of the other group [22]. For all basic emotions, both groups recognized the expressions from both their own culture and the other group's culture. By contrast, vocalizations of achievement and sensual pleasure were recognized only when produced by members of the listener's own group. Because such positive affective vocalizations facilitate social cohesion within groups, their communication may be restricted to in‐group members with whom social connections have been established [22]. Amusement, however, was recognized regardless of cultural group, suggesting that ‘laugh‐like’ vocalizations may be universally associated with the enjoyment of physical play.

Some affective properties of non‐verbal expressions hinder the perception of out‐group emotions. Laukka et al. [23] compared Swedish listeners' recognition rates for vocalizations of eight different expressions produced by professional actors from India, Kenya, Singapore and the USA. Because no in‐group condition was included (Swedish listeners did not judge Swedish vocalizations), all expressions judged were ‘out‐group’ relative to the listeners, which eliminates potential confounds from in‐group versus out‐group perception. Regardless of the actors' culture, a wide range of positive (happiness, interest, lust, relief, serenity and positive surprise) and negative expressions (anger, contempt, disgust, fear and sadness) were recognized with relatively high accuracy. However, self‐conscious emotions (shame, guilt and pride) were recognized relatively poorly across cultures. These findings suggest that self‐related concepts may interact with cultural effects on the perception of non‐verbal expressions conveyed in non‐native languages.

Reduced recognition of expressions in ‘pseudo‐utterances’ also reveals interference from out‐group cultural expressions. Three relevant studies using different paradigms support the ‘Dialect Theory’ [24, 25], which holds that ‘cultural dialects’ (distinct manners of speaking and communicating in distinct cultures) modulate the perception of non‐verbal cues. Culturally relevant cues in a non‐verbal dialect facilitate recognition and enhance accuracy, whereas culturally irrelevant cues in the dialect interfere with the recognition process and reduce accuracy.

In the first study, by Pell et al. [26], Argentine Spanish listeners explicitly judged the expression of pseudo‐utterances conveying six basic emotions and a neutral expression, spoken by native speakers of their own language and by speakers of unfamiliar languages (German, Canadian‐English and Arabic). Regardless of language, listeners achieved a mean recognition accuracy of around four times chance level (∼50%), with each individual expression recognized above chance. More importantly, recognition was more accurate for their native language than for the unfamiliar ones.

Testing a different cultural sample, Paulmann and Uskul [27] asked Chinese and British‐English speakers to judge emotional pseudo‐utterances spoken in Chinese or British‐English, yielding in‐group conditions (Chinese speakers judging Chinese utterances, English speakers judging English utterances) and out‐group conditions (Chinese speakers judging English utterances, English speakers judging Chinese utterances). Seven different expressions (of basic emotions) were included. This design tests culture‐universal and culture‐specific accounts of processing non‐verbal information. The culture‐universal account assumes that the acoustic variation embedded in a language is sufficient for both in‐group and out‐group listeners to judge its expressions accurately. The culture‐specific account would be supported by a clear ‘in‐group advantage’, that is, more accurate responses to one's native language than to another language. The results supported both accounts: (1) both in‐group and out‐group listeners recognized the target expressions well above chance level (supporting the culture‐universal account), and (2) an in‐group advantage emerged for some emotions in each language group, even though the same set of primary acoustic variables was identified for recognizing the expressions (supporting the culture‐specific account).

One study examined the implicit perception of the voice using the Facial Affect Decision Task (‘FADT’ [28]). In the FADT paradigm, listeners hear a pseudo‐utterance expressing an emotion and then decide whether the face that follows displays an emotional expression; the emotion in the voice is either congruent or incongruent with the face. Although listeners are never asked explicitly about the expression in the voice, they differentiate the pseudo‐utterances, as shown by a ‘congruency’ effect: responses are less accurate and slower when the face is incongruent [29]. This finding suggests that the expression in the utterance is processed implicitly. Building on this finding, Pell and Skorup [28] presented native English listeners with pseudo‐utterances produced by Canadian‐English and Arabic speakers, each followed by a face, while manipulating the delay between voice and face (short, 600 ms, vs. long, 1000 ms). With enough time to activate the speaker identity (1000‐ms delay), incongruent trials produced more errors than congruent trials, whether listeners heard an in‐group or an out‐group voice. With more limited exposure to the voice, however, listeners were less sensitive to voice‐face congruency, making equal numbers of errors in the incongruent and congruent conditions, suggesting that activating the meaning of a vocal expression from a non‐native language requires more time.
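A congruency effect is a difference between conditions. The sketch below assumes trial‐level data with hypothetical field names (congruent, correct, rt_ms) and computes the incongruent‐minus‐congruent difference in error rate and in mean response time on correct trials; positive values on both measures indicate that incongruent faces were harder to judge (more errors, slower responses).

```python
def congruency_effect(trials):
    """Return (error-rate difference, RT difference in ms), incongruent minus congruent.

    Each trial is a dict with hypothetical keys 'congruent' (bool),
    'correct' (bool) and 'rt_ms' (float). RTs are averaged over correct trials only.
    """
    def summarize(is_congruent):
        subset = [t for t in trials if t["congruent"] == is_congruent]
        error_rate = sum(not t["correct"] for t in subset) / len(subset)
        correct_rts = [t["rt_ms"] for t in subset if t["correct"]]
        return error_rate, sum(correct_rts) / len(correct_rts)

    err_con, rt_con = summarize(True)
    err_inc, rt_inc = summarize(False)
    return err_inc - err_con, rt_inc - rt_con

# Hypothetical trials from one listener.
trials = [
    {"congruent": True, "correct": True, "rt_ms": 500},
    {"congruent": True, "correct": True, "rt_ms": 520},
    {"congruent": False, "correct": True, "rt_ms": 580},
    {"congruent": False, "correct": False, "rt_ms": 700},
]
print(congruency_effect(trials))  # (0.5, 70.0)
```

In a design like Pell and Skorup's, the same computation would be run separately for each voice‐face delay and for in‐group versus out‐group voices.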

In these experiments, a single group of listeners judged expressions from either their native culture or a non‐native culture to which they had limited exposure. Two questions remain unanswered. First, is the relatively high accuracy of detecting speaker meaning in a non‐native language specific to the group of listeners tested, or does it generalize to other listener groups from the same dialect group judging the same languages? Second, is the in‐group advantage a dynamic process that can be learned through exposure to the target culture, or a more stable process determined mainly by the specific phonological and prosodic features of the in‐group and out‐group languages?


4. Language proficiency and cultural immersion impact language use

To address these questions and to examine how quickly and efficiently expressions can be judged, an auditory gating paradigm was used in which listeners from two language groups (Canadian‐English and Hindi) were tested on both their native language and the other group's language [30]. Pseudo‐utterances were gated at five intervals from utterance onset, creating segments of 200, 400, 500, 600 and 700 ms, and were also presented at full length. Listeners judged the expressions (four categories) in their native language (English judged by English speakers and Hindi judged by Hindi speakers), a foreign language (Hindi judged by English speakers) and a second language (English judged by Hindi speakers). Listeners responded more accurately and more efficiently (reaching reliably high accuracy with less auditory information) to their native than to the non‐native language, even when they were proficient in that second language (Hindi speakers judging English utterances). For Hindi speakers, higher oral proficiency in English (speaking and listening) predicted higher recognition accuracy and more efficient judgement of expressions in English, thereby reducing the in‐group advantage. These data provide further support for the ‘Dialect Theory’: culturally consistent cues in a non‐verbal dialect facilitate the recognition of emotional expressions from the in‐group, whereas culturally inconsistent cues hamper recognition.
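In gating studies, efficiency is often summarized as an identification point: the shortest gate from which a listener's responses stay correct through the full utterance. The sketch below uses the gate durations described above but assumes this particular definition and a hypothetical per‐gate record of correct and incorrect responses; the original study may have scored efficiency differently.

```python
GATES = ["200 ms", "400 ms", "500 ms", "600 ms", "700 ms", "full"]

def identification_point(correct_by_gate, gates=GATES):
    """Earliest gate from which every later response is also correct.

    correct_by_gate lists True/False per gate, shortest gate first.
    Returns None if the full-utterance response is itself wrong.
    """
    point = None
    # Walk back from the full utterance while responses remain correct.
    for gate, correct in zip(reversed(gates), reversed(correct_by_gate)):
        if not correct:
            break
        point = gate
    return point

# Hypothetical listener: a lucky guess at 400 ms, stable recognition from 600 ms.
print(identification_point([False, True, False, True, True, True]))  # 600 ms
```

Earlier identification points for native than for non‐native utterances would correspond to the efficiency advantage described above.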

To further delineate the effect of cultural learning (the acquisition and maintenance of cultural display rules) on decoding, a follow‐up study [31] compared the brain responses of Chinese immigrants to Canada (living in Canada for at least 6 years) with those of native Canadian‐English speakers and native Chinese speakers (living in Canada for less than 1 year), using the same experimental paradigm for all three groups (as in Ref. [13]). The rationale is that if cultural learning affects the listener's decoding of non‐verbal expressions, the behavioural or neural responses of the immigrant group (exposed to Canadian‐English culture for at least 6 years) should resemble those of the Canadian‐English group. The results showed a limited influence of cultural learning. Like the native English group, the immigrant group showed reduced accuracy when the speaker's face was to be ignored. However, their enhanced N400 to incongruent voice‐face pairs was not modulated by attentional focus, a pattern similar to that of native Chinese speakers but different from that of native English speakers, who showed a more pronounced N400 change when their attention was directed away from the face. These findings suggest a progressive adaptation of the neurocognitive processes underlying the processing of non‐verbal information from multiple sensory modalities, with behavioural adaptation preceding neural plasticity.

The impact of cultural learning on the decoding of expressions has also been studied by directly comparing recognition accuracy among native speakers, second language (L2) speakers and foreign language speakers of the target language [32]. Three listener groups, native Estonian speakers, native Russian speakers living in Russia and Russian speakers living in Estonia, completed web‐based experiments outside the laboratory, judging spoken sentences that conveyed different expressions (joy, anger, sadness and neutrality) through tone of voice. The sentences were intoned with prosody that was either congruent or incongruent with their content (e.g. ‘…we feel much more comfortable here than in Narva’ spoken with joyful prosody). All listener groups judged the expressions above chance level, and the native and L2 Estonian speakers reached twice chance level. Russian speakers living in Estonia were significantly less accurate than Estonian speakers only for joy and neutrality, and significantly more accurate than Russian speakers living in Russia for anger, joy and neutrality. These findings illustrate a case in which the decoding of expressions depends on both culture and cultural learning, implying that the social norms for using non‐verbal expressions are learned through social interaction and exposure to the target language.

These studies cannot disentangle the contributions of the physical properties of the stimuli and of cultural norms to cross‐cultural differences in the display and recognition of expressions. For example, cultural display rules, the culturally dependent norms for displaying and decoding emotions, reflect the ability to control and decode expressions and consequently affect social interactions [33]. Cultural learning may stem either from exposure to the physical properties of how expressions are encoded in a culture or from the adoption of its conventions about with whom, when and how to communicate. Distinguishing these two factors has valuable implications for cross‐cultural communication in multiple domains, including business, education and legal settings.

One study used machine‐learning simulations in a first attempt to isolate the impact of the physical properties of auditory stimuli from that of cultural norms on the perceptual‐acoustic classification of vocal expressions [34]. Acoustic features were extracted from short utterances expressing 11 typical emotions (anger, contempt, fear, happiness, interest, neutral, sexual lust, pride, relief, sadness and shame), produced by 100 professional speakers from 5 English‐speaking cultures (Australia, India, Kenya, Singapore and USA). Machine‐learning models were trained to recognize expression types from patterns in fundamental frequency, vocal intensity, formants, voice quality and temporal characteristics, and were then used to classify expressions from the same or a different cultural group. Overall, classification rates were above chance in cross‐cultural conditions and were higher when models were tested on the same culture than on a different culture (e.g. USA vs. India). In‐group and out‐group classification rates did not differ between expressions produced in Australian and American‐English. These findings show that vocal expressions share characteristics across cultures and that cultural dialects exist in expressive vocal style. Because the models are not subject to the perceptual biases that a listener brings to expressions from a different culture, the higher (or lower) human recognition rates for in‐group (or out‐group) expressions may result from greater exposure to, and familiarity with, culturally specific expressive styles.
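The logic of training on one culture and testing on the same or another culture can be illustrated with a very simple classifier. The sketch below uses a nearest‐centroid rule on two hypothetical standardized features (for instance, mean f0 and speech rate). It stands in for, and is not, the models used in [34], and all data values are invented for illustration.

```python
from math import dist
from statistics import mean

def centroids(samples):
    """Mean feature vector per expression label; samples = [(features, label), ...]."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in by_label.items()}

def accuracy(train, test):
    """Fit nearest-centroid on train, return the proportion of test classified correctly."""
    cents = centroids(train)
    hits = sum(
        min(cents, key=lambda label: dist(features, cents[label])) == true_label
        for features, true_label in test
    )
    return hits / len(test)

# Hypothetical standardized features: (mean f0, speech rate).
culture_a_train = [((1.0, 1.0), "anger"), ((1.2, 0.8), "anger"),
                   ((-1.0, -1.0), "sadness"), ((-0.8, -1.2), "sadness")]
culture_a_test = [((0.9, 1.1), "anger"), ((-1.1, -0.9), "sadness")]
culture_b_test = [((-0.5, 0.0), "anger"), ((0.8, 1.0), "anger"),
                  ((-0.5, -0.9), "sadness"), ((-1.0, -1.0), "sadness")]

print("same culture:", accuracy(culture_a_train, culture_a_test))   # 1.0
print("other culture:", accuracy(culture_a_train, culture_b_test))  # 0.75
```

In this toy case, one of culture B's anger tokens is produced with lower pitch and rate, so a model tuned to culture A misreads it. The drop in accuracy reflects acoustic style alone, since the classifier has no familiarity with, or attitudes towards, either culture.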


5. Future directions

One promising direction for continuing this line of research is to study how cultural dimensions (e.g. individualism vs. collectivism) modulate the use and decoding of non‐verbal expressions. Hofstede (1983) proposed cultural dimensions theory, which explains the impact of culture on the values of its members and how these values relate to their behaviour. The dimensions of cultural variability include Individualism‐Collectivism, Power Distance, Uncertainty Avoidance, Masculinity‐Femininity, Long‐Term vs. Short‐Term Normative Orientation and Indulgence‐Restraint [35, 36]. Although scores on the individualism‐collectivism dimension have been suggested to be meaningful predictors of cultural variation in display rules for communicating non‐verbal expressions, they have seldom been tested experimentally [31]. Highly individualistic cultures encourage outward displays of expression that exaggerate the strength of the feeling, whereas in highly collectivistic cultures the expression of emotion is controlled and evaluated against the relationship between the self and others [37]. One study reported the recognition rates of native Persian speakers living in Iran who judged expressions of six basic emotions from their own culture [38]. Sentences with emotional lexical content were articulated in a congruent tone of voice, so that expression‐related cues were available in both the verbal and the non‐verbal channel.
On average, accuracy above 95% was reported for this cultural group, which is known for its higher scores on Hofstede's individualism‐collectivism dimension [39] (https://geert‐…). Although this study did not compare groups at opposite ends of the cultural orientation, a between‐study comparison shows lower recognition rates than those of native Persian speakers both for the German (∼70% [40]) and Canadian‐English groups (∼83% [12]), which are culturally more individualistic, and for Chinese groups, which are culturally more collectivistic (∼73% [41]). Such between‐study comparisons do not yield consistent results and could be confounded by many factors, including (1) the number of expressions included in the forced‐choice task, (2) whether only non‐verbal or both non‐verbal and verbal cues were available for detection and (3) differences in the acoustic features of speakers in different groups, as well as other cultural variables that co‐vary with the listener and the speaker.
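The first of these confounds can be made concrete: the same raw accuracy means different things when the number of response options differs. One common way to put accuracies on a shared scale is the correction for guessing, sketched below; the accuracy and option counts used are hypothetical and do not reproduce the studies cited.

```python
def chance_corrected(accuracy, n_options):
    """Rescale accuracy so that 0 = chance (1 / n_options) and 1 = perfect."""
    chance = 1 / n_options
    return (accuracy - chance) / (1 - chance)

# The same raw accuracy of 70% in a 7-choice vs. a 4-choice task.
print(round(chance_corrected(0.70, 7), 3))  # 0.65
print(round(chance_corrected(0.70, 4), 3))  # 0.6
```

Under this correction, 70% in a seven‐choice task reflects better‐than‐guessing performance to a greater degree than 70% in a four‐choice task; it does not, however, address confounds (2) and (3).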

A related question is how social variables interact with culture to affect the decoding of non‐verbal expressions. Listeners' biological sex has been suggested to affect the ability to recognize speaker meaning, with females more sensitive to the socially relevant information in non‐verbal cues [42] and more likely to combine non‐verbal and verbal cues into an integrated meaning [43], especially when their attention is not explicitly directed to do so. In two electroencephalogram (EEG) studies, positive and negative words spoken in either a happy or a sad tone of voice (creating congruous and incongruous prosody conditions) were either evaluated for their emotional lexical meaning while the prosody was ignored (implicit processing of prosody) or judged for the congruency between prosody and lexical meaning (explicit processing of prosody) [43–45]. German speakers judged German expressions [44, 45] and Cantonese speakers judged Cantonese expressions [46]. Regardless of culture, sex differences appeared only when non‐verbal cues were ignored: only female listeners showed a larger late positive brain response to incongruent than to congruent prosody. These findings suggest a culturally independent role of biological sex in the decoding of non‐verbal and verbal expressions.

The mechanisms of encoding and decoding culturally relevant non‐verbal cues in inter‐cultural communication are also of great interest to socio‐linguistic studies of multiculturalism and multilingualism [47]. Although socio‐linguistic researchers have used attitudinal surveys extensively to reveal the positive or negative stereotypes (‘stigma’) associated with language‐related identity (Bresnahan, Ohashi, Nebashi, Liu & Shearman, 2002), more recent research has focused on how the neurocognitive system adapts to process accent information that implies such stereotypes. For example, Bestelmeyer et al. [48] scanned listeners' brains while they passively listened to spoken digits produced by speakers who either shared their regional dialect (Southern‐English or Scottish‐English) or did not (American‐English). The two participant groups were native speakers with either a Southern‐English or a Scottish‐English accent. In an adaptation paradigm, vocal stimuli from the same or a different dialect were presented repeatedly, and the resulting changes in neural responses were measured. Activity in the amygdala, a region typically engaged in emotional reactions, was strengthened when the same dialect was presented and reduced when a different dialect was presented. These findings suggest that speakers whose language marks the same geographic, ethnic and social status as the listener's are perceived as more socially relevant.

The accurate decoding of non‐verbal expressions has a crucial impact on native as well as non‐native language processing. Vocal expression can help resolve lexical ambiguity in a second language. Hanulíková and Haustein [49] reported that German speakers who learned English as a second language were more likely to interpret an English ‘sadness‐neutral’ homophone (‘banned/band’) as having a sad meaning when it was spoken in a sad tone of voice. However, L2 learners and native English speakers were equally likely to interpret an English ‘happiness‐neutral’ homophone (‘flower/flour’) as having a positive meaning when it was produced in a happy tone of voice. Thus, the extent to which tone of voice resolves the emotional meaning of an ambiguous word depends in part on the listener's native‐language status. These findings highlight the great potential of experimental investigations into the socio‐cultural bases of language and the linguistic underpinnings of cultural behaviour.



Special thanks go to Professor Marc D. Pell, who leads the Neuropragmatics and Emotion Lab in the School of Communication Sciences and Disorders. The author also thanks the McLaughlin Scholarship and the ‘MedStar’ award from McGill University.


  1. Triandis HC. Major cultural syndromes and emotion. In: Kitayama S, Markus HR, editors. Emotion and Culture: Empirical Studies of Mutual Influence. Washington, DC: American Psychological Association; 1994. pp. 285‐306
  2. Matsumoto D. Culture, context, and behavior. Journal of Personality. 2007;75(6):1285‐1320
  3. Matsumoto D, Grissom RJ, Dinnel DL. Do between‐culture differences really mean that people are different? A look at some measures of cultural effect size. Journal of Cross‐Cultural Psychology. 2001;32:478‐490
  4. Ekman P, Friesen WV. The repertoire of nonverbal behavior: Categories, origins, usage, and coding. Semiotica. 1969;1:49‐98
  5. Matsumoto D, Kasri F, Kooken K. American‐Japanese cultural differences in judgments of expression intensity and subjective experience. Cognition and Emotion. 1999;13:201‐218
  6. Matsumoto D. Cultural similarities and differences in display rules. Motivation and Emotion. 1990;14(3):195‐214
  7. Jiang X, Zhou X. Impoliteness electrified: ERPs reveal the real time processing of disrespectful reference in Mandarin utterance comprehension. In: Terkourafi M, editor. Interdisciplinary Perspectives on Im/politeness. Amsterdam: John Benjamins; 2015. pp. 239‐266
  8. Pell MD. Reduced sensitivity to prosodic attitudes in adults with focal right hemisphere brain damage. Brain and Language. 2007;101:64‐79
  9. Bögels S, Levinson S. The brain behind the responses: Insights into turn‐taking in conversation from neuroimaging. Research on Language and Social Interaction. 2016;50:71‐89
  10. de Ruiter J, Albert S. An appeal for a methodological fusion of conversation analysis and experimental psychology. Research on Language and Social Interaction. 2016;50:90‐107
  11. Holtgraves T, Bonnefon J‐F. Experimental approaches to linguistic (im)politeness. In: Culpeper J, Haugh M, Kádár D, editors. The Palgrave Handbook of Linguistic (Im)politeness. UK: Palgrave Macmillan; 2017. pp. 381‐401
  12. Pell MD, Paulmann S, Dara C, Alasseri A, Kotz S. Factors in the recognition of vocally expressed emotions: A comparison of four languages. Journal of Phonetics. 2009;37:417‐435
  13. Liu P, Rigoulot S, Pell MD. Culture modulates the brain response to human expressions of emotion: Electrophysiological evidence. Neuropsychologia. 2015;67:1‐13
  14. Han S, Ma Y. A culture‐behavior‐brain loop model of human development. Trends in Cognitive Sciences. 2015;19:666‐676
  15. Oyserman D, Sorensen N, Reber R, Chen S. Connecting and separating mind‐sets: Culture as situated cognition. Journal of Personality and Social Psychology. 2009;97:217‐235
  16. Stroop J. Studies of interference in serial verbal reactions. Journal of Experimental Psychology. 1935;18:643‐662
  17. Ishii K, Reyes JA, Kitayama S. Spontaneous attention to word content versus emotional tone: Differences among three cultures. Psychological Science. 2003;14:39‐46
  18. Bryant G, Barrett C. Vocal emotion recognition across disparate cultures. Journal of Cognition and Culture. 2008;8:135‐148
  19. Thompson W, Balkwill L. Decoding speech prosody in five languages. Semiotica. 2006;158:407‐424
  20. Tisljár‐Szabó E, Pléh C. Ascribing emotions depending on pause length in native and foreign language speech. Speech Communication. 2014;56:35‐48
  21. Koeda M, Belin P, Hama T, Masuda T, Matsuura M, Okubo Y. Cross‐cultural differences in the processing of nonverbal affective vocalizations by Japanese and Canadian listeners. Frontiers in Psychology. 2013;4, Article 105
  22. Sauter D, Eisner F, Ekman P, Scott S. Cross‐cultural recognition of basic emotions through nonverbal emotional vocalizations. Proceedings of the National Academy of Sciences. 2010;107:2408‐2412
  23. Laukka P, Elfenbein H, Söder N, Nordström H, Althoff J, Chui W, Iraki F, Bockstuhl T, Thingujam N. Cross‐cultural decoding of positive and negative non‐linguistic emotion vocalizations. Frontiers in Psychology. 2013;4, Article 353
  24. Elfenbein HA, Ambady N. On the universality and cultural specificity of emotion recognition: A meta‐analysis. Psychological Bulletin. 2002;128:203‐235
  25. Elfenbein HA, Beaupré M, Lévesque M, Hess U. Toward a dialect theory: Cultural differences in the expression and recognition of posed facial expressions. 2007;7:131‐146
  26. Pell M, Monetta L, Paulmann S, Kotz S. Recognizing emotions in a foreign language. Journal of Nonverbal Behavior. 2009;33:107‐120
  27. Paulmann S, Uskul A. Cross‐cultural emotional prosody recognition: Evidence from Chinese and British listeners. Cognition and Emotion. 2014;28:230‐244
  28. Pell MD, Skorup V. Implicit processing of emotional prosody in a foreign versus native language. Speech Communication. 2008;50:519‐530
  29. Pell MD. Nonverbal emotion priming: Evidence from the ‘facial affect decision task’. Journal of Nonverbal Behavior. 2005;29:45‐73
  30. Jiang X, Paulmann S, Robin J, Pell MD. More than accuracy: Nonverbal dialects modulate the time course of vocal emotion recognition across cultures. Journal of Experimental Psychology: Human Perception and Performance. 2015;41:597‐612
  31. Liu P, Rigoulot S, Pell M. Cultural immersion alters emotion perception: Neurophysiological evidence from Chinese immigrants to Canada. Social Neuroscience. 2016. In press
  32. Altrov R. Aspects of cultural communication in recognizing emotions. Trames. 2013;2:159‐174
  33. Matsumoto D. Cultural influences on the perception of emotion. Journal of Cross‐Cultural Psychology. 1989;20:92‐105
  34. Laukka P, Neiberg D, Elfenbein HA. Evidence for cultural dialects in vocal emotion expression: Acoustic classification within and across five nations. Emotion. 2014;14:445‐449
  35. Hofstede G. Dimensions of national cultures in fifty countries and three regions. In: Deregowski JB, Dziurawiec S, Annis RC, editors. Expiscations in Cross‐Cultural Psychology. The Netherlands: Swets & Zeitlinger; 1983. pp. 335‐355
  36. Hofstede G. Culture’s Consequences: Comparing Values, Behaviors, Institutions and Organizations across Nations. 2nd ed. Thousand Oaks, CA: Sage; 2001
  37. Matsumoto D, Takeuchi S, Andayani S, Kouznetsova N, Krupp D. The contribution of individualism vs. collectivism to cross‐national differences in display rules. Asian Journal of Social Psychology. 1998;1:147‐165
  38. Keshtiari N, Kuhlmann M. The effects of culture and gender on the recognition of emotional speech: Evidence from Persian speakers living in a collectivist society. International Journal of Society, Culture and Language. 2016;4:71‐86
  39. Hofstede G, Hofstede GJ, Minkov M. Cultures and Organizations: Software of the Mind. New York, NY: McGraw‐Hill; 2010
  40. Paulmann S, Pell M, Kotz S. How aging affects the recognition of emotional speech. Brain and Language. 2008;104:262‐269
  41. Liu P, Pell MD. Recognizing vocal emotions in Mandarin Chinese: A validated database of Chinese vocal stimuli. Behavior Research Methods. 2012;44:1042‐1051
  42. Schirmer A, Kotz SA. Beyond the right hemisphere: Brain mechanisms mediating vocal emotional processing. Trends in Cognitive Sciences. 2006;10:24‐30. DOI: 10.1016/j.tics.2005.11.009
  43. Jiang X, Pell MD. Neural responses towards a speaker’s feeling of (un)knowing. Neuropsychologia. 2016;81:79‐93
  44. Schirmer A, Kotz SA, Friederici AD. Sex differentiates the role of emotional prosody during word processing. Cognitive Brain Research. 2002;14:228‐233
  45. Schirmer A, Kotz SA, Friederici AD. On the role of attention for the processing of emotions in speech: Sex differences revisited. Cognitive Brain Research. 2005;24:442‐452
  46. Schirmer A, Lui M, Maess B, Escoffier N, Chan M, Penney T. Task and sex modulate the brain response to emotional incongruity in Asian listeners. Emotion. 2006;6:406‐417
  47. Min C, Schirmer A. Perceiving verbal and vocal emotions in a second language. Cognition and Emotion. 2011;25:1376‐1392
  48. Bestelmeyer P, Belin P, Ladd R. A neural marker for social bias toward in‐group accents. Cerebral Cortex. 2014;25:3953‐3961
  49. Hanulíková A, Haustein J. Flour or flower? Resolution of lexical ambiguity by emotional prosody in a non‐native language. Proceedings of Speech Prosody. 2016. pp. 469‐473


  • Pseudo‐utterances were created by replacing the lexical items with meaningless words while preserving the phonological structure of the utterance.
  • The Stroop effect refers to interference in the reaction time of a task: naming the colour of a word whose meaning is incongruent with that colour (e.g. ‘red’ printed in blue) takes longer and is less accurate than naming the colour of a congruent word.
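The interference described in the Stroop note can be expressed as a simple difference score. The following is a minimal sketch, assuming interference is quantified as mean reaction time on incongruent trials minus mean reaction time on congruent trials (one common convention; accuracy‐based scores are also used), and using purely hypothetical means of 750 ms and 650 ms:

```latex
I_{\text{Stroop}} = \overline{RT}_{\text{incongruent}} - \overline{RT}_{\text{congruent}}
                  = 750\,\text{ms} - 650\,\text{ms} = 100\,\text{ms}
```

A positive value indicates that the incongruent word meaning slowed colour naming.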