Shared Neural Correlates for Speech and Gesture



Introduction
Humans are inherently social creatures: we spend a remarkable portion of our waking hours communicating with one another. We share our thoughts, goals, and desires, tell stories about what happened at lunch and make plans for the weekend. Although messages can be written, signed, or typed, the majority of this communication occurs through spoken language and face-to-face dialogue. These interactions demand that message recipients attend not only to words and sentences, but also to numerous nonverbal cues that include body language, facial expressions, and gestures, among others.
Hand gestures have been the focus of a substantial body of research in recent decades. While the body as a whole can be used to signify general emotional state, hand gestures tend to represent more precise semantic content. These spontaneous movements can be used independently or in conjunction with speech. For example, a "thumbs up" sign in the absence of any speech may indicate "I'm okay" after a bad fall, while wiggling index and middle fingers accompanying the statement "I went to the store earlier" may indicate the subject walked rather than drove. These and other examples suggest that gestures convey semantic and/or pragmatic information much in the same way that speech does. In light of this, some researchers have suggested that gesture, which is still relied upon by our primate relatives for communication, may constitute the evolutionary basis of spoken language [1]. The following chapter will offer a comprehensive look at this intimate relationship between gesture and language, as well as a critique of the so-called "gestural origins theory." More specifically, we will address the following questions: (1) Are gesture and speech fundamentally linked, representing two parts of a single system that underlies human communication? (2) Did language initially emerge as a purely manual system?

The major categories of gesture, with definitions and examples, are summarized below.

Gesticulations
Definition: Spontaneous and idiosyncratic movements of the hands and arms. Rarely occur in the absence of speech and require speech for full comprehension.

Iconic gestures
Definition: Visually represent the co-expressive speech content.
Example: While describing a car accident, the hands form a T-shape, representing how the two vehicles collided.

Metaphoric gestures
Definition: Represent abstract concepts or relationships.
Example: Using the hands to form a spherical shape, representing the idea of "wholeness."

Deictic gestures
Definition: Also known as pointing gestures. Locate objects and actions in space; can be concrete or abstract.
Example: The classic deictic gesture is an extended index finger.

Beat gestures
Definition: Also known as "baton" gestures. Provide temporal highlighting to speech, signaling that the speaker feels part of the message is particularly important.
Example: Generally a rhythmic waving of the hands or arms.

Speech-framed gestures
Definition: Fill a grammatical slot within the spoken sentence itself.
Example: "And then she went [gesture]."

Emblems
Definition: Conventionalized gestures with culturally agreed-upon forms and meanings; can be used with or without speech.
Examples: A thumbs up sign meaning "I'm okay" or "everything is good"; one finger to the lips meaning "be quiet."

Instrumental gestures
Definition: Meant to influence or direct the behavior of another. Generally these gestures can also be classified as emblems.
Example: The "come here" sign, with one finger extending and then forming a hook back toward the speaker.

Expressive gestures
Definition: Express inner feeling states. May also be classified as emblems.
Example: Hands turned up and to the sides to indicate "I don't know."

Sign language
Definition: Full-fledged language system with syntactic structure and a community of users.
Examples: American Sign Language, Nicaraguan Sign Language.

One camp of researchers, following David McNeill, argues that gesture and speech are fundamentally linked, forming a single integrated system of meaning. Others take an alternate view, arguing that gesture and speech are separate and independent systems, only loosely related. According to this second camp, gesture is merely used as an auxiliary support when speech processing is unusually difficult.
Evidence is accumulating in favor of the first proposal that gesture and speech are intimately connected and combine to form a single system of meaning. While they are undoubtedly used to bolster communication under adverse conditions (e.g. loud environments), gestures are used far more widely than this hypothesis would suggest. Instead, McNeill explains that gestures are able to convey ideas that cannot always be captured with conventional spoken language (e.g. information about spatial relationships). While speech is highly structured and arbitrary, gesture provides information in a more holistic and imagistic fashion [4]. Gesture and speech serve distinct, but complementary functions in this regard: a speaker's message cannot always be expressed, nor understood in its entirety without this composite signal. The movement of the hands is not just a "bonus" feature; it is fundamental to successful transmission of the message.
There are several lines of evidence that support McNeill's claim of an intimate relationship between speech and gesture: 1) gesture and speech are temporally synchronized, 2) speech and gesture co-develop in children, 3) there is a correlation between handedness and the cerebral lateralization of language, 4) people readily incorporate gestural information into the retelling of speech-only content, and 5) the use of gesture does not disappear when people are physically removed from their audience [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19]. Each of these arguments will be explored in more detail below.

Temporal synchronization of speech and gesture
When we produce gestures, we instinctively time them so they overlap with their co-expressive speech. Consider an example cited by McNeill [2]: while describing a scene from a comic in which a character bends a tree towards the ground, the speaker grips an imaginary branch and pulls it inwards and down (from the upper gesture space to the body). The gesture stroke concludes as the subject finishes the utterance "he grabs a big oak tree and he bends it way back" [2, p.25]. Here, the gesture and speech are carefully synchronized so the hand movement can be linked to the content it both depends on and elaborates upon. The gesture stroke generally precedes speech onset within a restricted time window, and it is rarely, if ever, initiated after the speech it is meant to represent or supplement.
Several researchers have examined the sensitive nature of the temporal relationship between speech and gesture. For example, Rauscher, Krauss, and Chen [5] manipulated participants' ability to gesture while they described a cartoon to a listener. In conditions where hand movement was restricted, subjects spoke less fluently and produced more unfilled pauses. Based on these findings, the authors argue that gestures facilitate the speech production process itself (in particular, access to the mental lexicon), rather than serving as a backup mechanism for communication once speech has failed.
Mayberry and Jaques [6] reach a similar conclusion in their work on persons who stutter. When these individuals narrate cartoons, gestures are only produced alongside fluent speech. In cases where a gesture has been initiated prior to a stuttering event, the gesture stroke is frozen until speech resumes and the two can continue to co-occur. Again, the results directly contradict the independent systems theory: if gesture and speech were separate processes, persons who stutter would be expected to continue gesturing even when speech is temporarily interrupted. In fact, these individuals would likely gesture more in order to compensate for the breakdown in speech. This tight coupling, in which the gesture stroke is halted in time with the stuttering events, suggests speech and gesture must be linked at a deep, neural level. Mayberry and Jaques [6] exclude the possibility that a general "manual-motor shutdown" prevents gesturing during stuttering events by showing that only speech-related hand movements (and not simultaneous button-pressing, finger-tapping, etc.) are suspended during dysfluencies. Instead, the two must be connected at a planning stage, prior to motor execution.

Co-development of speech and gesture
Speech and gesture are known to show similar developmental trajectories in children. Bates and Dick [7] provide a comprehensive review of these parallel milestones, starting with the co-emergence of rhythmic hand movements and babbling in six- to eight-month-olds. The same trends continue as children age and language abilities expand rapidly. Between twelve and eighteen months, gesture and naming are positively correlated (children who gesture earlier also name objects earlier). By 18 months of age, toddlers begin to form both gesture-word and gesture-gesture combinations, and at 24 months, the ability to reproduce arbitrary sequences of manual actions is correlated with grammatical competence [7,8]. This tight developmental link between speech and gesture is easily understood if speech and gesture are supported by a common, amodal system of communication.
Interestingly, hand banging is significantly correlated with onset of babbling and single word production even in infants with Williams Syndrome (WS), a rare genetic disorder causing broad developmental delays. More importantly, these manual movements in infants with WS are not correlated with other motor milestones; the link is specific to these early precursors of spoken language and gesture [9]. Also interesting is the observation that in congenitally deaf children, the emergence of manual babbling is developmentally appropriate, coinciding with the emergence of vocal babbling in typical hearing children [10]. This suggests that infants are innately disposed to acquire language, but that the system is flexible in terms of the input (e.g. visual or auditory) it will accept and later imitate.
Relatedly, studies have also shown that language and handedness both emerge early in development. The left hemisphere has long been known to support language function, and the majority of the global population develops a right-handed bias for motor activity (motor activity on the right side of the body is also controlled by the left hemisphere of the brain). Interestingly, this handedness effect is stronger when producing symbolic rather than non-communicative hand movements [11]. These results suggest that there is a common network within the left hemisphere that may support any type of communicative act, whether it is achieved through spoken language or manual movements.

Incorporation of gesture into speech retell
Numerous studies have demonstrated that people incorporate gestural information into the retelling of stories [12-15, among others]. For example, Church, Garber, and Rogalski [12] compared subject recall for ambiguous statements (e.g. "My brother went to the gym") alone versus when accompanied by a complementary gesture (e.g. shooting a basketball). At testing, researchers found a significant memory enhancement effect when both speech and gesture were available to subjects. Moreover, when asked to recall the speech items, 75% of the subjects added pieces of information based on the accompanying gestures. This pattern of results suggests that the brain does not "tag" the incoming information as originating in separate channels, but immediately integrates the two sources and processes them together.
Subjects may also add new content to a narrative in order to resolve potential mismatches between speech and gesture. For example, a conflict is introduced if a subject hears the phrase "and then Granny gives him a penny" but sees a gesture suggesting that Granny was actually on the receiving end of the interaction. In this case, the subject might insert additional information in their retelling: "and she threw him a penny, so he picked up the penny." Now, the gesture towards the body is aligned with "he picked up the penny," which is more logical than the mismatch that was originally presented [13]. Importantly, the subject does not ignore the gestural information in favor of the speech. Instead, the two are seen as equally viable sources of information that must be linked in some fashion.

Gesture in self-only conditions
An additional line of evidence verifying the intimate relationship between speech and gesture comes from the repeated observation that gesture does not disappear entirely when a speaker's audience is removed (i.e. separated by a partition, on the phone, etc.). While the rate of gesturing is higher in conditions where the receiver of the message is visible to the speaker, we do not stop gesturing in monologue or non face-to-face conditions. Why gesture if it cannot ease the comprehension load of our listener? Some researchers hypothesize that in these instances, gestures benefit the speaker by facilitating word retrieval and lexical access, while others suggest that gesturing is simply the result of habit. However, in the context of other research, it seems most likely that because gesture and speech are so tightly and inextricably linked, it becomes challenging to produce the speech without simultaneously producing the gesture [16][17][18]. Similarly, there is evidence that congenitally blind individuals gesture as well, suggesting that, since they have never observed gesturing, their use of gesture and its association with speech is innate rather than learned. Moreover, they gesture at a rate comparable to sighted individuals [19]. This behavior persists even when they are talking to individuals whom they know to be blind and thus could not benefit from the visual input.

Evidence from neuroimaging
While the behavioral studies described above are somewhat convincing, neuroimaging techniques may provide more compelling evidence that speech and gesture are best described as two examples of a single process. Functional magnetic resonance imaging (fMRI) and electroencephalography/event-related potentials (EEG/ERP) provide useful methods to explore what the brain is doing as it processes speech and gesture, either separately or together.
Results of imaging studies have demonstrated that 1) gestures influence the earliest stages of speech processing, 2) gestures are subject to the same semantic processing as speech, and 3) speech and gesture activate a common neural network.

Early sensory processing
A handful of studies have indicated that gestures can affect the earliest stages of language processing [20][21][22][23][24][25]. In an ERP experiment, Kelly, Kravitz, and Hopkins [21] showed a modulatory effect of gesture on the sensory P1-N1 and P2 components elicited at frontal sites. Since these early components are generally reflective of low level and automatic sensory processing, this suggests that the interaction between speech and gesture occurs obligatorily and prior to any conscious semantic processing. Such a finding directly contradicts the view that gesture is an "add-on" or "bonus" feature, only used post-hoc in cases when speech fails. Similarly, in an fMRI experiment, Hubbard et al. [23] presented subjects with videos of speech accompanied by spontaneous production of beat gestures (i.e. rapid movements of the hands which provide 'temporal highlighting' to accompanying speech; [1]), nonsense hand movements, or no hand movements. Analysis revealed higher BOLD signal in brain regions relevant to speech perception, including the left superior temporal gyrus and the right planum temporale, in the beat gesture condition.
Gestures do not only affect how we process speech; they also affect how we produce it. Bernardis and Gentilucci [24] compared the properties of speech and gesture emitted in multimodal (speech + gesture) conditions versus unimodal (speech only or gesture only) conditions. The authors found increased F2 and pitch in vocal spectra when words were accompanied by meaningful gestures, but no effect when words were accompanied by aimless arm movements. Similarly, speaking a word, but not a pseudoword, aloud reduced the maximal height reached by the hands and the duration of meaningful gestures. These findings offer clear evidence of a bi-directional relationship between speech and gesture: producing one automatically and reflexively influences how we produce the other. Krahmer and Swerts [25] confirm that producing a gesture (in this case, a beat gesture) influences how a speaker generates co-occurring speech in terms of its acoustic features (emphasis, duration, frequency, etc.). The reverse is also true: when participants can see a speaker's gesture, they rate the accompanying word as more "prominent."

Semantic processing of speech and gesture
A series of ERP experiments has shown that speech and gesture reflect the same semantic and cognitive processing. These experiments focus on the N400 component, which is thought to be an index of semantic integration and is commonly elicited by both words and gestures that are incongruent with the ongoing discourse. While the N400 was initially reported as generated by incongruent or unexpected words [26], the N400 to incongruent gestures is an incredibly robust finding [21, 27-29, among others]. For example, Kelly, Kravitz, and Hopkins [21] showed participants video clips in which an actor gestured to one of two objects (a short, wide dish or a tall, thin glass) and then described the same object aloud. The N400 was smallest when the gesture and verbal descriptor referred to the same object and largest when they referred to different objects. Similarly, Holle and Gunter [27] used homonyms to investigate the ability of gesture to disambiguate speech. An N400 effect to the homonym was found when the ongoing discourse failed to support the meaning that was previously indicated via gesture.

Shared neural networks
A smaller body of research has examined the processing of autonomous gestures, like emblems and pantomimes. Studying these gesture types, rather than the gesticulations dependent on speech for context, allows researchers to contrast the brain's response to each form of communication separately. For example, a recent fMRI study [30] demonstrated that language and symbolic gestures both activate a common, left-lateralized network of inferior frontal and posterior temporal regions, including the inferior frontal gyrus/Broca's Area (IFG), posterior middle temporal gyrus (pMTG), and superior temporal sulcus (STS) (see Figure 2 for illustration). The authors suggest that these regions are not language-specific but rather function more broadly to link symbols with their meaning. This is true regardless of the modality or form the symbol adopts: sounds, words, gestures, pictures, etc.

The gestural origins theory
The findings that speech and gesture are tightly integrated at multiple stages of processing and that they appear to activate a common neural system have significant implications for the question of how language evolved. The Gestural Origins Theory, made popular by Michael Arbib, Michael Tomasello, and Michael Corballis, proposes that spoken language emerged from the system of gestural communication we still see today in non-human primates (see [31] for review). In humans, a growth in brain size and the development of the vocal tract permitted a gradual transition to a more complex language system based upon vocalizations. Subsequently, and although we still use gestures to express ourselves, spoken language became the dominant mode of communication because it freed the hands for simultaneous tool use, was less demanding of energy resources, and did not require the speaker and addressee to be in the same physical (not to mention well-lit) location.

Gesture in our primate ancestors
Renowned primatologist Jane Goodall, as well as many other scientists, cites our sophisticated spoken language system as the crucial difference between humans and chimpanzees. Our primate relatives do produce sounds in order to communicate, but these vocalizations are limited in their scope and function and are used mainly to direct attention. Instead, it is their gesticulations that serve a more "language-like" function. These gestures are numerous: pointing, shaking, begging, and offering are all common [32]. These manual gestures can also be used intentionally, flexibly, and across many contexts, unlike facial and vocal gestures, which are more automatic and ritualized [33].

Figure 2. Common areas of activation for processing symbolic gestures and spoken language minus their respective baselines, identified using a random effects conjunction analysis. The resultant t map is rendered on a single-subject T1 image: 3D surface rendering above, axial slices with associated z-axis coordinates below. See [30] for more details.
The question, then, is what is unique about humans that supports spoken language ability. Spoken language requires the same careful coordination of motor systems as manual gestures; over the course of evolution, the fine motor control of the hands gradually transitioned to similar movements of the vocal tract. This transition was only possible due to skeletal changes: the lowering of the larynx, the lengthening of the tongue and neck, etc. A popular theory claims that a mutation in the FOXP2 gene, located on chromosome 7, may be responsible for the development of the fine motor skills necessary for articulation and vocalization [34].

Gesture and the mirror neuron system
The discovery of the mirror neuron system lent added credence to the gestural origins theory.
Mirror neurons were first identified in area F5 of the monkey ventral premotor cortex and fire whether an animal executes or observes an action (for review, see [35]). A similar system is thought to exist in humans, and the areas of the human MNS, activated both by speech and by gesture, largely overlap with the classical language areas (i.e. Brodmann Area 44/Broca's Area). In terms of the Gestural Origins Theory, the mirror neuron system accounts for what Michael Arbib terms parity: the fact that what a listener hears and understands is the message that the speaker intended to send [36]. However, the role of the MNS has been hotly debated in recent years, with some researchers arguing that it cannot account for the complex semantic features of our language system [37] and others suggesting that its role in action understanding may be overstated [38][39].

Gesture as a universal language
The existence of a communication system is a feature of every human culture. However, spoken language is not a unitary phenomenon: depending on geographic location and the community we belong to, we speak one or two (or in some circumstances, maybe three or four) of the thousands of modern languages. When an English speaker travels to China for the first time, for example, it is highly unlikely he will understand even simple words or phrases if he has not spent extensive time memorizing vocabulary and practicing with fluent speakers first. In these situations, we turn to gestures. Unlike speech, gestures such as pointing are relatively consistent across cultures (emblems, of course, are culturally bound and the exception to this rule). For example, Liszkowski et al. [40] showed that infants and caregivers from seven different cultures all pointed with the same general frequency and under the same circumstances, suggesting a universal and prelinguistic basis for communication.
Many studies have examined the frequency of gesture use in situations where no common language exists between speakers or when an individual is speaking in a non-native language. In general, speakers rely more upon gesture when communicating in their second language (L2) [41][42]; gesture under these circumstances likely functions to decrease the production burden for the speaker and increase the likelihood of comprehension for the listener. Another line of research has studied the role of gesture in L2 vocabulary acquisition. This work has demonstrated that learning novel words paired with meaningful gestures helps learners retain the material over time [43][44][45].
Similarly, it seems that it is easier for members of deaf communities to develop a common gesture or sign-based language than it is for members of separate speech communities to develop a new spoken language. The most notable example is perhaps Nicaraguan Sign Language, which emerged in the 1970s after the opening of a special education school that brought deaf children in the community together for the first time [46]. In sum, the fact that 1) we rely upon gesture as a common platform for communication when we lack a common language and 2) signed (but not spoken) languages still arise spontaneously suggests that gestures may indeed form the core of our communication system.

Conclusions
Evidence overwhelmingly favors the view that speech and gesture are tightly integrated with one another, at both the behavioral and neural levels, suggesting that forms of verbal and nonverbal communication are parts of one amodal system that enables complex human communication.
Considered broadly, evidence also seems to support a view of language evolution rooted in manual gesture. The mechanisms that underlie this, however, are still somewhat unclear. The mirror neuron system may be the center of the "language-ready brain," but this theory is not free from controversy. Equally viable (and not mutually exclusive) is the proposal we advocate here: the system that supported nonverbal communication was co-opted over the course of evolution to support spoken language.
Nevertheless, David McNeill, whose work we see as central to both of these hypotheses, is actually a critic of the "gesture-first" view, instead claiming that speech and gesture emerged alongside one another and in response to the same environmental pressures. Challenging this view, however, is the literature on comparative biology, primate vocalization and gesture, molecular genetics, and the developmental trajectories of gesture and speech in children, all of which suggest that speech lags behind gesture in our evolutionary history.
In the end, the question of how language evolved and whether or not it emerged from a system built on manual gestures is not as important as what the relationship is between speech and gesture, now that they both exist. The intimate relationship between the two, which is now well established, has important implications for education, acquisition of second languages, effective public speaking, treatment of patients with communication disorders, and much, much more.