Joint attention is a keystone of social cognitive development and a skill acquired early in life. It is the triadic coordination of attention between two people and an object or event in which they are both interested. Language development follows closely in its tracks and depends on this early acquired skill. Deviation of joint attention from typical development is considered one of the earliest signs of autism, and its remediation has consequently become a major focus of therapy. This review discusses the development of joint attention in its initiating (IJA) and responding (RJA) forms, and its atypical development in autism and related spectrum disorders. This includes existing problems in pointing, sharing attention with a partner, and facial recognition, as well as proposed explanations for these deviations, such as covert attention. Related fMRI findings are also reviewed, outlining the integration between the posterior involuntary parietal and superior temporal cortices (RJA) and the anterior volitional prefrontal and orbitofrontal areas (IJA) in typical development, and the long‐distance underconnectivity and local overconnectivity in autism. Several cortical regions are implicated in autism, reflecting the heterogeneity of the findings, but general conclusions can be drawn.
- initiating joint attention
- responding to joint attention
- functional MRI
- social development
Joint attention is a triadic organization of attention and communication between two people who share interest in an object or event and occasionally turn to look at each other to communicate what they are experiencing. This coordinated joint attention can be initiated, ‘Initiating Joint Attention’ (IJA), or responded to, ‘Responding to Joint Attention’ (RJA). Pointing, shifting of eye gaze, and displaying facial expressions are essential components of joint attention.
The typical form of joint attention is that of two people engaged in an activity, following a performance, or watching another person, a picture, or an object, while turning every now and then to look at each other. This momentary sharing of attention during an activity is very meaningful. It signals to the partner that his/her presence is recognized, allows the first partner to check the other's reactions, and shares the first partner's own responses with him/her.
If two people are simply watching a common object without looking at each other, their attention is ‘parallel,’ not ‘coordinated.’ In joint attention, attention is not focused exclusively on the object, because then ‘the sense of sharing’ would be missed; instead, it is divided between the object and a person interested in this object. This model of human communication is established early in life through brain communication circuits and social‐emotional brain areas. The acquired outcome is ‘cognitive’ from the object and ‘social/cognitive’ from the person.
For this reason, joint attention is a pivotal skill in a child's social/communicative path, paving the way toward language development. Impairment of joint attention before 1 year of age is one of the earliest indicators of autism. Furthermore, joint attention intervention in autism is reported to produce better language acquisition and outcomes.
The aim of this chapter is to review the clinical presentation of deficient joint attention in autism, the proposed rationale behind it, and the related functional MRI (fMRI) findings in autistic children.
2.1. Clinical picture and rationale
Home videos recorded during the first 2 years of life of children later diagnosed with autism show deviations from the typical development of joint attention [4–7]. These videos show deficient facial expression, social interaction, orientation to name, and pointing to/showing of objects. So, what is the typical development of joint attention, and how do these children depart from the expected path?
Pointing is a cornerstone of joint attention development, conveying the social message: ‘Look this way, I want to show you something!’ During the first 6 months of acquiring meaningful words, children's behavior is saturated with pointing. Pointing becomes evident as a declarative tool at 12 months of age, when children want to show their parents what they are exploring as they move around. It is, however, preceded by months of social exploration, not of the surroundings, but of the facial expressions of adults.
As early as 2 months of age, infants focus on the facial expressions of caregivers. They learn to identify emotional states from facial expressions and spend a lot of time doing so during the first year of life. Facial expressions and vocalizations characterize early human communication, especially before 6 months of age.
As infants become mobile and begin to attend more to objects in the environment, the caregiver starts to label and comment on the objects on which the infant's attention is focused. Joint attention first becomes ‘supported,’ with the caregiver directing the communication, and then ‘coordinated,’ with the caregiver and infant sharing in the communicative game by alternately looking at each other and at the object of interest.
Supported joint attention appears by the middle of the first year and remains prevalent in communication between 18 and 30 months of age. Coordinated joint attention emerges gradually between 9 and 15 months of age. At 18 months, children sustain well‐timed glances between themselves and the caregiver during a coordinated joint attention play activity.
Initiating joint attention (IJA) is a voluntary skill, whereas responding to joint attention occurs automatically. In IJA, the child uses gestures and eye contact to direct others to objects, to events, and to themselves. The propensity to initiate joint attention develops between 9 and 18 months of age, while the natural tendency to respond to joint attention (RJA) increases during the same period. The frequencies of the two are, however, unrelated. Mundy found responding to joint attention to be strongly related to vocabulary development in typically developing children.
Although IJA and RJA have similar social roles, they have different motivational origins. RJA is possibly maintained by extrinsic reinforcement (such as tangible rewards), while IJA is maintained by an intrinsic drive (social sharing). Joint attention gradually becomes symbol‐infused. Although adults speak to a newborn from their very first encounter, infants begin to understand and produce symbolic acts only late in their first year, and with greater variety in their second year. The initial period of joint attention is thus not symbol‐infused and becomes symbol‐infused when language skills emerge.
Joint attention is related to both language development and social development. Nonlinguistic interactions between the child and the mother encourage early language development: they provide the child with a predictable referential context that makes the mother's language meaningful. The underlying mechanism at work in these mother‐child interactions is joint attention. Tomasello et al. found a positive correlation between joint attention at 15 months of age and vocabulary size at 21 months.
In social development, joint attention is related to both pretend/symbolic play and theory of mind. These two social cognitive abilities develop later and are specifically impaired in individuals with autism. Whereas joint attention first develops before 1 year of age, pretend play is not seen until the second year of life, and theory of mind emerges in the preschool years. Theory of mind, once developed, allows the child to understand that other people have their own feelings, thoughts, and beliefs that differ from the child's own.
In symbolic play, the theme of pretence evolves from playing with toys functionally, as in constructing a building, to playing with toys symbolically, as in pretending that a banana is a telephone. Compared with typically developing children of the same mental age, autistic children show a significant delay in the development of symbolic play. Children with autism handle and manipulate toys in a rigid and stereotyped manner and are less likely to start symbolic play activities or to engage with people sharing the activity with them. They are object focused.
Holmes and Willoughby likewise observed solitary or parallel functional play in seventeen 4‐ to 8‐year‐old autistic children in the classroom. Keen et al. studied eight children with autism and found that, except for a few instances of commenting, they communicated mainly to request objects or to protest. Unfortunately, teachers were observed to acknowledge the children's communicative attempts infrequently, because of their atypical nature.
The atypical development of this form of nonverbal communication, termed joint attention, remains a barrier to proper social development in autistic individuals. Autistic children have little motivation to look at people's facial expressions, because these do not provide them with information about the emotional states, motives, or intentions of others. They are unable to shift attention/eye gaze from a person to an object, to point to initiate joint attention, or to use the eye direction of other people as a guide. When they do point, it is to regulate another person's behavior by indicating that they want something (protoimperative pointing), rather than to share interest with others (protodeclarative pointing). Generally, autistic individuals are more likely to respond to joint attention bids than to initiate joint attention.
Investigation of the nature of aberrant joint attention has led to multiple explanations. Experiments show that attention in autism is covert (that is, without head and eye movement). Overt attention involves directed eye movements (namely, saccades) toward a target, whereas in covert attention the focus on the object is mental, without significant eye movement. Because people look toward things that interest them, the direction of their eye gaze reveals their goals and focus of attention. Typically, people tend to orient reflexively to where other people are gazing, especially if the gaze shows an eye movement or a shift of eye gaze without a head movement. The deficit in gaze following in autism may result from a focus on details and features rather than a more holistic integration of eye direction, head position, body posture, and pointing cues.
Autistics tend to orient only if another person's gaze is predictive. They respond significantly more accurately than nonautistics when another person's gaze is informative, whereas nonautistics reflexively orient their attention in the direction of the other person's gaze even when that direction is not expected: nonautistics want to find out where, or to what, the gaze leads.
Posner et al. pioneered the measurement of covert attention using signal detection on a screen. Participants are seated in front of the screen and instructed to keep their eyes fixed on its central point, marked by a dot. An arrow appears to the right or to the left of the central dot and is followed by a target stimulus, usually a shape. In 80% of trials the target appears on the same side as the arrow (valid trials), while in 20% it appears on the opposite side (invalid trials). The participant responds as soon as the target appears, and the response time is measured. The data indicate that autistics display more covert attention than age‐matched nonautistic controls.
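The cueing procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the protocol of Posner et al. or of any cited study: the names `Trial`, `make_trials`, and `cueing_effect` are hypothetical, and the example reaction times are invented solely to show the arithmetic. The sketch builds a trial list in which 80% of targets appear on the cued side (valid trials) and summarizes performance as the cueing effect, that is, the mean response time on invalid trials minus that on valid trials.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    cue_side: str     # side on which the arrow cue appears: "left" or "right"
    target_side: str  # side on which the target shape appears

    @property
    def valid(self) -> bool:
        # A "valid" trial is one where the target appears on the cued side.
        return self.cue_side == self.target_side


def make_trials(n: int, validity: float = 0.8, seed: int = 0) -> list[Trial]:
    """Build n trials; a `validity` fraction of targets appear on the cued side."""
    rng = random.Random(seed)
    n_valid = round(n * validity)
    trials = []
    for i in range(n):
        cue = rng.choice(["left", "right"])
        opposite = "right" if cue == "left" else "left"
        # The first n_valid trials are valid; the rest are invalid.
        trials.append(Trial(cue, cue if i < n_valid else opposite))
    rng.shuffle(trials)  # randomize order so validity is unpredictable
    return trials


def cueing_effect(trials: list[Trial], rts_ms: list[float]) -> float:
    """Mean RT on invalid trials minus mean RT on valid trials, in ms."""
    valid = [rt for t, rt in zip(trials, rts_ms) if t.valid]
    invalid = [rt for t, rt in zip(trials, rts_ms) if not t.valid]
    return mean(invalid) - mean(valid)


if __name__ == "__main__":
    trials = make_trials(100)
    print(sum(t.valid for t in trials))  # 80 valid trials out of 100

    # Invented reaction times, for illustration only.
    demo = [Trial("left", "left"), Trial("right", "right"), Trial("left", "right")]
    print(cueing_effect(demo, [300.0, 320.0, 370.0]))  # 370 - 310 = 60.0
```

Because the eyes stay on the central dot throughout, a positive cueing effect indicates that attention shifted toward the cued side before the target appeared without any eye movement; this is how the paradigm captures covert attention.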
Autistic children have an atypically prolonged resistance to distraction. They do not easily disengage their attention from their immediate focus and consequently cannot readily follow the eye gaze of other people. This underlies the lack of the automatic social response of following eye gaze, especially when there is no predicted target in the direction of the gaze. Bayliss et al. described autistics as “possessing a stronger ability to inhibit the influence of non‐informative social cues.” This contrasts with non‐autistics, who “suffer greater interference” and cannot ignore another person's gaze, even when the direction of the gaze seems irrelevant.
Autistics also have enhanced visual processing: they are three times better at recognizing a visual object embedded in a complex visual background. This atypical visual perception is one of the most intriguing puzzles in the diverse world of autism. When asked to search for a target element in a display of 25 elements, such as a white cube among white balls and colored cubes, autistics are nearly twice as fast as nonautistics at finding the target. They perceive in parallel rather than inspecting the presented picture item by item.
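The contrast between inspecting items one by one and perceiving them in parallel is often illustrated with a simple reaction-time model of visual search. The Python sketch below is an illustration under assumptions, not a model taken from the cited studies. In serial self-terminating search, items are checked one at a time in random order, so on target-present trials the target is found after (N + 1)/2 inspections on average and response time grows linearly with the number of display elements N; in idealized parallel search, response time does not depend on N. The function names and the timing parameters (`base_ms`, `per_item_ms`) are hypothetical.

```python
def mean_inspections(set_size: int) -> float:
    """Average number of items checked before finding the target, when it is
    equally likely to occupy any of the set_size positions in the search order."""
    return sum(range(1, set_size + 1)) / set_size  # equals (set_size + 1) / 2


def serial_search_rt(set_size: int, base_ms: float, per_item_ms: float) -> float:
    """Expected target-present RT for serial, self-terminating search."""
    return base_ms + per_item_ms * mean_inspections(set_size)


def parallel_search_rt(set_size: int, base_ms: float) -> float:
    """Idealized parallel search: all items are processed at once, so RT is
    the same whatever the set size (set_size is deliberately unused)."""
    return base_ms


if __name__ == "__main__":
    # Hypothetical parameters chosen only to make the arithmetic easy to follow.
    for n in (5, 15, 25):
        print(n, serial_search_rt(n, 400, 50), parallel_search_rt(n, 400))
```

With these made-up values, the serial estimate rises from 550 ms at 5 elements to 1050 ms at 25 elements, while the parallel estimate stays at 400 ms. The slope of response time against set size is the quantity that separates the two search modes in such experiments.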
Defective imitation abilities in autism were described as early as the 1970s by DeMeyer et al. Children with autism show a lower rate of spontaneous imitation of gestures than age‐matched typically developing children. Other studies, focusing on home videos from the first 2 years of life of children later diagnosed with autism, also revealed lower rates of spontaneous imitation. Some authors viewed this defective imitation as a form of motor dyspraxia, meaning that the autistic child is unable to execute the component movements of pointing or of looking where another person is pointing.
The atypical response of autistic children to conventional bids for joint attention has been attributed to several factors, namely, difficulty executing volitional actions and a tendency to attend covertly rather than overtly, to perceive the environment fluidly, and to resist distraction or interruption. Thus, the proposed explanations for the joint attention deficit in autism span both sensory perception and motor execution. A combined sensorimotor dysfunction could coexist to a variable degree in autistic patients.
2.2. fMRI studies related to joint attention in autism
fMRI studies have found joint attention to be the outcome of the integration between two attention regulation systems: a posterior involuntary system, related to RJA and operated by the parietal and superior temporal cortices, and an anterior volitional system, related to IJA and operated by the prefrontal association cortex, orbitofrontal cortex, and anterior cingulate.
The posterior involuntary system is a perceptual system that starts to develop in the first 3 months of life. It orients the infant toward biologically meaningful stimuli. The parietal and superior temporal areas serve the development of representation; imitation; perception of other people's head and eye movements; and perception of the spatial position of self, others, and the environment. The following quote summarizes the cognitive message gained from this system: “where others’ eyes go, their behaviors follow.”
The anterior attention system related to IJA develops later and gives the following cognitive message: “where my eyes go, my behavior follows.” This system is volitional, goal‐directed, and reward‐affected. It also supervises the integration of activity between the anterior and posterior attention systems. From 4 to 6 months of age onward, the anterior attention system integrates the internal control of one's own gaze direction and goal‐directed behavior with the external monitoring of the gaze direction and behavior of others. This integration yields the specific form of human attention called joint attention. This comparative monitoring of external looking behavior (overt attention) against attention to internal self‐representations is also an important substrate for cognitive development. Joint attention, after all, is a social cognitive skill, and practice with joint attention in the first 9 months of life is a major contributor to the development of social cognition.
The decreased brain connectivity in autism spectrum disorders (ASD) reported in fMRI studies, specifically between frontal and posterior temporal cortical regions, affects social and emotional information processing. The current understanding of cortical neural dysfunction in autism, however, is local overconnectivity combined with long‐distance underconnectivity. This model accounts for the heterogeneity of the disorder and its generally pervasive nature. Autistic individuals tend to process tasks in a manner that relies less on anterior frontal gray matter centers (theory of mind, face, and social processing) and more on posterior temporal gray matter centers (visual processing). Structural white matter alterations, such as defective myelination, are implicated in this lack of fast synchronous communication between these gray matter areas. This disrupted connective circuitry was proposed by Just et al. in the cortical underconnectivity theory, which describes a lower communication bandwidth between frontal and posterior temporal areas in autistic than in control participants. McFadden and Minshew, on the other hand, pointed to local overconnectivity, reflected in the consistent finding of excess interstitial neurons, which indicates a failure of appropriate developmental apoptosis and leads to improper brain connectivity.
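In fMRI work, functional connectivity between two regions is commonly estimated as the correlation between their activation time courses, so ‘local’ and ‘long‐distance’ connectivity can be compared by averaging correlations over pairs of regions within one cortical area versus pairs spanning frontal and posterior areas. The Python sketch below illustrates only that bookkeeping under simplifying assumptions; it is not the analysis pipeline of Just et al. or of any cited study, and the function names, region labels, and toy time series are hypothetical. Real analyses include preprocessing steps that are omitted here.

```python
from itertools import combinations
from statistics import mean, stdev


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length time series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


def split_pairs(groups: dict[str, list[str]]):
    """Return (local, long_distance) pairs of region labels.

    Local pairs lie within one cortical group; long-distance pairs
    span two different groups (e.g., frontal vs posterior).
    """
    local = [pair for rois in groups.values() for pair in combinations(rois, 2)]
    long_distance = [
        (a, b)
        for g1, g2 in combinations(groups, 2)
        for a in groups[g1]
        for b in groups[g2]
    ]
    return local, long_distance


def mean_connectivity(series: dict[str, list[float]], pairs) -> float:
    """Average correlation over the given (region_a, region_b) pairs."""
    return mean(pearson(series[a], series[b]) for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical region labels and toy time series, for illustration only.
    groups = {"frontal": ["F1", "F2"], "posterior": ["P1", "P2"]}
    series = {
        "F1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "F2": [1.0, 2.0, 3.0, 5.0, 4.0],
        "P1": [2.0, 1.0, 4.0, 3.0, 5.0],
        "P2": [2.0, 1.0, 5.0, 3.0, 4.0],
    }
    local, long_distance = split_pairs(groups)
    print("local:", round(mean_connectivity(series, local), 3))
    print("long-distance:", round(mean_connectivity(series, long_distance), 3))
```

Under the account described above, a group of autistic participants would be expected to show lower long‐distance values than controls, with local values preserved or elevated; the sketch only computes the two averages and makes no claim about any real dataset.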
In a 2005 fMRI study of joint attention, activity associated with joint attention was detected in the ventromedial frontal cortex, the left superior frontal gyrus (BA10), the cingulate cortex, and the caudate nuclei. Both the ventromedial frontal cortex and BA10 are involved in mental activity and cognitive integration tasks. The authors concluded that a developmental defect in the left anterior frontal lobe could be an underlying factor in ASD.
A review of fMRI findings in ASD described hypoactivation of the ‘social brain’ in the prefrontal cortex, posterior superior temporal gyrus, amygdala, and fusiform gyrus; decreased anterior‐posterior functional connectivity during resting states; and anomalous mesolimbic responses to social rewards. It should be borne in mind, however, that fMRI findings are heterogeneous. The findings are complex because, as Geschwind and Levitt described, there are ‘many autisms.’
Faces are social stimuli that infants attend to from very early in life. Defective face processing and discrimination, such as reduced recognition of facial emotions, exist in autism. In neurotypical individuals, face viewing activates the fusiform gyrus, in addition to the superior temporal sulcus, amygdala, and orbitofrontal cortex. Most fMRI studies in autism report hypoactivation of the fusiform gyrus in response to faces and facial expressions. However, the expected fusiform activation has been reported in autistic individuals viewing familiar faces, or unfamiliar faces presented with an attentional cue.
Atypical amygdala activation has been reported in tasks requiring judgment of emotional state from the eyes or from facial expressions, as the amygdala functions in identifying emotional situations based on facial expression. Bookheimer et al. described reduced activity in the inferior frontal region in ASD during face processing tasks, such as matching faces presented in upright versus inverted positions. The typical response of increased activation for inverted faces was lacking, reflecting the absence of social significance of the stimulus.
Empathy, a human skill related to theory of mind, has also been studied with functional brain imaging. Empathy is the ability to understand the emotional state of others by mapping their feelings onto our own nervous system. It is crucial for a socially appropriate response and is mediated by mirror neurons, which relay impulses to the premotor cortex. It is a self‐referential form of emotional cognition that compares and relates the emotional state of others to our own. Theory of mind provides the awareness that the mental states of others differ from ours; to be understood, these states have to be adopted and evaluated from our own perspective. Compared with control subjects, individuals with ASD showed atypical neural activation when asked to evaluate the facial emotional responses of other people and their own. For example, activation of the prefrontal cortex occurred dorsally in ASD and ventrally in control subjects. This may underlie the disturbed empathy in autism.
The fMRI findings in autism reflect aberrant function in both white matter axons and gray matter areas. Underconnectivity (distant) and overconnectivity (local) in white matter, together with underactivation (frontal) and overactivation (posterior) in gray matter, characterize the array of heterogeneous findings in autism.
Social and language development are inseparable domains of human communication. Joint attention, a social cognitive skill and a core component of both social and language development, has been recognized as deficient in autism research. Further research in this domain is warranted because few participants have been included in studies addressing joint attention. Emerging fMRI findings implicate several areas, such as the prefrontal cortex and the posterior superior temporal gyrus, as well as functional anterior‐posterior cerebral connectivity. There remains a need to address joint attention more fully and in more varied ways in autism intervention to achieve better outcomes. Joint attention is a skill acquired very early in life and is closely tied to typical language development and use among humans. Language use is typically the most pervasively affected linguistic domain in autism.