Autism is a neurodevelopmental disorder of complex etiology and is amongst the most heritable of neuropsychiatric disorders while sharing genetic liability with other neurodevelopmental disorders such as intellectual disability (ID). Autism spectrum disorders (ASDs) are defined more broadly and include autism, Asperger syndrome, childhood disintegrative disorder and pervasive developmental disorder not otherwise specified. Under the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition Revised (DSM-IVTR), these disorders are grouped together with Rett syndrome (“Rett’s disorder”) as pervasive developmental disorders. However, Rett syndrome has a reportedly distinct pathophysiology, clinical course, and diagnostic strategy (Levy & Schultz, 2009) and will likely be removed in the impending publication of DSM-V (APA, 2010). The new diagnostic manual will formally adopt the single diagnostic category “ASDs”, which is used here. Reported prevalence rates for ASDs range from 20 (Newschaffer et al. 2007) to 116 (Baird et al., 2006) per 10,000 children, and vary in accordance with diagnostic, sampling, and screening criteria. The Centers for Disease Control and Prevention (CDC) suggest that in the United States, the prevalence of ASDs is 1 in 110 (1/70 in boys and 1/315 in girls) (ADDM, 2009). The three primary characteristics of ASDs are communication impairments, social impairments, and repetitive/stereotyped behaviors. The DSM-IVTR, ICD-10, and many other diagnostic instruments require impairment in each of these domains for a diagnosis of autistic disorder.
Within the last decade, a number of major technological developments have transformed our understanding of the genetic causes of autism, and the field continues to evolve rapidly. In this chapter, we will review three approaches to identifying genetic factors that contribute to the pathogenesis of ASDs: 1) common variants and genome-wide association studies (GWAS); 2) rare variants and copy number variation (CNV) studies; and 3) familial forms of autism and the role of next-generation sequencing (NGS) methods. Data from all three approaches underscore the conclusion that autism is a highly complex and heterogeneous disorder with a multifactorial etiology. Moreover, it is becoming increasingly apparent that autism is not a unitary disorder, and that the spectrum may consist of any number of different autisms that share similar symptoms or phenotypes. This has important implications for evaluation and treatment, which are discussed in the conclusion.
2. Heritability of ASDs
Although Skuse (2007) cautions that heritability estimates of ASDs may have been skewed by the co-inheritance of (low) intelligence or other variables, there is little doubt that genetic factors play a key role in autism. In the most widely-cited twin study, Bailey et al. (1995) report that monozygotic twins are 92% concordant on a broad spectrum of cognitive or social abnormalities, compared with only 10% for dizygotic twins. Parents and siblings of individuals with ASDs often exhibit subsyndromal levels of impairment (Piven et al., 1997), and having an affected sibling is the single biggest risk factor for developing an ASD. In an analysis of 943,664 Danish children (Lauritsen et al., 2005), the strongest predictors of autism were siblings with ASDs, who conferred a 22-fold increased risk, while Fombonne (2005) suggested that this risk may be even greater.
3. Early insight from Rett syndrome and fragile X
Early efforts to identify the genetic causes of ASDs utilized linkage and association approaches. Linkage studies, more prominent in the 1980s and 1990s, typically focus on families or larger pedigrees and are well powered to identify rare genetic variants. The most common linkage approach is the affected sib-pair design (see O’Roak & State, 2008), which examines the transmission of genomic segments through generations. Linkage studies helped define the locus containing FMR1, which is mutated in fragile X syndrome (e.g. Richards et al., 1991); ~30% of children who are diagnosed with fragile X also have autism (Rogers et al., 2001; Harris et al., 2008). Similarly, this approach was important in identifying MECP2 as the major cause of Rett syndrome (e.g. Curtis et al., 1993).
Association studies take the opposite approach, scanning the genome from the top down, with the goal of determining post-hoc whether identified variants are more or less common in affected individuals. Early association studies (i.e. pre HapMap) were complementary with the linkage approach, and in many designs, linkage primed target loci for this more fine-grained analysis. These early insights have played a significant role in shaping our current understanding of ASDs, and functional studies of FMR1 and MECP2 have highlighted the importance of synaptic dysfunction (Ramocki & Zoghbi, 2008) as a unifying factor that could extend into the more common forms of autism. This is significant because it provides a means of linking neural correlates with genomic data, as well as related clinical phenotypes such as seizures and cognitive deficits (Hagerman et al., 2009). The Alzheimer’s paradigm, which includes functional models of how molecular, biochemical, and neural systems interact is instructive in this regard (e.g. Cissé et al., 2011).
4. Genome wide association and common variants
Aside from notable successes with fragile X and Rett syndrome, early linkage and association studies have been inconsistent in resolving more complex genetic correlates of ASDs, with candidate genes often not being replicated between studies. These challenges may in part be accounted for by their relatively low resolution, which makes it difficult to detect candidate loci other than those of major effect. In the past decade, however, association studies have become increasingly sophisticated, and the whole-genome approach now allows us to examine thousands of individuals using hundreds of thousands of markers.
Genome-wide association studies (GWAS) examine the frequency of single nucleotide polymorphisms (SNPs) in cases versus control populations and can adopt either a case-control or family-based approach. The former allows researchers to avoid the often complex process of acquiring diagnostic/phenotype data from a patient’s family, and can incorporate very large numbers of control datasets that may be more readily available. The latter controls for the often confounding phenomenon of population stratification, where variants more common to specific racial groups may either be erroneously identified as causal, or obscure actual causal variants. A major caveat with family-based designs is the often unfounded assumption that unaffected family members do not share causal variants.
GWAS test for common variants (>1% population frequency), with the assumption that ASDs are at least in part caused by the coinheritance of multiple risk variants, each of small individual effect (odds ratios between 1.1 and 1.5). This assumption is known as the common disease-common variant (CDCV) model (Risch and Merikangas, 1996).
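The case-control comparison described above can be sketched for a single SNP as a 2x2 test on allele counts. This is a minimal illustration rather than the pipeline used in any of the cited studies: the function name, the allele counts, and the assumed panel of 500,000 markers used for the Bonferroni threshold are all hypothetical.

```python
import math

def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic 2x2 test for one SNP: returns (odds_ratio, chi2, p_value)."""
    n = case_risk + case_other + ctrl_risk + ctrl_other
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Pearson chi-square for a 2x2 table, without continuity correction
    chi2 = n * (case_risk * ctrl_other - case_other * ctrl_risk) ** 2 / (
        (case_risk + case_other) * (ctrl_risk + ctrl_other)
        * (case_risk + ctrl_risk) * (case_other + ctrl_other)
    )
    # With 1 degree of freedom, the chi-square tail equals erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value

# Hypothetical allele counts (two alleles per person); not data from any study
odds_ratio, chi2, p_value = allelic_association(1300, 700, 1200, 800)

# Bonferroni threshold for an assumed panel of 500,000 markers
threshold = 0.05 / 500_000
genome_wide = p_value < threshold

print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.2g}")
print("genome-wide significant:", genome_wide)
```

In this toy example the odds ratio falls within the small-effect range assumed by the CDCV model, yet with only 1,000 cases and 1,000 controls the signal does not survive correction for the number of markers tested. This is one reason GWAS require samples in the thousands.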
Wang et al. (2009), from our laboratory, were the first to identify common variants for ASDs on a genome-wide scale. We examined 780 families (3,101 individuals) with affected children, a second group of 1,204 affected individuals, and 6,491 controls, all of whom were of European ancestry. We identified six genetic markers on chromosome 5 in the 5p14.1 region that were associated with susceptibility to ASDs. The region lies between two genes, CDH9 and CDH10. Both genes encode type II classical cadherins, transmembrane proteins that promote cell adhesion. The association of cadherins is consistent with the cortical-disconnectivity model of autism (e.g. Gepner & Féron, 2009), which postulates that ASDs may result from an increase or decrease in functional connectivity and neuronal synchronization in relevant neural pathways. Functional studies suggest that under-activity between and within networks is correlated with social, communication, cognitive, and sensorimotor impairments (Müller et al., 2011).
The study design utilized two independent replication cohorts, and the key SNP at this locus has also been replicated in two additional independent cohort studies (Ma et al., 2009; St Pourcain et al., 2010). This lends further support to the view that genetic factors at the 5p14 locus, which is flanked by two relevant cadherin genes, represent strong candidates for aligning molecular function with known neural deficits in ASDs. The original report by Wang and colleagues also demonstrated, based on gene-pathway analysis, an enrichment of cadherin-associated genes in ASDs in general (Wang et al., 2009). Cadherins represent a large family of transmembrane proteins that mediate calcium-dependent cell–cell adhesion and, via cell adhesion, have been shown to generate synaptic complexity in the developing brain (Redies, 2000). Other common GWAS variants reported have not been replicated in independent studies and will not be covered here.
4.1. Replicated common variants from candidate gene studies
Other common variants from candidate gene studies include CNTNAP2, EN2, and MET, which are reviewed briefly below. A more in depth review of these genes can be derived from catalogs at
Located on chromosome 7q35, Contactin Associated Protein 2 (CNTNAP2) was identified by Alarcón et al. (2002) as a candidate for the age at first word endophenotype. A subsequent follow-up by the same group (Alarcón et al., 2008), using linkage, association, and gene-expression analyses, found CNTNAP2 to be the only autism-susceptibility gene to reach significance across all approaches. An independent linkage analysis by Arking et al. (2008) also highlighted CNTNAP2 as a significant ASD candidate gene. CNTNAP2 is part of the neurexin family, members of which have repeatedly been associated with autism (see below). Interestingly, Vernes et al. (2008) showed that FOXP2 binds to and regulates CNTNAP2; FOXP2 is a well-established correlate of language and speech disorders (Lai et al., 2001) – a common phenotype in ASDs.
Engrailed 2 (EN2) is a homeobox gene that is critical to the development of the midbrain and cerebellum. Like other homeobox genes, it regulates morphogenesis. EN2 is a human homolog of the engrailed gene, which is found in Drosophila. En2 mouse mutants have anatomic phenotypes in the cerebellum that resemble cerebellar abnormalities reported in autistic individuals (Cheng et al., 2010). Benayed et al. (2005, 2009) have reported and replicated in three separate datasets a significant association with broad and narrow ASD phenotypes. Wang et al. (2008) also found an association between EN2 and ASDs in a Chinese Han sample, although Zhong et al. (2003) failed to find evidence of an underlying association.
The oncogene MET is also strongly linked to ASD etiology, having been supported by a number of studies in the past decade (e.g. IMGSAC, 2001; Campbell et al., 2006, 2008; Sousa et al., 2009). Recently, Eagleson et al. (2011) reported a role for Met signaling in cortical interneuron development in vitro in a mouse model.
4.2. Unexplained variance
For the most significant discovery SNP identified in the Wang et al. study above (rs4307059), the risk allele frequency was 0.65 in cases with an odds ratio of 1.19, which is comparable with common variant discoveries in other psychiatric disorders including schizophrenia (Glessner & Hakonarson, 2009; Glessner et al., 2010); bipolar disorder (Ferreira, 2008), and attention-deficit/hyperactivity disorder (Arcos-Burgos et al. 2010). While it is important not to undermine the significance of these findings, it should be noted that the predictive value of such ratios is relatively low (Dickson et al. 2010), often explaining less than 5% of the total risk (review at
The possibility that common variants are not the major cause of ASDs is also gaining increased support from the preponderance of copy number variation (CNV) studies, which are identifying rare variants with a stronger causal impact.
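To make the small size of this effect concrete, the reported figures for rs4307059 can be converted into an implied risk-allele frequency in controls. The sketch below assumes that 0.65 is the allele frequency in cases and that 1.19 is an allelic odds ratio; the function name is hypothetical.

```python
def control_frequency(case_freq, odds_ratio):
    """Risk-allele frequency in controls implied by the case frequency and an allelic odds ratio."""
    case_odds = case_freq / (1.0 - case_freq)
    ctrl_odds = case_odds / odds_ratio  # OR = case_odds / ctrl_odds
    return ctrl_odds / (1.0 + ctrl_odds)

ctrl_freq = control_frequency(0.65, 1.19)
print(round(ctrl_freq, 3))  # 0.609
```

Under these assumptions the risk allele is carried on roughly 65% of case chromosomes and 61% of control chromosomes. A difference of about four percentage points in a very common allele helps explain why such variants have limited predictive value for any one individual.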
5. Copy number variation in ASDs
Copy number variations (CNVs) are insertions, deletions, or translocations in the human genome that are universal in the general population but more commonly found in genic regions in individuals with neuropsychiatric disorders (e.g. Pinto et al., 2010). CNVs can be detected by the same SNP arrays used in GWAS, and vary in length from many megabases to 1 kilobase or smaller. They are often not associated with any observable phenotype.
One of the most widely known examples of a copy number change is Down syndrome, which is characterized by an extra chromosome 21. Rett syndrome is also caused by a CNV, which includes a deletion in MECP2. CNVs can be inherited or occur de novo, the cause of which is thus far unknown. Common disease-causing CNVs are infrequent, but rare CNVs (those with a frequency of less than 1%) have been identified for a range of disorders including ADHD (e.g. Williams et al., 2010), schizophrenia (e.g. Glessner et al., 2010; Levinson et al., 2011), bipolar disorder (e.g. Chen et al., 2010) and many others. A substantial portion of autism appears to be caused by rare CNVs. De novo CNVs that are greater than 100kb in size are more common in genic regions in individuals with ASDs than in the general population.
Sebat et al. (2007) provided some relevant early insights into the genomic features of CNVs. Firstly, they noted that de novo CNVs were individually rare – among 118 ASD cases, none of the identified variants were observed more than twice, with the majority seen just once. This confirmed the widely-held assumption that many different loci can contribute to the same ASD phenotype. Secondly, the authors affirmed the utility of population-study approaches that analyze sporadic and multiplex (i.e. more than one family member affected) families separately. The rate of de novo mutation in large (mostly genic) loci in multiplex families was significantly lower than for the sporadic cases (p = 0.04). While this observation remains to be replicated in a larger study, the finding implies two mechanisms of genetic susceptibility – spontaneous mutation and inheritance. Finally, the sheer volume of loci identified by this approach (multiple loci on 20 chromosomes) affirms the extraordinary complexity of ASDs.
A number of subsequent studies have greatly expanded the number of candidate loci. Our laboratory (Bucan et al., 2009) reported 150+ CNVs in 912 ASD families that were not found in 1,488 controls. Critically, 27 of these loci were replicated in an independent cohort of 859 ASD cases and 1,051 controls. Some of the rare variants we identified had previously been associated with autism, including variants in NRXN1 and UBE3A, which are established ASD candidate genes (Guilmatre et al., 2009). Samaco et al. (2005) previously identified significant deficits in Ube3a expression in Mecp2-deficient mice, suggesting a pathological pathway shared by Rett syndrome, Angelman syndrome, and autism. Similarly, Kim et al. (2008) associated NRXN1 with a balanced chromosomal abnormality at chromosome 2p16.3 in two unrelated ASD individuals. Rare variants in the coding region included two missense changes.
Glessner et al. (2009) identified and reported CNVs in two major gene networks: neuronal cell adhesion molecules (such as NRXN1) and the ubiquitin gene family (such as UBE3A). Interestingly, four of the most prominent genes enriched by CNVs in ASD cases (UBE3A, PARK2, RFWD2 and FBXO40) are all part of the ubiquitin gene family. Ubiquitination can alter protein function after translation, and can target proteins for degradation by proteasomes. The ubiquitin–proteasome system operates at pre- and post-synapses, where its functions include regulating neurotransmitter release, recycling synaptic vesicles in pre-synaptic terminals, and modulating changes in dendritic spines and post-synaptic density (Yi & Ehlers, 2005). As well as implicating a ubiquitination network in relation to ASDs, we also identified a second pathway involving NRXN1, CNTN4, NLGN1, and ASTN2. Genes in this group mediate neuronal cell adhesion, and contribute to neurodevelopment by facilitating axon guidance, synapse formation and plasticity, and neuron–glial interactions. We also note that ubiquitins are involved in recycling cell-adhesion molecules, which is a possible mechanism by which these two networks are cross-linked.
In a similar approach, Pinto et al. (2010) further confirmed the importance of rare CNVs as causal factors for ASDs. Interestingly, the group did not observe a significant difference between cases and controls in terms of raw number of CNVs or estimated CNV size. However, the number of CNVs in genic regions was significantly greater in ASDs compared to controls. Again, loci enriched for CNVs include a number of genes known to be important for neurodevelopment and synaptic plasticity, such as SHANK2, SYNGAP1, and DLGAP2. Between 5.5% and 5.7% of ASD cases have at least one de novo CNV, further confirming the significance of de novo genetic events as risk factors for autism. Similar to the Glessner study, the Pinto group mapped CNVs to a series of networks involved in the development and regulation of the central nervous system functions. Implicated networks include neuronal cell adhesion, GTPase regulation (important for signal transduction and biosynthesis), and GTPase/Ras signaling, also involved in ubiquitination.
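The distinction drawn above between the raw number of CNVs and the number of CNVs in genic regions can be illustrated with a simple interval-overlap count. This is a minimal sketch, not the method of Pinto et al.: the `Interval` type, the helper names, and every coordinate are hypothetical.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive

def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def genic_count(cnvs, genes):
    """Number of CNVs that overlap at least one gene."""
    return sum(1 for cnv in cnvs if any(overlaps(cnv, gene) for gene in genes))

# Toy gene annotations and CNV calls (made-up coordinates)
genes = [Interval("chr2", 1000, 5000), Interval("chr15", 20000, 30000)]
case_cnvs = [Interval("chr2", 4000, 9000), Interval("chr15", 25000, 26000),
             Interval("chr7", 100, 900)]
ctrl_cnvs = [Interval("chr2", 5000, 9000), Interval("chr7", 100, 900),
             Interval("chr15", 1000, 2000)]

case_genic = genic_count(case_cnvs, genes)
ctrl_genic = genic_count(ctrl_cnvs, genes)
print(len(case_cnvs), len(ctrl_cnvs))  # same raw CNV count in both groups
print(case_genic, ctrl_genic)          # 2 0
```

Both toy groups carry three CNVs, so the raw burden is identical, but only the case CNVs intersect genes. A real analysis would follow this count with a formal test and would control for CNV size and array platform.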
Finally, Gai et al. (2011) took a slightly different approach, focusing exclusively on inherited CNVs. While underlying loci were not necessarily common to those identified by the Glessner and Pinto groups, enrichment in pathways involving central nervous system development, synaptic functions and neuronal signaling processes was again confirmed. The Gai et al. study also emphasized the role of glutamate-mediated neuronal signals in ASDs.
Collectively, these CNV studies suggest that certain hotspots in the genome are particularly relevant to ASD risk, including loci on chromosomes 1q21, 3p26, 15q11-q13, 16p11, and 22q11. These hotspots are part of large gene networks that are important to neural signaling and neurodevelopment and have additionally been associated with other neuropsychiatric disorders.
In particular, a number of CNV studies in schizophrenia have highlighted structural mutations incorporating chromosomes 1q21, 15q13, and 22q11 (e.g. McClellan and King 2010; Glessner et al., 2010), which are significantly enriched in cases versus controls, with NRXN1 being a standout in this regard. From a phenotype perspective, autism and schizophrenia seem very different, both in behavioral manifestation and age of onset, and it may seem counter-intuitive that associated loci should overlap. Some authors have addressed this peculiarity by proposing that schizophrenia and autism may in fact be different poles of the same spectrum. Thus, Crespi and Braddock (2008) suggest that social cognition is underdeveloped in ASDs and over-developed in the psychotic spectrum, with a similar polarization of language and behavioral phenotypes. Although speculative, this hypothesis has gained some traction. In the next several years, genomic, imaging, and model-systems approaches will likely shed further light on the relationship between autism, schizophrenia and other neuropsychiatric disorders.
6. Sequencing familial forms of ASDs
To this point, we have focused primarily on the complex interactions of polygenic networks as the major cause of ASDs. However, this is not exclusively the case. Paralleling the recent spate of CNV studies is a renewed focus on rare disorders, including familial forms of complex diseases that are potentially monogenic or have a less complex inheritance pattern. Earlier in this chapter, we emphasized the overlap with fragile X syndrome, where one third of cases are co-morbid for ASD. As mentioned, fragile X is caused by a failure to express the protein coded by FMR1. However, mutations in FMR1 do not always result in fragile X and can result in a phenotype more representative of ASDs. Thus, Muhle et al. (2004) found that 7-8% of idiopathic ASD cases may have mutations at the FMR1 locus. Likewise, although mutations in MECP2 are the common cause of Rett syndrome, certain mutations at the same locus have been associated with idiopathic autism (Carney et al., 2003).
The X-linked neuroligin genes NLGN3 and NLGN4, together with SHANK3 (which encodes a neuroligin binding partner), are other prominent examples of distinct rare genetic causes. A parallel can be drawn between these studies and those of mental retardation and epilepsy, which include many rare syndromes that collectively account for a substantial proportion of the two disorders (Morrow et al., 2008). Indeed, it is perhaps more than coincidence that autism is heavily co-morbid with these two conditions, with >40% (Bölte et al., 2009) and ~40% (Danielsson et al., 2005) of ASD cases meeting diagnostic criteria for mental retardation and epilepsy, respectively. It is also noteworthy that many of these monogenic-related genes are major players in neurodevelopment and synapse activity. Other prominent examples include TSC1, TSC2 (Osborne et al., 1991; Franz, 1998), NF1, and UBE3A (see Morrow et al., 2008).
The identification of monogenic or possibly oligogenic autisms is likely to accelerate in the next several years as next-generation sequencing becomes more widely available. We recently encountered a family of two parents, six healthy siblings, and two siblings with severe autism, a pattern suggestive of autosomal recessive inheritance. Linkage and CNV approaches failed to identify a causal locus, but whole-exome sequencing at 20x coverage identified four candidate genes, including PCDHA4 (protocadherin alpha 4 isoform 1 precursor), which carries a non-synonymous SNP and is a strong candidate currently under validation. Protocadherins are part of the cadherin family that facilitates neuronal cell adhesion, and this discovery is consistent genomically and neurobiologically with the findings addressed above in relation to CDH9 and CDH10.
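The logic of narrowing exome variants under an autosomal recessive model can be sketched as a genotype filter across the family. This is an illustration only, not the analysis pipeline used for the family described above. It assumes a fully penetrant homozygous model and ignores compound heterozygosity, and the sample labels, variant identifiers, and genotype counts are invented.

```python
def recessive_candidates(variants, affected, parents, unaffected):
    """Return IDs of variants consistent with a fully penetrant homozygous recessive model."""
    hits = []
    for variant in variants:
        gt = variant["alt_alleles"]  # sample -> number of alternate alleles (0, 1 or 2)
        if (all(gt[s] == 2 for s in affected)             # affected children homozygous
                and all(gt[s] == 1 for s in parents)      # both parents carriers
                and all(gt[s] < 2 for s in unaffected)):  # healthy siblings not homozygous
            hits.append(variant["id"])
    return hits

# Family structure mirroring the text: two parents, two affected, six healthy siblings
parents = ["father", "mother"]
affected = ["aff1", "aff2"]
unaffected = [f"sib{i}" for i in range(1, 7)]
samples = parents + affected + unaffected

def make_variant(variant_id, counts):
    """Pair each sample with its alternate-allele count (hypothetical data)."""
    return {"id": variant_id, "alt_alleles": dict(zip(samples, counts))}

variants = [
    make_variant("var_A", [1, 1, 2, 2, 0, 1, 1, 0, 1, 0]),  # fits the model
    make_variant("var_B", [1, 1, 2, 2, 0, 1, 2, 0, 1, 0]),  # a healthy sibling is homozygous
    make_variant("var_C", [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]),  # affected children are heterozygous
]

candidates = recessive_candidates(variants, affected, parents, unaffected)
print(candidates)  # ['var_A']
```

The six unaffected siblings do much of the work here: each one who is homozygous for a variant rules it out, which is how a single large family can reduce an exome to a handful of candidates.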
Known syndromes with ASD features include fragile X, Down syndrome, tuberous sclerosis, neurofibromatosis (which confers a 100-fold increased risk for ASDs; Li et al., 2005), Angelman, Prader-Willi and related 15q syndromes, and at least several dozen others (see Zafeiriou et al., 2007 for a comprehensive review). Table 1 (from Volkmar et al., 2005) lists the most commonly associated syndromes, with median rate and range. It is likely that many more unidentified rare syndromes with Mendelian causes have ASD phenotypes. As of March 2011, the Online Mendelian Inheritance in Man (OMIM) database listed 6,727 known or suspected Mendelian diseases (MD), with 2,993 (44%) of these having an identified molecular basis. Since OMIM derives its data from published reports, these figures likely under-represent rare disorders, which may go unreported. It has been proposed that as many as 30,000 genetic disorders may exist, suggesting that many Mendelian disorders have no genetic etiology identified to date. Given the large representation of autism phenotypes in known syndromes, we can assume a similar trend in unreported disease.
It remains to be determined whether rare variants will account for the majority of autisms. Irrespective, as with many other aspects of scientific inquiry, the study of rare variants will continue to play an important role in explicating the pathogenesis of ASDs. El-Fishawy and State (2010) point to hypercholesterolemia and hypertension (Brown, 1974; Lifton et al., 2001) as examples where rare mutations have been successful in driving a molecular understanding of the disease as opposed to identifying risk factors in the general population. Rare mutations, particularly when they are Mendelian, carry large effects and are typically in genic regions. These characteristics make the resolution of underlying networks distinctly less complex and, moreover, are amenable to modeling in other systems.
Recent groundbreaking studies by Marchetto et al. (2010) and Muotri et al. (2010), who created a cell culture model of Rett syndrome, are potentially exciting developments in this regard. Here, the researchers used skin biopsies from four Rett’s patients, each carrying a different MeCP2 mutation, to culture induced pluripotent stem cells (iPS). Once the iPS cells developed into neurons, they showed a decreased number of neurons and dendritic spines, consistent with neurodevelopmental disruptions. Intervention with insulin-like growth factor 1 (IGF1), which is known to regulate neurodevelopment, was subsequently shown to reverse Rett-like symptoms in a mouse model of the disease. This innovative approach is an exciting model of how rare gene approaches can stimulate our understanding of the pathophysiology and potential reversibility of ASDs.
Table 1. Syndromes most commonly associated with ASDs (from Volkmar et al., 2005).
| Syndrome | Number of Studies | Median Rate | Range % |
ASDs are clearly highly heritable disorders, and advances in gene-finding technology in the past decade have rapidly accelerated gene discovery. As is typically the case, successive developments have made the problem more complex, such that there are now dozens of candidate genes, many of which remain to be replicated. In spite of this complexity, we can observe a number of patterns beginning to unfold: 1) the relative scarcity of causal common variants, 2) the growing list of causal rare variants, and 3) the emergence of monogenic disorders with primary and secondary ASD phenotypes.
The monogenic autisms are particularly interesting from a treatment perspective, as they provide a mechanism for studying ASD phenotypes in model systems and an obvious target for drug intervention. They are also amenable to clinical testing and the decreasing cost of research technologies means that this capacity is more widely available to clinicians. In fact, as the resolution of clinical instruments becomes more sophisticated, it is likely that the clinic will become a primary workplace for syndromic discovery.
A key requirement in driving gene discovery is high-quality phenotype data. ASDs are notoriously heterogeneous, and are fractionated in terms of symptoms and trajectory. Mandy & Skuse (2008) reviewed seven factor analysis studies of ASD symptoms, and found that all but one dissociated social and non-social factors. In a non-clinical sample of 3,000 twin pairs, Happé et al. (2006) examined autistic-like traits and found consistently low correlations (r = 0.1-0.4) between each of the core deficits on the autism spectrum. Endophenotypes, sub-components or sub-processes of the broader phenotype, may provide a productive avenue to disentangling some of this complexity. By filtering out all but a few discrete measures, we can theoretically increase the signal-to-noise ratio in genotype-phenotype associations. A number of ASD endophenotypes have been identified, some of them associated with specific genes or loci, including head circumference (associated with the HOXA1 A218G polymorphism; Conciatori et al., 2004), age at first word (associated with a quantitative trait locus on 7q35; Alarcón et al., 2005), delayed magnetoencephalography evoked responses to auditory stimuli (Roberts et al., 2010), and enhanced perception (Mottron et al., 2006). The endophenotype approach is arguably more consistent with rare-/mono-genic discovery, where a mutated network may not yield a diagnosis of autism per se, but may nevertheless cause associated abnormalities. Note that this approach does not diminish the pleiotropic effects of genes involved in neurodevelopment; it only serves to make the point that the relevant genotype may associate with some but not all ASD features.
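The correlations of r = 0.1-0.4 reported by Happé et al. (2006) can be read as shared variance by squaring them. The short calculation below shows the standard conversion and is not taken from the cited study; the function name is hypothetical.

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their Pearson correlation r."""
    return r ** 2

for r in (0.1, 0.4):
    print(f"r = {r:.1f} -> shared variance = {shared_variance(r):.0%}")
```

On this reading, the core deficits share between about 1% and 16% of their variance. This is consistent with the argument that the domains are largely dissociable and may be better studied as separate endophenotypes.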
The converse, of course, is also true with a large number of candidate genes contributing to the majority of known ASDs. With ~80% of genes expressed in the brain it is likely that this number will continue to grow, and here again careful phenotyping is critical to identifying functional consequences. Ultimately, the primary goal is not to determine the frequency of variation/mutation in cases versus controls, but to determine the pathway(s) and gene networks that lead to pathology. This will be no mean feat, with other major players such as epigenetic factors, RNA regulatory elements, and environmental exposures also an important part of the equation. While daunting, the elucidation of these elements will doubtlessly take us closer to developing effective treatments for ASDs. Given the current rate of progress, we have cause for cautious optimism in this regard.