Autism Spectrum Disorders: Insights from Genomics

The field of genetics has made considerable scientific progress in the past several years and continues to evolve at a rapid pace. This progress parallels developments in genomic technology, where instrumentation and methodology are becoming increasingly sophisticated and cost-effective. Here, we review recent developments in understanding autism spectrum disorder (ASD) from a genomics perspective. A large catalog of common and rare variants has now been associated with ASD, and we are beginning to see some of these discoveries translate into pharmacogenomic intervention. This review provides an overview of genome-wide association studies (GWAS) and common genetic variants, followed by an overview of the status of research on rare variants, which have risen to prominence with the proliferation of next-generation sequencing and techniques for identifying copy number variants. While these approaches need not be mutually exclusive, they provide a useful structure for organizing relevant genetic factors. Although there is much work to be done before these discoveries enter the clinic, the past decade has seen major inroads in elucidating the causes of ASD and tentative steps towards developing treatments.


Introduction

Defining the autism phenotype
Autism is known to be highly heterogeneous, which has made definitions of the phenotype somewhat problematic. The American Psychiatric Association recently proposed revisions to the ASD criteria in the fifth edition of its Diagnostic and Statistical Manual of Mental Disorders (DSM-5) (see Wing et al., 2011) [1], acknowledging the long-observed overlap between the social and communication dimensions, which were previously treated separately. Thus, ASD will be defined by 1) persistent deficits in social communication and social interaction across contexts, and 2) restricted, repetitive patterns of behavior, interests, or activities. These should impair every-teria for autism (Rogers et al., 2001; Harris et al., 2008) [15,16]. Similarly, linkage studies have been important in identifying MECP2 as the major cause of Rett syndrome (e.g. Curtis et al., 1993) [17].
Association studies take a different approach. Rather than tracking the transmission of specific genomic regions through generations, association studies scan the breadth of the genome. Here, the goal is to determine post hoc whether identified variants are more or less common in affected individuals. Early association studies (i.e. pre-HapMap) complemented the linkage approach, and in many designs, linkage primed target loci for this more fine-grained analysis.
These early insights have played an important role in shaping our current understanding of ASD. Functional studies of FMR1 and MECP2 have highlighted synaptic dysfunction (Ramocki & Zoghbi, 2008) [18] as a unifying factor that could extend to the more common forms of autism and that, as discussed below, remains highly relevant to our understanding of the broader ASD phenotype.

Genome wide association and common variants
Aside from notable successes with fragile X and Rett syndrome, early linkage and association studies have been inconsistent in resolving more complex genetic correlates of ASD, and candidate genes have often not been replicated between studies. These challenges may in part be accounted for by the relatively low resolution and coverage of these studies, which made it difficult to detect candidate loci other than those of major effect. Getting beyond such challenges required a shift in technology, which came with the introduction of high-resolution single nucleotide polymorphism (SNP) arrays. SNP arrays provided coverage of many thousands (now several million) of common SNPs, which could be examined at relatively low cost across large sample sets.
Genome-wide association studies (GWAS) examine the frequency of SNPs in case versus control populations, and can adopt either a case-control or family-based design. The former allows researchers to avoid the often complex process of acquiring diagnostic/phenotype data from a patient's family, and can incorporate very large numbers of control datasets that may be more readily available. The latter controls for the often confounding phenomenon of population stratification, where variants more common to specific racial groups may either be erroneously identified as causal, or obscure actual causal variants. A major caveat with family-based designs is the often unfounded assumption that unaffected family members do not share causal variants.
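To make the family-based design concrete, the sketch below computes the classic transmission/disequilibrium test (TDT), which counts how often heterozygous parents transmit a candidate allele to affected children. Because each parent serves as its own control, allele-frequency differences between ancestral groups cannot by themselves produce a signal. The review does not specify which family-based statistic the cited studies used, so this is an illustrative choice, and the function name and transmission counts are hypothetical.

```python
import math

def tdt(transmitted, untransmitted):
    """Transmission/disequilibrium test from heterozygous parents.

    transmitted: times the candidate allele was passed to an affected child
    untransmitted: times the other allele was passed instead
    Returns (chi-square statistic with 1 df, two-sided P value).
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("need at least one informative transmission")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With 120 transmissions versus 80 non-transmissions, the statistic is (120 - 80)² / 200 = 8.0.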
GWAS test for common variants (>1% population frequency), under the assumption that ASD is at least in part caused by the coinheritance of multiple risk variants, each of small individual effect (odds ratios typically between ~1.1 and ~1.5). This assumption is known as the common disease-common variant (CDCV) model (Risch & Merikangas, 1996) [20].
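The odds ratios quoted for common variants are usually allelic odds ratios from a 2×2 table of allele counts. A minimal sketch, assuming a simple allelic case-control comparison with a Woolf (log-scale) 95% confidence interval; the function name and the counts are hypothetical and are not taken from any study cited here.

```python
import math

def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are allele counts (two alleles per genotyped individual).
    Returns (odds_ratio, lower, upper); z=1.96 gives an approximate 95% CI.
    """
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("empty cell: apply a continuity correction first")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se = math.sqrt(sum(1 / c for c in counts))
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)
```

For example, 1,300 risk and 700 other alleles in cases versus 1,200 and 800 in controls give an odds ratio of (1300 × 800) / (700 × 1200) ≈ 1.24, within the small-effect range the CDCV model anticipates.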

The 5p14.1 locus
A 2009 paper from our laboratory (Wang et al., 2009) [21] was the first to identify common variants for ASD on a genome-wide scale. Our group examined 780 families (3,101 individuals) with affected children, a second, independent group of 1,204 affected individuals, and 6,491 controls, all of European ancestry. We identified six genetic markers in the 5p14.1 region of chromosome 5 that were associated with susceptibility to ASD. This locus has been replicated in two additional independent cohorts [22,23] (including St Pourcain et al., 2010), lending further support to 5p14.1 as an ASD risk locus.
As shown in Figure 1, the region lies in the intergenic interval between two genes, CDH9 and CDH10. Both encode type II classical cadherins. Cadherins are a large family of transmembrane proteins that mediate calcium-dependent cell-cell adhesion, and have been shown to generate synaptic complexity in the developing brain (Redies, Hertel & Hübner, 2012) [24]. The association with cadherins is consistent with the cortical-disconnectivity model of autism (e.g. Gepner & Féron, 2009) [25], which postulates that ASD may result from an increase or decrease in functional connectivity and neuronal synchronization in relevant neural pathways. While this hypothesis may yet be confirmed, a recent study by Kerin et al. (2012) [26] suggests a more complex mechanism underlying the association between ASD and the 5p14.1 locus.
Focusing on the genomic region surrounding rs4307059, the authors used a bioinformatics approach (a genome browser) to examine relevant expressed sequence tags (ESTs) and RNA (by tiling array) within the 100-kb linkage disequilibrium (LD) block at the GWAS peak. Only one functional element, a single noncoding RNA, was located. The 3.9-kb RNA corresponded to moesin pseudogene 1 (MSNP1), and has 94% sequence identity to the mature mRNA of the protein-coding gene MSN. Located on the X chromosome (Xq11.2), MSN spans 74 kb and contains 13 exons. It produces a 4-kb mRNA and encodes the 577-amino acid moesin protein. The noncoding RNA at 5p14.1 was encoded by the opposite (antisense) strand of MSNP1, and was therefore named moesin pseudogene 1, antisense (MSNP1AS).
Follow-up analyses by the group confirmed that MSNP1AS is expressed in the brain, providing important functional validation. Using custom TaqMan Gene Expression assays targeting the region, they showed that while MSN was expressed in all tissues tested, MSNP1AS expression was variable. Expression was greatest in the adult temporal cerebral cortex, adult peripheral blood, and fetal heart, as well as in three immortalized cell lines. Moreover, postmortem analyses (qPCR on total RNA) of fresh-frozen superior temporal gyrus tissue from ASD-control pairs (n=10) found a 12.7-fold increase in MSNP1AS expression in the temporal cortex of individuals with ASD. Individuals with ASD also showed a 2.4-fold increase in MSN expression in the same region. Interestingly, there was no evidence of increased expression of either CDH9 or CDH10.
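Fold changes such as the 12.7-fold increase above are commonly derived from qPCR cycle thresholds (Ct) with the comparative 2^-ΔΔCt method. The review does not state how Kerin et al. quantified expression, so the sketch below is an assumption about a typical workflow: it normalizes the target (e.g. MSNP1AS) to a reference gene, then compares cases with controls. The function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl,
                     efficiency=2.0):
    """Relative expression (case vs. control) by the comparative Ct method.

    Assumes target and reference assays amplify with the same per-cycle
    efficiency (2.0 = perfect doubling each cycle).
    """
    delta_case = ct_target_case - ct_ref_case  # normalize to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    delta_delta = delta_case - delta_ctrl
    # Fewer cycles to threshold means more starting template.
    return efficiency ** (-delta_delta)
```

A target that crosses the threshold three cycles earlier in cases, relative to the reference gene, corresponds to 2³ = 8-fold higher expression.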
The group next used genotype, determined from three associated SNPs from the original Wang et al. paper, as the independent variable in expression analyses. All three SNPs, rs7704909, rs12518194, and rs4307059, are in strong linkage disequilibrium (LD) (r² > 0.98). Using resequencing to compare relevant genotypes, they identified highly significant differences in MSNP1AS expression. Thus, the T/T genotype at rs7704909 corresponded to a 23.3-fold increase in MSNP1AS RNA compared to the C/C genotype. For rs4307059, the T/T versus C/C genotype corresponded to a 22.0-fold increase in MSNP1AS expression. For rs12518194, the A/A versus G/G genotype corresponded to a 10.8-fold increase in MSNP1AS expression. Again, there was no evidence of expression differences for CDH9 or CDH10 in relation to genotype or case/control status. Although Western blot analyses did not identify significant differences in moesin protein levels between cases and controls, overexpressing MSNP1AS in human cell lines significantly reduced levels of the moesin protein. The authors speculated that relevant alterations in moesin may occur only at specific developmental stages, which may impact neurodevelopment. This would explain why moesin protein levels do not differ in the ASD samples per se, in spite of the marked differences in MSNP1AS expression. Further work is needed to confirm this hypothesis, and quantifying moesin protein levels at key developmental stages would certainly help in this respect.
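The r² > 0.98 reported for the three SNPs is a pairwise LD measure. The sketch below, assuming phased haplotype frequencies (or counts) are already available, computes r² = D² / (p_A p_a p_B p_b), where D is the deviation of the A-B haplotype frequency from the product of the two allele frequencies. The function name, allele labels, and example frequencies are hypothetical.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from phased haplotype counts or frequencies."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total <= 0:
        raise ValueError("no haplotypes supplied")
    p_AB, p_Ab, p_aB, p_ab = [x / total for x in (n_AB, n_Ab, n_aB, n_ab)]
    p_A = p_AB + p_Ab  # frequency of allele A at SNP 1
    p_B = p_AB + p_aB  # frequency of allele B at SNP 2
    D = p_AB - p_A * p_B  # linkage disequilibrium coefficient
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("one SNP is monomorphic in this sample")
    return D * D / denom
```

When only the A-B and a-b haplotypes occur, the two SNPs are perfectly correlated and r² is 1; values this close to 1 are why genotypes at the three SNPs give nearly interchangeable expression results.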
Taken as a whole, these results provide compelling support for 5p14.1 as a risk locus for ASD. Although sample sizes for some analyses were small (10 ASD-control pairs for postmortem studies), this rigorous series of experiments draws a clear path from GWAS result to functional validation. As such, these results help allay criticism of the GWAS approach as a means of candidate discovery. For example, a 2010 review by McClellan and King [27] singled out the 5p14.1 locus as an example of the "perils of cryptic population stratification". These comments seem somewhat misguided in light of the rigorous methods developed by the GWAS community for controlling stratification (e.g. EigenStrat) [28], replication [22,23], and now functional validation by the Kerin et al. group [26].
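EIGENSTRAT-style correction rests on principal component analysis of the genotype matrix: the leading axes of variation capture ancestry, and they are then used as covariates (or regressed out) before association testing. The sketch below is plain Python for illustration, not a substitute for the actual software: it standardizes each SNP, builds an individual-by-individual relationship matrix, and finds the leading eigenvector by power iteration. The function name is hypothetical and the regression step is not shown.

```python
import random

def top_pc(genotypes, n_iter=200, seed=0):
    """Leading principal component of a genotype matrix (EIGENSTRAT-style sketch).

    genotypes: one list per individual of 0/1/2 allele counts, same SNP order.
    Returns one score per individual; the overall sign is arbitrary.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Standardize each SNP using its sample allele frequency p.
    cols = []
    for j in range(m):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)
        if 0 < p < 1:  # monomorphic SNPs carry no information
            sd = (2 * p * (1 - p)) ** 0.5
            cols.append([(x - 2 * p) / sd for x in col])
    if not cols:
        raise ValueError("no polymorphic SNPs")
    k = len(cols)
    # Individual-by-individual relationship matrix: X X^T / (number of SNPs).
    grm = [[sum(c[i] * c[l] for c in cols) / k for l in range(n)] for i in range(n)]
    # Power iteration: the matrix is positive semidefinite, so repeated
    # multiplication converges to the eigenvector of the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(grm[i][l] * v[l] for l in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

Given genotypes from two groups with markedly different allele frequencies, the two groups' scores fall on opposite sides of zero; adding such scores as covariates is what prevents that ancestry difference from masquerading as an association.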
Similarly, replication and validation of the 5p14.1 locus provide an important demonstration of the legitimacy of associations in intergenic regions. Again, McClellan and King had disputed the utility of such results, questioning how "genome-wide association studies come to be populated by risk variants with no known function?" It is important to emphasize that the GWAS approach typically does not tag the disease variant itself, but rather its approximate location; through linkage disequilibrium, this is typically within 100 kb or less. Moreover, as in the Kerin et al. paper, the significant SNP may be tagging an intergenic regulatory element whose functional consequences extend far beyond the associated region, in this case to the MSN locus on the X chromosome.
Finally, these expression analyses provide a reminder about the capabilities of different genomic technologies. In the past twelve months, a number of high-profile next-generation sequencing (NGS) studies have examined genomic correlates of ASD with unprecedented resolution. These types of studies, reviewed in greater depth below, have been interpreted as the future of ASD genetics and, to a large extent, this may be true. However, we note that DNA sequencing of the 5p14.1 region would not have identified the noncoding RNA at this locus. Thus, although NGS platforms used for RNA sequencing are becoming increasingly sophisticated (Ozsolak & Milos, 2011) [29], microarray studies retain a place in guiding genomic discovery.

Other replicated common variants from candidate gene studies
A number of other common variants from candidate gene studies have been proposed as ASD risk factors. These include Contactin Associated Protein 2 (CNTNAP2), located on chromosome 7q35, which has been identified as a candidate for the age-at-first-word endophenotype (Alarcón et al., 2002) [30]. A follow-up by the same group (Alarcón et al., 2008) [31] using linkage, association, and gene-expression analyses found CNTNAP2 to be the only autism-susceptibility gene to reach significance across all approaches. An independent linkage analysis by Arking et al. (2008) [32] also highlighted CNTNAP2 as a significant ASD candidate gene. CNTNAP2 is part of the neurexin family, which has repeatedly been associated with autism (see below). Interestingly, Vernes et al. (2008) [33] showed that FOXP2 binds to and regulates CNTNAP2. FOXP2 is a well-established correlate of language and speech disorders (Lai et al., 2001) [34], which are commonly observed in ASD.
Another locus identified by the candidate gene approach is Engrailed 2 (EN2), a homeobox gene that is critical to the development of the midbrain and cerebellum. Like other homeobox genes, it regulates morphogenesis. EN2 is a human homolog of the Drosophila engrailed gene. En2 mouse mutants have cerebellar anatomic phenotypes that resemble the cerebellar abnormalities reported in autistic individuals (Cheng et al., 2010) [35]. In three separate datasets, Benayed et al. (2005, 2009) [36,37] reported and replicated a significant association between EN2 and both broad and narrow ASD phenotypes. Wang et al. (2008) [38] also found an association between EN2 and ASD in a Chinese Han sample, although Zhong et al. (2003) [39] failed to find evidence of an association.

Unexplained variance
For the most significant discovery SNP identified in the Wang et al. study above (rs4307059), the risk allele frequency was 0.65 in cases, with an odds ratio of 1.19. This is comparable with common variant discoveries in other psychiatric disorders, including schizophrenia (Glessner et al., 2009) [45], bipolar disorder (Ferreira, 2008) [46], and attention-deficit/hyperactivity disorder (Arcos-Burgos et al., 2010) [47]. While it is important not to understate the significance of these findings, the predictive value of such odds ratios is relatively low [48], often explaining less than 5% of the total risk (review at http://www.genome.gov/gwastudies). However, it is also possible that these common SNPs are tagging a rarer causative variant (i.e. synthetic association), in which case the effect size may be markedly underestimated by the GWAS variant, as we recently reported [48]. In one example, Wang et al. (2010) [49] examined the NOD2 locus as a cause of Crohn's disease. Using resequencing data, they found that three causal variants explain >5% of the genetic risk, whereas GWAS had estimated the risk at ~1%. This finding has two potentially important implications. First, it highlights the need for careful phenotyping of cohorts, to ensure that the phenotypes produced by rare variants are not "filtered out" and missed as a consequence. A long-range haplotype analysis of the GWAS data at the respective loci is recommended to enrich for individuals with rare causative variants, who can then be selected from the cohort and sequenced for confirmation [49].
Second, the results of our Crohn's disease study suggest that in certain circumstances there may be an explicit relationship between tagged variants and underlying rare variants. Thus, the distinction between loci harboring common versus rare variants is not necessarily clear-cut. Indeed, the same locus may harbor both common and rare variants (Anderson et al., 2011). In recent years, we have seen an increased emphasis on rare variants, which is reflected in an upsurge in the number of copy number variation (CNV) and NGS studies.
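The logic of synthetic association can be illustrated with a deliberately simplified haplotype model, assumed here for illustration: every copy of a rare causal allele (frequency f, relative risk R) sits on a haplotype carrying the common tag allele (frequency p), and risk acts per haplotype. Tag-allele haplotypes then carry risk 1 + (R - 1)f/p relative to the others, so a strong rare effect is diluted into a weak common one. This is not the analysis used in the NOD2 study, and the function name is hypothetical; it only shows why a GWAS estimate can understate the underlying effect.

```python
def apparent_tag_rr(causal_rr, causal_freq, tag_freq):
    """Relative risk observed at a common tag allele in a toy haplotype model.

    Assumes every rare causal allele lies on a tag-allele haplotype and that
    the causal allele multiplies per-haplotype risk by causal_rr.
    """
    if not 0 < causal_freq <= tag_freq < 1:
        raise ValueError("need 0 < causal_freq <= tag_freq < 1")
    # Fraction of tag-allele haplotypes that actually carry the causal allele.
    share_with_causal = causal_freq / tag_freq
    # Average risk of tag haplotypes relative to non-tag haplotypes (risk 1).
    return 1 + (causal_rr - 1) * share_with_causal
```

A causal allele at 1% frequency with a five-fold effect, tagged by a 50% allele, appears at the tag as a relative risk of only 1 + 4 × 0.02 = 1.08.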

Copy number variation in ASD
CNVs are deletions, duplications, or insertions of genomic segments that are ubiquitous in the general population (e.g. Pinto et al., 2010) [50]. CNVs can be detected by the same SNP arrays used in GWAS, and vary in length from many megabases to 1 kilobase or smaller. They are often not associated with any observable phenotype.
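SNP arrays report a log R ratio (LRR) per probe, the log2 ratio of observed to expected signal intensity; deletions lower it and duplications raise it. Dedicated callers typically combine LRR with B allele frequency in a hidden Markov model; the sketch below is a much simpler threshold-and-run approach, assumed here only to show the idea. The function name, cut-offs, and minimum run length are hypothetical placeholders, not calibrated values.

```python
def call_cnvs(lrr, del_cut=-0.3, dup_cut=0.2, min_probes=3):
    """Flag runs of consecutive probes whose log R ratio is shifted.

    lrr: log R ratio values in genomic order along one chromosome.
    Returns (start_index, end_index_inclusive, "del" or "dup") tuples.
    """
    def state(x):
        if x <= del_cut:
            return "del"
        if x >= dup_cut:
            return "dup"
        return None

    calls = []
    start, current = 0, None
    for i, x in enumerate(lrr):
        s = state(x)
        if s != current:
            # Close the previous run if it was long enough to report.
            if current is not None and i - start >= min_probes:
                calls.append((start, i - 1, current))
            start, current = i, s
    if current is not None and len(lrr) - start >= min_probes:
        calls.append((start, len(lrr) - 1, current))
    return calls
```

Runs shorter than `min_probes` are ignored, trading sensitivity for fewer false calls from single noisy probes; this is one reason very small CNVs are hard to detect on arrays.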
One of the most widely known copy number changes underlies Down syndrome, which is characterized by an extra copy of chromosome 21. Rett syndrome can also be caused by a CNV, such as a deletion involving MECP2. CNVs can be inherited or occur de novo; the cause of de novo events is thus far unknown. An early study [55] provided some insights into the genomic features of CNVs in ASD. First, the authors noted that de novo CNVs were individually rare: among 118 ASD cases, none of the identified variants was observed more than twice, and the majority were seen just once. This confirmed the widely held assumption that many different loci can contribute to the same ASD phenotype. The sheer number of loci identified by this approach (multiple loci on 20 chromosomes) affirms the extraordinary complexity of ASD.
A number of subsequent studies have greatly expanded the number of candidate loci using the CNV approach. Our laboratory [56] reported more than 150 CNVs in 912 ASD families that were not found in 1,488 controls. Critically, 27 of these loci were replicated in an independent cohort of 859 ASD cases and 1,051 controls. Some of the rare variants we identified had previously been associated with autism, including NRXN1 and UBE3A (Guilmatre et al., 2009) [57]. Samaco et al. (2005) [58] previously identified significant deficits in Ube3a expression in Mecp2-deficient mice, suggesting a pathological pathway shared with Rett syndrome (as well as Angelman syndrome and autism). Similarly, Kim et al. (2008) [59] associated NRXN1 with a balanced chromosomal abnormality at chromosome 2p16.3 in two unrelated individuals with ASD; rare variants in the coding region included two missense changes. In related work [52], we identified and reported CNVs in two major gene networks: neuronal cell adhesion molecules (such as NRXN1) and the ubiquitin gene family (such as UBE3A). Interestingly, four of the most prominent genes enriched for CNVs in ASD cases (UBE3A, PARK2, RFWD2 and FBXO40), all of which were uncovered independently, are part of the ubiquitin gene family. Ubiquitination can alter protein function after translation, and can target proteins for degradation by proteasomes. The ubiquitin-proteasome system operates at both pre- and postsynaptic sites, where its functions include regulating neurotransmitter release, recycling synaptic vesicles in presynaptic terminals, and modulating changes in dendritic spines and the postsynaptic density (Yi & Ehlers, 2005) [60]. As well as implicating an ASD-ubiquitination network, we also identified a second pathway involving NRXN1, CNTN4, NLGN1, and ASTN2. Genes in this group mediate neuronal cell adhesion, and contribute to neurodevelopment by facilitating axon guidance, synapse formation and plasticity, and neuron-glial interactions.
We also note that ubiquitins are involved in recycling cell-adhesion molecules, which is a possible mechanism by which these two networks are linked.
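Statements such as "not found in controls" depend on how two CNV calls are judged to be the same event. A common convention, assumed here rather than taken from the cited studies, is a reciprocal overlap threshold: the shared segment must cover at least a given fraction (50% below) of both intervals. The function names are hypothetical, the sketch ignores CNV type (deletion versus duplication), and the coordinates in any usage are invented.

```python
def reciprocal_overlap(a, b):
    """Smallest fraction of either interval covered by their intersection.

    a, b: (chrom, start, end) tuples with half-open coordinates.
    """
    (chrom_a, start_a, end_a), (chrom_b, start_b, end_b) = a, b
    if chrom_a != chrom_b:
        return 0.0
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0
    # Dividing by the longer interval gives the fraction for the worse-covered one.
    return overlap / max(end_a - start_a, end_b - start_b)

def case_specific(case_cnvs, control_cnvs, min_overlap=0.5):
    """Keep case CNVs that match no control CNV at >= min_overlap reciprocal overlap."""
    return [c for c in case_cnvs
            if all(reciprocal_overlap(c, k) < min_overlap for k in control_cnvs)]
```

Raising `min_overlap` makes matching stricter, so more case CNVs survive as "case-specific"; the threshold therefore affects how many candidate loci a study reports.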
In a similar approach, Pinto et al. (2010) [50] further confirmed the importance of rare CNVs as causal factors for ASD. The group did not observe a significant difference between cases and controls in the raw number of CNVs or estimated CNV size. However, the number of CNVs in genic regions was significantly greater in ASD cases. Again, loci enriched for CNVs included a number of genes known to be important for neurodevelopment and synaptic plasticity, such as SHANK2, SYNGAP1, and DLGAP2. Between 5.5% and 5.7% of ASD cases carried at least one de novo CNV, further confirming the significance of de novo genetic events as risk factors for autism. Similar to the Glessner study, the Pinto group mapped CNVs to a series of networks involved in the development and regulation of central nervous system function. Implicated networks include neuronal cell adhesion, GTPase regulation (important for signal transduction and biosynthesis), and GTPase/Ras signaling, which is also involved in ubiquitination. Collectively, these CNV studies suggest that certain genomic hotspots are particularly implicated in ASD, including loci on chromosomes 1q21, 3p26, 15q11-q13, 16p11, and 22q11. These hotspots are part of large gene networks that are important to neural signaling and neurodevelopment, and have also been associated with other neuropsychiatric disorders. In particular, a number of CNV studies in schizophrenia have highlighted structural mutations involving 1q21, 15q13, and 22q11 (e.g. Glessner et al., 2010) [62], which are significantly enriched in cases versus controls, with NRXN1 being a standout in this regard. From a phenotype perspective, autism and schizophrenia seem very different, both in behavioral manifestation and age of onset, and it may seem counter-intuitive that associated loci should overlap. Some authors have addressed this apparent paradox by proposing that schizophrenia and autism may in fact be opposite poles of the same spectrum.
For example, Crespi and Badcock (2008) [63] suggest that social cognition is underdeveloped in ASD and overdeveloped in the psychotic spectrum, with a similar polarization of language and behavioral phenotypes. Although speculative, this hypothesis has gained some traction. In the next several years, genomic, imaging, and model-systems approaches will likely shed further light on the relationship between autism, schizophrenia and other neuropsychiatric disorders.
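The comparison of genic versus non-genic CNVs in the Pinto et al. analysis reduces to an interval-overlap question. A minimal sketch, assuming half-open (start, end) coordinates and a small gene list; the function name is hypothetical, all coordinates are invented, and a genome-wide analysis would use an interval tree or a sorted sweep rather than this linear scan.

```python
def genic_cnvs(cnvs, genes):
    """Split CNV intervals into genic (overlapping >= 1 gene) and non-genic.

    cnvs, genes: iterables of (chrom, start, end) with half-open coordinates.
    Returns (genic, non_genic) lists of CNV tuples.
    """
    genes_by_chrom = {}
    for chrom, start, end in genes:
        genes_by_chrom.setdefault(chrom, []).append((start, end))
    genic, non_genic = [], []
    for cnv in cnvs:
        chrom, start, end = cnv
        # Half-open intervals overlap when each starts before the other ends.
        hit = any(g_start < end and start < g_end
                  for g_start, g_end in genes_by_chrom.get(chrom, []))
        (genic if hit else non_genic).append(cnv)
    return genic, non_genic
```

Counting genic CNVs per individual in cases and controls yields the kind of burden comparison described above.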

Sequencing familial forms of ASD
To this point, we have focused primarily on the complex interactions of polygenic networks as the major cause of ASD. However, this is not exclusively the case. Paralleling the recent spate of CNV studies is a renewed focus on rare disorders. These include familial forms of complex diseases that are potentially monogenic or have less complex inheritance patterns. At the outset of this chapter, we emphasized the overlap with fragile X syndrome, where one third of cases are comorbid for ASD. As mentioned, fragile X is caused by a failure to express the protein coded by FMR1. Mutations in the X-linked genes encoding the neuroligins NLGN3 and NLGN4, and in SHANK3 (a neuroligin binding partner), are other prominent examples of distinct rare genetic causes. A parallel can be drawn between these studies and studies of mental retardation and epilepsy, which include many rare syndromes that collectively account for a substantial proportion of the two disorders (Morrow et al., 2008). Indeed, it is perhaps more than coincidence that autism is heavily comorbid with these two conditions, with ~40% of ASD cases meeting diagnostic criteria for mental retardation and epilepsy respectively (Bölte et al., 2009; Danielsson et al., 2005) [7,65]. It is also noteworthy that many of the genes implicated in these monogenic forms are major players in neurodevelopment and synapse activity. Other prominent examples include TSC1, TSC2 (Osborne et al., 1991; Franz, 1998) [66,67], NF1, and UBE3A (see Morrow et al., 2008) [68].
The identification of monogenic or possibly oligogenic forms of autism is likely to accelerate in the next several years as NGS becomes more widely available. In our group, we recently encountered a family of two parents, six healthy siblings, and two siblings with severe autism, suggestive of autosomal recessive inheritance. Linkage and CNV approaches failed to identify a causal locus, but whole-exome sequencing at 20x coverage identified four genes, including one with a non-synonymous SNP in the protocadherin alpha 4 isoform 1 precursor (PCDHA4) gene, which is a strong candidate currently under validation. Protocadherins belong to the cadherin superfamily, which facilitates neuronal cell adhesion, and this discovery is consistent with the functional properties of the PCDH family.
Known syndromes with ASD features include fragile X, neurofibromatosis type 1, Down syndrome, tuberous sclerosis, neurofibromatosis (which confers a 100-fold increased risk for ASD; Li et al., 2005) [69], Angelman, Prader-Willi and related 15q syndromes, and at least several dozen others (see Zafeiriou et al., 2007, for a comprehensive review) [70]. Table 1 from Volkmar et al. (2005) [71] lists the syndromes most commonly associated with ASD, with median rate and range. It is likely that many more unidentified rare syndromes with Mendelian causes have ASD phenotypes. As of September 2012, the Online Mendelian Inheritance in Man (OMIM) database listed over 7,000 known or suspected Mendelian diseases (MD), with 3,500 (~50%) of these having an identified molecular basis (http://omim.org/statistics/entry). Since OMIM derives its data from the published literature, these figures likely under-represent rare disorders, which may go unreported. As such, there may be several times more Mendelian disorders with no defined genetic etiology to date. Given the large representation of autism phenotypes in known syndromes, we can assume a similar trend in unreported ASD syndromes.
The proportion of ASD accounted for by rare variants remains to be determined. Regardless, as with many other aspects of scientific inquiry, the study of these events will continue to play an important role in explicating the pathogenesis of ASD. El-Fishawy and State (2010) [72] point to hypercholesterolemia and hypertension (Brown, 1974; Lifton et al., 2001) [73,74] as examples where rare mutations have been successful in driving a molecular understanding of the disease, as opposed to identifying risk factors in the general population. Rare mutations, particularly Mendelian ones, carry large effects and are typically located in genic regions. These characteristics make underlying networks considerably simpler to resolve and, moreover, make such mutations amenable to modeling in other systems.
Recent groundbreaking studies by Marchetto et al. (2010) [75] and Muotri et al. (2010) [76], who created a cell culture model of Rett syndrome, are potentially exciting developments in this regard. Here, the researchers used skin biopsies from four Rett syndrome patients, each carrying a different MECP2 mutation, to generate induced pluripotent stem (iPS) cells. Once the iPS cells had been differentiated into neurons, these showed decreased numbers of synapses and dendritic spines, consistent with neurodevelopmental disruption. Intervention with insulin-like growth factor 1 (IGF1), which is known to regulate neurodevelopment, was subsequently shown to reverse Rett-like symptoms in a mouse model of the disease. This innovative approach is an exciting model of how rare-gene approaches can stimulate our understanding of the pathophysiology and potential reversibility of ASD.

Large-scale next-generation sequencing
In April 2012, Nature simultaneously published three papers that used exome sequencing to probe genomic correlates of ASD. This represented something of a landmark for both ASD and NGS research, as it demonstrated the viability of NGS on a large scale: the three studies combined examined 600 trios (parents and offspring), plus a further 935 ASD cases. Collectively, these papers suggest that several hundred or more genes may be considered autism candidates, and again highlight the staggering complexity of the phenotype.
O'Roak et al. (2012) [77] sequenced 677 individual exomes from 209 families, primarily from the Simons Simplex Collection [78]. In 189 new probands, they validated 120 severely disruptive de novo mutations, 39% of which occur in a highly interconnected β-catenin/chromatin remodeling protein network. The group observed a strong paternal bias (41:10) in the parental origin of de novo mutations, which supports the hypothesis that the germline mutation rate in coding regions is markedly higher in males. These de novo events were more common in children of older fathers, suggesting that paternal age is a significant risk factor for ASD.
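The 41:10 paternal bias can be checked against a null of equal parental contribution with an exact binomial test. The review does not report which test O'Roak et al. applied, so this is an illustrative calculation under the assumption that, absent any bias, each de novo mutation is equally likely to arise on the paternal or maternal chromosome. The function name is hypothetical.

```python
from math import comb

def binom_two_sided_half(k, n):
    """Exact two-sided binomial P value for k of n events when p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided P value
    is twice the smaller tail, capped at 1.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    tail = min(k, n - k)
    # Exact integer sum of the lower tail, divided by the 2**n equally likely outcomes.
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)
```

For 41 paternal out of 51 phased mutations, this gives a two-sided P value of roughly 1.5 × 10⁻⁵.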
Among the identified de novo loci, 62 were selected as top candidate mutations based on severity and/or supporting evidence from the literature. Interestingly, probands with these mutations were broadly distributed in terms of IQ score, with only a modest (non-significant) association with intellectual impairment. Recurrent protein-disruptive mutations were identified in two genes: netrin G1 (NTNG1) and chromodomain helicase DNA binding protein 8 (CHD8). NTNG1 is known to play a role in axon guidance and dendritic organization (Nishimura-Akiyoshi et al., 2007) [79]. CHD8 regulates β-catenin and p53 signaling, and has not previously been associated with ASD. This gene was emphasized as particularly noteworthy after follow-up protein-protein interaction (PPI) analyses showed that β-catenin and p53 signaling may be features of an ASD-relevant network. In total, 49 proteins in the PPI network were highly interconnected, and a number of the underlying genes have previously been associated with neurodevelopment.
Neale et al. (2012) [80] exome-sequenced 175 trios and also focused on de novo mutations. As in the O'Roak study, there was a correlation between paternal age and de novo events in offspring (P<0.0001), and also for maternal age (P=0.000365). Across the sample set, the group observed 161 point mutations, of which 101 were missense, 50 silent, and 10 nonsense. Two rare single nucleotide variants at conserved splice sites and six frameshift insertions/deletions (indels) were also observed. Three genes were found to harbor two de novo mutations: BRCA2 (two missense), FAT1 (two missense) and KCNMA1 (one missense, one silent).
The group next performed PPI analyses to determine whether interactions between genes harboring de novo mutations, as well as existing ASD candidates, were of etiological importance. This pathway approach, which additionally incorporated data from the Sanders et al. study (discussed below) [81], found that the distribution of functional de novo mutations is not random. The average distance for non-synonymous variants was significantly larger for controls than for cases (3.78 vs. 3.66; P=.033), suggesting that a proportion of these de novo events contribute to autism. A model in which de novo variants in up to 20% of cases confer a 10- to 20-fold increased risk was supported.
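The network-distance comparison can be sketched as the mean shortest-path length between pairs of mutated genes in an undirected PPI graph, computed by breadth-first search; a lower mean means the genes sit closer together in the network. The function name, the graph, the gene names, and the handling of disconnected pairs (skipped here) are assumptions for illustration; the review does not describe the exact statistic or interaction database Neale et al. used.

```python
from collections import deque
from itertools import combinations

def mean_pairwise_distance(edges, genes):
    """Average shortest-path length between pairs of genes in an undirected graph.

    edges: iterable of (gene_a, gene_b) interactions.
    Pairs with no connecting path are skipped; returns inf if none are connected.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    def bfs(source):
        # Breadth-first search gives hop counts from source to every reachable node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return dist

    total, pairs = 0, 0
    for a, b in combinations(genes, 2):
        d = bfs(a).get(b)
        if d is not None:
            total += d
            pairs += 1
    return total / pairs if pairs else float("inf")
```

Comparing this mean between the gene sets hit by case and by control mutations gives the kind of contrast summarized above.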
In the third of these Nature papers, Sanders et al. (2012) [81] performed exome sequencing on 238 families, including 200 quartets (parents, one affected and one unaffected sibling) from the Simons Simplex Collection [78]. Comparing de novo non-synonymous single nucleotide variants (SNVs) between affected and unaffected siblings, the group observed a significantly (P=.01) higher number among the probands (125 total) than among their unaffected siblings (87 total). From simulations, the authors concluded that two or more de novo nonsense/splice-site mutations in the same gene should be considered significant. The sodium channel, voltage-gated, type II, α subunit gene (SCN2A) was the only such gene, with two individuals with ASD found to harbor relevant nonsense mutations. Mutations in SCN2A have been associated with epilepsy (Kamiya et al., 2004; Ogiwara et al., 2009) [82,83] and idiopathic ASD in multiplex families (Weiss et al., 2003) [84]. Neither of the probands had a history of seizures.
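Deciding when recurrence in one gene is meaningful calls for a null model. The sketch below estimates, by Monte Carlo simulation, how often at least one gene receives two or more of a given number of de novo hits by chance when mutations fall with probability proportional to gene length. Using length as the sole proxy for mutability is a simplifying assumption, and the function name and inputs are hypothetical; Sanders et al. used their own simulation framework, which is not described here.

```python
import random
from collections import Counter
from itertools import accumulate

def prob_recurrent_hit(gene_lengths, n_mutations, n_sims=10000, seed=1):
    """Monte Carlo estimate of P(any gene receives >= 2 of n_mutations hits).

    gene_lengths: dict of gene name -> length (a crude proxy for mutability).
    Mutations are placed independently, with probability proportional to length.
    """
    rng = random.Random(seed)
    genes = list(gene_lengths)
    cum = list(accumulate(gene_lengths[g] for g in genes))
    recurrent = 0
    for _ in range(n_sims):
        hits = Counter(rng.choices(genes, cum_weights=cum, k=n_mutations))
        if hits and max(hits.values()) >= 2:
            recurrent += 1
    return recurrent / n_sims
```

Because the chance of recurrence rises quickly with the number of mutations observed, a significance threshold of this kind has to be recalibrated as cohorts grow.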
It is important to note, however, that for de novo events in general, there was no evidence to support the hypothesis that multiple events in any individual conferred an increased risk of ASD. As such, the 'two de novo hit' hypothesis is not supported.
In a fourth independent exome sequencing study involving 343 families from the Simons Simplex Collection, Iossifov et al. (2012) [85] also reported a relatively equal distribution of de novo mutations in cases and controls. Again, however, loss-of-function mutations (nonsense, splice site, and frameshift) were more common in individuals with ASD (59 versus 28). Of the 59 "likely gene disruptions" (LGDs) in ASD cases, no gene was hit more than once, although two of the genes, NRXN1 and PHF2, had been identified in a previous CNV study by the same group (Gilman et al., 2011) [86]. Intriguingly, the 59 LGD-hit genes showed considerable overlap with a set of 842 genes whose mRNAs are bound by the fragile X protein, FMRP: in total, 14 of the 59 appeared on the FMRP list (P=.006). Furthermore, 13 of 72 CNV candidate genes from the group's previous CNV paper were also on the list (P=.0004), so that 26 of the 129 genes in the combined set were FMRP-related (P<1×10^-13).
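The overlap with the FMRP target list is an enrichment question: given a list of 842 genes, how surprising is it that 14 of 59 disrupted genes fall on it? A common first pass is a one-sided hypergeometric test, sketched below. The genome-wide gene count (`GENE_UNIVERSE = 20_000`) is an assumption, and this naive test treats every gene as equally likely to be hit. FMRP targets may differ systematically from other genes (for example in length), so this sketch should not be expected to reproduce the published P values.

```python
from math import comb

GENE_UNIVERSE = 20_000  # assumed number of genes that could be hit (hypothetical)


def hypergeom_upper_tail(k, n_draws, n_marked, n_total):
    """P(X >= k) for X ~ Hypergeometric(n_total, n_marked, n_draws).

    n_total: genes in the universe; n_marked: genes on the reference list;
    n_draws: genes hit in cases; k: hit genes that are on the list.
    """
    upper = min(n_draws, n_marked)
    hits = sum(
        comb(n_marked, i) * comb(n_total - n_marked, n_draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_total, n_draws)


# 59 LGD-hit genes, 842 FMRP targets, 14 observed on the list (from the text)
expected = 59 * 842 / GENE_UNIVERSE
p_lgd = hypergeom_upper_tail(14, 59, 842, GENE_UNIVERSE)
print(f"expected overlap ~{expected:.1f} genes; naive P(X >= 14) = {p_lgd:.2e}")
```

With these assumptions the expected overlap is only about 2.5 genes, so 14 lies far in the tail. The naive P value comes out much smaller than the published .006, a reminder that the choice of null model, not just the observed counts, drives the reported significance.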
The authors subsequently screened for de novo mutations in upstream targets of FMR1. One was identified: a deletion in GRM5 that removes a single amino acid and causes an additional substitution at the same site. GRM5 encodes the metabotropic glutamate receptor mGluR5 (Bear et al., 2004) [87] and, as noted below, mGluR5 antagonists are currently in clinical trials (Jacquemont et al., 2011) [88], having shown promise in mouse models (Dölen et al., 2007) [89]. Further elucidating the relationship between FMR1/FMRP and these ASD candidates is clearly an important next step in maximizing the impact of these findings. This relationship is discussed further in the section below.
Collectively, all four of these exome sequencing studies converge on the conclusion that ASD is highly heterogeneous, with potentially several hundred or more risk loci. Simulations by Neale et al. indicate that a model involving hundreds of highly penetrant variants is statistically implausible, and instead support a model in which de novo variants in up to 20% of cases confer a ~10- to 20-fold increase in risk. The studies also converge on the conclusion that paternal age (and possibly maternal age) is a significant ASD risk factor, but that the frequency and size of de novo mutations per se are not. Evidence for three candidate genes, CHD8, KATNAL2, and SCN2A, appears quite strong, though further functional studies are needed to help define pathogenesis. Perhaps most exciting is the link, through FMRP, between GRM5 and existing and novel candidates. As we have learned from GWAS, larger sample sets are clearly needed to fully harness the power of NGS in relation to such a complex phenotype. While these studies have been important in proposing novel candidates and confirming existing hypotheses of ASD, we await with anticipation results from the sequencing of all 2,648 families from the Simons Simplex Collection.

Toward a treatment?
Ultimately, the primary goal of genome research should be to propose targets for intervention. As mentioned above, a number of translational studies have begun to probe the metabotropic glutamate receptor mGluR5 as a potential target for fragile X syndrome treatment. These studies have a theoretical basis in the hypothesis that protein-synthesis-dependent functions of metabotropic glutamate receptors are exaggerated in fragile X syndrome (Bear, Huber & Warren, 2004) [87]. In this model, the fragile X protein, FMRP, works in functional opposition to mGluR5 (and mGluR1): where FMRP is absent, mGluR-dependent protein synthesis becomes overactivated, resulting in neurological and behavioral abnormalities. A recent study [96] showed that idiopathic autism cases may also carry a higher burden of variants in the mGluR5 pathway. The group found that in 209 idiopathic cases there was significant enrichment, relative to controls (n=300), for rare functional variants in mGluR5 pathway genes, namely TSC1, TSC2, SHANK3 and HOMER1. It is likely that drugs targeting the mGluR5 pathway, if and when approved for fragile X syndrome, will lead to human clinical trials for ASD. This translational approach, which delineates a direct route from gene discovery through functional validation to treatment, is clearly the blueprint by which genome research can have tangible clinical impact.

Conclusions
ASDs are clearly highly heritable disorders, and advances in gene-finding technology in the past decade have rapidly accelerated gene discovery. As is typically the case, successive developments have made the problem more complex, such that there are now huge numbers of candidate genes, most of which remain to be replicated. In spite of this complexity, we can observe a number of patterns beginning to unfold: 1) the relative scarcity of causal common variants, 2) the growing list of causal rare variants, and 3) the emergence of monogenic disorders with primary and secondary ASD phenotypes.
The monogenic autisms are particularly interesting from a treatment perspective, as they provide a means of studying ASD phenotypes in model systems and are obvious targets for drug intervention. They are also amenable to clinical testing, and the decreasing cost of research technologies means that this capacity is more widely available to clinicians. In fact, as the resolution of clinical instruments increases, it is likely that the clinic will become a primary workplace for syndromic discovery.
A key requirement for driving gene discovery is high-quality phenotype data. ASDs are notoriously heterogeneous and fractionated in terms of symptoms and trajectory. Mandy & Skuse (2008) [97] reviewed seven factor analysis studies of ASD symptoms and found that all but one dissociated social and non-social factors. In a nonclinical sample of 3,000 twin pairs, Happé et al. (2006) [98] examined autistic-like traits and found consistently low correlations (r = 0.1-0.4) between each of the core deficits on the autism spectrum. Endophenotypes (sub-components or sub-processes of the broader phenotype) may provide a productive avenue for disentangling some of this complexity. By filtering out all but a few discrete measures, we can theoretically increase the signal-to-noise ratio in genotype-phenotype associations. [102]. The endophenotype approach is arguably more consistent with rare-variant and monogenic discovery, where a mutated network may not yield a diagnosis of autism per se but may nevertheless cause associated abnormalities. Note that this approach does not discount the pleiotropic effects of genes involved in neurodevelopment; it only serves to make the point that the relevant genotype may be associated with some but not all ASD features.
The converse is also true: a given ASD phenotype can arise from many different genes, and a large number of candidate genes contribute to the majority of known ASD cases. With ~80% of genes expressed in the brain, it is likely that this number will continue to grow, and here again careful phenotyping is critical to identifying functional consequences. Ultimately, the primary goal is not to determine the frequency of variation or mutation in cases versus controls, but to determine the pathways and gene networks that lead to pathology. We will also need to identify other major biological players, such as epigenetic factors, RNA regulatory elements, and environmental exposures, which are critical components of the ASD equation. While daunting, the elucidation of these elements will undoubtedly take us closer to developing effective treatments for ASD. Given the current rate of progress, we have cause for cautious optimism in this regard.

Author details
John