Summary of En2 expression and function from animal studies
Our autism research has focused on the homeobox transcription factor, ENGRAILED 2 (EN2). Prior to the advent of genome wide association and re-sequencing analysis, we selected EN2 as a candidate gene due to neuroanatomical similarities observed between individuals with autism and mouse En2 mutants.
Animal studies have demonstrated that En2 is expressed throughout CNS development and regulates numerous cell biological processes implicated in ASD including connectivity, excitatory/inhibitory (E/I) circuit balance, and neurotransmitter development. The relevance of these functions to ASD etiology is discussed.
Our human genetic analysis determined that two intronic SNPs, rs1861972 and rs1861973, are significantly associated with Autism Spectrum Disorder (ASD). We observed that the common haplotype (rs1861972-rs1861973 A-C) is over-transmitted to affected individuals while the rs1861972-rs1861973 G-T haplotype is over-represented in unaffected siblings. Significant results were observed in 3 datasets (518 families, 2336 individuals, P=.00000035). Six other groups have also reported association of EN2 with ASD, suggesting that EN2 is an ASD susceptibility gene. These results are discussed.
However, if EN2 contributes to ASD risk, we would expect the ASD-associated A-C haplotype to segregate with a polymorphism that is functional and affects either the regulation or activity of EN2. Linkage disequilibrium mapping, re-sequencing and additional association analysis were performed, and identified the A-C haplotype as the best candidate for functional analysis. Luciferase assays conducted in primary mouse neuronal cultures demonstrated that the A-C haplotype functions as a transcriptional activator and specifically binds a protein complex. Transgenic mouse studies have demonstrated that the A-C haplotype is also functional in vivo, increasing gene expression. Finally, human post-mortem studies indicate EN2 levels are also increased in individuals with autism. Thus, the ASD-associated A-C haplotype is functional and increased EN2 levels are consistently correlated with ASD.
Six significant CpG islands also flank human EN2. Preliminary studies indicate hypomethylation of these CpGs can also result in increased EN2 levels, suggesting epigenetic alterations influenced by non-genetic environmental factors can affect EN2 levels. To study how genetic and epigenetic changes may function together to influence EN2 regulation and CNS development, we are creating a chromosomal engineered knock-in that will replace ~75kb of mouse En2 with the human gene.
In summary, EN2 is consistently associated with ASD and functions in developmental pathways implicated in ASD. In addition, we have shown that the ASD-associated haplotype is functional, resulting in increased expression both in neuronal cultures in vitro and in transgenic mice in vivo. Increased levels are also observed in human post-mortem samples. Together, these human genetic data along with our molecular, mouse and post-mortem studies indicate that EN2 is an ASD susceptibility gene.
2. Selection of ENGRAILED 2 as a candidate gene
Before genome-wide strategies were available for identifying common and rare variants for ASD, my laboratory decided to test candidate genes based upon neuroanatomical phenotypes. When we started this work in 2003, two cerebellar neuroanatomical phenotypes were consistently observed in individuals with ASD: a decrease in cerebellar volume (hypoplasia) and fewer Purkinje neurons (Bauman and Kemper 1985; Bauman 1986; Courchesne, Yeung-Courchesne et al. 1988; Courchesne 1997; Amaral, Schumann et al. 2008). We knew of numerous mouse mutants that displayed similar morphological phenotypes, so we decided to test these genes for association in the available Autism Genetic Resource Exchange (AGRE) dataset. A list of nearly 100 genes was compiled that displayed similar cerebellar phenotypes in the mouse and individuals with ASD. The list also included genes that were known at the time to be expressed in the cerebellum in specific spatial-temporal patterns, suggesting they were likely to contribute to development. These genes were then placed on the human genome to determine which ones mapped near polymorphic markers that displayed linkage to ASD.
Many of the genes mapped to possibly interesting locations, so we prioritized our association analysis by the following criteria: i) distance to SSLP marker, ii) LOD score or statistical significance of marker, iii) whether segregation or linkage to the chromosomal region had been replicated in multiple studies, iv) whether the genomic region displayed linkage in the AGRE dataset which would be used for our association analysis, v) whether mouse mutants existed for the gene, and vi) the similarity between reported mouse and ASD cerebellar phenotypes.
Based on these criteria we selected the homeobox transcription factor ENGRAILED 2 (EN2) as a candidate gene. EN2 belongs to a class of transcription factors that are homologous in their DNA binding domain, called the homeobox. Homeobox transcription factors regulate gene expression by binding to AT-rich DNA elements, and play central roles in coordinating development. Many homeobox genes are evolutionarily conserved from Drosophila to humans. The engrailed gene was first identified in classical genetic screens for developmental regulators in Drosophila. Humans and mice have two Engrailed genes, Engrailed 1 (En1) and Engrailed 2 (En2). Both En1 and En2 regulate important aspects of CNS development (see Section 4, ENGRAILED 2 function).
Human EN2 maps to distal chromosome 7 (7q36.3), near markers that display linkage to ASD in several datasets (Liu, Nyholt et al. 2001; Alarcon, Cantor et al. 2002; Auranen, Vanhala et al. 2002). Two of these studies had been performed using AGRE families. In addition, two different En2 mouse mutations existed: a traditional knock-out or deletion of En2, and a transgenic misexpression mutant. In the knockout the cerebellum is reduced in size and cell counts have determined an ~30-40% reduction in all the major cerebellar cell types including Purkinje cells (Millen, Wurst et al. 1994; Kuemerle, Zanjani et al. 1997). In the transgenic, En2 is misexpressed in a subset of Purkinje cells and similar phenotypes were observed (40-50% reduction in cerebellar area; ~40% decrease in the number of adult Purkinje cells)(Baader, Sanlioglu et al. 1998).
Significant association of EN2 with ASD was initially demonstrated by us and has now been reported by 5 additional groups (Brune, Korvatska et al. 2007; Wang, Jia et al. 2008; Yang, Lung et al. 2008; Sen, Singh et al. 2010; Yang, Shu et al. 2010). Prior to summarizing these data, we will first describe the known expression of mouse and human EN2 as well as the cell biological processes regulated by En2 in the developing and adult brain.
3. Engrailed 2 expression during development
Mouse En2 expression has been evaluated primarily by in situ hybridization and lacZ knock-in mice (see Table 1 for summary). In these studies En2 expression is initiated at E8.0 at the junction between the midbrain and hindbrain. En2 continues to be expressed in a majority of mid-hindbrain cells from E8.5 to E12.5. These En2 expressing cells will generate the cerebellum and midbrain colliculi dorsally, as well as parts of the serotonin (raphe nucleus) and norepinephrine (locus coeruleus) neurotransmitter systems ventrally. By E17.5 En2 expression becomes more spatially restricted. In the chick tectum En2 is expressed in a rostral to caudal gradient, while in the cerebellum it is stripe-like. By post-natal day 6 En2 transcripts are restricted to the differentiating cells in the external germinal layer and developing inner granule cell layer of the cerebellum. In the adult En2 continues to be expressed in mature cerebellar granule cells. Finally, QRTPCR studies indicate En2 is also expressed at low levels in adult hippocampus.
|Age||Expression||Function|
|E8.0-E12.5||Mid-hindbrain junction||A-P patterning, colliculi, ventral mid-hindbrain nuclei including LC and RN, periaqueductal|
|Cell cycle and differentiation|
|Adult||Mature granule cells||Unknown|
A limited number of human ENGRAILED 2 expression studies have been performed. One analysis conducted on 18-21 weeks post-conception fetuses demonstrated widespread expression for both ENGRAILED 1 and 2 genes throughout the mid-hindbrain region including the cerebellar cortex and deep nuclei. Expression was also observed in several ventral hindbrain nuclei (inferior olive, arcuate nucleus, caudal raphe nucleus)(Zec, Rowitch et al. 1997). Western blot analysis conducted on cerebellar samples at later gestational ages (40 weeks) indicated abundant expression for both EN proteins (Logan, Hanks et al. 1992). Interestingly, recent microarray analysis performed by The Allen Institute for Brain Science demonstrates abundant expression throughout the cerebellum (cortex and deep nuclei) but also in numerous forebrain and midbrain structures (basal ganglia, amygdala, thalamus)(Figure 1). A complete developmental analysis of human EN2 expression has not been reported. These data suggest human adult brain EN2 expression is more widespread than mouse En2, and in fore- and mid-brain structures relevant to ASD phenotypes.
4. ENGRAILED 2 function
Molecular studies have determined that En2 functions as a transcriptional repressor. The protein regulates numerous cell biological pathways during CNS development but has a well-characterized function in establishing connectivity maps. Emerging data also support En2 function in E/I circuit balance as well as serotonin and norepinephrine neurotransmitter development. All of these cellular processes have been implicated in ASD etiology.
4.1. Transcriptional repressor function of En2
Molecular studies indicate the Engrailed 2 protein primarily functions as a transcriptional repressor, which is mediated by several different protein domains (Figure 2). DNA binding occurs through the homeodomain to a generic AT rich cis-sequence recognized by homeobox transcription factors. Two domains (engrailed homology region 1 (EH1) and EH5) contribute to Engrailed repressor activity. EH1 is located in the N-terminal portion of the protein while the EH5 domain is immediately 3’ of the homeodomain in the C terminal portion of the protein. Both domains bind the co-repressor Groucho, while EH1 is sufficient to confer repression activity when transferred to a transcriptional activator. Engrailed repressor function is mediated by two different mechanisms. The protein can actively block the trans-activation of activators by binding to nearby cis-sequences. Alternatively, the engrailed proteins compete for the binding of the basal transcriptional machinery to TATA box sequences (Ohkuma, Horikoshi et al. 1990; Jaynes and O'Farrell 1991; Tolkunova, Fujioka et al. 1998). Finally, two other domains (EH2 and EH3) bind the Pbx family of homeodomain transcription factors, which affect DNA binding specificity (van Dijk and Murre 1994; Peltenburg and Murre 1997).
4.2. En2 regulates mid-hindbrain patterning
Mouse and chick studies have determined that En2 coordinates multiple cell biological processes throughout development. From E8.0-E12.5, En2 and En1 are spatially overlapping at the mid-hindbrain junction and both genes function to restrict progenitors to a midbrain and hindbrain lineage (Joyner 1996). En2 temporal expression commences a few hours after En1 transcripts are first detected and because of this difference, the En1 knock-out mouse displays a more severe phenotype with a deletion of mid-hindbrain structures (Wurst, Auerbach et al. 1994). Knock-in experiments where En2 is targeted to the En1 locus are sufficient to rescue this phenotype, demonstrating that En2 is functionally redundant to En1 at this early stage of development (Hanks, Wurst et al. 1995).
4.3. Engrailed genes and 5HT and NE neurotransmitter system development
Previous studies have demonstrated that the Engrailed genes are important in the development and maintenance of substantia nigra neurons in the dopamine neurotransmitter system. These data are reviewed elsewhere (Simon, Saueressig et al. 2001; Alberi, Sgado et al. 2004; Simon, Thuret et al. 2004; Gherbassi and Simon 2006; Sgado, Alberi et al. 2006). Instead we focus on the role of the En genes on serotonin (5HT) and norepinephrine (NE) development, since abnormalities in these neurotransmitter systems have been more consistently implicated in ASD.
Mutations in the Engrailed genes affect the development of ventral mid-hindbrain nuclei that synthesize NE and 5HT: the locus coeruleus (LC) and raphe nuclei (RN) respectively. The LC is generated early in development (E9-E10 in the mouse) from the dorsal mid-hindbrain junction. The LC is deleted in the double En1-/- En2-/- knockout mice but appears relatively normal in the single knockouts, suggesting the genes compensate for each other during development. The RN is generated in the ventral mid-hindbrain and expresses 5HT by E11.5. Several transcription factors including Pet1, Lmx1b and Gata3 are important in the generation of RN. Recent analysis indicates that both En genes are expressed in the progenitors of RN at E11.5 and continue to be expressed in post-mitotic rostral 5HT neurons. In addition, an ~50% loss of neurons is observed in the dorsal RN by E16.5 in the double En knockouts. As with the LC phenotype, the RN is relatively normal in the single knockouts, suggesting the genes compensate for each other during development (Simon, Saueressig et al. 2001; Simon, Scholz et al. 2005; Sgado, Alberi et al. 2006; Fox 2010). Neurochemical data from our collaborator, Emanuel DiCicco-Bloom MD, have demonstrated abnormal levels of NE and 5HT in both the fore- and hindbrain structures of the En2 knockout (Lin 2010). These data indicate that the development of the 5HT and NE neurotransmitter systems is regulated by the Engrailed proteins.
Numerous studies have implicated the 5HT and NE pathways in ASD. The 5HT pathway regulates mood, eating, body temperature and arousal, some of which are often perturbed in individuals with ASD. Abnormalities in the 5HT pathway have been consistently observed in individuals with ASD. Blood platelet hyperserotonemia has been reported since the 1960s in ~30% of affected individuals (Ritvo, Yuwiler et al. 1970; Campbell, Friedman et al. 1975; Takahashi, Kanai et al. 1976; Anderson 1987; Anderson, Freedman et al. 1987; McBride, Anderson et al. 1989; Cook, Rowlett et al. 1992; Lam, Aman et al. 2006). However, several studies suggest 5HT functioning is depressed in the CNS of individuals with autism. For example, serotonin reuptake inhibitors (SSRIs) can improve some of the symptoms of ASD (Cook, Rowlett et al. 1992; Gordon, State et al. 1993). In addition, the rate-limiting step of 5HT synthesis is the hydroxylation of tryptophan and acute depletion of tryptophan worsens ASD symptoms (McDougle, Naylor et al. 1996). The NE neurotransmitter system regulates attention, stress, anxiety, and memory, some of which are also affected in individuals with ASD. Unlike the 5HT system, the peripheral and central NE systems are tightly coordinated. Five studies have revealed increases in NE in the blood (Lake, Ziegler et al. 1977; Launay, Bursztejn et al. 1987; Leventhal, Cook et al. 1990; Leboyer, Bouvard et al. 1992; Minderaa, Anderson et al. 1994). However since plasma NE has a very short half-life, it remains possible that this increase is due to arousal at the time of blood drawing.
4.4. En2 regulates connectivity
From E15.5-P0, En2 is expressed in a stripe-like pattern in the cerebellum. En2 is one of many patterning genes that are expressed in this stripe-like pattern at this age (En1, Shh, Pax2 and Wnt7b)(Millen, Hui et al. 1995). Interestingly, these stripe-like expression domains are coincident with the innervation of cerebellar afferents (mossy and climbing fibers), suggesting that these patterning genes regulate the topographic mapping of axons. Consistent with this possibility, En2 mouse mutants display connectivity phenotypes disrupting the innervation of mossy fibers (Herrup and Kuemerle 1997; Baader, Sanlioglu et al. 1998; Baader, Vogel et al. 1999; Sillitoe, Stephen et al. 2008; Sillitoe, Gopal et al. 2009; Sillitoe, Vogel et al. 2010). Thus En2 is important in establishing the cerebellar connectivity map during development.
Several studies indicate the Engrailed proteins are secreted and function as axon guidance proteins for retinal-tectal mapping. Initial EM and protein studies from the Prochiantz group indicated that a subset of the Engrailed proteins are associated with caveolae-like vesicles (Joliot, Trembleau et al. 1997). Subsequent work demonstrated that ~5% of the Engrailed protein is secreted and is internalized by neighboring cells. A protein sequence embedded in the homeodomain called the penetratin domain is responsible for this activity (Joliot, Maizel et al. 1998). In addition, in vitro cultures demonstrated that exogenous En2 acts as a guidance cue for isolated retinal axons transected from the nucleus. Imaging studies indicate En2 is endocytosed by these growth cones. The protein then interacts with the eukaryotic initiation factor 4E (eIF4E), and En2 mutations that prevent eIF4E interaction fail to cause axon turning. En2 also results in the phosphorylation of eIF4E and its binding protein, 4E-BP1, in axons, which is typically associated with translation initiation (Brunet, Weinl et al. 2005). Recent antibody experiments that block exogenous activity cause significant connectivity defects in the tectum (Wizenmann, Brunet et al. 2009). Interestingly, several other developmentally important transcription factors (Pax6, Otx2) also display non-cell autonomous phenotypes (Lesaffre, Joliot et al. 2007; Sugiyama, Di Nardo et al. 2008), suggesting this phenomenon is not specific to the Engrailed genes.
Thus, a small proportion of the Engrailed 2 protein is secreted and is important in regulating connectivity through local translation. The FMR protein, which is mutated in Fragile X Syndrome (FXS), also regulates local synaptic translation. Approximately one-third of individuals with FXS are diagnosed with ASD, suggesting synaptic translation defects could contribute to ASD etiology.
En2 transcripts are also observed at low levels in the adult hippocampus. En2 knock-out studies revealed a decrease in the number of inhibitory GABA interneurons in the CA3 pyramidal layer and stratum lacunosum moleculare of the adult hippocampus. The knock-out mice also display an increase in the susceptibility of kainic acid-induced seizures. These data suggest an imbalance in excitatory/inhibitory (E/I) connectivity, which has been postulated to be a contributing factor to ASD etiology (Tripathi, Sgado et al. 2009).
Post-natally, En2 is expressed in differentiating and mature granule cells. Studies by Emanuel DiCicco-Bloom’s group demonstrated that En2 functions to promote cell cycle exit and differentiation in developing granule cells (Rossman 2008). The function of En2 in mature adult granule cells has not been investigated but it is likely to regulate the expression of genes needed for synaptic plasticity and other mature neuronal functions.
In summary, although EN2 was initially selected as a candidate gene based upon similar cerebellar neuroanatomical phenotypes, En2 coordinates multiple developmental processes. In particular, the protein plays an important role in regulating connectivity and neurotransmitter systems during CNS development, both of which are relevant to ASD etiology.
5. Engrailed 2 genetic analysis
5.1. rs1861972-rs1861973 association in AGRE and NIMH datasets
Human EN2 is encoded by two exons in ~8.5kb. In collaboration with Linda Brzustowicz’s group at Rutgers University, association analysis was initially performed in 167 Autism Genetic Resource Exchange families (AGRE I dataset, 745 individuals). Positive association with ASD was observed for the common alleles of two intronic SNPs, rs1861972 and rs1861973. Significant association was detected under a narrow (autism) and broad (ASD) diagnosis for both SNPs individually and as a haplotype (A-C rs1861972-rs1861973)(Table 2)(Gharani, Benayed et al. 2004). These results were then replicated in two additional datasets (AGRE II: 222 families, 1102 individuals; NIMH: 129 families, 566 individuals)(Table 2). When all three datasets were combined (518 families, 2413 individuals) more significant results were observed (Table 2)(Benayed, Gharani et al. 2005).
Many factors may contribute to the lack of replication in association studies of complex genetic traits. These include inadequate statistical power, the intrinsic complexity of a disease such as unknown gene-gene and gene-environment interactions as well as locus and allelic heterogeneity in different datasets. Given these limitations, replication of rs1861972 and rs1861973 association supports EN2 as an ASD susceptibility gene.
Risk for the haplotype was then determined. Individual relative risk (RR) estimates the risk the haplotype confers to a given individual, and is calculated from the degree to which the haplotype is over-transmitted from heterozygous parents to affected children. Population attributable risk (PAR) estimates the risk of the haplotype to the general population and takes into account both the degree of over-transmission and the frequency of the haplotype. For the 518 families, individual RR was estimated as approximately 1.42 and 1.40 under the narrow and broad diagnosis respectively. Because the frequency of the rs1861972-rs1861973 A-C haplotype is ~67% in the combined sample, this modest individual RR corresponds to a significant PAR of ~39.5% and 38% for the narrow and broad diagnosis of ASD respectively (see Benayed et al 2005 for more details). These data imply that as much as 40% of ASD cases in the population are influenced by the risk allele responsible for rs1861972 and rs1861973 association.
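The way a modest individual RR combines with a high haplotype frequency to give a large PAR can be sketched numerically. The block below is a minimal illustration, assuming a multiplicative model (relative risks of 1, RR and RR squared for zero, one and two copies of the haplotype) and Hardy-Weinberg genotype proportions. The function name `par_multiplicative` is hypothetical, and this is not necessarily the calculation used in Benayed et al 2005.

```python
def par_multiplicative(freq, rr):
    """Population attributable risk under an assumed multiplicative model.

    freq: population frequency of the risk haplotype (0..1)
    rr:   relative risk conferred by each copy of the haplotype
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    if rr <= 0.0:
        raise ValueError("rr must be positive")
    # Average risk in the population relative to non-carriers:
    # (1 - freq)^2 * 1 + 2 * freq * (1 - freq) * rr + freq^2 * rr^2
    mean_relative_risk = ((1.0 - freq) + freq * rr) ** 2
    # Fraction of cases that would not occur if everyone had baseline risk.
    return 1.0 - 1.0 / mean_relative_risk


# Values quoted in the text: haplotype frequency ~0.67, RR ~1.42 and ~1.40.
for label, rr in (("narrow", 1.42), ("broad", 1.40)):
    print(label, round(par_multiplicative(0.67, rr), 3))
```

With the frequency and RR values quoted above, this sketch gives a PAR of roughly 39% (narrow) and 38% (broad), close to the reported estimates.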
|P value||P value||P value||P value|
5.2. Additional EN2 association studies
Prior to our association analysis for EN2, a case-control study was performed using 100 control and affected individuals from Western/central France. Significant association was observed for a PvuII RFLP that we later mapped to ~2.5kb 5’ of the promoter (rs34808376)(Petit, Herault et al. 1995; Benayed, Gharani et al. 2005). Since our association analysis, 5 separate studies have reported positive results for rs1861972 or rs1861973 either individually or as part of a haplotype (Brune, Korvatska et al. 2007; Wang, Jia et al. 2008; Yang, Lung et al. 2008; Sen, Singh et al. 2010; Yang, Shu et al. 2010). These studies were performed in datasets recruited by the authors and represent various ethnicities (Northern/Western European, Chinese, Indian). However differences have also been observed. Additional polymorphisms have been reported to be associated and the allele for rs1861972 and rs1861973 that is over-transmitted to affected individuals can vary. These results are summarized in Table 3. These differences could reflect variations in LD blocks for the different ethnicities. It is also possible that different risk alleles exist in various populations.
|Study||Ethnicity||N||Polymorphism||Associated allele|
|Petit et al||Western/central French||200||PvuII||CG|
|Brune et al||Primarily Western/|
|Wang et al||Chinese||630||rs3824068||A|
|Yang et al (2008)||Chinese||502||rs1861973|
|Yang et al (2010)||Chinese||551||rs1861972|
|Sen et al||Indian||281||rs1861973||C|
In summary, EN2 association with ASD has been reported by 7 different groups. These data are consistent with EN2 being an ASD susceptibility gene. However, if EN2 contributes to ASD risk, then we would expect these genetic associations to be due to the co-inheritance of an allele that affects either the regulation or activity of EN2. The identification of an associated allele that is also functional would provide additional support for EN2 being an ASD susceptibility gene.
5.3. EN2 LD mapping and re-sequencing analysis
The next step in our analysis was to identify candidate common risk alleles by performing linkage disequilibrium (LD) mapping. LD indicates the degree to which alleles in the human population segregate with each other. Two measures for LD are commonly used: D’ and r2. D’ takes into account recombination rate while r2 includes recombination rate and the frequency of the alleles in the population. For common risk alleles responsible for rs1861972-rs1861973 association, we expected candidates to display the following criteria:
Candidates must display strong LD (D’ and r2 >.75) with rs1861972 and rs1861973
Candidates must be consistently associated with ASD
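The two LD measures and the first criterion can be illustrated with a short sketch. It uses the standard textbook definitions of D, D’ and r2 for two biallelic loci; the function names and the example frequencies are hypothetical and are not taken from the AGRE or Hapmap data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def in_strong_ld(p_ab, p_a, p_b, threshold=0.75):
    """Apply the 'D' and r2 > .75' criterion from the text."""
    _, d_prime, r2 = ld_stats(p_ab, p_a, p_b)
    return d_prime > threshold and r2 > threshold


# Hypothetical frequencies: two common alleles that usually travel together.
print(ld_stats(0.67, 0.70, 0.70))
print(in_strong_ld(0.67, 0.70, 0.70))
```

Because r2 also depends on how well the allele frequencies match, a pair of SNPs can have a high D’ and still fall below an r2 threshold, which is why both measures appear in the criterion.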
LD mapping was then performed for 24 additional polymorphisms that were situated throughout the EN2 gene (Figure 3). These polymorphisms were typed in the AGRE I dataset and we found that only the intronic SNPs were in significant LD (D’ >0.72) with rs1861972 and rs1861973. We then re-sequenced the intron from individuals with ASD that had inherited the A-C haplotype from at least one heterozygous parent. This identified only 1 additional polymorphism (rs28999108). rs28999108 has a minor allele frequency of 1%, indicating that additional more common polymorphisms are not likely to be identified and ss38341503 does not fit the criteria of a common risk allele. Association analysis of all intronic SNPs demonstrated that none of them were as consistently or significantly associated as the rs1861972-rs1861973 A-C haplotype (Benayed, Gharani et al. 2005; Benayed, Choi et al. 2009).
However, it was equally possible that rs1861972 and rs1861973 were in strong LD with a polymorphism situated further 5’ or 3’ of EN2 that was not tested for association. If this were the case, we would expect these flanking SNPs to be in strong LD with rs1861972 or rs1861973 and therefore display r2 values similar to the 0.767 that is observed between rs1861972 and rs1861973. To identify other polymorphisms that fit these criteria, publicly available Hapmap data were analyzed. The Hapmap project determined the LD relationship of over 1 x 10^6 SNPs in four human populations (CEU: Utah residents with ancestry from northern and western Europe; JPT: Tokyo, Japan; CHB: Han Chinese, Beijing, China; YRI: Yoruba in Ibadan, Nigeria). r2 and D’ values were first examined for 4 SNPs (rs1861972, rs1861973, rs6460013 and rs1861958) typed in both the Hapmap and ASD datasets. The values were found to be nearly identical, justifying this approach to identify candidate risk alleles. The inter-marker Hapmap r2 values with rs1861973 were then determined in all four Hapmap datasets for SNPs within 2 Mb of EN2 (1 Mb 5’ and 1 Mb 3’). Because 70.3% of the AGRE datasets tested for association were of Northern/Western European descent, the CEU Hapmap data were analyzed first and all SNPs within the 2 Mb region were found to be in weak LD with rs1861973 (r2<.370). Similar results were observed for the other datasets (Benayed, Choi et al. 2009). These data identified the A-C haplotype as the most appropriate common variant to test for functional differences.
It is also possible that rare variants on the A-C haplotype contribute to ASD risk and the genetic association of the haplotype with ASD. Re-sequencing over 100 individuals did not identify any non-synonymous coding polymorphisms (Benayed, Gharani et al. 2005, Rahman and Millonig, unpublished results). For all these reasons, we decided to focus our research on determining whether the ASD associated A-C haplotype was functional. Our molecular and mouse genetic studies are summarized below and demonstrate that the A-C haplotype functions as a transcriptional activator both in vitro and in vivo. These data provide molecular genetic support for EN2 being an ASD susceptibility gene.
6. A-C haplotype functional studies
6.1. In vitro molecular genetic analysis
To investigate potential function of the ASD associated A-C haplotype, luciferase (luc) assays were conducted. The luc reporter system measures quanta of light, which is a sensitive and reproducible methodology for detecting transcriptional changes. The human EN2 intron was cloned 3’ of a basal promoter and luc gene but 5’ of the polyA sequence (Figure 4). The construct also included the EN2 splice acceptor and donor sequences. In this way the intron is transcribed and spliced like the endogenous gene. Constructs were generated for both the A-C and G-T haplotypes and are ~8kb in length. The only sequence difference between the constructs is the rs1861972-rs1861973 haplotype.
Both constructs were transfected into primary cultures of cerebellar granule cells. We chose this cell type to test the function of the A-C haplotype for the following reasons. One, cerebellar granule cells are the most abundant neuronal cell type in the brain and because of their small size they can be isolated to near homogeneity. Two, the cells can undergo various steps of development in culture including proliferation, migration, and differentiation. Three, endogenous En2 is expressed at high levels in cerebellar granule cells.
When we transfected our constructs, the A-C haplotype resulted in significantly higher luc levels compared to the promoter control after 1 day in culture. The G-T haplotype did not display any activity compared to the promoter (Figure 4). Electrophoretic Mobility Shift Assays (EMSAs) were then performed to detect DNA-protein interactions. Granule cell nuclear extract was employed along with a 200bp fragment encompassing either the A-C or G-T haplotypes. A protein complex binds significantly better to the A-C than the G-T haplotype (data not shown). These data demonstrate that the A-C haplotype functions as a transcriptional activator in vitro. The A-C haplotype is one of two ASD associated alleles for which function has been ascribed.
6.2. In vivo transgenic analysis
Because ASD is a neurodevelopmental disorder, we then generated transgenic mice to determine the developmental cell types and ages in which the A-C haplotype is functional. Our constructs include ~10kb of 5’ evolutionarily conserved sequence, the intron, and ~10kb of 3’ evolutionarily conserved sequence. Exon 1 of EN2 was replaced with the Ds-Red fluorescent reporter and exon 2 with the polyadenylation sequence. Like our luc constructs, the intron also includes EN2 splice acceptor and donor sequences so the intron is transcribed and spliced as in the endogenous locus. Transgenes for both the A-C and G-T haplotypes were generated with the only nucleotide difference between the ~25kb transgenes being the rs1861972-rs1861973 haplotype (Figure 5).
We have begun our analysis by examining the expression of the transgenes in the adult cerebellum because En2 is expressed specifically in granule cells. Thus we might expect to observe a similar difference in expression as observed for our in vitro luc analysis. Taqman QRTPCR was performed for Ds-Red and Gapdh on the adult cerebellar RNA isolated from A-C and G-T lines with similar copy numbers. These assays were performed in quadruplicate on three A-C and three G-T littermates. The A-C haplotype results in an ~250% increase in normalized Ds-Red levels compared to the G-T haplotype in the adult cerebellum (Figure 5). These results demonstrate the A-C haplotype functions as a potent activator in vivo. Together with the luc data, they show that the ASD-associated A-C haplotype functions as a transcriptional activator both in vitro and in vivo, providing molecular genetic evidence that EN2 contributes to ASD risk.
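As an illustration of what "normalized Ds-Red levels" can mean in a Taqman experiment, the sketch below applies the widely used comparative Ct (2^-ΔΔCt) calculation, with Gapdh as the reference gene. The chapter does not state which normalization was used, so this is an assumption; the function names and all Ct values are hypothetical, and the calculation presumes near-100% amplification efficiency for both assays.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(test_target, test_ref, ctrl_target, ctrl_ref):
    """Relative expression of test versus control by the comparative Ct method."""
    ddct = delta_ct(test_target, test_ref) - delta_ct(ctrl_target, ctrl_ref)
    return 2 ** (-ddct)


def percent_increase(fold):
    """Convert a fold change into a percent increase over the control."""
    return (fold - 1.0) * 100.0


# Hypothetical quadruplicate Ct values for one A-C and one G-T animal.
ac_dsred = [24.1, 23.9, 24.0, 24.0]
ac_gapdh = [18.0, 18.1, 17.9, 18.0]
gt_dsred = [25.8, 25.9, 25.7, 25.8]
gt_gapdh = [18.0, 18.0, 18.1, 17.9]

fold = fold_change(ac_dsred, ac_gapdh, gt_dsred, gt_gapdh)
print(f"fold change: {fold:.2f}, percent increase: {percent_increase(fold):.0f}%")
```

Note that a 250% increase corresponds to a 3.5-fold difference, not a 2.5-fold one, since the percent increase is measured over the control level.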
We are now examining levels and spatial expression at additional time points (E12.5, E17.5, P6 and adult) relevant to various described functions of En2 (see Table 1). These studies will determine when, where, and how the A-C haplotype is functional during CNS development, providing the first in vivo functional analysis of any common associated allele with ASD.
6.3. Post-mortem and epigenetic analysis
To investigate whether EN2 levels are also increased in individuals with ASD, post-mortem analysis has been performed. 78 age and sex matched cerebellar samples have been obtained from NICHD Brain and Tissue Bank for Developmental Disorders or Harvard Brain Tissue Resource Center via Autism Tissue Program (49 control, 29 affected). These samples have been genotyped for rs1861972 and rs1861973, and Taqman QRTPCR has been performed for EN2 and GAPDH. Normalized EN2 mRNA levels display a significant increase in affected compared to controls (Figure 6). Further examination of these data suggests that the increase is due to both the rs1861972-rs1861973 genotype and affection status. A more detailed statistical analysis is ongoing but these results are consistent with EN2 levels being increased in individuals with ASD. Together our in vitro, in vivo, and post-mortem studies have demonstrated that increased amounts of EN2 are consistently associated with ASD, suggesting that elevated levels of the protein alter CNS development to increase risk for ASD.
The previous data indicate that the A-C haplotype results in increased gene expression. However, increased EN2 levels could also be achieved by epigenetic mechanisms. Environmental factors can affect gene regulation through epigenetic modifications such as differential methylation. Epigenetics likely plays an important role in ASD for the following reasons. First, epigenetics provides an interface between environmental factors and genetic susceptibility. Numerous common environmental factors (e.g. bis-phenol, arsenic, certain antibiotics) affect CpG island methylation and gene expression (Villar-Garea and Esteller 2003). Thus differential environmental exposures could cause variations in epigenetic modifications and gene expression. This model provides a possible explanation for the phenotypic variability observed in ASD and other polygenic disorders (Bjornsson, Fallin et al. 2004; Feinberg 2007). Second, the methyl-CpG binding proteins, MeCP2 and MBD2, are mutated in Rett Syndrome and ASD, pointing to the importance of epigenetic regulation in ASD (Amir, Van den Veyver et al. 1999; Li, Yamagata et al. 2005; Coutinho, Oliveira et al. 2007; Loat, Curran et al. 2008).
CpG dinucleotides are clustered in regions called CpG islands that are regulated by epigenetic mechanisms. CpG dinucleotides are the substrates for cytosine methyl transferases, and DNA methylation often leads to decreased expression. Six CpG islands flank human EN2 with 3 in the gene. Interestingly, a vast majority of these CpG islands are not observed in mouse or rat, indicating they have evolved since rodent radiation to possibly regulate EN2 expression. To investigate whether EN2 is epigenetically regulated, we treated two human neuronal cell lines that express EN2 (Daoy, SH-SY5Y) with either the methylation inhibitor, 5-aza-2'-deoxycytidine (AZA), or the methyl group donor, S-adenosylmethionine (SAM). Preliminary bisulfite sequencing demonstrated methylation of CpGs with SAM treatment, while the same dinucleotides are unmethylated in AZA-treated cells (Figure 7). Importantly, this difference in methylation is correlated with EN2 mRNA levels: AZA treatment results in increased expression, and SAM treatment results in decreased expression (Figure 7). Thus, these data are consistent with EN2 being epigenetically regulated.
We have bisulfite sequenced the promoter in a few post-mortem samples. In affected individuals none of the CpG dinucleotides are methylated, while in unaffected individuals the same CpGs are methylated. These CpGs are the same dinucleotides that are methylated after SAM treatment in vitro (Figure 7). In sum, these results are consistent with epigenetic differences contributing to the increase in EN2 mRNA levels observed in the post-mortem samples. High-throughput epigenetic platform analysis is ongoing to investigate this hypothesis further.
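The logic of reading methylation from bisulfite sequence can be sketched briefly. Bisulfite treatment converts unmethylated cytosines to uracil, which is read as T after amplification, while methylated cytosines are protected and still read as C. The sketch below assumes top-strand reads aligned to the reference without gaps and complete conversion. The sequences and function names are hypothetical toy examples, not the EN2 promoter or the pipeline used for Figure 7.

```python
def call_cpg_methylation(reference, converted_read):
    """Call methylation at each CpG of `reference` from a bisulfite read.

    Returns a dict mapping the 0-based position of each CpG cytosine to
    True (read as C, methylated) or False (read as T, unmethylated).
    Positions showing any other base are skipped as uninformative.
    """
    if len(reference) != len(converted_read):
        raise ValueError("reference and read must be the same length")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            base = converted_read[i]
            if base == "C":
                calls[i] = True
            elif base == "T":
                calls[i] = False
    return calls

def fraction_methylated(calls):
    """Fraction of informative CpGs called methylated (None if there are none)."""
    return sum(calls.values()) / len(calls) if calls else None

# Toy reference with CpGs at positions 1, 5 and 9 and a non-CpG C at position 8.
reference = "ACGTTCGACCGA"
methylated_read = "ACGTTCGATCGA"    # CpG cytosines retained, non-CpG C converted
unmethylated_read = "ATGTTTGATTGA"  # every cytosine converted to T

print(call_cpg_methylation(reference, methylated_read))
print(fraction_methylated(call_cpg_methylation(reference, unmethylated_read)))
```

Comparing the same CpG positions across samples in this way is what allows the dinucleotides methylated after SAM treatment to be matched with those methylated in unaffected post-mortem samples.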
7. Future studies
One important next step is to identify the downstream molecular and cell biological effects of increased EN2 expression. For this analysis we are generating a humanized EN2 knock-in mouse in which ~75kb of mouse En2 is replaced with the human sequence. This sequence will also contain the flanking CpG islands. To accomplish this goal we are using a strategy called Recombinase-Mediated Genomic Replacement (RMGR) developed by Andrew Smith PhD (Figure 8) (Wallace, Marques-Kranc et al. 2007). In this way we will be able to determine the molecular and cell biological effects of the A-C haplotype throughout development. Because the human sequence will include the flanking CpG islands, we will also be able to expose the mice to various non-genetic factors that affect epigenetic regulation and investigate how these environmental compounds can either improve or worsen the A-C associated phenotypes.
We have demonstrated that the EN2 rs1861972-rs1861973 A-C haplotype is significantly associated with ASD in three datasets. Six additional groups have reported EN2 ASD association, suggesting it is an ASD susceptibility gene. If this is correct, then we would expect the associated alleles to segregate with common or rare variants that functionally alter EN2 expression or activity. To address this question, we used a combinatorial approach that included human genetics, molecular biology, mouse transgenesis, and human post-mortem analysis. In the three datasets that we studied, LD mapping, re-sequencing, and additional association studies identified the A-C haplotype as the best candidate to test for function. In vitro luc assays demonstrated that the A-C haplotype functions as a transcriptional activator, resulting in elevated expression levels. Importantly, transgenic mice have recapitulated these results in vivo and will allow us to determine when, where, and how the A-C haplotype is functional throughout CNS development. EN2 levels are also increased in individuals with ASD. Thus elevated amounts of EN2 seem to be correlated with increased ASD risk. Our preliminary studies indicate that EN2 is also epigenetically regulated, suggesting that exposure to environmental non-genetic factors may also increase EN2 expression. Future experiments are directed at identifying downstream molecular and cell biological pathways affected by increased EN2 levels. Finally, En2 regulates developmental processes implicated in ASD, including the establishment of connectivity maps. In sum, our combinatorial approach has provided evidence that EN2 is an ASD susceptibility gene.
We thank the NICHD Brain and Tissue Bank for Developmental Disorders and the Harvard Brain Tissue Resource Center for the post-mortem samples, and all participating families. We thank the Autism Tissue Program and especially Jane Pickett for all their help. We acknowledge the funding agencies that have supported this research: NIH (MH076624, MH080429, MH083509), Department of Defense (W81XWH-09-1-0286), NAAR/Autism Speaks, and New Jersey Governor’s Council for Medical Research and Treatment of Autism.