Brain development is a complex process orchestrated by diverse molecular and cellular events, any perturbation of which can cause disease. Indeed, neurogenesis and neurodevelopment involve multiple neuronal cell fate decisions driven by complex gene regulatory programs, and characterizing these programs is one of the current challenges in neurobiology. In this chapter, we provide an overview of the genomic strategies used to explore the spatiotemporally defined gene regulatory circuits implicated in brain development. Finally, we discuss the contribution of these approaches to understanding the multifactorial events implicated in neurodevelopment and the future requirements for further expanding our understanding of the brain.
- gene regulatory networks
- cell fate
- systems biology
- functional genomics
Since the release of the first draft of the human genome and the development of massively parallel DNA sequencing strategies, our understanding of the genetic basis of a variety of human illnesses, including neurological diseases, has expanded rapidly. Indeed, around 50% of the known Mendelian disorders have already been matched with their underlying genes, and this gap is expected to narrow further, notably through improvements in the analysis of non-coding genomic regions. That said, progress in identifying the genetic basis of diseases with complex phenotypes has been more modest, probably because of their multigenic etiology. Exome sequencing of family pedigrees proved a straightforward way to detect new mutations in previously unknown genes for Mendelian disorders, but when applied to neurological illnesses with complex phenotypes it yields, at most, a list of common variants. As a consequence, additional functional genomic readouts, including transcriptomes, transcription factor profiling, and epigenetic landscaping, are required to narrow down the observed mutations and to reconstruct the complex relationships among the various genes implicated in disease onset.
In this context, this chapter focuses on the use of such additional readouts to complement previous exome sequencing efforts (for a review of exome sequencing applied to neurological diseases, see ) and provides an overview of the integrative computational strategies in use. Importantly, we discuss the concept of gene networks as a way to describe the inter-relationships among the genes implicated in disease, illustrated by the major efforts of recent years in the field of neurodevelopment and related diseases (Figure 1). Finally, we discuss new technological approaches that enhance our capacity to interrogate human nervous tissue, which, unlike other tissues, was until recently restricted to samples collected postmortem.
2. Interrogating neurodevelopmental events by functional genomics
The evolution of genomic analyses, notably following the sequencing of the human genome, made it possible to study neurodevelopment from a different perspective, i.e., by interrogating the role of the genetic context during neurodevelopment. Whereas the involvement of genes in this process was previously studied one gene at a time using in-situ hybridization and RT-PCR, DNA microarray and RNA-sequencing technologies provided a global perspective, as witnessed by the various studies of the brain transcriptome, either from the whole organ or from particular regions, across developmental stages. Among them, the work of Kang et al., who generated transcriptomes from 16 regions of 57 postmortem human brains spanning embryonic development through adulthood, is one of the earliest and most comprehensive studies. Beyond the large amount of data, they provided a view of spatiotemporal transcriptome regulation, enhanced by gene co-expression networks recapitulating different stages of development. Importantly, this study highlighted that most spatiotemporal differences arise before birth, with a shift in neocortical gene expression patterns around birth. In the fetal brain, genes involved in cell proliferation, cell migration, and neuronal differentiation are predominantly expressed, whereas during the late fetal period and infancy, genes related to dendrite and synapse development are found.
Colantuoni et al., who focused on the temporal dynamics of the prefrontal cortex transcriptome in a large number of human brain samples, showed that the expression changes of genes during prenatal development are reversed during postnatal life, with new genes recruited in the early developing brain. Similarly, spatial gene expression patterns in the brain were shown to be determined by embryonic origin and to change during development. Pletikos et al. defined three phases of neocortical development: the prenatal phase, with the highest differential gene expression between areas; the preadolescent phase, with increasing synchronization of areal transcriptomes; and adolescence, when differential expression among areas reappears. Spatial transcriptome analyses thus provided evidence of structure-specific gene regulation in the human brain. In particular, differences in gene expression profiles were demonstrated between brain substructures or sites, together with region-specific genes [9, 10, 11]. Hawrylycz et al. combined histological analysis with microarrays in about 900 neuroanatomical subdivisions from two human brains and observed that the spatial topography of the neocortex is reflected in its transcriptomic topography, with neighboring cortical regions showing similar gene expression. In contrast, gene expression was symmetric between the two hemispheres during development [8, 9, 11]. Gene expression also varies between neocortical layers. The neocortex consists of six horizontal layers containing distinct subsets of neurons, and transcriptional analysis of the layers of the prefrontal cortex revealed human-specific laminar gene expression patterns. Miller et al. demonstrated differential gene expression between proliferative and postmitotic layers in the mid-gestation human fetal brain, together with a frontotemporal molecular gradient in cortical layers.
These observations support the existence of gene expression gradients along the anteroposterior axis of the neocortex.
While informative, transcriptome analyses of the whole brain or of specific regions pool many cells and may therefore mix heterogeneous cell populations. Single-cell transcriptomics offers a relevant alternative for gathering information about individual cell types. Single-cell whole-transcriptome analysis revealed cellular heterogeneity in the brain and identified neuronal subtypes, with differential gene expression between fetal and adult neurons. Single-nucleus transcriptomics of the adult cerebral cortex was used to characterize the diversity of neuronal subtypes across neuroanatomical areas. Habib et al. combined single-nucleus RNA-Seq with pulse labeling of proliferating cells using the thymidine analog 5-ethynyl-2′-deoxyuridine (EdU) to identify hippocampal cell types and to track the transcriptional trajectories of single proliferating cells in the adult hippocampal neurogenic niche. Similarly, a recent single-cell RNA-Seq study of the human fetal cortex and medial ganglionic eminence during prenatal neurogenesis demonstrated lineage-specific trajectories dependent on transcriptional regulatory programs. This study also showed that modest transcriptional differences among cortical radial glia cascade into robust typological differences among neurons. In the same context, Lake et al. combined single-cell sequencing with epigenome readouts in adult human brain cells to reveal chromatin and transcription factor regulatory events within distinct cell types. Recently, Fan et al. performed spatially resolved single-cell transcriptome analysis of the mid-gestation human embryonic brain and observed heterogeneity within each cortical region, with no synchronization of cortical development and maturation.
Studying transcriptional behavior during brain development is expected to enhance our understanding of pathological situations. Autism spectrum disorder (ASD), a heterogeneous condition with a prevalence of 1 in 59 children, is one example. ASD is characterized by social impairments, disrupted communication skills, and repetitive behaviors. Numerous genes have been implicated in ASD, and analyses of their co-expression and/or regulatory networks are providing new insights into the pathways affected in this disorder. Several studies have sought transcriptome alterations in ASD using either DNA microarray hybridization assays or high-throughput sequencing. Comparisons of autistic and control brain samples highlighted upregulated genes implicated in immune function and downregulated genes involved in neurodevelopment or synaptogenesis [22, 23, 24]. Another study described dysregulation of mitochondrial oxidative phosphorylation and protein translation pathways without detectable changes in DNA methylation. Consistent with this observation, downregulation of genes involved in mitochondrial and synaptic function was also reported in an analysis of multiple previously published RNA-Seq and microarray datasets. Interestingly, dysfunction of synaptic pathways was also described in another neurodevelopmental disorder, schizophrenia [27, 28, 29, 30]. This disorder, which affects approximately 1% of the population, is characterized by personality disturbances, hallucinations, delusions, and/or disorganized behavior. High-throughput transcriptomic analyses revealed multiple deregulated genes in schizophrenia [29, 30, 31, 32], several of which are implicated in neurodevelopmental pathways, neuronal communication, energy metabolism, and synaptic function [29, 30, 32].
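The up- and downregulated genes reported in these case-control comparisons come from differential expression analysis. The sketch below shows only the simplest part of that analysis: labeling genes by their log2 fold change between two groups. It assumes pre-normalized expression values, a pseudocount of 1, and a |log2FC| cutoff of 1. The gene names and values are hypothetical. Published studies additionally apply statistical tests and multiple-testing correction (e.g., via dedicated packages), which are omitted here.

```python
import math
from statistics import mean

# Hypothetical normalized expression values (three samples per group); gene names are placeholders.
case = {"GENE_A": [30, 34, 32], "GENE_B": [3, 5, 4], "GENE_C": [10, 12, 11]}
control = {"GENE_A": [7, 8, 9], "GENE_B": [19, 21, 20], "GENE_C": [10, 11, 12]}


def log2_fold_change(case_values, control_values, pseudocount=1.0):
    """log2 ratio of mean case expression to mean control expression.

    The pseudocount avoids division by zero and log(0) for unexpressed genes.
    """
    return math.log2((mean(case_values) + pseudocount) / (mean(control_values) + pseudocount))


def classify_genes(case_expr, control_expr, threshold=1.0):
    """Label each gene 'up', 'down' or 'unchanged' using a |log2FC| threshold."""
    labels = {}
    for gene in case_expr:
        lfc = log2_fold_change(case_expr[gene], control_expr[gene])
        if lfc >= threshold:
            labels[gene] = "up"
        elif lfc <= -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


print(classify_genes(case, control))
# {'GENE_A': 'up', 'GENE_B': 'down', 'GENE_C': 'unchanged'}
```

The pseudocount is a pragmatic choice. It damps extreme ratios for lowly expressed genes, at the cost of shrinking their fold changes.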
Changes in DNA methylation related to the prenatal-to-postnatal transition were also reported when comparing postmortem brain samples from individuals with schizophrenia and unaffected controls, arguing for an involvement of epigenetic regulation in the development of the disease [33, 34, 35].
In addition to changes in gene expression, alternative RNA splicing has been described to occur at high frequency in human brain samples, affecting more than one-third of the human brain transcriptome [9, 36]. Beyond the reported changes in protein-coding gene expression, non-coding microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) were shown to play a role in neurodevelopment, contributing to brain complexity. Indeed, Ziats et al. described differential miRNA expression across regions of the human brain over the course of development, with a major shift occurring after birth. Likewise, changes in the lncRNA transcriptome during brain development were described, occurring preferentially during fetal development and showing spatial regulation. LncRNAs also play a role in neuronal differentiation and neurogenesis, as suggested by studies highlighting differential expression of lncRNAs during differentiation of human pluripotent stem cells [41, 42]. One example is the lncRNA rhabdomyosarcoma 2-associated transcript (RMST), which, through its interaction with SOX2, regulates downstream genes implicated in neurogenesis. Dysregulation of miRNA or lncRNA expression was also observed in autism [44, 45, 46], schizophrenia, and intellectual disability. In the latter case, lncRNAs were shown to be implicated in synaptic transmission, neurogenesis, or neurodevelopment.
Across these transcriptome studies, a variety of databases hosting microarray and/or RNA-Seq data have become available (for a comprehensive review, see ). These include the HB Atlas [4, 9], the BrainSpan Consortium, Brain Cloud, the Allen Brain Map portal, the cortex single-cell datasets, and the Single Cell Portal. In addition, several consortia, some covering topics beyond brain tissue, underlie the establishment of major databases. Among others, the “Genotype-Tissue Expression (GTEx)” project gathers gene expression data from different tissues of more than 600 donors. Similarly, the “Encyclopedia of DNA Elements (ENCODE)” gathers large-scale datasets from various projects and combines multi-omics data from different species and from a variety of cell lines and tissues at different stages of development. A more specialized counterpart of ENCODE, the “Psychiatric Encyclopedia of DNA Elements (PsychENCODE),” collects datasets on epigenetic modifications and non-coding RNA in healthy and disease-affected human brains. For data from brain samples, Huisman et al. developed the web portal “Brainscope,” which provides interactive visualization of Allen Brain Atlas transcriptomes of the adult brain and across different stages of development. Recently, a method combining Allen Brain Atlas microarray data with in-vivo positron emission tomography (PET) data was developed to predict mRNA expression across the whole brain. Overall, these databases represent major efforts for the research community. They provide centralized access to large collections of data and thereby enable further data integration, for instance the reconstruction of gene regulatory networks from previously generated transcriptomes.
3. Inferring molecular coregulatory events from the integration of collected functional genomic readouts
The development of mid- and high-throughput strategies for analyzing genome sequences, their variants, gene expression, and even proteome composition has given the scientific community the means to interrogate each of these layers of complexity in a variety of model systems and tissues. It has also made it possible to integrate these layers into a regulatory view. As illustrated in the previous section, several studies have generated major functional genomic readouts of brain development in normal and disease settings.
While comprehensive, these studies in most cases provide lists of relevant players (gene variants, differentially expressed genes, etc.) based on statistical descriptors but do not address their potential relationships. Yet, from a biological point of view, each player in the system under study is expected to influence the behavior of others, directly or indirectly. The current challenge is therefore to move toward an integrative view that studies the various “deregulated events” as interconnected entities by incorporating multiple types of readouts, supported by computational solutions.
From a historical perspective, the article by Walsh et al., published in Science in 2008, was one of the first major studies aiming to identify neurodevelopmental pathways involved in a disease such as schizophrenia. The authors hypothesized that individually rare structural variants collectively account for neurological and neurodevelopmental syndromes. In the specific case of schizophrenia, they demonstrated an at least 3-fold difference between controls and individuals with schizophrenia in the frequency of rare structural variants within coding regions. Furthermore, they focused on structural mutations that disrupt genes and evaluated the functions of these genes with computational tools testing for enrichment in one or more functionally defined pathways (PANTHER and Ingenuity Pathway Analysis). This strategy establishes relationships among genes on the basis of their annotation to a given program (or pathway), although in this case such relationships are inferred in silico.
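Pathway enrichment of this kind is commonly framed as an over-representation test: are more of the disrupted genes annotated to a pathway than expected by chance? The sketch below uses a one-sided hypergeometric test, a standard choice for this question. It is not the exact statistic implemented by PANTHER or Ingenuity Pathway Analysis. The gene universe, pathway, and hit list are hypothetical, and a real analysis would also correct for testing many pathways.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N, K of which are in the pathway."""
    top = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return tail / comb(N, n)


def pathway_enrichment(hit_genes, pathway_genes, universe):
    """Return (overlap, one-sided p-value) for over-representation of a pathway among hits."""
    universe = set(universe)
    hits = set(hit_genes) & universe
    pathway = set(pathway_genes) & universe
    overlap = len(hits & pathway)
    return overlap, hypergeom_sf(overlap, len(universe), len(pathway), len(hits))


# Hypothetical example: 100 assayed genes, a 10-gene pathway, 10 genes disrupted by variants.
universe = [f"G{i}" for i in range(100)]
pathway = [f"G{i}" for i in range(10)]
hits = [f"G{i}" for i in range(5)] + [f"G{i}" for i in range(50, 55)]
overlap, p_value = pathway_enrichment(hits, pathway, universe)
print(overlap, p_value)
```

Restricting both the hits and the pathway to the assayed universe matters. Genes that could never have been detected should not inflate or deflate the expected overlap.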
Since then, further studies have incorporated other types of data, such as RNA-Seq transcriptomes, to identify genes differentially expressed between controls and individuals with schizophrenia, which are then associated with biological functions by Gene Ontology analysis [55, 56, 57]. Computational solutions for enhancing data integration have also been developed. One example is NETBAG, which integrates multiple types of genetic variation, including single nucleotide variants (SNVs), rare copy number variants (CNVs), and genome-wide association studies (GWAS), to identify highly connected gene clusters that potentially share functional roles. NETBAG was initially described in the context of de novo CNVs in autism and schizophrenia.
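The notion of a "highly connected gene cluster" can be made concrete with a permutation test. It asks how often a random gene set of the same size is at least as densely connected in a gene network as the candidate cluster. The sketch below is a generic illustration on a hypothetical unweighted network. NETBAG itself uses its own likelihood-based network and cluster scoring, which this code does not reproduce. The `n_perm` and `seed` parameters are illustrative assumptions.

```python
import random
from itertools import combinations


def internal_edges(genes, edges):
    """Count network edges whose two endpoints both belong to the gene set."""
    members = set(genes)
    return sum(1 for a, b in edges if a in members and b in members)


def cluster_connectivity_pvalue(cluster, edges, all_genes, n_perm=1000, seed=0):
    """Empirical p-value: how often a random gene set of equal size is at least as connected."""
    rng = random.Random(seed)
    observed = internal_edges(cluster, edges)
    pool = sorted(all_genes)
    as_extreme = sum(
        internal_edges(rng.sample(pool, len(cluster)), edges) >= observed
        for _ in range(n_perm)
    )
    # The +1 terms keep the empirical p-value strictly above zero.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical network: a 4-gene clique (the candidate cluster) plus a sparse chain of other genes.
genes = [f"G{i}" for i in range(20)]
cluster = genes[:4]
edges = list(combinations(cluster, 2)) + [(genes[i], genes[i + 1]) for i in range(4, 19)]
observed, p_value = cluster_connectivity_pvalue(cluster, edges, genes)
print(observed, p_value)
```

Sampling random gene sets uniformly ignores degree differences between genes. Real methods often match random sets on node degree to avoid favoring clusters made of hub genes.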
Beyond correlating changes in gene expression with genetic variants, further efforts are required to structure this information, for example through gene co-expression strategies. This approach groups genes according to their expression levels, under the hypothesis that co-expression results from a common regulatory input, e.g., the action of transcription factors. The analysis can be represented as a network in which two genes are connected when they show a significant co-expression relationship. Voineagu and colleagues applied this strategy to resolve consistent transcriptomic differences between autistic and control brain samples. Specifically, they analyzed gene expression across cortical regions, revealing differences suggestive of cortical abnormalities in autism. In addition, they identified discrete modules of co-expressed genes, clearly demonstrating the advantages of this strategy for enhancing analytical resolution. Since then, various studies have combined gene co-expression analysis with genome-wide association data (GWAS) [60, 61] or with samples from multiple human brain regions and developmental stages to identify specific biological processes and brain regions associated with autism [62, 63].
While gene co-expression networks are expected to result from the action of defined master transcription factors (TFs), the identity of these factors remains unknown in this type of analysis. Chromatin immunoprecipitation (ChIP) combined with massively parallel sequencing (ChIP-Seq) makes it possible to map the genomic locations bound by a given TF. Furthermore, based on the proximity of binding sites to annotated coding regions, it is possible to infer the TF's regulatory activity on nearby genes. Following this strategy, factors such as TBR1 or Auts2 [65, 66], initially identified in rare genetic variant studies, were profiled by ChIP-Seq to reveal their direct targets. In both cases, their binding sites were located in genomic regions adjacent to ASD-related genes. A similar strategy was applied to map the target genes of the chromatin modifier CHD8 (chromodomain helicase DNA binding protein 8), previously found to be mutated in rare genetic variant studies.
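The proximity-based inference described above can be sketched as assigning each ChIP-Seq peak to the gene whose transcription start site (TSS) is closest on the same chromosome, within a maximum distance. The 50 kb window, the use of the peak center, and all coordinates and gene names below are hypothetical assumptions. Strand and more elaborate regulatory-domain models (as used by tools such as GREAT) are ignored.

```python
from bisect import bisect_left


def assign_peaks(peaks, tss_by_chrom, max_distance=50_000):
    """Assign each peak to the gene with the nearest TSS on the same chromosome.

    peaks: list of (chrom, start, end); tss_by_chrom: {chrom: [(tss_position, gene), ...]}
    sorted by position. Peaks with no TSS within max_distance are assigned None.
    """
    assignments = []
    for chrom, start, end in peaks:
        center = (start + end) // 2
        tss = tss_by_chrom.get(chrom, [])
        positions = [pos for pos, _ in tss]
        i = bisect_left(positions, center)
        best = None
        for j in (i - 1, i):  # the nearest TSS is one of the two neighbours of the insertion point
            if 0 <= j < len(tss):
                distance = abs(tss[j][0] - center)
                if distance <= max_distance and (best is None or distance < best[1]):
                    best = (tss[j][1], distance)
        assignments.append(((chrom, start, end), best))
    return assignments


# Hypothetical TSS annotation and peaks.
tss_by_chrom = {"chr1": [(10_000, "GENE_A"), (80_000, "GENE_B")], "chr2": [(5_000, "GENE_C")]}
peaks = [("chr1", 11_000, 11_400), ("chr1", 70_000, 70_200), ("chr1", 300_000, 300_200),
         ("chr2", 4_000, 4_100), ("chr3", 100, 200)]
for peak, gene in assign_peaks(peaks, tss_by_chrom):
    print(peak, gene)
```

Binary search over sorted TSS positions keeps each lookup logarithmic, which matters when tens of thousands of peaks are assigned genome-wide.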
Although powerful for identifying the target genes of a given factor, ChIP-Seq remains difficult to scale to the large number of TFs, epigenetic modifications, and/or chromatin remodelers that could be associated with neurodevelopmental events. Prioritizing which TFs to immunoprecipitate is therefore a key step, currently addressed with computational strategies. In this context, we recently developed TETRAMER, a computational approach that reconstructs gene regulatory networks by integrating user-provided transcriptomes with TF-target gene annotations retrieved from various databases. TETRAMER then simulates the propagation of transcriptional regulation across the reconstructed network to identify master TFs, which can be prioritized for experimental assays. This strategy was initially used to identify novel master TFs implicated in neurogenesis by reconstructing gene regulatory networks from temporal transcriptomes. It was then extended to a collection of more than 3000 transcriptomes covering ∼300 cell/tissue types from 14 anatomical systems of the human body. Among them, 58 cell/tissue types of the human nervous system were analyzed, and their master TFs and associated gene regulatory networks were inferred. As illustrated in Figure 2, this type of analysis makes it possible to compare the fraction of TFs shared between different nervous system cell/tissue types, thereby highlighting relevant players in their transcriptional regulation.
Figure 2 compares the TFs retrieved for the frontal cortex and the hypothalamus. It reveals factors such as TBR1 and ARNT2, previously found to carry rare genetic variants associated with autism [64, 71], and NPAS3, previously described as a master regulator of genes related to neuropsychiatric disorders.
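To illustrate the general idea of propagating regulation over a TF-target network and then comparing TF sets between tissues, the sketch below uses a deliberately simplified scoring scheme. It is not TETRAMER's algorithm. Each TF whose own expression changes is scored by how many genes reachable downstream of it (within two regulatory steps) change in the same direction. All edges are assumed to be activating. The Jaccard index then summarizes the fraction of TFs shared between two tissues. The network, trends, and TF names are hypothetical placeholders, not results from Figure 2.

```python
from collections import deque


def downstream(tf, network, max_depth=2):
    """Genes reachable from tf through regulatory edges within max_depth steps (BFS)."""
    depth = {tf: 0}
    queue = deque([tf])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue
        for target in network.get(node, ()):
            if target not in depth:
                depth[target] = depth[node] + 1
                queue.append(target)
    del depth[tf]
    return set(depth)


def rank_candidate_masters(network, trend, max_depth=2):
    """Score each changing TF by how many downstream genes share its expression trend."""
    scores = {}
    for tf in network:
        if trend.get(tf, 0) == 0:
            continue  # the TF itself must change to be considered
        reached = downstream(tf, network, max_depth)
        scores[tf] = sum(1 for g in reached if trend.get(g, 0) == trend[tf])
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def shared_fraction(tfs_a, tfs_b):
    """Jaccard index: TFs shared by two tissues over all TFs retrieved in either."""
    a, b = set(tfs_a), set(tfs_b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical TF -> target network and expression trends (+1 up, -1 down, 0 unchanged).
network = {"TF1": ["TF2", "G1", "G2"], "TF2": ["G3", "G4"], "TF3": ["G5"], "TF4": ["G1"]}
trend = {"TF1": 1, "TF2": 1, "TF3": 1, "TF4": 0,
         "G1": 1, "G2": 1, "G3": 1, "G4": -1, "G5": -1}
print(rank_candidate_masters(network, trend))  # [('TF1', 4), ('TF2', 1), ('TF3', 0)]
print(shared_fraction(["TF_A", "TF_B", "TF_C"], ["TF_B", "TF_C", "TF_D", "TF_E"]))  # 0.4
```

TF1 ranks first because its influence propagates through TF2 to a second tier of targets. This is the intuition behind looking for "master" regulators rather than only direct binders. Signed (repressive) edges and quantitative expression levels would be needed for anything beyond this toy scoring.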
Overall, the analytical strategies described above underline the need to incorporate various types of genetic and functional genomic readouts, whose inter-relationships can enhance our understanding of the phenomena under study. This is particularly relevant for neurodevelopment and related diseases, which result from multigenic events. Moreover, data integration relies systematically on computational developments, as witnessed by the various tools and strategies devoted to inferring relationships among the available data and to modeling system behavior. Notably, machine learning strategies model the maturity and regional identity of neurons obtained in vitro by comparison with human fetal brain data. These strategies make it possible to select the in-vitro systems that reconstitute in-vivo events as closely as possible. Similarly, major efforts such as the “Blue Brain Project” currently combine data collection with computational modeling to reconstruct cell atlases, for instance of the mouse brain. This suggests that major discoveries in neuroscience may arise from such multidisciplinary efforts in the coming years.
4. Perspectives for the coming years: new in-vitro 3D brain tissue models, single-cell strategies, and big-data systems biology
Most transcriptome and related studies of the human brain have used postmortem tissue as source material. As a consequence, technical concerns such as RNA degradation due to pre- and postmortem factors (environment, collection methods, or postmortem interval) can directly influence the quality of the readouts [75, 76, 77]. Animal models are a less attractive alternative because of reported differences, for instance in human corticogenesis relative to mouse models. These differences are further supported by human-specific gene signatures and divergent gene regulatory programs [78, 79, 80]. Compared with rodents, only a small percentage of genes follow different trajectories in non-human primates and humans. The non-human primate model can therefore help to understand brain development, but it cannot reproduce all human features [79, 81]. Indeed, comparative transcriptome analyses of non-human primate and human brains revealed human-specific gene expression profiles [82, 83, 84]. They also showed that differentially expressed genes are mainly upregulated in the human brain, in contrast to other organs [85, 86]. In addition, transcriptome remodeling during the postnatal period appears delayed in the human brain compared with non-human primates.
More recently, human-induced pluripotent stem cells (hIPSCs) combined with in-vitro culture strategies for generating two- or three-dimensional nervous tissue have emerged as an alternative to animal models. It is now possible to generate hIPSCs from tissue samples collected from patients with neurological disorders and to differentiate them into nervous tissue. In this context, a recent study compared the transcriptome of neural stem cells driven in vitro toward corticogenesis with in-vivo data. It found strong conservation of gene expression, including conservation of a cortical gene network implicated in ASD. Compared with two-dimensional in-vitro neuronal differentiation, three-dimensional models (known as cerebral organoids) provide a more physiologically relevant system to study neurodevelopment [88, 89, 90, 91]. Comparisons of human cerebral organoids with the developing fetal brain demonstrated similar gene expression programs and epigenomic signatures [92, 93, 94]. Furthermore, single-cell transcriptome analysis of cerebral organoids revealed substantial cellular heterogeneity, reminiscent of that observed in the human brain. Human cerebral organoids thus represent a new approach for modeling neuronal development and provide the means to study neurogenesis from a systems biology perspective. For example, Mariani et al. generated cerebral organoids from hIPSCs derived from patients with ASD that recapitulated transcriptional programs of fetal cortical development. In this study, gene network analyses identified upregulated gene programs implicated in cell proliferation, neuronal differentiation, and synaptic processes. Similarly, Amiri et al. identified ASD-related gene modules that overlap those previously described in postmortem data.
This study supported the idea that cerebral organoids can reveal gene regulatory elements contributing to ASD. Following these successes, major efforts have been devoted to developing protocols that generate tissues reminiscent of specific brain structures, such as the forebrain [90, 96], midbrain [96, 97], or hypothalamus. Recently, chimeric organoids obtained by fusing differently regionalized organoids (e.g., dorsal and ventral forebrain organoids) were generated to increase the complexity of the resulting tissues.
The use of cerebral organoids as a model system for studying neurodevelopment and related diseases is still in its infancy. The approach requires improvement, for instance in reproducibility. However, because it offers an alternative to human postmortem samples and animal models, it is expected to keep evolving over the coming years. This trend is also boosted by several other developments. These include the use of the CRISPR/Cas9 system to engineer organoids, the democratization of single-cell omics strategies, and the growth of multidisciplinary approaches, notably the incorporation of computational approaches for modeling brain tissue organization.
Understanding the complexity of the brain is one of the major challenges for the scientific community. This concerns not only its physiological function but also its relationship with the human mind. Omics strategies are revolutionizing the way living systems are interpreted through the expression of their genomes, and in the particular case of the human brain, they are enhancing our understanding of neurological disorders. In this chapter, we have discussed the use of transcriptomes, exome sequencing, and gene regulatory network strategies for revealing the influence of multiple genes. Furthermore, we have highlighted the arrival of cerebral organoids as a novel model system for studying the human nervous system. Combined with further developments (single-cell strategies, CRISPR/Cas9 engineering, etc.), this model holds great promise for understanding brain function. This enthusiasm is further supported by major advances in computation, notably artificial intelligence. Together with the large amounts of data generated by omics strategies, these advances are expected to accelerate discoveries. Overall, we hope that this chapter will encourage young readers to explore the multidisciplinary approaches described herein and to participate directly in the exploration of the human brain in the coming years.
We thank all members of the SysFate lab for discussions related to the elaboration of this chapter. SysFate is supported by the “Genopole Thematic Incentive Actions” funding (referred to by their French acronym “ATIGE”) and by the institutional bodies CEA, CNRS, and Université d’Evry, Université Paris-Saclay.
Conflict of interest
The authors declare that there is no conflict of interest.
| Abbreviation | Definition |
|---|---|
| RT-PCR | reverse transcription polymerase chain reaction |
| ASD | autism spectrum disorder |
| lncRNA | long non-coding RNA |
| RMST | rhabdomyosarcoma 2-associated transcript |
| SOX2 | sex determining region Y-box 2 |
| GTEx | Genotype-Tissue Expression |
| ENCODE | Encyclopedia of DNA Elements |
| PsychENCODE | Psychiatric Encyclopedia of DNA Elements |
| PET | positron emission tomography |
| SNV | single nucleotide variant |
| CNV | copy number variant |
| GWAS | genome-wide association study |
| TBR1 | T-box, brain 1 |
| Auts2 | activator of transcription and developmental regulator |
| CHD8 | chromodomain helicase DNA binding protein 8 |
| ARNT2 | aryl hydrocarbon receptor nuclear translocator 2 |
| NPAS3 | neuronal PAS domain protein 3 |
| hIPSCs | human-induced pluripotent stem cells |
| CRISPR/Cas9 | clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 |

| Term | Definition |
|---|---|
| Transcriptome | complete set of RNA molecules expressed in a cell or a population of cells |
| Exome | part of the genome composed of exons, the coding portions of genes |
| Epigenome | chemical compounds and proteins that modify and control gene expression without changing the DNA sequence |
| MicroRNA | class of small non-coding RNA molecules of about 22 nucleotides that act as posttranscriptional regulators of target genes |
| LncRNA | non-coding RNA molecules longer than 200 nucleotides |
| Single nucleotide variants | loci with alleles that differ at a single base |
| Rare copy number variants | variation between individuals in the number of copies of a particular gene |
| Genome-wide association study (GWAS) | approach that associates specific genetic variations with particular diseases |
| Chromatin immunoprecipitation | procedure to investigate interactions between proteins and genomic DNA regions |