Analytical Validities and Diagnostic Capacity of Cytogenomic Analyses
Within each human cell, double-stranded DNA molecules are packed into 22 pairs of autosomes numbered from 1 to 22 and one pair of sex chromosomes denoted as XX for female and XY for male. Every human chromosome has a centromere to guide its segregation through cell cycles and a telomere at each end to protect its integrity. Chromosomes play important roles in gene expression regulation, DNA replication and cell division. Abnormalities involving the number or structure of a chromosome, or a segment within a chromosome, are known to introduce functional disturbance and cause genetic diseases.
Medical genetics has been driven by evolving technological innovations for better genetic diagnosis and by expanding clinical evidence for rational genetic counseling and disease treatment. Since the 1970s, a series of technologies based on differential staining of metaphase chromosomes or locus-specific hybridization of labeled DNA probes has been developed to study chromosomal and submicroscopic abnormalities. Karyotyping using Giemsa-stained banding patterns (G-bands) on treated metaphase chromosomes and fluorescence in situ hybridization (FISH) mapping on metaphase chromosomes or interphase chromatin are the standard procedures in clinical molecular cytogenetic laboratories. Molecular cytogenetic analysis of pediatric patients with developmental delay (DD), intellectual disability (ID), multiple congenital anomalies (MCA) and autistic spectrum disorders (ASD) has found many causative chromosomal abnormalities and some genomic disorders. In the past decade, genomic analysis using either oligonucleotide array comparative genomic hybridization (aCGH) or single nucleotide polymorphism (SNP) chips has been validated and recommended as the first-tier genetic test for pediatric patients. This integrated cytogenomic approach further defines the genomic coordinates and gene content of chromosomal and cryptic genomic abnormalities and extends the spectrum of etiologic causes for ID/DD/MCA and ASD. The genomic information facilitates fine-mapping of disease-causing genes and dissection of the underlying pathogenic mechanisms through in silico bioinformatic data mining, in vitro cellular phenotyping and in vivo animal modeling. Ultimately, this progress will lead to rational disease classification and therapeutic interventions for patients with ID/DD/ASD [1-3].
2. Cytogenetic and genomic methodologies
2.1. Molecular cytogenetic approach
Clinical cytogenetics is the study of human chromosomal abnormalities and their associated phenotypes. In 1956, Tjio and Levan correctly described that a normal human metaphase contains 46 chromosomes. This fundamental cytologic observation was built upon the development of in vitro cell culture techniques, the use of colchicine to arrest the cell cycle at metaphase, and the modification of Hsu’s hypotonic treatment prior to fixation to spread out the chromosomes. The analysis of directly Giemsa-stained chromosomes led to the identification of numerical chromosomal abnormalities such as trisomy 21 in Down syndrome, 45,X in Turner syndrome, 47,XXY in Klinefelter syndrome, trisomy 13, and trisomy 18. The early discoveries of these syndromic numerical chromosomal abnormalities prompted efforts to differentiate all 23 pairs of chromosomes in order to detect structural abnormalities. In 1968, Caspersson et al. reported differential quinacrine staining of chromosomes and triggered the development of various chromosome banding techniques. Giemsa staining on trypsin-treated chromosome spreads forms unique Giemsa-positive and Giemsa-negative bands, which look like G-band ‘barcodes’ for each pair of chromosomes under the microscope. A normal human G-band ideogram was used as a standard for accurate grouping, numbering and pairing of human chromosomes based on their size, centromere position, defined regions and bands; this organized chromosomal profile of an individual is referred to as a karyotype. Chromosome heteromorphisms, mainly involving highly repetitive sequences in the pericentric and satellite regions, have been recognized through studies on normal human populations and diagnostic practices. A general consensus on heteromorphic regions and their reporting practice has been reported.
Although it is an effective tool for detecting numerical and structural chromosomal abnormalities, the banding method has two obvious technical limitations: the requirement of viable cells for cell culture to capture metaphases for microscopic analysis, and the low analytical resolution of chromosomal G-bands. The human genome is about 3,000 Mb (megabases) in size and the estimated total number of protein-coding genes is about 20,000. The average chromosome G-band at a medium 500-band level therefore spans about 6 Mb and contains about 40 coding genes. Before the application of genomic technologies, the lack of genomic mapping of the genes involved in many detected chromosomal abnormalities was the major obstacle to accurate karyotype-phenotype correlation and candidate gene identification.
In 1982, FISH technology using labeled DNA probes hybridized onto metaphase chromosomes was developed to map genes onto specific chromosomal G-band regions. This gene mapping tool was immediately recognized to have great diagnostic value. FISH on metaphase chromosomes, using labeled DNA probes 100-800 kilobases (Kb) in size, has enhanced the analytical resolution and allowed accurate diagnosis of genomic disorders (also termed contiguous gene syndromes or microdeletion disorders), such as DiGeorge syndrome (OMIM#188400) caused by a deletion at 22q11.2, and Prader-Willi syndrome (OMIM#176270) and Angelman syndrome (OMIM#105830) caused by a deletion at 15q11.2. FISH can also be performed directly on interphase nuclei, which overcomes the need for cell culture and extends its diagnostic application to rapid screening of chromosomal and genomic abnormalities. Multiplex FISH panels with differentially labeled probes have been developed for prenatal screening of common aneuploidies involving chromosomes X, Y, 13, 18 and 21 and for postnatal detection of cryptic subtelomeric rearrangements.
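The band-level arithmetic above can be checked with a short sketch. The inputs are the figures quoted in the text (a 3,000 Mb genome, about 20,000 protein-coding genes, a 500-band karyotype); the function name is illustrative, and the calculation assumes equally sized bands with evenly distributed genes, which is a simplification.

```python
def average_band_content(genome_mb, n_genes, n_bands):
    """Return (average Mb per band, average genes per band).

    Assumes bands of equal size and an even gene distribution,
    which real chromosomes do not have.
    """
    return genome_mb / n_bands, n_genes / n_bands


mb_per_band, genes_per_band = average_band_content(3000, 20_000, 500)
print(f"{mb_per_band:.0f} Mb and {genes_per_band:.0f} genes per band")
```

In practice gene density varies widely between G-positive and G-negative bands, so these averages only indicate the order of magnitude of what a single visible band can contain.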
The molecular cytogenetic approach combining G-banding and FISH technologies has been the standard for a primitive genome-wide view or a locus-specific view of numerical and structural chromosomal abnormalities. An international system for human cytogenetic nomenclature (ISCN) was first introduced in 1978 and has been continuously updated to the current 2013 version for a systematic documentation of chromosomal and genomic abnormalities . Practice guidelines for cytogenetic evaluation of DD/ID/MCA have been established; the abnormality detection rate is 3.7% by conventional karyotyping for large numerical and structural chromosomal abnormalities and up to 6.8% when combined with FISH analysis for targeted genomic disorders and subtelomeric rearrangements .
2.2. Genomic analysis as first-tier genetic testing
In 1992, to overcome frequent cell culture failure and poor metaphase quality in karyotyping solid tumor samples, Kallioniemi et al. developed comparative genomic hybridization (CGH), in which differently labeled test and control DNAs are co-hybridized onto normal metaphase chromosomes to measure copy number changes. In 1995, Schena et al. developed a microarray-based technology to quantitatively monitor the expression of multiple genes. A hybrid of the CGH and microarray technologies formed the novel array CGH (aCGH) technology for high resolution analysis of copy number changes across the genome. From 1998 to 2001, prototype CGH arrays with coverage increasing from a single chromosome to the whole genome, using spotted BAC or PAC clones, were produced by academic laboratories [21, 22]. Five years later, high-throughput, high density oligonucleotide microarrays or single nucleotide polymorphism (SNP) chips following industrial standards, along with user-friendly analytical software packages, were developed by several companies. These novel genomic technologies quickly filled the gap between the megabase (Mb)-range chromosome G/R-bands and kilobase (Kb)-level gene structure and led to the discovery of polymorphic copy number variants (CNV) in the normal human genome [23, 24]. CNVs are defined as gains or losses of genomic material larger than 1 Kb in size, and they are present in approximately 12% of the genome in normal human populations. Meanwhile, these genomic technologies were also applied to delineate chromosomal abnormalities and detect pathogenic genomic imbalances in patients with DD/ID/MCA in a research setting [25-28]. This technical and research progress set a solid foundation for diagnostic application.
To ensure the safety and effectiveness of a new technology for genetic diagnosis, analytical and clinical validities followed by evidence-based practice guidelines have to be established. Genomic analysis involves multi-step bench procedures of DNA extraction, enzymatic labeling or extension and DNA hybridization, a large amount of data analysis and knowledge-based result interpretation. The integration of this DNA-based genomic analysis into a cell-based microscopic analysis could be a technical challenge for many clinical cytogenetic laboratories.
For genomic analysis, analytical validity refers to the probability that a test will be positive when particular copy number variants (deletion or duplication) are present (analytical sensitivity), the probability that the test will be negative when these variants are absent (analytical specificity), and the analytical resolution . Most analytical validation studies compared the outcomes between the genomic analysis and the conventional cytogenetic method or among different platforms. Two earlier pilot studies compared array results with known chromosomal abnormalities from 25 cases to validate targeted BAC clone arrays, and the clone-by-clone sensitivity and specificity were estimated to be 96.7% and 99.1% respectively [30, 31]. Using a receiver operating characteristic (ROC) curve, the analytical validity of a genome-wide oligonucleotide aCGH (Agilent 44K) showed 99% sensitivity and 99% specificity when the analytical resolution was set at 300-500 Kb by five to seven contiguous oligonucleotides (about six times the average spatial resolution of 68 Kb, given by the coverage of a 3,000,000 Kb human genome with approximately 44,000 oligonucleotide probes). For the detection of mosaicism using the set resolution, aCGH can achieve 85% sensitivity and 95% specificity for a mosaic pattern at 50% of the cell population, but increased test-to-test variations and reduced sensitivity are expected as the mosaic percentage decreases . Another validation study recommended similar analytical parameters by using a sliding window of four to five oligonucleotide probes . Additionally, the comparison of the area under the ROC curve clearly demonstrated that the analytical validity of oligonucleotide aCGH outperformed BAC clone aCGH . The superior performance of oligonucleotide aCGH over BAC clone aCGH was later confirmed in a comparative analysis . 
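The relationship between probe count, average probe spacing and analytical resolution quoted above can be reproduced with a small calculation. This is a sketch using only the numbers in the text (a 3,000,000 Kb genome, about 44,000 probes, calls requiring five to seven contiguous probes); the function names are hypothetical, and evenly spaced probes are assumed, which commercial designs do not strictly follow.

```python
def average_probe_spacing_kb(genome_kb, n_probes):
    """Average distance between probes, assuming even genome coverage."""
    return genome_kb / n_probes


def resolution_range_kb(spacing_kb, min_probes, max_probes):
    """Span covered by the minimum and maximum number of contiguous probes."""
    return min_probes * spacing_kb, max_probes * spacing_kb


spacing = average_probe_spacing_kb(3_000_000, 44_000)
low, high = resolution_range_kb(spacing, 5, 7)
print(f"spacing ~{spacing:.0f} Kb; 5-7 probes span ~{low:.0f}-{high:.0f} Kb")
```

The resulting span of roughly 340-480 Kb falls within the 300-500 Kb analytical resolution stated for this platform.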
ROC analysis is effective in evaluating and comparing analytical validity among different technical platforms for a rational decision in selecting a novel technology for diagnostic application.
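As an illustration of how an ROC comparison works, the sketch below builds an ROC curve from scored calls and computes the area under it with the trapezoidal rule. The scores and truth labels are toy values invented for the example, not data from any of the cited validation studies, and the function names are hypothetical. The sketch assumes distinct scores; tied scores would need to be grouped before plotting.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points obtained by
    lowering the call threshold one scored call at a time.

    labels: 1 = a true copy number change is present, 0 = absent.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), key=lambda p: -p[0]):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Toy example: six scored calls, three of them true copy number changes.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
auc = area_under_curve(roc_points(toy_scores, toy_labels))
print(f"AUC = {auc:.3f}")
```

A platform whose curve encloses a larger area separates true copy number changes from noise more reliably across all thresholds, which is the basis for comparing platforms by their areas under the ROC curve.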
A cross-platform comparison of a 33K tiling path BAC array, a 500K Affymetrix SNP chip, a 385K NimbleGen oligonucleotide array and a 244K Agilent oligonucleotide array was performed using ten cases with known genomic imbalances ranging from 100 Kb to 3 Mb. All platforms showed sensitive performance, but accurate and user-friendly computer programs are of crucial importance for reliable copy number detection. Technically, one obvious advantage of SNP chips over aCGH is the ability to detect uniparental disomy (UPD), copy neutral loss of heterozygosity (CN-LOH) or absence of heterozygosity (AOH), and the level of consanguinity and incest. However, the introduction of SNP probes into CGH arrays by Agilent Technologies has resolved the technical differences to a certain extent. Validation studies of UPD detection using the Affymetrix SNP genechips detected isodisomic UPD and segmental AOH of a defined size but missed heterodisomic UPD [37, 38]. High density aCGH and SNP chips have achieved exon-by-exon coverage to detect intragenic and exonic copy number changes, which could allow direct evaluation of genotype-phenotype concordance. A recent study using exon-level high density aCGH on a targeted list of genes detected mostly exonic deletions in 2.9% of cases with autosomal dominant disorders, intragenic deletions in 10.1% of cases with autosomal recessive disorders tested with one known mutation, and a deletion or duplication in 3.5% of cases with X-linked disorders. Laboratories pursuing this high resolution genomic analysis will require additional validation studies, in-depth interpretation of results in relation to Mendelian disorders, more familial follow-up studies for incidental or secondary findings, and eventually further functional studies.
Clinical validity refers to the probability that a test will be positive in people with the disease (clinical sensitivity) or negative in people without the disease (clinical specificity). There were concerns about false negative results, procedure variability and interpretation criteria for the clinical application of early versions of targeted BAC clone arrays. Due to the high cost of aCGH and SNP chip analysis, a validation study on a large number of patient and control samples in every clinical laboratory is not practical. The ACMG guidelines recommend a validation procedure of analyzing a minimum of 30 specimens with different known chromosomal abnormalities. Most cytogenetic laboratories performed a parallel comparison between aCGH or SNP chip analysis and karyotyping on a small case series. All these studies consistently demonstrated that aCGH or SNP chips can define the genomic coordinates and gene content of chromosomally observed imbalances and also detect cryptic microdeletions, microduplications and subtelomeric rearrangements. For example, a focused oligonucleotide aCGH was validated with 100% concordance toward known chromosomal imbalances and yielded an 11.9% abnormality detection rate in 211 clinical samples. Oligonucleotide aCGH using the high density Agilent 244K array was validated in 45 cases with known chromosomal abnormalities and microdeletions. A multi-center comparison of 1,499 patients using the same oligonucleotide platform (Agilent 44K) showed a 12% abnormality detection rate, and about 53% of the abnormal findings were smaller than 5 Mb and thus beyond the analytical resolution of routine karyotyping. The clinical validity of genomic analysis could be estimated from data published by the International Standards for Cytogenomic Arrays consortium (ISCA).
Using 14 well known recurrent microdeletions and microduplications ranging from 1.5 to 3 Mb as a reference set of genomic disorders, 458 microdeletions and 270 microduplications were detected in 15,749 pediatric patients, and 12 microdeletions and 53 microduplications were found in 10,118 published controls. Given the analytical sensitivity of 99% at a resolution of 300-500 Kb for a routine aCGH (Agilent 44K), the clinical sensitivity is close to 100% for this reference set of genomic disorders; treating the 65 positive findings among the 10,118 controls as false positive results, the clinical specificity is estimated to be 99.4%. The near-perfect validities and significantly improved resolution of genomic technologies made them ideal diagnostic tools for delineating chromosomal imbalances and detecting CNVs.
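The specificity estimate can be reproduced from the counts given above. In this sketch every reference-set CNV found in a control is counted as a positive call in an unaffected individual; the variable and function names are illustrative only.

```python
def specificity(true_negatives, false_positives):
    """Proportion of unaffected individuals with a negative test."""
    return true_negatives / (true_negatives + false_positives)


controls = 10_118
positives_in_controls = 12 + 53      # microdeletions + microduplications
patients = 15_749
positives_in_patients = 458 + 270    # microdeletions + microduplications

clinical_specificity = specificity(controls - positives_in_controls,
                                   positives_in_controls)
patient_detection_rate = positives_in_patients / patients

print(f"clinical specificity: {clinical_specificity:.1%}")
print(f"reference-set CNVs in patients: {patient_detection_rate:.1%}")
```

The same counts show that the reference set accounts for about 4.6% of the pediatric patients, compared with about 0.6% of controls.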
2.3. Integrated cytogenomic workflow and practice guidelines
The current practice guidelines from the American College of Medical Genetics (ACMG) and the peer consensus recommend that genomic analysis be the first-tier genetic test for pediatric patients with DD/ID/MCA/ASD [47-50]. A CNV detected in a normal individual is likely a polymorphic variant without clinical significance and is usually termed a benign CNV (bCNV). A CNV with a known disease association is referred to as a pathogenic CNV (pCNV), and a rare or private CNV with uncertain clinical relevance is named a variant of unknown significance (VUS). Over the past five years, clinical genomic analysis has progressed rapidly from ‘targeted’ or ‘focused’ aCGH to genome-wide high resolution aCGH or SNP chips. A systematic review of 21,698 pediatric patients analyzed by different genomic platforms demonstrated a diagnostic yield of 15-20%, which is two to three times higher than the 3.7-6.8% yield of molecular cytogenetic analysis.
In diagnostic practice, there are serious concerns regarding the complete replacement of the molecular cytogenetic approach. The analytical validities and technical capacity of karyotyping, FISH and oligonucleotide aCGH are summarized in Table 1. Although cytogenetic testing has gradually become a supplemental or confirmatory procedure, karyotyping is still the gold standard for detecting numerical chromosomal abnormalities (e.g., trisomy 21 for Down syndrome and monosomy X for Turner syndrome) and structural rearrangements (e.g., Robertsonian translocation), and FISH remains the ‘cell-based’ method of choice for determining mosaic patterns. Approximately 45% of genomic imbalances are larger than 3-5 Mb and could be confirmed by high resolution G-banding; most recurrent genomic disorders, subtelomeric rearrangements and mosaic patterns can be readily confirmed by clinically-validated commercial FISH probes.
|Analytical Validity*||Types of Abnormalities Detected**|
|Method||Spatial resolution||Analytical resolution||Sensitivity||Specificity||Num Abn||Struc Abn (balanced)||Struc Abn (unbalanced)||CNV||AOH||Exonic||Mosaic|
|Karyotyping (G-banding)|
|Routine (400-550 bands)|| ||5 ~ 7 Mb|| || ||+||+||+||-||-||-||>6%|
|High Resolution (550-850 bands)|| ||3 ~ 5 Mb|| || ||+||+||+||-||-||-||>6%|
|FISH|
|Regional specific (cen/subtel)|| ||>100 Kb||~98%||~98%||+||+||+||+||-||-||>3~5%|
|Oligonucleotide aCGH (Agilent)|
|Human CGH 44K||68 Kb||400~500 Kb||>99%||>99%||+||-||+||+||-||-||>20%|
|Human CGH 180K||17 Kb||100~120 Kb||>99%||>99%||+||-||+||+||-||-||>20%|
|Human CGH+SNP 180K||17 Kb||100~120 Kb||>99%||>99%||+||-||+||+||+||-||>20%|
|Human CGH+SNP 400K||7.5 Kb||40~50 Kb|| || ||+||-||+||+||+||+||>20%|
A recent study compared how different laboratories interpreted and reported CNVs without a known associated abnormal phenotype. None of the thirteen CNVs was interpreted in complete agreement, and in some cases the interpretations ranged from normal to abnormal. In fact, some genomic findings will always be difficult to interpret because of their variable expressivity and incomplete penetrance. The collection of bCNVs, pCNVs and VUSs into web-delivered databases has provided an essential tool for interpreting results in diagnostic laboratories and also for educating clinical geneticists and genetic counselors. The websites of the Database of Genomic Variants (DGV), the International Standards for Cytogenomic Arrays (ISCA), the DatabasE of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources (DECIPHER) and other related web resources are listed at the end of this report. CNVs in DGV, DECIPHER and ISCA have been loaded onto the Human Genome Browser as searchable tracks. The brief clinical descriptions from DECIPHER and the evidence-based rating of CNVs as pathogenic, likely pathogenic, uncertain significance, likely benign or benign from ISCA are all helpful in reporting genomic findings. Recognized cytogenomic syndromes usually have entries in the Online Mendelian Inheritance in Man (OMIM). Novel cytogenomic abnormalities are usually presented as case reports and can be searched in PubMed. Reports of a series of similar findings and in-depth reviews will provide more evidence. Figure 1 shows an aCGH-detected 16p13.11 deletion in a patient with ASD in comparison with pathogenic deletions and duplications documented in the DECIPHER and ISCA databases. Searchable clinical information from these databases could be used to support genotype-phenotype correlations. Based on the genotype-phenotype correlation, a four-level evidence-based result interpretation scheme has been proposed:
Level I: Tight genotype-phenotype correlation with well-defined association of the described syndrome and the pCNV. There may be variability in the phenotype, but the spectrum of variability is well described (e.g. Williams syndrome and DiGeorge syndrome). Many known syndromes have an assigned OMIM number which could be used directly as a reference in the report.
Level II: Evolving genotype-phenotype correlation. The described syndrome is represented by a case report in the literature, or is associated with more than one distinct phenotype and may be influenced by penetrance or modifying genes, such as 1q21.1 duplication or deletion syndrome.
Level III: Possible new genotype-phenotype correlation. The CNV has not been described in the literature before, and no published phenotypic data are available.
Level IV: Uninterpretable multiple CNVs and VUSs. None of the findings is reported as a normal variant or associated with a disease phenotype, or one or more of the CNVs/VUSs are also found in a phenotypically discordant parent.
Each laboratory that performs genomic analysis should develop its quality control and quality assurance procedures. Proficiency testing for genomic analysis has been implemented by the College of American Pathologists (CAP). Since 2008, two pilot and ten survey challenges of twelve DNA specimens distributed to as many as 74 different laboratories yielded 493 individual responses with a 95.7% mean consensus for matching result interpretations. Responses to supplemental questions indicate that 72% of laboratories use oligonucleotide aCGH and 23% use SNP chips, and array platforms used are increasing in probe density .
Taken together, the development of practice guidelines and proficiency testing indicates that aCGH and SNP chip analyses are becoming the ‘gold standard’ in the clinical diagnosis of chromosomal and genomic abnormalities. As shown in Figure 2, a workflow integrating the cell-based molecular cytogenetic methods (G-band and FISH), the DNA-based genomic copy number detection (aCGH or SNP chip) and evidence-based result interpretation has been the most efficient and cost-effective diagnostic cytogenomic setting. On the clinical front, pediatric genetic evaluation should be arranged with pre- and post-test consultations by a medical geneticist so that the benefits, limitations and possible outcomes, as well as the difficulties of interpreting some copy number variants, can be discussed in detail.
3. Spectrum of cytogenomic abnormalities in ID/DD and ASD
The prevalences of ID/DD and ASD are reported to be 1~3% and 0.67%, respectively. Other common neurodevelopmental disorders, including speech and language delay, schizophrenia and epilepsy, are also subject to cytogenomic testing. The integrated cytogenomic analysis has significantly improved the diagnostic yield from 3.7-6.8% by molecular cytogenetic analysis to 15-20% by oligonucleotide aCGH or SNP gene chip. The spectrum of cytogenomic abnormalities ranges from large interstitial and subtelomeric imbalances, through submicroscopic recurrent genomic disorders, to cryptic oligo-genic and intragenic copy number changes. However, the diagnostic yield varies with the criteria for patient referral and the resolution of the genomic analysis. For example, from 2006 to 2011, 1,354 consecutive pediatric patients were analyzed by 44K and 180K oligonucleotide aCGH (Agilent) in the Yale Cytogenetic Laboratory, and pathogenic abnormalities were detected in 176 patients (a 13% diagnostic yield). These abnormalities were classified into chromosomal and cryptic structural abnormalities in 95 patients (54%, 95/176), recurrent genomic disorders in 66 patients (37.5%, 66/176), and common aneuploidies in 15 patients (8.5%, 15/176).
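The yield and category percentages for this case series follow directly from the counts. The sketch below recomputes them; the dictionary keys simply restate the categories named in the text.

```python
total_patients = 1354

# Patients with pathogenic findings, by category (counts from the text).
findings = {
    "chromosomal and cryptic structural abnormalities": 95,
    "recurrent genomic disorders": 66,
    "common aneuploidies": 15,
}

abnormal = sum(findings.values())
diagnostic_yield = abnormal / total_patients
shares = {name: count / abnormal for name, count in findings.items()}

print(f"{abnormal} of {total_patients} patients: {diagnostic_yield:.1%} yield")
for name, share in shares.items():
    print(f"  {name}: {share:.1%}")
```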
3.1. Chromosomal and cryptic structure abnormalities
With its much higher analytical resolution than chromosome G-banding, genomic analysis can delineate the genomic coordinates and gene content of almost all chromosomally visible numerical and structural imbalances. This genomic information facilitates fine mapping of critical regions or intervals containing candidate dosage-sensitive genes through subtractive comparison of overlapping deletions and duplications [59-61]. In most recent case reports, the critical regions defined by aCGH or SNP chip have clearly mapped genomic coordinates and accurately defined gene content. As more cases involving a similar genomic locus accumulate, the mapping of a critical region can be narrowed down to a few candidate genes or even a single gene. A typical example is the mapping of dosage-sensitive genes associated with microcephaly, corpus callosum agenesis and seizure in chromosome 1q43-q44 deletion syndrome (OMIM#612337). This syndrome is caused by heterogeneous subtelomeric deletions of 1q43-q44. Based on 10 unrelated patients with 1q43-q44 subtelomeric deletions, Van Bon et al. defined a 360 Kb critical region at 1q44 with four candidate genes, C1orf100, ADSS, C1orf101 and C1orf121, for corpus callosum abnormality. A series of studies including cases with small interstitial deletions suggested that the nearby genes AKT3, ZNF238 and HNRNPU are more likely candidates [63-67]. A de novo 163 Kb interstitial microdeletion at 1q44 involving only the HNRNPU and FAM36B genes was reported in a boy with thin corpus callosum, psychomotor delay and seizure. Combined data from these studies supported three critical regions containing the AKT3, ZNF238 and HNRNPU genes for microcephaly, corpus callosum abnormalities and seizure, respectively. However, the term “corpus callosum abnormalities” associated with 1q44 deletions includes a spectrum of developmental aberrations ranging from complete agenesis, partial agenesis, dysgenesis and hypoplasia to thin corpus callosum.
The modifying effect from gene interaction or genetic background could also contribute to the phenotype. Further experimental study on gene function and interaction is needed to fully understand the genotype-phenotype correlation.
Genomic analysis can also resolve the genomic structures, mutagenesis mechanisms and mitotic or meiotic behaviors of puzzling chromosomal structural abnormalities such as ring chromosomes or supernumerary marker chromosomes [69-71]. For example, ring chromosome 20 syndrome is a rare chromosomal disorder characterized by refractory epilepsy with seizures in wakefulness and sleep, behavior problems and mild to severe cognitive impairment. The aCGH analysis revealed two distinct groups of patients: 75% were mosaic for the r(20) and a normal cell line, with no detectable deletions or duplications of chromosome 20 in either cell line, and 25% had non-mosaic ring chromosomes with a deletion at one or both ends of the chromosome. The age of onset of seizures correlated inversely with the percentage of cells containing the ring chromosome. Another interesting observation from aCGH applications on two large series is the detection of low-level mosaicism for numerical and structural abnormalities in approximately 0.5% of patients referred for DD/ID/MCA [73, 74]. The authors suggested that DNA extracted from white blood cells can reflect the mosaic pattern more accurately than culture-stimulated lymphocytes. A cytogenomic approach combining cell-based methods (FISH on directly prepared interphase cells and extensive karyotyping on metaphase cells) with DNA-based estimation from the aCGH log2 ratio or SNP pattern was proposed for dissecting mosaic patterns.
Hidden genomic aberrations in complex chromosomal rearrangements or apparently balanced translocations were also detected by aCGH [75, 76]. Of patients presenting with abnormal phenotypes and an apparently balanced translocation, approximately 29-40% have cryptic breakpoint-associated or unrelated imbalances of paternal origin [77, 78]. Several disease-causing mechanisms induced by a balanced translocation, including loss of function by gene disruption, gain of function by gene fusion and aberrant expression by position effect, have been demonstrated. For example, Cacciagli et al. detected a de novo balanced translocation t(10;13)(p12;q12) in a patient with severe speech delay and major hypotonia. This translocation disrupted the ATP8A2 gene. This gene is highly expressed in the brain, suggesting that the patient’s mental disability is likely due to haploinsufficiency of the ATP8A2 gene. Brownstein et al. reported a case with over-expression of the α-Klotho gene induced by a balanced translocation t(9;13)(q21.13;q13.1) and established the association of α-Klotho over-expression with hypophosphatemic rickets and hyperparathyroidism. Application of paired-end genomic sequencing or breakpoint-targeted capture sequencing to five ASD/DD patients carrying a balanced rearrangement revealed unexpected sequence complexity as an underlying feature of karyotypically balanced alterations. Cost-effective diagnostic sequencing analysis for balanced rearrangements detected in patients with ID/DD/ASD should be implemented in the near future.
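The DNA-based estimation of mosaicism from an aCGH log2 ratio can be sketched with a simple mixture model. This model is an assumption introduced for illustration, not the method of the cited studies: it treats the sample as a mixture of diploid cells and abnormal cells, and takes the measured ratio to be proportional to the average copy number. Real array data typically show dampened ratios and probe noise, so estimates from this idealized formula are approximate and are best confirmed by cell-based FISH. The function names are hypothetical.

```python
import math


def expected_log2_ratio(mosaic_fraction, abnormal_copies, normal_copies=2):
    """Idealized log2 ratio when a fraction of cells carries `abnormal_copies`
    of a region and the remaining cells carry `normal_copies`."""
    mean_copies = ((1 - mosaic_fraction) * normal_copies
                   + mosaic_fraction * abnormal_copies)
    return math.log2(mean_copies / normal_copies)


def mosaic_fraction_from_log2(log2_ratio, abnormal_copies, normal_copies=2):
    """Invert the mixture model to estimate the fraction of abnormal cells."""
    mean_copies = normal_copies * 2 ** log2_ratio
    return (mean_copies - normal_copies) / (abnormal_copies - normal_copies)


# A single-copy loss in half of the cells gives log2(1.5 / 2).
ratio = expected_log2_ratio(0.5, abnormal_copies=1)
print(f"expected log2 ratio: {ratio:.3f}")
print(f"recovered fraction: {mosaic_fraction_from_log2(ratio, 1):.2f}")
```

Under this model a 50% mosaic single-copy loss gives a log2 ratio of about -0.415, compared with -1.0 for a non-mosaic heterozygous deletion. The shrinking signal helps explain why sensitivity falls as the mosaic percentage decreases.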
3.2. Recurrent genomic disorders
Genomic disorders refer to microdeletions and microduplications mediated by non-allelic homologous recombination (NAHR) within regional low copy repeats (LCRs). A dozen recurrent genomic disorders, such as DiGeorge syndrome caused by a deletion at 22q11.2, Williams-Beuren syndrome (OMIM#194050) by a deletion at 7q11.23, and Prader-Willi syndrome and Angelman syndrome by a deletion at 15q11.2, have been recognized clinically and routinely diagnosed by FISH testing. The application of genomic analysis enables not only more accurate diagnosis of these previously recognized genomic disorders but also the detection of many novel recurrent genomic disorders. In 2006, the first genomic disorder identified by aCGH was a 500 Kb microdeletion at 17q21.31 containing the MAPT gene (microtubule-associated protein tau) in patients with clearly recognizable ID, hypotonia and a characteristic face [82, 83]. This disorder, later termed Koolen syndrome (OMIM#610443), is caused either by a heterozygous mutation in the KANSL1 gene or by a 17q21.31 deletion. The KANSL1 gene encodes a nuclear protein that plays a role in chromatin modification and is a member of a histone acetyltransferase (HAT) complex. The reciprocal 17q21.31 microduplication syndrome (OMIM#613533), manifesting some degree of psychomotor retardation together with poor social interaction and communication difficulties reminiscent of ASD, has also been reported. Since then, many genomic disorders have been reported, and the diagnosis of these genomic disorders has become an integral part of pediatric genetic evaluation. An aCGH analysis of 15,767 pediatric patients with ID/DD estimated that ~14.2% of cases are caused by pCNVs larger than 400 Kb, and approximately 60% of these pCNVs are within 45 known genomic disorder regions. An evidence-based approach was used to establish the functional and clinical significance of the 14 most commonly seen genomic disorders.
Table 2 lists the recognized dosage-sensitive genes and the estimated penetrance, frequency and prevalence of these 14 recurrent genomic disorders. These most frequently seen genomic disorders represent a 4.5% (1/22) diagnostic yield in pediatric patients and an estimated 0.18% (1/550) prevalence in the general population. A study of human populations for the polymorphic inversions at 17q21.31 observed that the H2 haplotype occurred at the highest frequencies in South Asian and Southern European populations; this H2 haplotype is susceptible to de novo deletions that lead to developmental delay and learning difficulties. Population genetic studies of genomic disorders at other loci could define predisposing genomic structures and recurrence risks for different ethnic groups in different geographic regions.
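The frequency column of Table 2 can be reproduced by pooling the case counts from the two cohorts (15,749 and 15,767 patients, 31,516 in total). The sketch below does this for the two 22q11.2 rows. The prevalence step is an inference from the tabulated values rather than a method stated in the text: the prevalence figures appear consistent with scaling each frequency so that the 22q11.2 deletion corresponds to 1 in 4,000, then rounding to the nearest hundred. All function and variable names are hypothetical.

```python
TOTAL_CASES = 15_749 + 15_767  # pooled cohort size, 31,516


def frequency_denominator(count_a, count_b, total=TOTAL_CASES):
    """Return N such that the pooled frequency among patients is 1/N."""
    return total / (count_a + count_b)


def scaled_prevalence_denominator(freq_denominator, anchor_freq_denominator,
                                  anchor_prevalence_denominator=4000):
    """ASSUMPTION: prevalence is obtained by scaling patient frequency so that
    the anchor disorder matches its population prevalence, rounded to the
    nearest hundred. Inferred from the table values, not stated in the text."""
    scale = anchor_prevalence_denominator / anchor_freq_denominator
    return int(round(freq_denominator * scale, -2))


del_22q11 = frequency_denominator(93, 96)   # 22q11.2 deletion
dup_22q11 = frequency_denominator(32, 50)   # 22q11.2 duplication

print(f"22q11.2 del: 1/{del_22q11:.0f}")
print(f"22q11.2 dup: 1/{dup_22q11:.0f}, est. prevalence 1 in "
      f"{scaled_prevalence_denominator(dup_22q11, del_22q11):,}")
```

If the scaling assumption holds, the same two functions reproduce the remaining rows from their pooled counts.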
|Chromosome G-band||Genomic locus (Mb in hg18)||Dosage-sensitive genes||Abn.*||Clinical features (OMIM#)*||Penetrance**||Ref #46 (15,749 cases)||Ref #85 (15,767 cases)||Frequency (31,516)†||Est. prevalence‡|
|22q11.2||17.4-18.7||TBX1||del||DiGeorge syndrome (188400)||1||93||96||1/167||1 in 4,000|
|22q11.2||17.4-18.7||TBX1||dup||22q11.2 duplication syndrome (608363)||0.91||32||50||1/384||1 in 9,200|
|16p11.2||29.5-30.1||TBX6, KCTD13||del||Autism, obesity (611913)||0.96||67||64||1/241||1 in 5,800|
|16p11.2||29.5-30.1||TBX6, KCTD13||dup||Autism, underweight||0.93||39||28||1/470||1 in 11,300|
|1q21.1||145.0-146.4||HYDIN2||del||1q21.1 deletion syndrome (612474)||0.96||55||47||1/309||1 in 7,400|
|1q21.1||145.0-146.4||HYDIN2||dup||1q21.1 duplication syndrome (612475)||0.96||28||26||1/584||1 in 14,000|
|15q13.2-q13.3||28.7-30.3||CHRNA7||del||15q13.3 microdeletion syndrome (612001)||1||46||42||1/358||1 in 8,600|
|15q13.2-q13.3||28.7-30.3||CHRNA7||dup||Psychiatric disease||0.87||14||20||1/927||1 in 22,200|
|7q11.23||72.2-77.5||ELN, GTF2I, FZD9, LIMK1||del||Williams syndrome (194050)||1||34||42||1/414||1 in 10,000|
|7q11.23||72.2-77.5||ELN, GTF2I, FZD9, LIMK1||dup||7q11.23 duplication syndrome (609757)||1||16||16||1/985||1 in 23,600|
|15q11.2-q13||22.3-26.1||GABRA5||del||Prader-Willi (176270)/Angelman syndromes (105830)||1||41||16||1/552||1 in 13,300|
|15q11.2-q13||22.3-26.1||GABRA5||dup||15q11-q13 duplication syndrome (608636)||1||35||27||1/508||1 in 12,200|
|17q21.31||41.0-41.7||MAPT, KANSL1||del||17q21.31 deletion syndrome (610443)||1||22||23||1/700||1 in 16,800|
|17q21.31||41.0-41.7||MAPT, KANSL1||dup||17q21.31 duplication syndrome (613533)||0.43||21||2||1/1,370||1 in 32,900|
|16p13.11||15.4-16.2||MYH11||del||Autism, ID, and schizophrenia||0.86||22||18||1/788||1 in 18,900|
|16p13.11||15.4-16.2||MYH11||dup||Variable phenotype||0.71||45||24||1/457||1 in 11,000|
|17p11.2||16.6-20.4||RAI1||del||Smith-Magenis syndrome (182290)||1||16||16||1/985||1 in 23,600|
|17p11.2||16.6-20.4||RAI1||dup||Potocki-Lupski syndrome (610883)||1||15||9||1/1,313||1 in 31,500|
|17q12||31.9-33.3||TCF2||del||Renal cysts and diabetes (137920)||0.88||18||14||1/985||1 in 23,600|
|17q12||31.9-33.3||TCF2||dup||Epilepsy||0.86||21||18||1/808||1 in 19,400|
|1q21||144.1-144.5||HFE2||del||Thrombocytopenia-absent radius syndrome (274000)||0.87||17||13||1/1,050||1 in 25,200|
|1q21||144.1-144.5||HFE2||dup||1q21.2 duplication||0.81||9||25||1/927||1 in 22,200|
|8p23.1||8.1-11.8||SOX7, CLDN23||del||8p23.1 deletion syndrome||1||10||7||1/1,853||1 in 44,500|
|8p23.1||8.1-11.8||SOX7, CLDN23||dup||Variable phenotype||1||6||7||1/2,424||1 in 58,200|
|5q35||175.6-176.9||NSD1||del||Sotos syndrome (117550)||1||8||8||1/1,969||1 in 47,300|
|5q35||175.6-176.9||NSD1||dup||Short stature, microcephaly, speech delay||n/a||2||0||1/15,758||1 in 378,200|
|3q29||197.2-198.8||DLG1||del||3q29 microdeletion syndrome (609425)||1||9||6||1/2,101||1 in 50,400|
|3q29||197.2-198.8||DLG1||dup||3q29 duplication syndrome (611936)||1||8||4||1/2,626||1 in 63,000|
|Total||1 in 22||1 in 550|
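The Frequency column of Table 2 appears to be the pooled cohort size (15,749 + 15,767 = 31,516) divided by the combined case count from the two reference cohorts. The Python sketch below recomputes the column and the overall yield under that assumption; the formula is inferred from the tabulated numbers rather than stated in the text, and the names `ROWS` and `one_in_n` are illustrative only.

```python
# Pooled cohort size given in the Table 2 header (two reference cohorts).
POOLED_CASES = 15_749 + 15_767  # 31,516

# (locus, abnormality, cases in ref 46, cases in ref 85), copied from Table 2.
ROWS = [
    ("22q11.2", "del", 93, 96), ("22q11.2", "dup", 32, 50),
    ("16p11.2", "del", 67, 64), ("16p11.2", "dup", 39, 28),
    ("1q21.1", "del", 55, 47), ("1q21.1", "dup", 28, 26),
    ("15q13.2-q13.3", "del", 46, 42), ("15q13.2-q13.3", "dup", 14, 20),
    ("7q11.23", "del", 34, 42), ("7q11.23", "dup", 16, 16),
    ("15q11.2-q13", "del", 41, 16), ("15q11.2-q13", "dup", 35, 27),
    ("17q21.31", "del", 22, 23), ("17q21.31", "dup", 21, 2),
    ("16p13.11", "del", 22, 18), ("16p13.11", "dup", 45, 24),
    ("17p11.2", "del", 16, 16), ("17p11.2", "dup", 15, 9),
    ("17q12", "del", 18, 14), ("17q12", "dup", 21, 18),
    ("1q21", "del", 17, 13), ("1q21", "dup", 9, 25),
    ("8p23.1", "del", 10, 7), ("8p23.1", "dup", 6, 7),
    ("5q35", "del", 8, 8), ("5q35", "dup", 2, 0),
    ("3q29", "del", 9, 6), ("3q29", "dup", 8, 4),
]


def one_in_n(cases_a: int, cases_b: int, pooled: int = POOLED_CASES) -> float:
    """Return N such that the pooled frequency equals 1 in N (assumed formula)."""
    return pooled / (cases_a + cases_b)


# Per-row frequency, printed in the same "1/N" style as the table.
for locus, abnormality, a, b in ROWS:
    print(f"{locus} {abnormality}: 1/{one_in_n(a, b):,.0f}")

# Overall diagnostic yield across all 14 loci (deletions plus duplications).
total_cases = sum(a + b for _, _, a, b in ROWS)
print(f"Total: {total_cases} of {POOLED_CASES:,} cases, "
      f"about 1 in {POOLED_CASES / total_cases:.0f} "
      f"({100 * total_cases / POOLED_CASES:.1f}%)")
```

Under this assumption the pooled counts give 1/167 for the 22q11.2 deletion and an overall yield of about 1 in 22 (4.5%), in line with the tabulated values; a few rows differ from the table by one unit because of rounding.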
Microdeletion and microduplication of the same genomic locus offer an opportunity to study dosage-sensitive genes, especially the opposite phenotypes produced by haploinsufficient and triplosensitive genes. Although clinical evaluation has been complicated by overlapping phenotypes, variable expressivity and reduced penetrance for many newly defined genomic disorders, opposite phenotypes have been seen in a few of them. Comparison of the clinical features of the 7q11.23 microdeletion in Williams syndrome with those of the reciprocal microduplication syndrome (OMIM#609757) revealed different neurologic and behavioral problems. Patients with the 7q11.23 microdeletion show relative strength in expressive language and excessive sociability. In contrast, patients with the 7q11.23 microduplication have speech and language delay, deficits in social interaction and aggressive behavior. The FZD9, LIMK1, CLIP2 and GTF2IRD1 genes have been suggested as candidate genes for the neurologic and behavioral phenotypes. Microdeletion syndrome at 16p11.2 (OMIM#611913) and microduplication syndrome at 16p11.2 (OMIM#614671) were initially associated with ASD, but a subsequent study revealed mirror body mass index phenotypes. Microdeletion at 16p11.2 is often associated with obesity, macrocephaly and ASD, while the reciprocal microduplication is associated with underweight, microcephaly and schizophrenia. Chromosome 1q21.1 microdeletion syndrome (OMIM#612474) and the reciprocal 1q21.1 microduplication syndrome (OMIM#612475) are associated with microcephaly and macrocephaly, respectively, together with developmental and behavioral abnormalities [90-92]. Brunetti-Pierri et al. suggested that the HYDIN paralog located in the 1q21 interval is a dosage-sensitive gene expressed exclusively in brain; HYDIN haploinsufficiency would be responsible for the microcephaly and the HYDIN triplosensitive effect for the macrocephaly.
Reciprocal microdeletions and microduplications can present distinct, overlapping or similar phenotypes that are likely caused by the same dosage-sensitive gene. Hereditary neuropathy with liability to pressure palsies (HNPP) (OMIM#162500) is caused by a microdeletion of the PMP22 gene at 17p12, and Charcot-Marie-Tooth neuropathy type 1A (OMIM#118220) is caused by the reciprocal microduplication. Both present as demyelinating neuropathies, but the phenotypes are distinct. Smith-Magenis syndrome is caused by a microdeletion at 17p11.2 involving the RAI1 gene, and Potocki-Lupski syndrome is caused by the reciprocal microduplication. In humans, both syndromes show variable intellectual disability, motor and speech delay, sleep disturbance, behavioral problems and autistic features. In mouse models, mice with an RAI1 deletion are hypoactive and show decreased anxiety and decreased dominance-like behavior. In contrast, mice with an RAI1 duplication are hyperactive and show increased anxiety and increased dominance behavior. All these findings indicate the presence of different types of dosage-sensitive genes and underline the importance of careful, detailed clinical observation and of a functional understanding of multiple and complex dosage-sensitive mechanisms.
3.3. UPD (uniparental disomy), AOH (absence of heterozygosity) and VUS (variants of unknown significance)
UPD is defined as the inheritance of both homologs of a chromosome pair from a single parent. When both of that parent's homologs are inherited, it is denoted as heterodisomy or heteroUPD. If both copies derive from one parental homolog, it is termed isodisomy or isoUPD. UPD of chromosomes 6, 7, 11, 14 and 15 is known to cause disease. Maternal UPD of chromosome 15, matUPD15, causes Prader-Willi syndrome, while paternal UPD, patUPD15, causes Angelman syndrome. Segmental duplication of maternal 11p15 or paternal deletion of 11p15 causes decreased expression of IGF2, manifesting as the impaired growth of Silver-Russell syndrome. Segmental duplication of paternal 11p15, paternal UPD, or maternal imprinting mutations of 11p15 lead to increased expression of IGF2, manifesting as the overgrowth of Beckwith-Wiedemann syndrome. Currently validated combined CGH-SNP arrays and SNP chips can detect chromosomal and segmental isoUPD, but the detection of heteroUPD requires a concurrent parental study [36-38]. The clinical significance of AOH segments is not clear. One possible disease-causing mechanism is the unmasking of an autosomal recessive phenotype through homozygosity for a single mutation within the AOH segment. Other findings, such as the VUSs detected in approximately 9.3% of pediatric cases, will require follow-up parental studies to determine the parental origin of the VUS and, in some cases, further functional analysis to understand their clinical significance.
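To make concrete why an SNP chip can flag isoUPD or AOH from the proband alone, the following Python sketch scans ordered genotype calls on one chromosome for long uninterrupted runs of homozygous SNPs. It is a deliberately simplified illustration, not the algorithm of any validated platform: the genotype coding ("AA", "AB", "BB"), the function name `find_aoh_segments` and the 5 Mb and 3-SNP thresholds are all assumptions made for the example, and real callers additionally tolerate occasional miscalls and no-calls.

```python
from typing import List, Sequence, Tuple

HOMOZYGOUS = {"AA", "BB"}  # assumed genotype coding; "AB" is heterozygous


def find_aoh_segments(
    positions: Sequence[int],
    genotypes: Sequence[str],
    min_length_bp: int = 5_000_000,  # illustrative reporting threshold
    min_snps: int = 3,               # illustrative minimum marker count
) -> List[Tuple[int, int]]:
    """Return (start, end) spans of consecutive homozygous calls.

    `positions` must be sorted base-pair coordinates on a single chromosome,
    paired one-to-one with `genotypes`.
    """
    segments: List[Tuple[int, int]] = []
    run: List[int] = []  # positions in the current homozygous run

    def flush() -> None:
        # Keep the run only if it is long enough and has enough markers.
        if len(run) >= min_snps and run[-1] - run[0] >= min_length_bp:
            segments.append((run[0], run[-1]))
        run.clear()

    for pos, genotype in zip(positions, genotypes):
        if genotype in HOMOZYGOUS:
            run.append(pos)
        else:
            flush()  # a heterozygous call ends the current run
    flush()  # close a run that extends to the last marker
    return segments


# Toy data: one SNP per Mb, with a homozygous stretch from 2 Mb to 8 Mb.
positions = [1_000_000 * i for i in range(1, 13)]
genotypes = ["AB", "AA", "BB", "AA", "AA", "BB", "AA", "AA", "AB", "AA", "BB", "AB"]
print(find_aoh_segments(positions, genotypes))
```

A heteroUPD segment would pass through this scan unnoticed because the proband remains heterozygous there, which is consistent with the need for a concurrent parental study noted above.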
4. Mapping candidate genes and understanding dosage-sensitive mechanisms
Characterizing the genomic coordinates of pathogenic chromosomal and genomic abnormalities in patients with ID/DD/MCA and ASD provides opportunities to map dosage-sensitive genes. Genomic imprinting is a known dosage-sensitive mechanism in which a gene is expressed from only the paternal or the maternal allele. By a stringent definition, a dosage-sensitive gene will cause opposite phenotypes through haploinsufficiency in a deletion and triplosensitivity in a duplication. However, ascertainment bias could be introduced because a deletion is assumed to produce a more severe phenotype than a duplication. An accurate genotype-phenotype correlation will require systematic and replicated studies on large case-control series. For all newly defined genomic disorders, more family-based and case-control studies of pathophysiology and disease course, as well as functional studies in in vitro cellular models or in vivo animal models, are needed. Functional characterization of dosage-sensitive mechanisms will provide a better understanding of human development and a basis for rational intervention in ID/DD/MCA and ASD.
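As a numerical illustration of the two dosage states, the Python sketch below computes the idealized log2 test/reference ratio that an array would report for a one-copy loss or a one-copy gain in a diploid region. It assumes a non-mosaic sample and a perfectly linear array response, and the function name `expected_log2_ratio` is illustrative rather than taken from any cited tool.

```python
import math


def expected_log2_ratio(copy_number: int, reference_copies: int = 2) -> float:
    """Idealized log2(test/reference) ratio for a given copy number."""
    if copy_number <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive for a finite log2 ratio")
    return math.log2(copy_number / reference_copies)


for label, copies in [
    ("heterozygous deletion", 1),
    ("normal diploid", 2),
    ("duplication", 3),
]:
    print(f"{label}: {copies} copies, log2 ratio {expected_log2_ratio(copies):+.2f}")
```

In this idealized setting the deletion signal (magnitude 1.0) is larger than the duplication signal (about 0.58), so duplications may be technically harder to call, which could add to any ascertainment bias between deletions and duplications.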
4.1. In silico mining reveals interacting brain-expressed genes from cytogenomic abnormalities
The development of high-throughput genomic techniques has prompted the introduction of numerous bioinformatic data mining tools to study gene function and interaction. It has been hypothesized that brain-expressed genes and their interactions play important roles in human mental development, and that the ID phenotype arising from different pCNVs and segmental imbalances may be caused by functional disturbance of genes in interacting pathways or networks. In silico bioinformatic analyses have been used to identify candidate genes and functional interactions from a small case series of pediatric patients and from a large case-control study of ASD patients [95, 96]. To reveal candidate brain-expressed genes (BEGs) and their interacting networks from detected cytogenomic abnormalities, a discovery-driven in silico analysis using bioinformatic tools was performed. Of the 1,354 patients analyzed by oligonucleotide aCGH over a five-year interval in the Yale cytogenetics laboratory, pathogenic abnormalities were detected in 176 patients, including recurrent genomic disorders in 66 patients, subtelomeric rearrangements in 45 patients, interstitial imbalances in 33 patients, chromosomal structural rearrangements in 17 patients and aneuploidies in 15 patients. Subtractive mapping of bCNVs and construction of the smallest overlapped regions defined 82 disjoint critical regions from the detected abnormalities. All genes in these critical regions were sorted by functional annotation using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) and by tissue expression pattern from UniProt. A list of 461 BEGs was generated from 73 disjoint critical regions. Enrichment of central nervous system specific genes in these regions was noted, and the number of BEGs increased with the size of the regions.
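The two interval operations named above, subtractive mapping and construction of a smallest overlapped region, can be sketched in Python as follows. The study's actual procedure is not described in detail here, so this is one plausible reading: bCNVs are taken to be benign CNV intervals that are subtracted from a pathogenic interval, and the smallest overlapped region is taken to be the intersection of overlapping abnormalities from different patients. The function names and the toy coordinates (in Mb, on a single chromosome) are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome, start < end


def subtract(region: Interval, benign: List[Interval]) -> List[Interval]:
    """Remove benign intervals from `region` and return the remaining pieces."""
    pieces: List[Interval] = []
    cursor, end = region
    for b_start, b_end in sorted(benign):
        if b_end <= cursor or b_start >= end:
            continue  # this benign interval lies outside what is left
        if b_start > cursor:
            pieces.append((cursor, b_start))  # keep the gap before it
        cursor = max(cursor, b_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((cursor, end))  # keep the tail after the last benign CNV
    return pieces


def smallest_overlap(regions: List[Interval]) -> Optional[Interval]:
    """Return the interval shared by all regions, or None if they do not overlap."""
    start = max(s for s, _ in regions)
    end = min(e for _, e in regions)
    return (start, end) if start < end else None


# Toy example: three overlapping abnormalities and two benign CNVs.
abnormalities = [(10, 20), (12, 25), (11, 18)]
shared = smallest_overlap(abnormalities)
print("smallest overlapped region:", shared)
if shared is not None:
    print("after subtracting benign CNVs:", subtract(shared, [(13, 14), (17, 30)]))
```

Each remaining piece would correspond to one candidate critical region whose gene content can then be passed to annotation tools.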
Further gene prioritization using Gene Relationships Across Implicated Loci (GRAIL) identified candidate BEGs with significant cross-region interrelations based on PubMed abstracts, gene ontology terms and expression patterns. Figure 3 shows the cross-loci gene ontology interactions from detected deletions and duplications. This result implies that the defined candidate BEGs share cellular components and biological processes. Pathway analysis using Ingenuity Pathway Analysis (IPA) identified five significant gene networks involving cell cycle, cell-to-cell signaling, cellular assembly, cell morphology, and gene expression regulation. Previous studies and our preliminary data support a model of polygenic interactions and multiple functional networks for human mental development. Further experimental study of the cellular function of these candidate genes and their interactions will lead to a better understanding of dosage-sensitive mechanisms.
4.2. In vitro cellular phenotyping and in vivo animal modeling
Little is known about the cellular and developmental functions of many newly identified dosage-sensitive genes. Understanding the dosage-sensitive mechanism of a specific gene or pathway could lead to targeted treatment using protein inhibitors or small RNA interference. The limited availability and accessibility of live brain and neuronal tissue is the major obstacle to studying disease-causing mechanisms in human mental development. Recent progress in stem cell technologies has made it possible to model disease using patient-derived stem cells. In 2010, Marchetto et al. developed a culture system using induced pluripotent stem cells (iPSCs) derived from the fibroblasts of Rett syndrome patients. These Rett syndrome iPSCs were able to undergo X-inactivation and generate functional neurons. Neurons derived from these iPSCs had fewer synapses, reduced spine density, smaller soma size, altered calcium signaling and electrophysiological defects when compared to controls. This cellular model provided critical evidence of an unexplored developmental window before disease onset and enabled direct testing of drug effects in rescuing synaptic defects. Similarly, bona fide iPSCs were generated from fibroblasts of a patient with Prader-Willi syndrome bearing a 4p/15q translocation. These iPSCs retained the DNA methylation in the imprinting center of the maternal allele and showed reduced expression of the disease-associated SNRPN gene; they could therefore be differentiated into neuronal tissue to model the cellular phenotype. Using skin fibroblasts from affected patients or amniocytes from prenatal diagnostic tests, iPSCs for monosomy X (Turner syndrome), trisomy 8 (Warkany syndrome 2), trisomy 13 (Patau syndrome) and partial trisomy 11;22 (Emanuel syndrome) were generated for further studies of global gene expression and tissue-specific differentiation.
All these reports indicate that stem cell technology offers reproducible in vitro cellular phenotypes for a better understanding of neurodevelopment, as well as a testable system for the development of therapeutic approaches.
In vivo animal models generated by genetic manipulation have also been used to study cytogenomic abnormalities. Mouse models of the 16p11.2 deletion and duplication revealed brain anomalies and behavioral disorders in vivo. Overexpression and transcript suppression of the 29 candidate genes from this 16p11.2 region in zebrafish identified the KCTD13 gene as a major driver of the mirrored neuroanatomical phenotypes. Although the physiology and genetic makeup of model animals differ from those of humans, animal models allow direct evaluation of gene-dosage effects and association of neuroanatomical defects with phenotypes. Figure 4 shows the modeling of human cytogenomic abnormalities using stem cell technology and animal models, and the potential application in drug development.
5. Future directions and concluding remarks
In the first decade since the completion of the Human Genome Project, we have witnessed the rapid development of genomic technologies and the integration of genomic analysis into pediatric genetic evaluation. Experience with current genomic analysis has established a systematic approach that includes evaluating the analytical validity of novel technologies using ROC statistics, assessing clinical validity through case series or clinical trials, establishing clinical evidence for detected genomic variants through large-scale case-control studies, and developing practice standards and guidelines as well as web-deliverable databases and resources. This approach will be effective for integrating next-generation exome sequencing and whole-genome sequencing technologies into clinical screening and diagnosis. However, the increased technical complexity of exome or genome sequencing and the intellectual challenge of interpreting large amounts of sequencing data will require more collaborative effort and supportive resources. The ultimate goal of an integrated pediatric and prenatal genetic evaluation is to provide patients with comprehensive and in-depth profiling of karyotype, pCNVs and mutations, together with the associated medical phenotypes and risks.
The aCGH or SNP chip analysis has brought pediatric genetic evaluation into the genomic era. This progress has contributed greatly to our understanding of the genetic etiology in 12%-20% of pediatric patients with DD/ID/MCA/ASD. In addition to the technical progress in genetic diagnosis, the implementation of knowledge-based genetic counseling, rational clinical action and follow-up familial studies could be of direct benefit to a substantial proportion of patients [104, 105]. For example, patients with aggressive behavior associated with a 15q13.3 deletion involving the CHRNA7 gene could benefit from treatment with galantamine, a nicotinic acetylcholine receptor (nAChR) allosteric modulator and acetylcholinesterase (AChE) inhibitor. As we gain more knowledge of these genomic abnormalities through functional analysis using in vitro cellular and in vivo animal models, disease-specific management guidance and treatment could be developed, culminating in fully personalized medicine.
Database of Genomic Variants (DGV): http://projects.tcag.ca/variation/
International Standards for Cytogenomic Arrays (ISCA): https://www.iscaconsortium.org/
DatabasE of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources (DECIPHER): http://decipher.sanger.ac.uk/
Human Genome Browser: http://genome.ucsc.edu/
Online Mendelian Inheritance in Man (OMIM): http://www.ncbi.nlm.nih.gov/omim
Database for Annotation, Visualization, and Integrated Discovery (DAVID): http://david.abcc.ncifcrf.gov/
Gene Relationships Across Implicated Loci (GRAIL): http://www.broadinstitute.org/mpg/grail/
Ingenuity Pathways Analysis (IPA), Ingenuity Systems Inc.: http://www.ingenuity.com/