Advances in Molecular Analysis of Muscular Dystrophies


Introduction
Molecular genetic testing began in the mid-1980s in research laboratories, where linkage analysis was used to aid disease gene discovery (Petersen 2000; Ensenauer, Michels et al. 2005). As novel disease-causing genes were identified, genetic tests became available and were launched in clinical testing laboratories in both academic and commercial settings. Unlike assays for complex diseases such as cardiovascular diseases and cancers, diagnostic assays for monogenic Mendelian disorders are relatively easy to design and use in a clinical diagnostic setting. Muscular dystrophies (MDs) are a group of genetically and clinically heterogeneous hereditary myopathies characterized by hypotonia, skeletal muscle weakness, contractures, and delayed motor development. Most are monogenic diseases and are inherited in either a dominant or a recessive manner. However, because several disorders caused by closely associated genes have overlapping phenotypes or similar clinical presentations, the diagnosis is often elusive. MDs are broadly classified into nine types: Duchenne (DMD), Becker (BMD), limb girdle (LGMD), congenital (CMD), facioscapulohumeral (FSHD), myotonic (DM), oculopharyngeal (OPMD), distal, and Emery-Dreifuss (EDMD), some of which have several subtypes based on the gene involved. The clinical manifestations and severity of the various types and subtypes vary widely, ranging from mild myopathy to cardiac failure. Because of this heterogeneity and phenotypic overlap, patients often face a diagnostic odyssey before receiving the appropriate clinical and molecular diagnosis (Mendell, Sahenk et al. 1995; Mendell 2001). With recent improvements in molecular technologies, the classification of MDs in particular has shifted significantly from a phenotype-driven scheme towards a molecular categorization.
It is therefore of pivotal importance to establish the molecular basis of the disease, which includes determining the gene and the genotype involved. Molecular diagnosis is important not only for subsequent patient follow-up but also for choosing the appropriate personalized therapy. Single gene sequencing is considered effective when a single missing protein is identified by muscle biopsy and loss of that protein fits the phenotype. However, a comprehensive gene sequencing panel is necessary when results are ambiguous or when muscle biopsies are difficult to obtain. Recent advances in next generation sequencing and microarrays have made it possible to screen a large number of genes for causative mutations at a fairly low cost and in a reasonably short time. In this chapter we discuss the technological advances in the molecular diagnosis of muscular dystrophies and their impact in the clinical world.

Mutation spectrum in genes associated with MD
As discussed in other chapters in this book, each type and subtype of MD is caused by mutations in a different gene associated with muscle structure and function. Identifying the gene is therefore critical to diagnosis and treatment. The molecular approach to diagnosis depends heavily on the mutation spectrum of the disease-causing gene. For example, DMD, which is associated with Duchenne and Becker dystrophies, has a high frequency of intragenic deletions (65%), whereas the mutation spectrum of CAPN3, which is involved in LGMD2A, shows a high frequency of point mutations (76%) (Figure 1). Based on these mutation spectra, deletion-duplication analysis is suggested before sequencing analysis for DMD, while the reverse order is suggested for CAPN3.

Traditional methods used for molecular diagnosis
Since genetic testing for disease began in the mid-1980s, several DNA- and protein-based diagnostic methods have been developed. The traditional methods of diagnosis for muscular dystrophies include linkage analysis, multiplex PCR, multiplex ligation-dependent probe amplification (MLPA), quantitative PCR, Southern blotting, immunoblotting (IB) and immunohistochemical (IHC) analysis. While MLPA, PCR and Southern blotting involve DNA analysis, IHC and IB assess protein expression and require muscle biopsies. Because a muscle biopsy is highly invasive, it is not preferred by either patients or physicians. Recent advances in molecular analysis for mutation detection have revolutionized the approach to diagnosing these patients (Witkowski 1989; Gangopadhyay, Sherratt et al. 1992; Whittock, Roberts et al. 1997; Ginjaar, Kneppers et al. 2000; Beroud, Carrie et al. 2004).

Immunohistochemistry
Prior to the development of gene-based mutation detection, the various MDs were diagnosed through conventional screening of affected individuals by clinical examination, which involved assessment of creatine phosphokinase (CPK) levels and immunohistological examination of muscle tissue that could be obtained only through an invasive biopsy (Love and Davies 1989; Love, Forrest et al. 1989). As can be seen in the figure below, dystrophic muscle fibers can be easily distinguished from control or normal fibers. The distinguishing characteristics include high variation in fiber size and shape, a high frequency of internal nuclei, increased connective and adipose tissue between fibers, and the presence of a large number of regenerating and degenerating fibers in dystrophic musculature (Norwood, de Visser et al. 2007). However, these pathological findings vary widely with the protein involved, the associated disease, and the age of the patient at biopsy. Although immunohistochemical findings may help confirm a muscular dystrophy, identifying the exact protein (or gene) involved, and therefore the specific subtype, often remains elusive. This is because of the secondary reduction in the levels of other closely integrated proteins (Hack, Ly et al. 1998). For example, a mutation in one sarcoglycan can often lead to reduced expression of the other sarcoglycans as well (Hack, Ly et al. 1998). Molecular diagnosis is therefore highly recommended to confirm the gene involved and hence the specific subtype of MD.
The figure below shows the immunological findings for an individual with mutations in calpain3 (CAPN3), which cause LGMD2A (Figure 2). Although calpain3 analysis is not shown here, a secondary reduction in β-sarcoglycan can be observed. Moreover, in the same patient, immunoblotting analysis showed an occasional reduction of dysferlin protein levels as well. This underlines the importance of molecular diagnosis through DNA analysis for proper management and therapy.

Multiplex western blotting
Immunoblotting or western blotting provides an alternative to IHC. IB provides more information than IHC regarding the expression and mutation of the protein. Skeletal muscle proteins such as dystrophin, dystroglycans, sarcoglycans and laminin-2, which are associated with Duchenne/Becker muscular dystrophies and with different LGMDs and CMDs, physically interact and integrate to provide structural stability to the muscle fiber cells. A mutation in one protein may therefore alter the expression or stability of these closely associated proteins, leading to overlapping phenotypes. Hence, examining the expression levels of all these proteins simultaneously may help diagnose the disease more accurately. Instead of analyzing each protein individually, as in IHC and regular IB, a multiplex WB using a cocktail of antibodies can be adopted (Anderson and Davison 1999). This is made possible by the differences in molecular size among the proteins. In our laboratory, a variety of antibodies covering different domains of the proteins have been selected to avoid any variability in the hybridization and analyzed as shown (Figure 3).
In the figure below (Figure 3A), at least seven different proteins involved in various MDs were analyzed simultaneously in a set of clinically diagnosed dystrophic patients. Two different cocktails, each made of a set of antibodies targeting different proteins, have been optimized for such diagnosis in our laboratory. As can be seen in the figure, controls (lanes to the left) clearly express dystrophin, while no detectable levels could be observed in the patients (lanes to the right). Though there appears to be a secondary reduction in the sarcoglycans, as previously described, the complete absence of dystrophin strongly indicates that the causative protein (or gene) is probably dystrophin. Further molecular analysis involving deletion-duplication analysis or sequencing of the entire gene is required to confirm the exact mutation. Similarly, carrier and disease status can be inferred by immunoblotting analysis for other MDs, such as limb girdle muscular dystrophy type 2B (LGMD2B), by comparing dysferlin protein expression (Figure 3B and C). The low expression of dysferlin protein in lane 2 (Figure 3B) indicates probable carrier status, while the smaller band in lane 2 (Figure 3C) indicates a probably functional dysferlin protein carrying a deletion. Both findings were later confirmed by sequencing analysis.

Multiplex PCR
Multiplex PCR amplification of genomic DNA is a conventional and cost-effective method for identifying deletions in hot-spot regions of a gene in affected individuals; the bands are visualized on regular agarose gels stained with ethidium bromide. Using a combination of several primer pairs in 2-3 reactions, most of the exons of DMD, including the hot-spot regions, can be tested at a reasonable expense (Beggs, Koenig et al. 1990). However, dosage analysis is required for carrier females and is performed by quantitative PCR (qPCR), in which the fluorescence of SYBR Green dye during the logarithmic phase is directly proportional to the copy number of the target sequence. Both deletions and duplications can be identified by qPCR. For example, there would be no amplification of the target product in a male individual with a deletion, while in a female deletion carrier the amount of product amplified would be half the amount observed in normal individuals, who start with two copies of the target sequence. Similarly, in a duplication carrier this ratio would be 3:2 compared to normal. Though these PCR-based tools were useful for Duchenne and Becker muscular dystrophies, where a majority (70-75%) of disease-causing mutations are deletions or duplications, they were not preferred for other muscular dystrophies.
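The dosage reasoning above can be illustrated with a minimal sketch. It assumes the widely used comparative Ct calculation (2^-ΔΔCt), which the text does not specify, a reference amplicon of known normal dosage, and a normal female control carrying two copies of the target. The function names and the classification cutoffs are hypothetical, chosen midway between the expected ratios of 0.5, 1.0 and 1.5.

```python
def relative_dosage(ct_target_patient, ct_ref_patient,
                    ct_target_control, ct_ref_control):
    """Target dosage in the patient relative to the control (2^-ddCt).

    Assumes roughly 100% amplification efficiency for both amplicons,
    so that one Ct cycle corresponds to a two-fold difference.
    """
    delta_patient = ct_target_patient - ct_ref_patient
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_patient - delta_control)


def classify_female_dosage(ratio):
    """Classify a dosage ratio measured against a two-copy female control.

    Cutoffs are illustrative assumptions, not validated clinical thresholds.
    """
    if ratio < 0.75:
        return "deletion carrier"      # expected ratio 1:2 = 0.5
    if ratio < 1.25:
        return "normal"                # expected ratio 2:2 = 1.0
    return "duplication carrier"       # expected ratio 3:2 = 1.5


# Hypothetical Ct values: the patient's target amplifies one cycle later
# than the control's target, relative to the reference amplicon.
ratio = relative_dosage(26.0, 24.0, 25.0, 24.0)
print(ratio, classify_female_dosage(ratio))  # 0.5 deletion carrier
```

In practice each laboratory would establish its own cutoffs from replicate measurements of known normal and carrier samples.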

Southern blotting
Southern blotting is an alternative technique for screening for deletions and duplications in large genes like DMD. It involves gel electrophoresis, transfer of the separated fragments onto a membrane, and subsequent detection of the target fragments by hybridization to known probes. It is used in clinical laboratories to confirm deletion and duplication mutations identified by multiplex PCR as well as to determine the extent of the deletion or duplication. Dosage analysis of copy number can also be performed through Southern blotting followed by densitometry, but this can be challenging for carrier females (Medori, Brooke et al. 1989).

MLPA
Multiplex ligation-dependent probe amplification is a variation of conventional PCR and a significant advance over multiplex PCR, as it permits multiple different targets to be amplified with only a single primer pair (Schouten, McElgunn et al. 2002). Clinically applicable MLPA-based simultaneous screening of all 79 exons of the DMD gene for deletions and duplications in Duchenne and Becker muscular dystrophy patients was developed around 2005 and is still widely used in clinical laboratories around the world (Schwartz and Duno 2004; Janssen, Hartmann et al. 2005; Lalic, Vossen et al. 2005). It does not require costly equipment, is very cost-effective, and has therefore been widely accepted by diagnostic laboratories. In addition to deletions and duplications, it may also identify point mutations. Because MLPA depends heavily on hybridization of a single probe, false-positive results can occur when a variant (single nucleotide change, deletion or insertion) is present in the sequence that hybridizes to the probe. For this reason, single-exon deletions need to be investigated further. The presence of such a variant may also hinder the precise definition of the end points of deletions.

Sequencing
While most of the DNA-based methods described above are effective for detecting deletions and duplications, they are not preferred for the analysis of point mutations. As discussed earlier, the majority of the smaller genes associated with MDs have a higher frequency of point mutations than of deletions and duplications. Only DMD has a high frequency of deletions and duplications, and even there point mutations still account for at least 35%. Thorough diagnosis of such MDs therefore requires sequence analysis of the exonic regions. The usual practice is PCR amplification of each exon of the suspected gene with exon-specific primer pairs, followed by Sanger sequencing. The patient sequences thus obtained are compared to reference sequences, and sequence variations as fine as a single base-pair change (point mutation) and small indels are efficiently identified (Figure 4). However, such PCR amplification and sequence analysis is feasible mainly for small genes that have only a few exons. Further, the clinical presentation needs to be thoroughly characterized to narrow down the possible causative gene. The overlapping clinical phenotypes and heterogeneity of MDs often lead to suspicion of more than one gene. In such a scenario, analysis of more than one gene, some of which have a large number of exons (79 in DMD, 55 in DYSF, 24 in CAPN3), may be very tedious and expensive for clinical diagnosis. High-throughput, cost-effective methods for disease diagnosis have therefore always been in demand.
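The comparison of a patient sequence with a reference can be illustrated with a deliberately simplified sketch. It assumes the two sequences are already aligned and of equal length, so it reports only single-base substitutions; detecting indels or heterozygous mixed peaks requires alignment and trace analysis, which are not shown. The function name and the example sequences are hypothetical.

```python
def find_substitutions(reference, patient):
    """Return (position, reference_base, patient_base) for each mismatch.

    Positions are 1-based. Both sequences must be pre-aligned and of
    equal length; this sketch does not handle insertions or deletions.
    """
    if len(reference) != len(patient):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = zip(reference.upper(), patient.upper())
    return [(pos, ref_base, pat_base)
            for pos, (ref_base, pat_base) in enumerate(pairs, start=1)
            if ref_base != pat_base]


# Hypothetical exon fragment with one point mutation at position 5.
reference = "ATGGCCTGA"
patient = "ATGGTCTGA"
print(find_substitutions(reference, patient))  # [(5, 'C', 'T')]
```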

New technological advances in molecular diagnosis
The completion of the Human Genome Project has revolutionized the field of human genetics and, more specifically, human medical genetics (Venter, Adams et al. 2001). High-throughput mutation detection methods such as array-based comparative genomic hybridization (aCGH) and target capture based next generation sequencing panels have been developed. This section focuses on these rapidly emerging comprehensive technologies and their advantages over traditional methods, including a detailed discussion of microarray-based gene panels and next generation sequencing.

Application of microarrays to detect copy number variation in MD genes
Microarray-based comparative genomic hybridization (aCGH), also called molecular karyotyping, is a recently developed technique that enables high-resolution, genome-wide analysis of genomic copy number variations (CNVs). The assay has become a powerful routine clinical diagnostic tool and, with its increasing resolution and accuracy, is gradually replacing traditional cytogenetic approaches for CNV determination. Previously, even the highest quality G-banded chromosome analysis could detect only genomic imbalances larger than 5-10 Mb. With the advent of aCGH, deletions and duplications as small as 50-100 kb are now routinely detected throughout the genome (Stankiewicz, Pursley et al. 2010). The widespread application of aCGH has also facilitated the identification of various recurrent CNVs and the eventual characterization of several microdeletion and microduplication syndromes.
In a typical aCGH measurement, total genomic DNA is isolated from test (patient) and reference cell populations (or from a biological sample such as blood or saliva), differentially labeled, and hybridized to oligonucleotide arrays. The relative hybridization intensity, which is ideally proportional to the relative copy number of the target sequence regions, is then measured and calculated. Because the reference genome is normal, any increase or decrease in the intensity ratio directly indicates a DNA copy number variation (deletion or duplication) in the test or patient genome. The intensity data are typically normalized so that the modal ratio for the reference genome is set to a standard value of 0.0; any decrease is inferred as a deletion, while an increase in signal in the test genome is inferred as a duplication (Figure 5).
Fig. 5. Array CGH data for the locus chr15:40,697,686-40,713,512, obtained using a custom-designed 385K high-density array from NimbleGen. The zoomed in view of the corresponding array highlights the breakpoints for a patient with a deletion mutation encompassing exon 10 to exon 12, with breakpoints in intron 9 and the 3' UTR. The target sequence with normal copy number normalizes to a value of 0.0, while the deleted region spanning exon 10 to exon 12 falls below 0.0, indicating the loss of a copy (deletion) compared to the reference genome.
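The normalization and calling logic of an aCGH measurement can be sketched as follows. This is an illustration under stated assumptions rather than the analysis pipeline used for the figures: intensities are expressed as log2 ratios, the median is used as a simple stand-in for the modal ratio, and the gain and loss thresholds are hypothetical values. For orientation, a heterozygous deletion ideally gives log2(1/2) = -1.0 and a heterozygous duplication log2(3/2), about 0.58.

```python
import math
from statistics import median


def log2_ratios(test_intensities, reference_intensities):
    """Per-probe log2(test / reference) hybridization ratios."""
    return [math.log2(t / r)
            for t, r in zip(test_intensities, reference_intensities)]


def center(ratios):
    """Shift ratios so the typical (here: median) probe sits at 0.0."""
    mid = median(ratios)
    return [value - mid for value in ratios]


def call_probe(value, loss_threshold=-0.5, gain_threshold=0.3):
    """Label one centered log2 ratio; thresholds are assumed, not validated."""
    if value <= loss_threshold:
        return "loss"
    if value >= gain_threshold:
        return "gain"
    return "normal"


# Hypothetical intensities for seven probes across a gene.
test = [100, 100, 50, 50, 100, 150, 100]
reference = [100] * 7
calls = [call_probe(v) for v in center(log2_ratios(test, reference))]
print(calls)
# ['normal', 'normal', 'loss', 'loss', 'normal', 'gain', 'normal']
```

Real data are noisy, so calls are normally made on runs of consecutive probes rather than on single probes as in this sketch.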
In our laboratory, a gene-targeted high-resolution oligonucleotide CGH array was custom designed on a NimbleGen 385K platform (until June 2010) or an OGT 44K platform (July 2010 onwards) to detect deletions and duplications in 450 genes associated with various genetic disorders. The NimbleGen 385K platform used long oligonucleotides (45-60 mer) to achieve isothermal Tm across the array, with repeat sequence masking implemented to ensure greater sensitivity and specificity. The OGT 44K platform has 44,000 unique sequence probes tiled on the array. Both arrays were designed with an average spacing of 10 bp within coding regions and 25 bp within promoter regions, intronic regions, and the 3' UTR, with repeat sequence masking. The use of intronic oligonucleotide probes allows robust detection of dosage changes across the entire genomic region of the gene, as well as determination of approximate breakpoints. The breakpoints for various deletions and duplications detected by the high-resolution aCGH analysis were as close as 500 bp to the exact breakpoints determined by conventional Sanger sequencing (Ankala, Kohn et al. 2012). The figure below shows aCGH data for several patients with DMD (Figure 6). Deletion-duplication analysis of the entire DMD gene by aCGH shows the various intragenic regions that were found deleted in the patients. The zoomed in view of the data also shows the specific exons that were found deleted in these patients, thus confirming the molecular diagnosis.
Fig. 6. The column on the left shows comparative genomic hybridization data for the DMD gene locus (chrX:31,137,345-33,229,673) for DNA from six patients, using a custom-designed 385K high-density array from NimbleGen. The right column shows the zoomed in view of the corresponding array on the left, with the deleted exons highlighted in red. Each row refers to a patient.
Such a robust diagnostic test, capable of detecting disease-causing copy number variations (deletions or duplications) in a single assay, makes clinical diagnosis rapid and economical. Almost all known MD-associated genes can be analyzed simultaneously for mutations through one affordable test, which is especially useful when the clinical phenotypes overlap heavily and narrowing down the candidate genes is difficult.

Panel-based approach for mutation detection in MDs such as CMD and LGMD
Because subtypes of MD caused by different genes share similar clinical presentations, simultaneous sequencing of all associated genes as a panel reduces costs and provides a quick diagnosis. Several such sequencing panels for different MDs are currently offered in clinical diagnostic laboratories for molecular diagnosis. At least four different sequencing panels are offered at Emory Genetics Laboratory (EGL), including panels for LGMD, CMD and DMD. Table 1 summarizes the genes involved in LGMD and the number of exons in each of these genes.

Application of next generation sequencing to molecular diagnosis
The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing several thousands of reads or sequences simultaneously in a single reaction (Church 2006; Hall 2007). These high-throughput sequencing technologies have lowered the cost of DNA sequencing beyond what has been possible with standard Sanger sequencing methods (Schuster 2008). A variety of technologies, collectively called next generation sequencing technologies, have emerged, each with a unique biochemical strategy (Brenner, Johnson et al. 2000; Church 2006; Valouev, Ichikawa et al. 2008; Drmanac, Sparks et al. 2010; Porreca 2010). In general, most of these approaches use an in vitro clonal amplification or PCR step to amplify the DNA molecules present in the sample. Illumina or Solexa sequencing, Applied Biosystems' SOLiD sequencing, and Ion semiconductor sequencing developed by Ion Torrent Systems Inc. are the most recent and popular next generation sequencing techniques currently used in clinical laboratories for molecular diagnosis. The sensitivity and specificity of next generation sequencing are further improved by capturing the genomic regions of interest from a biological sample or DNA. In particular, when specific regions of the genome need to be targeted, or when the gene of interest has been narrowed down to a specific region of a chromosome, target enrichment methods may be used to enrich the sample for these genomic regions before it is processed for analysis of variants or mutations by next generation sequencing. Several capture technologies are available on the market, namely microarray-based capture for DMD, RainDance (ten Bosch and Grody 2008) and Fluidigm PCR-based capture, Agilent SureSelect, and NimbleGen Sequence Capture (ten Bosch and Grody 2008).
Our comparative analysis of these capture technologies suggests that the RainDance and Fluidigm PCR-based strategies are well suited to clinical panels, as they are robust and give coverage of ~100X, whereas the in-solution sequence capture protocols from Agilent and NimbleGen are well suited to research-based approaches for identifying novel genes, because they can capture a large number of genes (the exome) in a single experiment (Bainbridge, Wang et al.; Cirulli, Singh et al.).
These target enrichment methods are used in both research and clinical laboratories. In research laboratories, they allow for new gene discovery when CNVs in a particular genomic region correlate with a recurrent clinical phenotype. Clinical laboratories have a different application, in which a certain set of disease genes (such as those discussed in the gene panels above) is targeted for mutation detection in a patient sample. For example, in our clinical laboratory at Emory Genetics Laboratory, target enrichment is performed for a set of 91 genes associated with X-linked intellectual disability (XLID) to identify the causative gene and mutation in XLID patients. Target enrichment and next generation sequencing in combination are among the most successful and economical diagnostic tools currently in use in both research and clinical laboratories.
Table 1. Known LGMD subtypes and causative genes. Shown are the various subtypes of LGMD and the causative genes that have been found to be associated with each of them. Also listed is the number of exons in each of these genes, to give an overview of the number of sequencing reactions that may be required to sequence and analyze the entire list of genes. Titin alone has 312 exons, and combined with all other genes the exons total 550, which may require more than 550 PCR reactions for amplification and sequencing, considering that some exons may be too long for a single sequencing reaction.

Whole exome and whole genome sequencing
Whole genome sequencing refers to re-sequencing of the entire genome of an individual, while whole exome sequencing refers to selective sequencing of only the coding regions of the genome, namely the exons. Both strategies may be applied to identify causal genes and associated mutations for any genetic disorder. However, routine whole genome sequencing is still not feasible because of the high cost associated with the technology. Whole exome sequencing, on the other hand, is quite affordable, as it involves targeted sequencing of all the exons (around 300,000), which account for only about 1% (36.5 Mb) of the entire human genome (Senapathy, Bhasi et al. 2010). Since these protein-coding regions are estimated to harbor at least 85% of disease-causing mutations, the technology is highly attractive (Choi, Scholl et al. 2009). Currently, whole exome sequencing is used in research laboratories to identify new genes associated with diseases. It involves extensive data mining, validation and confirmation analysis, which makes it quite expensive for clinical applications. Whole exome sequencing for mutation detection is currently offered in only one or two clinical laboratories. Using whole exome sequencing, new candidate genes for several Mendelian disorders have been identified, demonstrating its potential (Ng, Turner et al. 2009; Jones, Ng et al. 2012). To illustrate this potential, we discuss a patient whom we analyzed through whole exome sequencing and in whom we identified pathogenic mutations in a novel gene. This patient had been tested for all known genes associated with his clinical presentation but was negative for any pathogenic mutations. We then performed whole exome analysis, which yielded a large number of variants. Using several filters, such as score, coverage and allele percentage, we narrowed down the variants. We found two novel mutations that were later confirmed through Sanger sequencing (Figure 7).
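The variant filtering step can be illustrated with a minimal sketch. The text names the filter criteria (score, coverage and allele percentage) but not the values used, so the thresholds, field names and example variants below are hypothetical and serve only to show the logic.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str            # gene symbol (placeholder names in this example)
    quality: float       # variant call quality score
    depth: int           # read coverage at the variant position
    alt_fraction: float  # fraction of reads supporting the variant allele


def passes_filters(variant, min_quality=30.0, min_depth=20,
                   min_alt_fraction=0.25):
    """True if the variant meets all three (assumed) thresholds."""
    return (variant.quality >= min_quality
            and variant.depth >= min_depth
            and variant.alt_fraction >= min_alt_fraction)


def filter_variants(variants, **thresholds):
    """Keep only variants that pass every filter."""
    return [v for v in variants if passes_filters(v, **thresholds)]


# Hypothetical exome calls: only the first passes all three filters.
calls = [
    Variant("GENE_A", quality=50.0, depth=80, alt_fraction=0.48),
    Variant("GENE_B", quality=10.0, depth=90, alt_fraction=0.50),  # low score
    Variant("GENE_C", quality=60.0, depth=5, alt_fraction=0.50),   # low coverage
    Variant("GENE_D", quality=45.0, depth=60, alt_fraction=0.10),  # low allele %
]
print([v.gene for v in filter_variants(calls)])  # ['GENE_A']
```

Variants that survive such filters would still need confirmation by an orthogonal method, as was done here with Sanger sequencing.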

Summary and future directions
We believe whole exome sequencing will become more feasible in the near future and will allow easier identification of disease-associated genes and mutations. This will require more sophisticated algorithms to filter out false positives, leaving fewer variants for confirmation and validation. Further, studies of genotype-phenotype correlation will be very useful and will help tease apart the various subtypes of the disease (Straub, Rafael et al. 1997; Culligan, Mackey et al. 1998; Gullberg, Tiger et al. 1999; Yurchenco, Cheng et al. 2004; Vainzof, Ayub-Guerrieri et al. 2008). This may be achieved through an integrated approach involving gene expression studies (geneST arrays) and protein antibody arrays. GeneST arrays are expected to give a global muscle expression profile and to indicate the variability in gene expression by identifying the up- and down-regulated genes (Vachon, Loechel et al. 1996; Tsao and Mendell 1999; Yamamoto, Kato et al. 2004; Wakayama, Inoue et al. 2008). In addition, active patient registries should be maintained for each disease type to provide ready access to a pool of information, including clinical presentations and causative mutations. This will allow a better understanding of genotype-phenotype correlation and provide a more focused approach to the molecular diagnosis of the disease.