Table 1. Representative genetic disorders affecting the human population, along with the genes involved.
With improving technology and the accumulation of knowledge, our understanding of the various anomalies affecting the human population has improved steadily. Still, much remains to be done. The prognosis of any aberration in a living system that may result in disease, syndrome or death depends on multiple variables. These include the overall parameters defining the health of an individual, such as age, nutrition and environment. Further, the family health history is suggestive of the health status of the individual concerned. This is where the inherited factors constituting the genome are involved.
2. Differential gene expression and correlation with human diseases/syndromes
Alterations in the genome can be either inherited or acquired. These alterations may or may not lead to disease. Moreover, not all acquired alterations are passed on to the next generation. A few questions need to be addressed to comprehend the situation. What factors decide whether a gene is expressed, how much and when? What alterations in genes lead to diseases? Which conditions are inherited? Are there ways by which the body tries to repair these aberrations? And finally, can the affected part of the genome be replaced in such a way that the individual becomes healthy again?
The expression of a gene is very tightly regulated with respect to when, where and how much it is expressed. A detailed discussion of this regulation is beyond the scope of this article. However, it is worth mentioning that sequence and expression variations can sometimes be directly correlated with a diseased state. Alterations in sequence may or may not be heritable, depending on whether the germline or the somatic cells are affected, respectively. There are situations where the somatic cells carry an altered genome but the germline genome is protected from it, and hence the mutations are restricted to the present generation. This happens primarily for two reasons. First, the germ cells are far more isolated than the somatic cells and hence are often shielded from causative agents in the environment or within the body. Secondly, the repair mechanisms active at the germline level are possibly much more efficient and sensitive than those at the somatic level. Thereby, the chances of the alterations getting repaired before being packaged for the next generation are enhanced. Besides, not all genetic alterations lead to diseases. Some make the individual susceptible to certain conditions, while the actual occurrence and progression of the disease depend on environmental factors and lifestyle. For instance, mutations in the BRCA1 gene have been used successfully in epidemiological studies and in predicting the risk of individuals developing breast cancer. Precautions in lifestyle, drugs and environment taken thereafter have helped in combating the disease. Similarly, about 8 susceptibility loci are known for the occurrence of prostate cancer and their detailed analysis is currently being pursued. So, what exactly are the possible alterations in genes which may lead to diseases?
|Disorder||Point Mutation/s||Aberration in gene/s||Chromosomal aberrations||Gene/s involved||Chromosome||Reference|
|Cri du chat||•||Semaphorin F, Delta catenin||5||Rodriguez-Caballero et al 2010|
|Fabry disease||•||α galactosidase A||X||Saito et al 2011|
|Cystic fibrosis||•||CFTR||7||Madry et al 2011|
|Di George syndrome||•||TBX1||22||Huh and Ornitz 2010|
|De Grouchy syndrome||•||MBP, Galanin receptor||18||Wilson et al 1979|
|Sickle cell anemia||•||Hemoglobin gene||11||Mousa and Qari 2010|
|Siderius X-linked mental retardation syndrome||•||PHF8||X||Abidi et al 2007|
|Wolf-Hirschhorn syndrome||•||WHSC1, WHSC2||4||South et al 2008|
|Myotonic dystrophy DM1||•||DMPK||19||Magana and Cisneros 2010|
|Myotonic dystrophy DM2||•||ZNF9||3||Raheem et al 2010|
|Huntington’s disease||•||HTT||4||Warby et al 2011|
|CADASIL disease||•||Notch3||19||Valenti et al 2011|
|Lesch-Nyhan syndrome||•||HPRT||X||Gucev et al 2010|
|Crohn’s disease||•||CARD15||16||Raelson et al 2007|
|Down syndrome||•||Trisomy 21||Purvey et al 2010|
|Turner syndrome||•||45, X||Lopes et al 2010|
|Klinefelter’s syndrome||•||47, XXY||Judith et al 2009|
2.1. Involvement of genes in disease: mutations, deletions and translocations
Genetic alterations that affect the physiology of the individual and result in a disease or syndrome can be classified into three types. In the first, a point mutation (insertion/deletion) in a single gene results in the diseased phenotype. In the second, multiple genes need to be affected for the diseased phenotype; here, the genes may be partly or fully deleted or translocated, or their length may increase or decrease due to a change in the number of copies of repeat elements. The third comprises disorders caused by the complete or partial loss or gain of a chromosome. Several examples of these anomalies, along with the affected genes/chromosomes, are listed in table 1. The frequency of occurrence of these disorders varies across the world depending on environmental factors, gene pool, lifestyle and even the availability of health care facilities. The inheritance pattern of these diseases depends on the dominant or recessive nature of the causative gene as well as its chromosomal localization. This is discussed in detail in the next section.
2.2. Genes and diseases
2.2.1. Single gene disorders
Disorders caused by alterations in a single gene are called monogenic disorders. Over 6000 such disorders are known to date, and one in every 200 individuals is affected by them. Common examples include cystic fibrosis, sickle cell anemia and Marfan syndrome. These follow the Mendelian laws of inheritance. The responsible gene may be located on an autosome or a sex chromosome. Further, the gene may be dominant or recessive in nature. Hence, monogenic diseases can be classified as autosomal dominant, autosomal recessive, sex linked dominant and sex linked recessive. Their inheritance patterns are explained in figures 1-4.
From these figures, we can conclude that the inheritance pattern of a monogenic disorder is determined by three factors:
The gene being dominant or recessive in expression
Localization of the gene on autosomes or sex chromosomes
Status of the parents with reference to the causative gene.
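The three factors listed above can be made concrete with a small Punnett-square calculation. The following is a minimal sketch in Python, assuming a single autosomal gene with two alleles and equal segregation of alleles. The function name `offspring_genotypes` and the allele symbols (`A`/`a` for a recessive disorder, `D`/`d` for a dominant one) are illustrative choices and do not come from the figures.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return the probability of each offspring genotype for one autosomal gene.

    Each parent is a two-letter genotype such as "Aa"; every allele is assumed
    to be transmitted with equal probability (Mendelian segregation).
    """
    counts = Counter(
        "".join(sorted(pair)) for pair in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Autosomal recessive: two unaffected carriers ("a" is the disease allele).
carriers = offspring_genotypes("Aa", "Aa")
print("affected (aa):", carriers["aa"])

# Autosomal dominant: one heterozygous affected parent ("D" is the disease allele).
dominant = offspring_genotypes("Dd", "dd")
print("affected (Dd):", dominant["Dd"])
```

Changing the parental genotypes changes the expected proportions, which is why the status of the parents appears as the third factor in the list.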
2.2.2. Multi-gene disorders
The number of genes in an organism’s genome is often much smaller than the number of characteristics or traits in its phenotype. This led to the understanding that phenotypic outcomes do not map one-to-one onto individual genes. It has since been established that many heritable traits, such as eye colour, skin colour and height, are determined by multiple factors. Similarly, not all diseases can be attributed to a single gene defect. In most cases more than one genetic element is involved, as in diabetes, cancer and obesity. These disorders do not follow a simple inheritance pattern like the monogenic cases. Instead, the state of the genome at multiple places needs to be monitored to understand the inheritance. Such cases have led to the definition of susceptibility loci, genomic targets whose monitoring has helped to evaluate the risk of an individual developing the disease. Precautions at the lifestyle and genetic level (genetic counselling) have proved very useful in controlling and monitoring the disease.
2.2.3. Mitochondrial genetic disorders
Mitochondria are small circular or rod-like organelles present in the cytoplasm of eukaryotic cells and involved in cellular respiration, which produces energy for the cell. The mitochondrial genome follows “cytoplasmic inheritance” or “maternal inheritance”. At the time of fertilization, the cytoplasm of the zygote (containing the mitochondria) is contributed by the ovum, hence the name. This is illustrated in figure 5.
Each mitochondrion may have several circular DNA molecules, and each cell may have several mitochondria depending upon its activity. Even within the same cell, different mitochondria may differ in their genetic material (heteroplasmy). These variations make every individual suffering from the same disease unique, which creates the need for personalized medicine. However, the most significant aspect of mitochondrial genetics lies in its contribution to a range of disorders, from cancer, diabetes, Parkinson’s disease and stroke to male infertility. Their exact involvement is still being explored.
2.2.4. Y linked disorders
These are caused by alterations in genes located on the Y chromosome. Inheritance of the Y chromosome is restricted to males, being passed on from father to son. Since the Y chromosome has no corresponding homologous chromosome, all its genes show a dominant inheritance pattern. This means that all the sons of an affected father would be affected, while the daughters would be unaffected. One such trait is hypertrichosis pinnae, or hair on the pinnae of the ear.
So far, we have seen the possible means by which different genes can contribute to diseases, either singly or in combination with others. Assessing the expression level of the genes therefore becomes significant for ascertaining their roles at the functional level. The various techniques employed for this purpose are discussed in the next section.
3. Techniques for analyzing gene expression
Given the importance of gene expression profiles under various situations, several techniques have been used to ascertain them. Northern blotting was one of the first approaches used for quantifying gene expression. Here, the RNA is blotted and hybridized before the final quantification. However, two developments radically changed this approach: first, the advent of the polymerase chain reaction (PCR) and secondly, the discovery of reverse transcriptase.
The polymerase chain reaction (PCR), invented by Kary Mullis, is based on thermal cycling, in which a specific stretch of DNA is targeted by DNA oligonucleotides (primers) and amplification is carried out by the thermostable Taq DNA polymerase. These steps result in manifold production of the desired DNA sequence starting from relatively little template (Saiki et al 1985, 1988; Mullis 1990). Since its introduction, it has become an indispensable tool in genetics and molecular biology. It has been used successfully for a range of activities, from clinical diagnostics, cloning and sequencing to site directed mutagenesis, besides others.
Till the discovery of reverse transcriptase, the central dogma of molecular biology was as shown in figure 6.
However, in 1970 Howard Temin and David Baltimore (Baltimore 1970; Temin 1964 a, b, c) independently discovered the existence of an enzyme which could reverse transcribe RNA to DNA. The enzyme was named “reverse transcriptase” for its action and both the scientists shared the 1975 Nobel Prize for the discovery. The central dogma was then modified as shown in figure 7.
This discovery has had far reaching consequences for the understanding of many aspects of RNA viruses in particular (Hurwitz and Leis 1972; Kulesh et al 1987; Rodgers et al 1995). Here, we focus on its significance in studying the expression of genes. Reverse transcriptase creates a single stranded DNA from an RNA template. Thus, the RNA repertoire from a source can be reverse transcribed to cDNA, in which the relative levels of different genes represent their expression status. This is discussed in the subsequent sections. The enzyme also made it possible to access eukaryotic coding sequences without the introns. Hence, it has been immensely helpful in understanding and analyzing the splicing of different genes.
3.1. Reverse transcription
A combination of the above two approaches is used extensively for gene expression analysis. Here, the total RNA is reverse transcribed to cDNA containing the repertoire of transcribed genes in ratios comparable to their expression. These levels can then be estimated by hybridization with labelled gene specific probes and subsequent comparison of the signal intensities. However, with advancements in technology and the exhaustive information available about many genomes, more sensitive approaches are used today. These include microarray and real time PCR, which are discussed in the coming sections.
3.2. Microarray analysis
The property of nucleotides to bind specifically (A-T; G-C) forms the essence of all hybridization experiments. Small specific probes (oligonucleotides) have been radioactively labelled and employed for ascertaining the presence of genes by Southern hybridization. The same principle, together with the need to screen thousands of targets at the same time, drove the development of microarray technology. It began with the analysis of 378 bacterial lysates having different sequences, and now over 30000 genes can be analyzed at one go. A flowchart of the microarray experiment is shown in figure 8. Microarrays have been used successfully for a range of applications including gene expression profiling, comparative genomic hybridization, detection of SNPs and analysis of splicing (Hacia et al 1999; Lashkari et al 1997; Nuwaysir et al 2002; Pollack et al 1999; Schena et al 1995; Shalon et al 1996). A typical experiment for gene expression profiling involves hybridizing cDNA to an array of microscopic spots, each spot having a bound probe (an oligonucleotide specific for a gene), and finally detecting hybridization using fluorescence or chemiluminescence.
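The base-pairing rule behind probe hybridization can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it looks for perfect matches only and ignores mismatches, melting temperature and secondary structure. The sequences and the helper names `reverse_complement` and `probe_sites` are invented for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A-T and G-C pairing)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_sites(probe, target):
    """Return 0-based positions on `target` where `probe` pairs perfectly."""
    site = reverse_complement(probe)
    width = len(site)
    return [
        start
        for start in range(len(target) - width + 1)
        if target[start:start + width] == site
    ]


# Hypothetical cDNA fragment and probe, invented for illustration only.
target = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
probe = "CAGCGGCCC"

print(probe_sites(probe, target))
```

A probe that returns an empty list has no perfectly complementary site on the target and would not be expected to give a specific signal for it.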
A microarray experiment for profiling gene expression requires at least two sets of cDNA, as the final signal intensities need a reference point for their interpretation (Wei et al 2004). For instance, the expression profile of genes in a diseased individual requires a normal individual as reference. Similarly, the profile in cells which have undergone a treatment can be compared with that of untreated cells. The two or more sets of cDNA (of which one is used as the reference) hence constitute the sample for a microarray experiment.
The number of genes to be screened in an experiment varies. If no information is available regarding the targets to be screened, it is advisable to go for exhaustive screening of all possible genes. However, if preliminary information is available as to which pathways would be affected or which genes may be targeted in a particular situation, then it makes sense to have multiple probes for that set of genes rather than assessing the whole transcriptome.
The probes used in a microarray experiment are short oligonucleotide sequences specific for a particular gene. If a particular repertoire of genes needs to be focussed on, multiple probes are designed against different regions of each gene to verify the results.
As mentioned earlier, expression levels are indicated by post-hybridization intensities, so an increase or decrease in intensity relative to the reference sample gives the expression profile of the studied genes. Since equal amounts of sample are used for hybridization, an increase in intensity corresponds to an increase in expression and vice versa. This is illustrated in figure 9.
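The comparison of intensities against a reference can be sketched as a log2 ratio per probe. The example below is a simplified illustration, assuming the intensities have already been background-corrected and normalized. The gene names, the intensity values and the twofold cutoff are hypothetical choices, not values from this chapter.

```python
import math


def log2_ratios(sample, reference):
    """Return log2(sample / reference) for every probe present in both sets."""
    return {
        gene: math.log2(sample[gene] / reference[gene])
        for gene in sample
        if gene in reference and sample[gene] > 0 and reference[gene] > 0
    }


def classify(log_ratio, cutoff=1.0):
    """Label a probe using an assumed cutoff of 1 log2 unit (twofold change)."""
    if log_ratio >= cutoff:
        return "up"
    if log_ratio <= -cutoff:
        return "down"
    return "unchanged"


# Hypothetical spot intensities (arbitrary units), not measured data.
reference = {"geneA": 500.0, "geneB": 800.0, "geneC": 1200.0}
sample = {"geneA": 2000.0, "geneB": 820.0, "geneC": 300.0}

ratios = log2_ratios(sample, reference)
calls = {gene: classify(value) for gene, value in ratios.items()}
print(calls)
```

Working on the log2 scale makes a fourfold increase and a fourfold decrease symmetric around zero, which is one reason it is commonly used for two-sample comparisons.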
3.2.4. Advantages and limitations
Thus, microarray analysis gives an idea of the fate of genes in different circumstances. The greatest advantage of this approach lies in the ability to screen thousands of targets in a single experiment. This becomes particularly significant when studying a disease, or the response to a chemical, stress or radiation, where the genetic targets are not known.
However, although variations in intensity reflect expression profiles and can be quantified, these quantifications are not a true representation of the relative expression of genes. This is primarily because detection through fluorescence or chemiluminescence is an indirect approach. Though detection has improved, a lot still needs to be done (Tang et al 2007). To date, microarray falls short of providing a confident numerical value for how the expression levels were affected. Moreover, the approach can be applied only to those organisms for which exhaustive information is available about both the genome and its corresponding transcriptome.
3.3. Real time PCR
In conventional PCR methods, the product is visualized at the end of the reaction. Since amplification of the target sequence from the template takes place in an exponential manner, the difference in product intensities after, say, 20 amplification cycles is not a true representation of the difference at the beginning. This difference is all the more critical if gene expression levels leading to diseased or affected phenotypes have to be determined from cDNA. Under such circumstances, the initial level of the target gene in the cDNA needs to be determined. This has led to the development of real time PCR, in which the PCR reaction itself can be monitored with the help of fluorescent dyes. Hence, the initial template can be quantified. It has been used successfully for estimating the relative expression of genes, copy number variations and allelic profiling, besides others (Mackay et al 2002; Nailis et al 2006; Nolan et al 2006; Spackman et al 2008). This article focuses on the use of relative quantification (RQ) for ascertaining gene expression levels.
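The link between the starting amount of template and the cycle at which the signal becomes detectable can be illustrated with an idealized model. The sketch below assumes perfect doubling in every cycle and no plateau, which real reactions do not achieve in late cycles. The threshold and the copy numbers are arbitrary values chosen for the example.

```python
def cycles_to_threshold(initial_copies, threshold, efficiency=1.0):
    """Count cycles until the idealized product amount reaches the threshold.

    Assumes every cycle multiplies the product by (1 + efficiency), with no
    plateau; real reactions level off in late cycles.
    """
    copies = float(initial_copies)
    cycles = 0
    while copies < threshold:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


THRESHOLD = 1e9  # arbitrary detection threshold in copies (assumed)

low = cycles_to_threshold(1_000, THRESHOLD)
high = cycles_to_threshold(8_000, THRESHOLD)
print(low, high, low - high)
```

In this model an eightfold difference in starting template shows up as a difference of three cycles, since 2 ** 3 = 8. This is the reason the cycle at which a sample crosses the threshold carries information about the initial amount of template.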
Relative expression studies require two things: cDNA preparation from RNA and PCR amplification using gene specific primers. An overview is provided in figure 10. There are two established approaches for detecting amplification in real time reactions: SYBR green and Taqman.
3.3.1. SYBR green assays
SYBR green (SG) is a dye which binds to any double stranded DNA. The DNA dye complex absorbs blue light (λmax=488nm) and emits green light (λmax=522nm), hence the name (Zipper et al 2004). Its chemical structure has been shown in figure 11.
An overview of a real time PCR reaction using the SG dye is shown in figure 12. The SG dye has no binding specificity of its own and binds to any dsDNA present. It even binds to ssDNA, though with a much lower affinity. Hence, any additional source of DNA in the reaction would lead to a false signal. To ensure that the signal observed as amplification is specific to the target from our template source, we take the following steps:
Check for dissociation curve
The dissociation curve is used to determine the Tm of the product after the reaction is over. A single peak is indicative of specific amplification, while multiple peaks indicate non-specific detection. A reaction can be used for analysis only if the corresponding dissociation curve has a single peak. Moreover, two controls are required to check for any DNA/RNA contamination. The first is a template on which reverse transcription has not been carried out, and the second is a no template control (NTC), in which water is added in place of the template. Any amplification in the first signifies DNA contamination of the RNA preparation, whereas in the second it indicates that the water used for the reaction is contaminated with nucleic acid. Owing to the sensitivity of the reaction, all these checks must be carried out for accurate results.
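The single-peak criterion can be sketched as a count of maxima in the negative derivative of fluorescence with respect to temperature. The example below uses purely synthetic curves, and its melting temperatures, amplitudes and `min_height` cutoff are assumptions made for illustration. Instrument software applies smoothing and its own peak-calling rules, which are not reproduced here.

```python
import math


def melt_signal(temps, transitions):
    """Simulate fluorescence as a sum of sigmoidal melting transitions.

    `transitions` is a list of (tm, amplitude) pairs; purely synthetic data.
    """
    return [
        sum(amp / (1.0 + math.exp(t - tm)) for tm, amp in transitions)
        for t in temps
    ]


def count_melt_peaks(temps, fluorescence, min_height):
    """Count local maxima of -dF/dT that rise above `min_height`."""
    slopes = [
        -(fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        for i in range(len(temps) - 1)
    ]
    peaks = 0
    for i in range(1, len(slopes) - 1):
        if slopes[i] > min_height and slopes[i - 1] < slopes[i] >= slopes[i + 1]:
            peaks += 1
    return peaks


temps = [60.0 + 0.5 * i for i in range(71)]  # 60 to 95 degrees C

# One specific product versus the same product plus a lower-Tm by-product.
specific_only = melt_signal(temps, [(84.8, 100.0)])
with_byproduct = melt_signal(temps, [(84.8, 100.0), (75.3, 40.0)])

print(count_melt_peaks(temps, specific_only, min_height=2.0))
print(count_melt_peaks(temps, with_byproduct, min_height=2.0))
```

Under this criterion only the first reaction would be carried forward for analysis.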
3.3.2. Taqman assays
Real time PCR assays employing Taqman chemistry differ from those with SG primarily in two aspects. First, in addition to the target specific primers employed for amplification, a probe located between the two primers is present. Secondly, the fluorescence comes from the labelled probe (5’ end) and hence is target specific, unlike SG, which binds to any DNA present. A representation of real time PCR using Taqman chemistry is shown in figure 13.
There are multiple options for labelling the probe, such as FAM and VIC. Since the probe is responsible for specificity, multiplex reactions can be carried out using the Taqman approach. This is not feasible with SG assays.
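To compare the expression level of a target gene between two or more samples, two terms need to be introduced: endogenous control and calibrator. The endogenous control is a gene whose expression level is known to be fairly constant. Examples include housekeeping genes such as β-actin and GAPDH. The expression of the control needs to be ascertained in all the samples. The calibrator is the sample with reference to which the expression levels of the other samples are calculated. For example, if the purpose is to study variations in the expression level of the androgen receptor (AR) in patients suffering from prostate cancer, then a normal person should be taken as the calibrator. A real time PCR assay gives the cycle at which a particular sample reaches the threshold level. How this information is used to obtain relative expression levels is explained below (Pfaffl 2001). If,
X0 = number of target molecules at cycle number 0
Xn = number of target molecules at cycle number “n”
Ex = efficiency of PCR amplification of the target
then the equation for target amplification is as follows

$$X_n = X_0 \, (1 + E_x)^{n} \qquad \text{(Eq. 1)}$$

Similarly, for the endogenous control, with R0, Rn and Er defined in the same way, the equation for amplification can be written as

$$R_n = R_0 \, (1 + E_r)^{n} \qquad \text{(Eq. 2)}$$

The threshold cycle (Ct) is the cycle number at which the amplification crosses the threshold fluorescence. The amount of product at the threshold is a constant for the target (Kx) as well as for the endogenous control (Kr). Hence Eq. 1 and Eq. 2 can be written as

$$X_0 \, (1 + E_x)^{\mathrm{Ct}_x} = K_x \qquad \text{(Eq. 3)}$$

$$R_0 \, (1 + E_r)^{\mathrm{Ct}_r} = K_r \qquad \text{(Eq. 4)}$$

where
Ctx = threshold cycle number for target
Ctr = threshold cycle number for endogenous control
Taking the ratios

$$\frac{X_0 \, (1 + E_x)^{\mathrm{Ct}_x}}{R_0 \, (1 + E_r)^{\mathrm{Ct}_r}} = \frac{K_x}{K_r} = K \qquad \text{(Eq. 5)}$$

Let’s assume both the amplifications are occurring with the same efficiency. Thereby,

$$E_x = E_r = E \qquad \text{(Eq. 6)}$$

Substituting Eq. 6 in Eq. 5 we get

$$\frac{X_0}{R_0} \, (1 + E)^{\mathrm{Ct}_x - \mathrm{Ct}_r} = K \qquad \text{(Eq. 7)}$$

Normalizing the target with the control, that is, writing XN = X0/R0 and ∆Ct = Ctx - Ctr, the equation becomes

$$X_N \, (1 + E)^{\Delta \mathrm{Ct}} = K \qquad \text{(Eq. 8)}$$

Rearranging Eq. 8,

$$X_N = K \, (1 + E)^{-\Delta \mathrm{Ct}} \qquad \text{(Eq. 9)}$$

Finally, comparing the value of XN for any sample (XNS) with reference to the chosen calibrator (XNC), relative quantification is given by

$$\frac{X_{NS}}{X_{NC}} = \frac{K \, (1 + E)^{-\Delta \mathrm{Ct}_S}}{K \, (1 + E)^{-\Delta \mathrm{Ct}_C}} \qquad \text{(Eq. 10)}$$

$$\Delta\Delta \mathrm{Ct} = \Delta \mathrm{Ct}_S - \Delta \mathrm{Ct}_C \qquad \text{(Eq. 11)}$$

$$RQ = \frac{X_{NS}}{X_{NC}} = (1 + E)^{-\Delta\Delta \mathrm{Ct}} \qquad \text{(Eq. 12)}$$

Considering the efficiency of both reactions to be 1, Eq. 12 for relative quantification becomes

$$RQ = 2^{-\Delta\Delta \mathrm{Ct}} \qquad \text{(Eq. 13)}$$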
Let’s take a hypothetical example to understand this. Relative expression assays were done to ascertain the levels of AR expression in prostate cancer patients with reference to normal males. β-actin was taken as the endogenous control. The CT values obtained and the subsequent calculations are shown in table 2. From the table, it can be observed that, with reference to the calibrator (expression level 1), the patients show up to 272-fold higher expression of AR.
Similarly, an expression profile can be obtained for any number of genes across different sample sets. Since the approach does not quantify expression in absolute terms, it is known as the relative quantification method. Further, as the expression level is determined by the value of ∆∆CT, the approach is often also referred to as the ∆∆CT method.
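The ∆∆CT calculation can be sketched in a few lines of Python. The CT values below are hypothetical replicates chosen for easy arithmetic and are not the values of table 2. The function names are illustrative, and equal amplification efficiency for target and control is assumed, with a default efficiency of 1 (perfect doubling).

```python
from statistics import mean


def delta_ct(target_cts, control_cts):
    """∆CT = mean CT of the target gene minus mean CT of the endogenous control."""
    return mean(target_cts) - mean(control_cts)


def relative_quantification(sample_dct, calibrator_dct, efficiency=1.0):
    """RQ = (1 + E) ** (-∆∆CT), with ∆∆CT = sample ∆CT - calibrator ∆CT."""
    ddct = sample_dct - calibrator_dct
    return (1.0 + efficiency) ** (-ddct)


# Hypothetical replicate CT values for AR (target) and β-actin (control).
calibrator_dct = delta_ct(
    target_cts=[30.0, 30.5, 29.5], control_cts=[18.0, 18.5, 17.5]
)
patient_dct = delta_ct(
    target_cts=[25.0, 25.5, 24.5], control_cts=[18.0, 18.5, 17.5]
)

print(relative_quantification(calibrator_dct, calibrator_dct))
print(relative_quantification(patient_dct, calibrator_dct))
```

With these numbers the calibrator has an expression level of 1 by construction, and the hypothetical patient, whose ∆CT is 5 cycles lower, comes out at 2 ** 5 = 32 times the calibrator level.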
3.3.4. Advantages and limitations
|Sample||AR CT||Avg. CT||β-actin CT||Avg. CT||(sample ∆CT - calibrator ∆CT)|
The foremost advantage that real time PCR has over conventional methods is that it assigns a numerical value to expression levels with great accuracy. The assays are relatively easy to design and perform. However, there are a few limitations as well. Though housekeeping genes have long been used successfully as controls, several reports suggest that even their expression may be affected in certain cases. The way forward is to use multiple controls and select the one exhibiting the least variation (Dhanasekaran et al 2010). Further, designing assays is very difficult when sufficient information is not available about the organism’s genome/transcriptome, because the specificity of the primers/probes cannot be ascertained conclusively. Moreover, in the case of genomes very rich in GC content, designing primers and probes that meet the required Taqman specifications has often proved difficult. Since a specific assay needs to be designed for each gene, real time PCR can be used only when candidate targets are available. It does not give exhaustive data indicative of potential targets; instead, it helps to explore candidate targets once they are known through other approaches. All these limitations notwithstanding, real time PCR will continue to be a very effective and accurate approach for studying gene expression.
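The idea of running several candidate controls and keeping the one with the least variation can be sketched as follows. This is a deliberately simple illustration that ranks candidates by the standard deviation of their CT values across samples. The CT values and the placeholder name `candidate_3` are hypothetical, and dedicated reference-gene tools use more elaborate statistics.

```python
from statistics import stdev


def most_stable_control(ct_by_gene):
    """Return the candidate control whose CT varies least across samples.

    Uses the sample standard deviation of CT as a simple stability measure.
    """
    return min(ct_by_gene, key=lambda gene: stdev(ct_by_gene[gene]))


# Hypothetical CT values for three candidate controls across five samples.
candidates = {
    "beta-actin": [18.0, 18.4, 17.7, 19.1, 18.2],
    "GAPDH": [20.1, 20.0, 20.2, 20.1, 19.9],
    "candidate_3": [10.5, 11.6, 9.8, 10.9, 12.0],
}

print(most_stable_control(candidates))
```

In this invented data set GAPDH varies least and would be chosen as the endogenous control. With real samples the outcome depends on the tissue and the treatment, which is why the check has to be repeated for each study.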
4. Targets for “Gene Therapy”
The identification of the genetic basis of diseases is significant only if that information can be used to cure or at least manage the disease. Many approaches have been used for the treatment and management of different diseases, mostly at the protein level. These include vaccination, antibiotics, hormones and a wide range of drugs. Recently, the focus has been on “modifying”, or rather “correcting”, the gene alteration which caused the disease in the first place. This approach is broadly what constitutes “gene therapy”. In principle, gene therapy refers to the replacement of an affected gene with a normal gene. This can be achieved primarily by two approaches:
Ex-vivo: Cells are removed from the tissue in which the affected gene needs to be expressed normally, and a normal copy of the gene is introduced into these cells. Subsequently, these cells, now expressing the target gene normally, are re-introduced into the body.
In-vivo: A normal copy of the gene is directly introduced into the body through viral vectors or a liposome mediated approach.
There are various challenges which need to be considered while using gene therapy as a treatment option. Here, we focus on choosing a target for gene therapy rather than on the approach itself.
The foremost requirement for any gene to be targeted for gene therapy is that the defect in the gene should be responsible for the disease. When a gene is affected, it may either stop expressing or express an altered, non-functional protein. The altered protein may itself need to be addressed when it interferes with the normal one; this situation is often referred to as dominant negative. In such cases, besides introducing a normal copy of the gene, the altered copy needs to be removed as well. If it cannot be removed, there are two options: either silence the altered gene and introduce a normal copy, or repair the gene itself. Both approaches have been used.
Secondly, it should be known that introducing a normally expressing gene into the system cures the diseased state. This is relatively easy to determine in the case of monogenic disorders. However, gene therapy can also be used in multi-factorial disorders. In such cases, it is necessary to identify the gene which has a dominant functional role, so that the multiple elements can be controlled through one.
Thirdly, the expression profile of the target gene needs to be available. This includes the various tissues where the gene is expressed. If its differential expression in a particular tissue is leading to the disease, that tissue needs to be targeted.
Fourthly, the feasibility of introducing the target gene into the concerned tissue needs to be explored. The gene has to be delivered such that it is expressed only in the desired tissues; otherwise it may lead to complications.
Lastly, the patient needs to be monitored for a certain duration after successful expression of the introduced gene.
There has been a constant increase in our understanding of the genetic aspects of diseases, particularly in the last decade or so. This can be attributed to two reasons. First, the pace of technological advancement has been the highest in this period. This has resulted in faster, more sensitive and more efficient generation as well as dissemination of data. Secondly, during this period there has been a tremendous increase in public awareness of, and participation in, health programmes. This has made the analyses as well as the predictions statistically significant. Moreover, awareness has led to better diagnosis and management of diseases. Though the situation has improved, globally there are still many areas where a lot needs to be done. Unless a disease and its pathogen are completely eradicated from the world, the evolution of the pathogen may lead to more potent variants of the disease.
The chances of an individual becoming diseased while living in a safe, suitable environment and having a balanced diet are very low. Primarily, it is our lifestyle which is responsible for our diseased body. Various socioeconomic and environmental factors are equally responsible for the life we live. However, awareness and precautions remain the best approach to staying healthy.
The cause of diseases at the physiological level is an imbalance between the three pillars of life: DNA, RNA and protein. Since DNA is the storehouse of all information, the cause can mostly be traced back to it. DNA is a sequence of nucleotides which governs life. There are several mechanisms which ensure that DNA remains uncorrupted. If it does get affected, however, there are means to repair it. These machineries act as a barrier to any wrongful transmission of the genetic material across generations. Still, genetic diseases are inherited, possibly contributing not only to the evolution of the species but also to that of the disease.
Hypothetically, it should be possible to cure a disease caused by an alteration in a gene by giving the body a normal copy of that gene. Even the body on its own tries to do the same, which often leads to gene duplications resulting in copy number variations. This thought has formed the basis of gene therapy. Achieving this in reality, however, requires exhaustive information about the pathogenesis of the disease, the genes involved, their sequence variations and their expression profiles. Though the idea is very promising, for it to be successful it needs to be used cautiously and in combination with other approaches.