Polymorphism, or variation in DNA sequence, can affect individual phenotypes such as skin or eye color, susceptibility to diseases, and response to drugs, vaccines, chemicals, and pathogens. Polymorphisms occur more often in the population than mutations (frequency ≥ 1%). The most common polymorphism in the human genome is the single nucleotide polymorphism (SNP), a single base change in a DNA sequence. SNPs have been used as molecular markers in a wide range of studies. Genome-wide association studies (GWAS) search for SNPs that occur more frequently in people with a particular disease than in people without the disease and pinpoint genes or regions that may contribute to the risk of disease. This chapter describes polymorphisms, SNPs, GWAS, linkage disequilibrium (LD), minor allele frequency, haplotypes, methods for SNP genotyping, and applications of SNPs and genome-wide association studies in human diseases and drug development.
- drug development
- genome-wide association studies
- human diseases
- single nucleotide polymorphism
1. Introduction
The phenotype of a living organism is controlled by its DNA. Variation in DNA sequence, or polymorphism, may produce individual differences in phenotype, in risk of various diseases, and in response to drugs, vaccines, chemicals, and pathogens. Polymorphisms commonly occur in nature and are related to biodiversity, genetic variation, and adaptation. They help to maintain a variety of forms in a population living in a varied environment and are preserved by frequency-dependent selection. This chapter focuses on human polymorphisms related to diseases and drug response. Since the completion of the Human Genome Project, a large number of polymorphisms have been found in the population [2, 3, 4]. The most abundant type of variation is the single nucleotide polymorphism (SNP), with more than 9 million reported in public databases [5, 6, 7]. In this chapter, several terms are defined, including polymorphism, minor allele frequency (MAF), allele frequency, haplotype, and linkage disequilibrium (LD). SNPs, genome-wide association studies (GWAS), methods to detect SNPs, and the application of SNPs to the study of diseases and to drug development are the main topics discussed.
2. Polymorphism
Genetic polymorphism, as defined by Cavalli-Sforza and Bodmer, is the occurrence in the same population of two or more alleles at one locus, each with appreciable frequency, where the minimum frequency is typically taken as 1%. An allele is one of the variant forms of a gene at a specific locus on a homologous chromosome. The different forms of a polymorphism (alleles) are observed more often in the general population than mutations. The most common polymorphism in the human genome is the single nucleotide polymorphism (SNP).
2.1. Single nucleotide polymorphism (SNP or snip)
SNPs are popular molecular genetic markers in disease genetics and pharmacogenomic research. An SNP is a single base change in a DNA sequence, usually with two possible nucleotides at a given position. The variation occurs at a specific position in the genome and has an allele frequency of 1% or greater. Around 325 million SNPs have been identified in the human genome, 15 million of which are present at frequencies of 1% or higher across different populations worldwide. An example of an SNP is shown in Figure 1, which compares the DNA sequences of two individuals at a specific position in the human genome. The DNA sequence of person 1 has a C nucleotide at this position, as do most other people (the majority group), whereas the DNA sequence of person 2 has a T at this position, which is found in a minority of the population. There is therefore an SNP at this position, with alleles C and T.
The majority of SNPs have two alleles, which represent a substitution of one base for another. The allele present at an SNP may differ between the two chromosome copies of an individual. The allele that occurs more frequently in the general population is called the “major” allele, whereas the allele that is rarer in the population is designated the “minor” allele. Since humans are diploid and carry two copies of each chromosome, an individual can have one of several genotypes: homozygous for the major allele, homozygous for the minor allele, or heterozygous. Many SNPs are correlated with one another, so it is difficult to distinguish the SNP that affects the phenotype from the several SNPs associated with it.
SNPs are identified and characterized by sequencing the same genomic region in several populations [13, 14]. The sample size of the population being resequenced is important: in general, larger sample sizes are needed to identify SNPs at the lower end of the minor allele frequency spectrum. The minor allele frequency (MAF) is the frequency at which the less common allele occurs in a given population. According to population genetics theory, for a SNP detection rate of 99%, SNPs with a minor allele frequency of 5% or greater require 48 chromosomes to be sampled, whereas SNPs with a minor allele frequency of 1% or greater require 192 chromosomes.
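The relationship between sample size and detection rate can be explored numerically. The sketch below is one possible way to arrive at figures of the order quoted above; it is not taken from the cited work. It assumes that a SNP is detected when both alleles appear at least once among the sampled chromosomes, and that minor allele frequencies follow the folded neutral frequency spectrum, with density proportional to 1/(x(1 - x)). Both assumptions, and the function names, are illustrative.

```python
def detection_probability(maf: float, n_chromosomes: int) -> float:
    """Probability that both alleles of a SNP with the given minor allele
    frequency appear at least once among n independently sampled chromosomes."""
    return 1.0 - (1.0 - maf) ** n_chromosomes - maf ** n_chromosomes


def expected_detection_rate(min_maf: float, n_chromosomes: int,
                            steps: int = 20000) -> float:
    """Average detection probability over SNPs with MAF in [min_maf, 0.5].

    Assumption (not stated in the text): MAF follows the folded neutral
    spectrum, with density proportional to 1 / (x * (1 - x)).
    """
    width = (0.5 - min_maf) / steps
    weighted_sum = 0.0
    total_weight = 0.0
    for i in range(steps):
        x = min_maf + (i + 0.5) * width  # midpoint rule
        weight = 1.0 / (x * (1.0 - x))
        weighted_sum += weight * detection_probability(x, n_chromosomes)
        total_weight += weight
    return weighted_sum / total_weight


if __name__ == "__main__":
    for min_maf, n in [(0.05, 48), (0.01, 192)]:
        rate = expected_detection_rate(min_maf, n)
        print(f"MAF >= {min_maf:.0%}, {n} chromosomes: {rate:.3f}")
```

Under these assumptions a SNP at exactly 5% frequency is detected less often than the average over all SNPs at 5% or above, because most SNPs in that class have higher frequencies and are easier to find.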
Currently, large-scale SNP genotyping can be performed by automated machines, which facilitates genetic association studies using DNA-based markers. The Human Genome Project ranked SNP discovery and characterization as high priorities [16, 17] and encouraged the public and private sectors [4, 18, 19] to direct effort toward these objectives.
2.1.1. Effects of SNPs location
The location of an SNP determines its possible effects. SNPs within a gene may alter protein structure. SNPs in a regulatory region outside a gene may affect when and how the gene is turned on, and therefore the quantity of protein produced. They may also affect gene splicing, transcription factor binding, or the sequence of non-coding RNA. SNPs that are not in the proximity of a gene may be used as genetic markers for locating disease-causing genes (Figure 2).
2.1.2. Types of SNPs
As described earlier, SNPs may fall within the coding sequences of genes, the non-coding regions of genes, or the intergenic regions (regions between genes). SNPs in the coding region of a gene are divided into two types: synonymous and nonsynonymous. Synonymous SNPs do not change the amino acid sequence of the protein and therefore usually do not affect protein function. Nonsynonymous SNPs are divided into two types: missense and nonsense. In a missense SNP, the single nucleotide change results in a codon that codes for a different amino acid, which may alter or abolish protein function. In a nonsense SNP, the change converts a codon into a stop codon, which results in a truncated and usually nonfunctional protein product. SNPs in the non-coding regions of genes or in the intergenic regions may affect gene splicing (SNPs in introns), transcription factor binding (SNPs in the 5′ untranslated region), messenger RNA degradation, or the sequence of non-coding RNA. An SNP located upstream or downstream of a gene that affects gene expression is referred to as an expression SNP (eSNP).
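The distinction between synonymous, missense, and nonsense SNPs can be illustrated with a short script that translates a codon before and after a single base change. This is a minimal sketch that assumes the standard genetic code and a codon given on the coding strand; the function name is hypothetical, and changes that remove a stop codon are not treated as a separate class.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a single base change within a codon.

    codon    -- reference codon on the coding strand, e.g. "GAG"
    position -- 0-based position of the SNP within the codon (0, 1, or 2)
    alt_base -- alternative base at that position
    """
    ref_amino_acid = CODON_TABLE[codon]
    alt_codon = codon[:position] + alt_base + codon[position + 1:]
    alt_amino_acid = CODON_TABLE[alt_codon]
    if alt_amino_acid == ref_amino_acid:
        return "synonymous"
    if alt_amino_acid == "*":
        return "nonsense"
    return "missense"  # note: loss of a stop codon also lands here


if __name__ == "__main__":
    print(classify_coding_snp("GAA", 2, "G"))  # GAA -> GAG, both glutamate
    print(classify_coding_snp("GAG", 1, "T"))  # GAG -> GTG, glutamate to valine
    print(classify_coding_snp("TGG", 2, "A"))  # TGG -> TGA, tryptophan to stop
```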
3. Association study
An association study compares the polymorphism patterns obtained by analyzing DNA from a group of individuals affected by a disease with those from a group of unaffected individuals. Such a study demonstrates the link between a polymorphism and a disease or drug response.
3.1. Minor allele frequency
Minor allele frequency (MAF) refers to the frequency at which the less common allele occurs in a given population. MAF is widely employed in GWAS of complex traits. SNPs with a minor allele frequency of 5% or greater were targeted by the HapMap project.
3.2. Allele frequency and genotype frequency
Allele frequency is the relative frequency of an allele at a particular locus in a population.
The genotype frequency in a population is the number of individuals with a given genotype divided by the total number of individuals in that population. In population genetics, the genotype frequency is thus the proportion of each genotype in a population (0 <
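The definitions of genotype frequency, allele frequency, and minor allele frequency can be made concrete with a small calculation. The following sketch assumes a biallelic SNP with each diploid genotype written as a two-letter string such as "CT"; the sample data and function names are made up for illustration.

```python
from collections import Counter


def genotype_frequencies(genotypes):
    """Proportion of individuals carrying each genotype ("TC" is counted as "CT")."""
    counts = Counter("".join(sorted(g)) for g in genotypes)
    n_individuals = len(genotypes)
    return {genotype: count / n_individuals for genotype, count in counts.items()}


def allele_frequencies(genotypes):
    """Proportion of each allele among all chromosomes (two per individual)."""
    counts = Counter(allele for g in genotypes for allele in g)
    n_chromosomes = sum(counts.values())
    return {allele: count / n_chromosomes for allele, count in counts.items()}


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele; 0.0 if only one allele is observed."""
    frequencies = allele_frequencies(genotypes)
    if len(frequencies) < 2:
        return 0.0
    return min(frequencies.values())


if __name__ == "__main__":
    # Hypothetical sample of 10 individuals at a C/T SNP.
    sample = ["CC"] * 6 + ["CT"] * 3 + ["TT"] * 1
    print(genotype_frequencies(sample))    # CC 0.6, CT 0.3, TT 0.1
    print(allele_frequencies(sample))      # C 0.75, T 0.25
    print(minor_allele_frequency(sample))  # 0.25
```

In this example the 10 individuals carry 20 chromosomes, of which 15 have C and 5 have T, so T is the minor allele with a frequency of 25%.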
3.3. Haplotype
A haplotype is a combination of alleles at different markers along the same chromosome that are inherited as a unit. Each haplotype is a combination of major and minor alleles along the chromosome, and each individual is represented twice to account for the maternal and paternal contributions. The fundamental difference between haplotypes and individual genotypes at SNPs is that in a haplotype the alleles are assigned to a chromosome.
Haplotypes provide information about the exchange of DNA during meiosis (recombination), which is useful for locating mutations that are associated with diseases by linkage methods. Recombination also affects linkage disequilibrium.
3.4. Linkage disequilibrium (LD)
In population genetics, linkage disequilibrium (LD) is the non-random association of alleles at different loci in a given population; the loci may or may not be on the same chromosome. Loci are said to be in linkage disequilibrium when the frequency of association of their different alleles is higher or lower than would be expected if the loci were independent and associated randomly. Analysis of LD can detect differences between the SNP patterns of two groups and reveal which pattern is most likely associated with a disease-causing gene or with response to certain drugs.
LD is an important concept in genetic studies that aim to identify and localize genes related to disease susceptibility. LD is commonly used to indicate that two genes are physically linked. It is defined as the difference between the observed frequency of a particular combination of alleles at two loci and the frequency expected if haplotypes formed at random from those alleles. In the absence of LD, the frequency of a particular allele at a given locus is independent of the alleles at other linked loci. LD plays a crucial role in current methods for mapping genes associated with complex diseases or traits, and is therefore relevant to the study of health and disease. The level of linkage disequilibrium is influenced by a number of factors, including genetic linkage, selection, the rate of recombination, the rate of mutation, genetic drift, non-random mating, and population structure.
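The definition of LD as the difference between observed and expected haplotype frequencies corresponds to the coefficient usually written D = p(AB) - p(A)p(B). The sketch below computes D from haplotype counts at two biallelic SNPs. It also reports r², a commonly used normalized measure that is not defined in the text and is included here as an addition. The haplotype counts and the function name are hypothetical.

```python
def linkage_disequilibrium(haplotype_counts, allele_a, allele_b):
    """Return (D, r_squared) for two biallelic loci.

    haplotype_counts -- dict mapping two-letter haplotypes (first letter is the
                        allele at locus 1, second letter the allele at locus 2)
                        to the number of chromosomes carrying them
    allele_a         -- reference allele at locus 1
    allele_b         -- reference allele at locus 2
    """
    total = sum(haplotype_counts.values())
    p_ab = haplotype_counts.get(allele_a + allele_b, 0) / total
    p_a = sum(c for h, c in haplotype_counts.items() if h[0] == allele_a) / total
    p_b = sum(c for h, c in haplotype_counts.items() if h[1] == allele_b) / total

    # Observed haplotype frequency minus the frequency expected under independence.
    d = p_ab - p_a * p_b

    # r^2 normalizes D^2 by the allele frequencies (addition, not from the text).
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, r_squared


if __name__ == "__main__":
    # Hypothetical counts for a C/T SNP followed by an A/G SNP.
    linked = {"CA": 40, "CG": 10, "TA": 10, "TG": 40}
    independent = {"CA": 25, "CG": 25, "TA": 25, "TG": 25}
    print(linkage_disequilibrium(linked, "C", "A"))       # D about 0.15
    print(linkage_disequilibrium(independent, "C", "A"))  # D = 0
```

When the two loci associate at random, as in the second example, the observed haplotype frequency equals the product of the allele frequencies and D is zero.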
3.5. Genome-wide association studies (GWAS)
GWAS identify common disease-associated variants by using high-throughput genotyping equipment to examine hundreds of thousands of common SNPs. These common genetic variants are compared between large numbers of affected cases (patients) and unaffected controls (non-patients) to determine whether they are associated with the disease (Figure 3) [24, 25].
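For a single SNP, the case-control comparison can be sketched as a test on a 2 × 2 table of allele counts. The example below uses an allelic chi-square test with one degree of freedom and an odds ratio, which is one common choice and not necessarily the analysis used in the cited studies. The counts are invented, the function name is hypothetical, and all four cells are assumed to be non-zero.

```python
import math


def allelic_association(case_minor, case_major, control_minor, control_major):
    """Chi-square test (1 degree of freedom) and odds ratio for allele counts.

    Returns (chi_square, p_value, odds_ratio). Assumes every count is > 0.
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d

    # Pearson chi-square for a 2 x 2 table, without continuity correction.
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))

    # Odds of carrying the minor allele in cases relative to controls.
    odds_ratio = (a * d) / (b * c)
    return chi_square, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical study: 500 cases and 500 controls, i.e. 1000 chromosomes each.
    chi_square, p_value, odds_ratio = allelic_association(300, 700, 200, 800)
    print(f"chi-square = {chi_square:.2f}, p = {p_value:.2e}, OR = {odds_ratio:.2f}")
```

In a genome-wide study this test would be repeated for every genotyped SNP, so the resulting p-values need to be interpreted with a correction for the very large number of comparisons.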
In most chromosome regions there is strong association among SNPs; therefore, only a few SNPs in each region (tag SNPs) need to be genotyped to predict the alleles of the remaining SNPs in that region. Selecting the best tag SNPs requires accurate mapping of the LD pattern among SNPs, which differs across ancestral groups. The need for precise LD maps to support genetic association studies stimulated the development of the human haplotype map [26, 27]. GWAS pinpoint genes that may contribute to the risk of developing a disease. The data derived from GWAS provide information about disease etiology, therapeutic targets, and gene function.
4. Methods for detection of SNPs
SNP genotyping strategies typically involve allele discrimination and allele detection. Other methods are based on the physical properties of DNA.
4.1. Allele discrimination methods
4.1.1. Primer extension
These approaches involve allele-specific incorporation of nucleotides in a primer extension reaction with a DNA template, using enzyme specificity to achieve allelic discrimination. In a common primer extension (CPE) protocol, a designed primer anneals with its 3′ end near the SNP site and nucleotides are added by a polymerase. The extended nucleotide is identified by either mass or fluorescence to determine the SNP genotype. Because primer and assay design are simple and multiple SNPs can be detected at the same time, several SNP genotyping methods use CPE. CPE-based methods that use MALDI-TOF MS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry) to discriminate alleles include the PinPoint assay [31, 32], MassEXTEND™ [33, 34], SPC-single base extension (SBE), and the GOOD assay. In these methods, SNP-specific primers are extended simultaneously with different nucleotides, using PCR products as templates, to yield extended products of different masses. The SNP genotypes are then determined by mass analysis of the products (Figure 4a). The PinPoint assay is the simplest technique, using dideoxynucleotides (ddNTPs) for single base extension (SBE) of the primer [31, 32]. CPE approaches that use fluorescence-based detection involve SBE of the primer with fluorescently labeled ddNTPs (Figure 4b). Specific primer extension (SPE) methods use two primers that have the same nucleotide sequence except for an allele-specific base at their 3′ end. A primer is extended only if the nucleotide at its 3′ end pairs perfectly with the SNP allele on the complementary template; allelic discrimination can then be examined by the mass difference of the products using gel electrophoresis (Figure 4c).
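The logic of single base extension can be mimicked in a few lines of code. The sketch below is a deliberately simplified model, not a description of any of the named assays: it assumes that the primer ends immediately before the SNP, that the primer sequence is written in the same orientation as the allele sequences, and that the base incorporated is simply the next base of each allele. The sequences and the function name are invented.

```python
def single_base_extension(allele_sequences, primer):
    """Return the set of bases incorporated after the primer, as a sorted string.

    allele_sequences -- the two allele sequences of a diploid sample, written in
                        the same orientation as the primer
    primer           -- primer sequence whose 3' end lies immediately before the SNP

    A result of length 1 indicates a homozygous sample and a result of
    length 2 a heterozygous sample.
    """
    incorporated = set()
    for sequence in allele_sequences:
        start = sequence.find(primer)
        if start == -1:
            raise ValueError("primer does not anneal to this allele")
        snp_index = start + len(primer)
        if snp_index >= len(sequence):
            raise ValueError("no template base after the primer")
        # The terminator added by the polymerase matches the base at the SNP site.
        incorporated.add(sequence[snp_index])
    return "".join(sorted(incorporated))


if __name__ == "__main__":
    allele_c = "GATTACAGCTCGA"  # hypothetical allele carrying C at the SNP
    allele_t = "GATTACAGTTCGA"  # hypothetical allele carrying T at the SNP
    primer = "TTACAG"
    print(single_base_extension([allele_c, allele_c], primer))  # C
    print(single_base_extension([allele_c, allele_t], primer))  # CT
```

In a real assay the incorporated base is read from the mass of the extended primer or from the fluorescent label on the ddNTP rather than from the sequence itself.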
4.1.2. Hybridization
Hybridization methods use differences in the thermal stability of double-stranded DNA to distinguish between perfectly matched and mismatched target-probe pairs for allelic discrimination. Hybridization methods have been applied on high-throughput platforms using microarrays (Figure 5a). In the GeneChip® array technology (Affymetrix, CA), a probe array is synthesized in an ordered fashion on a solid surface by photolithography, using allele-specific oligonucleotides of 25 bases. The DNA fragments containing SNPs are amplified from genomic DNA. The PCR products are cleaved, tagged, and hybridized to the probe array under stringent conditions, then washed and labeled with a fluorescent dye. The SNP genotypes are determined from the fluorescence signal produced by probe-target hybridization. Numerous probes that differ at a single base are used to interrogate each SNP, which increases genotyping accuracy. A single array contains millions of probes and can be used for parallel genotyping of 10^4 to approximately 10^5 SNPs. The TaqMan® genotyping assay (Applied Biosystems, CA) combines hybridization with the 5′ nuclease activity of polymerase, coupled with fluorescence detection (Figure 5b). It employs four oligonucleotides: two allele-specific probes that differ by a single base and a pair of PCR primers flanking the region containing the SNP [29, 39].
4.1.3. Ligation
Ligation methods discriminate alleles by using the specificity of ligase enzymes. When two oligonucleotides (an allele-specific probe and a ligation probe) hybridize adjacent to each other on a single-stranded template DNA with perfect complementarity, ligase joins them to form a single oligonucleotide. Normally, three oligonucleotide probes are used in ligation assays: two allele-specific probes that bind to the template at the SNP site and one common ligation probe. Combinatorial fluorescence energy transfer (CFET) tags have been used with ligation for SNP genotyping. CFET tags are composed of fluorescent dyes that can transfer energy when they are in close proximity. For SNP genotyping, the two allele-specific probes are labeled with CFET tags and the common probe is labeled with biotin (Figure 6a). After the ligation reaction, the product is isolated using the biotin-streptavidin interaction. Genotyping is carried out by capillary array gel electrophoresis based on tag fluorescence. In Padlock technology, a single linear oligonucleotide probe is used, with its ends designed to act as the allele-specific probe and the common probe for ligation at the SNP site (Figure 6b).
4.1.4. Enzymatic cleavage
This method is based on the capacity of enzymes to cleave DNA by recognizing specific sequences and structures. Alleles can be discriminated when the SNP is located in an enzyme recognition site and affects the enzyme activity.
Restriction enzymes have been used to detect genetic variation by the method of restriction fragment length polymorphism (RFLP). These enzymes recognize specific sequences in double-stranded DNA and cleave both strands at or near a specific site in the sequence to generate smaller DNA fragments (Figure 7a). For SNP genotyping, the PCR product containing the SNP is incubated with an appropriate restriction enzyme and the digest is separated by gel electrophoresis. The SNP genotype is determined from the sizes of the digested products. This method does not need any probes, but it can be applied only to a limited number of SNPs.
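How fragment sizes reveal the genotype in RFLP can be shown with a small in silico digest. This sketch assumes a SNP that creates or destroys an EcoRI site (GAATTC, cut after the first G), considers only one strand, and uses invented amplicon sequences; the function names are hypothetical.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting at every occurrence of the site.

    cut_offset -- number of bases from the start of the site to the cut
                  position (1 for EcoRI, which cuts G^AATTC)
    """
    fragments = []
    fragment_start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(cut - fragment_start)
        fragment_start = cut
        position = sequence.find(site, position + 1)
    fragments.append(len(sequence) - fragment_start)
    return fragments


def rflp_band_pattern(allele_sequences, site, cut_offset):
    """Distinct fragment sizes expected on a gel for a diploid sample."""
    bands = set()
    for sequence in allele_sequences:
        bands.update(digest(sequence, site, cut_offset))
    return sorted(bands)


if __name__ == "__main__":
    # Hypothetical 22-base amplicons differing at one position (C or T).
    cut_allele = "ACGTACGT" + "GAATTC" + "TTGGCCAA"    # site present
    uncut_allele = "ACGTACGT" + "GAATTT" + "TTGGCCAA"  # site destroyed by the SNP
    print(rflp_band_pattern([cut_allele, cut_allele], "GAATTC", 1))      # [9, 13]
    print(rflp_band_pattern([uncut_allele, uncut_allele], "GAATTC", 1))  # [22]
    print(rflp_band_pattern([cut_allele, uncut_allele], "GAATTC", 1))    # [9, 13, 22]
```

A heterozygous sample shows both the uncut band and the two digestion products, which is why the three genotypes can be told apart on a gel.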
The Invader® assay (Third Wave™ Technologies, WI) utilizes structure-specific cleavage by a flap endonuclease (Figure 7b). It uses three probes to genotype an SNP: two allele-specific probes and a third common probe (the invader).
4.2. Allele detection methods
4.2.1. Mass-based detection
MALDI-TOF MS is a widely used method for mass analysis of oligonucleotides. It involves a small organic molecule, termed the matrix, that absorbs energy from a laser source of a certain wavelength for ionization. When analytes are mixed and cocrystallized with the matrix, they are ionized in the form of intact molecules owing to the transfer of energy from the matrix molecules [44, 45]. Figure 8 shows multiplex SNP genotyping by primer extension and MALDI-TOF MS detection.
4.2.2. Fluorescence signal-based detection
Monitoring of the fluorescence signal is widely used in genotyping technologies because operation is simple and detection is fast and highly sensitive. Fluorescence detection is used for direct sequencing (DS) with capillary array electrophoresis. Fluorescence polarization (FP)-based detection uses the change in polarization of plane-polarized light by a fluorescent dye molecule that results from a change in its molecular weight under conditions of constant temperature and solvent viscosity. FP has been coupled with other SNP genotyping techniques, including TaqMan® and Invader®. The TaqMan® assay is a single-step assay that uses fluorescence-based detection. It is well suited to low-to-medium throughput genotyping applications but is currently limited to genotyping one SNP per assay.
4.2.3. Chemiluminescence-based detection
Chemiluminescence has several advantages as a detection technique, such as a high signal-to-noise ratio, rapid detection, and suitability for automation. Pyrosequencing™ (Biotage, Sweden) combines sequencing-by-synthesis with chemiluminescence-based detection, using a cascade of enzyme reactions. In SNP genotyping, it provides sequence information on the region surrounding the SNP site.
4.3. Other methods
4.3.1. Single-strand conformation polymorphism (SSCP)
SSCP discriminates alleles by using the secondary structure of single-stranded DNA. Single-stranded DNA molecules that differ at a single base display different mobility when run on a nondenaturing gel, owing to their different native conformations (Figure 9).
4.3.2. High resolution melting analysis (HRM)
In this method, the fragment covering the SNP is amplified by real-time PCR and then analyzed by HRM. HRM is a technique for the detection of mutations and SNPs based on analysis of the melting curve produced when double-stranded DNA (dsDNA) separates into single-stranded DNA (ssDNA) as the temperature is increased from around 50°C to around 95°C. At some point during this process, the melting temperature of the amplicon is reached and the two strands of DNA separate. The melting behavior of the product is visualized with a fluorescent dye that binds to double-stranded DNA, so that fluorescence increases during amplification and falls as the strands separate (Figure 10). This method is simple, cost-effective, and fast, and can accurately genotype many samples. It also reduces the need to design multiple pairs of primers or purchase expensive probes.
4.3.3. Denaturing high performance liquid chromatography (DHPLC)
DHPLC is a method for screening DNA samples for SNPs and inherited mutations. The analysis begins with PCR amplification, followed by a denaturation-renaturation step to create hetero- and homoduplexes from the two allelic populations in the PCR product. The heteroduplexes, which contain mismatched base pairs, and the homoduplexes are detected by reversed-phase chromatography. The heteroduplexes are thermally less stable than their corresponding homoduplexes and are resolved by chromatography when subjected to a sufficiently high temperature. The mismatch decreases the interaction with the column, which results in a reduced retention time compared with the homoduplexes (Figure 11).
5. Application of SNPs
5.1. SNPs as biological markers of human diseases
Most SNPs are not responsible for a disease state. They serve as biological markers for pinpointing a disease on the human genome map. SNPs occur on average once every 200 base pairs [18, 51, 52, 53] in the human genome. Common SNPs (a minor allele frequency range from 5% to
5.2. SNPs and drug development
Variants of genes encoding drug-metabolizing enzymes or drug targets have been studied in association with individual drug responses. SNPs are popular molecular markers in such pharmacogenomic studies. Using SNPs to study the genetics of drug response will help in the creation of personalized medicine: the most appropriate drug for an individual could be determined in advance of treatment by analyzing the patient’s SNP profile. SNPs may be associated with the absorption and clearance of therapeutic agents. Genes in which SNPs are associated with a wide range of human diseases, such as cancer, infectious diseases, autoimmune and neuropsychiatric disorders, and others, can be used as targets for drug therapy.
6. Studies of SNPs in association with human diseases
6.1. Cancer
Cancer is a disease that involves abnormal cell growth. There are several kinds of cancers. Gemignani et al. studied polymorphisms in dopamine receptor gene,
6.2. Schizophrenia
Schizophrenia is a severe psychiatric disorder characterized by hallucinations, delusions, cognitive deficits, and apathy, with a lifetime prevalence of ∼1%. Epidemiologic studies on twins indicate that schizophrenia has a complex genetic background with heritability estimated at 73–90%.
Arinami et al. genotyped 5861 SNPs in 602 individuals from 236 Japanese families using the BeadARRAY™ Linkage Panel (IV) for genome-wide linkage analysis. They found a strong association of schizophrenia with the region 1p21-p13 and suggested that schizophrenia might have common susceptibility loci across populations, with different ethnicity-specific effects.
Panichareon et al. took SNPs in the OPCML gene that had been discovered by GWAS in people of European ancestry and investigated them in Thai schizophrenia patients by using polymerase chain reaction (PCR) and high-resolution melting (HRM) analysis. The study found a strong association between an intronic SNP (rs1784519) and the risk of schizophrenia in a Thai population [
6.3. Dyslipidemia
Dyslipidemia is an abnormal level of lipids and/or lipoproteins in the blood. It is a major risk factor for coronary heart disease and atherosclerosis. A genome-wide association study (GWAS) examined the concentrations of HDL-C and triglycerides in people of European ancestry and identified SNPs at 15 loci associated with HDL-C levels (such as the APOA1/C3/A4/A5 gene cluster) and SNPs at 12 loci associated with triglycerides (such as the APOB and APOE genes). Thongket et al. examined SNPs in the apolipoprotein E receptor 2 gene using real-time PCR and HRM analysis and found that rs2297660 showed a strong association with the risk of dyslipidemia in a Thai population.
6.4. Diabetes mellitus
Diabetes mellitus (DM) is a group of metabolic disorders. Untreated diabetes can cause many complications. Acute complications can include diabetic ketoacidosis, hyperosmolar hyperglycemic state, or death. SNPs in the gene encoding aldose reductase (
7. Conclusion
Polymorphism is a variation in DNA sequence that may affect individual phenotypes. It occurs more often in the general population than mutation (frequency ≥ 1%). The majority of variation consists of single nucleotide polymorphisms (SNPs), single base changes in a DNA sequence that occur at specific positions in the genome. SNPs may be located within the coding or non-coding regions of genes or in intergenic regions. Most SNPs have two alleles; for an individual SNP, one is the major allele and the other is the minor allele, based on their observed frequencies in the general population. Genome-wide association studies (GWAS) search for SNPs that occur more frequently in people with a particular disease than in people without the disease and pinpoint genes that may contribute to the risk of disease. Linkage disequilibrium (LD) is commonly used to indicate that two or more genes are physically linked, and it plays an important role in the study of health and disease. Most SNPs are not responsible for a disease state but serve as biological markers for various complex diseases such as cancer, diabetes, dyslipidemia, and schizophrenia. There are several methods for analyzing SNPs, such as MALDI-TOF MS, GeneChip® array, pyrosequencing, DHPLC, HRM, and RFLP.