Single nucleotide polymorphisms (SNPs) represent a change in a single nucleotide within the genome. Such a change can alter the phenotype of individuals of the same species if it occurs in the coding region of a gene. The change in nucleotide can produce desirable characteristics in plants and can become a target for selection. New SNPs have been discovered and subsequently converted to molecular markers using various non-gel-based and next generation sequencing platforms. Because SNP markers are based on target genes, are abundant in the genome and are amenable to high automation and multiplexing, they have become the marker of choice and an effective tool for screening plant germplasm for desirable traits. This chapter considers SNPs as molecular markers and documents their discovery and different SNP genotyping methods. A few case studies of SNPs as allele-specific markers and their association with traits of interest are also considered, highlighting their efficacy as a useful tool for marker assisted selection and plant germplasm screening.
- single nucleotide polymorphisms
- molecular breeding
- plant germplasm screening
- molecular marker
- marker assisted selection
Plant breeders usually screen a large number of plants for traits of economic value as determined by the breeding goal, which may include breeding for resistance, biofortification to increase some micronutrients and gene pyramiding. Screening the large volume of plants handled at the early stages of a breeding program can be laborious, capital intensive and time consuming. Germplasm screening is usually an initial step in many breeding programs. The aim of screening a large collection of plants is to narrow it down to those with the desired characteristics for advancement to the next stage. It is extremely important to get this right from the beginning in order to meet breeding objectives for crop improvement.
Recently, molecular breeding has brought about a revolution in plant breeding and has been widely applied in plant improvement programs. Molecular breeding, also called marker assisted breeding or marker assisted selection (MAS), is the method of using molecular or DNA markers that are closely linked to a phenotype to aid selection for that trait in a breeding scheme. A number of molecular markers have been developed and successfully used in selection because of their association with a phenotype of interest in different plant species. Thus, they have been applied in plant germplasm screening for desirable traits. Because selection is based on target genes, they provide a higher rate of accuracy during screening and reduce time and labor, thereby reducing cost.
Several types of molecular markers have been applied to different areas of plant breeding. The first markers developed include restriction fragment length polymorphisms (RFLPs) and randomly amplified polymorphic DNA (RAPDs). These gave way to the popular techniques of amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSRs) because of their ease of detection and automation. Simple sequence repeats have been extensively applied to screen for resistant germplasm, biotic and abiotic stress and variety identification in potato, groundnut and rice [1, 2, 3]. SSR markers are generally PCR-based, technically simple and relatively cheap. The disadvantage of these markers is that they require polyacrylamide gel electrophoresis to achieve a high resolution of allele fragments, which is laborious to perform. They give information about a single locus per assay; although multiplexing of more than two markers is possible, it is relatively expensive. The continued automation of sequencing technology has resulted in a shift from first generation DNA-based markers to the use of functional and gene-targeted single nucleotide polymorphism (SNP) markers. Therefore, this chapter covers ways of discovering new SNPs and their conversion to molecular markers. Also, different methods of SNP genotyping and their application in some aspects of crop improvement are documented. The discovery of SNP markers is cost effective, the markers are highly multiplexable and high throughput technologies are available for SNP genotyping. Because SNPs are based on target genes, they are highly reliable in cultivar identification and germplasm screening.
2. Single nucleotide polymorphism markers
A single nucleotide polymorphism is a change in a single base pair at a specific locus, usually involving two alleles, where the rare allele frequency is >1% (Figure 1). It is an individual nucleotide base difference between two DNA sequences and represents the site where the DNA sequence differs by one base. SNPs are categorized based on nucleotide substitution as transitions, which are interchanges between pyrimidines (C/T) or between purines (A/G), or transversions, which are interchanges of a purine for a pyrimidine base (G/C, A/T, A/C and G/T). A nucleotide base represents the basic unit of inheritance; therefore, SNPs present a powerful tool as molecular markers. SNPs, which can also be categorized as insertions or deletions, may be found in the coding or non-coding sequences of crop plants. Some individuals of a species may be heterozygous at a SNP locus, shown as an ambiguous base in Figure 1 (Y refers to C and T nucleotides being present at one locus); this means they possess both nucleotides at the same position in that gene. Such individuals may display an intermediate phenotype when screened phenotypically.
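The substitution classes described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the two-base ambiguity symbols follow the standard IUPAC nucleotide codes (of which Y for C/T is the one shown in Figure 1).

```python
# Illustrative helpers; the names are not taken from the chapter.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Standard IUPAC symbols for positions where two bases are present.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("GC"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) = transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def heterozygote_code(allele1: str, allele2: str) -> str:
    """Return the IUPAC ambiguity symbol for a heterozygous position."""
    return IUPAC_TWO_BASE[frozenset((allele1.upper(), allele2.upper()))]


print(classify_substitution("C", "T"))  # transition
print(classify_substitution("A", "C"))  # transversion
print(heterozygote_code("C", "T"))      # Y
```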
When SNPs or insertions/deletions found in coding sequences result in non-conservative amino acid changes, they can cause variation in the phenotype of individuals of the same species (Figure 2). In contrast, SNPs whose translation results in synonymous or conservative amino acid changes will not significantly affect the phenotype of the individuals. Considering the figures above, Figure 1 shows SNPs at two different positions (549 and 572) of the PSY2 gene in cassava, for C/T and A/C nucleotides, respectively. Upon translation to amino acids as seen in Figure 2, the SNP at position 549 was synonymous because it caused no amino acid change and therefore may not be an effective marker in marker assisted selection with respect to yellow and white cassava roots. The SNP located at position 572 of the gene, however, caused a non-synonymous change in the amino acid sequence: individuals with white roots carrying the C nucleotide gave Alanine while those with yellow roots carrying the A nucleotide gave Aspartic acid. Thus, this SNP can be effectively utilized as a molecular marker in selecting for root color of cassava even before the plant reaches the stage of developing roots for phenotypic evaluation. The presence of SNPs in regulatory and coding regions of genes can have a significant phenotypic effect on protein function and on how genes are expressed. This permits the association of genotypic and phenotypic variation, which has been successfully exploited in cultivar identification and genetic diversity analysis. Though the association with traits of economic importance may be less than 100%, such SNPs can still be successfully utilized in marker assisted selection and gene isolation. Given their precision in germplasm identification, they can also be an efficient tool for plant germplasm screening.
Translation of individuals with the C nucleotide gave Alanine (A) while those with the A nucleotide gave Aspartic acid (D); X shows an ambiguous position. Consensus sequences are plotted as dots with reference to the reference sequence (RefSeq).
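Whether a coding SNP is synonymous or non-synonymous can be checked by translating the codon with each allele. The sketch below builds the standard genetic code table and compares the two translations. The codon GCT is a hypothetical example chosen only because it encodes Alanine; the actual PSY2 codons are not given in this chapter.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def snp_effect(codon: str, position_in_codon: int, alt_base: str):
    """Translate a codon before and after a single-base change.

    position_in_codon is 0-based (0, 1 or 2). Returns the reference amino
    acid, the alternative amino acid and the type of change.
    """
    codon = codon.upper()
    mutated = (
        codon[:position_in_codon] + alt_base.upper() + codon[position_in_codon + 1:]
    )
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


# Hypothetical Alanine codon: a C-to-A change at the second position
# gives Aspartic acid, while a change at the third position is silent.
print(snp_effect("GCT", 1, "A"))  # ('A', 'D', 'non-synonymous')
print(snp_effect("GCT", 2, "C"))  # ('A', 'A', 'synonymous')
```

Only a non-synonymous result of this kind would make a coding SNP a candidate functional marker; a synonymous one, like the SNP at position 549, would be set aside.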
The single nucleotide polymorphism is now a highly preferred genetic marker because of the increase in the amount of sequence information and the determination of gene function through genomic research. Their widespread abundance in the genome and the development of new SNP genotyping platforms have made SNPs the preferred marker for plant germplasm screening or characterization and for identification of functional genes for traits of interest [4, 5, 6]. SNP genotyping is easily automated with high throughput techniques and is being used for large segregating populations.
Many techniques and methods have been applied to identify SNPs and to use those that can successfully discriminate between traits of interest for marker development. SNP markers can be identified by carrying out locus specific PCR. Here, locus specific PCR primers are synthesized from genomic sequences that are known and available in public databases or from previously sequenced data. The primers are used to amplify DNA samples from several individuals of a plant species. The resultant PCR amplicons are sequenced and aligned. The alignment is searched for SNPs, which are base changes within a particular locus. Depending on the informativeness of such a SNP after characterization, it can be further evaluated for its effectiveness as a marker for germplasm screening or marker assisted selection. This method of SNP discovery can only be used if there is existing information on the sequence to be amplified. This method was used by Udoh et al. to identify SNPs causing non-synonymous amino acid changes in the phytoene synthase gene in cassava, linked to the expression of yellow color in the roots of some cassava varieties. Also, Harjes et al. identified SNPs in the regulatory regions of lycopene epsilon cyclase genes causing accumulation of carotenoids in maize.
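The final step of this workflow, searching an alignment for base changes, can be sketched as follows. The function name and the toy sequences are hypothetical, and the sketch assumes the amplicons have already been aligned to equal length by an alignment program, with "-" marking gaps.

```python
def find_snp_columns(aligned):
    """Return (1-based position, {name: base}) for each polymorphic column.

    `aligned` maps a sample name to its aligned sequence. Gaps and
    ambiguous symbols are ignored when deciding whether a column varies.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos in range(lengths.pop()):
        column = {name: seq[pos].upper() for name, seq in aligned.items()}
        alleles = {base for base in column.values() if base in "ACGT"}
        if len(alleles) > 1:
            snps.append((pos + 1, column))
    return snps


# Toy aligned amplicons (not real sequence data).
amplicons = {
    "ref":   "ATGCCGTA",
    "line1": "ATGCTGTA",
    "line2": "ATGCCGTA",
    "line3": "ATG-CGAA",
}
for position, bases in find_snp_columns(amplicons):
    print(position, bases)
```

In this toy alignment, positions 5 and 7 are reported as candidate SNPs, while the gapped column at position 4 is not, since only one unambiguous base is observed there.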
The availability of whole genome sequences and expressed sequence tag (EST) databases has allowed a non-gel-based approach to SNP discovery. Unigenes or EST sequences of interest can be analyzed de novo or exported to other convenient computer software programs for alignment and SNP searches. Alignment of genomic sequences may identify SNPs in both coding and non-coding regions of the genome, but ESTs are preferred because they are coding sequences, and SNPs identified in them can affect gene expression and can thus be evaluated further for downstream applications. This approach to SNP discovery is relatively easy and cost effective, although the authenticity of the sequences used may not be guaranteed because they are mined from public databases.
Also, high throughput automated next generation sequencing (NGS) platforms such as Illumina Genome Analyzer, Roche/454 FLX and ABI SOLiD can generate large numbers of SNPs when used for whole genome sequencing, RNA sequencing, methylated DNA sequencing and exome capture procedures. SNPs generated through these platforms can be between different varieties of a plant species or between the same unigenes. Although these platforms are relatively expensive to use, prices are gradually easing with increasing patronage. This method has been used to discover thousands of good quality SNPs in four pea recombinant inbred lines. Nevertheless, limitations exist with regard to the accuracy, sensitivity and reproducibility of the reads generated. A major concern when using NGS platforms is the need for reliable assembly and variant-calling software to organize reads for SNP calling; examples of SNP-calling tools include Genome Analysis Toolkit (GATK), SOAPsnp [11, 12] and freebayes. Different SNP callers have been compared in the search for a more versatile tool [14, 15, 16, 17, 18]. In a study of SNP discovery using RNA-sequence data, a pipeline combining Trinity and GATK gave 100% accuracy in peach and mandarin RNA-sequencing.
The versatility of SNPs has also led to their widespread use in phylogenetics to study the relatedness of organisms through molecular sequencing data, resulting in the identification and accurate classification of organisms. SNPs have also been applied in phytogeography for determining the distribution of plant species. A major advantage of the single-base resolution of SNPs is that it allows better detection of ‘perfect’ markers, which are causally linked to agronomic traits. Another high-throughput method used for detecting SNPs is genotyping by sequencing (GBS), which utilizes a range of techniques including reduced-representation sequencing and whole genome resequencing. Generally, this method identifies SNPs that are broadly distributed throughout the genome: the genome is fragmented using restriction enzymes, and the fragments are ligated to adapters and amplified. The amplified products are sequenced and aligned to a reference genome to call SNPs. GBS is gradually leading a transition from population genetics to population genomics, making high-throughput marker recognition in plant populations affordable. Many commercial crops, including rice [20, 21, 22], maize [23, 24, 25] and potato [26, 27, 28], have been studied using GBS to aid breeding. Although GBS was initially developed as a reduced-representation sequencing (RRS) approach using restriction enzymes to decrease genome complexity before sequencing [29, 30, 31, 32], whole genome resequencing approaches were later applied to allow higher genomic resolution. Since its creation, GBS has undergone continuous development based on both reduced-representation sequencing and whole genome resequencing methods. Combined with phenotypic data, GBS procedures provide a powerful basis for rapid mapping and identification of SNPs in genes underlying agronomic traits, which can then be utilized as efficient molecular markers for crop germplasm improvement.
Notwithstanding the fact that many next generation SNP genotyping techniques have been developed, they all fall within the same bracket with regard to limitations of cost, complexity and accuracy. Important quantitative trait loci and SNPs associated with desirable agronomic traits have been employed to improve the productivity of crops. Whole genome resequencing of
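The complexity-reduction step of GBS can be illustrated with a small in silico digest. This is a minimal sketch under stated assumptions: the chapter does not name a particular enzyme, so EcoRI (recognition site GAATTC, cutting after the first G) is used purely as an example, and the sequence, function names and size window are hypothetical.

```python
import re


def digest(sequence: str, site: str, cut_offset: int):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the number of bases from the start of the site to the
    cut point on the given strand (1 for EcoRI, G^AATTC).
    """
    sequence, site = sequence.upper(), site.upper()
    # A lookahead pattern also finds overlapping recognition sites.
    cuts = [
        m.start() + cut_offset
        for m in re.finditer(f"(?={re.escape(site)})", sequence)
    ]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len: int, max_len: int):
    """Keep only fragments within the size window retained for sequencing."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


toy_genome = "TTGAATTCAAAACCGAATTCGG"  # toy sequence, not real data
fragments = digest(toy_genome, "GAATTC", 1)
print(fragments)                       # ['TTG', 'AATTCAAAACCG', 'AATTCGG']
print(size_select(fragments, 5, 10))   # ['AATTCGG']
```

Only the size-selected fragments would go on to adapter ligation, amplification and sequencing, which is how the approach samples a reproducible fraction of the genome rather than all of it.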
2.1 Single nucleotide polymorphism genotyping methods
A single nucleotide polymorphism is an individual nucleotide base difference between two DNA sequences. When SNPs occur within a gene, they may play a more direct role in the trait by affecting the gene’s function, and such SNPs can be exploited as molecular markers. Molecular markers enable precise identification of genotypes without the confounding effect of the environment, because selection is based on molecular determination and not on the morphological expression observed. A more informative marker gives a higher polymorphic information content (PIC) value. SNP markers for chickpea and pigeon pea were evaluated and found to show 100% consistency and polymorphic information content values between 0.02 and 0.5 [22, 40]. SNP markers can be used for association studies, conservation genetics, germplasm screening or characterization and genetic diversity analysis, and are fast becoming the preferred marker system in marker assisted breeding programs.
In the last 10 years, the rapid transformation in sequencing technology has enormously affected crop genotyping procedures. The new procedures enable rapid, high-throughput genotyping of whole crop populations and give the opportunity to advance the use of molecular tools in plant breeding. There is an urgent need in crop improvement programs to speed up crop production through marker assisted selection or to introduce alleles that confer on plants resistance to pests and diseases, abiotic stress adaptation and high yield potential. Elite cultivars store very useful genetic information that needs to be introgressed. Molecular marker approaches have been used in analyzing and identifying alleles associated with desirable agronomic traits in diverse germplasm pools of legumes and cereals.
Some SNP genotyping methods that are easy to use, accurate and able to genotype SNP markers at specific loci across a collection of plants are presented below.
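For reference, polymorphic information content is commonly computed from allele frequencies with the formula of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The sketch below assumes that formula; the cited chickpea and pigeon pea studies may have computed their values differently, and the function name is hypothetical.

```python
from itertools import combinations


def polymorphic_information_content(allele_freqs):
    """PIC from allele frequencies, using the Botstein et al. (1980) formula."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term for matings that are uninformative about linkage.
    cross_term = sum(
        2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2)
    )
    return 1.0 - homozygosity - cross_term


print(polymorphic_information_content([0.5, 0.5]))  # 0.375
print(polymorphic_information_content([1.0]))       # 0.0
```

A marker fixed for one allele carries no information (PIC of 0), and informativeness rises as the allele frequencies become more balanced.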
2.1.1 Tetra ARMS allele specific PCR
Tetra ARMS (tetra-primer amplification refractory mutation system) allele specific PCR is a versatile, rapid and economical SNP detection tool. It involves a single PCR step followed by gel electrophoresis. Other contemporary SNP genotyping tools include allele specific PCR, high resolution melting analysis, PCR single stranded conformation polymorphism, PCR-primer introduced restriction analysis and real-time PCR-based genotyping. Tetra ARMS allele specific PCR utilizes four primers: outer forward, outer reverse, inner forward and inner reverse. The outer forward and outer reverse primer combination generates the outer fragment of the SNP locus and acts as an internal control for the PCR. The inner forward/outer reverse and outer forward/inner reverse primer combinations yield allele-specific amplicons depending on the genotype of the sample. The inner primers are positioned asymmetrically relative to the corresponding outer primers so that the allele-specific amplicons differ in size, are easily visualized on a gel and can be distinguished accordingly [42, 43, 44].
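The logic of reading a tetra ARMS gel can be sketched as a small decision function. The band sizes, allele labels, tolerance and function name below are hypothetical and only illustrate how the outer control band and the two allele-specific bands combine into a genotype call.

```python
def call_tetra_arms(observed_bands, control_size, allele_sizes, tolerance=10):
    """Interpret gel band sizes (bp) from one tetra ARMS reaction.

    `allele_sizes` maps each allele to the expected size of its
    allele-specific amplicon. A band counts as present when an observed
    size lies within `tolerance` bp of the expected size.
    """
    def present(expected):
        return any(abs(band - expected) <= tolerance for band in observed_bands)

    # The outer-primer product is the internal control for the PCR.
    if not present(control_size):
        return "failed (no outer control band)"
    alleles = sorted(a for a, size in allele_sizes.items() if present(size))
    if not alleles:
        return "no call"
    if len(alleles) == 1:
        return f"{alleles[0]}/{alleles[0]}"  # one allele-specific band: homozygous
    return "/".join(alleles)                 # both bands: heterozygous


expected = {"C": 250, "T": 190}  # hypothetical amplicon sizes
print(call_tetra_arms([400, 250], 400, expected))       # C/C
print(call_tetra_arms([402, 248, 191], 400, expected))  # C/T
print(call_tetra_arms([250], 400, expected))            # failed (no outer control band)
```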
A study by Ehnert et al.  using tetra ARMS allele specific PCR method described three common single nucleotide polymorphisms in the
With this technique, there is almost always a need for troubleshooting to standardize the procedure, especially at the initial steps of the protocol, in order to adapt it to the genotype investigated. This has limited its widespread application, despite the fact that it is economical and precise in SNP genotyping. In order to improve the ARMS-PCR procedure, several modifications have been suggested to optimize its usage. Two improvements were suggested by Tanha et al.: the first is to equalize outer primer and inner primer strength by adding a mismatch at 2 positions of the outer primers, and the second is to equalize the annealing temperature, which should be a little higher than the melting temperature. This improved both the expected result and the specificity. Another study by Alyethodi et al. suggested that the use of strand displacement polymerase rather than conventional Taq polymerase resulted in the generation of amplicons within 25 cycles of the PCR reaction, while Taq polymerase needed a minimum of 35 cycles. Also, the reaction with strand displacement polymerase did not require PCR enhancers like dimethyl sulfoxide; thus it was time saving and efficient.
2.1.2 KASP assay for SNPs genotyping
Another robust and easy to use SNP genotyping method is the Kompetitive allele-specific PCR (KASP) genotyping assay, based on competitive allele specific polymerase chain reaction and developed by LGC Genomics (www.lgcgroup.com). It is widely applied in plant breeding because of its reduced cost in genotyping large numbers of samples. It allows biallelic scoring of SNPs and insertions/deletions at specific loci and can be conveniently used for a small number of SNPs. Here, the SNP-specific KASP assay mix and the universal KASP master mix are added to the DNA samples, followed by thermal cycling. The biallelic discrimination is carried out by competitive binding of two allele-specific forward primers. Each of the primers is associated with a fluorescence resonance energy transfer (FRET) cassette, one labeled with FAM dye and the other with HEX dye [48, 49].
This method is a PCR-based homogeneous fluorescent SNP genotyping system that is cost-effective to run and more reliable than other SNP genotyping techniques. Since its introduction, KASP has been developed and used to genotype rice, wheat, soybean, cucumber, chickpea and many other crops. It has been employed in the enhancement and production of efficient markers in Chinese cabbage. The authors re-sequenced four Chinese cabbage genotypes and carried out a SNP survey of the genome. They established a KASP-SNP resource and converted 258 SNP variations into KASP molecular markers. These molecular markers discovered in Chinese cabbage will be invaluable for germplasm identification and cabbage research around the world. Also, Khanal et al. reported flanking sequences of 162 putative SNPs, none of which had previously been evaluated to determine whether they performed as intended. Therefore, a subset of 31 putative SNPs representing the entire nematode genome was designed into fluorescence-based KASP assays.
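The end-point fluorescence readout lends itself to a simple illustration of biallelic scoring. The sketch below is not the vendor's clustering algorithm: it assumes normalized FAM and HEX signals and uses hypothetical thresholds to assign each sample to one of three genotype classes by the angle of its signal on a FAM/HEX plot.

```python
import math


def call_kasp(fam: float, hex_: float, min_signal: float = 0.2,
              homo_angle: float = 30.0) -> str:
    """Call a genotype from normalized FAM (allele X) and HEX (allele Y) signals.

    Samples close to the origin are not called. The angle is 0 degrees for a
    pure FAM signal and 90 degrees for a pure HEX signal. Both thresholds
    are illustrative assumptions.
    """
    if math.hypot(fam, hex_) < min_signal:
        return "no call"
    angle = math.degrees(math.atan2(hex_, fam))
    if angle <= homo_angle:
        return "X/X"   # FAM-dominated signal: homozygous for allele X
    if angle >= 90.0 - homo_angle:
        return "Y/Y"   # HEX-dominated signal: homozygous for allele Y
    return "X/Y"       # both signals present: heterozygous


print(call_kasp(1.0, 0.1))    # X/X
print(call_kasp(0.1, 1.0))    # Y/Y
print(call_kasp(0.8, 0.8))    # X/Y
print(call_kasp(0.05, 0.05))  # no call
```

In practice, genotype clusters are usually assigned across a whole plate rather than sample by sample, so fixed thresholds like these serve only to show the principle.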
With KASP primers, biallelic scoring of SNPs at specific loci is possible. Cotton (
For crop improvement purposes, maize breeders at the International Maize and Wheat Improvement Center developed 16 marker-assisted recurrent selection (MARS) populations. The parents of these MARS populations were initially genotyped along with over 450 maize inbred and advanced breeding lines using the GoldenGate assay.
2.2 Application of SNP in crop improvement
SNP markers are used for the improvement of crops in a number of ways, including selection of disease resistant crops, high yielding varieties and plants that can withstand biotic and abiotic stress. The development of SNP markers has become a regular process, especially for crops with a reference genome, and this has influenced the application of SNP markers in plant breeding. For more than two decades, researchers experienced many setbacks in their bid to develop markers linked to discovered QTLs. However, the revolution in sequencing technology has brought about easy identification of SNP markers underlying genes in a QTL. SNP markers were used to characterize natural variation in sorghum grain nutrient composition in a global sorghum panel, and a genome wide association study was used to map the QTL responsible for this variation. It was discovered by Rhodes et al. that protein, fat and starch all had strong correlations across years, but that of protein was the most significant. Protein also had the highest narrow-sense heritability. Further investigation showed a strong negative correlation between starch and both protein and fat, and a strong positive correlation between protein and fat.
SNP markers have been frequently used in marker assisted selection because of their abundance in the genes of all species. In breeding for resistance to root-knot nematode (
The biallelic nature of SNP markers, the high number of loci that can be multiplexed and the possibility of automation make them very useful for cultivar identification. Grapevine cultivar identification was carried out using over 300 SNPs in its genome. Re-sequencing of 11 genotypes was used to select 48 SNPs spread across all grapevine chromosomes, which provided enough information content for genetic cultivar identification.
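Cultivar identification with a SNP panel amounts to comparing a sample's genotype profile with reference profiles. The following is a minimal sketch under stated assumptions: the marker names, cultivar names, genotype encoding and mismatch threshold are all hypothetical, and it does not reproduce the procedure of the grapevine study.

```python
def match_cultivar(sample, references, max_mismatch_rate=0.05):
    """Find the reference cultivar whose SNP profile best matches a sample.

    Profiles map marker names to genotype strings (for example "CC" or
    "CT"); missing calls are stored as None and skipped. Returns
    (name, mismatch rate), with name None when no reference is close enough.
    """
    best_name, best_rate = None, 1.0
    for name, profile in references.items():
        shared = [m for m in sample if sample[m] and profile.get(m)]
        if not shared:
            continue
        mismatches = sum(sample[m] != profile[m] for m in shared)
        rate = mismatches / len(shared)
        if rate < best_rate:
            best_name, best_rate = name, rate
    if best_name is not None and best_rate <= max_mismatch_rate:
        return best_name, best_rate
    return None, best_rate


references = {
    "cultivar_A": {"snp1": "CC", "snp2": "AG", "snp3": "TT"},
    "cultivar_B": {"snp1": "CT", "snp2": "GG", "snp3": "TT"},
}
sample = {"snp1": "CC", "snp2": "AG", "snp3": "TT"}
print(match_cultivar(sample, references))  # ('cultivar_A', 0.0)
```

Allowing a small mismatch rate rather than demanding identity is a design choice that tolerates occasional genotyping errors; the appropriate threshold would depend on the panel size and the error rate of the assay.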
The quality of a crop is highly dependent on the amount of micronutrients it contains. A genome wide study was used to identify SNPs associated with micronutrient (Fe, Se and Zn) concentration in pea (
Germplasm screening is usually a first step in plant breeding programs. It aims to reduce a large collection of plants and narrow it down to those that fit the breeding objectives in view. It is usually laborious and time consuming, but the use of molecular markers can substantially aid this process. Several molecular markers, including simple sequence repeats (SSRs), have previously been utilized in plant germplasm screening, but SNP markers enable selection based on target genes that code for a specific trait of interest.
Advances in sequencing technologies have given rise to SNP markers, which are now the markers of choice in genetic studies because they are robust, widely distributed throughout the genome of plants and highly multiplexable. SNPs represent a difference in a single nucleotide in the genome, and those that are linked to a phenotype as a result of non-synonymous amino acid changes can be reliably used as molecular markers.
A number of techniques and methods have been applied to discover new SNPs, including non-gel methods where SNPs are mined from multiple sequence alignments of database sequences. SNPs have also been generated using next generation sequencing platforms like Illumina Genome Analyzer and Roche/454 FLX. Discovered SNPs are developed into user-friendly SNP markers and used for genotyping. A number of SNP markers have been validated for marker assisted selection. In a study by Burow et al., SNP markers were developed from sequences of the brown midrib (bmr) trait of sorghum and used to accurately identify bmr6 or bmr12 individuals at the seedling stage. This validation was carried out for a group of sorghum germplasm and a genetic population. Also, fifteen KASP SNP markers for bmr6 and bmr12 were developed and used for allele discrimination to select bmr individuals. Another study by Khanal et al. employed KASP SNPs to determine the genetic variability present in 26 isolates of
In Panax species, Nguyen et al. identified 1128 SNPs in coding gene sequences and developed 18 SNP markers from the chloroplast genic coding sequence region that can be used to distinguish all seven Panax species from each other. Because SNP markers are based on target genes, they are a highly reliable tool for identification of cultivars and germplasm screening.
My profound gratitude goes to my husband and children for their understanding and support during the period of writing this chapter. God bless you all.
Conflict of interest
No conflict of interest.
I thank my coauthors for accepting to contribute to the chapter. We are grateful to Almighty God for the grace of life.
Appendices and nomenclature
AFLP: Amplified fragment length polymorphism
ARMS: Amplification refractory mutation system
GBS: Genotyping-by-sequencing
KASP: Competitive allele-specific polymerase chain reaction assay
MAS: Marker assisted selection
NGS: Next generation sequencing
PCR: Polymerase chain reaction
SNP: Single nucleotide polymorphism
SSRs: Simple sequence repeats
QTL: Quantitative trait loci