Insights into Marker Assisted Selection and Its Applications in Plant Breeding

The burgeoning human population and its growing food demand have placed a burden on ever-decreasing cultivated land and on our food production systems. This situation has prompted plant scientists to breed crops with specific traits in a short time. Marker-assisted selection (MAS) has emerged as a potential tool that uses molecular markers to achieve desirable results in plants and to improve traits of interest in a short time. MAS has been used comprehensively in plant breeding for germplasm characterization, diversity analysis, trait stacking, gene pyramiding, multi-trait introgression, and genetic purity testing in cereals, pulses, oilseeds, fiber crops, etc. Mapping studies have identified several marker-trait associations in different crop species, which indicates the potential of MAS for accelerating crop improvement. This chapter presents an overview of molecular markers, their genesis, and their potential use in plant breeding.


Introduction
The global population is estimated to reach 9 billion by 2050, with an annual growth rate of 0.75 percent. Feeding this burgeoning population alone will require an additional one billion tons of cereals by the end of 2050 [1]. To achieve these targets, new integrated approaches must be practiced alongside conventional breeding programmes to accelerate the breeding cycle by reducing the net time and cost per unit of production [2, 3].
The primary objective of plant breeding is to increase crop yield [4]; the secondary objectives are quality improvement, development of photo- and thermo-insensitive cultivars, tolerance to biotic and abiotic stresses, synchronous maturity, water and nutrient use efficiency, elimination of toxic substances, and development of different crop maturity groups [5, 6] for high agricultural output and sustainable development. Advances in the understanding of molecular genetics have significantly enhanced the efficiency of plant breeding in achieving the desired objectives in crop

Molecular markers: road for easy and reliable selection
Any fixed property of an individual that shows heritable variation is termed a character or trait, whereas a marker can be defined as any mark that is inherited together with the trait of interest across generations [20,21]. Markers are categorized into four main groups: morphological, biochemical, cytological, and molecular (DNA-based) markers [22].
Morphological markers, also known as naked-eye or phenotypic markers, are used for qualitative traits such as flower shape, size, color, seed structure, growth habit, and other agronomic traits in plants. These markers are eco-friendly, easy to use, and do not require any specific instrument; however, their number is limited in crop species, and they are highly influenced by prevailing environmental conditions [22-24].
Biochemical markers, mostly isozymes, result from variation in enzymes (protein and amino acid sequences) encoded by various genes, although the variants are functionally the same [25]. They are the end product of allelic variation in enzymes. They are co-dominant in inheritance, cost-effective, and easy to use. They have been widely used in plant breeding for the study of gene flow, population structure, and genetic diversity [26]. However, they are limited in number, show less polymorphism, and are predominantly affected by the plant tissue used, the growth stage, and the extraction method [27].
Cytological markers are based on variation in the number, shape, size, and position of chromosomes and in their banding pattern. Cytological analysis reveals unique characteristics of chromosomes such as knobs and satellites, the number of nucleoli in the nucleus, etc. This variation shows different patterns of euchromatin and heterochromatin in the chromosome [22]; for example, Giemsa staining reveals G bands. Cytological markers have been used extensively in plant breeding for the identification of linkage groups and for physical mapping [9]. In contrast, molecular markers are defined as nucleotide polymorphisms present between individuals as a result of deletion, duplication, insertion, substitution, point mutation, translocation, etc. [27] that do not affect the function of the gene.
Molecular markers do not necessarily target genes; instead, they are inherited as a 'flag' with the gene of interest when a trait is transmitted from one generation to the next [28]. Molecular markers located in close proximity to genes of interest, i.e. linked with the target gene, are known as gene tags [9]. The essential features of an ideal marker are co-dominant inheritance, a high level of polymorphism, high reproducibility, whole-genome coverage, easy and fast detection, neutrality to environmental conditions, high resolution, and low cost [22,27,29]. Different types of molecular markers have been developed and are used in various crops. They are mainly categorized into the following classes based on their method of detection.

Hybridization-based markers
DNA bands are detected where a labeled probe, i.e. a DNA fragment of known sequence, hybridizes with DNA fragments digested by a restriction endonuclease. Restriction fragment length polymorphism (RFLP) was the first, and remains the only, marker system based solely on the hybridization method [22].

PCR-based markers
The idea of the polymerase chain reaction (PCR) was conceived by Kary Mullis in 1983, and the process, which is based on denaturation, annealing, and extension, was invented in 1985 [30]. PCR-based markers use primer-dependent PCR amplification and/or DNA hybridization followed by electrophoresis. Polymorphism is detected based on the presence or absence of an amplicon or on band size and mobility. The most commonly used PCR-based markers are randomly amplified polymorphic DNA (RAPD) [31], amplified fragment length polymorphism (AFLP) [32], microsatellites or simple sequence repeats (SSRs) [33], sequence-related amplified polymorphism (SRAP) [34], inter simple sequence repeat (ISSR) [35], cleaved amplified polymorphic sequences (CAPS) [36], and sequence characterized amplified region (SCAR) [37].

Sequence-based markers
Sequencing identifies nucleotides and their order along the DNA strand [38]. Sequence-based markers are designed for a specific DNA sequence in a pool of unknown DNA. Modern sequencing techniques such as genotyping by sequencing (GBS) and next-generation sequencing (NGS) help to reveal a large array of polymorphisms at the nucleotide level; the most commonly used markers are single nucleotide polymorphisms (SNPs) [39] and diversity array technology (DArTseq), which are known to be more accurate and reliable [22,40].
The historical development of molecular markers is presented in Table 1, which is adapted and modified from Singh and Singh [41].
We have discussed several molecular marker systems; however, the most commonly used markers in plant breeding are RFLP, SSR, RAPD, AFLP, SCAR, and SNP [42]. The single-locus markers are RFLP, VNTR, SSLP, STMS, SSR, STS, SNP, CAPS, and SCAR, whereas the multi-locus markers are RAPD, AP-PCR, ISSR, AFLP, M-AFLP, and S-SAP [43]. All these markers are used in plant breeding for germplasm characterization and protection, gene tagging, genome mapping, linkage map construction and analysis, evolution studies, parental selection, F1 hybrid testing, genetic purity testing of seeds, mapping of genes or QTLs, etc. [44,45].

Marker assisted selection (MAS)
Direct phenotypic selection in plant breeding for crop improvement is labor-intensive, costly, and time-consuming. It is also affected by the expression of the target gene, the specific biological or environmental conditions, and the heritability of the trait. Phenotypic selection is less efficient for the quantitative traits that are frequently under selection [46].
In MAS, phenotypic selection is made with the help of genotypic markers. This technique helps to avoid the difficulties and challenges that occur in conventional crop breeding [47]. Plant breeders mostly use it in their breeding programmes to identify desired dominant or recessive alleles across generations and to identify the best genotypes in segregating generations [48]. The prerequisites for an efficient MAS program are reliable markers, a good-quality DNA extraction method, genetic maps, knowledge of marker-trait associations, quick and efficient data processing, and the availability of a high-throughput marker detection system [49]. The marker development pipeline in Figure 1, adapted from Collard and Mackill, 2008 [5], shows how marker-assisted selection is applied through various steps, starting from the development of the population.

1923: Sax reported linkage between a quantitative trait (seed size) and a qualitative trait (seed coat color) in common bean for the first time.

Variations of MAS
Different molecular approaches are used under the umbrella of MAS, such as marker-assisted backcrossing (MABC), gene pyramiding, marker-assisted recurrent selection (MARS), and genomic selection (GS). These approaches have been used in plant breeding to characterize genetic material and to select individuals in early segregating generations, which speeds up the breeding cycle and improves accuracy [22].

Marker-assisted backcrossing (MABC)
Conventional backcrossing is an age-old and very useful technique for transferring oligogenic traits from a donor parent to a recipient parent; after 6-7 generations of backcrossing, the whole genome of the recipient parent is recovered except for the trait of interest. MABC is backcrossing assisted by molecular markers [50], which speed up the selection process and the recovery of the recipient parent genome. The MABC technique has been used extensively to remove undesirable traits such as insect and disease susceptibility and anti-nutritional factors from popular high-yielding varieties by introducing the gene of interest or quantitative trait loci (QTLs) from a donor parent [51].
The fundamental basis of MABC is the close association of the marker with the gene(s) or QTLs. The expected recovery of the recurrent parent genome is given by the formula $1 - (1/2)^{m+1}$, where $m$ is the number of generations of selfing or backcrossing. This technique has been used in different crops such as rice [52], wheat [53], barley [54], soybean [55], cotton [56], tomato [57], and pea [58], etc. There are three basic steps in the MABC technique, viz. foreground selection, recombinant selection, and background selection.
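As a quick worked check of the recovery formula, the following Python sketch tabulates the expected share of recurrent parent genome, 1 - (1/2)^(m+1), for successive generations. The function name is hypothetical, and the values are genome-wide expectations in the absence of selection; individual plants vary around them.

```python
def expected_recurrent_genome(m: int) -> float:
    """Expected recurrent parent genome fraction: 1 - (1/2)**(m + 1).

    m = 0 corresponds to the F1 and m = 1 to the first backcross (BC1).
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    return 1.0 - 0.5 ** (m + 1)


# Tabulate the expectation from the F1 up to the sixth backcross.
for m in range(7):
    label = "F1" if m == 0 else f"BC{m}"
    print(f"{label}: {expected_recurrent_genome(m):.2%}")
```

Under this expectation, the recurrent parent share passes 99% only at the sixth backcross, which is consistent with the 6-7 generations quoted for conventional backcrossing.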
Foreground selection is the first step of MABC, where the primary target is the gene of interest from the donor parent, which is linked with the marker. The efficiency of foreground selection depends on the marker-trait association, the physical distance between the marker and the gene of interest, genetic load or linkage drag, the number of genes/QTLs/loci targeted for selection, etc. [59]. Linkage drag is undesirable for selection because of the negative effect of associated genes on targeted traits.
Plant Breeding - Current and Future Views
Recombinant selection is the second step of MABC, where backcross progeny carrying the target gene are selected for recombination between the gene of interest and the linked flanking markers, which reduces the effect of linkage drag [22].
Background selection is the third step of MABC, where the major target is the recovery of a large proportion of the recipient parent genome in the backcross progeny by using molecular markers that are unlinked to the gene of interest [5]. The efficiency of background selection is determined by various factors such as the size of the population, the number of markers and targeted genes, and linkage drag. It speeds up the recovery of the recipient parent genome along with the trait of interest and is also termed 'complete line conversion' [60].
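The three MABC steps can be sketched as a simple ranking of backcross progeny. The following Python example is only an illustration under stated assumptions: the genotype codes ('A' for homozygous recurrent parent, 'H' for heterozygous with one donor allele), the plant records, and the ranking rule are all hypothetical and are not taken from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class Plant:
    name: str
    target: str        # genotype at the marker linked to the target gene
    flanking: tuple    # genotypes at the two markers flanking the target
    background: list   # genotypes at markers unlinked to the target


def carries_target(p: Plant) -> bool:
    """Foreground selection: keep plants that carry the donor allele."""
    return p.target == "H"


def recombinant_flanks(p: Plant) -> int:
    """Recombinant selection: count flanking markers already back to 'A'."""
    return sum(g == "A" for g in p.flanking)


def background_recovery(p: Plant) -> float:
    """Background selection: recurrent parent share at unlinked markers."""
    score = {"A": 1.0, "H": 0.5}  # 'H' carries one recurrent allele
    return sum(score[g] for g in p.background) / len(p.background)


def rank_progeny(plants):
    """Apply the three steps in order (hypothetical ranking rule)."""
    kept = [p for p in plants if carries_target(p)]
    return sorted(kept,
                  key=lambda p: (recombinant_flanks(p), background_recovery(p)),
                  reverse=True)


# Made-up backcross progeny for illustration only.
plants = [
    Plant("P1", "H", ("A", "H"), ["A", "A", "H", "A"]),
    Plant("P2", "A", ("A", "A"), ["A", "A", "A", "A"]),  # lost the target gene
    Plant("P3", "H", ("A", "A"), ["A", "H", "H", "A"]),
    Plant("P4", "H", ("H", "H"), ["A", "A", "A", "A"]),
]
ranked = rank_progeny(plants)
print([p.name for p in ranked])
```

In this sketch a plant with recombination on both sides of the target is preferred over one with a higher background recovery but a larger donor segment; a breeding programme may weigh these criteria differently.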

Marker-assisted gene pyramiding (MAGP)
Current breeding programs mainly focus on developing lines for complex traits such as biotic and abiotic stress tolerance. Modern MAS methods pyramid different genes to accomplish such goals, an approach referred to as MAGP. In MAGP, two or more genes are selected for pyramiding at a time. Different approaches have been used to pyramid multiple genes/QTLs from donor parents into a recipient parent, including recurrent selection, backcrossing, and multiple-parent or complex crossing. Three to four desirable genes from other lines can be incorporated by convergent or stepwise backcrossing. The incorporation of more genes is usually carried out through multiple crossing or recurrent selection. Marker-assisted convergent crossing (MACC) can also be used to pyramid multiple genes/QTLs [8, 61].

Marker-assisted recurrent selection (MARS)
Recurrent selection is an efficient technique used in plant breeding to improve quantitative traits through continuous cycles of crossing and selection. However, its selection efficiency is adversely affected by environmental fluctuations, which delays the breeding cycle. In MARS, molecular markers are used in each generation for the targeted traits. Selected individual plants are crossed in every crossing and selection cycle, and selection is based on phenotypic data together with marker scores. This increases the efficiency of recurrent selection and accelerates the breeding or selection cycle. MARS has been used extensively for polygenic traits such as crop yield and biotic and abiotic stress tolerance, and it is considered a forward breeding tool for accumulating multiple genes or QTLs [62].

Genomic selection (GS)
Genomic selection was developed by Hayes and Goddard [63] and is known as an advanced version of MAS. It predicts the genetic values of selection candidates as genomic estimated breeding values (GEBVs) by using high-density markers distributed throughout the genome. The GEBV prediction model combines genotypic data with phenotypic and pedigree data, which increases prediction accuracy. GS uses all the molecular markers, including those with major and those with minor effects. Markers are chosen for whole-genome coverage so that every QTL is in linkage disequilibrium with at least one marker [23,62,63]. Two types of populations are used in GS: a training population and a testing population. The training population is related to the breeding population and is used to estimate the parameters of the genomic selection model. The testing population is the group of individuals in which genomic selection is carried out. GEBVs are calculated from molecular markers, and selection is based on these GEBVs, so no direct phenotypic selection is required [22,64-66].
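The training/testing workflow can be illustrated with a minimal Python sketch. It assumes ridge regression on marker genotypes, which is one common family of GS models and is chosen here only for illustration; the text does not prescribe a specific model. The data are simulated, and the marker coding (-1/0/1), the penalty `lam`, and all function names are assumptions.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ridge(X, y, lam):
    """Estimate all marker effects jointly: (X'X + lam*I) beta = X'(y - mean)."""
    n, p = len(X), len(X[0])
    mean = sum(y) / n
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(X[i][j] * (y[i] - mean) for i in range(n)) for j in range(p)]
    return mean, solve(A, b)


def predict_gebv(X, mean, effects):
    """GEBV = mean + sum over markers of genotype code * estimated effect."""
    return [mean + sum(g * e for g, e in zip(row, effects)) for row in X]


def correlation(u, v):
    """Pearson correlation between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


# Simulated data (not real): 160 lines, 30 markers coded -1/0/1.
random.seed(1)
n_train, n_test, n_markers = 120, 40, 30
true_effects = [random.gauss(0, 0.5) for _ in range(n_markers)]
genotypes = [[random.choice((-1, 0, 1)) for _ in range(n_markers)]
             for _ in range(n_train + n_test)]
true_values = [sum(g * e for g, e in zip(row, true_effects)) for row in genotypes]
phenotypes = [v + random.gauss(0, 1.0) for v in true_values]

# Training population: genotyped and phenotyped, used to fit the model.
mean, effects = fit_ridge(genotypes[:n_train], phenotypes[:n_train], lam=1.0)

# Testing population: genotyped only; selection uses GEBVs, not phenotypes.
gebv = predict_gebv(genotypes[n_train:], mean, effects)
accuracy = correlation(gebv, true_values[n_train:])
best = sorted(range(n_test), key=lambda i: gebv[i], reverse=True)[:5]
print(f"prediction accuracy (simulated): {accuracy:.2f}")
print("indices of top 5 candidates:", best)
```

Here prediction accuracy is measured against the simulated true genetic values, which are unknown in practice; real programmes typically estimate accuracy by cross-validation within the training population.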

Innovative breeding schemes of MAS
Through the use of molecular markers, MAS has a broad spectrum of applications in plant breeding. Molecular markers can genotype all the accessions present in a germplasm collection. This capability permits the categorization of germplasm as well as the reduction of duplication. Some innovative applications of MAS are presented here.

Combined marker-assisted selection
Combining MAS with phenotypic selection gives a greater genetic gain than phenotypic screening or MAS alone when some QTLs remain unidentified by QTL mapping [67]. The term 'combined MAS' was coined by Moreau et al., 2004 [68]. This approach not only reduces the population size but also increases selection efficiency. The combination of phenotypic selection and MAS also helps in selecting for traits where marker genotyping is economical compared with phenotypic screening [69]. In this view, MAS results should always be confirmed through phenotypic screening, as in the case of the QTL identified for Fusarium head blight resistance [70].

Marker-directed phenotyping
In most cases, a low level of recombination between the QTL and the marker is observed [13], which means that markers alone cannot be relied on completely to select desirable phenotypes. However, marker genotyping reduces the number of plants that have to be evaluated phenotypically. This approach is mainly used for quality traits [71], where phenotypic screening is costlier than marker genotyping [72]. The method is also known as tandem selection [71] and stepwise selection [73]. One successful example of this scheme is the primary rice QTL Sub1, which controls submergence tolerance and has assisted breeding for that trait [74].
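The two-stage logic of marker-directed phenotyping can be sketched in a few lines of Python. The plant records, genotype labels, scores, and threshold below are made up for illustration, and the function name is hypothetical.

```python
def marker_directed_phenotyping(plants, favourable, measure, threshold):
    """Genotype every plant, but phenotype only the marker-positive ones."""
    shortlisted = [p for p in plants if p["marker"] in favourable]
    # Costly phenotyping is run on the shortlist only.
    phenotyped = {p["id"]: measure(p) for p in shortlisted}
    selected = [pid for pid, value in phenotyped.items() if value >= threshold]
    return selected, len(phenotyped)


# Made-up data: "score" stands for the result of an expensive phenotypic assay.
plants = [
    {"id": "P1", "marker": "AA", "score": 8.1},
    {"id": "P2", "marker": "aa", "score": 3.0},
    {"id": "P3", "marker": "Aa", "score": 4.2},
    {"id": "P4", "marker": "AA", "score": 7.4},
    {"id": "P5", "marker": "aa", "score": 7.9},
]
selected, n_phenotyped = marker_directed_phenotyping(
    plants, favourable={"AA", "Aa"}, measure=lambda p: p["score"], threshold=7.0
)
print(selected, n_phenotyped)
```

Only three of the five plants are phenotyped in this example. P3 shows why phenotypic confirmation is still needed, and P5, a marker-negative plant with a good score, shows what an imperfect marker-trait association can cost.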

Inbred or pureline enhancement and QTL mapping
The main features of this approach are constructing an introgression library, evaluating the lines for QTL detection and mapping, and using the superior lines in the breeding program [41]. The scheme starts by hybridizing two inbred lines: the recurrent parent (agronomically superior but deficient for one trait) and the donor parent (carrying the desirable target gene). The F1 obtained from this cross is backcrossed to the recurrent parent, and genome-wide markers are used to select the genomic segments from the donor parent. Backcrossing to the recurrent parent is repeated to generate a set of near-isogenic lines (NILs), and this set of NILs is known as the introgression line library. This scheme therefore seeks to introduce QTLs from a suitable donor parent and to map them simultaneously [75].

Advance backcross QTL analysis
It is designed to facilitate QTL introgression from unadapted germplasm, such as landraces and wild species, into elite lines while simultaneously mapping the introgressed QTL [76]. This scheme is somewhat similar to the introgression line library discussed in section 5.3. However, it differs from the introgression line library in its incorporation of phenotypic selection. Apart from this, it offers several advantages, such as the phenotypic similarity of the mapping population to the recurrent parent and reductions in deleterious donor-parent alleles, in the possibility of epistasis, and in linkage drag. After QTL mapping, only one or two generations are needed to identify QTL-NILs. This approach has been used effectively in several crops such as maize, tomato, soybean, cotton, rice, barley, and wheat [9].

Single large scale MAS: a strategy applied at early generation
Single large scale MAS was proposed by Ribaut and Betran, 1999 [77], where marker-assisted selection is applied at the first segregating generation (F2 or F3). As the name describes, 'single' means that MAS is applied once, and 'large scale' refers to selecting up to three QTLs that explain the largest share of phenotypic variance. The need to shorten the breeding cycle prompted the idea of early-generation MAS. Plants carrying the targeted genes/QTLs are selected, and the selected alleles are fixed in the homozygous condition, whereas individual plants with undesirable gene combinations are discarded in the early segregating generations. Emphasis can thus be placed on a few selected lines at the later stages, which reduces the wastage of resources and increases selection efficiency [78].
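To give a sense of scale for fixing up to three QTLs in an F2, the following Python sketch computes the expected frequency of plants homozygous for the favourable allele at every target locus and the population size needed to find at least one such plant. It assumes unlinked loci, an F1 heterozygous at all of them, and Mendelian segregation without distortion; the function names are hypothetical.

```python
import math


def freq_homozygous_all(n_loci: int) -> float:
    """Expected F2 frequency of plants homozygous favourable at every locus."""
    return 0.25 ** n_loci  # 1/4 per unlinked locus


def min_population_size(n_loci: int, confidence: float = 0.95) -> int:
    """Smallest N such that P(at least one target plant) >= confidence."""
    f = freq_homozygous_all(n_loci)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - f))


for n in (1, 2, 3):
    print(f"{n} locus/loci: frequency {freq_homozygous_all(n):.4f}, "
          f"minimum F2 size {min_population_size(n)}")
```

The required population grows quickly with the number of loci, which is one reason a large population is genotyped with markers at this single step.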

Breeding by design
The most ambitious objective of MAS is to design a plant type that carries the desired alleles at every locus involved in the control of all the traits [79]. Through this approach, plant breeders exploit known allelic variation to build elite lines by accumulating multiple favorable alleles [80]. The breeder can therefore pre-plan the desired combination of genes and select plants with the desired characteristics, which saves expensive field testing.

Mapping As You Go (MAYG)
This method revises the estimates of QTL allele effects by remapping them in the new elite germplasm that is produced continuously over the selection cycles. In this approach, the initial breeding crosses are used to estimate the locations of the QTLs and their effects. The information from this estimation is then used for selection and is updated by remapping. The updated QTL information is used in each new set of breeding cycles; as the name 'mapping as you go' suggests, the breeding cycle can be continued as long as desired. Overall, frequent re-estimation of QTL effects has achieved an enhanced response compared with a single QTL estimation at the start [41]. The advantage of this method is that it ensures that the QTL estimates remain relevant for the germplasm currently used in the breeding program [81].

Characterization of breeding material
Well-documented and characterized breeding material is a prerequisite for improving crop yield in plant breeding programs. MAS can help to select desirable traits and has been exploited for cultivar identification and purity assessment, evaluation of genetic diversity, selection of suitable donor parents, heterotic grouping, and identification of genomic regions for effective use in breeding programs [82-84].

Achievements made through MAS
Several examples illustrate the achievements made through marker-assisted selection; a few crop-wise and trait-wise examples are presented in Table 2.
Apart from the improvement of specific traits through indirect selection via MAS, varieties released through MAS are presented in Table 3.

Conclusion and future perspectives
Molecular marker technology has advanced for more than 30 years since the identification of the first marker, RFLP, and has reached its peak with SNP and DArT markers. Molecular markers can assist phenotypic selection and speed up the breeding cycle. In recent times, modern technologies such as low-cost, high-throughput NGS, GS, and GBS have been used in plant breeding but have not achieved the desired goal. The most probable reason