Progression of DNA Marker and the Next Generation of Crop Development

This chapter discusses various aspects of DNA molecular markers associated with crop development, including the progression of molecular marker technology, prospects and current limitations, applications in unraveling global genetic potential, and specific uses for exploiting global genetic sources and re-purposing some traits to fulfill local demand under given environmental conditions.


Introduction
Advances in genomic technology have been the main thrust behind the progression of DNA markers, which are now approaching a critical point in providing a platform for the next generation of varietal development. Improving total yield to feed a growing world population remains the major goal. However, providing high-quality crop products that meet the emerging demand for better nutritional value and food functionality will become an increasingly important goal. Progress in high-throughput marker analyses, significant reductions in the cost per data point, more sophisticated computational tools, and the creation of customized marker sets for specific breeding applications are continuing and are expected to have direct implications for highly efficient crop development in the near future. An advanced DNA marker system can be used to accomplish breeding goals as well as various scientific goals. These goals encompass a wide array of targets, from understanding the function of specific genes in enough detail that the quality of gene products can be controlled, to attaining a global view of genomic utility to improve the efficiency of crop development. The combination of molecular understanding at the level of individual genes and genetic manipulation at the genome level may lead to a significant yield leap to meet global food challenges.
Historically, plant breeding has always integrated the latest innovations to enhance crop improvement. Starting with prehistoric selection based on systematic visual observations, which led to the first plant domestication (Harlan, 1992), crop development was further enhanced by employing Darwin's scientific principles of hybridization and selection, then by applying Mendel's principles of association between genotype and phenotype, and now through DNA markers and genomics that will lead to the next generation of crop development. Recent progress in high-throughput marker genotyping, genome scanning, sequencing and re-sequencing, molecular breeding and bioinformatics, software and algorithms, and precise phenotyping (Delseny et al., 2010; Edwards & Batley, 2010; Varshney et al., 2009; Mochida & Shinozaki, 2010; Davey et al., 2011) is a conduit for the next generation of crop development that uses different views and avenues to approach the same goal. This chapter will discuss various aspects of DNA molecular markers associated with crop development. They include the progression of molecular marker technology, prospects and current limitations, applications in unraveling global genetic potential, and specific uses for exploiting global genetic sources and re-purposing some traits to fulfill local demand under the given environmental conditions.
[...] quantitative steady state fluorescence intensity readings to be made (Jenkins & Gibson, 2002). It also has been applied for genotyping with TaqMan, Invader and rolling-circle amplification. Fluorescence plate readers allow measurement of additional fluorescence parameters, including polarization, lifetime and time-resolved fluorescence, and fluorescence resonance energy transfer. In addition, mass spectrometry and light detection are also used for high-throughput SNP genotyping.
A DNA chip, or gene chip, is a SNP detection platform for high-throughput genotyping. It consists of a collection of microscopic DNA spots attached to a solid surface. This is one of the fastest-developing research areas. More than 1.8 million markers (about 906,600 SNPs and 946,000 probes for the detection of copy number variation) are available on the Affymetrix® Genome-Wide Human SNP Array 6.0. Luminex has developed a panel of 100 bead sets with unique fluorescent labels that can be processed by a flow analyzer. Besides detecting SNPs, genotyping, or re-sequencing mutant genomes, DNA microarrays have been used to measure gene expression. SNP detection can also be done using mass spectrometry, based on the molecular weight differences among DNA bases. Variations of this technique include MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry, which uses allele-specific incorporation of two alternative nucleotides into an oligonucleotide probe to allow measurement of the mass of the extended primers. This approach can also detect PEX products in multiplex very efficiently. The DNA microarrays developed by Affymetrix (Santa Clara, USA) and the high-density biochip assay by Illumina Inc. (San Diego, USA) are the two major chip-based high-throughput genotyping systems, offering different levels of multiplexing of several thousands (Yan et al., 2010). When an ultra-high-density SNP map is used, QTL detection efficiency improves considerably compared with maps built from traditional RFLP/SSR markers.

DArT (diversity array technology) and RAD (restriction site associated DNA)
Dramatic advances in SSR and SNP marker technology and their applications have been achieved in important organisms, including humans and a number of model animals and crops. However, discovering sequence polymorphism in non-model species, especially 'orphan' crops and other crops that have complex, polyploid genomes, remains slow. DArT (diversity arrays technology) is a microarray hybridization-based marker system that can be used to overcome this problem, since it does not require prior knowledge of genetic or genomic sequence (Yang et al., 2011; Alves-Freitas et al., 2011; Jaccoud et al., 2001; Wenzl et al., 2004). It has relevant applications for species with complex genomes and especially for the 'orphan' crops important for Third World countries. In addition to its high throughput, DArT is relatively quick, highly reproducible, and cost effective, at about one-tenth the cost of SSR markers per data point (Xia et al., 2005). It is designed for open use and is not covered by exclusive patent rights. Users can freely specify the scope of genetic analyses, which can be expanded as needed.
Typical DArT analyses include: 1) constructing a reference library that represents the genetic diversity of a species, by extracting total genomic DNA (metagenome) from a pool of individuals (i.e. a group of cultivated genotypes, optionally combined with their wild relatives), followed by complexity reduction to produce a genomic representation and cloning using a suitable vector and E. coli; 2) preparing a "discovery array" containing individual clones; 3) generating genomic representations of the individual lines studied; 4) hybridizing the discovery array with genomic representations of all genomes in the metagenome library; 5) identifying polymorphic clones and assembling them into a "genotyping array"; and 6) genotyping analyses, including construction of linkage maps or other types of analyses.
Besides its great potential, DArT has inherent limitations. First, DArT markers are dominant markers (present or absent), which restricts their value in some applications. Second, it is a microarray-based technique that involves several steps, including preparation of a genomic representation for the target species, cloning, and data management and analysis. These steps require expertise, additional cost, and supporting software such as DArTsoft, DArTdb, and DArTsoft 7. These requirements may limit its full utilization in developing countries. Despite a slow start centered around the team that developed the system, an increasing number of independent research groups now routinely use the methodology in a broader range of species for various purposes, including linkage mapping (Yang et al., 2011), genotyping of closely related species (Alves-Freitas et al., 2011), and genotyping of very large and complex genomes such as wheat (Paux et al., 2008) and sugarcane (Wei et al., 2010).
More recently, a variety of microarrays (including tiling, cDNA and oligonucleotide arrays) have also been used to develop so-called RAD markers for the study of genome-wide variation associated with the restriction sites of individual restriction enzymes. For this purpose, a genome-wide library of RAD tags is first developed from genomic DNA and then hybridized to the chosen microarray to detect all restriction site-associated variations in a single assay. The development of RAD tags involves the following steps: (i) digestion of genomic DNA with a specific restriction enzyme; (ii) ligation of biotinylated linkers to the digested DNA; (iii) random shearing of the ligated DNA into fragments smaller than the average distance between restriction sites, leaving small fragments with restriction sites attached to the biotinylated linkers; (iv) immobilization of these fragments on streptavidin-coated beads; and (v) release of DNA tags from the beads by digestion at the original restriction sites. This process specifically isolates DNA tags directly flanking the restriction sites of a particular restriction enzyme throughout the genome. The RAD tags from each of a number of samples, when hybridized to a microarray, allow high-throughput identification and/or typing of differential hybridization patterns. These markers have a clear advantage over existing marker systems (for example, restriction fragment length polymorphisms, AFLPs and DArT markers) that could assay only a subset of SNPs that disrupt restriction sites. RAD markers were successfully developed in a number of organisms, including fruit fly, zebrafish, threespine stickleback, and Neurospora (Lewis et al., 2007; Miller et al., 2007a, b), and will certainly find their way into most laboratories working on higher plants.
RAD (Restriction site Associated DNA) markers are thus another high-throughput, restriction-based marker system that can be used for genetic mapping. To generate RAD markers, RAD tags (the DNA sequences immediately flanking each instance of a particular restriction enzyme site throughout the genome) need to be isolated. This involves digesting DNA with a particular restriction enzyme, ligating biotinylated adapters to the overhangs, randomly shearing the DNA into fragments much smaller than the average distance between restriction sites, and isolating the biotinylated fragments using streptavidin beads (Miller et al., 2007b). Different RAD tag densities can be obtained by using different restriction enzymes during the isolation process. Once RAD tags are isolated, they can be used for microarray analysis (Miller et al., 2007a; Lewis et al., 2007).
As an alternative, RAD analyses can be combined with high-throughput sequencing (e.g. on the Illumina platform; Baird et al., 2008). For that, the RAD tag isolation procedure needs to be modified. After random shearing has produced DNA fragments much smaller than the average distance between restriction sites, the sheared ends are prepared, a second adapter is ligated, and the specific fragments that contain both adapters are amplified by PCR. The first adapter contains a short DNA sequence barcode. Different DNA samples can be prepared with different barcodes to allow sample tracking when multiple samples are sequenced in the same reaction (Hohenlohe et al., 2010; Baird et al., 2008). These RAD tags can then be subjected to high-throughput sequencing for more efficient RAD mapping. The sequencing approach produces higher genetic marker density than microarray methods.
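The sample-tracking role of the barcode can be illustrated with a minimal sketch. The function name, barcode sequences and sample names below are hypothetical, and the sketch assumes exact barcode matches at the start of each read, with no barcode being a prefix of another; real pipelines usually also tolerate sequencing errors in the barcode.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode.

    reads: iterable of sequence strings
    barcodes: dict mapping barcode sequence -> sample name (hypothetical values)
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Keep only the genomic part of the read (the RAD tag).
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched exactly: set the read aside.
            by_sample["unassigned"].append(read)
    return dict(by_sample)

example_barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
example_reads = ["ACGTGGATCCAA", "TGCAGGATCCTT", "NNNNGGATCCAA"]
demultiplexed = demultiplex(example_reads, example_barcodes)
```

After this step, the reads of each sample can be compared tag by tag to call polymorphisms.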

Random, genic, and functional markers
DNA markers can be classified as 1) random markers (anonymous or neutral markers) when they are derived at random from polymorphic sites across the genome, 2) gene-targeted or candidate gene markers when they are derived from polymorphisms within genes, and 3) functional markers when they are derived from polymorphic sites within genes that are causally associated with phenotypic trait variation (Andersen & Lübberstedt, 2003; Wei et al., 2009). Each marker type may be used for specific purposes. Random markers, for example, can be used as an effective tool for establishing a breeding system, studying gene flow among natural populations, determining the genetic structure of a population, or characterizing a gene bank collection (Xu et al., 2005). Although the predictive value of a random marker depends on the known linkage phase between the marker and target locus alleles (Lübberstedt et al., 1998), random markers so far remain the marker system of choice for marker-assisted breeding and QTL analyses in a wide variety of crop plants (Semagn et al., 2010; Xu, 2003b).
Both genic and functional markers are derived from within genes. Therefore, they correlate well with gene function and have a high predictive value for the targeted gene in selection (Andersen & Lübberstedt, 2003; Wei et al., 2009). Because of that, they are well suited for use in marker-assisted breeding. The number of genic and functional markers has increased substantially in recent years owing to the DNA sequence information from whole-genome sequencing projects that is publicly available for a number of plant species, including rice, soybean, cassava, maize, barley, wheat, potato, and tomato (Mochida & Shinozaki, 2010). Sequence data of fully characterized genes and full-length cDNA clones are also available for some plant species, including those listed above. The sequence data for ESTs, genes, and cDNA clones can be downloaded from GenBank and scanned to identify markers, including SSRs, which are typically referred to as EST-SSRs or genic microsatellites. Many gene-derived SSR markers for maize, for example, have been developed from genes using the information available in GenBank, and their primer sequences are available at www.maizeGDB.org.
Genic SSRs are more transferable across species than genomic markers, especially when the primers are designed from more conserved coding regions (Varshney et al., 2005). EST-SSR markers could, therefore, be used in related species where information on SSRs or ESTs is limited. These markers can also be used effectively for comparative mapping (Shirasawa et al., 2011; Yu et al., 2004; Varshney et al., 2005; Oliveira et al., 2009). EST-SSRs can be used to produce high-quality markers, but they are often less polymorphic than genomic SSRs (Aggarwal et al., 2007; Eujayl et al., 2002; Thiel et al., 2003). The EST resources can be further mined for SNPs (Ramchiary et al., 2011; Li et al., 2009).
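Scanning downloaded sequences for SSRs, as described above, can be sketched with a regular expression. This is only an illustration under stated assumptions: the function name, the thresholds (at least four perfect repeats of di- or trinucleotide motifs) and the example sequence are hypothetical, and dedicated SSR mining tools apply richer criteria.

```python
import re

def find_ssrs(seq, min_repeats=4, motif_lengths=(2, 3)):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    hits = []
    for k in motif_lengths:
        # A k-base motif followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAA
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical EST fragment containing an (AG)6 and a (CTT)5 repeat.
example_est = "TTGC" + "AG" * 6 + "CCTA" + "CTT" * 5 + "GA"
ssr_hits = find_ssrs(example_est)
```

Primers would then be designed in the sequence flanking each reported position.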

Physical map
A physical map provides information on the order of genetic components on the chromosomes in terms of physical distance units (base pairs). Deciphering the actual biological functions encoded in the physical map holds the key to unraveling the overall genetic potential of organisms. Construction of a whole-genome physical map is therefore crucial. It provides a solid blueprint for quantifying species evolution, revealing species-specific features, delineating ancestral biological functions shared by a certain group of plant species, predicting and interpreting regulatory signatures, and, for practical purposes, identifying candidate genes needed in crop improvement through sequences of functional or structural orthologs among closely related or model species. The construction of a physical map has been a critical component of numerous genome projects, including the Human Genome Project (HGP), the first genome project, initiated in 1990 and completed in 13 years, to produce and integrate genetic, physical, gene and sequence maps. As of today, 25 published plant genome sequences (complete, publicly available, and usable without restriction) are available, including those of potato, grape, Arabidopsis thaliana, A. lyrata, Thellungiella, Brassica rapa, poplar, cucumber, cannabis, apple, strawberry, soybean, pigeon pea, lotus, Medicago, date palm, maize, sorghum, Brachypodium, rice, Selaginella, and Physcomitrella (CoGePedia, 2011). The rice (Oryza sativa) genome sequence was the second plant genome to be published (after Arabidopsis), but it is the first monocot, grass, grain, and food crop genome. The original genome, published in 2002, consisted of a dual publication from two independent groups using two subspecies of rice, japonica and indica. The current version of the rice genome contains ~370 megabases of sequence and 40,577 non-transposon-related genes spread across 12 chromosomes (CoGePedia, 2011).
Physical mapping can be carried out using a BAC-by-BAC, or clone-by-clone, strategy in two steps. The first step is the establishment of BAC clones (typically 100-150 kb) for the target genome or chromosome, together with a set of overlapping clones representing a minimal tiling path (MTP) ordered along the chromosomes of the target genome. Shotgun sequencing is then applied to the individually mapped clones of the MTP. The DNA from each BAC clone is randomly fragmented into smaller pieces, which are either cloned into a plasmid and subjected to Sanger sequencing (dideoxy sequencing or chain termination method) or sequenced directly using Next Generation Sequencing (NGS) technologies. The resulting sequence data are then aligned so that identical sequences overlap, and contiguous sequences (contigs) are assembled into a finished sequence. Unlike Sanger sequencing, NGS technologies are based on massively parallel sequencing, do not require bacterial cloning, and rely only on the amplification of single isolated DNA molecules. Tens of millions of single-stranded DNA molecules can be immobilized on a solid surface, such as a glass slide or beads, and analyzed in a massively parallel way, providing extremely rapid sequencing. Physical mapping can also be done through a whole-genome shotgun (WGS) strategy, which involves the assembly of sequence reads generated in a random, genome-wide fashion. The entire target genome (or chromosome) is fragmented into pieces of certain sizes that can either be subcloned into plasmid vectors or sequenced directly using NGS technologies. Highly redundant sequence coverage across the genome or chromosome is generated through sequence reads from many subclones, and the sequences are assembled with various computational methods to produce a consensus sequence.
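The idea of aligning reads so that identical sequences overlap and merging them into contigs can be illustrated with a toy greedy assembler. This is a minimal sketch, not the method used by any production assembler: it assumes error-free reads from one strand, no repeats, and no read contained inside another, and the function names and example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs

example_reads = ["ATGGCGT", "GCGTACC", "TACCGGA"]
assembled = greedy_assemble(example_reads)
```

With redundant coverage, many reads support each position, which is what allows a consensus sequence to be called despite sequencing errors; the sketch omits that step.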
One of the most anticipated outcomes of a genome sequence is the high-throughput development of molecular markers to assist genetic analysis, gene discovery and breeding programs (Fukuoka et al., 2010). Because of its genome sequence, rice, for example, is now rich in tools for mapping and breeding. It has high-density SSRs (about 51 SSRs per Mb), comprehensive SNPs (1,703,176 SNPs, approximately one SNP every 268 bp), insertion-deletion polymorphisms (IDPs) and custom-designed (candidate gene) markers for marker-assisted breeding (Feuillet et al., 2011). Since the completion of the genome sequence, various efforts have been dedicated to incorporating SNP markers identified from the sequence into breeders' chips, where they can be used as a breeding tool (McCouch et al., 2010). A combination of low-, medium- and high-resolution SNP assays is being developed for a variety of purposes. The low-density SNP chips, the 384-SNP OPAs, are particularly attractive to breeders and geneticists because they are reliable and require little technical adjustment once they are designed and optimized. Hundreds or thousands of individuals can be assayed within a short time window, and the assays are relatively inexpensive compared with the time, labor and bioinformatics requirements of other marker technologies.

Genetic map
A genetic map is produced by counting recombinant phenotypes, and it reveals the genetic layout of the organism. A marker-based genetic linkage map is generally constructed using the same principles as classical genetic maps. The components of mapping include selecting markers, developing mapping populations from selected parental lines, genotyping and phenotyping each individual in the mapping population using molecular markers, and constructing linkage maps from the phenotypic and marker data. The recombination frequency between two linked genetic markers is expressed in genetic distance units known as centiMorgans (cM) or map units. Two markers are 1 cM apart if they are found to be separated in one of 100 progeny. However, 1 cM does not always correspond to the same physical length of DNA. The actual length of DNA per cM is referred to as the physical to genetic distance. In genome areas where recombination occurs frequently (recombination hot spots), shorter lengths of DNA per cM, as low as 200 kb/cM, can be found. Recombination hot spots are characterized by crossovers located in very small genetic intervals, consisting mostly of 1-2 genes, and those genes almost always harbor one or more single feature polymorphisms (Singer et al., 2006). In other parts of the genome, where recombination may be suppressed, the physical to genetic distance can be 1500 kb per cM. The lowest recombination rates typically occur at the centromeres because of heavily methylated heterochromatin (Haupt et al., 2001). The proportion of recombinant gametes depends on the rate of crossover during meiosis and is known as the recombination frequency (r). The maximum proportion of recombinant gametes is 50%, reached when crossover between two genetic loci has occurred in all the cells. This is equivalent to non-linked genes, where the two loci are inherited independently.
The recombination frequency depends on the rate of crossovers, which in turn depends on the linear distance between two genetic loci. Recombination frequencies range from 0 (complete linkage) to 0.5 (completely independent inheritance). The likelihood that genes are linked is expressed as the logarithm of the odds (LOD). The LOD score (base-10 logarithm of odds), developed by Newton E. Morton, is a statistical test of whether two loci are linked. Positive LOD scores indicate the presence of linkage, whereas negative LOD scores indicate that linkage is less likely. By convention, a LOD score greater than 3.0 is considered evidence for linkage.
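The recombination frequency and LOD score can be made concrete with a small worked sketch. It assumes the simplest two-point case, phase-known data in which each of n progeny is scored as recombinant or non-recombinant, so that the likelihood at recombination fraction θ is θ^k (1 − θ)^(n − k) for k recombinants and the LOD compares it with the likelihood at θ = 0.5. The function names are hypothetical; mapping software handles more general population types.

```python
import math

def recombination_frequency(recombinants, total):
    """Proportion of recombinant progeny, r = k / n."""
    return recombinants / total

def lod_score(recombinants, total, theta=None):
    """Two-point LOD: log10 of L(theta) / L(0.5) for k recombinants among n progeny.

    If theta is not given, the estimate r = k / n is used, capped at 0.5.
    """
    k, n = recombinants, total
    if theta is None:
        theta = min(recombination_frequency(k, n), 0.5)
    log_likelihood = 0.0
    if k > 0:
        log_likelihood += k * math.log10(theta)
    if n - k > 0:
        log_likelihood += (n - k) * math.log10(1 - theta)
    # Subtract the log-likelihood under independent assortment (theta = 0.5).
    return log_likelihood - n * math.log10(0.5)

r = recombination_frequency(10, 100)   # 10 recombinants among 100 progeny
lod = lod_score(10, 100)
```

For small r, 100 × r approximates the map distance in cM, in line with the definition above. Ten progeny with no recombinants already give a LOD of 10 × log10(2), just above the conventional threshold of 3.0.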

Mapping populations
Various types of populations can be used to create genetic maps, develop markers linked to target genes, and facilitate marker verification. The most common populations created for mapping purposes include F2s, backcrosses (BCs), doubled haploids (DHs), recombinant inbred lines (RILs), and near-isogenic lines (NILs). In association mapping, natural populations are used. DHs are produced by chromosome doubling of haploids, either in vivo or in vitro. They have several advantages over other diploid populations such as F2s, F3s, or BCs, since no dominance or dominance-related epistasis effects are involved in the genetic model. As a result, additive, additive-related epistasis, and linkage effects can be investigated properly. As a permanent population, DH lines can be replicated as many times as desired across different environments, seasons and laboratories, providing endless genetic material for phenotyping and genotyping and for evaluating genotype-by-environment interaction (Forster & Thomas, 2004; Bordes et al., 2006). In DH populations, the additive component of genetic variance is larger than that of F2 and BC populations. The quantitative genetics associated with DH populations has been discussed in detail previously, including detection of epistasis, estimation of genetic variance components, linkage tests, estimation of gene numbers, genetic mapping of polygenes, and tests of genetic models and hypotheses (Choo et al., 1985; Bordes et al., 2006).
Recombinant inbred lines or random inbred lines (RILs) can be produced through various inbreeding procedures. They include full-sib mating for open-pollinated plants and selfing for self-pollinated plants. In self-pollinated plants, RILs can be developed through a bulking method, in which hybrids are bulk planted and harvested until F5 to F8 before they are planted by families. RILs can also be produced through single seed descent (SSD), in which one or several seeds are harvested from each F2 plant and planted to produce the next generation, until F5 to F8. Near-isogenic lines are the product of inbreeding through successive backcrossing.
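Why selfing is carried on to F5 to F8 can be seen from a short calculation. Under the usual textbook assumptions (strict selfing, no selection, and an F1 that is heterozygous at the locus), each generation of selfing halves the expected heterozygosity. The function name below is hypothetical.

```python
def expected_heterozygosity(generation):
    """Expected fraction of heterozygous loci in generation F_n under strict selfing.

    Assumes the F1 (generation 1) is heterozygous at every segregating locus.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or later")
    return 0.5 ** (generation - 1)

# Expected heterozygosity from F2 to F8.
heterozygosity_by_generation = {
    f"F{n}": expected_heterozygosity(n) for n in range(2, 9)
}
```

By F5 the expectation is 6.25% and by F8 it is below 1%, so lines at these generations are close to fixed and can be treated as a permanent population.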

Mapping software and tools
Almost all molecular maps based on the first generation of molecular markers, such as RFLPs, were constructed using MAPMAKER/EXP (Table 1). Severe segregation distortion requires statistical modifications, and MAPDISTO can be used to address this problem. JOINMAP can be used to construct genetic linkage maps for BC1, F2, RIL, F1- and F2-derived DH, and out-breeder full-sib families. It can combine ('join') data derived from several sources into an integrated map, and it offers several other functions, including linkage group determination, automatic phase determination for out-breeder full-sib families, several diagnostics, and map charts (van Ooijen & Voorrips, 2001). CMAP, a web-based software package, allows users to view comparisons of genetic and physical maps. The package also includes tools for curating map data. There are many commercial or freely available software packages for establishing associations between marker genotypes and trait phenotypes. The most commonly used are QTL CARTOGRAPHER, MAPQTL, PLABQTL and QGENE. All of these handle only bi-allelic populations, whereas MCQTL (Jourjon et al., 2005) can perform QTL mapping in multi-allelic situations, including bi-parental populations from segregating parents, or sets of bi-parental, bi-allelic populations. The most frequently used QTL software during the 1980s and 1990s was MAPMAKER/QTL. MAPL allows a user to get results on segregation ratio, linkage test, recombination value, group markers, and order of markers by metric multidimensional scaling, and to draw a QTL map through interval mapping and analysis of variance (ANOVA).
A widely used current QTL mapping package is QTL CARTOGRAPHER (Table 1). PLABQTL uses composite interval mapping, with many functions similar to those of QTL CARTOGRAPHER. QTLs can be localized and characterized in populations derived from a biparental cross by selfing or production of DHs. Simple and composite interval mapping are performed using a fast multiple regression procedure, which can also be used for QTL × environment interaction analysis (Utz & Melchinger, 1996). Recently, QGENE has been rewritten in the Java language and can be used for trait and QTL analyses, permutation, and simulation of populations as well as traits. Several software packages can be used for constructing linkage maps in out-crossing plant species, using full-sib families derived from two outbred (non-inbred) parent plants (Garcia et al., 2006). Bayesian QTL mapping has received a lot of attention in recent years, and several software packages have been developed. For example, BQTL can perform maximum likelihood estimation of multi-gene models, Bayesian estimation of multi-gene models using Laplace approximations, and interval and composite interval mapping of genetic loci. BLADE was developed for Bayesian analysis of haplotypes for LD mapping. MULTIMAPPER is a Bayesian QTL mapping package for analyzing backcross, DH and F2 data from designed crossing experiments of inbred lines (Martinez et al., 2005), and MULTIMAPPER/OUTBRED handles populations derived from outbred lines. Several mapping packages were developed for QTL mapping in specific situations. MCQTL was developed for simultaneous QTL mapping in multiple crosses and populations (Jourjon et al., 2005), including diallel cross modeling of the QTL effects using multiple related families. MAPPOP was developed for selective and bin mapping by selecting samples from mapping populations and for locating new markers on pre-existing maps (Vision et al., 2000).
In addition, QTLNETWORK was developed for mapping and visualizing the genetic architecture underlying complex traits for experimental populations from a cross between two inbred lines (Yang et al., 2008).
Web-based QTL analytical tools are also available. Some of the tools developed in other systems can potentially serve as a model for plants. WEBQTL (Table 1) provides dense, error-checked genetic maps, as well as extensive gene expression data sets (Affymetrix) acquired across more than 35 strains of mice. To map QTLs in outbred populations, QTL EXPRESS (Seaton et al., 2002) was developed for line crosses, half-sib families, nuclear families and sib-pairs. It provides two options for QTL significance tests: permutation tests to determine empirical significance levels and bootstrapping to estimate empirical confidence intervals of QTL locations.
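The principle of a permutation test for an empirical significance level can be sketched for the simplest possible case, a single marker with two genotype classes. This is an illustration only and not the procedure implemented in QTL EXPRESS or any other package: the test statistic (absolute difference in class means), the function names and the example data are assumptions made for the sketch.

```python
import random
from statistics import mean

def mean_difference(genotypes, phenotypes):
    """Absolute difference in mean phenotype between genotype classes 0 and 1."""
    class0 = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    class1 = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    return abs(mean(class0) - mean(class1))

def permutation_p_value(genotypes, phenotypes, n_perm=1000, seed=0):
    """Empirical p-value: how often shuffled phenotypes give an effect as large."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean_difference(genotypes, phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        if mean_difference(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (exceed + 1) / (n_perm + 1)

# Hypothetical data: 10 plants per genotype class with a clear trait difference.
example_genotypes = [0] * 10 + [1] * 10
example_phenotypes = [10.0 + 0.1 * i for i in range(10)] + [12.0 + 0.1 * i for i in range(10)]
p_value = permutation_p_value(example_genotypes, example_phenotypes)
```

In genome-wide scans the same shuffling is applied to the maximum test statistic over all positions, which yields a threshold that accounts for multiple testing.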
Association or LD mapping is another mapping approach, which uses unstructured populations of unrelated individuals, germplasm accessions, or randomly selected cultivars. Prior to LD mapping, the genotype data are subjected to statistical analysis to account for population structure, which can cause false positive associations due to circumstantial correlations rather than real linkage. The STRUCTURE software (Pritchard et al., 2000) can be used for this purpose. Some software packages already include population structure analysis. STRAT, a companion program to STRUCTURE, uses a structured association method for LD mapping, enabling valid case-control studies even in the presence of population structure (Pritchard et al., 2000). TASSEL (trait analysis by association, evolution and linkage) performs a variety of genetic analyses, including LD mapping, diversity estimation and LD calculation (Zhang et al., 2006). MIDAS can be used for analysis and visualization of inter-allelic disequilibrium between multi-allelic markers (Gaunt et al., 2006). PEDGENIE can incorporate pedigrees of any size, from independent individuals to large pedigrees, and independent individuals and families may be analyzed together. GENERECON is another software package for LD mapping, using coalescent theory. It is based on a Bayesian Markov chain Monte Carlo method for fine-scale LD mapping using high-density marker maps. Genome-wide association (GWA) studies are used to find links between genetic variation and common diseases in humans, as well as agronomic traits in plants. A well-powered GWA study involves the measurement of hundreds of thousands of SNPs in thousands of individuals. Statistical tools developed for GWA studies include GENOMIZER, MAPBUILDER, and CATS (Table 1).
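The LD calculation mentioned above can be illustrated for the basic case of two bi-allelic markers with known haplotypes, using the standard coefficients D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The function name, the 0/1 allele coding and the example haplotypes are assumptions for this sketch; packages such as TASSEL also handle unphased genotypes and multi-allelic markers.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for two bi-allelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2), alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n              # frequency of allele 1 at locus 1
    p_b = sum(b for _, b in haplotypes) / n              # frequency of allele 1 at locus 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")   # undefined if a locus is fixed
    return d, r2

# Hypothetical samples: complete LD versus linkage equilibrium.
complete_ld = [(1, 1)] * 5 + [(0, 0)] * 5
equilibrium = [(1, 1), (1, 0), (0, 1), (0, 0)]
d_full, r2_full = ld_stats(complete_ld)
d_none, r2_none = ld_stats(equilibrium)
```

An r² near 1 means that one marker almost fully predicts the other, whereas an r² near 0 means the markers carry independent information.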

DNA marker utilizations
One of the most successful practical uses of molecular markers to date is gene introgression and pyramiding. Publicly available information on gene-marker associations for a number of important agronomic traits can readily be used to introgress and pyramid these genes into elite breeding lines used in cultivar development. Marker-assisted backcrossing (MABC) is a straightforward method to introgress, or move, target gene(s) from donor parents to recipient parents. It involves successive backcrossing to remove the genetic background of the donor while recovering as much of the genetic background of the recurrent parent as possible. Statistical methods and backcross schedules for effective MABC have been reviewed in various papers (Hospital, 2001; Hospital & Charcosset, 1997; Herzog & Frisch, 2011). MABC with marker-based genome scanning allows speedy recovery of most of the recurrent parent genome in a few crosses (Frisch et al., 1999; Frisch & Melchinger, 2005). MABC can also be used to develop cleaner near-isogenic lines by minimizing the donor segments carried over around the target locus, providing precise introgression of individual genes for detailed characterization of QTLs. Marker-assisted gene pyramiding has been used successfully to combine multiple genes for male sterility (Nas et al., 2005) or to provide a broader spectrum of resistance against major diseases, such as rice blast and bacterial blight (Yoshimura et al., 1996; Jeung et al., 2006). Individual genes have unique reactions against pathogenic races, and some of them have overlapping spectra, which makes selection based on disease reactions or symptoms more challenging. This problem can be overcome using molecular markers linked to individual disease resistance genes, which allow effective selection to stack the genes.
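The selection logic of MABC and gene pyramiding can be sketched as two filters: keep plants that carry the donor allele at every target marker, then rank them by how much of the recurrent parent background they have recovered. The data layout, marker names, plant names and the background coding ("R" for homozygous recurrent parent, "H" for heterozygous) are hypothetical and chosen only for illustration.

```python
def recurrent_fraction(background_calls):
    """Fraction of background markers homozygous for the recurrent parent ('R')."""
    return sum(1 for call in background_calls if call == "R") / len(background_calls)

def select_plants(plants, target_markers, top_n=1):
    """Foreground selection on all target markers, then background ranking."""
    carriers = [
        (name, recurrent_fraction(data["background"]))
        for name, data in plants.items()
        # Foreground: the donor allele must be present at every target marker.
        if all(data["foreground"].get(marker, False) for marker in target_markers)
    ]
    # Background: prefer plants with the most recurrent parent genome recovered.
    carriers.sort(key=lambda item: item[1], reverse=True)
    return carriers[:top_n]

# Hypothetical BC1 plants scored for two target markers and eight background markers.
example_plants = {
    "BC1-01": {"foreground": {"M1": True, "M2": True}, "background": list("RRHRRHRR")},
    "BC1-02": {"foreground": {"M1": True, "M2": False}, "background": list("RRRRRRRR")},
    "BC1-03": {"foreground": {"M1": True, "M2": True}, "background": list("RHHRHHRH")},
}
selected = select_plants(example_plants, ["M1", "M2"], top_n=1)
```

In this example BC1-02 has the cleanest background but is discarded because it lacks one target gene, which is the situation that makes phenotype-only selection unreliable when stacking genes.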

MAPMAKER/EXP
The first and most frequently used software for map construction in the early era of DNA markers, developed at the Whitehead Institute (Lander et al., 1987).

MAP MANAGER CLASSIC
Provides a graphical presentation and interactive tool to map Mendelian loci for codominant markers, using backcrosses or RILs in plants or animals (Manly, 1993).

MAPDISTO
Can be used to address segregation distortion in segregating populations, such as backcross, doubled haploid (DH) and RIL populations. It computes and draws genetic maps through a graphical interface and analyzes marker data by showing segregation distortion due to differential viability of gametes or zygotes. Maps or data from multiple populations derived from different crosses can be combined into single or consensus maps through joint mapping. Allows a user to get results on segregation ratio, linkage test, recombination value, group markers, and order of markers by metric multi-dimensional scaling, and to draw a QTL map through interval mapping and analysis of variance (ANOVA). Developed by Ukai et al. (1995).

QTL CARTOGRAPHER
Implements several statistical methods using multiple markers simultaneously, including composite interval and multiple composite interval mapping. Interaction between identified QTLs can also be estimated.

PLABQTL
Uses composite interval mapping.

QGENE
Intended for comparative analyses of QTL mapping data sets. It was developed in 1991 as a map and population simulation program, to which QTL analyses were added later (www.qgene.org/).

MAPQTL
Calculates QTL positions on genetic maps for several types of mapping populations, including BC1s, F2s, RILs, and DHs. It can also be used for QTL interval mapping, composite interval mapping and non-parametric mapping, with functions for automatic cofactor selection and permutation tests.

BQTL
Used for the mapping of genetic traits from line crosses and RILs (Borevitz et al., 2002).

BLADE
Used for Bayesian analysis of haplotypes for LD mapping (Liu et al., 2001; Lu et al., 2003).

MULTIMAPPER/ OUTBRED
Has multiple uses, including the analysis of populations derived from outbred lines.

WEBQTL
Used for exploring the genetic modulation of thousands of phenotypes gathered over a 30-year period by hundreds of investigators using reference panels of recombinant inbred strains of mice in web-based applications.

TASSEL
Comprehensive LD-based QTL mapping for trait analysis by association, evolution and linkage, which performs a variety of genetic analyses, including LD mapping, diversity estimation and LD calculation (Zhang et al., 2006).

MIDAS
For analysis and visualization of inter-allelic disequilibrium between multi-allelic markers (Gaunt et al., 2006).

PEDGENIE
Used as a general purpose tool to analyze association and transmission disequilibrium (TDT) between genetic markers and traits in families of arbitrary size and structure.

QTL mapping
A long history of breeding suggests that grain yield is controlled by many genes with small effects. For this type of trait, the applicability of finding and introgressing QTLs is limited, since estimates of the effects of minor QTLs are often inconsistent. Even if these minor QTLs show consistent effects, pyramiding them becomes increasingly challenging as the number of QTLs pyramided into one line increases (Bernardo, 2008). The inconsistency of estimated QTL effects for complex traits controlled by many minor genes has the following important consequences. First, because estimated QTL effects for traits such as grain yield have limited transferability across populations, QTL mapping has to be repeated for each breeding population. Under this condition, marker-assisted recurrent selection (MARS) is suitable, since genotyping, phenotyping, and construction of the selection index are repeated for each population (Koebner, 2003; Campbell et al., 2003). Second, because G×E interactions have a great influence on complex traits controlled by many QTLs with minor effects, QTL mapping in the same population needs to be conducted in each target set of environments. Finally, because the effects of sampling error are large, a population size of 500 to 1,000 is suggested (Beavis, 1994).
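The construction of a marker-based selection index in MARS can be sketched as a weighted sum of marker genotype codes. In the example below, the genotype coding (1, 0, −1 for the two homozygotes and the heterozygote), the marker effects, and the function names are all hypothetical; in practice the effects would be re-estimated from genotypic and phenotypic data within each breeding population.

```python
def marker_score(genotype, effects):
    """Weighted sum of marker codes for one plant or line.

    genotype: dict marker -> code (1 favorable homozygote, 0 heterozygote,
              -1 unfavorable homozygote); missing markers count as 0.
    effects:  dict marker -> estimated effect (hypothetical values here).
    """
    return sum(effects[m] * genotype.get(m, 0) for m in effects)


def select_top(lines, effects, n_select):
    """Return the names of the `n_select` lines with the highest score."""
    ranked = sorted(lines,
                    key=lambda name: marker_score(lines[name], effects),
                    reverse=True)
    return ranked[:n_select]


if __name__ == "__main__":
    effects = {"m1": 0.5, "m2": -0.2, "m3": 0.1}
    lines = {
        "line_1": {"m1": 1, "m2": 1, "m3": 0},
        "line_2": {"m1": 1, "m2": -1, "m3": 1},
        "line_3": {"m1": -1, "m2": 1, "m3": -1},
    }
    print(select_top(lines, effects, n_select=2))
```

Because the weights are specific to the population in which they were estimated, the same code would be rerun with new effects for every cycle and population.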
Mapping of multiple trait complexes in multiple environments can be conducted by employing algorithmic models that predict the association of genetic markers with the trait of interest based on the variance and covariance of the traits analyzed. These models allow new mapping frameworks and simulation tools to be designed, and allow the association to be extrapolated to the progeny of the plants or genetic materials tested in multiple environments. There is no limitation on the number of environments in which the traits are scored. Based on simulation studies (Howes et al., 1998; Wang et al., 2007a), combining favorable marker alleles for more than 12 unlinked QTLs does not appear to be feasible. The breeder may initially target a large number of QTLs but should expect to have fewer QTL alleles fixed in a recombinant inbred. Since improvement can only be targeted at a limited number of QTLs, breeders need a high level of confidence that the target QTLs are not false positives, which implies a stringent significance level, P ≤ 0.0001, when the QTLs are first identified. A stringent significance level, however, can lead to an upward bias in estimating QTL effects (Beavis, 1994; Xu, 2003a) and therefore to overly optimistic expectations of the response to MAS. Based on empirical and simulation studies, selection responses are increased when less stringent significance levels of P = 0.20 to 0.40 are applied in MARS. These relaxed significance levels allow QTLs with smaller effects to be selected, and these minor QTLs can more than compensate for the higher frequency of false positives. Less stringent significance levels are therefore acceptable for indicating QTL locations and when the goal is to predict genotypic performance, as in MARS, whereas more stringent significance levels are required for combining favorable QTLs in recombinant inbreds, introgression, and gene discovery.
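A simple calculation helps to show why stacking many unlinked QTLs quickly becomes impractical. The sketch below assumes a biparental cross in which the parents differ at every target QTL, no linkage, and no selection during inbreeding, so that a recombinant inbred is fixed for the favorable allele at each locus with probability 1/2. These assumptions and the function names are introduced for illustration and are not taken from the cited simulation studies.

```python
import math


def prob_all_favorable(n_qtl):
    """Probability that one recombinant inbred is fixed for the favorable
    allele at all `n_qtl` unlinked loci (probability 1/2 per locus)."""
    return 0.5 ** n_qtl


def lines_needed(n_qtl, confidence=0.95):
    """Smallest number of inbreds giving at least `confidence` probability
    that one or more carries every favorable allele."""
    q = prob_all_favorable(n_qtl)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - q))


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        print(n, prob_all_favorable(n), lines_needed(n))
```

For 12 unlinked QTLs the per-line probability is 1/4096, so the number of inbreds that would have to be produced and genotyped in a single step runs into the thousands, which is consistent with the practical limit suggested above.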
Along this line, QTLs should ideally be tagged by markers located inside the QTLs, closely linked to them, or flanking them. Based on simulation studies in maize, the response to MARS in a population of 144 plants was highest when about 128 markers were used (Bernardo & Charcosset, 2006), indicating that markers placed 10 to 15 cM apart are sufficient and that denser markers are not necessary for predicting performance.
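The implication of a 10 to 15 cM marker spacing can be made concrete by converting map distance into a recombination fraction. The sketch below uses Haldane's mapping function, r = ½(1 − e^(−2d)) with d in Morgans, which assumes no crossover interference; the choice of mapping function and the function names are assumptions for this example and are not specified in the source.

```python
import math


def haldane_recombination(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane, no
    interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def worst_case_recombination(marker_spacing_cm):
    """Recombination fraction between a QTL and its nearest marker when the
    QTL lies exactly midway between two markers."""
    return haldane_recombination(marker_spacing_cm / 2.0)


if __name__ == "__main__":
    for spacing in (5, 10, 15, 20):
        r = worst_case_recombination(spacing)
        print(f"spacing {spacing:>2} cM -> worst-case r = {r:.4f}")
```

With markers 10 cM apart, a QTL is never more than 5 cM from a marker, so recombination between the QTL and its nearest marker stays below 5% per meiosis under this model.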
Methods have been described for using genetic markers, such as gene sequence diversity information, to improve plant breeding in cultivar development by predicting the values of phenotypic traits. In these methods, marker-trait associations are identified in a first population from genotypic, phenotypic, and optional family relationship information and are then used to predict the value of the phenotypic trait in a second or target population. These locally important traits are complex qualitative traits that are affected by many genes, the environment, and interactions between genes and environments. The next wave of QTL mapping should target locally important QTLs directly associated with cultivar development, including the matrix QTLs. It has been suggested that specific targets need to be clearly defined before embarking on QTL mapping (Bernardo, 2008). In the context described above, yield potential and its components, quality traits, and local adaptation are among the most important QTL mapping targets. The architecture of the genetic matrix of these complex traits could be dissected through QTL mapping to provide critical information on the genomic regions and fragment sizes that produce the effects and on the relative importance of additive and non-additive gene action (Fridman et al., 2004).
Multiple QTL Mapping (MQM mapping) using haplotyped putative QTL alleles has been used as a simple approach for mapping QTLs in plant breeding populations (Jansen & Beavis, 2001). The method maps a phenotypic trait to its corresponding chromosomal location. Statistical methods that correlate pedigree with multiple markers (haplotypes) are used to determine identical-by-descent (IBD) data for mapping the phenotypic traits. The statistical models HAPLO-IM+, HAPLO-MQM, and HAPLO-MQM+ are used to map traits to a single gene or QTL. The approach provides an efficient method for mapping phenotypic traits in interrelated plant populations. Its basic principle is the clustering of the original parental lines into groups on the basis of their haplotypes for multiple genetic markers. The effect of a QTL on the phenotype is modeled per haplotype group instead of per family, allowing the effects of haplotype alleles to be examined across families. Simulations of realistic plant breeding schemes have shown a significant increase in the power of QTL detection. This approach offers new opportunities for mapping and exploiting QTLs in commercial breeding programs. In addition, selection can be performed at any stage of a breeding program: among genetically distinct breeding populations, as a preselection to increase the selection index and drive up the frequency of favorable haplotypes among the breeding populations; among segregating progeny from a breeding population, to increase the frequency of favorable haplotypes for the purpose of developing cultivars; among segregating progeny from a breeding population, to increase the frequency of favorable haplotypes prior to QTL mapping within that population; and among parental lines from different heterotic groups in hybrid crops, to predict the performance potential of different hybrids.
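The basic principle described above, clustering parental lines by haplotype and modeling effects per haplotype group rather than per family, can be sketched as follows. This is a deliberately simplified illustration that pools progeny phenotypes by parental haplotype and compares group means; it is not an implementation of HAPLO-IM+, HAPLO-MQM, or HAPLO-MQM+, and all names and data are hypothetical.

```python
from collections import defaultdict


def group_by_haplotype(parent_genotypes, window):
    """Cluster parental lines that share the same alleles at every marker
    in `window` (an ordered list of marker names)."""
    groups = defaultdict(list)
    for line, alleles in parent_genotypes.items():
        haplotype = tuple(alleles[m] for m in window)
        groups[haplotype].append(line)
    return dict(groups)


def haplotype_means(groups, progeny_phenotypes):
    """Mean phenotype per haplotype group, pooling the progeny of all
    parental lines in the group (instead of one mean per family)."""
    means = {}
    for haplotype, lines in groups.items():
        values = [v for line in lines
                  for v in progeny_phenotypes.get(line, [])]
        if values:
            means[haplotype] = sum(values) / len(values)
    return means


if __name__ == "__main__":
    parents = {
        "P1": {"m1": "A", "m2": "G"},
        "P2": {"m1": "A", "m2": "G"},
        "P3": {"m1": "T", "m2": "C"},
    }
    phenotypes = {"P1": [5.0, 6.0], "P2": [7.0], "P3": [3.0, 4.0]}
    groups = group_by_haplotype(parents, ["m1", "m2"])
    print(haplotype_means(groups, phenotypes))
```

Pooling families that share a haplotype increases the number of observations behind each estimated effect, which is the intuition behind the gain in detection power reported for the haplotype-based models.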
The index values generated from haplotype window-trait associations allow pre-selection, which is widely considered the next generation of MAS, to further economize breeding, not only by removing the need for phenotypic evaluation but also by enabling the screening of inbred lines before crosses are made. Breeders can initiate their programs by selecting a list of crosses and building a model based on the haplotypes carried by each parental line in the cross. Selecting a model from cross to cross and including target genomic regions in the model will increase the complexity of the models. If this complexity is not controlled, it will compromise predictive ability and selection gain. To control the model's complexity, the Automatic Model Picking (AMP) algorithm can be employed. The relative strength of each cross can be predicted using the Best Linear Unbiased Prediction (BLUP) approach, calculated on the parental lines using phenotypic data. Once the final model is determined, the full gain for each trait is calculated, and the frequency-adjusted predicted gain can be obtained based on the expected allele frequency. An additional optimization step can be included to either decrease or increase the importance of a secondary trait in the model based on the frequency-adjusted predicted gain. This method provides haplotype information that allows the breeder to make informed breeding decisions based on genotype rather than phenotype, moving toward predictive breeding.

Channeling molecular information into new cultivar development
Successful marker breeding requires the integration of molecular information into cultivar development programs. The development and testing of a QTL mapping population and the development of near-isogenic lines (NILs) can take several years before the results can be utilized (Monforte & Tanksley, 2000; Chaib et al., 2006). As a result, by the time the QTL is identified, the recurrent parent used in population development is probably commercially obsolete. The purified QTLs still have to be reintroduced into competitive germplasm for commercial use via a time-consuming backcross scheme. Frampton (2008) has proposed a direct integration of genomic technologies into commercial plant breeding by designing specific crossing schemes that allow marker profiles, QTL mapping of major gene loci, and new cultivars to be advanced simultaneously. The method can be applied repeatedly to achieve complete integration. The breeding population is developed through an initial cross, followed by two backcrosses and self-pollination of the BC2F1 plants. Molecular marker development consists of QTL identification using the means of the BC2F2 families, gene fine mapping, and new marker development using bulk-segregant analysis. The method therefore provides simultaneous development of a breeding population, molecular markers, and gene maps, and integrates the molecular marker platform with the breeding platform.

The bottom line: Prospect and current limitations
Significant progress has been achieved in marker detection methodology in terms of speed and cost. Efforts by various consortia supported by both public and private entities are underway to push the development of genomic tools and make them more economically and logistically feasible for breeders. High-resolution marker assays covering all important information across the genome, such as the SNP chip sets being developed, will provide a tremendous asset for mobilizing and assembling critical alleles that can significantly improve crop production systems. Over 1,200 reports of mapped QTLs in 12 major crop species are available in various publications (Bernardo, 2008; Xu & Crouch, 2008). Each typically reported an average of 3 to 5 QTLs for the trait studied (Bernardo, 2008; Eathington et al., 2007). This large volume of published molecular marker-trait associations will continue to grow as a result of the abundance of available markers, high-density molecular assays, sophisticated user-friendly computer software, and improved cost and technical efficiency in marker analyses. Despite the significant influx of reported marker-QTL trait associations to date, successful exploitation of the available mapped QTLs remains low, indicating a lack of synchronization between the QTLs reported and actual breeding goals in cultivar development. Successful integration of genomic tools into cultivar development requires sufficient knowledge of breeding materials from a molecular perspective. This is essential for marker-based accumulation of favorable alleles and for the ability to predict the effects of new QTLs assembled during cultivar development.
Dissection of individual QTLs will lead to a better understanding of their interaction, discovery of hidden QTLs, and a new way to characterize and classify QTLs to facilitate a speedy assembly of critical genes needed to maximize the end products, such as grain production or other specific quality traits of economic significance. The genes critical for maintaining local adaptation and standard industrial and market quality are also among the most important QTLs.
At present, there is a mounting gap between available QTL mapping information and marker-based QTL applications in cultivar development. This vast, mostly unexplored area presents a tremendous opportunity for both public researchers and private industry to tag pivotal genes, including their interactions, that play important roles in grain production. Because of the massive financial and human resources the undertaking requires, a number of consortia have been formed to achieve common goals that would be nearly impossible for any single lab to achieve. Major companies that have sufficient resources have intensified their efforts in this arena, though only very limited information reaches the public.
Publicly available plant databases provide a large amount of genomic data for a wide range of plant species. This wealth of knowledge, however, has not yet found its way into mainstream plant breeding, for several reasons. First, there is no apparent connection between the primary information generated in plant genomics and real-life breeding applications. Second, the various pieces of supporting information (e.g. pedigree, genotype and phenotype) are usually stored in different databases and managed by different groups of scientists. Third, there is a gap between breeders and molecular geneticists in their focus of interest, i.e. bioinformatic tools and interfaces focus on molecular data, whereas breeders work at the organism level. Integration of the fragmented information, views, and priorities will be one of many challenges to overcome. Bioinformatics data typically consist of cDNA and genomic sequence data, genetic maps of mutants, DNA markers and maps, candidate genes and quantitative trait loci (QTL), physical maps based on chromosome breakpoints, gene expression data, and libraries of large DNA inserts such as bacterial artificial chromosomes and radiation hybrids. Information flows from molecular markers to genetic maps to sequences and to genes. However, the relationship between breeding information (i.e. germplasm, pedigree and phenotype) and sequence-based information has not been established. An example of how genetic information can be integrated into plant breeding programs to produce cultivars from molecular variation using bioinformatics, and what crop scientists might want from bioinformatics, has been discussed previously (Mayes et al., 2005). How best to utilize all relevant genomic information efficiently and comprehensively, and how to harness the power of informatics to support molecular breeding, is a challenge for modern plant breeding.
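The integration problem described above, pedigree, genotype and phenotype records kept in separate places, can be illustrated with a small sketch that merges three sources on a shared germplasm identifier. The record layout, the identifiers, and the function names are hypothetical and are not drawn from any existing plant database.

```python
def integrate_records(pedigree, genotypes, phenotypes):
    """Merge three sources keyed by germplasm ID into one record per ID.

    Each argument is a dict: germplasm ID -> source-specific data.
    Entries missing from a source are kept and marked with None so that
    gaps stay visible rather than being silently dropped.
    """
    all_ids = set(pedigree) | set(genotypes) | set(phenotypes)
    merged = {}
    for gid in sorted(all_ids):
        merged[gid] = {
            "pedigree": pedigree.get(gid),
            "genotype": genotypes.get(gid),
            "phenotype": phenotypes.get(gid),
        }
    return merged


def complete_records(merged):
    """Keep only the entries that have data from all three sources."""
    return {gid: rec for gid, rec in merged.items()
            if all(value is not None for value in rec.values())}


if __name__ == "__main__":
    pedigree = {"G001": ("ParentA", "ParentB"), "G002": ("ParentA", "ParentC")}
    genotypes = {"G001": {"snp_1": "AA", "snp_2": "AG"}}
    phenotypes = {"G001": {"yield_t_ha": 6.2}, "G003": {"yield_t_ha": 5.1}}
    merged = integrate_records(pedigree, genotypes, phenotypes)
    print(sorted(merged))
    print(sorted(complete_records(merged)))
```

The sketch only works because all three sources use the same identifier, which is exactly the kind of common convention that fragmented databases tend to lack.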
The processing speed and cost of DNA markers have improved substantially in the past several years, resulting in a significant reduction in processing time and cost per data point. This is one of the research areas where development is most rapid. However, its application as a breeding tool has not yet reached its full potential. The marker approach involves a separate line of research activities and, in almost all cases, requires substantial upfront support. In addition to the cost of genotyping and phenotyping, it requires lab facilities and bioinformatics personnel to analyze complex data. This vast, mostly unexplored area poses current limitations, but it could also present a tremendous opportunity for both public researchers and private industry to tag pivotal genes, including their interactions, that play important roles in grain production, and subsequently to protect their inventions.

Common platform and supporting tools
Appropriate experimental design and data analysis are critical components of the successful application of molecular breeding. Various models of data flow and analytical tools to funnel DNA marker data into cultivar development have been proposed. However, they lack simple-to-use guidelines that would allow breeders to select the appropriate design and analysis with confidence. Communication between genomics scientists, geneticists, bioinformaticians, and breeders is still limited, hampering the development of truly integrated tools for applied molecular breeding design, integrated mapping, and MAS. Decision support tools for marker breeding that can model, simulate, and analyze most pre-existing genetic conditions will help breeders design and implement efficient breeding schemes, in terms of cost and time, using the optimum combination of MAS and phenotypic selection. Equally important are decision support tools for sample collection and for depositing, retrieving, and tracking data, as well as for acquiring, collecting, processing, and mining databases.
For information-driven plant breeding, databases and supporting tools that allow an interchangeable flow of information through communicable platforms requiring minimum maintenance and updates are critical. The use of a universal language across different platforms will strengthen interaction among breeders, database curators, bioinformaticians, molecular biologists and tool developers. Interchangeable formats and data content across all plant species are needed to develop a universal database. Current models, such as those provided by the Gene Ontology and Plant Ontology projects, offer a glimpse of future possibilities in this area. Automatic ontological analysis has been used to develop biological interpretations of the data (Khatri et al., 2002). This approach has become the standard for the secondary analysis of high-throughput experiments, and a large number of tools have been developed for this purpose. Khatri and Draghici (2005) provided a detailed comparison of 14 available tools using six criteria: scope of the analysis, visualization capabilities, statistical model(s) used, correction for multiple comparisons, reference microarrays available, and installation issues and sources of annotation data. These comparisons help researchers select the most appropriate tool for a given type of analysis. Despite drawbacks in each tool associated with conceptual limitations of the current state of the art in ontological analysis, this type of analysis has been generally adopted. These limitations are some of the challenges to overcome in order to create the next generation of secondary data analysis tools. Another major challenge is to construct a graphical presentation of systematic biological relationships that integrates gene, protein, metabolite, and phenotype data, as suggested by Blanchard (2004). This will include assembling large-scale data sets into a more comprehensive presentation, minimizing high false positive rates, and validating the existing models using probability and graph theory.
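Two of the criteria listed above, the statistical model and the correction for multiple comparisons, can be illustrated with a minimal over-representation test. The sketch below uses a hypergeometric upper-tail probability and a Bonferroni correction; both are common choices but are assumptions here, since the individual tools compared in the review differ in the models they use. The function names and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_selected, n_annotated, n_total):
    """P(X >= k) for a hypergeometric variable X.

    n_total:     genes on the reference list
    n_annotated: reference genes annotated with the ontology term
    n_selected:  genes in the list of interest
    k:           selected genes that carry the term
    """
    upper = min(n_selected, n_annotated)
    total = comb(n_total, n_selected)
    hits = sum(comb(n_annotated, i) * comb(n_total - n_annotated, n_selected - i)
               for i in range(k, upper + 1))
    return hits / total


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical example: 3 ontology terms tested on a list of 50 genes
    # drawn from a reference set of 10,000 genes.
    tests = [(8, 50, 200, 10000), (2, 50, 300, 10000), (5, 50, 100, 10000)]
    raw = [hypergeom_upper_tail(*t) for t in tests]
    for p, p_adj in zip(raw, bonferroni(raw)):
        print(f"raw p = {p:.3e}, adjusted p = {p_adj:.3e}")
```

Bonferroni is the most conservative of the usual corrections; less strict alternatives exist, and the appropriate choice depends on how many terms are tested and how they overlap.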

Added complexity in scope and time management
Crop development is a complex process, and molecular marker information further increases the complexity. To apply molecular marker-assisted breeding successfully, breeders have to structure their breeding methodologies to allow for the integration of empirical results from molecular marker analyses. All molecular activities, including molecular analyses, establishment of genotype-phenotype associations, and molecular marker-based decision making, have to be completed within the same limited time frame as conventional breeding. They must be synchronized with seed planting preparation, progeny selection, yield trials, collection of phenotypic data, harvesting, data analyses, and the use of off-season nurseries.
While some breeders have access to the computational infrastructure and statistical expertise needed to generate and analyze gigabytes of genomic data, the majority will depend on the availability of smaller subsets of genomic data that can be analyzed using an MS Excel spreadsheet. In the rice SNP system, for example, one of the current major efforts is to develop low-resolution SNP assays (through Affymetrix's custom-designed SNP genotyping arrays, Illumina's custom-designed SNP oligonucleotide pool assays (OPAs), or other platforms developed by KBiosciences) to address the problem (McCouch et al., 2010). In addition to reducing computational complexity, breeders will eventually be able to request targeted SNP detection assays tailored to their specific breeding purposes or to the selection of their population base at a fraction of the current cost of re-sequencing, particularly when the bioinformatic requirements are taken into account.
Breeders utilize breeding information from many different sources to obtain a description of the genetic background and phenotypic traits under specific growing environments. The depth and types of information needed by individual breeders will vary greatly. However, the data critical for individual breeders will include some basic information, such as germplasm information (pedigree, genealogy, genetic stock data, etc.), genotypic information (DNA markers, sequences, and expression information), phenotypic data, and environmental information. In addition, historical data preserved in a repository system in a timely manner can be used to reanalyze hypotheses and guide new research in molecular marker breeding. To obtain high-quality mapping, both genotyping and phenotyping have to be conducted effectively. While molecular detection systems are being enhanced rapidly, methods of phenotyping have not improved as fast. Dissection of agronomically important QTLs requires phenotyping under target environments at multiple test sites. Proper techniques to ensure the consistency of phenotyping over multiple growing environments will need to be established.

Unraveling genetic potential globally
Providing sufficient food for an increasing world population is a tremendous challenge. Finding ways to boost the yield potential of major grain crops beyond current productivity levels is, therefore, critically important. One of the keys to solving the problem is to increase the ability to find novel alleles that are not present among cultivated species. Various studies show that wild species have wider genetic diversity, from which critical alleles hidden or lost during early domestication and during the progression of modern breeding can be recovered. Extensive germplasm collections of various crop plants and their wild relatives are available in various places. In rice, for example, more than 102,547 accessions of Asian cultivated rice O. sativa, 1,651 accessions of African cultivated rice O. glaberrima, and 4,508 accessions of wild ancestors are maintained in the International Rice Germplasm Collection (IRGC) at IRRI (McNally et al., 2006), in addition to extensive rice germplasm collections in Japan, China, Taiwan, India, Korea, the USA, and many other countries. Relatives of rice, such as Oryza rufipogon, appear to have many new putative yield-related QTLs that can potentially be used to improve cultivated rice (Tan et al., 2007). Genome sequencing of wild species and map alignment are current groundbreaking projects that provide a foundation for unraveling the full potential of wild species. The Oryza Map Alignment Project (OMAP) is set to develop physical maps of 12 wild species to be aligned with the reference genome sequence of Nipponbare (Ammiraju et al., 2006; Wing et al., 2005). Sequence data will provide direct evidence of the evolutionary path of the genus Oryza. However, the most important expected outcomes of this endeavor are new genes and QTLs that can be used to improve grain production, levels of pest and disease tolerance, and the ability to tolerate stress and other less favorable growing environments.
The idea of unlocking wild genetic variation to improve global grain production (McCouch et al., 2010; Fridman et al., 2004; Matsumoto et al., 2005) has, therefore, gained renewed interest from time to time. The precision of DNA markers is needed to unravel the interwoven processes of gene expression that determine the productivity of grain crops and to rediscover valuable alleles that can be funneled into the pipeline of cultivar development.

Practical utilization: Global source, local purpose
To survive in a very competitive market that demands high quality products, breeders have to assemble a series of genes that give rise to high-yielding cultivars with stable grain quality, disease resistance, and optimum plant maturity and height, and that are well adapted to target growing regions, which typically represent a narrow niche of environments. These quality traits of industrial standard are critical for successful commercial crop production in the modern era and are often the priorities in current breeding programs. Long-term breeding selection has resulted in the formation of a specific matrix of complex QTLs that support the quality traits required by the market. This matrix provides a skeleton for newer cultivars in grain crop breeding programs. Any effort to improve current yield potential should, therefore, be designed to maintain or enhance the trait matrix. To stay competitive, breeders will be required to expand their crops to provide additional traits that are not currently available in their breeding populations. During the introgression of foreign traits into their breeding lines, all matrix traits necessary to meet high quality standards need to be maintained. Breeders have acquired detailed knowledge of the genetics underlying the matrix of these complex traits among individual breeding lines in the cultivar development pipeline. If molecular markers are to be employed in the breeding program, the same in-depth knowledge must be acquired at the molecular level for the QTL matrix, the target QTLs, and the individual breeding lines in the program.
Incorporation of molecular marker-based selection into a conventional breeding program will require breeders to customize their molecular breeding schemes and tailor them directly to their specific breeding objectives. However, understanding the molecular properties of the quality matrix requires tremendous investment and undoubtedly represents the current bottleneck that explains why successful exploitation of available mapped QTLs in cultivar development remains limited at present. With advances in molecular techniques, such as high-throughput SNP technology, developing a SNP chip specifically to guard the quality matrix will be possible. Once customized chips can be developed for individual breeding programs, novel traits from the global source (different genetic backgrounds, inter- or intra-subspecies, or wild ancestors from global populations) can potentially be incorporated into these programs to add or improve specific qualities or to boost yield without jeopardizing locally adapted standard qualities. Private companies have developed proprietary methodology that allows their breeders to combine their germplasm knowledge and breeding population objectives with molecular-phenotypic trait associations in order to develop genetic models for multiple marker-assisted selection and obtain a rapid increase in the frequency of favorable alleles associated with target traits within the breeding population (Eathington et al., 2007).