Assessment and Utilization of the Genetic Diversity in Rice (Oryza sativa L.)



Introduction
The basis for raising crop production and improving crop quality is the breeding of new varieties.
The key to breeding new varieties depends largely on breakthroughs in mining crop germplasm resources. Therefore, the study and utilization of crop genetic diversity will play an important role in future crop improvement. Previous research has indicated that genetic bottlenecks occurred during crop domestication and modern breeding, i.e. allelic variation present in wild species and landraces was lost, which reduced gene diversity (Tanksley et al., 1997). A narrow genetic base leads to cultivars that lack resistance to new pests and viruses and tolerance to adverse environments, and it contributes to yield plateaus. The alleles lost in modern cultivars can be recovered only by going back to the original landraces and wild species. The original landraces are closely related to cultivars and possess high genetic diversity and many exotic genes, and thereby provide useful germplasm resources for crop breeding.
Identification, use and conservation of the genetic diversity within crop germplasm resources are important for their sustainable use in plant breeding. The current rapid development of bioinformatics, genomics, and molecular biology, together with conventional breeding methods, provides useful means to mine desirable genes in these resources.
Rice (Oryza sativa L.) feeds more than 50% of the world's population and is one of the most important crops in the world. Rice genetic resources are the primary material for rice breeding and make a concrete contribution to global wealth creation and food security. Therefore, understanding their genetic diversity and using it in rice genetic improvement is important for raising rice yield, strengthening resistance to biotic and abiotic stress, and improving rice quality to secure global food supplies. Furthermore, rice is a model plant for the cereal family: two draft genome sequences have been generated (Goff et al., 2002; Yu et al., 2002), and great progress has been made in gene mining with omics technologies. Such research helps to make use of rice genetic resources but in turn requires further insight into rice genetic diversity.
China is well known as a center of origin of cultivated rice, with abundant rice genetic resources. Between 1920 and 1964, Ying Ting, an academician of the Chinese Academy of Sciences, collected more than 7128 rice landraces from all over China as well as from some of the main rice-cultivating countries. As far as we know, it is one of the earliest collections of rice germplasm resources, and we therefore named it Ting's collection (Lu et al., 2006). This chapter aims to explore effective methods for mining the exotic genes within these novel rice germplasm resources.

Ting's rice germplasm collection
Ting's rice germplasm collection consists of 7128 accessions, which were collected and conserved by Academician and Professor Ying Ting during 1920-1964 from 20 provinces of China as well as from North Korea, Japan, the Philippines, Brazil, Celebes, Java, Oceania, and Vietnam (Fig. 1). Most of them are rice landraces and possess high genetic diversity. Because it is one of the earliest systematic rice collections in China and covers most of the Chinese rice-cultivating regions, it can serve as a representative of the genetic diversity of Chinese rice germplasm resources.
Most accessions were characterized for taxonomical, geographical, morphological and agronomical descriptors, recorded by the former Laboratory of Rice Ecology of the Chinese Academy of Agricultural Sciences, South China Agricultural College and the Guangdong Academy of Agricultural Sciences, China (1961-1965). The recorded traits include 20 unordered qualitative traits, i.e. origin of variety, indica vs. japonica, paddy vs. upland, waxy vs. non-waxy, grain shape, rice color, grain quality, leaf color, leaf margin color, leaf cushion color, auricle color, inner sheath color, outer sheath color, stem color, leaf-green color, stigma color, glume-tip color, sterile lemma color and glume color; 14 ordered qualitative traits, i.e. early- or late-season, type of maturity, shattering habit, awn, awn length, leaf face pubescence, leaf base pubescence, flag-leaf angle, erect vs. bending leaf, compact vs. loose stem, panicle shape, compact vs. loose rachis-branches, sparse vs. dense glume hair, compact vs. loose glume hair; and 15 quantitative traits, i.e. culm length, culm size, thickness of culm wall, second internode length, number of panicles per plant, panicle length, panicle size, number of seeds per panicle, grain length, grain length/width ratio, grain size, flag leaf length, flag leaf width, length of elongated uppermost internode, and growth duration. These data provide a good basis for studying phenotypic genetic diversity as well as for constructing a core collection based on phenotypes.

Genetic diversity of phenotypes of Chinese rice germplasm resources
About 6500 accessions from Ting's collection with complete passport data were selected and studied for their phenotypic genetic diversity. The origin, type and distribution of these accessions are listed in Table 1.

Genome-wide distribution of genetic diversity assessed with SSR markers
A subset of 150 accessions was taken from the whole collection (described below) and genotyped with 274 SSR markers distributed across the genome. Gene diversity for the varieties in the subset is 0.544. Among them, indica rice shows a higher gene diversity (0.484) than japonica rice (0.454). Similarly, non-waxy rice shows a higher gene diversity (0.540) than waxy rice (0.515). In our case, early-season rice shows a higher gene diversity (0.546) than late-season rice (0.510). Cultivated rice has been intensively selected during its domestication and breeding. Consequently, the genomic regions controlling traits of economic importance are expected to have been shaped by this selection. Therefore, characterizing the genome-wide distribution of genetic diversity in cultivated rice germplasm that has been selected for different traits, such as waxy vs. non-waxy, might help to identify the genes controlling these traits. As one example, gene diversity was calculated for waxy rice and for non-waxy rice at each marker separately across the genome. Similarly, a measure of genetic distance, the modified Rogers' distance (MRD) between waxy and non-waxy rice, was calculated on an individual marker basis.
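The per-marker statistics described above can be sketched in a few lines of Python. This is only an illustration: the chapter does not state which estimators were used, so the sketch assumes Nei's gene diversity (one minus the sum of squared allele frequencies) and the single-locus form of the modified Rogers' distance. The allele sizes and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def allele_frequencies(alleles):
    """Return {allele: frequency} for one marker in one group of inbred lines."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def gene_diversity(alleles):
    """Gene diversity at one marker: 1 - sum of squared allele frequencies."""
    freqs = allele_frequencies(alleles)
    return 1.0 - sum(p * p for p in freqs.values())


def modified_rogers_distance(alleles_a, alleles_b):
    """Modified Rogers' distance between two groups at a single marker."""
    fa = allele_frequencies(alleles_a)
    fb = allele_frequencies(alleles_b)
    all_alleles = set(fa) | set(fb)
    squared = sum((fa.get(a, 0.0) - fb.get(a, 0.0)) ** 2 for a in all_alleles)
    return sqrt(squared / 2.0)


# Hypothetical SSR allele sizes (bp) at one marker, one allele per inbred line.
waxy = [152, 152, 156, 160]
non_waxy = [152, 156, 156, 156, 164, 164]

print("gene diversity, waxy:", round(gene_diversity(waxy), 3))
print("gene diversity, non-waxy:", round(gene_diversity(non_waxy), 3))
print("MRD:", round(modified_rogers_distance(waxy, non_waxy), 3))
```

Repeating this calculation for each marker along the genome gives the kind of per-marker profile of gene diversity and divergence that the text describes.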
Our results indicated that gene diversity for waxy and non-waxy rice varied across the genome (Fig. 3). The degree of divergence between these two germplasm types (as measured by MRD) also differed across the genome (Fig. 4).
The unequal distribution of genetic diversity across the genome could be explained by the selection history of the different genomic regions. Genome-wide distribution maps of genetic diversity might therefore be a first step towards identifying the target genes or regions selected during breeding history. For example, genes related to the waxy vs. non-waxy trait might be located in the most divergent genomic regions between these two germplasm types. Genes under common selection in the breeding programs of both germplasm types (e.g. disease resistance genes) might be located in genomic regions showing the same level of gene diversity and low MRD.

Constructing a core collection to make use of genetic diversity
The large number of accessions in a germplasm collection (7128 in our case) makes it difficult to choose the most promising ones for utilization. One feasible solution is the development of core collections. A core collection is intended to contain, with minimum repetitiveness, the maximum genetic diversity of a crop species and its wild relatives (Brown, 1989a, 1989b; Frankel and Brown, 1984). The development of a core collection could enhance the utilization of germplasm collections in crop improvement programs and simplify their management. Selecting an appropriate sampling strategy is an important prerequisite for constructing a core collection of appropriate size that adequately represents the genetic spectrum and captures as much as possible of the genetic diversity in the available collections. Our studies aimed to evaluate how sample size, clustering method, sampling method, and data type affect the construction of a core collection, and to find an optimal strategy with respect to these factors.
Three sampling strategies, three kinds of trait data, eight hierarchical clustering methods, and 15 sampling proportions were compared to choose the optimal construction strategy. Analysis of variance (ANOVA) and multiple comparisons were used to compare the strategies, and 12 evaluation parameters were used to assess the validity of the sampling.
The ANOVA showed significant differences among clustering methods, data types, sample sizes and sampling methods (Table 2). Furthermore, there were significant interaction effects between these factors, except between clustering method and sampling method and between sample size and sampling method. The results indicated that these factors, as well as their interactions, affect the construction strategy and must be considered carefully.
Among the sampling methods, preferred sampling plus multiple clustering with sampling on the degree of variation was better than preferred sampling plus multiple clustering with random sampling, and completely random sampling was the worst. Among the eight clustering methods, clustering analysis with shortest distances gave the best genetic diversity index, average Shannon-Weaver index, phenotype retained percentage, and variance of phenotypic frequency. Among the three data types (qualitative trait data, quantitative trait data, and integrated qualitative and quantitative trait data), the core collections constructed from integrated qualitative and quantitative trait data retained the greatest genetic diversity. As for the sampling rate, a rate of 3.4% to 24% was sufficient to retain the greatest genetic diversity of the initial population (Tables 4-6).
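Two of the evaluation parameters named above, the Shannon-Weaver index and the phenotype retained percentage, can be illustrated with a short Python sketch. The chapter does not give the exact definitions it used, so the sketch assumes the usual forms: the Shannon-Weaver index as minus the sum of p·ln(p) over phenotype classes of one trait, and the retained percentage as the share of phenotype classes of the initial collection that also occur in the core collection. The trait scores and function names are hypothetical.

```python
from collections import Counter
from math import log


def shannon_weaver_index(phenotypes):
    """Shannon-Weaver diversity index of one trait: -sum(p * ln p) over classes."""
    counts = Counter(phenotypes)
    total = sum(counts.values())
    return -sum((n / total) * log(n / total) for n in counts.values())


def phenotype_retained_percentage(initial, core):
    """Percentage of phenotype classes of the initial collection found in the core."""
    initial_classes = set(initial)
    retained = initial_classes & set(core)
    return 100.0 * len(retained) / len(initial_classes)


# Hypothetical glume-color scores for an initial collection and a candidate core.
initial = ["straw"] * 6 + ["brown"] * 3 + ["purple"] * 1
core = ["straw", "brown", "purple"]

print("index, initial:", round(shannon_weaver_index(initial), 3))
print("index, core:", round(shannon_weaver_index(core), 3))
print("retained (%):", phenotype_retained_percentage(initial, core))
```

In this toy case the core keeps every phenotype class while evening out their frequencies, which is why its index is higher than that of the initial collection. Averaging such values over all recorded traits gives one way to compare candidate core collections.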
Finally, a core collection was constructed using preferred sampling plus multiple clustering with sampling on the degree of variation, clustering analysis with shortest distances, and the integrated qualitative and quantitative data. This core collection contains 150 accessions out of the 2262 accessions of Ting's collection with fully recorded data, accounting for 6.6% of that initial collection.

Table 6. Multiple comparison for different sample sizes.

Conventional QTL mapping relies on segregating populations developed by crossing between parents (e.g., F2, doubled haploid, backcross populations), and linkage mapping is carried out in these segregating populations. The accuracy of QTL mapping depends largely on the selected but limited parents, and only two or a few alleles from the parents can be detected. Moreover, the abundant genetic variation stored in germplasm has not been exploited because appropriate statistical methods were lacking. If conventional QTL linkage mapping were used to mine the abundant genetic variation in a large germplasm population, diallel crosses among all studied accessions would be needed; such a mapping population is hard to develop and would require much more time, cost, space and analysis.
An alternative, association mapping based on linkage disequilibrium (LD) analysis, might be an effective way to identify the function of a gene (or a targeted high-resolution QTL). It has been successfully applied in human genetics to detect QTLs underlying simple as well as complex diseases (Corder et al., 1994; Kerem et al., 1989). This method uses the LD between DNA polymorphisms and the genes underlying traits. LD refers to the non-random combination of alleles at different genetic markers. The main mechanism maintaining LD in a population is genetic linkage among loci. Therefore, it is possible to detect QTLs by identifying LD between marker loci and potential QTLs. By screening many marker loci across the genome, or markers near candidate genes, the loci that are tightly linked to QTLs and correlated with them can be found. The application of association mapping to plant breeding appears to be a promising approach to overcome the limitations of conventional linkage mapping (Kraakman et al., 2004).
Furthermore, choosing appropriate germplasm that maximizes the genetic diversity and the number of historical recombination and mutation events (and thus reduces LD) within and around the gene of interest is critical for the success of association analysis (Yan et al., 2011). As described above, a core collection is a subset of the original collection that contains the maximum genetic variability of the gene pool in a minimum number of samples. Therefore, association mapping with a core collection helps to capture as much phenotypic variation as possible and combines the advantages of association mapping and core collections. It could thus be an effective way to mine and utilize the abundant genetic diversity in crop germplasm resources.

Population structure and LD pattern
Population structure is an important component of association mapping analysis, because accounting for it can reduce both type I and type II errors in the associations between molecular markers and traits of interest in an inbreeding species. Moreover, a low level of LD could make whole-genome scanning impractical because of the excessive number of markers required. Furthermore, the resolution of association studies in a test sample depends on the structure of LD across the genome. Therefore, information about the population structure and the extent of LD within the population is of fundamental importance for association mapping.
The rice core collection of 150 varieties was genotyped with 274 SSR markers. Based on the genotyping data, the STRUCTURE software was run to detect the number of subgroups within the core collection and to assign the varieties to subgroups, using a membership probability of 0.80 as the threshold. To compare and confirm the STRUCTURE subgroups, an additional principal component analysis was done. STRUCTURE indicated that the entire population could be divided into two subgroups (SG 1 and SG 2) (Fig. 5). With membership probabilities ≥ 0.80, 111 varieties were assigned to SG 1, 21 varieties were assigned to SG 2, and 18 varieties were retained in the admixed group (AD) (Fig. 5). Principal component analysis confirmed the population structure, i.e. the varieties from SG 1 and SG 2 were located in two distinct clusters, and those from AD were located between the two subgroups (Fig. 5). The varieties in SG 1 are mainly indica rice and those in SG 2 japonica rice, whereas those in AD are intermediate (indica- or japonica-inclined) rice. Furthermore, varieties from the same cultivation zone clustered closely.
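The assignment rule described above can be written down as a small Python sketch. It assumes that the membership probabilities of each variety have already been exported from STRUCTURE; the variety names, the probabilities and the function name are hypothetical, and only the 0.80 threshold and the SG 1 / SG 2 / AD labels come from the text.

```python
def assign_subgroup(memberships, threshold=0.80):
    """Assign a variety to its most likely subgroup, or to the admixed group (AD).

    memberships maps subgroup name -> membership probability for one variety.
    """
    best_group, best_q = max(memberships.items(), key=lambda item: item[1])
    return best_group if best_q >= threshold else "AD"


# Hypothetical membership probabilities for three varieties.
q_matrix = {
    "variety_A": {"SG 1": 0.97, "SG 2": 0.03},
    "variety_B": {"SG 1": 0.12, "SG 2": 0.88},
    "variety_C": {"SG 1": 0.55, "SG 2": 0.45},
}

for name, memberships in q_matrix.items():
    print(name, "->", assign_subgroup(memberships))
```

A variety whose largest membership probability falls below the threshold, such as the third one here, is treated as admixed rather than forced into either subgroup.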
LD, measured as the squared correlation of allele frequencies (r²) between loci pairs, was calculated for the core collection and for the different germplasm types (Table 7). The average r² between linked loci (loci on the same chromosome) varied between 0.0188 and 0.1. Using the 95% quantile of r² between unlinked loci pairs as a threshold, 6.23% of the linked loci pairs were in significant LD. For the different germplasm types (indica, japonica, early-season, late-season, waxy, and non-waxy rice), the percentage of loci pairs in LD varied between 5.33% and 6.36%. LD (r²) was plotted against genetic map distance (cM) between linked loci pairs, and a nonlinear regression of r² on genetic map distance was performed according to Heuertz et al. (2006) (Fig. 6). LD decays with genetic distance, which indicates that linkage might be the main cause of LD. LD decays to the threshold, i.e. the 95% quantile of r² between unlinked loci pairs, at 1.03 cM in the entire collection (Table 7). The cut-off decay distances for indica, japonica, early-season, late-season, waxy, and non-waxy rice were 0.89, 1.10, 1.00, 1.04, 0.87, and 1.01 cM, respectively, corresponding to a physical distance of about 200-500 kb. The results indicated that the choice of the core collection could maximize the number of historical recombination and mutation events and thus reduce LD within and around the gene of interest, which is critical for the success of association analysis (Yan et al., 2011).

Fig. 5. Principal component analysis for the rice core collection combined with STRUCTURE subgroup assignment. PC 1 and PC 2 refer to the first and second principal components, respectively. The numbers in parentheses refer to the proportion of variance explained by the principal components. Symbols indicate different types of rice, and colors indicate different subgroups from the STRUCTURE software. FJ-Foreign japonica, IG-glutinous Indica, J-early seasonal Japonica, J II-late seasonal Japonica, JG-glutinous Japonica, P-early seasonal Indica from Pearl River region or south China, P II-late seasonal Indica from Pearl River region, R-early seasonal Red grain rice, R II-late seasonal Red grain rice, Y-early seasonal Indica from Yangtze River region, Y II-late seasonal Indica from Yangtze River region, and N-Unknown origin.

Table 7. Linkage disequilibrium (measured as r² value) for linked loci, percentage of linked loci pairs in LD, and the cut-off decay distance for the core collection and different germplasm types.

Fig. 6. Plot of linkage disequilibrium measured as squared correlation of allele frequencies (r²) against genetic map distance (cM) between linked loci pairs in the core collection. The red line is the nonlinear regression trend line of r² vs. genetic map distance. The dashed line indicates the 95% quantile of r² between unlinked loci pairs.

Such a short LD decay distance suggests that fine mapping of desirable genes with a core collection could be possible. However, the low percentage of linked loci pairs in LD and the quick decay of LD also indicate that the marker density for genome-wide association mapping should be greatly increased compared with our study. Considering the LD decay distance in the core collection and the 1700 cM map length of the rice genome, genome-wide association mapping with such a core collection might, at least in theory, require more than 1700 markers. If higher power is needed, even more markers could be required.
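The LD measure, the quantile threshold and the marker-number reasoning above can be sketched in Python. The sketch makes simplifying assumptions that are not from the chapter: loci are treated as biallelic and coded 0/1 for each inbred line (the SSR markers actually used are multiallelic), the quantile uses plain linear interpolation, and the marker count is a rough lower bound of one marker per LD decay interval. All names and example values are hypothetical except the 1700 cM map length and the 1.03 cM decay distance.

```python
from math import ceil


def r_squared(locus_a, locus_b):
    """Squared allele-frequency correlation between two biallelic loci (0/1 coded)."""
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must be non-empty and of equal length")
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / denominator


def quantile(values, q):
    """Quantile by linear interpolation, e.g. q=0.95 for the LD threshold."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def min_markers(map_length_cm, decay_distance_cm):
    """Rough lower bound: one marker per LD decay interval along the map."""
    return ceil(map_length_cm / decay_distance_cm)


# Hypothetical r^2 values between unlinked loci pairs give the threshold.
unlinked_r2 = [0.00, 0.01, 0.01, 0.02, 0.03, 0.05]
print("threshold:", round(quantile(unlinked_r2, 0.95), 4))
print("r^2, identical loci:", r_squared([0, 0, 1, 1], [0, 0, 1, 1]))
print("markers needed:", min_markers(1700, 1.03))
```

With these inputs the bound comes out at roughly 1650 markers, the same order of magnitude as the figure quoted in the text. Any requirement for higher power or uneven marker spacing would push the number up.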

Association mapping
Mining the elite genes within rice germplasm is important for the improvement of cultivated rice. Therefore, genome-wide association mapping was applied to the rice core collection using 274 SSR markers.
All 150 rice varieties were grown at the farm of South China Agricultural University, Guangzhou (23°16′N, 113°8′E), during the late season (July-November) in two consecutive years (2008 and 2009). Yield-related traits (such as grain weight, filled grains, tillers per plant, etc.) were measured in both years. As an example, the trait yield per panicle (in grams) was used for association mapping.
The STRUCTURE software was used to infer historical lineages that show clusters of similar genotypes and to obtain the Q matrices (Pritchard et al., 2000). The kinship matrix (K) was calculated with the software SPAGeDi (Hardy and Vekemans, 2002). Quantile-quantile (QQ) plots of -log10(p) were drawn in SAS from the observed p values of the marker-trait associations and the p values expected under the assumption that no marker is associated with the trait. Using the TASSEL software and the mixed linear model (MLM), the association test was performed for the yield trait, incorporating the K and Q matrices.
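The construction of such a QQ plot can be sketched as follows. This is a Python illustration, not the SAS procedure used in the study: it assumes that under the null hypothesis the p values are uniformly distributed, so that the expected value of the i-th smallest of n p values is taken as i / (n + 1). The p values and the function name are hypothetical.

```python
from math import log10


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ plot.

    Expected p values assume no marker-trait association (uniform p values).
    """
    ordered = sorted(p_values)
    n = len(ordered)
    points = []
    for rank, p in enumerate(ordered, start=1):
        expected = -log10(rank / (n + 1))
        observed = -log10(p)
        points.append((expected, observed))
    return points


# Hypothetical p values from marker-trait association tests.
observed_p = [0.002, 0.04, 0.21, 0.48, 0.76]

for expected, observed in qq_points(observed_p):
    print(round(expected, 3), round(observed, 3))
```

Points close to the diagonal indicate that the model controls false positives well, whereas a systematic upward departure along the whole range suggests uncorrected population structure or relatedness.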
The QQ plot of observed vs. expected p values (Fig. 7) indicated that the MLM incorporating the K and Q matrices was suitable for the association analysis of the yield trait. A total of 17 markers in 2008 and 15 markers in 2009 were significantly (P < 0.05) associated with the yield trait (Tables 8 and 9). Twelve of the marker-phenotype associations detected in 2008 and seven of those detected in 2009 were confirmed by previous studies (using either linkage mapping or association mapping). Moreover, two marker-phenotype associations were located at similar positions in both years (RM471 with RM218, PSM188 with RM235). The genetic variation explained by the markers ranged from 3.49% to 24.86% in 2008 and from 3.02% to 13.87% in 2009. The markers RM346 and PSM336 each explained more than 15%. It is worth noting that few marker-phenotype associations were common to both years, which indicates that the yield trait might be easily influenced by the environment. This problem might be overcome by using yield data from multiple locations and years. The marker-phenotype associations could be further used in rice breeding through marker-assisted selection.

Discussion and prospect
Rice feeds more than 50% of the world's population and is one of the most important crops in the world. Rice germplasm resources are the primary material for rice breeding and make a concrete contribution to global wealth creation and food security. Therefore, understanding their genetic diversity and using it in rice genetic improvement is important for raising rice yield, strengthening resistance to biotic and abiotic stress, and improving rice quality to secure global food supplies.
To mine the wide genetic diversity in plant germplasm populations, the identification of phenotypic traits might be the first and an important step. Besides agronomic traits, physiological traits, stress-related traits, quality traits, resistance to viruses and pests, etc. should be studied further in detail. Based on a full evaluation of phenotypes, a dynamic core collection could be constructed either for a specific target trait or for all traits, so that the core collection retains as much genetic diversity as possible with the minimum number of accessions. The core collection could then be studied further with high-density markers and precise phenotyping.
Association mapping has become a promising approach for mining elite genes within germplasm populations compared with traditional linkage mapping. Association mapping based on a core collection would help to capture as much phenotypic variation as possible. Compared with a natural population or a breeding population with a narrow genetic base, the LD level in a core collection might be low because of its diverse origin. Therefore, more markers might be needed for genome-wide association mapping. However, because of the quick LD decay, fine mapping might be possible with a core collection. To perform precise association mapping, phenotyping replicated across locations or years and accurate estimation of the population structure and kinship should be considered. Furthermore, although rapid progress has been made in genotyping, a quick, automated, economical genotyping technology (such as a SNP array) for large numbers of germplasm accessions is desirable for association mapping with germplasm populations. How to combine linkage and association mapping effectively in plants (such as nested association mapping, NAM, in maize) is another question that deserves attention. Because such associations could be further applied in rice breeding through marker-assisted selection, this approach offers a bright future for making use of the elite genes in diverse germplasm resources.
A strategy is proposed for exploring and utilizing the wide genetic diversity in plant germplasm populations: first, evaluating the genetic diversity of germplasm populations at the phenotypic and genotypic levels; second, constructing a core collection to capture the maximum diversity with the minimum number of accessions; third, combining linkage mapping and association mapping to map desirable QTLs on a large scale and with high resolution; fourth, developing near-isogenic lines to verify and fine-map QTLs; and finally, cloning the desirable genes and using them in crop breeding.