Rice Germplasm in Korea and Association Mapping


Introduction
Rice is a traditional staple food crop in Korea and many other countries. Although the center of rice origin is still unclear, rice is believed to have been introduced from China to the Korean Peninsula in the early Bronze Age via one of two possible routes, across the West Sea or along the northeastern seashore from China, according to Hammer (2005) and Vavilov (1935). Rice germplasm has evolved through several millennia of cultivation and selection by our farming ancestors. An important consequence of the domestication of both plants and animals is a reduction in genetic variability (Hancock, 2004). Maintaining biodiversity is an important worldwide problem, and different countries have various policies intended to preserve it. Because the conservation of biodiversity and ecosystems is closely linked to the quality of human life, the preservation and improvement of ecosystems pose important challenges for agriculture.
Genetic diversity in a crop species is essential for sustained high productivity. Breeding efforts have been devoted to improving grain quality, yield potential, resistance to diseases and insect pests, and environmental stress tolerance. Progress in plant breeding requires a continuous supply of genes or gene complexes. In this respect, the researcher is often handicapped by the limited availability of germplasm resources. The assembly of large varietal collections, systematic screening for desired traits, and subsequent incorporation of the relevant genes into existing cultivars are imperative to meet these needs. The use of landrace varieties has increased in recent years. Wild rice accessions have contributed greatly to rice breeding as a source of resistance genes (e.g., Xa21, BPH14, BPH15) (Ronald et al., 1992; Song et al., 1995; Yang et al., 2004; Du et al., 2009; Hu et al., 2012). Much of the diversity in the rice gene pool is held in gene banks around the world. Molecular biology has contributed significantly to our understanding of many aspects of plant biology by generating technologies and methods of analysis that provide new approaches or supplement classical ones. Plant genetic resource scientists and other researchers are increasingly aware of the potential benefits of applying these new technologies to germplasm conservation and research.
The integration of genetic data with molecular biotechnology will help breeders produce new rice varieties with the desired traits and make the conservation of rice genetic resources more efficient. Thanks to newly developed methods for the association mapping of genes or QTLs related to desired traits, many genome-wide association analyses have been conducted in rice, producing valuable genome-wide association maps that describe the genetic architecture of complex traits. However, further efforts are needed to obtain more genomic information, fill the gaps in our knowledge, and meet the needs and challenges of rice breeding. This chapter focuses on the status of rice germplasm preservation activities, research programs, and the outcomes of association mapping in rice in Korea.

Research on rice germplasm in Korea
The Ministry of Foreign Affairs and Trade (2009) outlined eight major environmental issues as current threats to Korea: global warming, desertification, wildlife extinction, rain forest reduction, acid rain, depletion of the ozone layer, marine pollution, and air pollution. The rate of climate change is faster in Korea than the global average, leading to a rapid reduction in national biodiversity. One hundred and ninety families comprising 4000 species of vascular plants and ferns occur in Korea (Lee and Yim, 1978). Approximately 3700 different kinds of flowering plants are estimated to occur naturally (Chung, 1957; Lee, 1980). Four hundred and seven different endemic taxa in six genera are distributed throughout Korea (Lee, 1982). However, some plant species are on the verge of extinction because of pollution and a wide range of development activities in Korea during the last 20 years (Ministry of Environment, 1994), highlighting the importance of conservation efforts. Conservation programs usually involve activities such as the collection, characterization, evaluation, regeneration, documentation, and storage of each germplasm accession.
The National Biodiversity Strategy was implemented in 1997 to integrate and consolidate plans formulated by various ministries and government institutes, including the Comprehensive Biological Resources Conservation Plans. The Rural Development Administration (RDA) Gene Bank is one of the institutions responsible for these plans. Rice research programs covering agronomic practices, physiology, post-harvest technology, grain quality evaluation, rice breeding and genetics, and biotechnology are led by the National Institute of Crop Science (NICS) under the RDA. Other institutions affiliated with NICS carry out rice research programs that target specific problems in various regions of Korea. From 1980 to 1990, rice sheath blight (Acrocylindrium oryzae) was the most destructive disease, damaging approximately 555,000 hectares of rice paddy fields in Korea. Furthermore, rice pests including the brown planthopper, white-backed planthopper, and small brown planthopper attacked 586,000 hectares of rice nationwide during the same period (NASTI, 1996). The continuous nationwide cultivation of only five or six cultivars was likely responsible for these extensive losses.
In Korea, the rice season normally begins in mid-April and ends in mid-October. The lowest temperature in both April and October is 13°C. Because of the unprecedented yield loss due to cold damage in 1980 (damage to 80% of the total rice hectarage and a yield reduction of 3.9 tons per hectare), cultivation of high-yielding "Tongil-type" rice cultivars declined rapidly, and only high-yielding japonica cultivars have been grown since 1990. In 2010, 20 mid- to late-maturing japonica cultivars were grown on 891,493 hectares, accounting for 92.9% of the total rice production area (Kang and Kim, 2012). Large temperature drops also occurred in 1971 and 2003, damaging 17% and 20% of the total rice hectarage, respectively. Preharvest sprouting may become a serious problem for rice production, as well as for other crops, because of the recent trend toward frequent and unusually heavy rain at harvest time. Breeders are making efforts to address this problem.
Rice breeders see the development of genetically improved cultivars using modern breeding techniques as an efficient way to reduce the losses in rice production caused by biotic and abiotic stresses. Sequencing of the rice genome for genotyping and the development of marker-assisted selection (MAS) systems have accelerated research efforts. In the past, most national programs gave a lower priority to collecting wild relatives of rice than to collecting rice cultivars. Wild rice resources are agronomically unattractive, relatively expensive to conserve, and difficult to use. However, wild rice germplasm is known to contain a broad array of useful genes (Hodgkin, 1991). In Korea, the main benefit of using landrace germplasm to breed new cultivars in response to climate and environmental change is disease resistance, which helps maintain the superior qualities suited to consumers' preferences. Plant germplasm resource activities in Korea are performed under The Basic Conservation Programme for Nature and Environment (1994-2003) of the Ministry of Environment (NASTI, 1996).
The RDA Gene Bank conserves 24,673 rice accessions, including Korean landraces and wild types. Many gene banks have financial difficulty maintaining their germplasm collections because of the rapid increase in the number of accessions. These problems may restrict the full exploitation, evaluation, and utilization of these accessions, so managing such collections presents major challenges (Holden, 1984). The concept of a core collection for resolving these problems has received increased attention over the last few years. Germplasm sampling methods include sequential, stratified, biased (for example, by ecology or country), and random sampling. An understanding of the factors underlying the traits being sought will help reduce the time required to identify useful genes. For very rare traits, such as some associated with resistance to virus infection, searching among wild Oryza species and O. glaberrima may be most appropriate. Efficient methods for evaluating germplasm to identify genes for crop improvement will promote the use of conserved germplasm. Frankel and Brown (1984) proposed the concept of a core set of lines to resolve such problems. A desirable core set should capture the maximum genetic diversity of a crop species, including its wild relatives, with minimum repetition, and provide a manageable set of accessions to gene bank managers, plant breeders, and research scientists. Such a core collection would become the focus of the search for desirable new characteristics, detailed evaluation, and the development of new techniques. An initial set of 4406 rice accessions was selected based on ecological types and accession passport information, including country of origin. Using simple sequence repeat (SSR) genotype information, the final core set of 166 accessions currently used by the RDA was then generated by a heuristic approach using the PowerCore software developed by Kim et al. (2007).
Several association-mapping studies have been conducted using this core set, and further research is under way.
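The core-set construction described above can be illustrated with a small sketch. PowerCore is described as retaining the allelic diversity of the whole collection in a reduced set; the code below is not PowerCore's algorithm but a simplified greedy set-cover heuristic, assuming SSR genotypes are available as one allele size per locus per accession. The data layout, function names, and toy accessions are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical layout: accession name -> one SSR allele call (fragment size in bp)
# per locus, in a fixed locus order; None marks missing data.
Genotypes = Dict[str, List[Optional[int]]]
Allele = Tuple[int, int]  # (locus index, allele size)


def allele_set(calls: List[Optional[int]]) -> Set[Allele]:
    """Return the (locus, allele) pairs observed in one accession."""
    return {(locus, size) for locus, size in enumerate(calls) if size is not None}


def greedy_core(genotypes: Genotypes) -> List[str]:
    """Select accessions until every allele seen in the collection is covered.

    Greedy set-cover heuristic: each step adds the accession contributing the
    most not-yet-covered alleles; ties go to the alphabetically first name.
    """
    remaining = {name: allele_set(calls) for name, calls in genotypes.items()}
    target: Set[Allele] = set().union(*remaining.values())
    covered: Set[Allele] = set()
    core: List[str] = []
    while covered != target:
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        core.append(best)
        covered |= remaining.pop(best)
    return core


if __name__ == "__main__":
    toy = {
        "acc1": [120, 200, 310],
        "acc2": [120, 204, 310],
        "acc3": [124, 200, 314],
        "acc4": [120, 200, 310],  # same alleles as acc1
    }
    print(greedy_core(toy))  # ['acc1', 'acc3', 'acc2']
```

With the toy data, acc4 is left out because every allele it carries is already represented. A single greedy pass does not guarantee the smallest possible core; more elaborate heuristic searches may find smaller sets with the same allele coverage.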

Association mapping in rice
Association mapping analyzes loci in diverse populations and associates them with one another and with phenotypes. It is a powerful genetic mapping tool for crops that provides high resolution, broad allele coverage, and cost-effective gene tagging for the evaluation of plant germplasm resources. Genetic mapping of QTLs can be performed in two main ways (Ross-Ibarra et al., 2007): (1) linkage mapping as well as "gene tagging" using experimental populations (also referred to as "biparental" mapping populations), and (2) linkage disequilibrium (LD) mapping, or "association mapping," using diverse lines from natural populations or germplasm collections (Abdurakhmonov and Abdukarimov, 2008). LD mapping is based on identifying associations between phenotypes and allele frequencies. The advantage of LD mapping for the breeder is that mapping and commercial variety development can be conducted simultaneously (Langridge and Chalmers, 2005). For phenotypes or traits governed by multiple genes or QTLs, diverse alleles or advantageous allele combinations should be mined by association mapping, followed by gene-tagging efforts using biparental crosses.
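The core idea of LD mapping, testing whether the allele state at a marker is associated with phenotype across diverse lines, can be sketched as a single-marker test. This is a deliberately simplified illustration, assuming homozygous inbred lines scored 0/1 at a biallelic marker: it regresses phenotype on allele state, reports the share of variance explained (r²), and estimates significance by permutation. It ignores population structure and kinship, so in structured germplasm it would produce more false positives than the mixed models used in the studies discussed in this chapter. The function names and data are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def r_squared(genotype: Sequence[float], phenotype: Sequence[float]) -> float:
    """Share of phenotypic variance explained by a linear fit on marker state."""
    g_bar, p_bar = mean(genotype), mean(phenotype)
    s_gp = sum((g - g_bar) * (p - p_bar) for g, p in zip(genotype, phenotype))
    s_gg = sum((g - g_bar) ** 2 for g in genotype)
    s_pp = sum((p - p_bar) ** 2 for p in phenotype)
    if s_gg == 0 or s_pp == 0:
        return 0.0  # monomorphic marker or constant trait: nothing to explain
    return s_gp * s_gp / (s_gg * s_pp)


def permutation_test(genotype: Sequence[float], phenotype: Sequence[float],
                     n_perm: int = 1000, seed: int = 1) -> Tuple[float, float]:
    """Return (observed r^2, permutation p-value) for one marker-trait pair."""
    observed = r_squared(genotype, phenotype)
    rng = random.Random(seed)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if r_squared(genotype, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    marker = [0, 0, 0, 0, 1, 1, 1, 1]  # allele state in eight hypothetical lines
    trait = [10.1, 9.8, 10.3, 10.0, 12.2, 11.9, 12.5, 12.1]
    r2, p = permutation_test(marker, trait)
    print(f"r2 = {r2:.3f}, permutation p = {p:.3f}")
```

Adding covariates for population structure (Q) and a kinship-based random effect (K) turns this into a mixed-model association test; that step needs a linear mixed-model solver and is beyond this sketch.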
The localization of alleles relies on establishing a statistical association between markers and QTL alleles and on the efficacy of the markers. To be effective, markers must be closely linked to the target locus and able to detect polymorphisms in material likely to be used in a breeding program. Improvements in marker screening techniques have facilitated the tracking of genes (Subudhi et al., 2006). Isozymes and other protein-based marker systems had long been widely used before DNA markers became popular (Langridge and Chalmers, 2005). Since then, a variety of DNA-based molecular markers have been developed, including restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNAs (RAPDs) (Williams et al., 1990), amplified fragment length polymorphisms (AFLPs) (Vos et al., 1995), SSRs (Litt and Luty, 1989), single-strand conformational polymorphisms (SSCPs), cleaved amplified polymorphic sequence (CAPS) markers (Konieczny and Ausubel, 1993), sequence-tagged sites (STSs) (Olson et al., 1989), sequence-characterized amplified region (SCAR) markers (Martin et al., 1991), and single nucleotide polymorphisms (SNPs) (Brookes, 1999). The next generation of genetic markers is based on SNPs, which provide an attractive tool for QTL mapping studies and marker-assisted selection in plant breeding programs (Mohler and Schwarz, 2005). SNP discovery is performed primarily in silico or using new sequencing approaches (Henry and Edwards, 2009). Large-scale SNP analysis is now possible in plants using a range of platforms. The increasing ease of sequencing and automated genotyping has made association mapping a more attractive alternative to the conventional plant genome-mapping method, which relies on linkage analysis in a segregating population. This trend is likely to continue as more genomes are sequenced.
Recently, genome-wide association studies (GWAS) with SNP variants have been conducted using new sequencing platforms (Table 1).

International rice association-mapping activity
Genome mapping of rice was first attempted using linkage analysis of appearance or phenotype (Nagao and Takahashi, 1963). The linkage map was later improved using isozymes (Nakagahra, 1977) and [...] Table 1, in which most studies were conducted using SSR markers.
Whole-genome resequencing is a promising strategy for identifying the relationship between sequence variation and normal or mutant phenotypes. High-throughput genome resequencing, if accurate, has the advantage of allowing researchers to identify the specific nucleotides associated with a given phenotype and to detect and analyze the genetic variations important for molecular breeding effectively. An important application of next-generation sequencing (NGS) is the resequencing of targeted regions to identify mutant alleles, and we believe that mapping by sequencing will become a centerpiece of efforts to discover the genes responsible for QTLs. More generally, the availability of a wide range of low- and high-multiplex SNP assay methods (whose sequencing accuracy and depth of coverage depend on the experimental design) makes SNPs an ideal marker option for QTL mapping, association analysis, MAS, and the construction of high-density genetic maps for the fine mapping and cloning of agronomically important genes (McCouch et al., 2010).
SNP discovery by resequencing whole genomes or parts of genomes from sample materials is often among the first uses of a reference genome sequence. For inbreeding species such as rice, lines to be resequenced are normally purified through one or two generations of inbreeding (via single seed descent). After a DNA sample is resequenced using NGS technology, SNPs can be identified by comparing the sequenced genome with a reference genome, such as Nipponbare for japonica rice. In maize, for example, Gore et al. (2009) used information on the features of the B73 reference genome to target the gene fraction of the maize genome for resequencing in the founder inbred lines of the nested association mapping population. Two data sets comprising 3.3 million SNPs were used to produce a first haplotype map ("HapMap") and to analyze the distribution of recombination and diversity along the maize chromosomes.
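The comparison with a reference genome can be sketched in a few lines. The code below assumes reads have already been aligned to a reference (such as Nipponbare) and summarized as a pileup, that is, the list of read bases observed at each reference position. The function name and the depth and allele-fraction thresholds are illustrative assumptions, not values from any published pipeline. Because resequenced rice lines are purified by inbreeding, the sketch only calls homozygous differences, where one non-reference base dominates the reads.

```python
from collections import Counter
from typing import Dict, List, Tuple

VALID_BASES = set("ACGT")


def call_homozygous_snps(reference: str, pileup: Dict[int, List[str]],
                         min_depth: int = 3,
                         min_fraction: float = 0.9) -> List[Tuple[int, str, str]]:
    """Return (position, reference base, alternative base) for non-reference sites.

    `pileup` maps 0-based reference positions to the read bases aligned there.
    A SNP is called only when a single non-reference base makes up at least
    `min_fraction` of at least `min_depth` informative reads.
    """
    snps = []
    for pos in sorted(pileup):
        bases = [b.upper() for b in pileup[pos] if b.upper() in VALID_BASES]
        if len(bases) < min_depth:
            continue  # too few reads to make a call
        base, count = Counter(bases).most_common(1)[0]
        ref_base = reference[pos].upper()
        if base != ref_base and count / len(bases) >= min_fraction:
            snps.append((pos, ref_base, base))
    return snps


if __name__ == "__main__":
    ref = "ACGTACGTAC"  # hypothetical reference fragment
    pile = {1: list("CCCC"), 3: list("AAAA"), 5: list("GGGT"), 7: list("CC")}
    print(call_homozygous_snps(ref, pile))  # [(3, 'T', 'A')]
```

Position 5 is not called because G makes up only 75% of the reads, and position 7 is skipped for insufficient depth. Real pipelines add base- and mapping-quality filters, indel handling, and population-level genotype calling on top of this basic idea.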
In rice, a comprehensive haplotype map (HapMap) was constructed and used for a genome-wide association study of 14 agronomic traits, such as heading date and tillering (Huang et al., 2010). The researchers used low-coverage sequence data (about 1-fold per rice line) across lines, for a combined coverage of ~508-fold, and detected 3.6 million SNPs, which can explain ~36% of the phenotypic variance for the 14 agronomic traits. This work demonstrated a new approach based on low-fold sequence coverage that can detect not only SNPs but also more complex polymorphisms, and that partially overcomes the need for deeper sequence coverage (Clark, 2010). Huang et al. (2012) applied a similar strategy to 950 rice varieties from around the world and identified 32 novel loci associated with flowering time and ten grain-related traits. In addition, 40 cultivated accessions selected from the major groups of rice and 10 accessions of their wild progenitors (O. rufipogon and O. nivara) were resequenced to >15X raw data coverage (Xu et al., 2012). After mapping the sequence reads back to the IRGSP reference genome, the authors investigated genome-wide variation patterns in a comparative analysis. The data revealed examples of structural variation in the genomes and included 6.5 million high-quality SNPs after excluding sites with missing data in any accession. Using these population and SNP data, the authors also identified thousands of new rice genes and tracked down those with significantly lower diversity in cultivated rice but not in wild rice. These variants represent a valuable resource for those interested in improving rice cultivars.
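Why ~1-fold coverage per line is both cheap and incomplete can be seen with a standard back-of-the-envelope model. Assuming reads fall uniformly at random, so that per-site read depth follows a Poisson distribution with mean equal to the coverage (the Lander-Waterman approximation), the sketch below computes the fraction of sites covered by at least a given number of reads. The function name is hypothetical, and real genomes deviate from uniform coverage.

```python
import math


def fraction_covered(mean_depth: float, min_reads: int = 1) -> float:
    """P(depth >= min_reads) when per-site depth ~ Poisson(mean_depth)."""
    below = sum(
        math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - below


if __name__ == "__main__":
    for depth in (1.0, 7.0, 15.0):
        print(f"{depth:>4}x: >=1 read {fraction_covered(depth):.4f}, "
              f">=2 reads {fraction_covered(depth, 2):.4f}")
```

At 1x, only about 63% of sites receive even one read in a given line (1 - e^-1), and about 26% receive two or more, so roughly a third of each line's genotypes are unobserved. This is why low-coverage population designs depend on pooling information across many lines to infer missing genotypes, whereas deeper coverage per line (such as 7x or 15x) leaves very few sites unread.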
Preferences in the processing, cooking, and eating qualities of rice differ globally. Plant breeders are attempting to fulfill consumer demand for rice varieties with specific qualities. The major components of rice grain quality include appearance, milling, cooking, eating, and nutritional aspects. The chemical composition of rice grain is important because of its relationship with eating quality. Amylose content is one of the most important traits determining cooking quality and is controlled by a major locus, waxy (Wx), on [...]. Rice nutritional quality is another important factor for consumer acceptance. In developing countries where rice is the main food, its nutrient content makes a significant contribution to the intake of some essential nutrients. Interest in natural antioxidants in rice is growing because of their role in preventing oxidative stress-related diseases. Although many QTL analyses and genetic mapping studies of grain quality have been conducted, association-mapping studies of biotic and abiotic stress traits in rice are few. The genes or QTLs related to these traits are complex. Genetic mapping, including association mapping and linkage mapping, is a useful way to identify alleles for these traits. As shown in Table 1, most association-mapping studies have focused on morphological and agronomic characteristics. Four studies were related to grain and eating quality, and only one addressed disease resistance and aluminum tolerance. Biotic and abiotic stress-tolerance traits remain to be explored by association mapping.

Association mapping of rice in Korea
Association mapping has been conducted on the RDA core set over the past 2 years (Table 2). Zhao et al. (2012b) analyzed 130 accessions from the core set with 170 SSR markers for association analysis of physicochemical traits related to eating quality. Linkage disequilibrium (LD) patterns and distributions are of fundamental importance for genome-wide association mapping. The mean r² value for all intrachromosomal locus pairs was 0.0940, and LD between linked markers decreased with distance. Marker-trait associations were investigated using the unified mixed-model approach, which accounts for both population structure (Q) and kinship (K). In total, 101 marker-trait associations (P < 0.05) were identified using 52 SSR markers covering 12 chromosomes (Fig. 2). Although direct comparisons of the chromosomal locations of marker-trait associations with previously reported QTLs are difficult because different materials and molecular markers were used, most marker-trait associations were located in regions containing QTLs associated with a given trait. Indeed, some were located in similar or proximal regions related to starch synthesis. The new markers related to eating quality will facilitate the understanding of QTLs and marker-assisted selection (Zhao et al., 2012b).
Figure. Values of ΔK, whose modal value was used to detect the true number of groups (K = 4). For each K value, five independent runs (blue diamonds) were considered and averaged over the replicates (Zhao et al., 2012b).
Association analysis of candidate genes has been used to trace the origin of agronomically important traits. Lu et al. (2012a) used the rice core lines for association mapping to investigate the relationship between sequence variations in parts of 10 starch synthesis-related genes (SSRGs) and amylose content (AC) and rapid viscosity analysis (RVA) profiles. Eighty-six sequence variations were found in the 10 sequenced amplicons, including 79 SNPs, six insertions-deletions (indels), and one polymorphic SSR.
Among them, 61 variations were located in exons, of which 41 are predicted to cause amino acid changes. The association-mapping results revealed a total of four significant associations between three phenotypic indices and three sequence variations. An A-to-G SNP in ADP-glucose pyrophosphorylase small subunit 1 (OsAGPS1) was significantly associated with increased AC (P < 0.001, r² = 15.6%), while a 12-bp deletion in AGPase large subunit 4 (OsAGPL4) [...] (Table 3).
Table 3. Association between sequence variations and phenotype
[...] (Fig. 4), which accounted for more than 40% of the total variation (Zhao et al., 2009). In our research group, association mapping of rice traits related to cold-stress tolerance during germination, preharvest sprouting resistance, salt tolerance, blast disease resistance, and grain physicochemical properties is being undertaken using SSRs and SNP variants from advanced resequencing platforms.
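The LD statistic summarized above (mean intrachromosomal r² and its decline with distance) can be sketched for the simplest case. Assuming biallelic markers (such as SNPs) scored on homozygous inbred lines, each line contributes one haplotype, and r² is the squared disequilibrium D = p_AB - p_A p_B divided by the product of the allele-frequency variances at the two markers. The SSR markers used by Zhao et al. (2012b) are multi-allelic and require a generalized statistic, so this illustrates the concept rather than reproducing their analysis; the function names and data layout are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Calls = Sequence[Optional[int]]  # 0/1 allele per line, None = missing
Marker = Tuple[str, int, Calls]  # (chromosome, position in bp, calls)


def ld_r2(marker_a: Calls, marker_b: Calls) -> float:
    """r^2 between two biallelic markers scored on homozygous lines."""
    pairs = [(a, b) for a, b in zip(marker_a, marker_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    p_a = sum(a for a, _ in pairs) / n                        # freq. of allele 1 at A
    p_b = sum(b for _, b in pairs) / n                        # freq. of allele 1 at B
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n  # haplotype 1-1 freq.
    d = p_ab - p_a * p_b                                      # disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0                # monomorphic -> 0


def mean_intrachromosomal_r2(markers: List[Marker]) -> float:
    """Average r^2 over all marker pairs that lie on the same chromosome."""
    values = [ld_r2(m1[2], m2[2])
              for m1, m2 in combinations(markers, 2) if m1[0] == m2[0]]
    return sum(values) / len(values) if values else 0.0


def r2_by_distance(markers: List[Marker]) -> List[Tuple[int, float]]:
    """(distance in bp, r^2) for same-chromosome pairs, e.g. for an LD-decay plot."""
    return sorted((abs(m1[1] - m2[1]), ld_r2(m1[2], m2[2]))
                  for m1, m2 in combinations(markers, 2) if m1[0] == m2[0])


if __name__ == "__main__":
    a, b, c = [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]
    markers = [("chr1", 100, a), ("chr1", 200, b), ("chr1", 300, c), ("chr2", 50, a)]
    print(mean_intrachromosomal_r2(markers))
    print(r2_by_distance(markers))
```

Plotting the output of `r2_by_distance` for many markers is one common way to visualize how LD between linked markers decays with physical distance.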

Reference | Number of lines and population type | Number and types of markers used | Traits
In conclusion, association mapping is a promising approach for overcoming the limitations of conventional linkage mapping in plant breeding. Recent research has demonstrated the significant potential of LD-based association mapping of physicochemical traits and other important agronomic traits in rice accessions using SSR and SNP markers. This type of mapping could be a useful alternative to linkage mapping for detecting marker-trait associations and could lead to the implementation of marker-assisted selection in rice breeding programs.

Genomics and GWAS in germplasm research
With the development of next- and third-generation sequencing technologies, the whole genome of an individual rice accession can now be sequenced for less than US$1,000.
In addition, new efficient genotyping technologies such as restriction-site associated DNA (RAD) sequencing (Baird et al., 2008) and genotyping-by-sequencing (GBS) allow genotyping data for up to 40,000 loci to be generated at low cost within a few days.
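Both RAD and GBS reduce genome complexity by sequencing DNA next to restriction-enzyme cut sites, so the number and spacing of cut sites bound how many loci can be sampled. A minimal in-silico digestion is sketched below, assuming a palindromic recognition site (so scanning one strand suffices) and using EcoRI (G^AATTC, which cuts after the first base) purely as an example. The enzymes, adapters, and size-selection steps actually used vary between protocols and are not modeled; the function names and toy sequence are hypothetical.

```python
import re
from typing import List


def cut_positions(sequence: str, site: str, cut_offset: int) -> List[int]:
    """0-based positions where the top strand is cut at each recognition site."""
    pattern = f"(?={re.escape(site.upper())})"  # lookahead also finds overlapping sites
    return [m.start() + cut_offset for m in re.finditer(pattern, sequence.upper())]


def digest(sequence: str, site: str, cut_offset: int) -> List[str]:
    """Split `sequence` into top-strand fragments at every recognition site."""
    fragments, start = [], 0
    for cut in cut_positions(sequence, site, cut_offset):
        fragments.append(sequence[start:cut])
        start = cut
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    toy = "AAGAATTCTTTTGAATTCAA"  # hypothetical sequence with two EcoRI sites
    print(cut_positions(toy, "GAATTC", 1))  # [3, 13]
    print(digest(toy, "GAATTC", 1))  # ['AAG', 'AATTCTTTTG', 'AATTCAA']
```

Run over a whole reference genome, the count of cut sites gives a rough upper bound on the number of loci an enzyme can tag, which is one reason different enzymes are chosen for different marker densities.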
Natural alleles and alleles obtained from artificially mutagenized populations provide an important resource for crop breeding. By using all available alleles and detailed phenotypic data from core sets of rice lines, new genes and useful traits can be identified. Molecular tags for useful traits, developed through GWASs based on genotypic and phenotypic information, can be used to track target traits during the segregation of populations in rice breeding (Figure 4).
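Tracking a target trait with molecular tags amounts to genotyping each segregating progeny line at the tag markers and keeping the lines that carry the desired allele. The sketch below assumes hypothetical marker names and genotype codes ("AA" for homozygous donor allele, "AB" for heterozygous, "BB" for recurrent-parent allele); practical selection schemes often also use flanking markers to guard against recombination between a tag and the gene it tracks.

```python
from typing import Dict, List

# Hypothetical genotype codes: "AA" = homozygous for the donor (target) allele,
# "AB" = heterozygous, "BB" = homozygous for the recurrent-parent allele.
ProgenyCalls = Dict[str, Dict[str, str]]  # line name -> {marker name: call}


def select_carriers(progeny: ProgenyCalls, target: Dict[str, str]) -> List[str]:
    """Return lines whose calls match the target genotype at every tag marker.

    Lines with a missing call at any tag marker are not selected.
    """
    return sorted(
        line for line, calls in progeny.items()
        if all(calls.get(marker) == wanted for marker, wanted in target.items())
    )


if __name__ == "__main__":
    target = {"tagQTL_1": "AA", "tagQTL_2": "AA"}  # hypothetical tag markers
    progeny = {
        "line01": {"tagQTL_1": "AA", "tagQTL_2": "AA"},
        "line02": {"tagQTL_1": "AA", "tagQTL_2": "AB"},
        "line03": {"tagQTL_1": "BB", "tagQTL_2": "AA"},
        "line04": {"tagQTL_1": "AA"},  # tagQTL_2 failed to amplify
    }
    print(select_carriers(progeny, target))  # ['line01']
```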
To identify new alleles from a representative core set of rice lines and transfer them into elite lines, we selected 166 of the ~25,000 accessions in the RDA Gene Bank. We completed whole-genome resequencing of 84 core accessions at 7x coverage in 2012. We plan to resequence the whole genomes of the remaining 82 core accessions, in addition to 84 bred varieties from a validation set, in 2013. We are currently phenotyping the core accessions for agronomic traits and chemical composition for GWAS analysis with the resequencing data. We also plan to improve the software algorithm for association analysis to increase its ability to identify alleles from the core set of lines using whole-genome SNP or indel genotype data and phenotypic information. More precise characterization of rice traits that confer resistance to stress from climate change is required to screen useful alleles using GWASs. Using whole-genome genotype information, we are able to develop large numbers of molecular tags across the 12 rice linkage groups based on their contributions to specific phenotypes.

Strategy for identification of major and minor QTLs for molecular breeding
The core accessions are highly diverse and carry many traits useful for rice breeding. Once an accession with a desirable trait is selected, biparental mapping populations will be developed using two japonica varieties (Shindongjinbyeo and Junambyeo) and one indica variety (Hanareumbyeo). Major QTLs will be surveyed in an F8 recombinant inbred line (RIL) population, first by whole-genome resequencing of 96 samples for initial mapping, and then by resequencing the target region in an expanded set of 3000 to 5000 samples for fine mapping until the target gene can be cloned. We expect all major QTLs to contribute more than 10% to target traits. To identify minor QTLs that contribute less than 10% to a target trait, a BC4F1 population will first be developed and then selfed up to BC4F8. The recurrent parent will be an elite line, both for QTL mapping and for transferring target traits into elite lines. Mapping of minor QTLs will be performed using the BC4F8 segregating population (Fig. 5). Natural variation results from the expression of different alleles during evolution. As a result of the contributions of farmers over the past ~8000 years, many important traits have accumulated in the natural germplasm collections currently maintained in seed banks. Whole-genome resequencing allows the efficient identification of unused alleles in conserved germplasm. We are currently developing a platform for allele mining in rice breeding systems using GWAS approaches and diverse germplasm accessions, with the support of the Next-Generation BioGreen 21 Program (No. PJ009099) from the Rural Development Administration, Republic of Korea. We believe our effort will facilitate the molecular breeding of rice.
Figure 5. Strategies for identification of major and minor QTLs in rice from selected accessions carrying useful traits through GWAS. The major QTLs will be localized and tagged by molecular markers in the F8 generation. Minor QTLs will be localized using a BC4F8 population.
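The generation scheme above rests on two textbook expectations, sketched below under the usual simplifying assumptions (no selection, loci treated as unlinked, normal Mendelian segregation): each backcross to the recurrent parent halves the remaining donor genome, and each generation of selfing halves the remaining heterozygosity. Linkage drag around a selected target region will in practice retain more donor DNA than these averages suggest. The function names are hypothetical.

```python
from fractions import Fraction


def recurrent_parent_share(backcrosses: int) -> Fraction:
    """Expected recurrent-parent genome fraction in a BCnF1 (n = 0 gives the F1, 1/2)."""
    return 1 - Fraction(1, 2 ** (backcrosses + 1))


def remaining_heterozygosity(selfing_generations: int) -> Fraction:
    """Fraction of initially heterozygous loci still heterozygous after selfing."""
    return Fraction(1, 2 ** selfing_generations)


if __name__ == "__main__":
    for n in range(1, 5):
        share = recurrent_parent_share(n)
        print(f"BC{n}F1: {share} = {float(share):.2%} recurrent parent")
    # F1 -> F8 (or BC4F1 -> BC4F8) is seven rounds of selfing.
    print(f"after 7 selfings: {float(remaining_heterozygosity(7)):.2%} still heterozygous")
```

Under these assumptions a BC4F1 carries on average 31/32 of the recurrent (elite) genome, and seven rounds of selfing leave about 1/128 of the initially heterozygous loci unfixed, which is why the F8 and BC4F8 generations are close to homozygous mapping populations.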