Exploring Statistical Tools in Measuring Genetic Diversity for Crop Improvement


Introduction
The increase in global population, especially in developing nations, has gradually led to food shortages and hence to increased poverty. Addressing the causes of poverty in developing nations is one major challenge facing breeders (Fu and Somers 2009). Econometric theories have identified the human and material resources traceable to poverty, but fail to identify crop improvement techniques for addressing world food shortages (Baudoin and Mergeai 2001). Crop improvement techniques therefore remain a major concern for plant breeders (Akbar and Kamran 2006; Aremu et al. 2007a). Several factors affect crop improvement for performance in specific or general environments. These include climate, weather, soil (edaphic) and biological factors and, most importantly, crop genotype (Aremu et al. 2007b). Crop genotypes comprise different crop forms, including inbred or pure lines, hybrids, landraces, wild races, germplasm accessions, cultivars and varieties. These genotypes have wide and diverse origins and genetic backgrounds, a variation known as genetic diversity. The study of genetic diversity has been a major breakthrough in understanding intraspecific crop performance, leading to crop improvement (Aremu 2005). Knowledge of crop performance in genetically diverse populations reveals differences in the nature of the genetic materials used.
Genetic diversity study is therefore a stepwise process through which existing variation among individual crop genotypes, or groups of genotypes, is identified using a specific statistical method or a combination of methods (Christini et al. 2009; Warburton and Crossa 2000; Aremu 2005; Weir 1996). The identified variation is expected to form a pattern of genetic relationships that can be used to group genotypes.
Researchers, including breeders, have employed different data sources and types from diverse crops to study genetic diversity. These data sources include morphological and agronomic, pedigree, proximate or biochemical, and molecular data (Aremu et al. 2007a in cowpea; Liu et al. 2000 in cotton; Mostafa et al. 2011 in wheat; Adewale et al. 2010 in African yam bean; Christine et al. 2009 in bentgrass).
The choice of statistical method depends on the objectives set out for the study. This chapter discusses the underlying importance of genetic diversity and the statistical tools used to measure it.

Importance of genetic diversity studies
The study of genetic diversity is critical to success in plant breeding. It provides information about the extent of genetic divergence and serves as a platform for specific breeding objectives (Thompson et al. 1998). It identifies parental combinations that can be exploited to create segregating progenies with maximum genetic potential for further selection, as shown by Akoroda (1987), Weir (1996), Liu et al. (2000), Dje et al. (2000) and Aremu et al. (2007b). Genetic diversity study exposes the genetic variability in diverse populations and provides justification for introgression and ideotype breeding programmes to enhance crop performance. Mostafa et al. (2011) postulated that genetic diversity studies provide an understanding of genetic relationships among populations and hence guide the assignment of lines to specific heterogeneous groups, which can be used to identify parents for hybridization. Choice of parents has been identified as the first basic step in a meaningful breeding programme (Akoroda 1987; Aremu et al. 2007a; Islam 2004; Rahim et al. 2010). Parent selection in diversity studies is valuable because it is a means of creating useful variation in subsequent progenies. Dje et al. (2000) found that the greater the genetic distance between parents, the greater the heterosis in the resulting progenies. The heterotic progenies can then be further hybridized, with selection based on transgressive segregation. Akbar and Kamran (2006) exploited this parental selection technique through hybridization in a wheat breeding programme. Mostafa et al. (2011) investigated genetic distance among 36 winter wheat genotypes cultivated in different regions of Iran using principal component analysis and found that the genotypes fell into five major, distantly related groups.
Researchers, especially plant breeders, have placed considerable emphasis on analysing genetic diversity in a number of field crops: white and yellow yam (Akoroda 1987); cowpea (Adewale and Aremu 2010); African yam bean (Baudoin and Mergeai 2001); flax (Mohammadi et al. 2010); wheat (Mostafa et al. 2011); and several other crops.
Diversity studies on these crops at their primitive levels (landraces, wild types, accessions, lines, etc.) led to the development of their widely distributed cultivars and varieties, whose proven characteristics include stable and adaptable performance, consistent tolerance of adverse weather conditions and resistance to diseases around the world. Fu and Somers (2009) reported that the use of identified wheat parents resistant to environmental stress under different growing conditions has increased world wheat production. An early report by Mohammadi and Prasanna (2003) highlighted appropriate parent selection for hybridization in maize using a defined diversity study technique, and Bohn et al. (1999) identified six groups of wheat landraces in western Iran that can be grown in different geographical locations for improved yield. Martin et al. (2008) discovered 42 cultivars of bentgrass in the mancet city and noted that only diversity studies could identify reliable, distinct cultivars with varietal purity and so ensure protection of breeder and consumer rights. Understanding inter- and intraspecific genetic relationships through diversity studies has been shown to increase hybrid vigour and to reduce or avoid re-selection within existing germplasm. Notably, existing cultivar populations have narrow genetic bases, hence the need to create variability within and among cultivars using genetic diversity methods.

Genetic diversity measurement tools
Genetically diverse populations arising from pure lines, accessions, landraces and wild or weedy races are analysed using a number of methods, either singly or in combinations of two or more. Franco et al. (2001) stressed the need for careful consideration when measuring genetic diversity within and between crop populations. Such considerations include:
1. Use of multivariate data collected from morphological or agronomic traits. Such data may include discrete, continuous, binomial, ordinal and other types of variables.
2. Use of multiple data sets arising from morphological, biochemical and DNA-based collections. Using multiple data sets in a diversity study helps to reveal the strengths and constraints of each data set. However, multiple data sets raise some puzzles: should analysis and interpretation be based on individual or combined data sets? More worrying still, how can the different data sets be combined effectively while still achieving meaningful results? To answer these questions, Wrigley et al. (1982) studied phylogenetic relationships among Triticeae species using individual and combined analyses of data sets consisting of morphological and DNA-based traits, and found that the individual and combined analyses gave divergent results. The discrepancies may be attributed to the discrete nature of DNA-based data and the continuous nature of morphological data. It is therefore not surprising that Hillis (1987) and Chippindale and Wein (1994) suggested assigning specific numbers to both quantitative and qualitative traits in morphological, biochemical and molecular data sets. In view of this, Pedersen and Seberg (1998) advised that both individual and combined data sets can be analysed in many meaningful ways to draw conclusions on genetic divergence. Taba et al. (1999) and Franco et al. (2001) used the Modified Location Model (MLM), which combines the qualitative variables into a single multinomial variable called W, to classify maize accessions from the genetic resource centres of Latin America. Moreover, the MLM can combine molecular and morphological data to give a better classification than when an individual data set is used. Individual morphological, biochemical or molecular data sets can be analysed using one technique or a combination of techniques, as discussed below.
3. The objective to be achieved. This dictates the choice of statistical tool for measuring genetic distance and the level of clustering of the intragenic factors in use. Such objectives include determining the extent of variation, grouping genotypes based on genetic distance, and identifying the action to follow parental selection. In essence, the breeding focus determines the method applicable for explaining the nature of genetic divergence.
Variation is recorded when genetic diversity is measured through genotype relationships based on genetic distances and through the grouping of populations from individual genotypes such as accessions, lines and wild races. This variation arises primarily from differences in the nature of the genetic materials. Estimating genetic diversity therefore depends largely on statistical genetic variance theories that identify genotype relationships based on genetic distance or variance. Nei (1973) first defined genetic distance as the difference between two entities that can be described by allelic variation. In 1987 this definition was modified to "the extent of gene differences among populations that are measured using numerical values". Better still, Beaumont et al. (1998) provided a more comprehensive definition of genetic distance as any quantitative measure of genetic difference, at either the sequence or the allele frequency level, calculated between individuals or populations.

The use of morphological data to measure genetic distance
Anderson (1957) first proposed the use of the metroglyph and index score to study patterns of morphological variation in individual data sets. In the early seventies, this method was used to study morphological variation in green gram (Singh and Chaudhary 1985). The method uses the range of variation in each trait, and the extent of a genotype's variation in a trait is represented by the length of the corresponding ray on its glyph. A genotype's performance is judged by its index score. The score value determines the length of each ray, which may be short, medium or long. Akoroda (1987), Ariyo and Odulaja (1991) and Van Bueningen and Busch (1997) extensively explored the use of the metroglyph and index score to study morphological variation in yellow yam, okra and wild rye accessions, respectively.
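The index-score idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anderson's exact procedure: each trait's observed range is split into three equal-width classes scored 1, 2 and 3 (short, medium and long rays), and a genotype's index score is the sum of its class scores. The class limits, the function names and the trait values are all hypothetical; other class limits can be substituted.

```python
# Hypothetical index-score sketch: three equal-width classes per trait is an
# assumption made for illustration, not a prescribed rule.


def class_score(value, lo, hi):
    """Score a trait value 1, 2 or 3 by the third of the range [lo, hi] it falls in."""
    if hi == lo:
        return 2  # no variation in this trait: every genotype gets a medium ray
    fraction = (value - lo) / (hi - lo)
    if fraction < 1 / 3:
        return 1  # short ray
    if fraction < 2 / 3:
        return 2  # medium ray
    return 3  # long ray


def index_scores(data):
    """Map genotype -> (per-trait scores, index score = sum of scores).

    `data` maps genotype name -> list of trait values in a fixed trait order.
    """
    rows = list(data.values())
    n_traits = len(rows[0])
    lows = [min(row[t] for row in rows) for t in range(n_traits)]
    highs = [max(row[t] for row in rows) for t in range(n_traits)]
    result = {}
    for name, values in data.items():
        scores = [class_score(v, lows[t], highs[t]) for t, v in enumerate(values)]
        result[name] = (scores, sum(scores))
    return result


# Made-up values: plant height (cm), pods per plant, 100-seed weight (g)
genotypes = {
    "G1": [45.0, 12.0, 18.0],
    "G2": [60.0, 20.0, 22.0],
    "G3": [75.0, 28.0, 15.0],
}

for name, (scores, total) in index_scores(genotypes).items():
    print(name, scores, total)
# G1 [1, 1, 2] 4
# G2 [2, 2, 3] 7
# G3 [3, 3, 1] 7
```

G2 and G3 share the same index score but differ in which rays are long, which is why the glyph shape, not just the score, is inspected.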
Related to the metroglyph and index score is Euclidean distance (ED) measurement. According to Nei (1987), Euclidean distance measures the dissimilarity between two genotypes, populations or individuals i and j scored for p morphological traits, denoted by $x_1, x_2, \ldots, x_p$ and $y_1, y_2, \ldots, y_p$ for individuals i and j respectively.
Metroglyph and index-score methods measure genetic distance using morphological traits, whereas Euclidean distance measurements can use both morphological and molecular marker data sets. Smith et al. (1991) applied the following statistic to measure ED:

$$ED = \left\{\sum_{i}\frac{\left[T_1(i) - T_2(i)\right]^2}{\sigma^2_{T(i)}}\right\}^{1/2}$$

where $T_1(i)$ and $T_2(i)$ are the values of the ith trait for lines 1 and 2, and $\sigma^2_{T(i)}$ is the variance of the ith trait over all the lines used. Much later, Weir (1996) gave the formula for genetic distance as

$$d(i,j) = \left[(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2\right]^{1/2}$$

where $d(i,j)$ is the ED between individuals (lines) i and j scored for p morphological traits, $x_1, x_2, \ldots, x_p$ are the trait values for individual i and $y_1, y_2, \ldots, y_p$ are those for individual j. In an averaged form, the individual character distances are summed and divided by the total number of characters scored in both individuals. ED measurement allows the use of both qualitative and quantitative data, and several workers have estimated genotype distances with it: Van Bueningen and Busch (1997) in wheat, Smith et al. (1987) in sorghum and Ajmone-Marsan (1998) in maize.
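The two Euclidean distance statistics discussed in this section, Weir's unweighted form and Smith et al.'s (1991) form in which each squared trait difference is divided by the variance of that trait over all lines, can be sketched as follows. The sample (n - 1) variance is an assumption, since the text does not say which estimator is meant; the function names and data are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Unweighted ED: square root of the summed squared differences over p traits."""
    if len(x) != len(y):
        raise ValueError("both individuals must be scored for the same traits")
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))


def sample_variance(values):
    """Sample variance with an n - 1 denominator (an assumed choice of estimator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def standardized_distance(line1, line2, all_lines):
    """Smith et al. (1991)-style ED: each squared trait difference is divided by
    that trait's variance over all lines before summing."""
    total = 0.0
    for t, (a, b) in enumerate(zip(line1, line2)):
        trait_variance = sample_variance([row[t] for row in all_lines])
        total += (a - b) ** 2 / trait_variance
    return math.sqrt(total)


# Made-up lines scored for days to flowering and plant height (cm)
lines = [[50.0, 100.0], [52.0, 140.0], [54.0, 120.0]]

for i, j in [(0, 1), (0, 2)]:
    raw = euclidean_distance(lines[i], lines[j])
    std = standardized_distance(lines[i], lines[j], lines)
    print(f"lines {i} and {j}: ED = {raw:.3f}, standardized ED = {std:.3f}")
# lines 0 and 1: ED = 40.050, standardized ED = 2.236
# lines 0 and 2: ED = 20.396, standardized ED = 2.236
```

The unweighted distance is driven almost entirely by plant height, the trait with the larger numerical scale; dividing by each trait's variance removes that effect, so the two pairs of lines come out equally distant.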

The use of molecular data in measuring genetic distances
Advances in molecular genetics led Beaumont et al. (1998) to a better definition of genetic distance: a quantitative measure of genetic difference calculated between individuals, populations or species at the DNA sequence level or the allele frequency level.
Various genetic distance measures have been proposed for analysing DNA-based data in genetic diversity studies. Powell et al. (1996) identified the different DNA-based marker techniques as Random Amplified Polymorphic DNA (RAPD), Amplified Fragment Length Polymorphism (AFLP), Restriction Fragment Length Polymorphism (RFLP), Simple Sequence Repeats (SSRs, also called microsatellites) and, most recently, single nucleotide polymorphisms (SNPs). Data from these markers can be analysed individually or combined with morphological or biochemical data sets.
For DNA-based data, where the amplification products are equated to alleles, the allele frequencies can be calculated and the genetic distance between individuals i and j estimated as

$$d(i,j) = \left[\sum_{a=1}^{n}\left|X_{ai} - X_{aj}\right|^{r}\right]^{1/r}$$

where $X_{ai}$ is the frequency of allele a for individual i, n is the number of alleles per locus and r is a constant that depends on the coefficient used. In its simplest form, i.e. r = 1, the genetic distance is

$$d(i,j) = \sum_{a=1}^{n}\left|X_{ai} - X_{aj}\right|$$

With r = 2, d(i,j) corresponds to Rogers' (1972) measure of distance (RD), usually written with a factor of 1/2 inside the root so that it ranges from 0 to 1:

$$RD = \left[\frac{1}{2}\sum_{a=1}^{n}\left(X_{ai} - X_{aj}\right)^{2}\right]^{1/2}$$

Before allele frequencies can be calculated for some molecular markers, the marker data must first be scored into a binary matrix for statistical analysis. Binary data were widely used to measure genetic distance even before the advent of molecular marker data, with the Rogers (1972) and Nei and Chesser (1983) coefficients, known as $GD_{MR}$ and $GD_{NL}$ respectively.
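A minimal sketch of the single-locus allele-frequency distance with exponent r, and of Rogers' distance, follows. The helper that turns band scores into frequencies is hypothetical and assumes codominant markers (for example SSRs) scored on diploid individuals, so that a heterozygote's two bands each receive frequency 0.5; dominant markers such as RAPD or AFLP cannot be converted this way. Combining several loci (for example by averaging) is left out.

```python
def allele_distance(freq_i, freq_j, r):
    """d(i,j) = [sum over alleles a of |X_ai - X_aj|^r]^(1/r) for one locus."""
    if len(freq_i) != len(freq_j):
        raise ValueError("frequencies must cover the same n alleles")
    return sum(abs(a - b) ** r for a, b in zip(freq_i, freq_j)) ** (1.0 / r)


def rogers_distance(freq_i, freq_j):
    """Rogers (1972) distance for one locus: [0.5 * sum (X_ai - X_aj)^2]^(1/2)."""
    return (0.5 * sum((a - b) ** 2 for a, b in zip(freq_i, freq_j))) ** 0.5


def frequencies_from_bands(bands):
    """Hypothetical helper: 0/1 band scores at one codominant locus -> allele
    frequencies within a diploid individual (each present band shares equally)."""
    present = sum(bands)
    if present == 0:
        raise ValueError("no amplification at this locus (possible null allele)")
    return [b / present for b in bands]


# Made-up scores for three alleles at one SSR locus: individual i is homozygous
# for allele 1, individual j is heterozygous for alleles 2 and 3
x_i = frequencies_from_bands([1, 0, 0])
x_j = frequencies_from_bands([0, 1, 1])
print(x_j)                           # [0.0, 0.5, 0.5]
print(allele_distance(x_i, x_j, 1))  # 2.0
print(rogers_distance(x_i, x_j))     # about 0.866
```

Two individuals homozygous for different alleles give a Rogers' distance of exactly 1, its maximum, while identical profiles give 0.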
When any statistical formula is used to determine genetic diversity from molecular data, one problem usually encountered is the failure of some genotypes to show amplification for some primer pairs. Robinson and Harris (1999) noted that lack of amplification may be due to "null alleles". It is often difficult, however, to ascribe a lack of amplification to a null allele with certainty. It therefore rests with the researcher to ensure that the null-allele status of a genotype is not treated as missing data when computing the genetic similarity or distance matrix, so as to avoid gross errors in interpreting the results.
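The consequence of the null-allele decision can be illustrated with a toy example. The sketch below uses a simple mismatch proportion (one minus the simple matching coefficient), chosen only for illustration, on hypothetical 0/1 band profiles; non-amplification is coded either as absent bands (a null allele) or as missing values that are skipped pairwise.

```python
def mismatch_distance(profile_i, profile_j):
    """Share of band positions that differ between two 0/1 profiles.
    Positions scored None (missing) in either profile are skipped pairwise."""
    pairs = [(a, b) for a, b in zip(profile_i, profile_j)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no band positions can be compared")
    return sum(a != b for a, b in pairs) / len(pairs)


reference = [1, 0, 1, 1, 0, 1]
# Made-up genotype that failed to amplify at the two bands from one primer pair
as_null_allele = [1, 0, 0, 0, 0, 1]    # failure scored as bands absent
as_missing = [1, 0, None, None, 0, 1]  # failure scored as missing data

print(mismatch_distance(reference, as_null_allele))  # 2 of 6 positions differ
print(mismatch_distance(reference, as_missing))      # 0.0: the 4 comparable positions agree
```

Treating the failed amplification as missing makes the two genotypes look identical, while scoring it as a null allele separates them, so the coding choice directly changes the similarity-distance matrix.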
DNA-based marker data have been successfully used to measure genetic distance in several crops (Pritchard et al. 2000 in pigeon pea; Beaumont et al. 1998 in wheat; Franco et al. 2001 in maize; Dje et al. 2000 in sorghum).

Grouping techniques in measuring genetic diversity
Genetic relationships among and within breeding materials can be identified and classified using multivariate grouping methods. Established multivariate statistical algorithms are important for classifying breeding materials from germplasm, accessions, lines and other races into distinct groups according to genotype performance, for example groups for disease resistance, early maturity, reduced canopy or drought resistance. Irrespective of the data source (morphological, biochemical or molecular marker data), the most widely used techniques are cluster analysis, Principal Component Analysis (PCA), Principal Coordinate Analysis (PCoA), canonical correlation and Multidimensional Scaling (MDS).
Cluster analysis reveals patterns of relationship between genotypes through hierarchical, mutually exclusive groupings, such that similar genotypes are mathematically gathered into the same cluster (Hair et al. 1995; Aremu 2005). Five cluster analysis methods are commonly used: the unweighted pair group method with arithmetic mean (UPGMA), the unweighted pair group method using centroids (UPGMC), single linkage (SLCA), complete linkage (CLCA) and median linkage (MLCA). UPGMA and UPGMC provide more accurate grouping of breeding materials in accordance with their pedigrees, and their results have been found to be more consistent with known heterotic groups than those of the other methods (Aremu et al. 2007a).
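A compact, self-contained sketch of UPGMA on a precomputed distance matrix follows. At each step the two closest clusters are merged, and the distance from the new cluster to every other cluster is the size-weighted mean of the old distances; the other linkage methods listed differ mainly in that update rule (single linkage, for example, keeps the minimum). The distance matrix and genotype names are hypothetical, and in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would normally be used instead.

```python
def upgma(labels, dist):
    """Agglomerative UPGMA clustering of a symmetric distance matrix.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of genotype names.
    """
    clusters = {i: (name,) for i, name in enumerate(labels)}
    sizes = {i: 1 for i in clusters}
    # distances between current clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda item: item[1])  # closest pair
        merges.append((clusters[a], clusters[b], dab))
        new_members = clusters.pop(a) + clusters.pop(b)
        na, nb = sizes.pop(a), sizes.pop(b)
        for k in clusters:
            dak = d.pop((min(a, k), max(a, k)))
            dbk = d.pop((min(b, k), max(b, k)))
            # UPGMA update: size-weighted average of the two old distances
            d[(k, next_id)] = (na * dak + nb * dbk) / (na + nb)
        del d[(a, b)]
        clusters[next_id] = new_members
        sizes[next_id] = na + nb
        next_id += 1
    return merges


# Made-up genetic distances among four genotypes
names = ["A", "B", "C", "D"]
distances = [
    [0.0, 2.0, 4.0, 8.0],
    [2.0, 0.0, 6.0, 8.0],
    [4.0, 6.0, 0.0, 9.0],
    [8.0, 8.0, 9.0, 0.0],
]
for left, right, height in upgma(names, distances):
    print(left, "+", right, "at", round(height, 3))
# ('A',) + ('B',) at 2.0
# ('C',) + ('A', 'B') at 5.0
# ('D',) + ('C', 'A', 'B') at 8.333
```

The merge heights are the levels at which a dendrogram would join the clusters; cutting the tree at a chosen height yields the genotype groups.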
Principal component, canonical and multidimensional analyses are used to derive a two- or three-dimensional scatter plot of individuals in which the geometrical distances among individual genotypes reflect the genetic distances among them. Wiley (1981) defined principal components as a reduced form of the data that clarifies the relationships between breeding materials in fewer, interpretable dimensions, forming new variables. These new variables are uncorrelated with one another and can be visualized as distinct groupings.
Principal component analysis first determines the eigenvalues, which give the amount of the total variation displayed on each component axis. The first three axes are expected to explain a large share of the variation among the genotypes. Cluster and principal component analyses can be used jointly to explain the variation in breeding materials in genetic diversity studies.
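The eigenvalue step of principal component analysis can be sketched without external libraries by applying cyclic Jacobi rotations to the trait covariance matrix; each eigenvalue divided by the sum of all eigenvalues is the share of total variation on that axis. This is an illustrative sketch with hypothetical data and function names. It analyses the covariance matrix, whereas traits measured on different scales are usually standardized first (equivalently, the correlation matrix is analysed), and a numerical library such as NumPy would be the practical choice.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


def covariance_matrix(data):
    """Sample covariance matrix of a genotypes x traits table (list of rows)."""
    n, p = len(data), len(data[0])
    means = [sum(row[t] for row in data) / n for t in range(p)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
             for b in range(p)] for a in range(p)]


def jacobi_eigenvalues(sym, tol=1e-18, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in sym]
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # angle chosen so that the rotated a[i][j] becomes zero
                theta = 0.5 * math.atan2(2 * a[i][j], a[j][j] - a[i][i])
                rot = [[float(r == c) for c in range(p)] for r in range(p)]
                rot[i][i] = rot[j][j] = math.cos(theta)
                rot[i][j] = math.sin(theta)
                rot[j][i] = -math.sin(theta)
                a = matmul(transpose(rot), matmul(a, rot))
    return sorted((a[k][k] for k in range(p)), reverse=True)


def explained_variation(data):
    """Eigenvalues of the trait covariance matrix and each axis's share of the total."""
    eigenvalues = jacobi_eigenvalues(covariance_matrix(data))
    total = sum(eigenvalues)
    return eigenvalues, [ev / total for ev in eigenvalues]


# Made-up genotypes x traits: plant height (cm), pods per plant, seed weight (g)
traits = [
    [45.0, 12.0, 18.0],
    [60.0, 20.0, 22.0],
    [75.0, 28.0, 15.0],
    [50.0, 15.0, 20.0],
    [68.0, 24.0, 17.0],
]
eigenvalues, shares = explained_variation(traits)
for k, (ev, share) in enumerate(zip(eigenvalues, shares), start=1):
    print(f"PC{k}: eigenvalue {ev:.2f}, {share:.1%} of total variation")
```

Because the rotations are orthogonal, the eigenvalues always sum to the total trait variance (the trace of the covariance matrix), which is what makes "share of total variation" well defined.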

Conclusion
Genetic diversity study is, in no small measure, the first basic step in a meaningful breeding programme and therefore requires accurate and reliable means of estimation. Data sets can be sourced from morphological, biochemical and molecular data, and several workers have successfully used various statistical tools to analyse diverse data sets, identifying two major frameworks for explaining divergence in genotype performance. Genetic distance among and within individual data sets can be conveniently determined using specific tools, while classificatory and cluster analyses require principal component and polymorphic sequence tools. Since each data set provides a different type of information, the molecular marker data set is considered to provide more reliable information for differentiating the genotypes. Analysis of these data sets can be complex, and although many software packages are available, there is still a need for a comprehensive, user-friendly package that integrates different data sets for analysis and generates reliable, usable information about genetic relationships. Equally important in genetic diversity studies is the need for genetic resource centres: studies should use genetic diversity information to develop genetic resource centres accessible to breeders.

Acknowledgement
Many thanks to Ibirinde Olalekan for secretarial assistance. I also thank Olayinka Olabode, Head of the Department of Agronomy, LAUTECH, for technical contributions to this chapter.