The Sensitiveness of Expected Heterozygosity and Allelic Richness Estimates for Analyzing Population Genetic Diversity

Genetic diversity comprises the total genetic variability contained in a population and is the fundamental component of evolutionary change, since it determines the microevolutionary potential of populations. Several measures quantify genetic diversity, most notably those based on heterozygosity and those based on allelic richness, i.e. the expected number of alleles in populations of the same size. These measures differ in their theoretical background and, consequently, in their ecological and evolutionary interpretations. In the present chapter, these measures of genetic diversity are therefore analyzed jointly, highlighting the changes expected as a consequence of gene flow and genetic drift. For this analysis, computational simulations of extreme scenarios combining different levels of gene flow and population size were performed.


Introduction
Genetic diversity comprises the total genetic variability contained in a population and represents the raw material for evolutionary change, since it determines the microevolutionary potential of populations.
The most popular measure of genetic variation is the average heterozygosity expected under Hardy-Weinberg equilibrium. Nei [1] called this measure the gene diversity index and defined it as either the average proportion of heterozygotes per locus in a randomly mating population or the probability that two alleles randomly and independently selected from a gene pool are different. Expected heterozygosity at a locus is calculated as:
$$H_e = 1 - \sum_{i} p_i^{2} \qquad (1)$$
where $p_i$ is the frequency of the $i$-th allele; for n loci within a population, the single-locus values are averaged. Since this index is formulated entirely in terms of allele and genotype frequencies, its treatment is biologically the most direct [2]. Expected heterozygosity can be applied to any population of any organism (sexual or asexual, diploid or non-diploid), independently of the number of alleles at a given locus or the pattern of evolutionary forces [1].
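The gene diversity index described above, the probability that two alleles drawn at random from the gene pool differ, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the toy loci are hypothetical, and the small-sample correction that dedicated software may apply is deliberately omitted.

```python
from collections import Counter

def expected_heterozygosity(alleles):
    """Gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def mean_expected_heterozygosity(loci):
    """Average the single-locus values over all loci."""
    return sum(expected_heterozygosity(a) for a in loci) / len(loci)

# Hypothetical locus with two alleles at equal frequency
locus_1 = ["a", "a", "b", "b"]
# Hypothetical locus fixed for a single allele
locus_2 = ["a", "a", "a", "a"]

print(expected_heterozygosity(locus_1))                   # 0.5
print(mean_expected_heterozygosity([locus_1, locus_2]))   # 0.25
```

The input is a flat list of sampled gene copies per locus, so a diploid individual contributes two entries.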
The total number of alleles at a locus has also been used as a measure of genetic variation and is an important measure of the long-term evolutionary potential of populations [3]. The major drawback of the number of alleles is that, unlike heterozygosity, it is highly dependent on sample size: because natural populations contain many alleles at low frequencies, sample sizes must be equal for comparisons between samples to be meaningful. The allelic richness estimator (r) avoids this problem because it is a measure of allelic diversity that takes sample size into account [4]. By means of the rarefaction method, the r estimator calculates the expected number of alleles at a locus for a fixed sample size, generally the smallest sample size in a series of sampled populations [5].
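A minimal sketch of the rarefaction idea follows, assuming the usual hypergeometric formulation: for each allele, the probability that it appears at least once in a random subsample of g gene copies is computed, and these probabilities are summed. The function name, the argument g, and the toy sample are hypothetical and are not taken from any particular software.

```python
from collections import Counter
from math import comb

def allelic_richness(alleles, g):
    """Expected number of distinct alleles in a random subsample of g gene copies."""
    counts = Counter(alleles)
    n = sum(counts.values())
    if not 0 < g <= n:
        raise ValueError("g must be between 1 and the number of sampled gene copies")
    total = comb(n, g)
    # comb(n - c, g) counts the subsamples that miss an allele present in c copies
    return sum(1 - comb(n - c, g) / total for c in counts.values())

# Hypothetical sample: one common allele and one rare allele
sample = ["a", "a", "a", "b"]
print(allelic_richness(sample, 4))  # 2.0, the full sample contains both alleles
print(allelic_richness(sample, 2))  # 1.5, the rare allele is missed half of the time
```

When several populations are compared, g would be set to the size of the smallest sample so that all values refer to the same sampling effort.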

Loss of genetic diversity in reduced sized populations
The starting question for analyzing the effect of reduced population size on genetic diversity is how population size (N) influences allele and genotype frequencies. When the Hardy-Weinberg assumption of infinite population size is violated, genetic drift occurs. Genetic drift is a stochastic sampling process that determines which alleles will constitute the gene pool of the next generation. Fragmentation and isolation due to habitat loss and landscape modification can reduce the population size of many plant and animal species throughout the world; hence, understanding genetic drift and its effects is extremely important for biodiversity conservation [3].
Molecular biology techniques that differentiate individuals directly at the DNA level allow genetic diversity parameters to be inferred in real populations, even though these parameters were defined before the development of DNA-based molecular markers. In addition, the development of capillary electrophoresis has improved the resolution of allele identification, and advances in computing power have allowed a huge number of highly polymorphic loci to be analyzed simultaneously, simply and quickly.
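The description of genetic drift as a stochastic sampling process can be made concrete with a small forward simulation. The sketch below assumes a Wright-Fisher population (constant size, non-overlapping generations, no mutation, migration or selection), which is a simplification chosen for illustration; the function name, the seed, and the starting frequencies are hypothetical.

```python
import random

def wright_fisher(freqs, n_diploid, generations, seed=0):
    """Track allele frequencies under pure drift in a population of n_diploid individuals."""
    rng = random.Random(seed)
    alleles = list(range(len(freqs)))
    copies = 2 * n_diploid  # gene copies carried by diploid individuals
    history = [list(freqs)]
    for _ in range(generations):
        # Every gene copy of the next generation is drawn at random from
        # the current gene pool; this sampling step is the drift itself.
        draw = rng.choices(alleles, weights=freqs, k=copies)
        freqs = [draw.count(a) / copies for a in alleles]
        history.append(freqs)
    return history

# Hypothetical starting pool: three common alleles and five rare ones
start = [0.4, 0.3, 0.2, 0.04, 0.03, 0.01, 0.01, 0.01]
for n in (100, 20):
    final = wright_fisher(start, n, generations=50)[-1]
    print(n, sum(1 for p in final if p > 0))  # alleles still present
```

Because there is no mutation or migration in this sketch, an allele that reaches frequency zero cannot return, so the number of alleles can only stay the same or decrease over time.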

Molecular markers as workhorses for genetic diversity studies
A molecular marker is any specific DNA fragment, which may or may not correspond to coding regions of the genome [6], that is representative of differences at the genomic level [7]. If a molecular marker segregates according to the Mendelian laws of inheritance, it can also be defined as a genetic marker and it provides genetic information [6]. Molecular markers offer advantages over conventional phenotype-based alternatives: unlike morphological data, molecular data are stable and detectable in all tissues regardless of the development, differentiation, growth, or defense state of the cell, and they are not influenced by environmental effects [7,8].
Although there are several types of molecular markers, the ideal genetic marker must be reliably measurable, exhibit highly variable loci, be codominant, and be densely distributed throughout the genome. Microsatellite markers, also called Simple Sequence Repeats (SSRs), meet all these requirements [9]. SSRs are monotonous repeats of short nucleotide motifs of 1 to 6 base pairs (e.g., cgtcgtcgtcgtcgt, which can be represented by (cgt)n where n = 5). These repetitive elements can be found interspersed in the three eukaryotic genomes: nucleus (SSRs), mitochondria (mtSSRs) and chloroplasts (cpSSRs) [10]. The different SSR alleles are mainly generated through simple repeat addition and subtraction mechanisms that occur with equal probability [11], and SSRs are rarely found in coding regions [9]. SSRs are informative and practical markers because they provide information about the amount and distribution of genetic diversity and the processes that determine the genetic structure and variation within and between natural populations [12]. Methodologically, they are highly stable, with high intra- and inter-laboratory repeatability, and they can be implemented in low-complexity laboratories using external sequencing services. A limitation of SSRs is that the sequences of the regions flanking the repeat are required to develop specific primers, although the cross-transfer of primers between closely related species is usually successful. SSRs have become the most widely used DNA marker in population genetics for genome mapping, molecular ecology, and conservation studies [3].
DOI: http://dx.doi.org/10.5772/intechopen.95585
Although massive sequencing methods for identifying single nucleotide polymorphisms (SNPs) have gained prominence, microsatellites continue to be a widely used tool because the data they generate are simple to analyze and easily comparable with previous studies.
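The repeat notation used above for SSRs, in which cgtcgtcgtcgtcgt is written as (cgt)n with n = 5, can be illustrated with a short Python sketch that counts the repeat units of a given motif. The function name is hypothetical, and real genotyping infers allele size from fragment length rather than from string matching, so this is only a way to make the notation concrete.

```python
import re

def count_repeats(sequence, motif):
    """Length, in repeat units, of the longest uninterrupted run of motif in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

print(count_repeats("cgtcgtcgtcgtcgt", "cgt"))  # 5
```

Two alleles of the same locus would differ in this count, for example (cgt)5 versus (cgt)6, which is what the repeat addition and subtraction mechanism produces.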

Simulations as a tool for predicting what is expected under certain conditions
Simulations help to recreate the stochastic process that accompanies the transmission of genes from parents to offspring because they reproduce the movement of alleles many times under a model with the same conditions. In addition, using different model conditions can help to disentangle sampling effects and scale dependencies, as well as historical influences of gene flow.
Any model (analytical, simulation, or otherwise) makes simplifying assumptions, unless it is "an entire reconstruction of the actual system-whereupon it ceases to be a model" [13].
The focus of this chapter is to define the simplest model that shows the effects of population size and gene flow on contemporary levels of genetic diversity, paying attention to the influence that allele multiplicity and abundance have on the classic genetic diversity estimators.

Simulations
To test the effect of population size and gene flow on the magnitude of genetic diversity parameters, simulated genetic data were obtained using the IBDSim program [14]. This program simulates genetic data under an isolation-by-distance model using a backward simulation strategy at the population level. A stepping-stone model was used, which assumes discrete populations, discrete generations, genetic drift within each population, and migration between adjacent or spatially proximal populations [15][16][17], with m being the total dispersal rate in one dimension [18]. Four scenarios were simulated for a population composed of a square grid of 6 x 6 subpopulations. The scenarios combine two subpopulation sizes (n), 100 or 20 diploid individuals, and two migration rates (m), 0.5 or 0.005 (Table 1). The four combinations of n and m yield the genetic diversity expected under low or high levels of gene flow in small or large populations. Scenarios A-C and A-D allow the consequences of high and low levels of gene flow, respectively, to be evaluated in large populations, while scenarios B-C and B-D allow the consequences of high and low levels of gene flow, respectively, to be evaluated in small populations. Each data set was composed of 180 diploid individuals sampled from nine subpopulations. To avoid edge effects, the two-dimensional lattice was represented on a torus [18]. At grid edges, we used 'absorbing' boundaries in IBDSim whereby 'the probability mass of going outside the lattice is equally shared on all movements inside the lattice' [19]. The total simulated population was kept constant, but samples were taken from a smaller area of 3 x 3 subpopulations with 20 individuals per node.
This sampling strategy restricted the sampling design to a relatively small geographical area in order to work at a local geographical scale [19]. Each individual was characterized by a multilocus genotype defined by ten nuclear microsatellite loci with a two-base-pair repeat motif, a mutation rate (μ) of 10⁻³, and two to 20 alleles per locus. Ten data sets were simulated for each scenario.
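The simulation design described above can be summarized as a small Python structure, which also makes the sample arithmetic explicit. This is only a bookkeeping sketch and not an IBDSim input file: the variable names are hypothetical, and the mapping of the labels A, B, C and D to parameter values is inferred from the description of the scenarios in the text.

```python
from itertools import product

# Label-to-value mapping inferred from the text (assumption)
SIZES = {"A": 100, "B": 20}          # diploid individuals per subpopulation (n)
MIGRATION = {"C": 0.5, "D": 0.005}   # migration rate (m)

SAMPLED_SUBPOPS = 3 * 3              # sampled area within the 6 x 6 lattice
SAMPLED_PER_SUBPOP = 20              # individuals sampled per node
REPLICATES = 10                      # data sets simulated per scenario

scenarios = {
    f"{s}-{g}": {"n": SIZES[s], "m": MIGRATION[g]}
    for s, g in product(SIZES, MIGRATION)
}

individuals_per_dataset = SAMPLED_SUBPOPS * SAMPLED_PER_SUBPOP
estimates_per_measure = SAMPLED_SUBPOPS * len(scenarios) * REPLICATES

for name, params in scenarios.items():
    print(name, params)
print(individuals_per_dataset)  # 180
print(estimates_per_measure)    # 360
```

The two totals match the 180 individuals per data set and the 360 estimates of each diversity measure reported in the text.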

Analysis of simulated data
Expected heterozygosity (He) was estimated using Nei's gene diversity index (1) [1], and allelic richness (r) was estimated using a rarefaction method. Both estimators were calculated for each subpopulation (nine in each data set), under each scenario (four), and for each replicate (10 in each scenario), giving 360 estimates of each genetic diversity measure. These estimates were obtained using FSTAT software [20]. Means of He and r were estimated for each scenario. To determine whether differences between means were statistically significant, a standard t-test of means was applied. Differences between means were considered statistically significant if the probability of obtaining such a statistic by chance was less than 5 percent (p < 0.05). This test was implemented using Microsoft Excel software.
In addition, the spread and skew of both estimated parameters across all simulations of each scenario were shown using box-and-whisker plots that display a five-number summary: minimum, lower quartile, median, upper quartile, and maximum. The central rectangle spans the first quartile to the third quartile, i.e. the interquartile range (IQR). A segment inside the rectangle shows the median, while the whiskers to the left and right show the locations of the minimum and maximum. These values were calculated using Microsoft Excel software.
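For readers who prefer a scripted alternative to a spreadsheet, the comparison of two scenario means can be sketched as follows. The text does not state which variant of the t-test was applied, so the unequal-variance (Welch) form used here is an assumption; the function name and the He values are hypothetical and are not simulation results.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """t statistic and approximate degrees of freedom for two independent samples."""
    va = variance(a) / len(a)  # squared standard error of the first mean
    vb = variance(b) / len(b)  # squared standard error of the second mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical He estimates for two scenarios
he_large = [0.78, 0.80, 0.76, 0.79, 0.81]
he_small = [0.61, 0.66, 0.58, 0.63, 0.60]

t, df = welch_t(he_large, he_small)
print(round(t, 2), round(df, 1))
```

The p-value is then obtained by comparing t with a Student t distribution with df degrees of freedom, either from a statistical table or from a statistics library.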
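The five-number summary behind a box-and-whisker plot can be computed with the Python standard library, as sketched below. Quartile conventions differ between tools, so the "inclusive" interpolation method chosen here is an assumption and may not reproduce spreadsheet output exactly; the function name and the data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical allelic richness estimates for one scenario
r_values = [1, 2, 3, 4, 5]
print(five_number_summary(r_values))  # (1, 2.0, 3.0, 4.0, 5)
```

The interquartile range is the difference between the fourth and second elements of the returned tuple.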

Results
The combinations of n and m allowed the effect of population size and genetic isolation among populations on the genetic diversity estimators to be analyzed. All differences between the parameter estimates of the scenarios were statistically significant (Table 2). In scenarios A-C and A-D, which consider large population size, allelic richness and expected heterozygosity were higher than in scenarios B-C and B-D, which consider small population size (Figure 1). However, allelic richness showed lower values than heterozygosity in smaller populations compared with large populations (Table 3). Furthermore, between the scenarios with large population size that differ in migration rate (A-C vs. A-D), the reduction was greater for r than for He. In contrast, between the scenarios with small population size that differ in migration rate (B-C vs. B-D), the reduction was greater for He than for r (Table 4 and Figure 3).

Discussion
Genetic diversity is a prerequisite for population adaptation to environmental changes [12]. Large populations of naturally outbreeding species usually have extensive genetic diversity, but genetic diversity is usually reduced in populations and species of conservation concern [12]. Theoretical analyses based on simulations provide information for understanding empirical results.
The total number of alleles per locus is a complementary measure of genetic diversity because it is more sensitive than heterozygosity to the loss of genetic variation caused by small population size. In this way, r becomes an important measure of the long-term evolutionary potential of populations [3]. We illustrate this statement with a hypothetical situation: population A (n = 100) and population B (n = 10) (Figure 4), where population B is a random sample from population A. Population B shows three of the eight alleles of population A because of the reduction in population size, which causes only the alleles present at high frequency to remain in the small population. That is, by chance the more frequent alleles have the highest probability of being contained in the gene pool of the small population, while the rare alleles, being at low frequency, have a high probability of being lost. Genetic drift is thus operating and, as a consequence of this microevolutionary process, not all alleles of a population will be present in the next generation, producing a sampling error. As a result of this sampling error, the change in allele frequencies is random and the action of genetic drift has no pre-established direction. However, in the analyzed example (Figure 4) the estimated value of He changes from 0.719 to 0.620 as a consequence of a 10-fold reduction in population size. This change could indicate that He is less sensitive to the loss of rare alleles caused by population size reduction.
We can explain this by means of another hypothetical situation: we consider four pairs of small populations that contain between eight and 10 alleles (Figure 5). At left side of  Hence, the estimation of He is highly dependent on allele frequencies, and its value is determined to a greater extent by the alleles at high frequency, which usually have a high probability of being proportionally maintained when a population is reduced in size.
The effects of changes in population size on genetic diversity estimators under different gene flow levels were studied in the present chapter by means of simulations (A-C vs. B-C and A-D vs. B-D, respectively). As expected, r and He values were lower in small than in large populations. When r and He are used to detect a reduction in genetic diversity, r is more sensitive than He, independently of gene flow levels (Table 3).
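The argument that He is driven mainly by the common alleles, while the allele count responds directly to the loss of rare ones, can be checked numerically. The sketch below uses hypothetical allele frequencies (they are not the values of Figure 4 or Figure 5) and simply removes every allele below an arbitrary frequency threshold, rescaling the remaining frequencies so that they sum to one.

```python
def expected_heterozygosity(freqs):
    """Gene diversity from allele frequencies: 1 minus the sum of squares."""
    return 1.0 - sum(p * p for p in freqs)

def drop_rare(freqs, threshold):
    """Remove alleles below threshold and rescale the rest to sum to one."""
    kept = [p for p in freqs if p >= threshold]
    total = sum(kept)
    return [p / total for p in kept]

# Hypothetical gene pool: three common alleles and five rare ones
before = [0.4, 0.3, 0.2, 0.04, 0.03, 0.01, 0.01, 0.01]
after = drop_rare(before, threshold=0.05)

he_before = expected_heterozygosity(before)
he_after = expected_heterozygosity(after)
loss_alleles = 1 - len(after) / len(before)  # proportional loss of alleles
loss_he = 1 - he_after / he_before           # proportional loss of He

print(len(before), len(after), round(loss_alleles, 3))
print(round(he_before, 3), round(he_after, 3), round(loss_he, 3))
```

With these hypothetical frequencies the number of alleles falls from eight to three, a loss of more than 60 percent, while He falls only from about 0.71 to about 0.64, a loss of less than 10 percent.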

Reduction of allelic richness (r) and expected heterozygosity (He) as consequence of changes in population
The effects of gene flow levels on genetic diversity estimators under different population sizes were studied in the present chapter by means of simulations (A-C vs. A-D and B-C vs. B-D, respectively). In large populations, r is more sensitive than He for detecting the reduction in genetic diversity caused by low gene flow. In small populations, on the other hand, He is more sensitive than r for detecting the reduction in genetic diversity caused by low gene flow (Table 4).
Gene flow is a microevolutionary process that maintains genetic exchange among local populations, increasing population genetic diversity [21]. Gene flow can be quantified by the parameter m, which describes the movement of each gamete or individual independently of population size [22]. As a microevolutionary process, gene flow counteracts the effect of genetic drift, and the balance between gene flow and genetic drift determines genetic diversity levels for neutral alleles. Genetic diversity is the basis for local adaptation, and genetic drift can be understood as a threat to biodiversity because it causes loss of genetic diversity in natural populations. Current climate change and the fragmentation of natural populations as a consequence of anthropic impacts call for urgent collective and interdisciplinary action from researchers. The study of genetic diversity levels is especially important for the management of endangered and valuable species. Conservation biology focuses on the maintenance of genetic diversity because inbreeding and reduced reproductive fitness are often associated with loss of genetic diversity [12]. Although the International Union for Conservation of Nature (IUCN) recognizes the need to conserve genetic diversity as one of three global conservation priorities [23], genetic factors are not currently considered when assigning the conservation status of species [24].

Conclusion
The comprehensive quantification of genetic diversity levels demands the estimation of both r and He because the sensitivity of each estimator depends on allele multiplicity and frequencies. The estimation of both r and He is therefore recommended for genetic studies of populations that inhabit disturbed environments.