Microsatellite Markers in Analysis of Forest‐Tree Populations

The present state of knowledge regarding the genetic diversity of forest tree species has been greatly improved by the development of microsatellite markers as a powerful research tool. These noncoding sequences are considered to be neutral, highly polymorphic, and species specific. The usefulness of microsatellite markers has recently been demonstrated in the determination of differentiation at the inter‐ and intrapopulation level, gene flow in natural forest‐tree populations, heritability processes, and the sustainable management of forest genetic resources in many natural forest stands. In this chapter, I describe a practical approach to microsatellite markers as used to determine the genetic structure of 14 Scots pine populations from North‐eastern Poland. The investigated pine populations exhibited high values of the genetic variation parameters, for example, mean PIC = 79.3, Shannon index I = 2.488, and observed (H O = 0.778) and expected (H E = 0.849) heterozygosity. The low F st = 0.031 demonstrated that the studied populations are more differentiated within than among stands, which were grouped into one cluster of genetic similarity. In conclusion, the present distribution of genetically related populations of Scots pine in North‐eastern Poland seems to reflect historical events such as the postglacial colonization of Poland from different European refugia and/or human management carried out in the past.


Introduction
The sustainable management of forest genetic resources requires a good knowledge of the genetic diversity of species. Because of their longevity and wide geographic distribution, forest-tree species have developed a high level of genomic heterogeneity as a genetic potential through which they adapt to the specific environmental factors of a given habitat [1,2]. Human industrial activities and changing environmental conditions have exposed many species to the threat of extinction, and, with a view to the appropriate gene-conservation measures being taken, many governments are aware of the need for forest management to maintain the biodiversity of locally adapted species. Equally, not only endangered forest-tree species but also economically important ones should be protected in specific conservation programs based on valuable genetic data [3].
If the conservation of forest-tree genetic resources is to be pursued, molecular markers based on DNA sequences seem well suited to the study of genetic variation among trees [4–6]. Appropriate marker systems can facilitate investigation of the genetic relationships between forest-tree stands and the mapping of gene positions on chromosomes. For these purposes, several methods of DNA diversity assessment are commonly used, for example, RAPD (random-amplified polymorphic DNA), AFLP (amplified fragment length polymorphism), RFLP (restriction fragment length polymorphism), STS (sequence-tagged site), and microsatellites [4,5,7–12].

Characterization of microsatellite markers
Since the early 1990s, a powerful molecular marker has emerged in the shape of the microsatellite sequences discovered in the genomes of all living organisms. Microsatellites (or SSRs, simple sequence repeats) comprise tandem repeats of short DNA motifs of one to six base pairs, widely distributed over the entire genome. They are considered to be highly polymorphic DNA markers with codominant inheritance and selectively neutral behavior [4,5,13]. SSR sequences are present in all living organisms, including protists, prokaryotes, eukaryotes, and fungi. In many species, the majority (48-67%) of tandem repeats are dinucleotides, mostly localized in noncoding regions of the genome [14]. Mononucleotide repeats are considered to be the most abundant class of microsatellites in primates, while tri-, tetra-, and hexanucleotide SSR repeats are reported in other organisms. Exposed to high mutation rates ranging from 10⁻² to 10⁻⁶ per locus and per generation, microsatellites are characterized by considerable polymorphism and species specificity [4,14].
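To make the definition of a microsatellite concrete, the following minimal sketch scans a DNA string for perfect tandem repeats of motifs one to six bases long. The function name find_ssrs, the threshold of five repeats, and the example sequence are assumptions made for this illustration and do not come from the study; imperfect and compound repeats are ignored.

```python
import re


def find_ssrs(sequence, min_repeats=5, max_motif_len=6):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    The motif group is lazy, so the shortest repeating unit is reported
    (e.g. "CA" rather than "CACA").
    """
    sequence = sequence.upper()
    # One captured motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Hypothetical sequence: a (CA)5 dinucleotide repeat flanked by unique bases.
example_hits = find_ssrs("TTGCACACACACATT")
```

Raising min_repeats makes the search stricter; the value of five used here is only a convenient default for the sketch.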
Despite the neutrality assigned to microsatellite markers, the SSR sequences seem to serve some function in different eukaryotic organisms [15]. So far, no evident role for the abundant tandem-repeated sequences has been found, though the SSRs are presumably involved in chromatin organization in the nucleus, DNA replication, regulation of gene expression, and (putatively) in the mismatch-repair system [4,16]. Tandem-repeated sequences located in the introns of genes could trigger the disruption of the triplet-reading code. The new reading frame may be lethal, or present some advantage from the evolutionary point of view. In fact, the microsatellite triplets are more often subjected to polymerase slippage during the replication and transcription of genes. Long trinucleotide repeats, for example, CAG, CTG, CGG, and CCG, may also form secondary structures of DNA strands and influence recombination [4,5]. Many promoters contain repeated cis-acting DNA fragments, while microsatellites may also be involved in the regulation of gene expression.

Advantages and weak points of SSR markers
The precise identification of biological samples based on microsatellite loci remains fundamental to population genetics studies [17,18]. These markers present many advantages, for example, locus specificity, the small amount of DNA required, the almost absolute sizing of alleles, and fast detection [4,5,19]. The SSR fragments (also called alleles) are scored by their length expressed in base pairs, and the differences in allele size among individuals of one species are caused by varying numbers of repeats of the microsatellite motif.
From a practical point of view, unexpected allele sizes of microsatellite sequences sometimes occur. In many genomes, microsatellites mutate through errors in replication or unequal crossing over during recombination [20]. Moreover, homoplasy, null alleles, and short allele dominance may cause problems during microsatellite scoring [5,14,21,22].
Homoplasy concerns alleles of the same size that differ in base-pair composition. Null alleles are alleles that fail to amplify in the polymerase chain reaction (PCR) because of nucleotide mutations in the primer-binding sites. Short allele dominance is observed when large alleles drop out during amplification. The amplification of alleles of unexpected size often results from polymerase slippage during PCR. Above all, long and imperfect motif repeats of microsatellite loci, especially those with poly(A) tracts in the internal sequence, may enhance polymerase slippage [23]. Furthermore, some fluorescent dyes used to label the primers, such as Ned, 6-Fam, and Hex in the ABI 3500 Genetic Analyzer (Life Technologies™) or the Well-Red D2, D3, and D4 dyes in the CEQ™ 8000 Genetic Analysis System (Beckman Coulter, Fullerton, CA), can modify the mobility of the PCR products on the gel [24] and generate nonstandard scoring of alleles. The various lengths of SSR-flanking regions should also be taken into consideration as a putative source of nonstandard allele polymorphism [24]. Sometimes, the microsatellite allele sizes alone are insufficient to determine species biogeography for organisms with a predominantly asexual mode of reproduction [25].
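The relationship between repeat number and allele size can be sketched as follows. The flank length, motif, and fragment sizes are hypothetical values chosen for illustration, and the helper names are not part of any genotyping software.

```python
def allele_size_bp(flank_bp, motif, n_repeats):
    """Fragment length: combined flanking sequence plus the repeat array."""
    return flank_bp + len(motif) * n_repeats


def repeat_difference(size_a, size_b, motif_len):
    """Number of repeat units separating two alleles.

    Returns None when the size difference is not a whole multiple of the
    motif length, which hints at an indel in the flanks or a scoring problem.
    """
    diff = abs(size_a - size_b)
    if diff % motif_len != 0:
        return None
    return diff // motif_len


# Hypothetical locus: 120 bp of flanking sequence around a (CA)n array.
short_allele = allele_size_bp(120, "CA", 15)  # 150 bp
long_allele = allele_size_bp(120, "CA", 19)   # 158 bp
```

Under these assumptions, two alleles of 150 bp and 158 bp differ by four CA repeats, whereas a 3 bp difference at a dinucleotide locus would be flagged for inspection.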
The transfer of microsatellite loci between conifers is generally difficult. Many microsatellites need to be isolated de novo because the specificity of the flanking SSR regions is high [32,33]. This is partly due to the high rate of nucleotide substitution in noncoding regions of the genome. Moreover, conifers exhibit larger genome size (between 21 × 10⁹ and 134 × 10⁹ megabases) and higher genome complexity than deciduous trees [34]. Transferability of SSR sequences between P. taeda, P. radiata, and P. pinaster has been reported, for example, by Chagné et al. [31] and González-Martínez et al. [33]. In Scots pine, most SSR investigations are based on microsatellite loci transferred from P. taeda or P. pinaster [35,36].
The structure of the Scots pine genome is complex. Nevertheless, some studies of microsatellites in European Scots pine populations reveal a low level of genetic differentiation [9,37–39]. These data are concordant with the low genetic variation in polymorphism frequencies of Scots pine stands assessed with isozyme markers in Europe [40]. The main reason for this limited genetic variation in Scots pine populations lies in the transfer of seed material in the past, as enhanced by the long-distance gene flow occurring among Scots pine stands in Europe [41].
Microsatellite markers in forest-tree species are analyzed following a general pathway composed of four steps: (1) isolation of genomic DNA from plant tissue, (2) DNA amplification by polymerase chain reaction, (3) fragment length sizing and allele determination of the obtained PCR products by capillary electrophoresis in an automated sequencer, and (4) statistical analyses of population genetic variation and differentiation.

Isolation of genomic DNA
Many methods of genomic DNA extraction from plant tissue have been proposed, for example, the cetyltrimethylammonium bromide (CTAB) method described by Doyle and Doyle [42], the DNeasy Plant Mini Kit (Qiagen®), the MagAttract 96 DNA Plant Core Kit (Qiagen®) [43], and NucleoSpin Plant II (Macherey-Nagel®) [43]. These methods yield ca. 1-2 µg of DNA per 50-100 mg of plant tissue, which is sufficient for nuclear and organelle DNA amplification. The DNA yield differs according to the tissue type, that is, cambium, sapwood, or heartwood, with cambial cells giving the highest yield in P. radiata [44] and Quercus robur [45]. DNA of good quantity and quality was also obtained by Asif and Cannon [46] and Tibbits et al. [44], who supplemented the classical CTAB method with a buffer containing NaCl and BSA that effectively removed co-extracted contaminants. The main difficulty in DNA-based analyses lies in choosing a proper DNA extraction method for wood tissues, because of the high amount of residual polysaccharides and polyphenolic compounds, which inhibit the Taq polymerase during the PCR [44]. The removal of contaminants is a precondition for successful amplification and for accurate detection of DNA fragments (alleles or genes) during the capillary electrophoresis performed in an automated sequencer.
Sometimes, the genomic DNA isolation step may be bypassed by performing direct PCR on fresh plant tissue with the Phire® Plant Direct PCR kit (Finnzymes®, Vantaa, Finland), as demonstrated for silver fir samples [43].

DNA amplification by polymerase chain reaction
Prior to amplification, the quality of DNA is checked by electrophoresis or with a NanoDrop® ND-1000 spectrophotometer (Wilmington, USA). The first method relies on classical gel-based separation of DNA fragments in an electric field in ca. 1% agarose gel, or on chip-based electrophoresis in a Bioanalyzer apparatus using the Agilent DNA 1000 kit (Agilent Techn., Waldbronn, Germany). Good quality and sufficient quantity of DNA molecules favor a high yield of the subsequent amplification by polymerase chain reaction. Developed in 1983 [47], the PCR consists of three major steps: (1) denaturation of the double-stranded DNA template, generally at 94-98°C for 30 s to 1 min; (2) annealing of primers at 50-60°C for 20-30 s; and (3) extension (elongation) at 72°C. The time and the temperature of each step strongly depend on the primer structure and the polymerase used in the reaction [48]. All steps are repeated 30-40 times in a thermal cycler, for example, the Veriti 96 Thermal Cycler (Life Technologies™, USA), the T1000 Touch™ Thermal Cycler (Bio-Rad Laboratories, Inc., USA), or the TPersonal Thermocycler (Biometra®, Germany). At the end, the initial DNA template has been amplified into a very large number of copies, in theory up to 2ⁿ copies per template molecule after n cycles.
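The exponential character of the amplification can be illustrated with a small calculation. In the idealized case each cycle doubles the number of template molecules; real reactions are less efficient and eventually plateau. The efficiency parameter and the starting copy numbers below are assumptions made for this sketch.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Theoretical number of copies after a given number of PCR cycles.

    efficiency is the fraction of templates duplicated in each cycle
    (1.0 = perfect doubling). Plateau effects are not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# A single template molecule after 30 ideal cycles: 2**30 copies.
ideal = pcr_copies(1, 30)
# The same reaction with an assumed 90% per-cycle efficiency.
realistic = pcr_copies(1, 30, efficiency=0.9)
```

Even with reduced efficiency the product outnumbers the starting template by many orders of magnitude, which is why small amounts of genomic DNA suffice for SSR genotyping.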

Fragment length sizing and allele determination of the obtained PCR products performed using a capillary electrophoresis in automatic sequencer
The PCR products are generally analyzed with a capillary sequencer, for example, the CEQ8000™ (Beckman-Coulter®, USA) or the 3500 Genetic Analyzer (Life Technologies™, USA), using the appropriate software for data collection. The typical programs are CEQ™8000 Genetic Analysis System version 9.0 (Beckman Coulter®) in the case of the CEQ8000 apparatus, and 3500 Data Collection Software and GeneMapper® v. 5 in the case of the 3500 Genetic Analyzer (Life Technologies™, USA).

Statistical analyses of population genetic variation and differentiation
In general, statistical analyses of population genetic variation and differentiation comprise parameters such as the observed and effective number of alleles (n a , n e , respectively), observed and expected heterozygosity (H O , H E ), the Shannon diversity index (I), and the fixation index/inbreeding coefficient of F-statistics (F is , F st ). Significant deviations from Hardy-Weinberg equilibrium (HWE) at each locus, the frequency of null alleles (commonly found in SSR loci), and the polymorphism information content (PIC) are also computed [21,49–51]. The statistical methods used in the study of population genetics should be applied according to the defined objective. Many genotype-distribution methods are based on data for allele/gene frequencies, distograms of genetic dissimilarity, or mapping of gene position. The spatial patterns depend on many factors such as isolation by distance, environmental selection, migration, and human activity [52]. Several software packages can be applied in this field (e.g., GenAlEx, PopGen, SPAGeDi, etc.). Those programs take into account Hardy-Weinberg equilibrium, multiple allele and loci inheritance, natural selection, genetic drift, migration, mutation, and inbreeding analyses [51,53].
All statistical methods should consider the interaction between genotype and environment, in order to refine the estimated values of the observed genotype under given conditions. Forest-genetic field experiments are based on tests of adjustment for local environmental factors and on the estimation of breeding values. Multi-trait selection measures attempt to predict the trees' response to selection. The assessment of valuable quantitative trait loci (QTL) mapping, gene-expression analysis, or the long-term response to evolutionary selection makes use of several methods and programs, for example, analysis of variance (ANOVA), the Statistical Analysis System (SAS), restricted maximum likelihood (REML), and S-Plus [38].
In order to illustrate the genetic similarity between the studied populations, a dendrogram based on the distance matrix is usually constructed. To this end, the UPGMA (unweighted pair group method with arithmetic mean) method is very often applied [50,53]. To produce a dendrogram of genetic similarity, the UPGMA method employs a sequential clustering algorithm. For instance, the DendroUPGMA software is a good tool for computing the clustering from sets of variables [49,54], using one of several coefficients such as the Pearson coefficient, the Jaccard similarity coefficient, or the Dice coefficient.
The resulting tree (dendrogram) of genetic similarity gathers the populations in branches supported by, for example, 100 bootstrap replicates, which give an estimate of the probability of a particular node. The cophenetic correlation coefficient (CP), whose values lie between 0 and 1, measures how accurately the dendrogram represents the original distances.
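Several of these parameters can be computed directly from the genotypes scored at a single locus. The sketch below uses the textbook definitions (H E = 1 − Σp², n e = 1/Σp², I = −Σp ln p, and the PIC formula of Botstein et al.); the genotypes are hypothetical, the function names are illustrative, and dedicated packages may apply sample-size corrections that are omitted here.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid genotypes (allele_a, allele_b)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(freqs):
    """Gene diversity under Hardy-Weinberg proportions (no bias correction)."""
    return 1.0 - sum(p * p for p in freqs.values())


def effective_alleles(freqs):
    """Effective number of alleles, n_e = 1 / sum(p^2)."""
    return 1.0 / sum(p * p for p in freqs.values())


def shannon_index(freqs):
    """Shannon diversity index using the natural logarithm (an assumption)."""
    return -sum(p * math.log(p) for p in freqs.values() if p > 0)


def pic(freqs):
    """Polymorphism information content: 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    p = list(freqs.values())
    homozygosity = sum(x * x for x in p)
    cross = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - homozygosity - cross


# Hypothetical allele sizes (bp) scored in four trees at one locus.
genotypes = [(150, 152), (150, 150), (152, 154), (150, 154)]
freqs = allele_frequencies(genotypes)
h_o = observed_heterozygosity(genotypes)
h_e = expected_heterozygosity(freqs)
```

For multilocus data the same functions would be applied locus by locus and the results averaged.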
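The clustering and its goodness of fit can be sketched in plain Python as follows. This is a minimal illustration of average-linkage (UPGMA) clustering and of the cophenetic correlation, not the algorithm as implemented in DendroUPGMA; the distance matrix is hypothetical, and the merge level is recorded as the average between-cluster distance.

```python
import math
from itertools import combinations


def upgma(matrix):
    """Average-linkage clustering of a symmetric distance matrix.

    Returns the list of merges (members_a, members_b, distance) and the
    cophenetic matrix, i.e. the level at which each pair is first joined.
    """
    n = len(matrix)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # UPGMA distance: mean of all original pairwise distances.
            total = sum(matrix[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = d
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges, coph


def cophenetic_correlation(matrix, coph):
    """Pearson correlation between original and cophenetic distances."""
    pairs = list(combinations(range(len(matrix)), 2))
    x = [matrix[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical distances among four populations (indices 0 to 3).
distances = [
    [0.0, 0.1, 0.5, 0.6],
    [0.1, 0.0, 0.5, 0.6],
    [0.5, 0.5, 0.0, 0.2],
    [0.6, 0.6, 0.2, 0.0],
]
merges, coph = upgma(distances)
cp = cophenetic_correlation(distances, coph)
```

In this toy matrix populations 0 and 1 are joined first, then 2 and 3, and the two pairs are joined last; a CP value near 1 indicates that the tree distorts the original distances only slightly.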

Object of the study
Scots pine (P. sylvestris L.) is the most widely distributed coniferous species in Europe. The species is of major economic relevance, especially in Northern and Eastern European countries. In Poland, P. sylvestris accounts for 69.4% of total forest area, in Finland 64.9%, and in Lithuania 36.5% [55–57]. The present genetic structure of the Scots pine stands in Europe has been largely influenced by climatic and environmental factors [58]. Above all, the recolonization of the continent after the last glaciation period contributed to the rapid expansion of Scots pine populations from their South-European and Central Russian refuges to the North of the continent [40,59–61]. Second, the distribution of many Scots pine stands in the European landscape reflects the present situation and socioeconomic changes, for example, privatization, the increased demand for wood, deforestation, and reforestation [58]. Due to the high level of anthropogenic pressure, the genetic resources of many forest-tree species in Europe have frequently been impoverished. Moreover, the transfer of genetic material across European countries has modified the natural gene pools in many forest-tree stands [58].
Recent advances in regard to the genetic diversity of P. sylvestris have highlighted the usefulness of nuclear SSR markers in forest-tree genetics, focusing especially on genotyping of the Scots pine populations in Poland. In the present study, 14 natural or seminatural, 110-year-old Scots pine populations located in the North-eastern part of Poland were investigated (Table 1).

Methodology of Polish case study
The extraction of total DNA from 100 mg of needles was performed using the Qiagen DNeasy Plant Mini Kit according to the manufacturer's instructions (Qiagen®, Hilden, Germany). The quality and purity of DNA were assessed by absorbance at 230, 260, and 280 nm in a NanoDrop® spectrophotometer (Wilmington, USA). Four nuclear microsatellite DNA markers were amplified, that is, SPAG 7.14, SPAC 12.5, PtTX3025, and SsrPt-ctg4363 [9,31,38]. For all loci, Well-Red labeled primers were synthesized by Sigma-Aldrich (St Louis, USA). The PtTX primers were originally designed for P. taeda, but they proved to be as useful as markers developed for P. sylvestris. The obtained PCR amplicons were separated by capillary electrophoresis in a CEQ8000 Beckman Coulter® sequencer and analyzed using the CEQ™8000 Genetic Analysis System v 9.0 software (Fullerton, USA). Parameters of genetic diversity (H O , H E , H T ), differentiation (F-statistics), and the genetic distance matrix were computed according to Nei [49,50] in GenAlEx v. 6 software [53]. The mean polymorphism information content values were established for each set of markers in MolKin 2.0 software [62].

Quality and quantity of the analyzed DNA
Spectrophotometric assessment of the genomic DNA isolated from Scots pine samples showed good quantity and quality of the nucleic acids (Figure 1). For all samples, the mean DNA purity (A 260/280 = 1.67 and A 260/230 = 1.82) and the mean DNA concentration (148.89 ng/µl ± 11 S.E.) were suitable for further amplification of microsatellite loci in PCR.
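The purity ratios and the concentration follow directly from the absorbance readings. The sketch below assumes the standard conversion for double-stranded DNA (one A260 unit corresponds to about 50 ng/µl at a 1 cm path length); the readings themselves are hypothetical and are not the measurements of this study.

```python
def dna_concentration_ng_per_ul(a260, dilution_factor=1.0, path_cm=1.0):
    """dsDNA concentration from A260, assuming 50 ng/ul per absorbance unit."""
    return a260 * 50.0 * dilution_factor / path_cm


def purity_ratios(a230, a260, a280):
    """Return the (A260/A280, A260/A230) purity ratios."""
    return a260 / a280, a260 / a230


# Hypothetical readings for one extract, normalized to a 1 cm path length.
concentration = dna_concentration_ng_per_ul(3.0)          # 150.0 ng/ul
ratio_260_280, ratio_260_230 = purity_ratios(1.5, 3.0, 1.8)
```

A low A260/A280 ratio points to protein carry-over, and a low A260/A230 ratio to residual polysaccharides, phenolics, or salts, which are the contaminants of most concern in conifer tissue.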

Genetic differentiation level
The studied trees included both heterozygotes and homozygotes at the four microsatellite loci, as illustrated in Figure 2. All loci were highly polymorphic (mean PIC = 79.3), with the highest values for loci SPAG 7.14 (PIC = 95.4) and SPAC 12.5 (PIC = 94.5). The total allele frequency distribution revealed 50 different alleles at the SPAG 7.14 locus (Figure 3), 48 alleles at SPAC 12.5 (Figure 4), 31 alleles at PtTX3025 (Figure 5), and 18 alleles at the SsrPt-ctg4363 locus (Figure 6). The allele sizing was corrected at all loci because consecutive polymerase slippage was noted. The null allele content was minor (2.3%) for all microsatellite loci. The genetic differentiation level of the nuclear microsatellite (nSSR) loci in the studied Scots pine populations is summarized in Table 1. [...] Borsukowina stands, respectively. The lowest (H = 0.774) was observed in the Rudka stand. Total genetic diversity among populations was high (H T = 0.848). The low F st = 0.031 indicated that the studied Scots pines are more differentiated within than among the examined stands (Table 1).
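The meaning of a low F st can be illustrated with the heterozygosity-based definition F st = (H T − H S)/H T, where H S is the mean expected heterozygosity within populations and H T the expected heterozygosity of the pooled gene pool. The sketch below is an illustration for one locus with equally weighted populations and hypothetical allele frequencies; it omits sample-size corrections and is not necessarily the exact estimator used by GenAlEx.

```python
def expected_heterozygosity(freqs):
    """Gene diversity, 1 - sum(p^2), for one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst_from_frequencies(pop_freqs):
    """Return (H_S, H_T, F_st) for one locus.

    pop_freqs is a list of {allele: frequency} dicts, one per population;
    all populations are given equal weight (an assumption of this sketch).
    """
    k = len(pop_freqs)
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    alleles = set().union(*pop_freqs)
    pooled = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    h_t = expected_heterozygosity(pooled)
    return h_s, h_t, (h_t - h_s) / h_t


# Two hypothetical stands with slightly different allele frequencies.
stands = [{"A": 0.6, "B": 0.4}, {"A": 0.4, "B": 0.6}]
h_s, h_t, fst = fst_from_frequencies(stands)
```

In this toy case F st is about 0.04, meaning that roughly 4% of the total gene diversity is found among stands and the remainder within them, the same pattern as reported above for the Scots pine populations.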

Genetic distance (D N )
The dendrogram built on the distance matrix based on SSR marker frequencies (Table 2) revealed two main clusters of populations (Figure 7). Two populations from the first group of the dendrogram (number 2, Czarna Białostocka Budzisk, and 13, Knyszyn Kopisk) were separated by a distance of 0.612 from the second group. Moreover, the two populations from the first group are located close to one another in North-eastern Poland (Figure 8). Nevertheless, the robust MCMC analysis revealed only one cluster of population genetic grouping, which was also supported by a cophenetic correlation coefficient close to 1 (CP = 0.993).
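A pairwise genetic distance of the kind entered into such a matrix can be sketched as follows. The example assumes Nei's standard genetic distance, D = −ln(I), with I the normalized genetic identity summed over loci; whether the standard or the unbiased variant was used in the study is not stated, and the allele frequencies below are hypothetical.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x and pop_y are lists with one {allele: frequency} dict per locus,
    given in the same locus order.
    """
    j_xy = j_x = j_y = 0.0
    for fx, fy in zip(pop_x, pop_y):
        for allele in set(fx) | set(fy):
            j_xy += fx.get(allele, 0.0) * fy.get(allele, 0.0)
        j_x += sum(p * p for p in fx.values())
        j_y += sum(p * p for p in fy.values())
    identity = j_xy / math.sqrt(j_x * j_y)
    return -math.log(identity)


# Two hypothetical populations scored at two loci (allele sizes in bp).
pop_1 = [{150: 0.5, 152: 0.5}, {200: 0.7, 204: 0.3}]
pop_2 = [{150: 0.4, 152: 0.6}, {200: 0.6, 204: 0.4}]
d_12 = nei_standard_distance(pop_1, pop_2)
```

Populations with identical allele frequencies give a distance of zero, and the distance grows as the populations share fewer alleles.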
In Poland, Scots pine resources are classified by reference to 26 seed regions, based on the boundary delineation of physicogeographical features, for example, a homogeneous climate and geographic conditions [12]. Programs for the in situ conservation of valuable Scots pine provenances are put in place with regard to the distribution of seed regions, as well as the location of what are known as natural-forest regions. The present rules for the transfer of Scots pine genetic resources in Europe are mainly founded upon such provenance tests, with only a few investigations being based on molecular markers [12,40,73].
In the present study, a low level of genetic differentiation among 14 Scots pine stands from North-eastern Poland was determined from DNA profiles established on the basis of four nuclear microsatellite loci (SPAG 7.14, SPAC 12.5, PtTX3025, and SsrPt-ctg4363). These data support previous investigations of the genetic structure performed using four nuclear microsatellite markers on 42 Scots pine populations located in different regions in Poland [38]. Pine trees from the 42 stands were characterized by a high polymorphism level (PIC = 80.0%) and a low level of interpopulation differences (F st = 0.033). The Baltic, Śląska, and Wielkopolsko-Pomorska Regions revealed the highest genetic differentiation (F st = 0.036, H S = 0.323, and H S = 0.207, respectively). The UPGMA analysis performed with nuclear microsatellite markers in the 42 populations generated two main groups of populations with a very weak probability of clustering. The geographical distribution of the genotypes emerging from the dendrogram was scattered across the country. Moreover, no spatial correlation between the gene diversity and the geographical locations of stands was found [38]. In this regard, the data obtained for 14 Scots pine populations from North-eastern Poland (present study) reflect a similar level of genetic variation (F st = 0.031), and no spatial correlation between stand location and genetic distance was found. Such a situation is often described for natural populations of many forest-tree species, and reflects forest-tree characteristics such as longevity, long-distance pollen dispersal, and great potential for adaptation to various climatic changes [1,2,6,26,41,60,61].
The scattered distribution of genetically related populations of Scots pine seems to reflect historical events such as the colonization of Poland by this species from different postglacial refugia and/or significant human management practiced in the past. These data are supported by a study of mitochondrial genes, which are maternally inherited and non-recombining in conifers and were used to trace maternal lineages and the postglacial migration of P. sylvestris across Europe [10].
Microsatellite sequences located in the chloroplast genome (cpSSR) are another interesting tool with which the genetic diversity and gene flow among Pinus populations can be analyzed. Since the chloroplast and mitochondrial genomes are uniparentally inherited in conifers, these markers are not exposed to the recombination process [74]. CpSSR loci present some advantages, for example, they are less variable than nuclear SSRs, have a low mutation rate, and show high species specificity [37]. Most of the cpSSR analyses have been reported for different Pinus species, for example, P. leucodermis [66], P. halepensis [75], P. pinaster [11,33,76], P. resinosa [67], P. brutia [77], P. torreyana [37], P. cembra, P. sibirica, and P. pumila [78], and P. echinata [79]. In most cases, the cpSSR markers have been successfully used in paternity analysis, in the monitoring of gene flow between populations, and in the study of population history following the postglacial migration of pine species.
Recently, nuclear and chloroplast microsatellite DNA markers have been shown to be an efficient means of identifying wood tissue for forensic purposes. This methodology helps to compare detailed DNA patterns of Scots pine (P. sylvestris), Norway spruce (Picea abies (L.) Karst.), European silver fir (Abies alba Mill.), and European larch (Larix decidua Mill.), with a high probability of identity (ca. 99.99%) [43].
Both adaptive and neutral markers (e.g., microsatellites) present many advantages in modern forest genetics [60,65,75,78,80]. In order to find the genetic basis of the neutral or adaptive diversity of natural populations, simulations based on adaptive traits, quantitative trait loci, and neutral markers are performed [81].

Conclusions
The conservation of genetic variability is a major focus in forest-tree selection and sustainable forest management (SFM). The preservation of genetic diversity in forest-tree species facing changes in environmental conditions and increasing human industrial activity remains a great challenge for researchers involved in adaptive and evolutionary genetics. Genetic variation may be investigated by means of several molecular techniques using DNA markers. Among them, microsatellites are the most powerful and suitable tool for the identification and characterization of genetic resources in forests. Because of their relatively high mutation rate, microsatellites are often used to study genetic variation and population structure. The SSR markers constitute an effective tool by which the European Scots pine populations have been studied on the basis of nuclear and chloroplast DNA. In this context, stress is placed on the suitability of the chosen marker for a given purpose, as well as on the statistical methods of calculation.
Nuclear SSRs are mainly used in studying genomic differentiation. The discriminatory power of nuclear SSR markers underlines their applicability to the study of various forest-tree populations. The comparative study of dominant and codominant nuclear markers in forest-tree genetics shows that even a few microsatellite loci can predict levels of genetic diversity with high accuracy. It is supposed that populations with a low level of genetic variation are generally less genetically stable and more vulnerable to pathogenic infections and harmful changes in environmental conditions [1,39,41]. Researchers in the field of forestry foresee the need for further analysis using molecular genetic tools.
Particular attention should be paid to avoiding the errors that occur during the scoring of microsatellite alleles (in Scots pine and other organisms, these include null alleles, short allele dominance, and polymerase slippage). The use of specialized genotyping software is therefore strongly advised.
Many approaches to the conservation of genetic diversity, the exploration of plant-genetic resources, and the design of plant-improvement programs require specific knowledge of the amount and distribution of genetic diversity within the investigated species. The genetic information contained in DNA, particularly in microsatellite sequences, offers valuable input when it comes to the in situ and ex situ conservation of forest-genetic resources. Notwithstanding the intensive use and management of the species, very little is still known about the genetic variability of Scots pines in Europe. The present chapter has attempted to give an introduction to the practical side of microsatellite analysis and the interpretation of genomic data obtained for Scots pine (P. sylvestris) populations in Poland.