Capillary Electrophoresis as Useful Tool in Analysis of Fagus sylvatica L. Population Genetic Dynamics

Capillary electrophoresis (CE) is one of the methods widely used in modern molecular genetics for the fast and efficient separation of DNA fragments in a sieving polymer under an electric field [1]. The use of this method has increased dramatically over the last fifteen years, owing to the high precision with which small nucleic acids are separated (down to the single-nucleotide level) in material analysed in analytical chemistry, physical chemistry, biochemistry, and biotechnology. Notably, the CE technique has been used in genome sequencing projects, e.g. the Human Genome Project [2] and many other projects (for animals, plants, bacteria and fungi) published in the NCBI database (www.ncbi.org).


What kind of tool is Capillary Electrophoresis (CE)
Progress in CE in recent years has relied mainly on increasing the number of samples analysed at the same time, as well as on the development of new gels (which allow multiple separations with the same capillary filling) and of "chemistry" (a mixture of buffers, substrates, and so-called polymerase enhancers) for sequencing a single fragment of nearly 1000 base pairs. Nowadays, CE-based sequencing is being superseded by the pyrosequencing method, which relies on the luminometric detection of the pyrophosphate released during primer-directed, DNA polymerase-catalysed nucleotide incorporation [3].
The CE technique is commonly performed in automated sequencers, e.g. the CEQ™ 8000 Genetic Analysis System (Beckman Coulter, Fullerton, CA), composed of two main components: the hardware (apparatus) and the CEQ System software. This model is equipped with an 8-capillary system, which allows 8 samples to be analysed at the same time.
Another apparatus used for CE, especially recommended for DNA sequencing, is e.g. the 3500 Genetic Analyzer (Life Technologies), which comprises the capillary electrophoresis instrument with a workstation and the 3500 Data Collection Software for instrument control, data collection, quality control and auto-analysis of sample files for basecalling and fragment sizing. Auto-analysis can also be performed in this model with the GeneMapper® or GeneMapper® ID-X software.
Both sequencers cited above are equipped with 8 capillaries, but systems with 16 or even more capillaries are available [4]. All types of automated sequencers require chemistry appropriate to the given goal (genotyping or sequencing), comprising polymer (separation gel), chemical buffers and washing solutions.
The current chapter describes the application of DNA-based genotyping of Fagus sylvatica populations by capillary electrophoresis performed in the CEQ™ 8000 (Beckman Coulter) sequencer.

Advantages of Capillary Electrophoresis (CE) and weak points
The CE method is widely used in modern molecular biology to assess the presence of a gene or allele in a given DNA sample (genotyping), as well as to determine the nucleotide composition of the studied gene (sequencing).
In general, the CE technique has several advantages:
• Good tool to support conservation and management of forest tree genetic resources, in order to:
  • Characterize the genetic structure of forest tree stands
  • Assess the initial gene pool of the population
  • Detect selection processes and maintain a high level of natural diversity in forest stands
  • Reflect the history of the stand in relation to the post-glacial migration refugia in Europe and in the world (phylogeny study)
• Provides genetic characteristics of the reproductive material of different forest tree species in:
  • Gene flow analysis between mother and progeny stands
  • Mating system analysis in seed orchards and progeny plantations
  • Assignment of populations selected for preservation in gene banks or by in situ or ex situ measures
• Solves problems from the seed stand management point of view:
  • Clonal/pedigree identification and selection processes
  • Pollen contamination, especially important in the management of artificial tree stands such as forest seed plantations
  • Patterns of gene flow and mating system in natural and artificial stands
• Tracing of forest tree species with DNA markers as support in combating illegal logging:
  • Thanks to DNA profiles established on the basis of a minimum of 4 nuclear microsatellite DNA loci and at least one cytoplasmic (mitochondrial or chloroplast) DNA marker
  • Strong evidence to support decisions taken by several District Law Courts, as the identification of wood samples is established with high probability (approximately 98-99%)
  • Consistent with the assumptions of the European Parliament Directive on Timber Regulation (EUTR), which came into effect in 2013 to stop the circulation of illegally logged wood in the European Union.
Field Effect Electroosmosis - A Novel Phenomenon in Electrokinetics and its Applications in Capillary Electrophoresis
Major weak points of the CE method, and the ways to limit them, are the following:
• Errors in data analysis and reporting can be avoided with an appropriate program, e.g. GeneMapper® or GeneMapper® ID-X software, which analyses, processes and reports the data on the basis of a reference dataset
• Compressions are reduced by adding at least 10% formamide, which improves the denaturing capacity of the sieving matrix
• Instrument faults can be avoided by running the machine properly and respecting all routine maintenance and installation rules given by the manufacturer.

Methodical problems
Two main errors may occur during genotyping with the CE technique. First, some alleles are not detected as peaks in the chromatograms (so-called null alleles), which affects the overall allele distribution in the population. Appropriate software, e.g. GenAlEx [5] or Micro-Checker (http://www.microchecker.hull.ac.uk), may help to calculate the probability of null allele occurrence in the studied group of trees.
Secondly, homoplasy may also occur during genotyping. This term is applied to DNA fragments of the same size (in base pairs, e.g. 324 bp) that derive from different microsatellite loci, e.g. mcf-5 and mcf-11 in the case of Fagus sylvatica. Such errors are avoided by labelling the primers with different fluorochromes during PCR, prior to CE detection in the automated sequencer. All problems arising during genotyping and sequencing can be overcome by using control samples (reference individuals with known DNA structure) analysed in the same run. Another way of solving problems related to CE analysis is to repeat the experiment from the beginning with new reagents and new PCR reactions (see Methods below).
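As a rough illustration of how a null allele frequency can be estimated from a heterozygote deficit, the sketch below uses one widely cited estimator, Brookfield's (1996) first estimator, r = (H_E - H_O) / (1 + H_E). This choice is an assumption made here for illustration; the text does not state which estimator the programs named above apply, and the function name and input values are hypothetical.

```python
def brookfield_null_frequency(h_exp: float, h_obs: float) -> float:
    """Estimate the null allele frequency at one locus.

    Uses r = (He - Ho) / (1 + He) (Brookfield 1996, estimator 1), which
    attributes the whole heterozygote deficit at the locus to null alleles.
    """
    if not (0.0 <= h_exp <= 1.0 and 0.0 <= h_obs <= 1.0):
        raise ValueError("heterozygosities must lie between 0 and 1")
    r = (h_exp - h_obs) / (1.0 + h_exp)
    # A heterozygote excess gives a negative value; report it as zero.
    return max(r, 0.0)


# Hypothetical locus: expected heterozygosity 0.80, observed 0.60.
print(round(brookfield_null_frequency(0.80, 0.60), 3))
```

A deficit of heterozygotes can also come from inbreeding or population substructure, so such an estimate is only a hint that a locus deserves re-examination with control samples.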

The idea of Capillary Electrophoresis (CE)
The assessment of genetic structure (both by genotyping and by sequencing) relies on five basic steps applied to the plant material collected in the field.
The procedure of DNA analysis by CE consists of the following steps:
• Isolation of total genomic DNA from plant tissues (homogenization and extraction of nucleic acid molecules)
• Amplification of specific loci by PCR using labeled primers
• Separation of the amplified fragments by CE in an automated sequencer
• Allele scoring and/or sequence data processing
• Analysis of the resulting data set with appropriate software.
Concerning DNA isolation, there are several extraction techniques, based on the lysis of cell walls, that facilitate proper isolation of nucleic acids from plant, fungal and animal tissues [6]. Good extraction performance is ensured by ready-to-use DNA isolation kits, e.g. DNeasy Plant Mini Kit (QIAGEN®) or NucleoSpin® Plant II (Macherey-Nagel).
The first step in obtaining DNA from plant tissues is mechanical homogenization, i.e. grinding the material in liquid nitrogen. Grinding in liquid nitrogen mechanically disrupts the cell walls, which gives easier access to the DNA molecules, while the low temperature (-196°C) maintains their stability. The efficiency of DNA extraction is then analyzed by spectrophotometry (Fig. 1) or by agarose gel electrophoresis followed by staining with ethidium bromide (50 mg ml⁻¹). During sequencing, template impurity caused by anions from an unpurified PCR product mix, or by the presence of another DNA sequence, can significantly affect the separation performance of the capillary. Finally, the DNA obtained can be stored in a stabilizing buffer (pH 7.0) for a long time (even years) at -75°C.
The polymerase chain reaction (PCR) multiplies DNA in a thermocycler programmed for multiple (on average 30 to 40) cycles (Fig. 2). Most DNA techniques are based on the amplification of genomic DNA fragments by the thermostable enzyme Taq polymerase, derived from the thermophilic bacterium Thermus aquaticus.
Figure 2. General scheme of microsatellite loci analysis using CE in an automated sequencer. After PCR amplification followed by gel electrophoresis, the DNA samples are loaded into the sequencer (e.g. HITACHI Abi-Prism 3100 Genetic Analyzer) for the CE run. The separated DNA fragments are then examined in software for precise locus location and genotyping, comprising visual inspection of the peaks (alleles), numeric data collection (list of alleles) and final statistical analysis.
The PCR reaction involves the DNA template in the following reaction mixture: labeled oligonucleotide primers (10 to 24 nucleotides long), four types of free nucleotides (dATP, dGTP, dCTP and dTTP), magnesium ions (Mg²⁺), reaction buffer and Taq polymerase. The first stage of amplification takes a few minutes and denatures the double-stranded DNA template at 94°C. Then comes the annealing stage at 32-42°C, lasting from 30 s to a few minutes, followed by the synthesis of the DNA strand complementary to the template (at 72°C). The temperature and duration of each stage depend on many factors, mainly on the G/C and A/T content of the primers and on the size of the amplified DNA fragments. The efficiency and precision of PCR are very high, and theoretically it can multiply the template DNA molecules present in the extract up to 10⁹ copies [6].
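The theoretical yield quoted above can be related to the number of cycles with a short calculation. The sketch below assumes the usual idealized model in which every cycle multiplies the number of copies by (1 + efficiency), with efficiency = 1 meaning perfect doubling; the function names and the efficiency parameter are illustrative assumptions, not part of the described protocol.

```python
import math


def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds; each round multiplies the count by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles that reaches `target_copies`."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 0 and at most 1")
    ratio = target_copies / initial_copies
    return math.ceil(math.log(ratio) / math.log(1.0 + efficiency))


# One template molecule, perfect doubling, 30 cycles: 2**30 copies.
print(pcr_copies(1, 30))       # 1073741824.0, i.e. about 1.07e9
print(cycles_needed(1, 1e9))   # 30
```

Under this model, 30 cycles of perfect doubling already give about 10⁹ copies from a single template, which agrees with the 30 to 40 cycles mentioned above once less-than-perfect efficiency is allowed for.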
Then the PCR products, labeled with different fluorochromes (e.g. WellRed D2, D3 and D4 for the CEQ™ 8000 model; FAM, JOE, ROX and TAMRA for the Abi-Prism sequencer), are subjected to the CE run in the automated genetic analyzer (Fig. 2 and 3). Several labeling strategies have been developed. With some exceptions, the label is attached at the 5' end of one primer. The DNA markers separated during electrophoresis can be detected by the fluorescence of the labeled nucleotides in the primers.
During electrophoresis in polyacrylamide gel, negatively charged DNA molecules migrate towards the positive electrode at a speed that decreases with increasing fragment size and molecular weight. At the end of their run through the capillary, the DNA fragments are detected by a laser thanks to the fluorochrome labeling of the primers in the PCR reaction (Fig. 3). The separated DNA fragments are then analyzed using computer software.

Genetic differentiation evaluation process
The most crucial step in allele scoring is the exact determination of allele size. For this, a general scheme can be applied that helps to avoid erroneous allele listing for the examined population (Fig. 4). The trickiest allele assignment occurs for 2-base-pair repeats in the SSR fragment, especially when two adjacent alleles of the same locus differ by only 2 bp in length (Fig. 4C). Good laboratory practice and broad experience of the scientist help to resolve such ambiguity and to identify the heterozygous loci in an individual. Otherwise, the false result would lead to an excess of homozygotes in the examined group of trees.
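The allele-size determination described above can be pictured as rounding each measured peak to the nearest position on a repeat ladder. The following is a minimal sketch under stated assumptions: the function names, the reference allele of 100 bp and the peak sizes are hypothetical, and real genotyping software relies on bin sets calibrated with reference samples rather than on simple rounding.

```python
def bin_allele(size_bp: float, reference_bp: int, repeat_bp: int = 2) -> int:
    """Snap a measured fragment size to the nearest allele on the repeat ladder."""
    steps = round((size_bp - reference_bp) / repeat_bp)
    return reference_bp + steps * repeat_bp


def call_genotype(peaks_bp, reference_bp: int, repeat_bp: int = 2):
    """Return a diploid genotype (allele1, allele2) from one or two scored peaks."""
    alleles = sorted({bin_allele(p, reference_bp, repeat_bp) for p in peaks_bp})
    if len(alleles) == 1:
        return (alleles[0], alleles[0])   # single bin: scored as homozygote
    if len(alleles) == 2:
        return (alleles[0], alleles[1])   # two bins: heterozygote
    raise ValueError("more than two alleles at a diploid locus; check the peaks")


# Two peaks only about 2 bp apart are kept as separate alleles.
print(call_genotype([101.8, 104.1], reference_bp=100))   # (102, 104)
print(call_genotype([102.2], reference_bp=100))          # (102, 102)
```

If the two peaks of a heterozygote were merged into one bin, the tree would be scored as a homozygote, which is exactly the source of the homozygote excess mentioned above.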
Nuclear microsatellite DNA sequences have so far been considered the most informative markers and have been used in genetic diversity studies of many organisms. They are formed by short repeats of 1-6 base pairs, so-called Simple Sequence Repeats (SSR or SSRs), and constitute the most powerful tool in modern population genetics and forensic studies. The advantages of microsatellite sequences are numerous: they are uniformly distributed over the entire genome, are present in high proportion in forest tree species, and form discrete loci with codominant alleles. The observed mutation rates for SSR markers vary from 10⁻³ to 10⁻⁶. The SSR fragments obtained with the CE technique for European beech populations in Poland illustrate the precision with which different alleles are detected at the four nuclear SSR loci investigated (Fig. 5).
The sequencing methodology based on CE relies on a nucleotide extension product growing in the 5' to 3' direction through the formation of a phosphodiester bridge between the 3'-hydroxyl group at the growing end of the primer and the 5'-phosphate group of the incoming deoxynucleotide. The procedure follows the Sanger dideoxy sequencing principle [7]. The DNA sequence is copied with high fidelity at each base of the DNA template, since the DNA polymerase incorporates only the one complementary nucleotide. The resulting nucleotide sequence is recorded as a chromatogram and as a four-letter code corresponding to the studied gene fragment (Fig. 6).
Capillary Electrophoresis as Useful Tool in Analysis of Fagus sylvatica L. Population Genetic Dynamics http://dx.doi.org/10.5772/59197
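The description of microsatellites as tandem repeats of a 1-6 bp motif can be turned into a simple search over a DNA sequence. The sketch below is an illustration only: the function name, the minimum of four repeats and the example sequence are assumptions, and it uses nothing beyond Python's standard `re` module.

```python
import re


def find_ssrs(sequence: str, min_repeats: int = 4, max_motif: int = 6):
    """Return (motif, repeat_count, start) for each tandem repeat found.

    The motif group is lazy, so the shortest repeating unit is reported
    (e.g. "GA" rather than "GAGA").
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    # Motif of 1..max_motif bases followed by at least (min_repeats - 1) copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((motif, repeats, match.start()))
    return hits


# Hypothetical fragment: a (GA)8 microsatellite between two flanking regions.
fragment = "ACGTTGCT" + "GA" * 8 + "CCTAGG"
print(find_ssrs(fragment))   # [('GA', 8, 8)]
```

Alleles at such a locus differ in the number of motif copies, which is why their fragment lengths differ by multiples of the motif length.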

Analysed parameters to describe population genetic variation and differentiation
Gene diversity is defined as the probability that two alleles chosen at random from the trees of a forest stand are different. The image of the electrophoretic separation of DNA fragments is converted into numerical data using software such as the CEQ System software (Beckman Coulter sequencer) or the GeneMapper® or GeneMapper® ID-X software (Abi-Prism sequencer).
The allele size obtained from sequencing data can be checked with the use of S-Plus software version 3.4 release 1 for SPARC (Statistical Sciences, Math Soft Inc., Seattle, WA).
In fact, population genetic variation and differentiation are described on the basis of the heterozygosity parameter (1), which takes values from zero (no heterozygosity) to nearly 1.0 (when a large number of almost equally frequent alleles is observed). Instead of the average number of alleles per locus, a more precise measure, the effective number of alleles per locus (n_e) of Crow & Kimura [8], can be used (2).
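The two parameters can be sketched in code. The formulas used below are the standard textbook forms, H_E = 1 - Σp_i² for expected heterozygosity and n_e = 1 / Σp_i² for the effective number of alleles, where p_i is the frequency of allele i; that equations (1) and (2) take exactly this form is an assumption, since they are not reproduced in the text. The genotypes are hypothetical, and no small-sample correction is applied.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid genotypes, e.g. [(102, 104), ...]."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def expected_heterozygosity(freqs):
    """H_E = 1 - sum(p_i^2); zero for a single allele, near 1 for many equal ones."""
    return 1.0 - sum(p * p for p in freqs.values())


def effective_number_of_alleles(freqs):
    """n_e = 1 / sum(p_i^2); equals the allele count only if all are equally frequent."""
    return 1.0 / sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes):
    """Share of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


# Four hypothetical trees scored at one SSR locus (allele sizes in bp).
trees = [(102, 104), (102, 102), (104, 106), (102, 104)]
freqs = allele_frequencies(trees)
print(expected_heterozygosity(freqs))                 # 0.59375
print(round(effective_number_of_alleles(freqs), 3))   # 2.462
print(observed_heterozygosity(trees))                 # 0.75
```

Here three alleles are observed, but because their frequencies are unequal (0.5, 0.375 and 0.125) the effective number of alleles is only about 2.46.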

Heterozygosity is often one of the most important parameters for describing genetic data. This measure explains the general trend in the structure of the analysed populations, as far as both their history and their future genetic structure are concerned. Low heterozygosity results from small population size and from genetic drift processes, e.g. the bottleneck effect. A large number of heterozygotes in a population signifies high genetic variability. When the observed and expected heterozygosity are compared in balanced populations with a random and open mating system (i.e. under Hardy-Weinberg equilibrium) and the observed heterozygosity is higher than expected, gene flow via alien pollen from outside the population can be presumed. If the observed heterozygosity is lower than expected, some inbreeding may be assumed to occur in the population. Interpopulation variation is very often described by G_ST [9], used as an equivalent of F_ST [10,11], which makes it possible to assess the distance of each population from the other populations (3).

G_ST = (H_T - H_S) / H_T    (3)
where H_T is the total heterozygosity and H_S is the intrapopulation heterozygosity.
The F_ST parameter, called the fixation index, measures the proportion of the total genetic variance contained in a subpopulation relative to the total genetic variance (4). Its values range from 0 to 1. A high F_ST implies a considerable degree of differentiation among populations.
F_ST = (H_T - H_S) / H_T    (4)
F_IS (the inbreeding coefficient) is the proportion of the variance in the subpopulation contained in an individual (5). A high F_IS implies a considerable degree of inbreeding. Values range from -1 (outbred) to +1 (inbred).
F_IS = (H_S - H_I) / H_S    (5)
where H_T is the total heterozygosity for a population, H_S is the heterozygosity within a subpopulation, and H_I is the heterozygosity of an individual.
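Given the three heterozygosity levels just defined, both indices follow from simple ratios. The sketch below assumes the standard Wright forms, F_ST = (H_T - H_S) / H_T and F_IS = (H_S - H_I) / H_S; the function name and the input values are hypothetical.

```python
def f_statistics(h_i: float, h_s: float, h_t: float):
    """Return (F_IS, F_ST) from individual, subpopulation and total heterozygosity."""
    if h_s <= 0.0 or h_t <= 0.0:
        raise ValueError("H_S and H_T must be positive")
    f_is = (h_s - h_i) / h_s   # heterozygote deficit within subpopulations
    f_st = (h_t - h_s) / h_t   # share of diversity due to differences among subpopulations
    return f_is, f_st


# Hypothetical values: H_I = 0.70, H_S = 0.75, H_T = 0.80.
f_is, f_st = f_statistics(0.70, 0.75, 0.80)
print(round(f_is, 4), round(f_st, 4))   # 0.0667 0.0625
```

In this example about 6% of the total diversity is attributable to differences among subpopulations, and the positive F_IS points to slightly fewer heterozygotes than expected within them.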

Software
One of the oldest programs for processing DNA marker data is BIOSYS-2. This program was developed to help biochemical population geneticists analyse electrophoretically detectable allelic variation. It can be used to study allele frequencies and genetic variability measures, to test the deviation of genotype frequencies from Hardy-Weinberg expectations, to calculate F-statistics, to perform heterogeneity chi-square analysis, to calculate a variety of similarity and distance coefficients, and finally to construct dendrograms using, among others, cluster analysis procedures. The program, documentation, and test data are available from the authors [12].
The statistical analysis of the alleles consists in calculating the genetic parameters estimated according to Nei [9,13], i.e. expected heterozygosity (H_E), observed heterozygosity (H_O), observed number of alleles (A_O) and population differentiation parameters (H_S, H_T, F_ST). These parameters may be calculated with programs such as GENEPOP 3.2a [14], GenAlEx [5] or ARLEQUIN (http://lgb.unige.ch/arlequin/). Spatial correlations may be evaluated with SPAGeDi v.1.2 [15] and genetic distances estimated according to Nei [13]. Another interesting program for the analysis of DNA marker data is POPGENE [16]. The current version of POPGENE is designed specifically for the analysis of co-dominant and dominant markers using haploid and diploid data. The software performs most types of data analysis encountered in population genetics and related fields. It can be used to compute summary statistics, including allele frequencies (estimates of gene frequencies at each locus from raw data), the effective number of alleles per locus, the percentage of polymorphic loci, observed and expected homozygosity, the Shannon index, Nei's gene diversity [9], F-statistics, gene flow estimated from G_ST or F_ST, and many other parameters.
All these programs are good tools for population genetic analysis and simulations, including Hardy-Weinberg equilibrium (HWE), multiple-allele and multi-locus inheritance, natural selection, genetic drift, migration, mutation and inbreeding.
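To make the test of Hardy-Weinberg expectations mentioned above concrete, the sketch below computes a plain chi-square goodness-of-fit statistic from genotype counts at one locus. It is an illustration, not a description of how any of the named programs work internally; the function name and the counts are hypothetical, and for SSR loci with many rare alleles exact tests are generally preferred over the chi-square approximation.

```python
from itertools import combinations_with_replacement


def hwe_chi_square(genotype_counts):
    """Chi-square statistic and degrees of freedom for one locus.

    `genotype_counts` maps diploid genotypes, e.g. (102, 104), to the number
    of individuals observed with that genotype.
    """
    n = sum(genotype_counts.values())
    allele_counts = {}
    observed = {}
    for (a, b), count in genotype_counts.items():
        allele_counts[a] = allele_counts.get(a, 0) + count
        allele_counts[b] = allele_counts.get(b, 0) + count
        key = tuple(sorted((a, b)))          # treat (a, b) and (b, a) alike
        observed[key] = observed.get(key, 0) + count
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected counts: n * p^2 for homozygotes, 2 * n * p * q for heterozygotes.
        expected = n * freqs[a] ** 2 if a == b else 2 * n * freqs[a] * freqs[b]
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected
    k = len(freqs)
    return chi2, k * (k - 1) // 2


# Hypothetical sample of 100 trees with fewer heterozygotes than expected.
counts = {(102, 102): 30, (102, 104): 40, (104, 104): 30}
print(hwe_chi_square(counts))   # (4.0, 1)
```

With one degree of freedom the 5% critical value is about 3.84, so this hypothetical sample would be judged to deviate from Hardy-Weinberg proportions.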

Object of the study
European beech (Fagus sylvatica L.) is one of the most important forest tree species in Poland. Beech forests cover about 5.6% of the forest area [17]. The most typical beech populations grow in the lower forest belt of the Carpathian and Sudety Mountains in the south and in the moraine landscape of the Pomeranian Lake District in the north of the country. In Poland beech reaches the north-eastern limit of its natural range [18,19]. Varying environmental conditions have resulted in a great number of ecotypes and populations characterized by various ecological requirements [20,21]. The growth of beech stands outside the natural beech limit indicates that the species potentially has a much wider range [22,23].
The present genetic structure of beech populations in Poland was formed by many different factors, not only environmental and genetic but also anthropogenic [24,25,26,27]. Recent investigations of beech variation in Poland based on isoenzymes [28,29,30,31] showed high genetic diversity, similar to that of neighbouring European populations, and a slight decrease in the average number of alleles per locus and in the level of differentiation towards the north of the natural range limit, which generally confirms the migration paths after the glaciations. The present paper describes the genetic structure within one generation, i.e. mother and progeny beech stands in Poland, assessed with chloroplast and nuclear DNA markers.
Six beech populations representing the natural range of beech in Poland were investigated (Fig. 7), including the Zdroje stand (Dentario enneaphyllidis-Fagetum). The genetic structure of these populations was analysed.
Total DNA was extracted from the leaves using the Qiagen DNeasy™ Plant Mini Kit according to the manufacturer's instructions (Qiagen). The quality and purity of the DNA were checked by 1% agarose gel electrophoresis and by absorbance at 230, 260 and 280 nm in a NanoDrop® spectrophotometer (Wilmington, USA). DNA samples were separated by capillary electrophoresis in a Beckman Coulter® sequencer and analysed using the CEQ™ 8000 Genetic Analysis System software v 9.0 (Fullerton, USA).
Parameters of genetic diversity (H_S and H_T) and differentiation (G_ST) were calculated according to Nei [36,11] in PopGene 1.32 software [16] and compared between the mother and progeny generations.

Quality and quantity of the analyzed DNA
The quality and purity of the DNA were assessed on the basis of the ratio of absorbance at 260 and 280 nm. A ratio of ~1.8 was typical for most of the samples (Fig. 2) and indicated high purity of the extracted DNA. A ratio of ~2.0 is generally accepted as pure for RNA. If the ratio is lower in either case, it indicates the presence of contaminants. The good quality of the isolated DNA was also confirmed by measuring absorbance at 230 nm. The concentration of the genomic DNA samples ranged from 35 to 160 ng µl⁻¹ and was fully adequate for the next steps of the DNA analysis procedure.
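The purity and quantity checks described above reduce to two small calculations. The sketch below assumes the common convention that an absorbance of 1.0 at 260 nm (1 cm path length) corresponds to about 50 ng µl⁻¹ of double-stranded DNA; the function names and the absorbance readings are hypothetical, and the ratio is simply reported rather than judged against a fixed cut-off.

```python
def dsdna_concentration_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Concentration of double-stranded DNA, assuming 50 ng/µl per A260 unit (1 cm path)."""
    return a260 * 50.0 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; values around 1.8 are typical for clean DNA."""
    if a280 <= 0.0:
        raise ValueError("A280 must be positive")
    return a260 / a280


# Hypothetical readings for one leaf extract.
a260, a280 = 1.20, 0.65
print(dsdna_concentration_ng_per_ul(a260))   # 60.0
print(round(purity_ratio(a260, a280), 2))    # 1.85
```

These example readings would place the sample within the 35 to 160 ng µl⁻¹ range reported above, with a ratio close to the ~1.8 typical of the extracts.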

Genetic structure based on nuclear DNA markers
For the nuclear microsatellite markers of the Polish beech stands, fragment variants ranging from 73 to 348 bp were distinguished. Both heterozygous and homozygous genotypes were found in the investigated trees (Fig. 5).
The FS1-03 locus was the most polymorphic, with 27 allele variants (DNA fragments of 73 to 154 bp), while the smallest number of variants was observed at the mcf-11 locus, with 11 allele variants (DNA fragments of 315 to 348 bp). The observed number of alleles per locus was usually higher in the mother stands than in the progeny stands, with the Tomaszów population practically the only exception. For instance, mother trees presented a mean observed number of alleles per locus of 9.4, compared with 7.4 found in progeny from the Tomaszów population (Fig. 8). Mean total and within-stand gene diversity of all studied mother stands (Tab. 1) was at more or less the same level (H_T = 0.8291, H_S = 0.7693, respectively) as in the progeny stands (H_T = 0.8257, H_S = 0.7672), and the level of differentiation of the mother and progeny stands was also almost the same: F_ST = 0.0587 and F_ST = 0.0576, respectively (Tab. 1). The low differentiation among the studied Polish Fagus sylvatica stands and their progeny indicates that most genetic diversity resides within the stands.
Differentiation based on Nei [13] genetic distances was independent of the geographical location of the populations (Fig. 9).
[…] [38]. Another two loci, ccmp4 and ccmp10, shared the same range of 5 allele-size variants of 116, 117, 118, 119 and 120 bp. Generally, mother beech trees had more variable alleles than the progeny trees of the same provenance. For instance, mother trees presented alleles from 117 to 119 bp, compared with only one allele of 119 bp found in progeny from the Bieszczadzki NP population. Mean total and within-stand gene diversity of all studied mother stands (Tab. 2) was slightly higher (H_T = 0.4916, H_S = 0.3606, respectively) than in the progeny stands (H_T = 0.4600, H_S = 0.3375). The level of differentiation of the mother and progeny stands was almost the same: G_ST = 0.2666 and G_ST = 0.2663, respectively (Tab. 2). The overall haplotypic differentiation among the studied Polish Fagus sylvatica populations was quite low (G_ST = 0.016), which means that most genetic diversity resides within the stands.
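The chloroplast G_ST values for the mother and progeny stands can be recomputed from the reported H_T and H_S, assuming Nei's relation G_ST = (H_T - H_S) / H_T. The sketch below uses only the figures quoted in the paragraph above; the function name is illustrative.

```python
def g_st(h_t: float, h_s: float) -> float:
    """Nei's coefficient of gene differentiation from total and within-stand diversity."""
    if h_t <= 0.0:
        raise ValueError("H_T must be positive")
    return (h_t - h_s) / h_t


# Chloroplast marker values reported in the text (Tab. 2).
mother_gst = g_st(0.4916, 0.3606)
progeny_gst = g_st(0.4600, 0.3375)
print(round(mother_gst, 4))    # 0.2665
print(round(progeny_gst, 4))   # 0.2663
```

The recomputed values agree with the reported 0.2666 and 0.2663 to within the rounding of the published H_T and H_S.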

Differentiation based on Nei [13] genetic distances was independent of the geographical location of the populations (Fig. 10).

Discussion
DNA markers constitute a powerful tool for assessing the gene variability of forest trees. The complexity of the genome of these organisms makes it impossible in most cases to obtain information about the structure or variability of a particular gene. Nowadays, microsatellite markers analysed via CE technology overcome this difficulty and enable the study of the genetic variation of a given organism, but only at the level of non-coding DNA regions.
The great advantage of the CE method is that all steps of allele fragment genotyping and DNA sequencing analysis are performed automatically in one integrated system. Laboratory work then concentrates on isolating high-purity DNA from plant tissue and on appropriate PCR amplification procedures.
The application of CE to the study of SSR markers in European beech stands in Poland revealed high genetic diversity of beech, similar to that of neighbouring European populations, and a slight decrease in the average number of alleles per locus and in the level of differentiation towards the north of the natural range limit, which generally confirms the migration paths after the glaciations but is not a basis for distinguishing geographic regions. More cpDNA variation at the chloroplast ccmp4, ccmp7 and ccmp10 loci (G_ST = 0.810) was reported for 400 other European beech populations [37]. Nevertheless, some populations, e.g. Bieszczadzki NP, showed less polymorphism at the ccmp7 locus, with only one 147 bp variant present, similarly to the previous study performed on Polish beech [38].
In most cases, a higher level of genetic variation was found within the investigated beech populations, which predisposes them to higher genetic tolerance of harmful environmental factors [39].

Conclusions
In recent years the development and dissemination of new molecular methods based on CE has increased remarkably. The use of automated machines (sequencers) enables not only the investigation of DNA and RNA structure but also the creation of gene banks for forest tree species, in order to achieve an appropriate level of forest stand management. CE gives very good resolution in the separation of DNA molecules, down to the level of a single nucleotide. Moreover, it helps to overcome the problem of homoplasy in population genetics and offers cheap and fast results when sequencing a small number of samples. The concurrent development of software and computerisation has made possible the complete automation of sample data processing, including the transfer of data and results. CE is a useful tool in the assessment of the genetic diversity of forest trees, despite its limits. It can be used on a wide scale, especially for:
• Genetic characterization of different forest reproductive material of natural and artificial stands, e.g.
  • Gene flow between mother stands and progeny stands
• Support of conservation and management of forest genetic resources, e.g. the following activities:
  • Selection and protection of ecotypes
  • Assessment of the initial gene pool for the needs of effective gene conservation measures
  • Presentation of selection processes in forest stands to maintain rich natural diversity
• Prevention of illegal logging, thanks to the polymorphic SSR markers assessed with the CE technique.