Genetic Biodiversity and Phylogenetic Studies in Poplar by Means of the Metallothionein Multigene Family


Introduction
Biodiversity can be defined as the totality of living forms present on earth, with the wealth of genetic information that they have (Grassi et al., 2006). It can be viewed at various levels, from ecosystems down to species, populations, genomes and genes. Populations are representative of the species and may contain a large part of their genetic biodiversity. They are the basis of allopatric and sympatric speciation phenomena: the former results from a reduction of gene flow caused by the presence of new geographical barriers, the latter from auto- or allopolyploidy events.
During the last two decades many molecular tools based on PCR (Polymerase Chain Reaction) have been developed and used to calculate and evaluate, through bioinformatics, the genetic biodiversity of all living organisms and of plants in particular (Lowe et al., 2004). These molecular tools comprise DNA sequencing of single genes or of entire genomes (e.g. rice, poplar, wheat, grape, etc.), locus-specific analysis (SSRs and SNPs) and random genome analysis (AFLPs, RAPDs, etc.).
Poplar was among the first organisms, after man and rice, to have its genome fully sequenced (Tuskan et al., 2006). Moreover, it has long been studied by plant biologists and physiologists because of its commercial and economic importance and its relevance for riparian ecosystems. Many populations and germplasm collections of different poplar species have been investigated (Brundu et al., 2008; Castiglione et al., 1993; Castiglione et al., 2010; Fossati et al., 2003) using random and locus-specific molecular analyses to evaluate their genetic biodiversity. Poplar is a perennial flowering plant, native to most of the northern hemisphere, known to play valuable ecological roles: it is considered a trustworthy health indicator of riparian ecosystems and a promising tree for the phytoremediation of polluted soils (Castiglione et al., 2009; Sebastiani et al., 2004; Yadav et al., 2010).

Poplars, aspens and cottonwoods are part of the genus Populus that, along with the two related genera Salix and Chosenia, belongs to the Salicaceae family. It is now generally accepted that the genus Populus comprises 29 species, identified on the basis of diagnostic morphological characters and assigned to six sections, namely Abaso, Aigeiros, Leucoides, Populus, Tacamahaca and Turanga (Dickmann, 2001). However, correct identification and classification of poplar species are made difficult by: (a) the very high variability of their morphological traits; (b) their strong tendency towards intra-section hybridization, especially in sympatric ranges, resulting in hybrids with mixed and broad-spectrum parental features; (c) the unusual possibility of inter-section hybridization (e.g., between specimens belonging to the Aigeiros and Tacamahaca sections; Dickmann, 2001). Moreover, it should be noted that fertile F1 hybrids can backcross to one or both parental species, or produce further generations, thus making species and hybrid identification even more problematic (Fossati et al., 2004; Lexer et al., 2005).

Recently, several studies using molecular markers have been carried out in order to revise the assignment of poplar species to the above-listed sections and even to resolve their controversial taxonomy (Cervera et al., 2005; Hamzeh & Dayanandan, 2004). Indeed, due to discrepancies among phylogenetic trees reconstructed with nuclear ribosomal DNA (rDNA) or chloroplast genes, it was established that the nuclear genome (nDNA) of P. nigra (Aigeiros) is divergent from that of species belonging to the Populus section, although its chloroplast DNA (cpDNA) is strongly associated with that of trembling poplars (Populus) (Hamzeh & Dayanandan, 2004). Thus, P. nigra should not be considered as derived from the inter-section cross P. deltoides x P. alba (Smith, 1988), but might be classified in a new and independent taxon (i.e., Nigrae). Further studies, using Amplified Fragment Length Polymorphisms (AFLP), revealed that P. mexicana, the single species in the Abaso section, is so divergent from all the other poplar sections that it might be considered part of the genus Salix (Cervera et al., 2005). Moreover, AFLP data also showed that species belonging to the Populus section are characterized by the greatest levels of interspecific genetic variability, whereas the opposite trend was observed for the sections Aigeiros and Tacamahaca. Given that ancestral species have greater variation than derived ones, Populus is believed to be the oldest section, and Aigeiros and Tacamahaca the most recent ones (Cervera et al., 2005; Eckenwalder, 1996).

However, contrasting conclusions have been drawn on the taxonomy of the Populus section using different molecular markers. On the one hand, the analysis of rDNA sequences indicates that the Populus section could be further sub-divided into two major clades: one comprising P. alba, P. tremula and P. davidiana (Korean aspen); the second grouping together the American aspens P. tremuloides and P. grandidentata (Hamzeh & Dayanandan, 2004). On the other hand, AFLP data suggested that P. tremula and P. tremuloides are ecotypes still capable of hybridization and thus part of the same group, whilst P. alba represents a totally different taxon (Cervera et al., 2005). In addition, the Tacamahaca section was surprisingly found to be a polyphyletic group, i.e., a kind of meta-cluster comprising many small sub-groups, each containing closely related species (Hamzeh et al., 2004). Finally, the American cottonwoods P. trichocarpa and P. balsamifera, which have always been considered subspecies of a single taxon, were discriminated as different species when assessed by AFLP analysis (Cervera et al., 2005).

Such inconclusive and contrasting results indicate that neither rDNA nor AFLP markers are suitable to correctly classify poplar species and fully resolve their phylogenetic relationships. Indeed, it should be stressed that AFLP analysis is intended to give insight into the overall structure of genomes, and is thus more properly used to determine interspecific genetic variability. Moreover, the rDNA gene family is known to be affected by concerted evolution and gene conversion events, which may alter the phylogenetic signal carried by the analyzed sequences. Despite these limitations, the molecular evolution of plants is mainly inspected using rDNA and/or cpDNA alone, which may lead to evident mistakes in phylogeny reconstruction due to biases present at the molecular level (Alvarez & Wendel, 2003; Doyle & Gaut, 2000; Small et al., 2004). The drawing of accurate and resolved phylogenetic trees may be supported by additional molecular markers, such as nuclear (and thus Mendelian-inherited) single-copy genes and/or multigene families. In particular, multigene families are very attractive since they offer the possibility to sample independent loci (i.e., not associated in linkage units) that share a deeply related evolutionary framework. This is a crucial feature as far as hybridization and introgression events are concerned: the use of such nuclear genes should therefore be considered for reconstructing the phylogeny of poplar and also for assessing the genetic variability within the genus.

Several analyses of plant gene families, with particular emphasis on P. trichocarpa, the poplar model system, have been carried out in the recent past in order to understand their evolutionary history, the functional diversification of their members and their detailed expression in poplar (Lan et al., 2009; Petre et al., 2011). At present, however, little information is available about phylogenetic relationships among poplar species based on the analysis of nuclear genes (Fladung & Buschbom, 2009). Hence, in order to shed light on these unresolved issues, we have undertaken an in-depth investigation of the nuclear-encoded metallothionein (MT) gene family, which is involved in the plant stress response. In particular, MTs are known to have protective roles against wounds of pathogenic origin and against senescence (Kohler et al., 2004), and to be involved in metal homeostasis (Cobbett & Goldsbrough, 2002), being upregulated in foliar tissue, together with polyamines, to enhance poplar tolerance to heavy metal-contaminated soils (Castiglione et al., 2007; Cicatelli et al., 2010; 2011) or to water deficit (Berta et al., 2009). Nonetheless, no comprehensive study of the MT gene family has been performed so far, even though it shows interesting and potentially attractive features. The MT multigene family is characterized by: (a) a fairly high but manageable number of members (six), at least in P. trichocarpa (personal observation); (b) a peculiar organization of specific cysteine residues, whose function is the cooperative binding of metal ions (Kille et al., 1991), which allows easy arrangement of orthologous MTs into three types, namely MT1, MT2 and MT3 (Cherian & Chan, 1993; Cobbett & Goldsbrough, 2002); (c) the presence of a full exon coding for a specific region, called the "spacer", which shows remarkable levels of genetic variability (Buchanan-Wollaston, 1994; Zhou & Goldsbrough, 1994).

Indeed, these spacers could be rich in single-nucleotide polymorphisms (SNPs). SNPs, including single-base changes or indels (insertions or deletions) at specific nucleotide positions, have been shown to be the most abundant class of DNA polymorphism in many organisms (Brookes, 1999; Cho et al., 1999). SNP variation analysis and SNP marker development from candidate genes could provide valuable information regarding their evolution and effects on complex traits. Very recently, Fladung and Buschbom (2009) used partial DNA sequences of six nuclear genes to draw phylogenetic relationships among a limited number of poplar species using SNPs. This study substantiates the importance of this class of molecular markers as a very promising tool to study the evolution, or measure the genetic variability, of plants in general and of Populus in particular. The present study is part of a wider collaborative project between Italy and the People's Republic of China focused on poplar.
For this reason, particular interest was devoted to the analysis of Chinese poplar species belonging to the Populus section (Castiglione et al., 2010; Lexer et al., 2010). In the present study MT genes were used to estimate phylogenetic relationships among 11 Populus spp. belonging to three poplar sections (Aigeiros, Populus and Tacamahaca). Furthermore, a natural population of 63 P. alba trees, collected along the banks of the river Sele (South Italy), was analysed by means of nuclear (10) and chloroplast (3) SSRs. The population genetic variability was evaluated using indices commonly employed to estimate genetic biodiversity (e.g. h, Na, Ne, Ho, He, etc.) or genetic dissimilarity (Jaccard index, PCA). On the basis of the estimated genetic biodiversity, a subset of the population (5 specimens), characterized by very high genetic dissimilarity based on the Jaccard index, underwent DNA sequence analysis in order to identify new SNPs in MT genes; an illustrative analysis is given for the isoforms MT1a and MT3a. Based on our results, we can state that MTs are promising markers to shed light on both intra- and inter-section relationships among poplars, as well as to assess genetic variability within selected natural populations.

Material and methods
Poplar samples. Eleven species, representing three distinct sections (Tacamahaca, Aigeiros and Populus) of the genus Populus, were selected (Tab. 1) for the phylogenetic study. Nine species were sampled for DNA sequencing: three were Italian (P. nigra, P. alba and P. tremula) and six Chinese (P. pyramidalis, P. adenopoda, P. davidiana, P. serrata, P. bonatii and P. tomentosa). For each species, fresh young leaves of 2-3 individuals were sampled and dried in Silica Blue Gel (Sigma-Aldrich Italia, Milano, Italy) prior to DNA extraction. The sequences of the remaining two balsam poplars, i.e. P. trichocarpa and P. balsamifera, were retrieved from publicly available databases (GenEMBL). It should be stressed that the selected Chinese poplars have never been characterized from a molecular point of view, and that their classification within the Populus section is troublesome due to peculiar morphological features (see the description given in the Chinese Floras Atlas at: http://www.efloras.org/flora_page.aspx?flora_id=2).

To study the genetic biodiversity of a P. alba natural population, 63 individuals were sampled along the banks of the river Sele (Salerno, South Italy) in the springs of 2008 and 2009. Young leaves were collected from single individuals and then stored in absolute ethanol until use. The genetic biodiversity of the white poplar natural population of the Sele river was compared with eight reference samples: two clones collected in Sardinia (b33SS, b4SS); two further clones belonging to a collection of Northern Italy clones, analysed during a previous research project focused on phytoremediation (AL22, AL35; Castiglione et al., 2009); and four hybrids (P. x canescens) collected along the banks of the Ticino river in Northern Italy (#1, #13, #15, #2; Castiglione et al., 2010).

DNA extractions. Total genomic DNA was extracted from dried poplar leaves (species reported in Tab. 1) using the DNeasy Plant Mini kit (Qiagen, Milano, Italy) or the "REDExtract-N-Amp Plant PCR Kit" (Sigma-Aldrich Italia, Milano, Italy), following the supplier's instructions.

Identification of MT genes in poplar.
The first step of our analysis consisted in the identification of the genomic position of the metallothionein (MT) genes of P. trichocarpa. This was done through BLAST searches against the P. trichocarpa genome v2.0 (available at http://www.plantgdb.org/PtGDB and http://plants.ensembl.org/Populus_trichocarpa) using as queries the mRNAs of P. x generosa (P. trichocarpa x P. deltoides) retrieved from GenEMBL (MT1a: AY594295, MT1b: AY594296, MT2a: AY594297, MT2b: AY594298, MT3a: AY594299, MT3b: AY594300). Based on the identified sequences, we designed specific primers (available on request) for PCR amplification of the MT genes in the remaining poplar species. For each MT gene, two primer pairs were designed, so that partially overlapping amplicons, suitable for assembling and analysing DNA sequences, were obtained. As the only exception, MT2a was sequenced using a single primer pair due to its unusual exon-intron structure (see the Results section). The primer pairs, designed on the basis of the P. trichocarpa genome, were also successfully used for the amplification and sequencing of the MT1b, MT3a and MT3b genes from a specimen of Salix matsudana L. growing at the Orto Botanico Cascina Rosa (Milano, Italy). The remaining MT genes of the same Salix species were retrieved from GenEMBL (MT1a: EF157299, MT2a: EF157297, MT2b: EF157298).
PCR amplification and sequencing. MT genes were amplified in PCR reactions containing 200 μM dNTP, 1.5 mM MgCl2, 2.5 μM of each primer, 1.5 U of PolyTaq Recombinant 5 U/μl (PolyMed, Sambuca Val di Pesa, Italy), and 5 μl of PolyTaq 10X Buffer in a total volume of 50 μl. The PCR thermal profile was as follows: 94°C for 60 s, TA for 60 s (TA = annealing temperature of each primer pair, available on request), and 72°C for 90 s, for 35 cycles. After purification and estimation of DNA quantity by agarose electrophoresis, single-band amplified DNAs were sequenced using the ABI Big Dye Terminator version 3.1 Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Monza, Italy). PCR sequencing products were electrophoresed on an ABI 310 automated genetic analyzer (Applied Biosystems). Some of the MT amplification products, showing double bands of different molecular weight, were cloned according to the procedure provided with the CloneJET PCR Cloning kit (Fermentas, Burlington, Ontario) and sequenced as described above. Each amplicon was sequenced at least twice, using the same PCR primers, and chromatograms were processed by careful visual inspection. In order to assemble the MT genes, the amplicons of each gene belonging to the same species were aligned using the ClustalW multiple sequence alignment software (Thompson et al., 1994). The aligned DNA sequences were verified by manual editing of the sequence alignments.
Gene annotation and analysis of sequence variability. The exon-intron structure and the protein-coding sequence (CDS) of the amplified genes were identified using the GenomeScan tool (http://genes.mit.edu/genomescan.html), which also allowed us to verify the presence of the canonical splicing sites. The CDS of the MT-genes were aligned using the RevTrans program (http://www.cbs.dtu.dk/services/RevTrans), which takes into account the codon structure of the analyzed sequences. The six resulting alignments, which support the exon-intron structure identified with GenomeScan, were manually curated in order to remove the primer sequences. Variability analyses were carried out on a final dataset including 12 taxa (i.e., 11 poplar species plus the outgroup S. matsudana) for each of the six MT-genes. Moreover, the amino acid sequences inferred from the translation of the MT-genes were aligned using the MUSCLE program (http://www.ebi.ac.uk/Tools/msa/muscle/#). For each data set, the appropriate variability analysis was performed using the MEGA4 software (Tamura et al., 2007): we calculated the number of synonymous and non-synonymous substitutions for the nucleotide alignments, and the number of invariant and variant amino acid sites for the protein alignments.
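The distinction between synonymous and non-synonymous variation can be illustrated with a minimal Python sketch. It is not the MEGA4 procedure: it assumes a codon-aligned, gap-free CDS alignment and simply labels each variable codon column as synonymous when all observed codons encode the same amino acid, and as non-synonymous otherwise. The function name and the toy sequences are hypothetical and serve only as an illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_variable_codons(alignment):
    """Count variable codon columns as (synonymous, non_synonymous).

    Simplification: a column is synonymous if every codon observed in it
    translates to the same amino acid, non-synonymous otherwise.
    """
    length = len(alignment[0])
    if length % 3 or any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be codon-aligned and of equal length")
    synonymous = non_synonymous = 0
    for start in range(0, length, 3):
        codons = {seq[start:start + 3].upper() for seq in alignment}
        if len(codons) == 1:
            continue  # invariant codon column
        residues = {CODON_TABLE[codon] for codon in codons}
        if len(residues) == 1:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Toy alignment (hypothetical sequences, not real MT data).
toy_alignment = [
    "ATGGCGTGCAAA",
    "ATGGCTTGCAAA",  # GCG -> GCT leaves the amino acid unchanged
    "ATGGCGTGCAGA",  # AAA -> AGA changes the amino acid
]
syn, nonsyn = classify_variable_codons(toy_alignment)
```

Counting per codon column rather than per nucleotide site keeps the sketch short; dedicated packages use more refined counting schemes.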
Phylogenetic data analysis. Phylogenetic analyses were performed on each of the six nucleotide datasets and on the concatenation of all MT-genes into a so-called "supergene" (1,224 sites with no gapped positions). Only the latter dataset proved informative enough to resolve the relationships among poplar species. As for the inferred translations of the MT-genes, due to both the very low number of sites and the low variability detected among the different poplar species (see Results), we decided to exclude protein sequences from the phylogenetic analysis. With respect to the nucleotide "supergene" dataset, model selection was performed using the on-line version of the Modeltest software (http://www.hiv.lanl.gov/content/sequence/findmodel/findmodel.html), which showed that the most suitable model was GTR+G (Lanave et al., 1984; Saccone et al., 1990). The parameters of the model were: base frequencies: A 0.28, C 0.23, G 0.31, T 0.31; substitution rate matrix: (A-C) 1.672, (A-G) 0.312, (A-T) 0.275, (C-G) 0.710, (C-T) 1.251, (G-T) 1.0; gamma distribution parameter (alpha): 0.041. Phylogenetic analyses and bootstrapping were carried out using programs of the PHYLIP package (Felsenstein, 1993) and the PHYML software (Guindon & Gascuel, 2003). In particular, phylogenetic analysis was performed using two different Maximum Likelihood (ML) procedures: a "Classical ML" and a "Hybrid ML" method. The Classical ML analysis was carried out directly by means of the PHYML software.
On the other hand, the Hybrid ML method consisted of two steps: (a) calculation of ML distances by the TreePuzzle software (Smith, 1988); (b) calculation of phylogenetic trees via the Neighbor Joining (NJ) method. NJ was performed using the NEIGHBOR software, one of the phylogenetic tools belonging to the PHYLIP package. In the case of the Classical ML method, bootstrapping was implemented as follows: (a) generation of 1,000 replicates of the original data set using the SEQBOOT software; (b) bootstrap test by means of PHYML, activating the "multiple data sets analysis" option. For the Hybrid ML method, bootstrapping was implemented as follows: (a) generation of 1,000 replicates of the original data set using the SEQBOOT software (PHYLIP package); (b) bootstrap test, using the UNIX script "puzzleboot" (http://rogerlab.biochemistryandmolecularbiology.dal.ca/puzzleboot.php). Phylogenetic trees for each of the obtained distance matrices were calculated using the NEIGHBOR and CONSENSE software (PHYLIP package). Finally, the phylogenetic trees calculated using either the Classical or the Hybrid ML method were visualized and manipulated using TreeView (Page, 1996) and TreeMe (http://en.bio-soft.net/tree/TreeMe.html).
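The replicate-generation step of non-parametric bootstrapping can be sketched in a few lines of Python. This is only an illustration of the general idea (resampling alignment columns with replacement), not the SEQBOOT implementation; the function name, the seed and the toy alignment are assumptions introduced for the example.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Build bootstrap pseudo-alignments by resampling columns with replacement.

    `alignment` maps taxon name -> aligned sequence (all of equal length).
    Every replicate has the same taxa and the same number of sites.
    """
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    replicates = []
    for _ in range(n_replicates):
        # Draw n_sites column indices with replacement.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicates.append(
            {taxon: "".join(seq[i] for i in columns)
             for taxon, seq in alignment.items()}
        )
    return replicates


# Toy alignment (hypothetical sequences); 1,000 replicates as in the text.
toy_alignment = {
    "taxon_A": "ATGGCGTG",
    "taxon_B": "ATGGCTTG",
    "taxon_C": "ATGACGTC",
}
replicates = bootstrap_replicates(toy_alignment, 1000, seed=1)
```

Each replicate would then be passed to the tree-building step, and the resulting trees summarized in a consensus tree.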
SSR genotyping of a P. alba natural population. Molecular analysis of the natural white poplar population growing on the banks of the Sele river was performed by means of Simple Sequence Repeats (SSR). SSR assays were performed as described by Yin et al. (2004) and Van der Schoot et al. (2000) using ten nuclear (ORPM_30a, ORPM_30b, ORPM_312a, ORPM_312b, ORPM_60, PMGC_2852, WPMS5, WPMS14, WPMS18, WPMS20) and three chloroplast DNA primers (CCMP2, CCMP6, CCMP10; Weising & Gardner, 1999). These were chosen among those characterized by high polymorphism, absence of multiple bands and high discrimination power. PCR reactions were performed using a traditional two-primer approach, one of the two primers being fluorescently labelled. Reactions were performed in a 10 µl total volume containing: 2 µl of template DNA, 2 µl of reverse primer (1 µM, unlabelled), 2 µl of forward primer (1 µM, a third of which was labelled) and 4 µl of RedExtract-N-Amp PCR Ready Mix (Sigma-Aldrich).

For the WPMS5 and WPMS14 primers, cycling conditions were as follows: initial denaturation at 94°C for 3 min; followed by 35 cycles of 1 min and 15 s at 94°C, 1 min and 15 s at the annealing temperature (50°C for WPMS5 and 60°C for WPMS14), 1 min and 45 s at 72°C; and a final extension of 10 min at 72°C. The ORPM_30 region was amplified using the following thermal profile: 94°C for 3 min; followed by 30 cycles of 1 min at 94°C, 1 min at 55°C, 1 min at 72°C; and a final extension of 10 min at 72°C. The remaining PCR amplifications were performed using: initial denaturation at 94°C for 3 min; followed by 15 cycles of 30 s at 94°C, 1 min at 62°C (the annealing temperature was reduced by 0.5°C per cycle); followed by 20 cycles (or 30 cycles for ORPM_312) of 30 s at 94°C, 1 min at 52°C, 1 min and 30 s at 72°C; and a final extension of 7 min at 72°C. The SSR genotypes were analyzed on an ABI-PRISM 310 Genetic Analyzer (Applied Biosystems, Monza, Italy), while fragment sizing was carried out with Gene Mapper version 4.0 (Applied Biosystems) using the internal 500 ROX Size Standard (Applied Biosystems).

Population biodiversity was evaluated by number of alleles, allele frequency, and observed and expected heterozygosity; the within-population inbreeding coefficient FIS for microsatellite loci was also estimated, using the free software package GenAlEx 6 (Peakall & Smouse, 2006), available at http://www.anu.edu.au/BoZo/GenAlEx/new_version.php. Furthermore, some of the nuclear SSR markers were used to determine the genetic relationship of the Sele population to selected specimens belonging to already surveyed P. alba populations (Brundu et al., 2008; Castiglione et al., 2009; Castiglione et al., 2010), as described above. The molecular similarity of the four groups of poplars defined above was assessed by performing a PCA (Principal Component Analysis) by means of the NTSYS-pc program version 2.1 (Rohlf & Marcus, 1993), using the Jaccard dissimilarity index as distance (Jaccard, 1908). Finally, in the light of the previous analyses, and in order to maximize the observed genetic variability, we picked out five clones from the Sele population and carried out a molecular analysis of the SNPs (Single Nucleotide Polymorphisms) present in their MT-genes, searching for point mutations and indels in both exons and introns.
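As an illustration of the distance used for the PCA, the following Python sketch computes the Jaccard dissimilarity between samples scored for presence/absence of SSR alleles. It is a generic implementation of the index, not the NTSYS-pc routine; the sample names and allele labels are hypothetical.

```python
def jaccard_dissimilarity(alleles_a, alleles_b):
    """Jaccard dissimilarity: 1 - |A and B| / |A or B| for two sets of alleles."""
    a, b = set(alleles_a), set(alleles_b)
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def dissimilarity_matrix(samples):
    """Pairwise Jaccard dissimilarities for a dict of sample -> allele set."""
    names = sorted(samples)
    return {
        (x, y): jaccard_dissimilarity(samples[x], samples[y])
        for x in names
        for y in names
    }


# Hypothetical profiles: each allele is labelled "locus:fragment size".
samples = {
    "tree_1": {"WPMS5:280", "WPMS5:284", "WPMS14:230"},
    "tree_2": {"WPMS5:284", "WPMS14:230", "WPMS14:236"},
    "tree_3": {"WPMS5:290", "WPMS14:242"},
}
matrix = dissimilarity_matrix(samples)
```

In this toy example tree_1 and tree_2 share two of four alleles (dissimilarity 0.5), whereas tree_3 shares none with either (dissimilarity 1.0).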

Results
MT multigene family within the genus Populus. MT genes, sequenced or available in public databases for the chosen poplar species (Tab. 1), were analyzed in order to evaluate their sequence variability and, thus, their resolving power as molecular markers for assessing the phylogenetic relationships among cryptic species of the genus Populus. Amplification and sequencing of the MT CDS, followed by RevTrans alignment of all nucleotide sequences and their arrangement as a supergene, resulted in a data matrix of 1,224 characters (with no gapped positions) for all 11 poplar species plus the outgroup (S. matsudana). It should be noted that the corresponding DNA sequences of two or more specimens of the same taxon were identical, e.g. P. alba (2). BLAST searches performed against the Poplar Genome Browser, using as queries mRNA sequences from P. x generosa, allowed the identification of the number and genomic position of all MT genes in the genome of the model system P. trichocarpa. As expected, poplar MTs are encoded by a multigene family encompassing six members, that is two genes for each MT type, as described by Kohler and co-workers (2004). It should be noted that the two members belonging to the same MT type, e.g. MT1a and MT1b, are neither found on the same chromosome nor organized in clusters like the well-known rDNA genes.
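The construction of a gap-free supergene matrix can be sketched as follows. This is a minimal Python illustration under stated assumptions (per-gene alignments held as dictionaries with identical taxon sets, "-" as the gap character); the function name and the toy sequences are hypothetical and do not reproduce the actual MT data.

```python
def build_supergene(gene_alignments, gap="-"):
    """Concatenate per-gene alignments and drop every column containing a gap.

    `gene_alignments` maps gene name -> {taxon: aligned sequence}.
    Genes are concatenated in insertion order.
    """
    genes = list(gene_alignments.values())
    taxa = sorted(genes[0])
    if any(sorted(gene) != taxa for gene in genes):
        raise ValueError("every gene alignment must contain the same taxa")
    concatenated = {t: "".join(gene[t] for gene in genes) for t in taxa}
    n_sites = len(concatenated[taxa[0]])
    if any(len(seq) != n_sites for seq in concatenated.values()):
        raise ValueError("sequences within a gene must have equal length")
    # Keep only columns in which no taxon has a gap.
    keep = [i for i in range(n_sites)
            if all(concatenated[t][i] != gap for t in taxa)]
    return {t: "".join(concatenated[t][i] for i in keep) for t in taxa}


# Toy example with two genes and two taxa (hypothetical sequences).
toy_genes = {
    "gene_1": {"P_alba": "ATGC", "S_matsudana": "AT-C"},
    "gene_2": {"P_alba": "GGA", "S_matsudana": "GGA"},
}
supergene = build_supergene(toy_genes)
```

Here the seven concatenated columns are reduced to six, because the third column carries a gap in one taxon.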
Exon-intron structure of poplar MT genes and intron variability. Poplar MT genes proved to have a well-conserved exon-intron structure: the two isoforms of the same MT type show a CDS of equal length as well as the same number of exons and introns. According to Cobbett & Goldsbrough (2002) and Kohler et al. (2004), MTs are encoded by short genes, composed of small exons and introns. Our analyses confirmed the data present in the literature: the MT1, MT2b and MT3 genes have mean lengths of 585, 473 and 616 bp, respectively. The MT genes code for small polypeptides with mean lengths of 73, 78 and 66 amino acids for MT1, MT2b and MT3, respectively. The only important exception is the MT2a gene, which is 293 bp long and has 2 exons coding for a polypeptide of 79 aa. This exon-intron structure is peculiar to MT2a, and it is neither an annotation nor a sequencing artifact. In fact, all the MT2a genes identified by means of BLAST searches, or by sequencing, have the same structure in all the analyzed poplar species (i.e. two exons and one intron). The molecular analysis performed on MT2a showed that its CDS (240 bp) differs in length from the MT2b CDS (237 bp) due to the presence of an additional triplet, "GCG" (coding for a valine residue), at positions 148-150 with respect to MT2b. This indel is located in a very short but variable region of the spacer domain, in which MT2a differs from the homologous MT2b region at 12 out of 20 residues (data not shown).

Generally speaking, the CDS of MT-genes can be considered as formed by three exons: (a) the first coding for almost the entire Cys-rich domain; (b) the second coding for the majority of the spacer region; (c) the third coding for the terminal part of the spacer plus the whole C-terminal domain. Remarkable sequence variability was also detected within the introns of the MT gene family. In particular, we found an SSR within intron I of MT3b, which showed a different and peculiar number of tandem repeats in each of the analyzed species. Moreover, exclusively in the introns of MT genes, we detected several insertions/deletions (indels) specific to different taxonomic ranks (e.g., sub-section) or related to the geographical origin of the poplars (data not shown).
Sequence variability in MT-genes and their products. Our analyses, focused on the CDS of MT-genes, showed that poplar MT-genes have a high level of nucleotide sequence identity, i.e. 96% (1,170/1,224 sites), as well as a very high level of amino acid sequence identity, i.e. 93% (380/410 sites); see Tab. 2. A detailed variability analysis of the metallothionein CDS showed that 29/54 (54%) of the variable sites are non-synonymous (Tab. 3).

Thus, based on the number of non-synonymous (N) and synonymous (S) substitutions, we calculated the N/S ratio for the different MT types. Interestingly, most MT-genes appear to be subject to positive selective pressure, N/S being >1 for all genes with the exception of MT1a (N/S=1, neutral selection) and MT1b (N/S<1, purifying selection). The analysis, carried out separately for the nucleotide sequences coding for each MT domain, showed that 33/54 (61%) variable sites are present in the spacer domain, thus indicating that this domain is the most variable region of MT-genes. This observation is also supported by the fact that the spacer domain contains 20/29 (69%) of the non-synonymous substitutions found in all MT-genes. Finally, the N/S ratio showed that the spacer region is subject to positive selection (N/S=1.54), whilst both the N- and C-domains are, as expected, under slight purifying pressure (N/S=0.75).

Phylogenetic analysis of the MT-supergene. As described in Material and methods, we carried out the phylogenetic analysis following two different procedures: a Classical and a Hybrid ML method. We used these methods to calculate phylogenetic trees for each MT gene under study, testing the statistical reliability of the obtained trees by means of non-parametric bootstrapping (1,000 repetitions). The phylogenetic trees calculated separately for each of the MT genes were almost completely unresolved (data not shown). Therefore, we decided to concatenate the CDS of all six MT-genes in order to form a so-called "supergene" encompassing a total of 1,224 characters. The Classical and the Hybrid ML methods resulted in comparable phylogenetic trees with respect to both topology and bootstrap values of the internal nodes. Therefore, we show only the consensus phylogenetic tree obtained through the Classical ML method (Fig. 1).
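The domain-level N/S values reported earlier in this section can be re-derived from the counts given in the text. The short Python sketch below does this arithmetic; it uses the plain ratio of substitution counts, which is how the values in the text appear to be obtained, and the variable names are introduced here for illustration only.

```python
def ns_ratio(non_synonymous, synonymous):
    """Plain ratio of non-synonymous to synonymous substitution counts."""
    return non_synonymous / synonymous


# Counts taken from the text.
total_variable, total_nonsyn = 54, 29      # all MT CDS
spacer_variable, spacer_nonsyn = 33, 20    # spacer domain only

# Derived counts.
total_syn = total_variable - total_nonsyn                # 25
spacer_syn = spacer_variable - spacer_nonsyn             # 13
terminal_nonsyn = total_nonsyn - spacer_nonsyn           # 9 (N- and C-domains)
terminal_syn = total_syn - spacer_syn                    # 12

spacer_ratio = ns_ratio(spacer_nonsyn, spacer_syn)       # about 1.54
terminal_ratio = ns_ratio(terminal_nonsyn, terminal_syn) # 0.75
overall_ratio = ns_ratio(total_nonsyn, total_syn)        # 1.16
```

The derived values match the reported N/S of 1.54 for the spacer and 0.75 for the N- and C-domains. Note that this is a ratio of raw counts, not a per-site normalized dN/dS estimate.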

In general, the calculated phylogenetic tree is well resolved, since 4/9 (44%) internal nodes are supported by bootstrap values higher than 75, and 7/9 (78%) by bootstrap values higher than 50. In particular, the six MT-genes arranged as a supergene are able to group together the two balsam poplars P. trichocarpa and P. balsamifera with a good support value (65), and correctly distinguish section Tacamahaca from P. nigra (Aigeiros) with a very high support value (92). As for the Populus section, it is reliably identified with a bootstrap value of 88. Moreover, poplars belonging to this section are further clustered into two distinct groups: (a) the first one, comprising only the Italian P. alba, with the full bootstrap value (100) [...]

Genetic diversity within the Sele population was also estimated using 10 nuclear SSRs (nSSR; see Tab. 5).
The total data set included 84 alleles, whilst the number of alleles per locus ranged from 2 to 17. The most informative markers were ORPM_312a, ORPM_312b, ORPM_30b and WPMS14, with more than 10 alleles per locus. Multilocus analysis of nSSR revealed a high level of gene diversity (He = 0.58). The highest He was estimated for the ORPM_312b locus (0.852), the lowest for the ORPM_30a locus (0.058). The mean observed heterozygosity (Ho) was 0.46. Since Ho was smaller than He for 7/10 of the analysed loci, an excess of homozygosity was observed. Finally, the inbreeding coefficient (FIS) over all loci was 0.18 (ranging from 0.040 to 0.752). Based on SSR markers, we also determined the genetic structure of the Sele population (63 individuals) using, as reference samples, eight white poplars representing a wide range of molecular biodiversity for the Italian P. alba, as already stated in the Material and methods section. A PCA was carried out on the combined dataset comprising 71 samples (Fig. 2), using the Jaccard dissimilarity index as genetic distance. As a result, we could assess the tendency of the Ticino samples to form an independent group, well separated from all the remaining poplars. By contrast, the Sele river population was more scattered and overlapped with the selected poplars sampled from Sardinia and Northern Italy.
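For readers unfamiliar with these indices, the following Python sketch shows how the number of alleles, Ho, He and FIS can be obtained for a single diploid SSR locus. It uses the textbook definitions (He = 1 minus the sum of squared allele frequencies, FIS = 1 - Ho/He); GenAlEx may apply additional corrections, so this is an assumption-laden illustration rather than a reproduction of the software. The genotypes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Return (n_alleles, Ho, He, Fis) for one diploid locus.

    `genotypes` is a list of (allele_1, allele_2) tuples, one per individual.
    """
    n_individuals = len(genotypes)
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n_individuals
    # Expected heterozygosity (gene diversity) under Hardy-Weinberg proportions.
    he = 1.0 - sum((count / total_alleles) ** 2 for count in allele_counts.values())
    # Observed heterozygosity: fraction of heterozygous individuals.
    ho = sum(1 for a, b in genotypes if a != b) / n_individuals
    # Inbreeding coefficient; undefined for a monomorphic locus.
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return len(allele_counts), ho, he, fis


# Hypothetical genotypes (fragment sizes in bp) for four trees at one locus.
toy_genotypes = [(100, 100), (100, 102), (102, 102), (100, 100)]
n_alleles, ho, he, fis = locus_stats(toy_genotypes)
```

In the toy data only one of four individuals is heterozygous, so Ho (0.25) falls below He (about 0.47) and FIS is positive, the same qualitative pattern described above for the Sele population.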

Sequence analysis of MT-genes from the Sele population.
Based on the previous SSR analyses, five genetically distinct poplar clones of the Sele river population were chosen to investigate the levels of SNPs in all MT-genes. It should be stressed that the selected clones represented the most divergent samples of the data set; thus, we expected to maximize the number of SNPs found. Successful and reliable PCR amplifications, yielding single DNA bands, were obtained for all five selected clones, with the notable exceptions of MT1b, which was not reproducibly amplified in all five clones (and for this reason was not considered in the analysis), and of MT1a, which yielded a double DNA band in four out of the five selected clones. Therefore, the double PCR product was cloned in E. coli, and several E. coli colonies were randomly chosen and sequenced to identify single-base mutations. Overall, 104 SNPs, i.e. 56 point mutations and 48 indels, were identified over 1,456 sites and 18 different alleles, corresponding to 7.1 SNPs every 100 bp. To better illustrate these findings, we describe the cases of the MT1a and MT3a genes. As for MT1a, the two fragments amplified via PCR were individually cloned on the basis of their different molecular weights, and then sequenced to identify SNPs. In the first MT1a fragment, made up of 332 intronic and 166 exonic bp, a total of 30 SNPs (i.e., 24 indels and 6 single-base changes) were identified; whilst in the second fragment, 27 SNPs (i.e., 4 indels and 23 single-base changes) were found in 409 non-coding plus 79 coding bp. As stated above, the number of SNPs was higher in the non-coding than in the coding regions. In fact, 55/57 variable sites were located in the introns of the MT1a gene (9.5 SNPs per 100 bp), whilst only two SNPs were present in its exons (0.49 SNPs per 100 bp). Furthermore, two haplotypes of the Sele population (conventionally named B and C, data not shown), related to the MT1a gene, were characterized by a greater number of SNPs, mostly (>80%) consisting of insertion/deletion events. As for the MT3a gene of the Sele poplar population, a single fragment of 476 bp, corresponding to 110 bp of coding and 366 bp of non-coding sequence, was cloned and sequenced: only 1 SNP was detected in its three exons (0.91 SNPs per 100 bp), and 2 were found in the non-coding regions (0.55 SNPs per 100 bp). As a concluding remark, it should be mentioned that many different short indels were observed when comparing the five Sele clones with the P. alba specimens employed for the phylogenetic analysis.
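The SNP densities quoted above are simple ratios of SNP counts to surveyed sites. The short Python sketch below reproduces the overall figure and the two MT3a figures from the counts given in the text; the function name is an assumption introduced for illustration. The MT1a densities are not recomputed here, because the site counts underlying them are not stated explicitly.

```python
def snps_per_100bp(n_snps, n_sites):
    """SNP density expressed as the number of SNPs per 100 bp."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return 100.0 * n_snps / n_sites


# Counts taken from the text above
overall = snps_per_100bp(104, 1456)       # all MT genes, five Sele clones
mt3a_coding = snps_per_100bp(1, 110)      # MT3a exons
mt3a_noncoding = snps_per_100bp(2, 366)   # MT3a non-coding regions

print(round(overall, 1), round(mt3a_coding, 2), round(mt3a_noncoding, 2))
```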

Discussion
Poplar phylogeny. This pilot study provides, for the first time, an extensive genetic survey of the metallothionein multigene family in different poplar species and in a natural population of P. alba, a member of the Populus section. The position along the Populus genome of the six genes of the MT family may support the indications provided by Brunner and co-workers (2000) regarding the genome duplication event that occurred in Populus during the last 60 million years. Here, it should be noticed that, in contrast to Arabidopsis, the genome of Populus has not been truncated (Kelleher et al., 2007; Tuskan et al., 2006), which means that, for most genes, two copies are still present in the poplar genome. Consistently, we confirm that two isoforms (commonly indicated as "a" and "b") are present in the P. trichocarpa genome for each of the three MT types. This observation is very interesting, but it complicates the possible evolutionary pattern of this multigene family. In fact, the analysis of 89 MT genes retrieved from public databases and belonging to 19 different plant species, ranging from basal bryophytes to higher dicotyledons [data not shown; (Lupi, 2007)], revealed that at least two different duplications are necessary to originate the full MT family. In particular, the more ancient event duplicated the ancestral gene into the MT3 and MT2 types, whilst a more recent duplication of MT2 likely gave rise to the MT1 genes. However, some of the analyzed monocotyledon and dicotyledon species showed a different number of MT genes, suggesting that the MT family underwent gain/loss of members via independent events in the diverse lineages, or even degeneration of its various members [as occurred in A. thaliana; (Zhou & Goldsbrough, 1994)]. In the light of the evolutionary framework outlined above, it is not surprising that evolution is still acting on some of the poplar MT genes. In fact, as shown via the N/S ratio (Tab. 3), i.e.
the ratio between non-synonymous and synonymous substitutions, MT1a and MT1b are under neutral and purifying selection, respectively, whilst the remaining four genes are subject to moderate positive selection. This would suggest that MT1a and MT1b could have already acquired a more specific and "frozen" role in the evolutionary process of the MT family with respect to the other four members. On the contrary, since evolution favors the occurrence of non-synonymous mutations along the MT polypeptide sequence, the other four genes might still be free to specialize in a somewhat different function, still linked to metal homeostasis. Therefore, on the basis of our data, we could hypothesize that different paralogous genes belonging to the same multigene family may play distinctive roles in plant metal metabolism and/or stress tolerance, as already documented by Kohler et al. (2004) and Castiglione et al. (2007, 2010 and 2011). The sequence analysis of the MT genes shows that the exon-intron structure is well conserved within the genome of all analysed Populus spp. (three exons and two introns), with the only exception of the MT2a gene, which presents only two exons and one intron. In the light of the typical MT gene structure found in both Populus and Salix, the most parsimonious explanation might be that an intron has been lost during evolution. Another interesting feature of poplar MT genes regards the structure of their CDS. In fact, each of the three exons constituting the CDS corresponds to a well-defined functional region of the polypeptide, that is, the Cys-rich N-terminal domain, the central spacer and the Cys-rich C-terminal domain. In particular, the number and position of the Cys residues inside the two terminal domains represent molecular signatures useful to unequivocally classify the MT genes into the three types (MT1, MT2 and MT3). The presence of well-conserved Cys residues strongly suggests that the terminal domains can coordinate positively charged ions for metal homeostasis and heavy metal tolerance, as observed in the case of different poplar species and natural populations (Castiglione et al., 2010). On the contrary, the spacer region shows a less conserved amino acid sequence, thus suggesting that the approximately 40 amino acids that constitute this region may have no other function than linking the two terminal Cys-rich domains. As for molecular variability, only low levels were detected within the Populus genus, since both nucleotide (96%) and amino acid (93%) MT sequences were largely conserved. These results are not surprising, however, since genes pooled from 11 closely related species were analyzed. As already stated, the spacer domain bears the vast majority (61%) of the variable sites; it therefore proved to be very useful to elucidate the evolutionary relationships among the analyzed poplar species. However, the regions showing the highest variability levels were,
as expected, the introns (Lupi, 2007). For instance, the species-specific SSR identified within intron I of MT3b could have a practical application, being used either to identify hybridization events in cultivated hybrids of unknown origin, or to discriminate crosses among compatible poplar species in the early generations of natural populations. Interestingly, both section-specific and geographic-specific point substitutions were observed within the MT CDS for the different groups of aspens, Italian white poplars and Chinese poplars. These mutations have been fundamental to calculate the phylogenetic tree based on the MT supergene (1,224 sites) and to discriminate the poplar sections and species analyzed. The phylogenetic tree obtained through the Classical ML method (Fig. 1) shows a monophyletic origin of the genus Populus. Such results are in contrast to those obtained by Hamzeh and Dayanandan (2004) using both chloroplast and nuclear rDNA sequences, but they are in agreement with those of Cervera and co-workers (2005), who used a different molecular approach (AFLP) to estimate phylogenetic relationships among poplars. Our analyses also provide a clear separation between the Tacamahaca and Aigeros sections (92 bootstrap value), as well as the Populus section (88), thus confirming the results obtained by Fladung and Buschbom (2009) in their pilot study, where five different poplar species belonging to the same three sections were analyzed on the basis of SNP mutations detected in six nuclear genes. Moreover, the P. trichocarpa and P. balsamifera species were clearly distinguished and coherently grouped together in a cluster corresponding to the Tacamahaca section with a support value higher than 50 (65), thus corroborating the findings by Cervera et al. (2005). This is a remarkable result since the relationships between P. trichocarpa and P. balsamifera were not resolved in the phylogenetic analysis conducted by Hamzeh and Dayanandan (2004) using the ITS and rDNA, i.e.
the two markers most commonly exploited for phylogenetic studies in plants. The obtained ML tree also shows that the Populus section is basal with respect to the remaining younger sections, Aigeros and Tacamahaca. Once again, our results are in accordance with those reported by Cervera et al. (2005) and also in agreement with the evolutionary pattern proposed by Eckenwalder (1996) on the basis of completely different markers, i.e. morphological and phenetic characters. As shown in Figure 1, poplar species belonging to the Populus section are clearly separated from the representative species of the other sections (red circle). Besides, the sub-clusters b.1 and b.2 are well separated from the Italian P. alba clones. It should also be mentioned that the Chinese white-poplar cluster of the Populus section is well resolved (85 bootstrap value), so that P. tomentosa and P. pyramidalis actually appear as closely related species that are likely paraphyletic with respect to the Italian P. alba. Although this result could depend, at least in part, on the geographical origins of the considered poplar species, a different and quite simple explanation for this paraphyletic relationship between the Italian P. alba and the Chinese white poplars could be proposed. In fact, although many Chinese taxonomists still consider P. tomentosa as a species distinct from P. alba (the so-called "Chinese P. alba"), Dickmann (2001) stated that the correct nomenclature for this species is P. x tomentosa Carr: in other words, P. tomentosa might be a hybrid derived from a natural cross between P. alba and P. adenopoda that, in some cases, could be a tri-hybrid due to further introgression toward the P. tremula genome. On the other hand, P. pyramidalis (the "Bolleana poplar") may be a variety of P. alba (Dickmann, 2001), probably consisting of a single genetic unit with a columnar growth habit particularly appreciated for ornamental uses and line planting. Therefore, it is not unfeasible that P.
pyramidalis could be a P. x tomentosa clone gathered, a long time ago, from a Chinese natural population, and subsequently employed for windbreaks and landscape gardening. As for the four Chinese poplars of the Populus section, P. davidiana, P. adenopoda, P. bonatii and P. serrata, usually considered trembling aspens, they are part of a poorly resolved group (cluster b.2 in Fig. 1) in our phylogenetic tree. The only exception is represented by P. davidiana and P. adenopoda, which are clustered together with a low bootstrap support value (55). Therefore, all four trembling poplar species provided by the Chinese co-author of this article for phylogenetic purposes seem to be part of a single species with very limited variation at the genomic level as compared with the Italian P. tremula. Indeed, on the basis of the low dissimilarity of both morphological characters and phenetic traits, Eckenwalder (1996) already argued for merging these four Chinese species into a single taxon. Therefore, we could state that both our molecular data on MT genes and the AFLP study conducted by Cervera and co-workers (2005) strongly support the section revision proposed by Eckenwalder (1996), so that, from a genetic point of view, the four Chinese aspens assessed here and P. tremula could be considered a single species.
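The reading of the N/S ratio given earlier in this part of the Discussion can be sketched in a few lines of Python. The sketch assumes the usual convention that a ratio close to 1 suggests neutrality, a ratio below 1 purifying selection, and a ratio above 1 positive selection. The function name, the tolerance around 1 and the example counts are hypothetical and are not taken from Table 3; a full dN/dS analysis would additionally normalize the counts by the numbers of non-synonymous and synonymous sites.

```python
def classify_ns(non_syn, syn, tolerance=0.1):
    """Return (ratio, label) for raw counts of non-synonymous and synonymous substitutions.

    tolerance is a hypothetical width around 1.0 within which the ratio is read as neutral.
    """
    if syn == 0:
        # The ratio cannot be computed without synonymous substitutions
        return None, "undefined"
    ratio = non_syn / syn
    if abs(ratio - 1.0) <= tolerance:
        label = "neutral"
    elif ratio < 1.0:
        label = "purifying"
    else:
        label = "positive"
    return ratio, label


# Hypothetical counts, for illustration only
for name, counts in {"gene_x": (5, 5), "gene_y": (2, 8), "gene_z": (9, 3)}.items():
    print(name, classify_ns(*counts))
```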
Genetic biodiversity of a white poplar population along the river Sele. In this study, the genetic biodiversity present in a large natural population of P. alba along the Sele river was estimated. The Sele is a river located in south-western Italy, originating from the Monti Picentini in Caposele. It flows through the region of Campania, in the provinces of Salerno and Avellino. Its delta is in the Gulf of Salerno on the Tyrrhenian Sea. White poplar and black poplar are the dominant tree taxa along the banks in the areas close to the outfall and to the spring of the Sele, respectively. The natural and semi-natural banks of the river are surrounded by grazing lands for buffalo and by fields of maize and vegetables, and are also ideal sites for the spontaneous reproduction of white poplar natural populations. The Cp-SSR analysis performed on the 63 individuals of the Sele river population highlighted a number of alleles (per single locus and in total) and of haplotypes comparable with those observed for the white poplar population of the Ticino river previously investigated by this research group (8 vs 9 haplotypes; Brundu et al., 2008), slightly higher than those of two populations of the Danube river (8 vs 5; Lexer et al., 2005), but a little lower than those calculated for a limited number of white poplar specimens collected in different regions of the Mediterranean basin (8 vs 10; Brundu et al., 2008). However, the higher number of haplotypes observed in the population of the Ticino river could be ascribed to the presence of several hybrids (P. x canescens) derived from the natural cross between P. alba and P.
tremula (Fossati et al., 2004; Castiglione et al., 2010), the latter observation being further confirmed by Lexer and co-workers (2005, 2010) in the case of two populations of the Danube and of one of the Ticino. However, this should not be the case for the Sele river population, since the PCA clearly separates the white poplars of the Sele from the hybrids of the Ticino (see comments below). The mean h value of the Sele population was higher than that calculated by Salvini et al. (2001) for five populations of P. tremula distributed along the Italian peninsula (0.51 vs 0.33). These moderately high levels of both genetic variability and number of haplotypes with respect to the same Italian P. tremula populations [8 vs 6; (Salvini et al., 2001)] can possibly be attributed to the fact that the white poplar population analysed here is located in an area (the Cilento) considered a glacial refugium and a hot spot of biodiversity for several angiosperm species (Cottrell et al., 2005; Fineschi & Vendramin, 2004; Grassi et al., 2009; Petit et al., 2002; Petit et al., 2003). The nSSR analysis carried out on the same population revealed that the expected heterozygosity was relatively high (He = 0.460), but in accordance with the values reported by Castiglione et al. (2010) for three populations distributed along the banks of three rivers in northern and central Italy, and with those calculated by Lexer et al. (2005) for two populations of P. alba (0.419 and 0.341) and two of P. tremula (0.466 and 0.483). Moreover, the data produced were also comparable with those of four different P. tremuloides populations collected in North America [0.460, 0.310, 0.560 and 0.410; (Cole, 2005)]. The nSSRs were also employed to estimate the genetic diversity among the Sele population and the other 8 specimens, used as white poplar reference samples representative of an ample range of Italian P.
alba biodiversity. In particular, the PCA plot (Fig. 2) clearly splits the Sele poplars from the Ticino hybrids, but not from the remaining reference samples (Sardinian and Northern Italian clones). Thus, this intermingled group does not reflect the geographic origins of the species growing in the Italian peninsula and on the island of Sardinia, suggesting that the white poplar population of the Sele is highly variable and consistently different only with respect to the Ticino river representatives. Here, it should be mentioned that the Ticino white poplars, as in the case of other populations present in the European river basins (Lexer et al., 2005; van Loo et al., 2008), are not genetically pure individuals; indeed, they show different levels of introgression towards P. tremula, as extensively demonstrated by Fossati et al. (2004), Lexer et al. (2010) and Castiglione et al. (2010). Based on such considerations, the very high value of genetic diversity found for the P. alba populations analyzed so far can be explained not only by the high degree of polymorphism typical of the surveyed SSR loci, but also by the reproductive features of poplar as a dioecious species. In fact, poplar is strictly outcrossing, and both pollen and seeds can be moved over large distances by wind, facilitating their dispersal even among geographically well-separated populations (Castiglione et al., 2010).
To estimate the number of SNPs within the MT gene family, we used an innovative approach that proved to be extremely rapid and also cost-effective. In fact, we made use of the information already available "in house" about the genetic biodiversity of the river Sele white poplar population [previously estimated by means of nSSRs (Innac, 2009)], so that the five most genetically divergent specimens were selected on the basis of the Jaccard dissimilarity index graphically plotted on a UPGMA dendrogram (data not shown). The purified DNA of the five chosen poplar genotypes was PCR-amplified and sequenced to identify the SNPs present along the six MT genes. SNPs were detected in all MT gene sequences of the five chosen white poplar specimens. The only exception was MT2b, where no SNP was detected. This observation corroborated our phylogenetic sequence data, which revealed a very low number of variable sites in an MT2b multialignment comprising 11 different nucleotide sequences belonging to different poplar species. Interestingly, an uneven distribution of SNPs was observed among the diverse MT-gene isoforms, as in the specific case of MT1a and MT3a (see below), suggesting that the occurrence of SNPs varies among the different members of the MT multigene family. These observations suggest a different role of the MT genes in metal homeostasis and in response to different stimuli and stresses (Berta et al., 2009; Castiglione et al., 2007; Cicatelli et al., 2010; 2011). The study on the MT genes of the P. alba Sele population showed an average of 7 SNPs and 0.9 SNPs per 100 bp in the MT1a and MT3a genes, respectively. In comparison to other Populus spp., SNP frequencies were substantially greater than those observed in: P. tremula [one SNP in every 60 bp and one in 208 bp; (Ingvarsson, 2005; Ingvarsson, 2008)]; P. nigra [one SNP in every 26 bp over the nine sequenced genes; (Chu et al., 2009)]; and P. balsamifera [one SNP in 520 bp; (Breen et al., 2009)]. Nevertheless, a quite similar nucleotide variability at the intraspecific level (7.0%) was observed by Fladung and Buschbom (2009) over 3,221 bp corresponding to the sequences of six genes that are very important for both plant growth and metabolism, and that in addition proved to be suitable molecular tools for estimating genetic and phylogenetic relationships among poplar species.
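The Jaccard dissimilarity index used above to rank the specimens can be illustrated with a minimal Python sketch, assuming that each individual is represented by the set of nSSR alleles it carries (presence/absence data). The function names, the sample labels and the allele labels are hypothetical; the UPGMA clustering step and the actual selection of the five clones are not reproduced.

```python
from itertools import combinations


def jaccard_dissimilarity(alleles_a, alleles_b):
    """1 - |A intersect B| / |A union B| for two sets of allele labels."""
    a, b = set(alleles_a), set(alleles_b)
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def pairwise_dissimilarities(profiles):
    """Map each unordered pair of sample names to its Jaccard dissimilarity."""
    return {
        (x, y): jaccard_dissimilarity(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical presence/absence profiles ("locus:allele size"), not real data
profiles = {
    "clone_1": {"locusA:180", "locusA:184", "locusB:210"},
    "clone_2": {"locusA:180", "locusB:210", "locusB:214"},
    "clone_3": {"locusA:190", "locusB:220"},
}
for pair, d in pairwise_dissimilarities(profiles).items():
    print(pair, round(d, 2))
```

The resulting pairwise matrix is the kind of input from which a dendrogram can be built and the most divergent individuals picked out.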

Conclusions
In the light of the data presented here and of the above considerations, we can state that the results of this study on the MT multigene family are sufficient to resolve phylogenetic relationships among poplar species belonging to different sections: in fact, phylogenetic trees based on the MT supergene can be considered more reliable than those obtained by other authors using several types of molecular markers, such as rDNA, ITS and AFLPs. Moreover, the SNPs identified in different specimens of the white poplar population of the Sele river call for further investigation in other poplar species and populations to validate our findings, aimed particularly at the possible use of poplar for heavy metal phytoremediation.
Cluster (b), which encompasses all the remaining poplar species, is further sub-divided into two smaller and morphologically coherent groups: (b.1) a sub-group that contains the Chinese white poplars, namely P. alba var. pyramidalis and P. tomentosa; and (b.2) another cluster that corresponds to all trembling poplars under study (P. tremula, P. adenopoda and P. davidiana), plus the two Chinese species whose classification was uncertain (P. serrata and P. bonatii).

Fig. 1. Maximum likelihood reconstruction, based on the "Classical ML method", of the nucleotide supergene encompassing the CDS of all six members of the MT multigene family. Bootstrap values are shown above each node when greater than 50; unresolved nodes are represented as collapsed. (T): Tacamahaca section; (A): Aigeros section; IW: Italian white poplar; CW: Chinese white poplar; TP: trembling poplars. Nodes corresponding to the Populus section and to the Trepidae sub-section are indicated with a red and a blue circle, respectively. Chinese species are starred. The cluster names (a, b, b.1 and b.2) are those explained in the text. Sequences belonging to S. matsudana were used as outgroup.

SSR analysis of the P. alba natural population. The number of alleles per chloroplast (Cp) SSR locus varied from three to five, with an average of four (Tab. 4). Combining the data of the three Cp-SSR loci, eight different chloroplast haplotypes were detected in the P. alba population of the Sele river. Nei's (1973) gene diversity index per locus (h) varied from 0.447 to 0.566 (average of 0.512), which indicates a moderately high level of variability (Tab. 4).

Fig. 2. Two-dimensional plot of the PCA performed on a combined data set comprising 71 poplar samples, i.e. 63 individuals from the Sele population (Southern Italy) plus eight reference poplars. The poplars from the river Sele are indicated with a number, whilst the reference samples were from the Sardinia (b33SS and b4SS) or Ticino (Ticino-#1, etc.) natural populations, as well as from a collection of Northern Italy white poplar clones selected for pronounced heavy metal tolerance (AL22, AL35).

Table 1. The 11 analyzed poplar species. (T) and (A) refer to the sub-sections Trepidae and Albidae, respectively.

Table 2. Number (No.) of total and of invariable sites detected in the coding sequences of the MT-genes, based on a dataset of the 11 poplar species listed in Table 1. Statistics referring to nucleotide sequences are on the left half of the table, whilst those referring to translated sequences are on the right half.

Table 3. Substitution types found in the coding sequences of the MT genes belonging to the 11 poplar species listed in Table 1. Syn: synonymous substitution; non-syn: non-synonymous substitution.

Table 4. Genetic biodiversity, at the chloroplast level, of the P. alba natural population (63 individuals) of the Sele river. Allelic frequency, number of alleles per locus and gene diversity index per locus (h) are reported.

Table 5. Genetic diversity, at the nuclear level, of the P. alba natural population (63 individuals) of the Sele river. Allelic frequency, number of alleles per locus, expected (He) and observed (Ho) heterozygosity, and FIS values are reported.