GPV, registered cultivars, PDO, and PGI products for the 20 most economically important crops in Italy.
A total of 90 original articles concerning the varietal characterization and identification by means of SSR analysis of the five most economically relevant crops in Italy (i.e., Olea europaea L., Solanum lycopersicum L., Vitis vinifera L., Triticum spp. and Malus × domestica Borkh.) have been selected and reviewed. Since the genetic traceability of processed products may be more complex, wine and olive oil have been considered too. Specifically, this chapter deals with three main aspects: (i) the criteria adopted for the selection of the most appropriate number, type, and distribution of SSR marker loci to be employed for varietal genotyping, (ii) the use of genetic statistics and parameters for the evaluation of the discriminant ability and applicability of SSR marker loci, and (iii) how to make different experimental works on the same species standardized, reliable, and comparable. What emerges from the studies reviewed here is a lack of consensus among the authors regarding the strategy to design and adopt for genotyping plant varieties with SSR markers. This finding highlights the urgent need to establish a common procedure, especially for characterizing and preserving landraces, and for supporting their rediscovery and valorization locally.
- DNA genotyping
- plant varieties
- genetic traceability
- food labeling
1. The Italian agriculture scenery and the utility of SSR markers to develop a reference method for genotyping plant varieties
The Food and Agriculture Organization (FAO) indices of agricultural production describe the relative level of the aggregate volume of agricultural production for each year in comparison with the base period 2004–2006. According to the most recent data available in the Food and Agriculture Organization Corporate Statistical Database, the gross value of the total Italian agricultural production was equal to $41.9 billion, about €32.7 billion. It is worth noting that 20 products contribute to over 50% of gross production value (GPV), as shown in Table 1.
|Crop plants||Value of agricultural production (USD, 10^6)||Registered cultivars||PDO and PGI|
|Olives (table and oil)||5064.24||644 ||3|
|Grapes (table and wine)||2770.60||638 ||3|
|Wheat (durum, common, spelt)||2558.23||489 ||0|
|Rice, paddy||750.28||194 ||3|
|Peaches and nectarines||570.69||311 ||4|
|Carrots and Turnips||355.33||8 ||2|
|Cauliflowers and Broccoli||314.07||41 ||0|
|Hazelnuts (with shell)||247.36||25 ||3|
On average, each species is characterized by dozens or hundreds of cultivars and, as defined in Article 2 of the International Code of Nomenclature for Cultivated Plants, a “cultivar is an assemblage of plants that has been selected for a particular character or combination of characters, that is distinct, uniform, and stable in those characters, and that when propagated by appropriate means, retains those characters” . If some cultivars are virtually ubiquitous, some others are associated with specific geographical contexts and often provide the basis for the establishment of protected designation of origin (PDO) and protected geographical indication (PGI) products (Table 1).
It is not a coincidence that Italy, with its 268 brand products, including 106 PGI, 160 PDO, and 2 traditional specialty guaranteed (TSG) labels, is the European leader in terms of certified productions and that 20% of them arise from the 20 crops listed in Table 1. As a whole, the Italian certified products reach around 500 units, including two important derivatives such as olive oil and wine (Table 2). It is worth noting that the wine GPV (Table 2) is four times higher than the grape GPV and slightly less than half of the total GPV shown in Table 1, which suggests that producing food derivatives can be more profitable than selling raw products.
One of the main problems that needs to be addressed is the lack of a uniform, complete, and updated register of cultivars. For some species, such as cereals and vegetables, official registers are already available from the Ministry of Agricultural, Food and Forestry Policies (MIPAAF, National register of agricultural varieties and National register of horticultural varieties). For fruit trees, on the contrary, there is no register yet, although Article 7 of the Italian Legislative Decree no. 124/2010 has established a “National Register of fruit trees varieties”. For this reason, the inventory of cultivars of some fruit species is still ongoing and there is a total lack of official data for some of them (see, for instance, orange, lemon, and mandarin, Table 1). Moreover, separate registers exist for species of particular interest (see, for example, Olea europaea L. and Vitis vinifera L.).
In the past, cultivars have been extensively characterized by morphological traits, including plant, leaf, fruit, and seed characteristics. Since objectivity is crucial for accurate morphological typing, relying exclusively on morphological descriptors for plant cultivars is limiting, especially because most morphological traits are influenced by environmental factors. Several cases of misidentification, owing to classifications carried out using only morphological traits, are reported in the scientific literature for a wide range of vegetable crops [12, 13, 14] and fruit trees [15, 16, 17, 18]. Moreover, the uneven distribution, simultaneous cultivation of local varieties, ambiguous names, continuous interchange of plant materials among varieties and/or farmers of different regions and countries, possibility of the cultivation of varietal clones, and uncertainty of varietal certification in nurseries have complicated the identification of genotypes [19, 20, 21]. At the same time, cultivar and clone identity is also very important for protecting plant breeders’ rights, not only for commercial seeds but also for processed materials and food derivatives, and especially for safeguarding final consumers. Another important aspect to highlight is the need to ensure that each specific variety grown by farmers, and the food product bought by consumers, is the one declared on the label. This is especially true if the product is sold in a processed or transformed form (thus difficult to recognize phenotypically) and/or if the product is subjected to a form of certification (PDO or PGI). In a modern market, it is crucial to be able to identify agricultural products and foodstuffs by means of reliable traceability systems, including genetic molecular markers.
The method of DNA genotyping based on microsatellite markers represents an efficient, reliable, and suitable technique that is able to complement the information provided by morphological traits and that has been extensively used for the characterization of plant varieties [22, 23, 24] and the certification of food products [25, 26, 27].
Microsatellites (or simple sequence repeats (SSRs)) are PCR-based molecular markers valued for their abundant and uniform genome coverage, high levels of polymorphism information content as a consequence of their marked mutation rates, and other valuable qualities such as codominant inheritance of DNA amplicons/alleles and the small amount of DNA required for amplification. A unique pair of primers defines each SSR marker locus; as a consequence, the exchange of molecular information among laboratories is easy and allows individuals to be uniquely genotyped in a reproducible way.
SSR markers have repeatedly been shown to be one of the most powerful marker methodologies for genetic studies in many crop species. In fact, since they are multiallelic, chromosome-specific, and well distributed in the genome, microsatellite markers have already been used for mapping genes with Mendelian inheritance, for identifying quantitative trait loci (QTLs), and for molecular marker-assisted selection. In many species, microsatellite markers have also been used for ascertaining the genetic purity of seed lots, as well as for assessing the capability to protect the intellectual property of plant varieties. These markers are also largely used for assessing the genetic diversity and relationships among populations and lines, and for identifying crop varieties.
The advantages of SSRs over single-nucleotide polymorphisms (SNPs), another co-dominant marker system increasingly exploited in breeding programs, include relative ease of transfer between closely related species [35, 36] and high allelic diversity [37, 38]. On the other hand, SSRs have some limits compared with SNPs: the development phase is quite long and expensive for multilocus assays, and the throughput is relatively low because of drawbacks in automation and output data management. Recently, progress in the development of multilocus assays has been made in several directions, suggesting that SSR markers remain relevant molecular tools, at least for specific applications and genetic studies. In fact, PCR-based SSR genotyping has rapidly evolved in plants, and methods for the simultaneous amplification of multiple marker loci coupled to semi-automated detection systems have been developed. The identification and selection of SSR markers have become cheaper and faster thanks to the emergence of next-generation sequencing technologies. Moreover, multiplexing specific combinations of microsatellite markers has become much easier, and the availability of capillary electrophoresis equipment relying on automated laser-induced fluorescence DNA technology has facilitated the adoption and exploitation of this methodology in applied breeding programs [41, 42, 43].
Genotypic characterization through SSR loci analysis represents a molecular tool applicable to all species and able to support the phenotypic observation in order to characterize and describe a cultivated variety as well as to define its uniformity, distinctiveness, and stability (DUS testing). At the same time, SSR markers are largely used for the genetic identification of varieties and the authentication and traceability of their foodstuffs [44, 45, 46].
The main goal of this work is to provide an updated and detailed description of the applications of SSR markers for varietal characterization and identification, reviewing the state of the art of genotyping in the most economically relevant Italian crop plants and food products: Olea europaea L., Solanum lycopersicum L., Vitis vinifera L., Triticum spp., and Malus × domestica Borkh., wine and olive oil. In this respect, the chapter aims to assess the real achievements of different genotyping analyses, to evaluate the strengths and limitations according to applied research studies, and to emphasize the striking lack of data related to the applications of SSR technology. Through the careful investigation and evaluation of a large number of scientific papers, our review highlights some critical aspects of the use of microsatellite markers and formulates recommendations for standardizing the strategies and methods for ascertaining the genetic identity of plant varieties and for achieving the genetic traceability of their food derivatives. Here, we focus on three main aspects: (i) how to choose and use SSR markers, (ii) which parameters/indices to calculate for the genetic characterization of plant materials, and (iii) how to establish a standardized way to make SSR data from different works on the same species comparable.
2. Applications of SSR markers for the genetic characterization of crop plant varieties
Some of the most economically important crops in Italy have been chosen for this study, and the search has been focused on their varietal characterization through SSR analysis. In particular, olive (Olea europaea L.), grape (Vitis vinifera L.), and apple (Malus × domestica Borkh.) were reviewed among the fruit trees, whereas wheat (Triticum spp.) and tomato (Solanum lycopersicum L.) were selected as representative of cereals and vegetables, respectively. A large number of commercial cultivars are available for each of these species, and the annual Italian GPV for these crops is about 18 billion Euro . Moreover, scientific articles dealing with the genetic identification in wines and olive oils were also evaluated because these two derivatives contribute to the annual Italian GPV for another 15 billion Euro .
Although passport data, morphological, and agronomical descriptors have been collected, data are not informative enough to assess the numerous cases of misidentification, mislabeling, homonymies, and synonymies as well as voluntary or accidental frauds . With regard to this, several research groups characterized and identified cultivars using SSR markers (Table 3).
|Olive (Olea europaea L.)||[23, 25, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]|
|Tomato (Solanum lycopersicum L.)||[27, 44, 64, 65, 66, 67, 68, 69, 70]|
|Grape (Vitis vinifera L.)||[15, 19, 24, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]|
|Wheat (Triticum spp.)||[22, 26, 90, 91, 92, 93, 94, 95, 96, 97, 98]|
|Apple (Malus × domestica Borkh.)||[99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111]|
|Wine||[45, 112, 113, 114, 115, 116]|
|Olive oil||[46, 58, 117, 118, 119, 120, 121, 122, 123, 124, 125]|
Article searches were performed using the three most popular sources of scientific information: Scopus, Web of Science, and Google Scholar. PubMed was excluded from the queried datasets because it focuses mainly on medicine and biomedical sciences and also because Google Scholar already includes its index. A total of 90 articles based on SSR genotyping analysis were selected from the international literature of the last 15 years, covering all the plant species/food products taken as reference list. Only articles dating from 2000 onward were reviewed, assuming that research published earlier would have lost its steering effect on the activities of plant DNA genotyping, given that the development of new and large marker datasets, and of technologically advanced and automated protocols, has been very fast in the last 15 years.
3. What number and how to select a panel of SSR marker loci according to their linkage map position and polymorphism information content
More than 800 SSR markers have been developed in apple (Malus × domestica Borkh., 2n = 2 × = 34), and nearly all of them have been mapped on a consensus map produced starting from five different genetic maps . These markers are distributed across all 17 linkage groups, with an average of 49 microsatellites per linkage group. Moreover, the genome database for Rosaceae  is a long-standing community database resource providing hundreds of microsatellite loci, in most cases accompanied by a wealth of information about map position, repeat motifs, primers, PCR conditions, amplicon length, and publication source. A discriminatory set of markers should ensure the uniform distribution across the genome of the microsatellite loci to represent adequately each linkage group and, thus, the genome in its entirety . In fact, assessing the genetic diversity by focusing only on restricted regions of the genome may threaten to distort results. Nevertheless, neglecting the most ambitious study on Malus × domestica Borkh. carried out by Patocchi et al.  using an extremely high number of SSR markers (82), the number of selected and analyzed genomic loci varies from 4 to 19 with an average value of 12 ± 6 SSR markers, less than a microsatellite locus per linkage group. Extending this reasoning to the other crops reviewed, the emerging output is often the same: for all the plant species, very detailed genetic maps are available [129, 130, 131, 132] as well as dedicated databases for SSR markers (Table 4).
|Species||Genome size (Gb)||Ploidy||SSR available (SSR database)||SSR employed (mean ± st.dev)||No. of reference cultivars||No. of reference SSRs|
|Olea europaea L.||1.42–2.28 ||2n = 2× = 46||12 (OLEA Database) ||11 ± 5||21 , 17 ||11 , 8 |
|Solanum lycopersicum L.||0.90–0.95||2n = 2× = 24||146,602 (Tomato microsatellite database)||14 ± 7||n.a.||n.a.|
|66,823 (Tomato genomic resources database) |
|21,100 (Tomato: Kazusa Marker Database) |
|Vitis vinifera L.||0.48 ||2n = 2× = 38||56 (Grape microsatellite collection) ||15 ± 11||49 ||6 , 38 |
|443 (Italian Vitis Database) |
|6 (The European Vitis Database) |
|Triticum spp.||12.3–13.00 (T. durum Desf) ||2n = 4× = 28||588 (Wheat microsatellite consortium) ||18 ± 3||n.a.||46 |
|16.50–17.00 (T. aestivum L.) ||2n = 6× = 42||21 ± 6|
|Malus × domestica Borkh.||0.75 ||2n = 2x = 34||664 (HiDRAS SSR database) ||12 ± 6||7 ||12 , 15 |
|2449 (Genome database for Rosaceae) |
Olea europaea L. (2n = 2× = 46) includes 23 chromosome pairs and the average number of microsatellite markers used in the reviewed articles is 11 ± 5, much less than one microsatellite locus per linkage group. The same is also true for Vitis vinifera L. (2n = 2× = 38), in which the average number of microsatellite markers explored for genotyping cultivars is 15 ± 11 in spite of the 19 chromosome pairs of this species. Even the varietal identification of their respective derivatives (olive oil and wine) has been accomplished by exploring, on average, 8 ± 3 and 10 ± 4 SSR markers, respectively. On the contrary, in wheat, the varieties of both Triticum durum Desf. (2n = 4× = 28) and Triticum aestivum L. (2n = 6× = 42) have been characterized by genotyping, on average, 18 ± 3 and 21 ± 6 microsatellite loci, respectively, that is, at least one microsatellite per linkage group. This latter choice is perhaps associated with the high complexity and large size of the Triticum aestivum L. genome, approximately equal to 17 Gb/1C. In fact, for a correct representation of the entire genome, not only the number of homologous chromosomes but also their size (i.e., total amount of DNA) should be considered when choosing the optimal panel of microsatellite loci to be investigated. Finally, in tomato (Solanum lycopersicum L., 2n = 2× = 24), the average number of SSR markers employed for genotyping varieties is 14 ± 7 (Table 4).
Only a few studies [65, 74, 96, 106] evaluated the position within linkage groups of the microsatellites selected: the choice often falls on SSR markers with unknown or unspecified position, or mapped on few chromosomes, thus resulting in a poor representation of the entire genome. In this regard, the results from Cipriani et al. and van Treuren et al. represent a good model for the choice of molecular markers to investigate the genetic diversity in germplasm collections and to solve synonymy/homonymy cases as well as paternity and kinship issues. The former group selected microsatellite sequences from scaffolds anchored to the 19 linkage groups of Vitis vinifera L. with the aim of analyzing 38 well-distributed SSR markers, ideally two loci for each linkage group, whereas the latter group also considered the specific map position of the markers and their genetic association with traits of agricultural interest.
Two important issues must be pointed out. First, the number of SSRs to employ should also be evaluated according to the type of analysis. For example, the EU-Project Genres CT96 No81 selected six highly discriminating microsatellites, thus less than one marker per linkage group, that could be sufficient to differentiate among hundreds of grape cultivars. The same microsatellite set could be very inadequate to discriminate among clones. Second, it is worth noting that, in some cases, increasing the number of marker loci does not necessarily mean improving the resolution of cultivar characterization and identification. For example, Baric et al. reported that extending the set of microsatellite markers from an initial 14 SSR loci to 48 did not improve the genetic discrimination among the 28 accessions of Malus × domestica Borkh. analyzed.
Connected to the distribution and position of the microsatellite loci within a genome, there is also the possibility to choose between genomic SSR (gSSR) and EST-derived SSR (EST-SSR). Generally, EST-SSR markers are less polymorphic than genomic SSR ones, as reported for Triticum spp. [93, 95] and Solanum lycopersicum L., because the former are found in selectively more constrained regions of the genome. Of particular interest is the comparison by Leigh et al. between sets of 20 EST-SSR and 12 genomic SSR markers in terms of discrimination ability among 66 varieties of Triticum spp. The results indicate that the panel of EST-derived SSR markers used is slightly less efficient at discriminating between hexaploid Triticum aestivum L. varieties compared with the panel of genomic SSR markers. EST-SSR markers also have the disadvantage that amplicon sizes can differ from expectations, as a consequence of the undetected presence of introns in flanking regions. Nevertheless, these findings support the possibility that EST-SSR markers could in the near future complement and outnumber the genomic SSR markers. In fact, EST-SSR markers should have some important advantages over genomic SSR markers. In particular, they are easily obtained by bioinformatic querying of EST databases, while the development phase of genomic SSR markers is quite long and expensive; EST-SSR markers could also be functionally more informative than genomic SSR markers because they are associated with the transcribed regions of the genome, thus reflecting the genetic diversity inside or adjacent to the genes. Moreover, the rate at which SSR flanking regions evolve is lower in expressed than in nonexpressed sequences, and the primers designed on these sequences are more likely to be conserved across species, thus resulting in high levels of SSR transferability.
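The coverage argument above reduces to simple arithmetic, which can be sketched as follows. This is only an illustration: it assumes that the number of linkage groups equals the haploid chromosome number of each species, and the dictionary `CROPS` and the function name are hypothetical labels introduced here. The figures themselves are the mean numbers of SSR loci quoted in the text and in Table 4.

```python
# Haploid chromosome number (assumed equal to the number of linkage groups)
# and mean number of SSR loci per study, as quoted in the text and Table 4.
CROPS = {
    "Olea europaea": (23, 11),
    "Vitis vinifera": (19, 15),
    "Triticum durum": (14, 18),
    "Triticum aestivum": (21, 21),
    "Solanum lycopersicum": (12, 14),
    "Malus x domestica": (17, 12),
}


def loci_per_linkage_group(n_linkage_groups, mean_loci):
    """Average number of SSR loci available to represent one linkage group."""
    return mean_loci / n_linkage_groups


coverage = {
    name: loci_per_linkage_group(n_groups, mean_loci)
    for name, (n_groups, mean_loci) in CROPS.items()
}

# Print the species from the least to the best covered genome.
for name, ratio in sorted(coverage.items(), key=lambda item: item[1]):
    print(f"{name:22s} {ratio:.2f} loci per linkage group")
```

A ratio below 1 means that, on average, some linkage groups cannot be represented at all. The ratio says nothing about how the chosen loci are actually distributed, nor about chromosome size, which the text identifies as a further criterion.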
A suitable combination of EST-SSR and genomic-SSR markers could be optimal for distinctiveness, uniformity, and stability testing applications for crop plant varieties . Overall, the vast majority of studies are based on genomic SSR markers, and only three articles out of 90 take into account the possibility of employing EST-SSR markers.
In terms of location, nuclear SSR (nSSR) markers are more widely used than plastidial and mitochondrial SSR (cpSSR and mtSSR, respectively) markers. First, the development phase of extranuclear SSR markers is complicated: high-purity chloroplast or mitochondrial DNA is typically very hard to extract because of nuclear DNA contamination. Moreover, Wolfe et al. have shown, comparing nuclear, chloroplast, and mitochondrial genomes, that the rate of silent nucleotide substitution in the chloroplast genome was half that of the nuclear genome and three times that of the mitochondrial genome, indicating that the evolution of the mitochondrial genome has been slower and implying lower levels of polymorphism. Nevertheless, the use of markers belonging to mitochondrial or chloroplast sequences may be useful due to their haploid nature, relative abundance, and stability in comparison with nuclear sequences. For instance, Borgo et al. suggested that the circular form increases stability and resistance against heat disintegration. Boccacci et al. analyzed must and wine samples using a set of nine nSSR and seven cpSSR markers in order to identify cultivars. Findings from these studies confirm a low level of polymorphism for the extranuclear markers due to their lower frequency of mutation. Baleiras-Couto and Eiras-Dias and Pérez-Jiménez et al. have also exploited this kind of SSR markers, with similar results.
The choice of the number of SSR loci usually depends on their polymorphism degree. With some exceptions for which this information is not available, the average number of marker alleles per SSR locus is equal to 7.1 for Olea europaea L., 3.5 for Solanum lycopersicum L., 8.2 for Vitis vinifera L., 6.9 for Triticum spp., 9.4 for Malus × domestica Borkh., 6.5 for olive oil, and 5.2 for wine. Both EST-SSR and cpSSR were found to be less polymorphic, with a low average number of alleles per locus, than genomic SSR markers [45, 68, 93, 113, 125]. The polymorphism degree may depend on several factors, including the SSR motif length and the SSR localization on coding or not-coding regions.
In order to estimate the level of genetic diversity detected by each microsatellite, marker allele frequencies are widely used to estimate the polymorphism information content (PIC, Table 5) values, according to the method of Botstein et al. The authors reported the following formula for the calculation of the PIC value of a marker locus with n alleles:
$$PIC = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 p_i^2 p_j^2$$
where pi and pj are the population frequencies of the ith and jth marker alleles, respectively. A marker with PIC > 0.5 is considered highly informative, one with 0.5 > PIC > 0.25 informative, and one with PIC < 0.25 slightly informative. As reported by Nagy et al., PIC can be defined as the probability that the marker genotype of a given offspring will allow deduction, in the absence of crossing-over, of which of the two marker alleles of the affected parent it received. In other words, this parameter is a modification of the heterozygosity measure that subtracts from the H value an additional probability that an individual in a linkage analysis does not contribute information to the study. On this aspect, there is no full agreement among the authors. Some studies on olive oil [58, 122] and Malus × domestica Borkh. [101, 103], referring to Anderson et al., contend that the occurrence of rare marker alleles has less impact than common marker alleles on the PIC estimates and consider that this index can be assimilated to the expected heterozygosity (He), calculated by the following simplified formula:
$$H_e = 1 - \sum_{i=1}^{n} p_i^2$$
where pi is the population frequency of the ith marker allele.
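As a minimal sketch of how the two indices differ in practice, the snippet below computes He = 1 − Σ pi² and the Botstein PIC (He minus the term Σ Σ 2 pi² pj² over all allele pairs) from a list of allele frequencies. The function names and the example frequencies are hypothetical and are not taken from any of the reviewed studies.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) over the allele frequencies of one locus."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Botstein PIC: He minus sum over allele pairs of 2 * p_i^2 * p_j^2."""
    correction = sum(2.0 * (pi * pj) ** 2 for pi, pj in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - correction


def informativeness(pic_value):
    """Classes used in the text: > 0.5, between 0.25 and 0.5, below 0.25."""
    if pic_value > 0.5:
        return "highly informative"
    if pic_value > 0.25:
        return "informative"
    return "slightly informative"


# Hypothetical locus with three alleles (frequencies must sum to 1).
freqs = [0.5, 0.3, 0.2]
print(f"He  = {expected_heterozygosity(freqs):.4f}")
print(f"PIC = {pic(freqs):.4f} ({informativeness(pic(freqs))})")
```

Because the subtracted term is never negative, PIC cannot exceed He; treating the two as interchangeable therefore tends to overstate informativeness, most visibly for loci with few alleles.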
|Index||Full name||Formula||Definition||No. of papers reporting it|
|PIC*||Polymorphism Information Content||$1 - \sum_i p_i^2 - \sum_i \sum_{j>i} 2 p_i^2 p_j^2$||Probability that the marker genotype of a given offspring will allow deduction, in the absence of crossing-over, of which of the two marker alleles of the affected parent it received||36|
|PD**||Power of Discrimination||$1 - \sum_i p_i^2$||Probability that two randomly sampled accessions would be differentiated by their marker allele profiles||14|
|C**||Confusion probability||$\prod_i (1 - PD_i)$||Probability that any two individuals are identical in their genotypes at all SSR loci by chance alone||3|
|PI*||Probability of Identity||$\sum_i p_i^4 + \sum_i \sum_{j>i} (2 p_i p_j)^2$||Probability that two individuals drawn at random from a population will have the same genotype at one marker locus||21|
|PIt*||Total probability of identity||$\prod_i PI_i$||Probability of two individuals sharing the same marker genotype by chance||2|
In addition to the PIC value, calculated taking into account allelic frequencies, there are several indexes focusing on genotype frequencies. For example, as reported by Aranzana et al., two other important indexes that should be evaluated are the power of discrimination (usually PD), also called the diversity index (D) by Zulini et al. and Martínez et al., and the confusion probability (C). The first one provides an estimate of the probability that two randomly sampled accessions of the study would be differentiated by their marker allele profiles:
$$PD = 1 - \sum_{i} p_i^2$$
where pi is the frequency of the ith marker genotype. As already described for the PIC, among the authors, there are different interpretations and procedures to calculate the PD index. Pasqualone et al.  in their study on Olea europaea L. genotyping reported that “the power of discrimination, sometimes referred to as polymorphism information content, or diversity index, was calculated […],” assuming in this way that PD and PIC correspond to the same parameter.
The confusion probability (C) index, also defined as the combined power of discrimination over all loci, is the probability that any two cultivars are identical in their genotypes at all SSR loci by chance alone and it depends on PD. It can be estimated as follows:
$$C = \prod_{i} (1 - PD_i)$$
where PDi is the power of discrimination value of the ith locus. Notwithstanding its informativeness, only three articles of the 90 reviewed take into account this value (Table 5). Martínez et al., in their attempt to assess the genetic diversity of Vitis vinifera L. varieties, calculated the power of discrimination index as follows:
$$PD = 1 - C, \qquad C = \sum_{i} p_i^2$$
where pi is the frequency of different marker genotypes for a given locus. In this case, C is the probability of coincidence, corresponding to the probability that two varieties match by chance at one locus.
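The genotype-based indices can be sketched in the same way. The snippet below assumes the usual forms PD = 1 − Σ pi² (with pi the frequency of the ith genotype at a locus) and C = Π (1 − PDi) across loci; the accession genotypes, given as pairs of allele sizes in base pairs, are invented purely for illustration.

```python
from collections import Counter
from math import prod


def power_of_discrimination(genotypes):
    """PD = 1 - sum(p_i^2), with p_i the frequency of the i-th genotype.

    Each genotype is a pair of allele sizes; sorting makes (184, 180)
    and (180, 184) count as the same genotype.
    """
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def confusion_probability(pd_values):
    """C = product over loci of (1 - PD_i)."""
    return prod(1.0 - pd for pd in pd_values)


# Hypothetical genotypes of four accessions at two SSR loci.
locus_a = [(180, 184), (184, 180), (184, 190), (180, 190)]
locus_b = [(210, 210), (210, 214), (210, 210), (214, 218)]

pd_values = [power_of_discrimination(locus_a), power_of_discrimination(locus_b)]
print("PD per locus:", pd_values)
print("Combined C  :", confusion_probability(pd_values))
```

Multiplying across loci assumes that the loci are independent, which is one more reason to prefer markers that are well spread over different linkage groups.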
A total of 21 articles, mainly focused on the species Vitis vinifera L. and oil from Olea europaea L., also report the probability of identity (PI) index of each single SSR marker locus, either in addition to or in place of the PD value (Table 5). This index can be estimated as follows:
$$PI = \sum_{i} p_i^4 + \sum_{i} \sum_{j>i} (2 p_i p_j)^2$$
where pi and pj are the frequencies of ith and jth marker alleles, respectively. It represents the probability that two individuals drawn at random from a population will have the same genotype at one marker locus. For example, Vietina et al.  and Corrado et al.  in their studies, regarding the genetic traceability of monovarietal olive oils, refer to this value in order to determine the efficacy of the SSR marker pool to discriminate among the cultivars. Martínez et al.  adopted the following formula to calculate the same value:
Equally interesting is the total probability of identity (PIt), which represents a compound probability defined as the probability of two cultivars sharing the same marker genotype by chance and is calculated as follows:
$$PI_t = \prod_{i} PI_i$$
where PIi is the probability of identity value of the ith marker locus.
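A short sketch of both quantities, assuming the per-locus form listed in Table 5 (Σ pi⁴ for homozygous matches plus Σ Σ (2 pi pj)² for heterozygous matches) and the product over loci for PIt. The allele frequencies are hypothetical.

```python
from itertools import combinations
from math import prod


def probability_of_identity(freqs):
    """PI at one locus: sum(p_i^4) + sum over allele pairs of (2 p_i p_j)^2."""
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum((2.0 * pi * pj) ** 2 for pi, pj in combinations(freqs, 2))
    return homozygous + heterozygous


def total_probability_of_identity(loci):
    """PIt: product of the per-locus PI values (loci assumed independent)."""
    return prod(probability_of_identity(freqs) for freqs in loci)


# Hypothetical allele frequencies at two loci.
loci = [[0.5, 0.5], [0.5, 0.3, 0.2]]
for index, freqs in enumerate(loci, start=1):
    print(f"PI locus {index} = {probability_of_identity(freqs):.4f}")
print(f"PIt = {total_probability_of_identity(loci):.6f}")
```

Each added locus multiplies PIt by a value below 1, so the combined probability drops quickly. The formula assumes random mating, so for clonally propagated cultivars the result is better read as an optimistic bound than as an exact probability.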
Finally, Qanbari et al. reported that PD and PI are complementary parameters:
$$PD = 1 - PI$$
The use of standardized parameters is essential to make SSR data comparable across species and laboratories, and it can be especially beneficial for the preliminary evaluation of the discriminant ability and applicability of SSR marker loci.
4. The choice of the best microsatellite motifs and the problem of the null alleles
Microsatellite repeat units typically vary from one to six bases. The shortest motifs (mono- or dinucleotide repeats) usually have a high number of alleles, and they allow packing more loci on a given separation system, resulting in larger multiplexes. However, this kind of SSR motif can be difficult to assay accurately. It is very common to observe stuttering in terms of multiple bands or peaks, a phenomenon commonly caused by slippage of the DNA polymerase, but the main problem arises when there is a difference of one or two base pairs between marker alleles: in the case of homozygous loci, the electrophoretic analysis results in one main band or peak, but with heterozygous loci very often one of the two marker alleles is masked by the stutter. SSR markers containing trinucleotide or higher order repeats usually eliminate this technical problem because target sequences appear to be significantly less prone to slippage. Nevertheless, microsatellite loci with long motifs are known to be less polymorphic and, in some cases, owing to the lack of stutter bands or peaks, it is not always possible to distinguish SSR amplicons from other nonspecific PCR products, which may lead to an overestimation of the level of polymorphism of these loci.
Among the 90 studies we surveyed, only 25 specify the length of the SSR motifs employed and very few justify the choice. Cipriani et al. performed two distinct molecular analyses on the same set of cultivars, using the genetic profiles obtained from two sets of microsatellites, the dinucleotide repeats on one side and the tri-, tetra-, and pentanucleotide repeats on the other, with the aim of comparing their performance in the discrimination of the genotypes analyzed. Both microsatellite data sets produced identical consensus tree topology, but the authors underlined that dinucleotide SSR markers scored a higher number of alleles per locus and, consequently, a potentially higher power for identifying and distinguishing closely related genotypes. On the other hand, the microsatellite dataset based on tri-, tetra-, and pentanucleotide SSR markers proved to have the advantage of ease of scoring, while maintaining a very high power of discrimination for successful genotyping of the Vitis vinifera L. cultivars.
Microsatellites have also been classified according to the type of repeat sequence as perfect or imperfect, depending on the occurrence of simple or uneven repeats, respectively. Preference should be given to perfect motifs because, with imperfect ones, fragment length is no longer equivalent to amplicon sequence, and hence several sequences can correspond to a given length variant. This is why only four of the 25 studies that specify the motifs employed imperfect SSRs.
The occurrence of null alleles is something to avoid when using SSR markers for genotyping plant materials. A microsatellite null allele is any marker allele at a genomic locus that consistently fails to amplify by the polymerase chain reaction, resulting in the lack of detectable amplicons. At a heterozygous locus, the lack of an amplified fragment prevents detection of the heterozygote, which would be scored as a homozygote. In the same way, null alleles at homozygous loci are characterized by a complete lack of amplification, with the consequent production of missing data. On the whole, null alleles may interfere with the genetic identification of cultivars by wrongly reducing the apparent genetic diversity among accessions. Of the 90 studies surveyed, only 38 estimated the probability of null alleles, mainly using the formula of Brookfield:
$$ r = \frac{H_e - H_o}{1 + H_e} $$
where $H_e$ is the expected heterozygosity and $H_o$ is the observed heterozygosity.
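The null-allele estimate can be illustrated with a minimal Python sketch. It assumes that the formula referred to above is Brookfield's first estimator, r = (He − Ho)/(1 + He); the function name and the heterozygosity values are hypothetical and serve only as an illustration.

```python
def brookfield_null_allele_frequency(h_exp: float, h_obs: float) -> float:
    """Null-allele frequency r = (He - Ho) / (1 + He).

    Assumption: this is Brookfield's first estimator; the reviewed
    studies may have used other variants.
    """
    if not (0.0 <= h_exp <= 1.0 and 0.0 <= h_obs <= 1.0):
        raise ValueError("heterozygosities must lie in [0, 1]")
    return (h_exp - h_obs) / (1.0 + h_exp)


# Hypothetical locus: expected heterozygosity 0.80, observed 0.62.
r = brookfield_null_allele_frequency(h_exp=0.80, h_obs=0.62)
print(round(r, 3))  # 0.1
```

A positive value suggests a heterozygote deficit compatible with null alleles, whereas a value close to zero or negative gives no such indication.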
5. Comparisons across studies of SSR-based genotyping: Reference marker sets and reference plant varieties
In most cases, it is impossible to make valid comparisons across studies on the same species, since different sets of SSR loci are used in different laboratories. For some species, the choice of microsatellites is becoming fairly uniform (Table 4). For instance, almost all of the studies aimed at genotyping Olea europaea L. cultivars make use of SSR markers belonging to four main datasets developed by Sefc et al., Carriero et al., Cipriani et al., and de La Rosa et al. Based on these studies, two informal universal sets of SSR markers were proposed for genotyping Olea europaea L. cultivars by Doveri et al. and Baldoni et al. Cipriani et al. suggested a list of 38 markers with excellent peak quality, high power of discrimination, and uniform genome distribution (1–3 markers/chromosome) for genotyping Vitis vinifera L. cultivars. Li et al. assembled a reference kit of 46 microsatellites for genetic analysis in Triticum spp. Moriya et al. developed a set of 15 microsatellites for genotyping Malus × domestica Borkh. cultivars. Not only independent research works but also some international programs and projects have pursued this goal. The European Cooperative Programme for Plant Genetic Resources (ECPGR) has recommended a new set of 12 SSR marker loci distributed across different linkage groups of the Malus × domestica Borkh. genome, organized in three multiplexes and designed for a four-dye system. Comparable considerations have been presented within two projects, one focused on the conservation and characterization of grapevine genetic resources (EU-project GENRES CT96 No 81) and the other on the Traceability of Origin and Authenticity of Olive Oil (Oliv-Track). It is worth noting that, to the best of our knowledge, no reference SSR set has yet been proposed for Solanum lycopersicum L.
Unfortunately, establishing a reference set of microsatellite markers to use in each analysis for a given species is not sufficient to ensure comparability among different studies and reproducibility among different laboratories. Some tests have been carried out to investigate the reproducibility of SSR data produced by different laboratories under varying local conditions. Four different laboratories performed independent marker analyses on a common set of 21 DNA samples of Olea europaea L. cultivars with the same set of SSR markers, using different DNA polymerase enzymes, PCR cycling conditions, amplicon separation, and visualization methods. The results are not encouraging. Many cases of allele dropout and discrepancies in allele length, up to five nucleotides for identical microsatellite loci, were recorded. This finding is probably attributable to a combination of different equipment, different sequencers, and different internal ladders, which may have affected the relative mobility estimates, leading to noncomparable electropherograms. Similar results were obtained by ten laboratories distributed in seven countries that analyzed the same 46 Vitis vinifera L. cultivars at the same six SSR loci.
One of the main findings is that the specific microsatellite sequence dramatically influences the efficiency of the analysis. Marmiroli et al. showed that the repeatability of results among different laboratories was good enough for some microsatellites but rather low for others, confirming that the choice of SSR loci and of their primers is crucial for an efficient analysis.
Despite all the precautions and the establishment of a reference set of SSR markers, some residual variation in laboratory equipment and procedures cannot be completely avoided, and representative reference material with many different alleles should be adopted by all laboratories involved in a genotyping program for a given species. For this purpose, 21 out of 90 studies included reference cultivars, either proposing new ones or exploiting cultivars already used as references in previous works. Independent research groups and international institutions are trying to agree on lists of reference accessions, both to prevent each group from using its own reference cultivars and to standardize all the work performed on these species. For example, the ECPGR has chosen eight Malus × domestica Borkh. cultivars as the reference set for this species. Baldoni et al. and Doveri et al. proposed two different lists of reference cultivars for Olea europaea L. (Table 4).
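How shared reference cultivars can make allele calls portable between laboratories is sketched below in Python. This is only an illustrative scheme under a strong assumption, namely that the discrepancy at a given locus is a constant size shift, estimated here as the median difference between a laboratory's calls and the agreed sizes for the reference cultivars. It is not a procedure taken from the reviewed studies, and all cultivar names and allele sizes are hypothetical.

```python
from statistics import median


def locus_offset(lab_ref_calls, agreed_ref_calls):
    """Median shift (bp) between a lab's calls and the agreed allele
    sizes of the reference cultivars at one locus."""
    diffs = []
    for cultivar, agreed in agreed_ref_calls.items():
        lab = lab_ref_calls[cultivar]
        diffs.extend(l - a for l, a in zip(sorted(lab), sorted(agreed)))
    return median(diffs)


def harmonize(calls, offset):
    """Shift a lab's allele calls onto the agreed size scale."""
    return {
        name: tuple(sorted(round(size - offset) for size in alleles))
        for name, alleles in calls.items()
    }


# Hypothetical data for a single locus (allele sizes in bp).
agreed_refs = {"RefA": (180, 192), "RefB": (184, 184)}
lab_refs = {"RefA": (182, 194), "RefB": (186, 186)}
lab_samples = {"Unknown1": (182, 190)}

offset = locus_offset(lab_refs, agreed_refs)
print(harmonize(lab_samples, offset))  # {'Unknown1': (180, 188)}
```

A constant shift cannot correct allele dropout or size-dependent mobility differences, which is one reason why reference material carrying many different alleles is recommended.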
Although this approach is fully applicable to the crop derivatives considered here (olive oil and wine), some additional aspects must be taken into account when dealing with processed products. First, it is sometimes very difficult to perform SSR marker analyses on food products and beverages because of the low DNA quantity and the lack of DNA integrity. For example, Baleiras-Couto and Eiras-Dias reported difficulties in analyzing wines after about eight months of fermentation, and Recupero et al. highlighted technical problems during the isolation of genomic DNA from Nebbiolo wine. Nevertheless, both groups managed to characterize must. For olive oil, Martins-Lopes et al., as well as Vietina et al., took advantage of extraction methods able to give a good yield of genomic DNA suitable for PCR amplification. It is therefore evident that an optimized DNA extraction method is also a crucial step in carrying out a reliable study on the applicability of molecular markers for identifying the varietal origin or assessing the varietal composition of crop plant derivatives.
Matching the genetic profiles of crop plants with those of their derivatives is not trivial, and there are some contrasting points of view in this regard. In the review of Agrimonti et al., it is reported that several authors (e.g., [46, 118, 120]) have noticed a satisfactory conformity between olive oil and leaf profiles with SSR markers. On the contrary, Doveri et al. have proposed a cautionary note about the use of SSR markers, stressing the imperfect concordance between the molecular genetic profiles of the olive oil and the original leaf sample. Furthermore, it is necessary to underline the extreme difficulty of characterizing multivarietal derivatives through SSR analysis. Most of the Italian PDO wines and olive oils are produced by blending two or more cultivars in percentages strictly defined in the production regulation. In these cases, each SSR locus is represented by the combination of the marker alleles of each variety. For example, Baleiras-Couto and Eiras-Dias, after analyzing different two-variety musts at different percentages with six SSR markers, reported results that confirm the complexity and difficulty of assessing multiple genotypes.
The genetic characterization of plant varieties by means of multilocus genotyping through SSR markers in the main crop species is still not based on standardized protocols, which makes the acquisition of reproducible and transferable datasets difficult. What emerges from the analysis of the literature is a lack of wider consensus among the authors regarding the strategy to design and adopt for genotyping plant varieties with SSR markers. This finding highlights the urgent need to establish a common procedure.
Some conclusions of general validity can be drawn on the basis of the articles reviewed here. First of all, it is quite difficult to define exactly the ideal number of microsatellite loci to assay. Usually, the number of SSR markers depends on the type and goal of the analysis. If the purpose is merely to distinguish among two or more cultivars (i.e., individual genotypes), it is possible to adopt an “as simple as possible” strategy. For example, a novel approach called the cultivar identification diagram (CID) strategy has been recently developed. This method was designed so that, at each step, a polymorphic marker generated from each PCR analysis directly allows the separation of cultivar samples. In this specific study, eight is considered the minimum number of SSR markers necessary to distinguish 60 cultivars of Malus × domestica Borkh. Presumably, the number of SSR markers depends on the number of cultivars to distinguish, on their relatedness, and on the degree of polymorphism of each marker locus. In this regard, we suggest AMaCAID and UPIC, two very interesting tools that enable the investigation of the minimum number of markers required to distinguish a specific number of accessions and, thus, the identification of the best marker combination that maximizes the genetic information.
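The idea of searching for a small marker combination that still separates all accessions can be illustrated with a simple greedy sketch in Python. This is not the algorithm implemented in AMaCAID, UPIC, or the CID strategy. It is a generic heuristic, not guaranteed to find the true minimum, and the cultivar names, marker names, and genotypes are hypothetical.

```python
from itertools import combinations


def greedy_marker_set(profiles):
    """Pick markers one at a time, each time taking the marker that
    separates the most cultivar pairs still indistinguishable.

    profiles: {cultivar: {marker: genotype as a sorted tuple of sizes}}
    Returns (chosen markers, pairs left unresolved).
    """
    cultivars = list(profiles)
    markers = list(next(iter(profiles.values())))
    unresolved = set(combinations(cultivars, 2))
    chosen = []
    while unresolved:
        def gain(marker):
            return sum(profiles[a][marker] != profiles[b][marker]
                       for a, b in unresolved)

        candidates = [m for m in markers if m not in chosen]
        best = max(candidates, key=gain, default=None)
        if best is None or gain(best) == 0:
            break  # remaining pairs cannot be separated by any marker
        chosen.append(best)
        unresolved = {(a, b) for a, b in unresolved
                      if profiles[a][best] == profiles[b][best]}
    return chosen, unresolved


# Hypothetical genotypes (allele sizes in bp) at three SSR loci.
profiles = {
    "C1": {"M1": (100, 104), "M2": (200, 200), "M3": (150, 152)},
    "C2": {"M1": (100, 104), "M2": (200, 204), "M3": (150, 152)},
    "C3": {"M1": (102, 104), "M2": (200, 200), "M3": (150, 152)},
    "C4": {"M1": (102, 104), "M2": (200, 204), "M3": (150, 156)},
}
chosen, unresolved = greedy_marker_set(profiles)
print(chosen, unresolved)  # ['M1', 'M2'] set()
```

In this toy example, two of the three markers are enough to separate the four cultivars, whereas closely related accessions would require more loci.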
When the purpose is to genetically characterize a cultivar in order to fulfill the requirements of a varietal register that could include hundreds or thousands of different varieties, the selection of SSR markers should be oriented to an exhaustive representation of the genome as a whole. This is why different authors consider one or two microsatellites per linkage group to be the minimum number required to reconstruct a reliable and selectable genotype for a given plant accession. For instance, Cipriani et al. implemented an efficient method for Vitis vinifera L. fingerprinting using a set of 38 microsatellite marker loci scattered throughout the genome. In particular, two SSR loci were carefully chosen, on average, for each linkage group, selecting the best ones in terms of polymorphism information content (PIC) and power of discrimination (PD, Figure 1).
It is worth noting that, although some international programs and projects have attempted to establish reference SSR sets, there is still a lack of wider consensus. For instance, in 2003, the partners of the EU-project GENRES CT96 No 81 agreed on the use of six highly polymorphic SSR markers for the identification of Vitis vinifera L. cultivars, but, since then, several studies have continued to be performed using a higher number of markers [74, 76, 78, 84, 86]. As reported by Cipriani et al., grape varieties selected in Western Europe, which account for most of the worldwide production of wine, likely have extensive coancestry, that is, a common origin from the hybridization of a few ancestors. Because of this, using too few markers for fingerprinting could hamper the discrimination of sibling varieties. For this reason, they recommend using at least 19 markers (among the 38 markers employed in their work). In general, the following criteria should be followed when selecting the panel of SSR markers. Based on previous works, the SSR marker loci with the highest number of marker alleles and the highest PIC and PD scores should have priority. In addition, the position of the SSR markers across the genome, as mapped in different linkage groups and associated with adjacent chromosome blocks, is crucial in order to obtain a representative multilocus marker genotype. In fact, microsatellites retrieved from noncoding regions (genomic SSR markers) meet this requirement more precisely than those derived from expressed regions (EST-SSR markers). Nevertheless, the application of EST-SSR markers cannot be excluded when phylogenetic relationships have to be investigated. It is well known that SSR markers belonging to coding regions may be functionally more informative than those deriving from noncoding ones, because they are associated with transcribed regions of the genome and thus reflect the genetic diversity within genes or adjacent to genes. Moreover, association with trait loci showing Mendelian inheritance is particularly desirable when markers are needed for marker-assisted selection (MAS).
As for the localization of target microsatellites in the cellular genomes, nuclear SSR (nSSR) markers seem to be more polymorphic than plastidial and mitochondrial ones (cpSSR and mtSSR markers). Because of their codominance, the former are the only markers useful for assessing the genetic value of breeding stocks, whereas the abundance and the haploid nature of the latter make them particularly suitable for phylogenetic and genetic diversity studies.
As far as the microsatellite repeat is concerned, the most recommended motifs are dinucleotide and trinucleotide repeats, whereas mononucleotide repeats require caution because of the technical drawbacks that can be experienced in allele discrimination. SSR markers with tetranucleotide or longer repeats display a polymorphism inversely proportional to the complexity of the motif. The so-called perfect SSR markers are preferred because they are easier to score. It is also worth emphasizing that the choice of SSR markers depends on the occurrence of null alleles at a given locus and on its informativeness in terms of allele diversity indices. First, any occurrence of null alleles can lead to an underestimation of heterozygosity and affect the reliability of the analysis. Second, the calculation of informative indices should not be underrated, as it represents a crucial step in the planning of any analysis. What emerges from the 90 studies reviewed here is a lack of wider consensus among the authors regarding the most informative index to calculate, which makes comparison difficult even among studies performed on the same cultivars and with the same markers. The power of discrimination (PD), the confusion probability (C), the polymorphism information content (PIC), the probability of identity (PI), the total probability of identity (PIt), and the probability of null alleles (r) are all parameters able to describe exhaustively the efficiency of the set of SSR markers used in a given species.
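Some of these indices can be computed per locus as in the following minimal Python sketch. It assumes commonly used definitions, which may differ from the variants adopted in individual studies: PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², PD = 1 − Σg_k² (with g_k the genotype frequencies), and PI = Σp_i⁴ + Σ_{i<j}(2p_ip_j)². The function name and the genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_indices(genotypes):
    """Diversity indices for one SSR locus.

    genotypes: list of diploid genotypes, each a pair of allele sizes.
    The formulas used are assumptions (common textbook definitions).
    """
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    p = [c / (2 * n) for c in allele_counts.values()]  # allele frequencies
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    g_freq = [c / n for c in genotype_counts.values()]  # genotype frequencies

    sum_p2 = sum(x ** 2 for x in p)
    he = 1 - sum_p2                                 # expected heterozygosity
    ho = sum(a != b for a, b in genotypes) / n      # observed heterozygosity
    pic = he - sum(2 * x ** 2 * y ** 2 for x, y in combinations(p, 2))
    pd = 1 - sum(x ** 2 for x in g_freq)            # power of discrimination
    pi = (sum(x ** 4 for x in p)
          + sum((2 * x * y) ** 2 for x, y in combinations(p, 2)))
    return {"He": he, "Ho": ho, "PIC": pic, "PD": pd, "PI": pi}


# Hypothetical genotypes of four accessions at one locus (sizes in bp).
example = [(100, 104), (100, 100), (104, 108), (100, 104)]
for name, value in locus_indices(example).items():
    print(f"{name}: {value:.3f}")
```

Reporting the same indices, computed with the same definitions, is what would allow marker sets to be compared across studies.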
In conclusion, there is an urgent need to establish a common procedure for SSR genotyping, with a universal set of marker loci to be analyzed in each species. In parallel, reference varieties must be defined for each species in order to maximize not only the reproducibility but also the portability of marker data, while remaining aware that the residual variation in laboratory procedures and equipment cannot be completely avoided.