A genome is written in four chemical letters (nucleotides designated A, T, G, and C). Combinations of these letters along a stretch of DNA give each gene sequence, cell type, and individual genotype its specificity and uniqueness. Remarkably, each triplet of these letters (a codon of the genetic code) corresponds to one of 20 amino acids or to a stop signal. The “spelling” order of nucleotides thus encodes the specific protein sequences that determine a life function. Variations in these four‐letter DNA sequences therefore underlie the differentiation of living organisms on Earth and have generated biological diversity. Because the genetic code is degenerate, some spelling changes in coding parts of a sequence (exons) are silent and still code for the same amino acid, although such changes may play an evolutionary role and contribute to the biodiversity of populations. Other changes, however, may generate novel proteins with new functions and characteristics, stop gene function and protein synthesis, or produce a partial protein sequence that is insufficient for functional activity; all of these alter the function of the cell and generate a difference.
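The effect of degeneracy described above can be illustrated with a short sketch. The Python snippet below assumes the standard genetic code; the function names (translate, classify_change) and the example sequences are hypothetical and serve only to show how a single‐letter change can be silent, change an amino acid, or introduce a premature stop.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding DNA string codon by codon ('*' marks a stop)."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_change(reference, variant):
    """Classify a same-length coding change by its effect on the protein."""
    ref_protein, var_protein = translate(reference), translate(variant)
    if ref_protein == var_protein:
        return "silent"      # degeneracy: same amino acids despite the change
    if var_protein.count("*") > ref_protein.count("*"):
        return "nonsense"    # a new stop codon truncates the protein
    return "missense"        # at least one amino acid is replaced


reference = "ATGGAAGGTTGG"                         # Met-Glu-Gly-Trp
print(classify_change(reference, "ATGGAGGGTTGG"))  # GAA -> GAG, both Glu
print(classify_change(reference, "ATGGATGGTTGG"))  # GAA -> GAT, Glu -> Asp
print(classify_change(reference, "ATGTAAGGTTGG"))  # GAA -> TAA, stop codon
```

The three calls correspond to the three outcomes named in the text: a change with no effect on the amino acid, a change that yields an altered protein, and a change that yields only a partial protein.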
The pattern of DNA variation among individuals, which generates both beneficial and diseased phenotypes, is very useful for differentiating organisms at the molecular level and for understanding the evolutionary path of important genes as well as their functional and adaptive roles in different eco‐geographic environments. One abundant class of such spelling variations in a genome is the 5‐ to 50‐fold tandem repetition of two to six nucleotide base pair (bp) motifs, such as (GA)n or (GTACGT)n, which are called microsatellites [1–4]. These tandem repeats are also referred to as simple sequence length polymorphisms (SSLP), simple sequence repeats (SSR), or short tandem repeats (STR); researchers use these terms interchangeably. Microsatellites are abundant in all prokaryotic and eukaryotic genomes.
Litt and Luty first coined the term “microsatellite” in 1989; the word “satellite” refers to the fact that density gradient centrifugation separates DNA fragments with repetitive sequences into an upper, less dense “satellite” fraction. As genetic markers, microsatellites have been widely used in DNA‐based genetic analyses for the past 25 years. Since the first paper by Litt and Luty in 1989, as of October 2016,
2. Definition, occurrence, types, distribution, and density
Variable number tandem repeats (VNTRs) include both mini‐ and microsatellite DNAs. Minisatellites are heterogeneous arrays of 10–60/100 bp core repeat motifs, such as (GGGCAGGTNG)n, with a total array size of 1–15 kilobases (kb). In contrast, microsatellites commonly consist of homogeneous arrays of mono‐, di‐, tri‐, tetra‐, penta‐, and hexanucleotide core motifs with a total array size of less than or around 1 kb. The boundary is debated: some reports place all repeat arrays with core motifs shorter than 9 bp in the microsatellite category and those with core motifs above 9 bp in the minisatellite group [1–3, 6]. However, the suggested core motif length for microsatellites is 2–6 bp. Microsatellites are non‐randomly distributed throughout the genome and vary widely among genomic regions and among taxa. They are abundant in non‐coding parts of the genome such as introns, untranslated regions (UTRs), and intergenic spaces, but they also occur in coding exonic sequences. Microsatellites are also located within transposons and other dispersed repetitive elements [1–3, 6, 7].
The density of microsatellites is considered to be highest in UTRs, followed in decreasing order by promoters, introns, intergenic regions, and coding sequences. Microsatellite repeat lengths in coding, non‐coding, and intergenic regions are reported to be species specific. For instance, vertebrates (e.g., turtles) generally tend to have more and longer microsatellite arrays than plants, followed by invertebrates. Although this varies from organism to organism and may not hold for all genomes, trinucleotide motifs are commonly more frequent than other types, reaching 61–73% in plants, except in Arabidopsis and potato (>30%). Dinucleotide repeat microsatellites are more frequent in mammals (31%) and rodents (33%) than in plants (on average ∼24%), although Arabidopsis and potato have >30 and 50% dinucleotide repeats, respectively. Plants have a lower frequency of GA dinucleotide repeat SSRs, while animal genomes contain these repeats abundantly. GC dinucleotide arrays are less frequent in coding sequences, but GC‐rich trinucleotide arrays occur frequently in exons, while AT‐rich trinucleotides are evenly distributed throughout all genomic regions. For example, ATT repeats are abundant in the introns of most organisms, although some genomes, such as those of rodents, tend to have abundant AAG repeats in introns. Generally, ACG, ACC, and ACT repeat SSRs are rare in all organisms.
Interestingly, tri‐ and hexanucleotides predominate in coding regions, which is explained as the result of selection against changes that would alter the reading frame. However, expansions of such triplet microsatellites can cause harmful phenotypes such as Fragile X syndrome (FXS) and Huntington's disease. Tetranucleotide motif SSRs are located predominantly in noncoding regions, with an abundance of AT‐rich motifs. Mammals have more tetranucleotide microsatellites than other organisms. Similarly, pentanucleotide motif microsatellites are more abundant in the intronic regions of animal genomes than in bacteria and plants.
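The reading‐frame argument for tri‐ and hexanucleotide repeats comes down to simple arithmetic: a gain or loss of repeat units keeps the frame only when the number of inserted or deleted bases is a multiple of three. The following minimal Python sketch illustrates this; the function name shifts_reading_frame is hypothetical.

```python
def shifts_reading_frame(motif_length, repeat_units_changed):
    """Return True if gaining or losing repeat units shifts the reading frame.

    motif_length: length of the repeat motif in bp (e.g., 2 for (GA)n).
    repeat_units_changed: repeat units gained (+) or lost (-).
    """
    bases_changed = motif_length * abs(repeat_units_changed)
    return bases_changed % 3 != 0


# Tri- and hexanucleotide motifs never shift the frame...
print(shifts_reading_frame(3, 1), shifts_reading_frame(6, -2))
# ...whereas a single di- or tetranucleotide unit does.
print(shifts_reading_frame(2, 1), shifts_reading_frame(4, 1))
```

Note that frame preservation does not make a change harmless: the triplet expansions behind FXS and Huntington's disease keep the reading frame intact.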
According to their repeat motif pattern, microsatellites can be classified as (1) perfect, with a continuous repeat of a single motif; (2) imperfect, with a base pair disruption between repeats; (3) interrupted, with the insertion of a stretch of a few nucleotides within the repeats; or (4) composite, with multiple SSR motif repeat types; these classes vary among taxa. The density of SSRs also varies among taxa, with about one SSR per 6.04 kb in Arabidopsis, one per 6–7 kb in mammals, possibly one per less than 1 kb in puffer fish, or up to one per 212–292 kb in hexaploid wheat. Microsatellites can be genomic, i.e., developed from genomic DNA (gSSRs), or expressed, referred to as EST‐SSRs, which are derived from expressed sequence tags (ESTs) [2, 9]. EST‐SSRs are powerful markers because they are associated with expressed genes that directly contribute to a phenotype. In plants, SSRs can also be classified as nuclear SSRs (nuSSRs) if they occur in nuclear DNA and chloroplast SSRs (cpSSRs) if they occur in chloroplast DNA. cpSSR loci were first introduced by Powell et al. [11, 12] in 1995 as useful genetic markers with broad applications in plants, in particular for measuring cytoplasmic diversity and introgression in plant species, although there are some limitations, highlighted below.
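Density figures such as “one SSR per 6.04 kb” are obtained by dividing the length of the screened sequence by the number of SSR loci detected. The sketch below shows this calculation in Python; the function names and the input numbers are hypothetical, chosen only so that the result matches the order of magnitude quoted above, and are not taken from any genome survey.

```python
def kb_per_ssr(sequence_length_bp, ssr_count):
    """Average spacing between SSR loci, in kb per SSR."""
    if ssr_count <= 0:
        raise ValueError("ssr_count must be positive")
    return sequence_length_bp / ssr_count / 1000.0


def ssr_per_mb(sequence_length_bp, ssr_count):
    """The same density expressed as SSR loci per megabase."""
    if sequence_length_bp <= 0:
        raise ValueError("sequence_length_bp must be positive")
    return ssr_count / (sequence_length_bp / 1_000_000.0)


# Illustrative (made-up) inputs: 120.8 Mb screened, 20,000 SSR loci found.
print(kb_per_ssr(120_800_000, 20_000))   # spacing in kb per SSR
print(ssr_per_mb(120_800_000, 20_000))   # loci per Mb
```

Because reported densities depend on the minimum repeat number and motif lengths used in the search, values from different studies are comparable only when the search criteria match.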
3. Origin, evolution, and mutation mechanisms and rates
Through sequence mutation, the genesis of a microsatellite locus can start from a mutated site with a minimum of eight nucleotide repeats or de novo from points without repeat motifs, leading to the formation of “proto‐microsatellite/SSR” sites. Some reports discuss minisatellites as potential progenitors of SSRs, while others suggest that transposons contribute to the birth of SSRs, although this is not evident in birds and plants. In subsequent rounds of DNA replication, “proto‐SSR” sites can expand through errors caused by DNA polymerase strand slippage [2, 3, 6, 7]. Moreover, under the “transposon‐mediated” model of microsatellite birth, SSRs can arise through transposon movement, exemplified by the origin of
Replication slippage is considered the main mechanism [2, 3, 6, 7] of microsatellite genesis and of repeat expansion or reduction, and it generates the variability that contributes to the molecular evolution of microsatellites. Besides replication‐associated slippage, microsatellites can arise during transcription‐coupled DNA repair and/or repair of double‐stranded breaks, where repetitive sequences are preferentially used to fill the gaps. Further, insertions and deletions (indels) or single nucleotide substitutions, which occur at increased rates [13, 14], can generate new repeat arrays in microsatellites. For instance, a comparative analysis of human and primate genomes revealed that changes in repeat number in short microsatellites mostly result from single nucleotide polymorphism (SNP) mutations rather than slippage.
Microsatellite mutations can simultaneously change one, two, or more repeat units, giving mutation rates of 10⁻² to 10⁻⁶ per microsatellite locus per generation, which are higher than those of other mutation types such as point mutations, at approximately 10⁻⁹ per nucleotide per generation across the entire genome in eukaryotes. The base position relative to the microsatellite, genomic location, repeat type and number, base identity, flanking sequence, rates of recombination and transcription, and heterozygosity of microsatellite alleles greatly affect microsatellite mutation rates [6, 16]. In particular, microsatellites in noncoding regions tend to mutate more frequently than those in coding regions, and changes in perfect repeats can generate new SSRs. In addition, recombination between SSR alleles of unequal length can increase SSR instability during meiosis [15, 16]; dinucleotide repeats mutate more frequently than tri‐ and tetranucleotide arrays, and longer and purer repeats can mutate at a higher rate than shorter repeats with low purity. The SSR mutation rate is species and gender specific; for example, human males have a higher SSR mutation rate (0–7 × 10⁻³ per locus per gamete per generation) than females.
The rates of repeat motif expansion and contraction are also species specific. For example, repeat expansion mutations are faster in humans than in chimpanzees, and there is a loss of two repeat units per mutation in yeast compared to a loss of 1.4 repeats in Drosophila. Repeat expansion mutations are predominantly found in primates, while repeat contractions predominate in bacteria. Longer repeat arrays in SSRs are considered to be of recent origin, and long‐repeat SSRs are biased toward repeat expansion mutations. Statistically, the patterns of variation in SSR loci can be studied and predicted using the “Stepwise Mutation Model” (SMM) or its two‐step modification and the “Infinite Allele Model” (IAM).
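To make the Stepwise Mutation Model concrete, the following Python sketch simulates one SSR allele over successive generations under a strict SMM, in which each mutation adds or removes exactly one repeat unit. Several choices here are simplifying assumptions made only for illustration: expansion and contraction are taken to be equally likely, the mutation rate is constant, and the function name simulate_smm, its parameters, and the lower bound of one repeat are all hypothetical. Neither the two‐step modification nor the IAM is modeled.

```python
import random


def simulate_smm(start_repeats, generations, mutation_rate,
                 seed=0, min_repeats=1):
    """Simulate repeat number at one SSR locus under a strict SMM.

    Returns the repeat count at generation 0, 1, ..., generations.
    """
    rng = random.Random(seed)      # local generator for reproducible runs
    repeats = start_repeats
    history = [repeats]
    for _ in range(generations):
        if rng.random() < mutation_rate:
            step = rng.choice((-1, 1))        # assumed symmetric +/- 1 unit
            if repeats + step < min_repeats:  # an array cannot drop below 1
                step = 1
            repeats += step
        history.append(repeats)
    return history


# An exaggerated mutation rate is used so that changes are visible in a
# short run; rates of 10^-2 to 10^-6 would need far more generations.
trajectory = simulate_smm(start_repeats=20, generations=30, mutation_rate=0.5)
print(trajectory)
```

Under this model, two alleles of equal length need not share ancestry, because a contraction can undo an earlier expansion; this is the origin of the size homoplasy problem noted for SSR markers.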
4. Biological function
Because repetitive DNAs were commonly regarded as nonfunctional “junk,” tandem repeat microsatellites were long considered neutral elements of a genome without a distinct biological function, despite numerous observations that microsatellite mutations can lead to many diseased phenotypes and change the function of proteins. The occurrence of microsatellites in coding and regulatory gene regions (as well as in introns and intergenic regions) supports a biological function for microsatellites in processes such as (1) gene expression, including transcription and translation; (2) gene silencing; (3) alternative splicing and mRNA transport; (4) chromatin organization; and (5) regulation of the cell cycle [2, 6]. The involvement of microsatellite repeat motifs in these key processes of cell life not only changes cell phenotypes and causes disease and unwanted traits but also determines the evolutionary fate, survival, plasticity, and adaptation of organisms in changing and potentially harmful environments [2, 3, 6]. The discovery that SSRs co‐localize with pre‐microRNAs and that CUU repeat number influences the loop size of pri‐microRNAs in orange plants [6, 18], together with the involvement of certain r(CGG)‐derived microRNAs such as miR-fmr1s in FXS pathogenesis, demonstrates a possible role for microsatellites in many developmental processes regulated by microRNAs.
There are many examples of distinctive phenotypic changes that are directly associated with increases or decreases in microsatellite repeat arrays. For instance, more than 40 neurological diseases in humans, such as FXS and spinocerebellar ataxia (SCA1) with polyglutamine tracts, are caused by length changes in trinucleotide microsatellite arrays. Microsatellite repeat changes also determine morphological features, for example, repeat expansion of microsatellite stretches in the
5. Utility of microsatellites as genetic markers
As genetic markers, microsatellites can be applied to a wide range of tasks. These include the construction of genetic linkage groups and integrated maps; correlation of phenotypic and genotypic variations using quantitative trait locus (QTL) and/or linkage disequilibrium (LD)‐based association mapping approaches; analyses of parentage and/or ancestry; DNA barcoding for plant varieties and germplasm; evaluation of gene flow and variety/seed purity; breeding using marker‐assisted selection tools; estimation of genetic diversity, phylogeography, conservation and restoration of biodiversity, molecular evolution, taxonomy, and phylogenetic features of biological species; detection of genetic structure of native plant populations and crop germplasm, origin and domestication of crop species, migration, demographic process, population differentiation and kinship; assessment of impacts of mutagenic contaminants; and application in forensics and disease diagnostics [1, 2, 31].
5.1. Marker development
Microsatellites are polymerase chain reaction (PCR)‐based markers and require prior knowledge of the sequence structure before they can be used as genetic markers. There are two ways to develop SSR markers: (1) the genome of interest, or part of it, is sequenced and then screened for microsatellite repeat arrays; or (2) previously sequenced genome databases can be mined using a variety of
When sequences are generated de novo or are available in genome databases at the National Center for Biotechnology Information (NCBI), the most important step is to efficiently screen for microsatellite‐containing sequences and design markers. For this purpose, many SSR array searching algorithms are available, such as Tandem Repeats Finder (TRF), MIcroSAtellite identification tool (MISA), SSRFinder, and PALFinder [1, 2]. In addition, there are several web server‐based online tools such as CID and WebSat. Each of these bioinformatics tools has its advantages and disadvantages, addresses different aspects of microsatellite mining and marker development, and can be chosen according to study objectives, expertise, and availability. MISA and Phobos are among the software recommended for efficient screening of microsatellite repeats in DNA sequences. Further, a list of many other useful bioinformatics resources for the genetic analysis of microsatellite data is available.
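The core idea behind screening a sequence for SSR arrays can be sketched in a few lines of Python using a regular expression. This is a minimal illustration only and does not reproduce the algorithms of TRF, MISA, Phobos, or any other tool named above: it finds only perfect repeats of 2–6 bp motifs, the threshold of five repeat units is an assumption taken from the definition given earlier in this chapter, and the function name find_perfect_ssrs and the test sequence are hypothetical.

```python
import re


def find_perfect_ssrs(sequence, min_repeats=5):
    """Find perfect SSRs with 2-6 bp motifs in a DNA string.

    Returns a list of (start_index, motif, repeat_count) tuples,
    with start_index counted from 0.
    """
    # Group 1 captures the shortest motif of 2-6 bases; the backreference
    # \1 then requires at least (min_repeats - 1) further tandem copies.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip mononucleotide runs such as AAAAAAAAAA
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# Made-up sequence containing (GA)8 and (GTACGT)5 between short spacers.
test_sequence = "CCGT" + "GA" * 8 + "TTCA" + "GTACGT" * 5 + "AC"
for start, motif, count in find_perfect_ssrs(test_sequence):
    print(start, motif, count)
```

Dedicated tools go well beyond this sketch: they handle imperfect, interrupted, and composite repeats, report flanking sequence for primer design, and scale to whole genomes.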
Among all types of molecular markers, microsatellites have been the markers of choice for the past three decades because they are PCR based; abundant and dispersed throughout a genome; highly mutable, polymorphic, and informative; co-dominant, multi-allelic, and suitable for detecting heterozygotes; experimentally reproducible; transferable among related taxa; cost‐effective and easy to detect; amplifiable from low‐quality and low‐quantity DNA; and presumably neutral. In addition, microsatellites are particularly useful for constructing genetic maps of large genomes when a reference genome is absent. They are favored markers for small‐scale genetic studies with limited budgets: they can potentially capture substantial genetic information and physiological parameters of a genome, do not require high marker density, especially if the LD blocks of a genome are long, and allow additional samples to be included in a project without significant cost. Microsatellites can also be used for testing non‐neutrality and can be scored automatically with fluorescent dye‐based detection in multiplexed genotyping for large-scale studies, which helps to cut the time and cost of a study. The uniparental cytoplasmic inheritance of cpSSRs, with presumably no recombination history, provides a further advantage for developing universal primers to genotype and genetically analyze distantly related plant taxa, although there are some limitations, too (see below). Importantly, EST‐SSR markers developed from coding genes can be a great tool for directly tagging and map‐based cloning of functionally meaningful “candidate genes” through genotype‐to‐phenotype correlations in genetic mapping studies.
There are also various concerns and caveats in using microsatellites. These include, but are not limited to, (1) the need for a priori genomic sequence information, which is not available for most prokaryotes and can be costly and time consuming to obtain; (2) PCR failure due to point mutations in primer binding sites, which results in ‘null’ alleles and misleading results when PCR primers are applied across different species, or due to environmental degradation of long repeat arrays; (3) PCR stutter of short SSR arrays, giving multiple bands from a single locus; (4) an abundance of rare, private, or minor alleles; (5) difficulty in assigning alleles for multiple‐band SSRs in the absence of correct parentage and pedigree information; and (6) size homoplasy, heteroplasmy, and cytoplasmic introgression (in particular with cpSSRs) due to back mutations during replication slippage [1, 2, 3, 9, 12]. All these complicate and bias downstream genetic analyses, inflate F‐statistics or
However, none of these negates the usefulness of SSR markers; rather, they call for attention from researchers using this marker system. Several approaches can address these caveats when SSRs are used: verification of size homoplasy, heteroplasmy, and primer‐site point mutations through additional cloning and re‐sequencing, including NGS; exclusion of problematic, rare, and private alleles from analyses, depending on the specific objective (e.g., haplotype networks, because of the high potential for repeated evolution); genotyping more samples with more markers rather than drawing conclusions from a few samples and a small number of markers; and reanalysis of the results with, or in combination with, other types of DNA markers such as RFLPs and SNPs.
5.4. Future utilization
The successful and wide application of SNP markers for genome‐wide studies over the past 5 years, together with the emergence of cost‐effective, large‐scale sequencing and of SNP detection and genotyping methodologies such as NGS, NGS‐based genotyping by sequencing (GBS), and restriction site–associated DNA sequencing (RAD‐Seq), has driven a rapid shift from SSR‐based molecular marker studies to SNP‐based research. This was evidenced by a sharp decrease in the number of publications using microsatellites in crop species during 2010–2015. However, as discussed and highlighted by Hodel et al., microsatellite markers will continue to be useful and favorable markers. This is because (1) not all studies require the in‐depth genotyping provided by NGS‐based approaches, and for these SSRs remain a suitable choice; and (2) sample size can be greatly expanded without significant cost when SSRs are used, which is costly with NGS‐based approaches. Further, (3) including additional large samples can increase the power of microsatellite‐based studies, which then perform similarly to SNP‐based studies; (4) existing SSR marker data can be readily incorporated into and used with new studies; (5) the multi‐allelic nature of SSRs makes them highly suitable for studying small subpopulations; and (6) microsatellites are the markers of choice for small‐scale laboratories with limited budgets.
Additionally, SSRs are still efficient markers for (1) marker‐assisted selection (MAS) programs that mobilize QTL blocks using a small number of SSR markers based on LD information, (2) germplasm characterization using an evenly spaced core set of a few SSRs, (3) seed or variety purity testing, and (4) SSR indexing (barcoding) of cultivars and plant germplasm resources. All these counter any emerging opinion on the prospective total “death” of microsatellites as useful genetic markers and demonstrate the future benefit of microsatellites in many genetic studies. Highlights and some updates on the advantages, disadvantages, and usefulness of SSRs for various applications in agricultural and biomedical fields are presented in the chapters of this book, which I briefly introduce to readers below.
6. Highlights from chapters
In this context, with the objective of providing current updates on microsatellite applications in genetic studies and of re‐highlighting the usefulness of microsatellites in current and future genetic analyses, we compiled in this edited volume 10 chapters describing the wide utilization of microsatellite markers in different biological taxa. Generally, the chapters present research studies and review discussions in the following three directions: (1) microsatellite markers in plants and genetic diversity research, (2) microsatellite markers in animal genetics and breeding, and (3) microsatellites in cancer research.
In the first section, the chapter by Jamila Bernardia and her team, Universita Cattolica del Sacro Cuore, Piacenza, Italy, presents the use of microsatellites in livestock and illustrates the exploitation and versatility of microsatellites for the characterization of agricultural diversity and food traceability. The authors assessed genetic diversity in apple, pear, and sweet and sour cherry trees and explored the molecular authentication of the wheat food chain, plant cultivars, and farm animals. The chapter shows that a small number of SSR markers can be used efficiently to differentiate each tree cultivar and link it to its corresponding genotypic profile, and can be useful for molecular traceability of the whole production chain from durum wheat raw material to processed pasta, even though food processing degrades DNA.
Further, the chapter by Maria Eugenia Barrandeguy and Maria Victoria Garcia, Universidad Nacional de Misiones, Instituto de Biologia Subtropical Nodo Posadas, Argentina, covers the development of microsatellite markers, genotyping, data analysis, and interpretation of the results, using nuSSRs and cpSSRs as examples. The chapter discusses the usefulness of microsatellite markers for analyzing past and present microevolutionary forces in native forest plant populations and for making inferences about the future of these natural populations.
In their chapter, Rodolphe Laurent Gigant and his team from France have assessed the mating system of the natural populations of
The last chapter in this section by Beyene Amelework, University of KwaZulu‐Natal, South Africa, and Ethiopian Institute of Agricultural Research, Ethiopia, reviewed the use of microsatellite markers in genetic diversity analysis and heterotic grouping of sorghum and maize through the estimation of molecular‐based genetic distance. The chapter also discusses the existing challenges with the use of SSR markers in heterotic grouping in studied crops.
The second section of the book covers microsatellite marker application in animal sciences. The chapter by Yuta Seki and his colleagues, Tokyo Metropolitan Institute of Medical Science and Tokyo University of Agriculture, Japan, has provided a review on the currently available studies on domestic goat (
Further, Emil J. Hernandez‐Ruz and his colleagues, Federal University of Para (UFPA), Brazil, presented a research study on microsatellite marker development and evaluation of the genetic structure of the Amazonian fish
In the chapter by Hongyu Ma and his colleagues, Shantou University and Chinese Academy of Fishery Sciences, China, authors presented a research study on the development and characterization of microsatellite markers for genetic study of the mud crab (
The third section of the book includes two chapters on a similar topic, both describing the impact of microsatellite instability (MSI) in causing cancer. In particular, Jeffery W. Bacher and his team from Promega Corporation and University of Wisconsin, Madison, USA, provided a detailed review of the role and significance of MSI in hereditary and sporadic types of cancer. They discussed the discovery of MSI and its association with colorectal cancer or Lynch syndrome (LS), and the use of SSR markers in disease screening. In addition, emerging and alternative NGS‐based methods for detecting both tumor MSI status and germline mutations in a single test for LS are reviewed. The chapter concludes that MSI detection is poised to take on an even greater role in predicting responses to the new immunotherapies targeted at MSI-positive tumors. Similarly, the following chapter by Narasimha Reddy Parine and Mohammad Saud Alanazi, King Saud University, Saudi Arabia, described the role of genetic instability, including MSI, in colorectal cancer. Unlike the previous chapter, this chapter reviews the major molecular mechanisms causing genomic and microsatellite instability, including the mismatch repair (MMR) system, and cancer formation.
Microsatellite markers have been among the most reliable DNA‐derived molecular markers and have been widely and successfully used in life science research, including the agricultural and biomedical fields. As molecular markers, microsatellites have many advantages that suit a wide range of genetic analyses, but they also present concerns and caveats that require attention and correction when results are interpreted in specific analyses, as highlighted by the chapters of this book. Although trends in molecular marker use over the past 5 years show decreased utilization of microsatellite markers and a shift toward SNP markers, driven by the emergence of NGS‐based genotyping technologies, microsatellite markers remain useful and are still the marker system of choice for specific genetic studies. This is because of their multi‐allelic nature, the simplicity of genotyping procedures, their cost‐effectiveness, and their suitability for small‐scale laboratories with limited budgets. In this book, all chapters re‐highlight the usefulness of microsatellites in genetic analyses across various life science fields, providing updated discussions and reviews on the current use and future prospects of these markers, which counter the emerging opinion on the “full‐death” of microsatellites as useful genetic markers.
I am thankful to the Academy of Sciences of Uzbekistan, Committee for Coordination of Science and Technology Development, the Office of International Research Programs (OIRP) of the United States Department of Agriculture (USDA)—Agricultural Research Service (ARS), Texas A&M University, and U.S. Civilian Research & Development Foundation (CRDF) for financial support of SSR marker–based research of cotton in Uzbekistan.