DNA Based Techniques for Studying Genetic Diversity

Genetic diversity studies are undertaken to classify an individual or population relative to other individuals or populations. This is a relative measure: the distance between any pair of entries is larger or smaller depending on all the pairwise comparisons that can be made in the study. In contrast, genetic fingerprinting is the unambiguous identification of an individual (based on the presence or absence of alleles at different markers) or of a population (based on the allele frequencies of the markers). This is an absolute measure that does not change with the other individuals or populations under study. Both genetic diversity and fingerprinting studies are carried out using molecular markers.


Introduction
Genetic diversity can be assessed using morphological, biochemical, and molecular information (Mohammadi & Prasanna, 2003; Sudre et al., 2007; Goncalves et al., 2009). However, molecular markers have advantages over the other kinds: they reveal genetic differences at a more detailed level without interference from environmental factors, and they involve techniques that quickly provide detailed results on genetic diversity (Binneck et al., 2002; Garcia et al., 2004; Saker et al., 2005; Goncalves et al., 2008; Souza et al., 2008). Moreover, the development of high-throughput platforms has increased the amount of data per run, reduced the cost per data point, and increased map resolution.

DNA based markers for genetic diversity studies
Molecular markers are segments of chromosomes that do not necessarily encode any trait and are not affected by the environment, but that are inherited in a Mendelian fashion. Some segments of the chromosome change faster than others (e.g., non-coding vs. coding DNA). As a result, fast-changing markers are recommended for closely related individuals and slow-changing markers for less related individuals (e.g., different species). Different marker types therefore differ in their usefulness for fingerprinting individuals and populations. A good marker for fingerprinting studies is cheap to run or gives a lot of information per run, is highly repeatable between assays, has a very low error rate, is easy and unambiguous to score, and has many alleles (high information content).
The techniques most used in genetic diversity studies are, in chronological order: RFLP (restriction fragment length polymorphism) (Botstein et al., 1980), SSR (simple sequence repeats, or microsatellites) (Tautz, 1989), RAPD (randomly amplified polymorphic DNA) (Williams et al., 1990) or AP-PCR (arbitrarily primed PCR) (Welsh & McClelland, 1990), ISSR (inter-simple sequence repeats) (Zietkiewicz et al., 1994), AFLP (amplified fragment length polymorphism) (Vos et al., 1995), SNPs (single nucleotide polymorphisms) (Chen & Sullivan, 2003) and, more recently, DArT (diversity arrays technology) (Kilian et al., 2005) and other high-throughput platforms. These marker types also differ in their ability to detect differences between individuals, their cost, the facilities they require, and the consistency and replicability of their results (Schlotterer, 2004; Schulman, 2007; Bernardo, 2008). Arif et al. (2011) presented a review of DNA marker tools for molecular diversity analysis, with special emphasis on wildlife conservation. However, the authors reviewed only two groups of markers: mitochondrial DNA-based markers, including ribosomal DNA (12S and 16S rDNA), mitochondrial protein-coding genes, and non-coding or control region sequences; and nuclear DNA-based markers, including random amplified polymorphic DNA, amplified fragment length polymorphism, and microsatellites or simple sequence repeats.
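The "many alleles (high information content)" criterion is commonly quantified with Nei's expected heterozygosity, He = 1 - sum(p_i^2), and with the polymorphism information content (PIC) of Botstein et al. (1980), which subtracts from He the probability that both parents are the same heterozygote. The following is a minimal Python sketch of both quantities; the allele frequencies are hypothetical and chosen only to contrast a two-allele locus with a five-allele locus.

```python
from itertools import combinations

def expected_heterozygosity(freqs):
    """Nei's gene diversity for one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism information content: He minus sum over allele pairs of 2 * p_i^2 * p_j^2."""
    correction = sum(2 * (pi ** 2) * (pj ** 2) for pi, pj in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - correction

# Hypothetical allele frequencies, for illustration only.
loci = {
    "biallelic (e.g. RFLP or SNP)": [0.5, 0.5],
    "five equally common alleles (e.g. SSR)": [0.2] * 5,
}
for name, freqs in loci.items():
    print(f"{name}: He = {expected_heterozygosity(freqs):.3f}, PIC = {pic(freqs):.3f}")
```

A biallelic locus can never exceed He = 0.5, whereas a locus with many evenly distributed alleles approaches 1, which is why multiallelic markers are preferred for fingerprinting.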
As a laboratory methodology, fingerprinting and diversity studies require the following steps: a) isolation of DNA; b) digestion, hybridization, and/or amplification of DNA into specific fragments; c) sizing and/or separation of the DNA fragment combinations or patterns into a set of individual DNA fingerprints; d) comparison of DNA fingerprints from different individuals; e) calculation of similarity (or dissimilarity) coefficients for all pairs of entries in the study; and f) construction of a dendrogram or graph to visualize the differences.
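Steps e) and f) can be sketched in a few lines of Python. The example below assumes band presence/absence data (1/0) scored over the same loci for every entry, uses the Jaccard and Dice (Nei-Li) similarity coefficients, and builds the dendrogram with UPGMA, a common but by no means the only clustering choice. The band matrix is hypothetical, and dedicated statistics packages would normally be used for real data.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two 0/1 band profiles scored over the same loci."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 1.0

def dice(a, b):
    """Dice (Nei-Li) similarity: 2 x shared bands / total bands."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2 * shared / total if total else 1.0

def upgma(dist):
    """Minimal UPGMA; dist maps frozenset({x, y}) -> distance for every pair of labels.
    Returns the merge order as (new_cluster_label, merge_height) tuples."""
    sizes = {label: 1 for pair in dist for label in pair}
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)                  # closest pair of current clusters
        a, b = sorted(pair)
        height = d.pop(pair) / 2                  # ultrametric branch height
        new = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        for c in sizes:                           # size-weighted average linkage
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((new, c))] = (na * dac + nb * dbc) / (na + nb)
        sizes[new] = na + nb
        merges.append((new, height))
    return merges

bands = {                                         # hypothetical 0/1 scores at six loci
    "A": (1, 1, 0, 1, 0, 1),
    "B": (1, 1, 0, 1, 1, 1),
    "C": (0, 1, 1, 0, 1, 0),
    "D": (0, 1, 1, 0, 1, 1),
}
dist = {frozenset((x, y)): 1 - jaccard(bands[x], bands[y])
        for x, y in combinations(bands, 2)}
for label, height in upgma(dist):
    print(label, round(height, 3))
```

Swapping `jaccard` for `dice` changes only the coefficient; the clustering step is unchanged. Which coefficient is appropriate depends on the marker type and is a choice the analyst has to justify.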

Restriction Fragment Length Polymorphism (RFLP)
Restriction fragment length polymorphism (RFLP) was originally developed for mapping human genes and had much greater power than anything previously available (Botstein et al., 1980). The technique quickly proved its utility in virtually all species. O'Brien (1991) grouped genetic markers into two types: Type I markers are associated with a gene of known function, and Type II markers are associated with anonymous gene segments of one sort or another. RFLPs remain the most common Type I marker currently used in many eukaryotic organisms.
Variations in the characteristic pattern of an RFLP digest can be caused by base-pair deletions, point mutations, inversions, translocations and transpositions that result in the loss or gain of a recognition site, producing a fragment of different length and hence a polymorphism. A single base-pair difference in the recognition site is enough to prevent the restriction enzyme from cutting. If the mutation is present on one chromosome but not on its homologue, both fragment bands will be present on the gel, and the sample is said to be heterozygous for the marker. Only codominant markers exhibit this highly desirable behaviour; dominant markers exhibit a present/absent behaviour, which can limit the data available for analysis. RFLP has some limitations, notably that it is time-consuming. Moreover, in some organisms such as wheat, the frequency of RFLP polymorphism is low, which has been attributed to the polyploid nature and large genome size of wheat. Nevertheless, RFLP was used in the past for several purposes, including genome mapping, varietal identification, identification of wheat-rye recombinants, and identification of homologous chromosome arms (Tanksley et al., 1989). RFLP has also been used for mapping different storage protein loci, and a set of polymorphic probes was used to identify 54 common wheat cultivars, mostly of Italian type (Vaccino et al., 1993).
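The codominant behaviour described above can be reproduced with an in silico digest. This Python sketch assumes EcoRI (G^AATTC, cutting after the first base of its site) purely as an example enzyme and uses two hypothetical allele sequences that differ by one point mutation inside the second recognition site; the bands of a diploid are taken as the union of the fragments from both homologues.

```python
def digest(seq, site, cut_offset):
    """Fragment lengths from cutting a linear sequence at every occurrence of `site`.
    cut_offset is where the enzyme cuts inside its site on the top strand
    (EcoRI, G^AATTC, cuts after base 1)."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

def genotype_bands(allele_1, allele_2, site="GAATTC", cut_offset=1):
    """Bands seen on a gel for a diploid: the union of fragments from both homologues."""
    return sorted(set(digest(allele_1, site, cut_offset)) | set(digest(allele_2, site, cut_offset)))

# Hypothetical alleles: allele_2 carries a point mutation (GAATTC -> GAGTTC) in the second site.
allele_1 = "A" * 100 + "GAATTC" + "T" * 200 + "GAATTC" + "C" * 150
allele_2 = "A" * 100 + "GAATTC" + "T" * 200 + "GAGTTC" + "C" * 150

print(digest(allele_1, "GAATTC", 1))         # [101, 206, 155]
print(genotype_bands(allele_1, allele_1))    # homozygote: [101, 155, 206]
print(genotype_bands(allele_1, allele_2))    # heterozygote: [101, 155, 206, 361]
```

The heterozygote shows the bands of both alleles, so all three genotypes can be told apart; real gels add complications (co-migrating fragments, partial digests) that this sketch ignores.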

PCR based methods
With the studies that led to the development of polymerase chain reaction (PCR) technology (Saiki et al., 1985; Mullis & Faloona, 1987), there were remarkable advances in the refinement of techniques for obtaining specific or non-specific DNA fragments, which are particularly relevant to research on genetic diversity.

Randomly Amplified Polymorphic DNA (RAPD)
RAPD was the first PCR-based molecular marker technique to be developed, and it is by far the simplest (Williams et al., 1990). Short PCR primers (approximately 10 bases) of arbitrary sequence are used to amplify random DNA segments throughout the genome. An amplification product is generated when two priming sites lie on opposite strands, in inverted orientation, within an amplifiable distance of each other. RAPD products are usually visualized on agarose gels stained with ethidium bromide.
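The requirement for two inverted priming sites can be illustrated with a small in silico prediction. This Python sketch assumes exact primer matches only (real RAPD primers also anneal with mismatches at low stringency), and the 10-mer primer and template are hypothetical. A single point mutation in one priming site abolishes the band, which is why RAPD polymorphisms are scored as presence or absence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def rapd_products(template, primer, max_len=3000):
    """Predicted single-primer RAPD product sizes (exact matches only).
    A product needs the primer on the top strand at i and its reverse complement
    (i.e. the primer binding the bottom strand) further downstream at j."""
    k, rc = len(primer), revcomp(primer)
    fwd = [i for i in range(len(template) - k + 1) if template.startswith(primer, i)]
    rev = [j for j in range(len(template) - k + 1) if template.startswith(rc, j)]
    sizes = [j + k - i for i in fwd for j in rev if j > i and j + k - i <= max_len]
    return sorted(sizes)

primer = "GTGACGTAGG"                                  # hypothetical arbitrary 10-mer
rc = revcomp(primer)                                   # "CCTACGTCAC"
template = "T" * 50 + primer + "A" * 300 + rc + "T" * 50
mutant = "T" * 50 + primer + "A" * 300 + rc[:-1] + "A" + "T" * 50   # one base changed

print(rapd_products(template, primer))   # [320]: band present
print(rapd_products(mutant, primer))     # []: band absent
```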
RAPD markers are easily developed and, because they are based on PCR amplification followed by agarose gel electrophoresis, they are quickly and readily detected. As a result, RAPDs may permit the wider application of molecular maps in plant science. The RAPD technique has been used extensively to study genetic diversity in plant species; for example, it was used to study genetic structure and diversity within and among six populations of Capparis decidua in Saudi Arabia (Abdel-Mawgood et al., 2010). Most RAPD markers are dominant; therefore, heterozygous individuals cannot be distinguished from homozygotes. This contrasts with RFLP markers, which are co-dominant and therefore distinguish heterozygotes from homozygotes. Thus, relative to standard RFLP markers, and especially VNTR loci, RAPD markers generate less information per locus examined. A further disadvantage of RAPD is its poor reproducibility between runs, which is due to the short primer length and low annealing temperature.
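Because heterozygotes and band-present homozygotes look the same with a dominant marker, allele frequencies have to be inferred indirectly. A common simple approach, sketched here in Python, assumes Hardy-Weinberg equilibrium so that the proportion of band-absent individuals equals q^2 for the null allele; this square-root estimator is biased in small samples, and corrected estimators exist. The band scores are hypothetical.

```python
import math

def null_allele_freq(band_scores):
    """Frequency q of the band-absent (null) allele at a dominant locus,
    assuming Hardy-Weinberg equilibrium: P(no band) = q^2."""
    absent = sum(1 for s in band_scores if s == 0)
    return math.sqrt(absent / len(band_scores))

scores = [1] * 64 + [0] * 36          # hypothetical: 64 individuals with the band, 36 without
q = null_allele_freq(scores)
p = 1 - q
print(f"q = {q:.2f}, p = {p:.2f}")

# The 64 band-present individuals hide two genotype classes that a codominant marker would separate.
n = len(scores)
print(f"expected homozygotes with band: {n * p * p:.0f}, heterozygotes: {n * 2 * p * q:.0f}")
```

The second line shows the information a codominant marker would provide directly and a dominant marker can only estimate under the equilibrium assumption.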

Inter-Simple Sequence Repeat (ISSR)
Inter-simple sequence repeats (ISSRs) are semi-arbitrary markers amplified by polymerase chain reaction (PCR) in the presence of one primer complementary to a target microsatellite. Each band corresponds to a DNA sequence delimited by two inverted microsatellites (Zietkiewicz et al., 1994; Tsumara et al., 1995; Nagaoka & Ogihara, 1997). The technique does not require genome sequence information, and it produces multilocus, highly polymorphic patterns of dominant markers (Mishra et al., 2003). ISSR-PCR is a fast, inexpensive genotyping technique based on variation in the regions between microsatellites. It has a wide range of uses, including characterization of genetic relatedness among populations, genetic fingerprinting, gene tagging, detection of clonal variation, cultivar identification, phylogenetic analysis, detection of genomic instability (for example, quantification of genomic instability in humans to estimate prognosis in colorectal cancer; Brenner, 2011), and assessment of hybridization.
ISSRs have been used in genetic diversity studies in different crop plants (Nagaraju et al., 2002; Reddy et al., 2002; Obeed et al., 2008) and are also suitable for identification and DNA fingerprinting (Gupta et al., 2002; Gupta & Varshney, 2000). The method has several benefits over other techniques: first, it is able to discriminate between closely related genotypes (Fang & Roose, 1997; Hodkinson et al., 2002), and second, it can detect polymorphisms without any prior knowledge of the crop's DNA sequence.
ISSRs resemble RAPD markers in that they are quick and easy to handle, but they appear to have the reproducibility of SSR markers because their primers are longer. ISSR has also proved more informative than RAPD for evaluating genetic diversity in wheat, fruit plants (strawberry, apple and Ribes species) and the common bean (Korbin et al., 2002; Rakoczy-Trojanowska et al., 2004), and it has been shown to be reproducible and quick for characterizing cultivars of species such as poplar (Gao et al., 2006).

Simple Sequence Repeats (SSR)
Simple sequence repeat (SSR) markers, also called microsatellites, are repeats of short nucleotide motifs, usually six bases or fewer in length, whose copy number varies (Rafalski et al., 1996; Reddy et al., 2002). SSRs have become some of the most important molecular markers in both animals and plants. They are stretches of 1 to 6 nucleotide units repeated in tandem and randomly spread throughout eukaryotic genomes. SSRs are highly polymorphic because of the high mutation rate affecting the number of repeat units, and such length polymorphisms can easily be detected on high-resolution gels (e.g., sequencing gels). It has been suggested that SSR polymorphism results from polymerase slippage during DNA replication or from unequal crossing over (Levinson & Gutman, 1987). SSRs are not only very common but also hypervariable in the number of repetitive DNA motifs in the genomes of eukaryotes (Vosman & Arens, 1997; Rallo et al., 2000; van der Schoot et al., 2000).
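Finding candidate microsatellites in a sequence (for example an EST or a sequencing read) amounts to searching for tandem copies of a 1-6 nt motif. The Python sketch below uses regular-expression backreferences; the minimum copy numbers per motif length are assumptions chosen for illustration (published SSR-mining tools use their own thresholds), only perfect repeats are detected, and the test sequence is hypothetical.

```python
import re

def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AGAG' = 'AG' x 2)."""
    n = len(motif)
    return any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, copies) for perfect tandem repeats of 1-6 nt motifs.
    min_repeats maps motif length -> minimum copy number (assumed thresholds)."""
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    hits = []
    for k in range(1, 7):
        # ([ACGT]{k}) captures one motif copy; \1{n,} demands at least n further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats[k] - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound(motif):          # 'AGAG' repeats are reported as 'AG' repeats
                continue
            hits.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(hits)

seq = "TTCC" + "AG" * 12 + "CCTTGA" + "ACT" * 6 + "GGC"   # hypothetical sequence
print(find_ssrs(seq))   # [(4, 'AG', 12), (34, 'ACT', 6)]
```

The copy number reported for each hit corresponds to the allele length that would be measured on a gel; primers would then be designed in the unique flanking sequence.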
SSRs have several advantages over other molecular markers: (i) they allow the identification of many alleles at a single locus; (ii) they are evenly distributed all over the genome; (iii) because of their high mutation rate and bi-parental inheritance, they can offer more detailed population genetic insight than maternally inherited mitochondrial DNA (mtDNA); (iv) they are co-dominant; (v) they are highly polymorphic and specific (Jones et al., 1997); (vi) they are very repeatable; (vii) only a small amount of medium-quality DNA is required; (viii) they are cheap and easy to run; (ix) the analysis can be semi-automated and performed without radioactivity (Gianfranceschi et al., 1998; Guilford et al., 1997); (x) with advances in DNA isolation technology, it has become possible to identify loci in highly degraded ancient DNA (aDNA), where traditional enrichment procedures had been unsuccessful (Allentoft et al., 2009); and (xi) with the development of high-throughput sequencing platforms such as the GS-FLX (Roche, Branford, CT, USA), SSR discovery has recently become fast and efficient (Abdelkrim et al., 2009; Allentoft et al., 2009; Santana et al., 2009). SSRs are typically codominant and multiallelic, with expected heterozygosity frequently greater than 0.7, allowing precise discrimination even of closely related individuals. However, because sequence information is needed to design locus-specific primers, SSR development is not very cost-effective and requires considerable discovery and optimization for each species before use.
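The discriminating power of a multilocus SSR profile is often expressed as the probability of identity (PI), the chance that two unrelated individuals share the same genotype. Assuming Hardy-Weinberg equilibrium and independent loci, PI for one locus is sum(p_i^4) + sum over allele pairs of (2 p_i p_j)^2, and the multilocus value is the product over loci. This Python sketch compares hypothetical five-allele SSR loci with hypothetical biallelic loci; it does not account for relatives, which share genotypes far more often.

```python
import math
from itertools import combinations

def pi_locus(freqs):
    """Probability that two unrelated individuals share a genotype at one locus (HWE assumed)."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes

def pi_multilocus(loci):
    """Product of per-locus PI values, assuming loci are independent (unlinked)."""
    return math.prod(pi_locus(freqs) for freqs in loci)

ssr_loci = [[0.2] * 5] * 10        # hypothetical: ten SSRs, five equally common alleles each
biallelic_loci = [[0.5, 0.5]] * 10 # hypothetical: ten biallelic loci at 0.5/0.5

print(f"{pi_multilocus(ssr_loci):.2e}")        # about 3.7e-12
print(f"{pi_multilocus(biallelic_loci):.2e}")  # about 5.5e-05
```

Ten multiallelic loci therefore discriminate unrelated individuals many orders of magnitude better than the same number of biallelic loci.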
For the past two decades, microsatellites have been the markers of choice in a wide range of forensic profiling, population genetics and wildlife-related research. Their importance and applicability are reflected in the more than 2,450,000 hits for the word "microsatellite" in the Web of Science database (accessed October 2011).
Searches of EST databases for microsatellite-containing sequences have been useful for a number of species, including humans (Haddad et al., 1997), catfish (Serapion et al., 2004), rice (Cho et al., 2000) and barley (Thiel et al., 2003). In rainbow trout, cDNAs and expressed sequence tags (ESTs) available in public databases offer an in silico approach to marker development at virtually no cost. Marker development in salmonids is complicated by the evolutionarily recent genome duplication event, which often results in multiple copies of loci in the haploid genome (Venkatesh, 2003). AFLP was also used to test the purity of three inbred lines by examining the banding patterns of five individuals from each inbred line (Ismail et al., 1999). Characterizing loci, including their copy number, is important when analysing genetic variability in genomic regions under different evolutionary constraints. Two linkage maps with an average marker spacing of 10 cM have been published for rainbow trout using AFLP and microsatellite markers (Sakamoto et al., 2000; Nichols et al., 2003).
In natural plant populations, microsatellites have great potential for helping to understand what determines patterns of genetic variation, particularly when used in concert with chloroplast DNA (cpDNA) markers. Their utility has been demonstrated in studies of genetic diversity (Morand et al., 2002; Zeid et al., 2003), mating systems (Durand et al., 2000), pollination biology (White et al., 2002) and seedling establishment (Dow & Ashley, 1996). ISSR was also used for hybrid identification in maize (Abdel-Mawgood et al., 2006) and, in conjunction with RAPD, for DNA fingerprinting of wheat genotypes (Abdel-Mawgood et al., 2007). However, few studies have used microsatellites to analyse the population structure of polyploid species, because polyploidy complicates the interpretation of SSR results. This is likely due to problems in analysing polyploid data as well as difficulties in amplifying loci, possibly because of differences between the parental genomes of polyploids (Roder et al., 1995). A number of studies have demonstrated that microsatellite alleles of the same size can arise from mutation events that either interrupt repeat units or occur in the regions flanking the repeat. This has been shown to occur within populations (Angers & Bernatchez, 1997; Viard et al., 1998), among populations (Estoup et al., 1995; Viard et al., 1998), and among closely related species (Peakall et al., 1998; van Oppen et al., 2000). One approach to minimizing the risk of misinterpreting genetic information is to characterize different electromorphs by sequencing, particularly when other genetic data (e.g., chloroplast or mitochondrial sequences) suggest strong genetic structuring that is not detected by microsatellite analysis.

Amplified Fragment Length Polymorphisms (AFLP)
Amplified fragment length polymorphism (AFLP)-based genomic DNA fingerprinting is a technique used to detect DNA polymorphism. AFLP is a PCR-based technique (Vos et al., 1995) that has been used reliably to determine genetic diversity and phylogenetic relationships between closely related genotypes. AFLP analysis combines the reliability of restriction fragment length polymorphism (RFLP) with the convenience of PCR-based fingerprinting methods. AFLP markers are generally dominant and do not require prior knowledge of the genome sequence. AFLPs are produced in great numbers, and the technique is applicable to all species, giving very reproducible results. It has also been used in microbial populations: in studying the genetic diversity of human pathogenic bacteria (Purcell & Hopkins, 1996), in microbial taxonomy (Vaneechoutte, 1996) and in characterizing pathovars of plant pathogenic bacteria (Bragard et al., 1997). In that regard, it has the advantage of extensive coverage of the genome under study. In addition, the complexity of the band pattern can be reduced by adding selective bases to the primers during PCR amplification. After the genome of E. coli was sequenced, it became possible to predict the band pattern of an AFLP analysis of E. coli, which illustrates the power of this technique. In higher plants, AFLP has been used in a variety of applications, including examining genetic relationships between species (Hill et al., 1996), investigating the genetic structure of gene pools (Tohme et al., 1996), and assessing genetic differentiation among populations (Travis et al., 1996; Paul et al., 1997).
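Predicting an AFLP band pattern from a known sequence, as has been done for E. coli, can be sketched as follows. The enzyme pair EcoRI (G^AATTC) and MseI (T^TAA) is the widely used combination but is an assumption here, as are the simplifications: a linear top-strand sequence, exact matching of selective bases, adapters ignored (sizes are distances between cut points), and only EcoRI-MseI fragments scored. The sequence is hypothetical. Adding one selective base per primer keeps only the fragments whose terminal bases match, which is how selective bases reduce band complexity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq, site):
    hits, pos = [], seq.find(site)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(site, pos + 1)
    return hits

def aflp_fragments(seq, eco_sel="", mse_sel="", max_len=500):
    """Predicted sizes of EcoRI-MseI fragments amplified with the given selective bases."""
    sites = sorted([(p, "E") for p in find_sites(seq, "GAATTC")] +
                   [(p, "M") for p in find_sites(seq, "TTAA")])
    sizes = []
    for (p1, e1), (p2, e2) in zip(sites, sites[1:]):
        if e1 == e2:
            continue                                  # EcoRI-EcoRI / MseI-MseI ends are not scored
        inner = seq[p1 + (6 if e1 == "E" else 4):p2]  # sequence between the two sites
        left_sel, right_sel = (eco_sel, mse_sel) if e1 == "E" else (mse_sel, eco_sel)
        if not inner.startswith(left_sel):            # primer at the left end reads the top strand
            continue
        if right_sel and revcomp(inner[len(inner) - len(right_sel):]) != right_sel:
            continue                                  # primer at the right end reads the bottom strand
        size = p2 - p1                                # both enzymes cut after base 1 of their site
        if size <= max_len:
            sizes.append(size)
    return sorted(sizes)

seq = ("C" * 20 + "GAATTC" + "A" + "C" * 80 + "G" + "TTAA" + "C" * 30
       + "GAATTC" + "T" + "C" * 50 + "TTAA" + "C" * 10)   # hypothetical sequence
print(aflp_fragments(seq))             # no selective bases: [34, 57, 88]
print(aflp_fragments(seq, "A", "C"))   # EcoRI+A / MseI+C: [88]
```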
Disadvantages of this technique are that alleles are not easily recognized, reproducibility is medium, it is labour-intensive, and it has high operational and development costs (Karp et al., 1997). Moreover, although no sequence knowledge is needed, suitable primer combinations with specific selective bases have to be chosen and tested empirically. Dominant markers such as AFLP and RAPD are very limited in their ability to determine parentage precisely. They can readily establish that two individuals are not the same, but the statement that two individuals are identical is usually only approximate, and no formal statistics can be attached to it. The technique also has several advantages: a) no prior sequence information is needed; b) multiple bands are produced in each experiment; c) these bands come from all over the genome; d) the technique is reproducible (Blears et al., 1998; Vos & Kuiper, 1997); e) it has high discriminatory power; and f) the data can be stored in databases such as AmpliBASE MT (Majeed et al., 2004) for comparison purposes.

Single Nucleotide Polymorphism (SNP)
Single nucleotide polymorphisms (SNPs) are sites in the genome where the DNA sequence differs by a single base when two or more individuals are compared. They may be individually responsible for specific traits or phenotypes, or may represent neutral variation that is useful for evaluating diversity in the context of evolution. SNPs are the most widespread type of sequence variation in genomes discovered so far; about 90% of sequence variants in humans are differences in single bases of DNA (Collins, 1998).
Several disciplines, such as population ecology, conservation genetics and evolutionary genetics, are benefiting from SNPs as genetic markers. There is widespread interest in finding SNPs because they are numerous, more stable, and potentially easier to score than the microsatellite repeats currently used for gene mapping in humans. Within coding regions there are on average four SNPs per gene with a frequency above 1%, and about half of these cause amino acid substitutions; these are termed non-synonymous SNPs (nsSNPs) (Cargill et al., 1999).
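Whether a coding SNP is synonymous or non-synonymous depends only on whether the altered codon still encodes the same amino acid. This Python sketch builds the standard genetic code from its usual compact 64-letter encoding and classifies a substitution in an in-frame coding sequence; the sequence and positions are hypothetical, and alternative genetic codes (e.g. mitochondrial) are not handled.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_snp(cds, pos, alt):
    """Classify a SNP at 0-based position `pos` of an in-frame coding sequence."""
    start = pos - pos % 3                    # first base of the affected codon
    offset = pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa

cds = "ATGGCTGAAAAA"                          # hypothetical CDS: Met-Ala-Glu-Lys
print(classify_snp(cds, 5, "C"))              # GCT -> GCC: ('synonymous', 'A', 'A')
print(classify_snp(cds, 6, "A"))              # GAA -> AAA: ('non-synonymous', 'E', 'K')
```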
Because of the importance of SNPs as DNA sequence variants, the National Human Genome Research Institute (NHGRI) of the NIH, together with the Centers for Disease Control and Prevention and several individual investigators, has assembled a DNA Polymorphism Discovery Resource of samples from 450 U.S. residents (Collins et al., 2011). This resource will help in finding SNPs that are deleterious to gene function or likely to be associated with disease.
In plants, SNPs are rapidly replacing simple sequence repeats (SSRs) as the DNA marker of choice for applications in plant breeding and genetics because they are more abundant, stable, amenable to automation, efficient, and increasingly cost-effective (Duran et al., 2009; Edwards & Batley, 2010; Rafalski, 2002a). SNPs are generally the most abundant form of genetic variation in eukaryotic genomes, and they occur in both coding and noncoding regions of nuclear and plastid DNA (Kwok et al., 1996). As with the human genome, SNP-based resources are being developed and made publicly available for broad application in rice research. These resources include large SNP datasets, tools for identifying informative SNPs for targeted applications, and a suite of custom-designed SNP assays for use in marker-assisted and genomic selection. SNPs are widely used in breeding programs for applications such as a) marker-assisted and genomic selection, b) association and QTL mapping and positional cloning, c) haplotype and pedigree analysis, d) seed purity testing, e) variety identification, and f) monitoring the combinations of alleles that perform well in target environments (Bernardo, 2008; Jannink et al., 2010; Kim et al., 2010; Moose & Mumm, 2008; Xu & Crouch, 2008; McCouch et al., 2010).
Although SNPs have several advantages over other technologies, their discovery in non-model organisms is limited by the expense and technical difficulty of the currently available SNP isolation strategies (Brumfield, 2003; Seddon et al., 2005). Typical direct SNP discovery strategies involve sequencing locus-specific amplification (LSA) products from multiple individuals or sequencing expressed sequence tags (EST-sequencing) (Suhn & Vijg, 2005; Twyman, 2004). Other direct strategies include whole-genome (WGSS) and reduced-representation (RRSS) shotgun sequencing approaches. If comparative sequence data are available in public or other databases, sequence comparison algorithms that identify nucleotide differences provide an alternative means of discovering SNPs empirically (Guryev et al., 2005). In rice, for example, genome-wide SNP discovery has relied on either genome-scale re-sequencing or Sanger sequencing-based strategies. The latter approach requires the design of specific primer pairs, which are generally located in exons and may amplify across intronic as well as exonic regions (Caicedo et al., 2007; Ebana et al., 2010; Tung et al., 2010; Yamamoto et al., 2010; McCouch et al., 2010).
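The comparison step behind sequence-based SNP discovery can be illustrated with a minimal Python sketch: given sequences from several individuals that are already aligned (for example locus-specific amplicons), report every column where at least two nucleotides are observed. Gaps and ambiguous bases are ignored, and the `min_minor` filter (requiring the second most common allele to be seen a minimum number of times) is an assumed, crude guard against single sequencing errors. The alignment is hypothetical.

```python
from collections import Counter

def discover_snps(aligned, min_minor=1):
    """Return (column, allele counts) for alignment columns with two or more nucleotides.
    Gaps ('-') and ambiguous bases (e.g. 'N') are ignored."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned if s[col] in "ACGT")
        ranked = counts.most_common()
        if len(ranked) >= 2 and ranked[1][1] >= min_minor:
            snps.append((col, dict(counts)))
    return snps

aligned = [            # hypothetical amplicons from four individuals
    "ACGTTGCA-T",
    "ACGTTGCAGT",
    "ACATTGCAGT",
    "ACGTTACAGN",
]
for col, counts in discover_snps(aligned):
    print(col, counts)
```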
Next-generation sequencing allows direct analysis of sequence differences between many individuals at a large number of loci. Re-sequencing is used to identify genetic variation between individuals, which can provide molecular genetic markers and insights into gene function. Whole-genome re-sequencing with short-read technologies involves aligning literally millions of reads to a reference genome sequence; once this has been achieved, the nucleotide differences between the sample and the reference can be determined. There does not appear to be a major disadvantage to using short-read technology for SNP discovery where a reference genome is available. The approach is also low cost, at approximately $0.25 per SNP compared with about $2.95 using Sanger sequencing. Where a draft reference genome is not available, it may be possible to combine long- and short-read next-generation sequencing technologies, using 454 sequencing to generate an assembly against which the short reads are aligned. Re-sequencing has proved to be a valuable tool for studying genetic variation and, since James Watson's genome was sequenced using this approach (Wheeler et al., 2008), the challenge of whole-genome re-sequencing has largely been met. Whole-genome re-sequencing for SNP discovery has been demonstrated in Caenorhabditis elegans: Solexa technology was used to sequence two C. elegans strains, which were then compared with the reference genome sequence to identify SNPs and indels, and the software applications PyroBayes and Mosaik were used to differentiate true polymorphisms from sequencing errors (Hillier et al., 2008). Although whole-genome re-sequencing can be very useful, the approach has some drawbacks. First, a reference genome sequence is required, and the quality of the re-sequenced genome depends heavily on the quality of the reference sequence. In addition, for very large and complex plant genomes, a vast amount of sequence data is required to call SNPs confidently, and SNPs in repetitive sequences are particularly difficult to call. However, as sequencing technology continues to improve, whole-genome re-sequencing of crop genomes is expected to become common.
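Once reads are aligned to a reference, each position can be genotyped from the bases the reads carry there. The Python sketch below is deliberately naive: it assumes a diploid, a minimum depth of 8 reads, and fixed allele-fraction cut-offs (0.2 and 0.8), all of which are illustrative assumptions. Production variant callers instead use probabilistic models that weigh base and mapping qualities, which matters most in the repetitive regions mentioned above. The pileups are hypothetical.

```python
from collections import Counter

def call_genotype(ref_base, pileup, min_depth=8, het_low=0.2, het_high=0.8):
    """Naive diploid genotype call at one reference position.
    pileup: the bases observed in the reads covering that position."""
    bases = [b for b in pileup.upper() if b in "ACGT"]
    if len(bases) < min_depth:
        return "no-call"                      # too little coverage to decide
    counts = Counter(bases)
    alt_counts = [(n, b) for b, n in counts.items() if b != ref_base]
    if not alt_counts:
        return "hom-ref"
    alt_n, alt = max(alt_counts)              # most frequent non-reference base
    fraction = alt_n / len(bases)
    if fraction < het_low:
        return "hom-ref"                      # rare alternative bases treated as errors
    if fraction <= het_high:
        return f"het {ref_base}/{alt}"
    return f"hom-alt {alt}/{alt}"

for pileup in ["AAAAAAAAAA", "AAAAAGGGGG", "GGGGGGGGGA", "AAAAAAAAAG", "AAG"]:
    print(pileup, call_genotype("A", pileup))
```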
New technologies for collecting high-throughput data have contributed substantially to progress in evolutionary genomics (Gilad et al., 2009). Next-generation sequencing technologies have the potential to revolutionize genomic research and to address a large number of outstanding questions that previously could not be tackled effectively (Rafalski, 2002b). Next-generation sequencing (NGS) provides the capacity for high-throughput sequencing of whole genomes at low cost, and it improves the ability to find novel variants that are not covered by genotyping arrays. The efficiency of NGS-mediated genotyping has recently been improved by using amplicon libraries from long-range PCR that encompass discrete genomic intervals. At present, however, analysing next-generation sequencing data remains challenging, in particular because most platforms produce short reads, which are difficult to align and assemble. Next-generation sequencing can produce very large amounts of data, typically millions of short sequence reads (25-400 bp), but these large numbers of relatively short reads are usually achieved at the expense of read accuracy (Imelfort et al., 2009). Moreover, little is known about the sources of variation associated with next-generation sequencing study designs (Morozova & Marra, 2008).
The first commercially available next-generation sequencing system was developed by 454 and commercialized by Roche (Basel, Switzerland) as the GS20, capable of sequencing over 20 million base pairs, in the form of 100-bp reads, in just over 4 h. The GS20 was replaced during 2007 by the GS FLX model, capable of producing over 100 million base pairs of sequence in a similar amount of time. Roche and 454 continued to improve data production, with an expected 400-500 Mbp of sequence per run and an increase to 500-bp reads with the release of their Titanium system towards the end of 2008. Two alternative ultrahigh-throughput sequencing systems now compete with the GS FLX: Solexa technology, commercialized by Illumina (San Diego, California, USA), and the SOLiD system from Applied Biosystems (AB) (Carlsbad, California, USA). A rapid and effective method for high-throughput discovery of polymorphic SNP alleles in the oat genome was developed based on high-resolution melting and high-throughput 454 sequencing. The resulting SNP genotyping platform is simple and highly informative, and it can serve as a model for SNP discovery and genotyping in other species with complex and poorly characterized genomes (Oliver et al., 2011).
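The throughput figures above translate directly into how many runs a re-sequencing project needs. Under the Lander-Waterman model, mean coverage is c = N x L / G (N reads of length L over a genome of size G), and the expected fraction of the genome covered by at least one read is about 1 - e^(-c). This Python sketch uses the approximate per-run yields quoted for the GS20 and GS FLX and the genome sizes quoted in the SFP section; the 10x target is an assumption, and the estimate ignores repeats, uneven coverage and sequencing errors, so it is a lower bound on real requirements.

```python
import math

def runs_needed(genome_mb, coverage, mbp_per_run):
    """Runs required to reach a target mean coverage (c = total sequenced bases / genome size)."""
    return math.ceil(genome_mb * coverage / mbp_per_run)

def fraction_covered(coverage):
    """Lander-Waterman estimate of the fraction of the genome hit by at least one read."""
    return 1 - math.exp(-coverage)

genomes_mb = {"Arabidopsis thaliana": 140, "rice": 440, "barley": 5300}  # sizes quoted in the text
throughput = {"GS20": 20, "GS FLX": 100}                                 # approximate Mbp per run

for species, size in genomes_mb.items():
    for platform, mbp in throughput.items():
        print(f"{species:22s} {platform:6s} {runs_needed(size, 10, mbp):5d} runs for 10x")
print(f"expected fraction covered at 10x: {fraction_covered(10):.5f}")
```

The barley figures make concrete why very large plant genomes were singled out as demanding a vast amount of sequence data.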

Array based platforms
Several different types of molecular markers have been developed over the past three decades (Kumar, 1999; Gupta & Rustgi, 2004), motivated by requirements for increased throughput, decreased cost per data point, and greater map resolution. Recently, oligonucleotide-based gene expression microarrays have been used to identify DNA sequence polymorphisms using genomic DNA as the target (Hazen & Kay, 2003).

Diversity arrays technology (DArT)
Diversity arrays technology (DArT) is a microarray hybridization-based technique that permits simultaneous screening of thousands of polymorphic loci without any prior sequence information. The DArT methodology offers a high level of multiplexing, typing several thousand loci per assay while remaining independent of sequence information. DArT assays generate whole-genome fingerprints by scoring the presence versus absence of DNA fragments in genomic representations generated from genomic DNA samples through complexity reduction. DArT was developed as a hybridization-based alternative to most of the gel-based marker technologies currently in use. It can provide from hundreds to tens of thousands of highly reliable markers for any species, as it does not require precise information about the genome sequence (Jaccoud et al., 2001). Moreover, DArT was recently shown to provide good genome coverage in wheat and barley (Wenzl et al., 2004; Akbari et al., 2006). A key step of this technology is "genome complexity reduction", which reduces the proportion of repetitive sequences, abundant in eukaryotic genomes, in the genomic representation. With the DArT platform, comprehensive genome profiles are becoming affordable for virtually any crop, and these profiles can be used in the management of biodiversity, for example in germplasm collections. DArT genome profiles enable breeders to map QTL in one week. They accelerate the introgression of a selected genomic region into an elite genetic background (for example, by marker-assisted backcrossing), and they can guide the assembly of many different regions into improved varieties (marker-assisted breeding). The number of markers DArT detects is determined primarily by the level of DNA sequence variation in the material analysed and by the complexity reduction method deployed (Kilian et al., 2003). Another advantage of DArT markers is that their sequence is more easily accessible than that of amplified fragment length polymorphisms (AFLPs), making DArT a method of choice for non-model species (James et al., 2008). DArT has also been applied to a number of animal species and microorganisms (The Official Site of Diversity Arrays Technology, DArT P/L).
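Because DArT scores presence versus absence of a fragment, each array clone ultimately has to be converted from a continuous hybridization signal into 0/1 calls across samples. One simple way to do this, shown here in Python, is a one-dimensional two-means split of the per-sample signal values; this is an illustrative assumption and not the algorithm used by the DArT service's own software. The log-ratio values are hypothetical.

```python
def two_means_threshold(values, max_iter=50):
    """Split 1-D signal values into low/high groups with a simple 2-means (Lloyd) iteration."""
    threshold = (min(values) + max(values)) / 2
    for _ in range(max_iter):
        low = [v for v in values if v < threshold]
        high = [v for v in values if v >= threshold]
        if not low or not high:
            break                                    # no bimodality: leave threshold as is
        new = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new - threshold) < 1e-9:
            break                                    # converged
        threshold = new
    return threshold

def score_clone(log_ratios):
    """0/1 presence calls for one array clone across samples."""
    t = two_means_threshold(log_ratios)
    return [1 if v >= t else 0 for v in log_ratios]

signals = [-1.9, -2.1, 0.4, 0.6, -1.7, 0.5]          # hypothetical log signal ratios, six samples
print(score_clone(signals))                          # [0, 0, 1, 1, 0, 1]
```

A real pipeline would also reject clones whose two groups are poorly separated, since these give unreliable calls.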

Restriction Site-Associated DNA (RAD)
Another high-throughput method is the restriction site-associated DNA (RAD) procedure, which involves digesting DNA with a particular restriction enzyme, ligating biotinylated adapters to the overhangs, randomly shearing the DNA into fragments much smaller than the average distance between restriction sites, and isolating the biotinylated fragments using streptavidin beads (Miller et al., 2007a). RAD thus specifically isolates DNA tags directly flanking the restriction sites of a particular enzyme throughout the genome. More recently, the RAD tag isolation procedure has been modified for use with high-throughput sequencing on the Illumina platform (Baird et al., 2008; Lewis et al., 2007). In addition, Miller et al. (2007b) demonstrated that RAD markers, using a microarray platform, allowed high-throughput, high-resolution genotyping in both model and non-model systems.
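The idea that RAD isolates tags directly flanking every occurrence of a restriction site can be sketched in silico. This Python example assumes the 8-bp recognition sequence CCTGCAGG of SbfI (an enzyme often used for RAD, chosen here only as an example) and a fixed tag length; both flanks of each site are reported, shearing and adapter chemistry are ignored, and the sequences are hypothetical. A mutation that destroys a site removes its tags, giving a presence/absence RAD marker; SNPs within shared tags would give sequence-level markers.

```python
def rad_tags(seq, site="CCTGCAGG", tag_len=20):
    """Return (site_position, left_flank, right_flank) for every occurrence of `site`.
    Each flank is up to tag_len bases immediately adjacent to the site (simplified)."""
    tags = []
    pos = seq.find(site)
    while pos != -1:
        left = seq[max(0, pos - tag_len):pos]
        right = seq[pos + len(site):pos + len(site) + tag_len]
        tags.append((pos, left, right))
        pos = seq.find(site, pos + 1)
    return tags

# Hypothetical individuals: in ind_b the second site is mutated (CCTGCAGG -> CCTGAAGG).
ind_a = "A" * 10 + "CCTGCAGG" + "T" * 10 + "G" * 10 + "CCTGCAGG" + "AC" * 5
ind_b = "A" * 10 + "CCTGCAGG" + "T" * 10 + "G" * 10 + "CCTGAAGG" + "AC" * 5

tags_a = {(l, r) for _, l, r in rad_tags(ind_a, tag_len=10)}
tags_b = {(l, r) for _, l, r in rad_tags(ind_b, tag_len=10)}
print("shared tags:", len(tags_a & tags_b))
print("tags present only in ind_a:", tags_a - tags_b)
```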

4.3. Single Feature Polymorphism (SFP)
A third high-throughput method is single feature polymorphism (SFP). SFPs are detected by labeling genomic DNA (the target) and hybridizing it to arrayed oligonucleotide probes that are complementary to indel loci. SFPs can be discovered through sequence alignments or by hybridization of genomic DNA to whole-genome microarrays. Each SFP is scored by the presence or absence of a hybridization signal with its corresponding oligonucleotide probe on the array. Both spotted oligonucleotide and Affymetrix-type arrays have been used for SFP detection. Borevitz et al. (2003) coined the term "single feature polymorphism" and demonstrated that this approach can be applied to organisms with somewhat larger genomes, specifically Arabidopsis thaliana, with a genome size of 140 Mb. Similarly, whole-genome DNA-based SFP detection has been accomplished in rice (Kumar et al., 2007), with a genome size of 440 Mb. It has also been accomplished in barley, which has a 5300 Mb genome composed of more than 90% repetitive DNA (Cui et al., 2005). Thus, SFPs have become an attractive marker system for various applications, including parental polymorphism discovery.
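Scoring SFPs reduces to deciding, probe by probe, whether a hybridization signal is present or lost. The Python sketch below compares a test line with a reference line using a log2 intensity ratio. The probe names, intensities and the -1.0 cutoff (a two-fold drop) are all hypothetical. Published SFP studies use replicated arrays and statistical models rather than a single fixed threshold.

```python
import math


def call_sfps(reference: dict[str, float], test: dict[str, float],
              log2_threshold: float = -1.0) -> dict[str, int]:
    """Score each probe as 1 (signal present) or 0 (signal lost) in the test line.

    A probe counts as lost when log2(test / reference) <= log2_threshold,
    i.e. reduced hybridization, as expected when the target carries a polymorphism.
    """
    calls = {}
    for probe, ref_signal in reference.items():
        ratio = math.log2(test[probe] / ref_signal)
        calls[probe] = 0 if ratio <= log2_threshold else 1
    return calls


# Invented background-corrected intensities for three probes.
reference = {"probe_1": 1200.0, "probe_2": 950.0, "probe_3": 800.0}
test = {"probe_1": 1100.0, "probe_2": 210.0, "probe_3": 790.0}
print(call_sfps(reference, test))
# {'probe_1': 1, 'probe_2': 0, 'probe_3': 1}
```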
Microarray-based DNA technologies such as SFP, DArT and RAD have the merits of SNPs without requiring sequencing. These technologies provide platforms for medium- to ultra-high-throughput genotyping that survey regions across the genome at low cost. They have been shown to be particularly useful for genomes in which the level of polymorphism is low (Gupta et al., 2008). These array-based technologies are expected to play an important role in crop improvement and will be used for a variety of studies. These include the development of high-density molecular maps, which may then be used for QTL interval mapping and for functional and evolutionary studies. Some of these arrays are available from Illumina and Affymetrix.

Internal Transcribed Spacer (ITS)
Eukaryotic ribosomal RNA genes (known as ribosomal DNA or rDNA) are parts of repeat units that are arranged in tandem arrays. They are located at chromosomal sites known as nucleolar organizing regions (NORs). Nuclear ribosomal DNA has two internal transcribed spacers. ITS-1 is located between the small subunit (16S-18S) and 5.8S rRNA cistronic regions, and ITS-2 is located between the 5.8S and large subunit (23S-28S) rRNA cistronic regions. The two spacers and the 5.8S subunit are collectively known as the internal transcribed spacer (ITS) region.
The ITS region of the rDNA repeats (600-700 bp) is believed to evolve rapidly and may therefore vary in length and sequence. The regions flanking the ITS are highly conserved and have been used to design universal PCR primers that enable easy amplification of the ITS region. Although the biological role of the ITS spacers is not well understood, studies in yeast models have clearly shown their importance for the production of mature rRNA.
The number of copies of rDNA repeats is up to 30,000 per cell (Dubouzet & Shinoda, 1999). This makes the ITS region an interesting subject for evolutionary and phylogenetic investigations (Baldwin et al., 1995) as well as biogeographic investigations (Baldwin, 1993). Sequence data from the ITS region have also been used to assess genetic diversity in cultivated barley (Petersen & Seberg, 1996). In general, the ITS has become an important nuclear locus for systematic molecular investigations of closely related taxa. This is because the ITS region is highly conserved intraspecifically but variable between species (Bruns et al., 1991; Hillis & Dixon, 1991). Furthermore, the ITS region evolves much more rapidly than other conserved regions of rDNA (Baldwin et al., 1995). Thus, phylogenetic studies based on nrDNA ITS sequences have provided novel insights into plant evolution and hybridization in various plant species (Sang et al., 1995; Wendel et al., 1995; Buckler & Holtsford, 1996a; Quijada et al., 1998; Semerikov & Lascoux, 2003). The ITS region and the trnL intron are the most widely used markers in phylogenetic analyses of the Brassicaceae (Koch et al., 2003a; Koch et al., 2003b). ITS sequences are among the most successfully used regions of the nuclear genome for studying phylogenetic and genomic relationships of plants at lower taxonomic levels (Baldwin et al., 1995; Wendel et al., 1995).
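Interspecific ITS variation is usually quantified from aligned sequences. Below is a minimal Python sketch of the uncorrected p-distance, the proportion of differing sites between two sequences. It assumes the sequences are already aligned and skips gapped columns. The sequences are invented, and published studies typically apply a substitution model (for example, Kimura two-parameter) before building trees.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else 0.0


print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))  # 0.1 (1 difference in 10 sites)
```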
Genetic diversity studies using ITS can be carried out in two ways. The first is direct sequencing of the region from different individuals, followed by tree construction based on sequence comparison. The second measures sequence variation by restriction digestion of the amplified ITS region, followed by separation of the digested fragments on high-concentration agarose or polyacrylamide gels. The sequence variation revealed by the latter method is called PCR-restriction fragment length polymorphism (PCR-RFLP) and can be used for taxonomic purposes. The ITS region has been successfully used for the diagnostics and quick identification of cyst nematodes. Comparing the PCR-RFLP profiles and sequences of the ITS-rDNA of unknown nematodes with those published or deposited in GenBank facilitates quick identification of most species of cyst nematodes (Subbotin et al., 2000; Subbotin et al., 2001).
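As a rough illustration of how PCR-RFLP profiles separate sequences, the Python sketch below performs an in silico digest of a linear amplicon and returns the fragment lengths that a gel would resolve. The amplicons are toy sequences, not real ITS data. HaeIII (recognition site GGCC, cut between GG and CC) is used only as an example enzyme. The function handles only palindromic recognition sites, for which scanning the top strand is enough.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting a linear amplicon at every occurrence of site.

    cut_offset is the cut position within the site (2 for HaeIII, GG^CC).
    Assumes a palindromic site, so only the top strand is scanned.
    """
    sequence, site = sequence.upper(), site.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two invented amplicons that differ by one substitution in the second site.
print(digest("ATGGCCTTAAGGCCTA", "GGCC", 2))  # [4, 8, 4]
print(digest("ATGGCCTTAAGGACTA", "GGCC", 2))  # [4, 12]
```

The loss of one site changes the fragment pattern. Comparing such patterns against reference profiles is the basis of PCR-RFLP identification.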
Because of the assumed conservation of its secondary structure, the ITS region is also referred to as a double-edged tool for eukaryote evolutionary comparison (Tippery & Les, 2008; Young & Coleman, 2004). This is because the ITS region is highly conserved intraspecifically but variable between species (Bruns et al., 1991; Hillis & Dixon, 1991). For many organisms, the ITS2 in premature rRNA is organized around a preserved central core of secondary structure from which four helices emerge (Coleman, 2003). The ITS2 structural database contains more than 288,000 pre-calculated structures for the currently known ITS2 sequences and provides new possibilities for incorporating structural information in phylogenetic studies (http://its2.bioapps.biozentrum.uni-wuerzburg.de). Global all-against-all pairwise alignments of ribosomal RNA (rRNA) internal transcribed spacer 2 (ITS2) sequences have been generated to model ITS2 secondary structures based on sequences with known structures. The starting point was 60,000 known ITS2 sequences that fit the common core of the ITS2 secondary structure described for eukaryotes. From these, homology-based modeling (Wolf et al., 2005) and reannotation procedures revealed more than 150,000 additional homologous structures that could not be predicted by standard RNA folding programs. This database was used to study the gene phylogeny of the bioassay alga Selenastrum capricornutum (Buchheim et al., 2011; Krienitz et al., 2011).
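To make the "four helices emerging from a central core" description concrete, the sketch below counts the helices branching from the unpaired core of a secondary structure written in dot-bracket notation. It does not predict structure; the ITS2 database relies on homology modeling for that. The structure string is a hypothetical example, and the function assumes that the helices of interest are the outermost paired regions.

```python
def count_top_level_helices(structure: str) -> int:
    """Count helices that open directly from the unpaired core of a dot-bracket string."""
    depth = 0
    helices = 0
    for char in structure:
        if char == "(":
            if depth == 0:
                helices += 1  # a new helix starts from the core
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure")
    if depth != 0:
        raise ValueError("unbalanced structure")
    return helices


# Hypothetical four-helix layout (not a real ITS2 structure).
print(count_top_level_helices("..((((...))))..(((....)))...((((.....))))..(((...)))."))  # 4
```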
Although ITS is widely used in phylogenetic studies, there is a reported case, in the genus Corylus, in which the ITS region failed to resolve the genetic relationships between species (Erdogan & Mehlenbacher, 2000). Moreover, concerted evolution of ribosomal DNA repeats (including the ITS region) may be a problem if there are instances of allopolyploid speciation within the group (Wendel et al., 1995). Despite this potential problem, ITS is generally considered to be of great utility for phylogenetic analysis among closely related species (Baldwin, 1992; Baldwin et al., 1995).

Chloroplast DNA as source of genetic diversity
Cytoplasmic DNA, consisting of the chloroplast and mitochondrial genomes, has become very useful for studying genetic diversity and for other taxonomic studies, especially since the development of molecular markers. Analysis of the chloroplast genome provides information on the genetic diversity of plants that is complementary to that obtained from the nuclear genome. The chloroplast genome is highly conserved and has a much lower mutation rate than plant nuclear genomes. Consequently, chloroplast DNA has been extensively exploited in studies of plant genetic diversity. Restriction site analysis of chloroplast (cp) DNA has been widely used for interspecific studies. In some cases, the magnitude of intraspecific variation has been sufficient to allow population-based studies (Soltis et al., 1992). In addition, restriction fragment length polymorphism (RFLP) analysis of cpDNA was used to study the genetic diversity of Douglas-fir in British Columbia from coastal, interior and transition zones (Ponoy et al., 1994). Such RFLPs result from length mutations or rearrangements, and these polymorphisms are often associated with localized hotspots that contain repetitive DNA. Because of their mutational complexity and lack of representativeness of the genome, they provide biased estimates of nucleotide diversity and thus may also give rise to incorrect estimates of genetic subdivision.

Chloroplast Transfer RNAs (tRNAs)
Transfer RNAs (tRNAs) are ancient macromolecules. In all forms of life they have evolved under various environmental pressures as adaptors in translation, but also towards alternative structures and functions (Muller-Putz et al., 2010). In other words, the tRNA world presents a large diversity in terms of structure as well as function. Besides ribosome-dependent protein synthesis, tRNA functions include cell wall synthesis, porphyrin biosynthesis for heme and chlorophyll, N-terminal modification of proteins, initiation of reverse transcription in retroviruses, and lipid remodelling. Several recent reviews illustrate this diversity (Hopper & Phizicky, 2003; Ryckelynck et al., 2005; Dreher, 2009; Roy & Ibba, 2009). Another feature of this region is its highly conserved nature. This allows structural changes, in the form of indels and repeat sequences, as well as base substitutions, to be considered phylogenetically informative (Palmer, 1991; Raubson & Jansen, 2005). Low base-substitution rates within the chloroplast genome have led to the use of indels in population-level studies. Small structural changes (< 10 bp) are useful for increasing phylogenetic resolution and the ability to discriminate within-species variation (Mitchell-Olds et al., 2005).
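One way to treat indels as phylogenetic characters is to code each distinct gap as present or absent in every sequence. The Python sketch below is a simplified presence/absence coding and not the full "simple indel coding" of the systematics literature. It treats gaps as the same character only when their start and end columns match, and it counts terminal gaps like internal ones. The alignment is invented. The hypothetical `max_len` parameter mirrors the "< 10 bp" structural changes mentioned above.

```python
def gap_blocks(aligned: str) -> set[tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of each contiguous gap run."""
    blocks = set()
    start = None
    for i, char in enumerate(aligned):
        if char == "-":
            if start is None:
                start = i  # a new gap run begins
        elif start is not None:
            blocks.add((start, i))  # the run covered columns start..i-1
            start = None
    if start is not None:
        blocks.add((start, len(aligned)))
    return blocks


def code_indels(alignment: dict[str, str], max_len: int = 10
                ) -> tuple[list[tuple[int, int]], dict[str, list[int]]]:
    """Binary matrix: 1 if a sequence carries a given gap block, else 0."""
    per_seq = {name: gap_blocks(seq) for name, seq in alignment.items()}
    characters = sorted(b for b in set().union(*per_seq.values())
                        if b[1] - b[0] < max_len)
    matrix = {name: [1 if b in blocks else 0 for b in characters]
              for name, blocks in per_seq.items()}
    return characters, matrix


toy_alignment = {"sp1": "ACGT--ACGTAC", "sp2": "ACGTTTACGTAC", "sp3": "ACGT--ACG-AC"}
print(code_indels(toy_alignment))
# ([(4, 6), (9, 10)], {'sp1': [1, 0], 'sp2': [0, 0], 'sp3': [1, 1]})
```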
The trnT-trnF region is located in the large single-copy region of the chloroplast (cp) genome. The plant trnL-trnF intergenic spacer is less than 500 bp long. Two primers designed from the conserved flanking regions (Taberlet et al., 1991) can be used to amplify this spacer in various plant species. In addition, a high degree of polymorphism generally exists in the spacer between species. For example, the spacer sequences of Acer pseudoplatanus and A. platanoides, two closely related species, are different (Taberlet et al., 1991). Sequence differences between species within this spacer region can be detected using polyacrylamide gel electrophoresis, polymerase chain reaction-single-strand conformation polymorphism (PCR-SSCP), or even agarose gel electrophoresis. Therefore, the existence of the two universal primers and the high degree of polymorphism in the trnL-trnF intergenic spacer make it a good marker for paternal analysis in many plant species (Baker et al., 1999; Chen et al., 2002). The noncoding regions of the trnL (UAA) intron and the trnL (UAA)-trnF (GAA) intergenic spacer have also been extensively utilized for evolutionary analysis in plants. They have also been used to develop markers for identifying the maternal donors of polyploids, with additional capacity to reveal phylogenetic relationships of related species (Sang et al., 1997; Xu & Ban, 2004).
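The value of universal primers is that the same pair locates the spacer in many species, so length differences can then show up on a gel. The Python sketch below mimics this with an exact-match in silico PCR. The primer and template sequences are hypothetical placeholders, not the published Taberlet et al. (1991) primers. Real primer binding also tolerates mismatches, which this sketch ignores.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (other characters pass through)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the product between an exact forward-primer match and the reverse
    primer's binding site (its reverse complement), or None if either is missing."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    rc = reverse_complement(reverse)
    end = template.find(rc, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(rc)]


# Hypothetical primers and two species differing in spacer length.
fwd, rev = "ACGTAC", "TTGCAG"
species_a = "GGG" + "ACGTAC" + "T" * 10 + "CTGCAA" + "GGG"
species_b = "GGG" + "ACGTAC" + "T" * 14 + "CTGCAA" + "GGG"
for name, seq in [("A", species_a), ("B", species_b)]:
    product = in_silico_amplicon(seq, fwd, rev)
    print(name, len(product) if product else None)
# A 22
# B 26
```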
The chloroplast genome is relatively stable, since larger structural changes and complex rearrangements are uncommon. Such changes include inversions, translocations, loss of repeats and gene duplications. However, there are occasional reports of structural changes in the chloroplast genome, such as the formation of pseudogenes, which are non-functional duplications of functional genes (Ingvarsson et al., 2003). Two main mechanisms have been suggested to account for these structural changes.

The first mechanism is intramolecular recombination between two similar regions on a single double-stranded DNA (dsDNA) molecule. This results in excision of the intervening region, yielding a shorter dsDNA molecule and a separate circularised dsDNA molecule. It has been suggested that this process is mediated by short repeats formed by slip-strand mispairing (Ogihara et al., 1988). The second mechanism is intermolecular recombination between two copies of the plastid genome, which is possible because each plastid contains a large number of copies of the chloroplast genome. Recombination by this mechanism could increase pseudogene variation through unequal crossing over (Dobes, 2007).

Differences in pseudogene copy number can be used in genetic diversity studies as well as in classification. In the Brassicaceae, for example, the trnL-F region from at least 20 genera contains a variable number of copies of a duplicated trnF pseudogene (Ansell et al., 2007; Koch & Matschinger, 2007; Schmickl et al., 2008). These copies are thought to be non-functional and are made up of partial trnF gene fragments 50-100 bp in length. Tedder et al. (2010) suggested that the pseudogene duplications can serve as a useful lineage marker within the Brassicaceae. The duplications are absent from many genera, including Brassica, Draba and Sinapis (Koch et al., 2006), while in certain genera the copy number can reach 12 (Schmickl et al., 2008). This approach requires PCR amplification with the 'E' and 'F' primers of Taberlet et al. (1991) and sequencing of the trnL(UAA)-trnF(GAA) gene region. This system was used to study the phylogeography of European populations of Arabidopsis lyrata (Ansell, 2007; Ansell et al., 2010), which were considered to be A. lyrata ssp. petraea. The taxonomy of this genus has subsequently been revised, and both subspecies are now considered part of the Arabidopsis lyrata complex (Schmickl et al., 2008).
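Counting pseudogene copies amounts to locating repeated, slightly diverged trnF fragments within a sequenced trnL-F region. The Python sketch below illustrates this with a sliding window that counts matches within a Hamming-distance threshold. The fragment, the region and the two-mismatch threshold are all hypothetical, and real trnF pseudogene fragments are 50-100 bp, much longer than the toy fragment used here. The cited studies identified copies by sequence alignment, so this is an illustration under stated assumptions rather than their method.

```python
def count_copies(region: str, fragment: str, max_mismatches: int = 2) -> int:
    """Count non-overlapping windows of region within max_mismatches of fragment."""
    region, fragment = region.upper(), fragment.upper()
    k = len(fragment)
    count = 0
    i = 0
    while i <= len(region) - k:
        window = region[i:i + k]
        mismatches = sum(1 for a, b in zip(window, fragment) if a != b)
        if mismatches <= max_mismatches:
            count += 1
            i += k  # jump past this copy so it is not counted twice
        else:
            i += 1
    return count


toy_fragment = "GATTCGAACC"  # hypothetical trnF-like fragment
toy_region = "AA" + "GATTCGAACC" + "TT" + "GATTCGTACC" + "TT" + "C" * 10
print(count_copies(toy_region, toy_fragment))                    # 2
print(count_copies(toy_region, toy_fragment, max_mismatches=0))  # 1
```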

Chloroplast SSR
Nuclear simple sequence repeats (SSRs) are popular markers for population genetics because they are considered selectively neutral, highly polymorphic and codominant, and are inherited in a Mendelian fashion (Sunnucks, 2000). SSRs, or microsatellites (Tautz, 1989), are stretches of repetitive DNA in which a single motif of one to six base pairs is repeated in tandem a number of times. Chloroplast SSRs (cpSSRs) differ from nuclear SSRs in that they consist of a mononucleotide motif repeated 8-15 times, whereas nuclear SSR motifs are one to six nucleotides long. cpSSRs evolve faster than any other part of the chloroplast genome (Jakob et al., 2007) and have been identified in a number of genomes. Among SSRs from organelle genomes, mitochondrial SSRs have had little impact compared with chloroplast SSRs. Moreover, chloroplast genomes are transmitted in multiple copies during mitosis and meiosis and are therefore subject to random drift between and within individuals. The uniparentally inherited, haploid and non-recombinant nature of the chloroplast genome makes it a very useful tool in evolutionary studies (Petit et al., 2005). cpSSRs have been used in several plant groups, such as conifers (Vendramin et al., 1996) and the Gramineae (Provan et al., 2004). The potential value of cpDNA markers in complementing nuclear genetic markers in population genetics is widely recognized (Provan et al., 1999; Provan et al., 2001; Petit et al., 2005). When cpSSRs are located in noncoding regions of cpDNA, they show intraspecific variation in repeat number. Moreover, the effective population size of haploid cpDNA is smaller than that of diploid nuclear genes, so cpSSRs are considered to show stronger differentiation between species. Quintela-Sabaris et al. (2010) used cpSSR markers to analyze the colonization pattern of Cistus ladanifer.

Because cpSSRs are sensitive to drift, they have been used to study differentiation due to genetic drift (Comes & Kadereit, 1998), genetic structure and gene flow (Hansen et al., 2005), and bottleneck phenomena (Echt et al., 1998). They have also been used to detect a reduction in genetic diversity within the M population of Silene paradoxa (Mengoni et al., 2001). They can also be used to monitor the transmission of chloroplast genomes during hybridization and introgression in wild or breeding populations, or to characterize plastid genome types for breeding purposes (Flannery et al., 2006). A comprehensive review of the technical resources, applications, and recommendations for expanding cpSSR discovery in a wide array of plant species is presented by Ebert & Peakall (2009).
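Because cpSSRs are mononucleotide runs, candidate loci can be found in a chloroplast sequence with a single regular expression. The Python sketch below reports every run of one base at or above a minimum length. The default of 8 follows the 8-15 repeat range given above, and the upper bound is deliberately not enforced. The input sequence is invented. Real cpSSR surveys would also check that the run lies in a noncoding region and design flanking primers.

```python
import re


def find_mononucleotide_repeats(sequence: str, min_len: int = 8) -> list[tuple[int, str, int]]:
    """Return (start, base, repeat_count) for each mononucleotide run of length >= min_len."""
    # ([ACGT]) captures one base and \1{n,} requires it to repeat; the match is greedy,
    # so each hit spans the whole run.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(sequence.upper())]


toy_cp = "GC" + "A" * 9 + "GC" + "T" * 7 + "CGA"  # invented sequence
print(find_mononucleotide_repeats(toy_cp))             # [(2, 'A', 9)]
print(find_mononucleotide_repeats(toy_cp, min_len=7))  # [(2, 'A', 9), (13, 'T', 7)]
```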

Conclusion
Different marker types differ in their usefulness for studying genetic diversity. Fast-evolving markers can be used to study closely related species. RFLP was the first molecular marker to be used in genetic diversity studies. Although reproducible, it is time-consuming, and in polyploid species it reveals polymorphism at low frequency. RAPD markers are easy to generate; however, they have an inherent problem of reproducibility. ISSR is more reproducible and polymorphic than RAPD. SSRs have been the marker of choice for the last two decades, especially before the discovery of SNPs. SNPs are the most widespread sequence variation in the genome; they are numerous, more stable and easier to score than SSRs. Although AFLP has high discriminatory power, it has medium reproducibility and its alleles are not easily recognized. On the other hand, microarray-based technologies such as SFP, DArT and RAD have the merits of SNPs without requiring sequencing. They offer medium- to ultra-high-throughput genotyping at low cost and have been shown to be particularly useful for genomes in which the level of polymorphism is low. They are expected to play an important role in crop improvement and will be used for a variety of studies. These include the development of high-density molecular maps, which may then be used for QTL interval mapping and for functional and evolutionary studies.
One important characteristic of the ITS region is that it is highly conserved intraspecifically but variable between species. It has provided novel insights into plant evolution and hybridization and is considered one of the most successfully used regions of the nuclear genome for studying phylogenetic and genomic relationships of plants. The noncoding regions of the trnL (UAA) intron and the trnL (UAA)-trnF (GAA) intergenic spacer have been extensively utilized for evolutionary analysis in plants. They have also been used to develop markers for identifying the maternal donors of polyploids, with additional capacity to reveal phylogenetic relationships of related species. In addition to its high degree of polymorphism, the trnL (UAA)-trnF (GAA) intergenic spacer can be amplified from several plant species using two universal primers. Finally, cpSSRs evolve faster than any other part of the chloroplast genome and have been used in several plant species.