Sequences and hybridization characteristics of the oligonucleotide probes (Ali and Wallace, 1988)
The analysis of genetic diversity and relatedness within and between species and populations has been a major theme of research for many biologists. With whole-genome sequences available for an increasing number of species, the focus has shifted to the development of molecular markers based on DNA or protein polymorphism. DNA sequences originate and undergo evolutionary changes, and thus may be used as powerful genetic markers to characterize the genomes of a wide range of species. This type of analysis is called fingerprinting, profiling or genotyping. DNA profiling based on typing individuals using highly variable minisatellites in the human genome was first developed by Jeffreys et al (1985). They demonstrated that short repeat sequences are tandemly arranged within genes and that each organism has a unique pattern of arrangement of these minisatellites, the only exception being multiple individuals derived from a single zygote (e.g. identical twins). The DNA fingerprinting technique has notably been used to help solve crimes and determine paternity. In addition, with advances in molecular biology techniques, the isolation of genes tagged with minisatellites has become one of the most powerful tools for genome analysis.
The term “repetitive sequences” (repeats, DNA repeats, repetitive DNA) refers to DNA fragments that are present in multiple copies in the genome. These sequences exhibit a high degree of polymorphism due to variation in the number of their repeat units, caused by mutations involving several mechanisms (Tautz, 1989). This hypervariability among related and unrelated organisms makes them excellent markers for mapping, characterization of genomes, genotype-phenotype correlation, marker-assisted selection of crop plants, molecular ecology and diversity-related studies. The nature of repeats gives them greater working flexibility than other marker systems, because short tandem repeat (STR) sequences (i) are evenly distributed all over the genome, (ii) are often conserved between closely related species, and (iii) are co-dominant. Given these innate attributes, very small quantities of DNA can be used for simultaneous detection of the alleles tagged with STRs by employing minisatellite associated sequence amplification (MASA).
Current databases hold information on the genomes of various livestock species, such as cattle, sheep, goat, pig, horse, chicken, silkworm, honey bee, rabbit, dog, cat and duck (Georges and Andersson, 1996). However, complete sequence analysis of several important species such as yak, banteng, zebu, donkey, goose, turkey, camel and water buffalo is still underway. Efforts are required to characterize the genes controlling important traits in order to produce genetically healthy breeds and segregate superior germplasm wherever possible. The water buffalo, Bubalus bubalis, is an important domestic animal worldwide, with immense potential in the agriculture, dairy and meat industries. We have studied several repeat loci in the buffalo genome using Restriction Fragment Length Polymorphism (RFLP) and characterized a number of important genes employing Minisatellite Associated Sequence Amplification (MASA).
MASA forms a rich basis for functional and comparative genomics, contributing towards the understanding of genome organization, gene expression and the development of molecular synteny. This approach also enables characterization of the same genes across individuals within a species and between species. Thus, information about the organization of a gene, its expressional, mutational and phylogenetic status, its chromosomal location and its genetic variation across genomes maximizes the chances of narrowing the search for possible genetic markers.
In this chapter, we discuss the overall organization of repetitive sequences, their origin, distribution, application in genome analysis and implications. In addition, the use of repetitive sequences in bubaline genome mining is highlighted, elucidating the potential of functional and comparative genomics. Thus, the organizational variation and expression profile of a single gene originating from a specific tissue may be studied in many ways to meet the varying requirements of biology.
2. Organization of repetitive sequences
Mammals have approximately 3 billion base pairs per haploid genome, harboring about 20,000-25,000 genes. A minor part of the genome (5-10%) consists of coding sequences (International Human Genome Sequencing Consortium, 2004; Hochgeschwender and Brennan, 1991) and the remaining part is non-coding, representing repetitive DNA (Bromham, 2002). Comparison of the genome sizes of different eukaryotes shows that the amount of non-coding DNA is highly variable and constitutes from 30% to about 99% of the total genome (Elgar and Vavouri, 2008; Cavalier-Smith, 1985). The non-coding repetitive sequences are dynamic elements, which reshape their host’s genome by generating rearrangements, shuffling genes and modulating patterns of expression. This dynamism of repeats leads to evolutionary divergence that can be used in species identification, in phylogenetic inference and for studying the processes of sporadic mutation and natural selection. These repetitive sequences are mainly composed of interspersed and tandem repeats (Slamovits and Rossi, 2002). The latter include satellites, minisatellites and microsatellites. Satellite DNAs are predominantly associated with centromeric heterochromatin and are being increasingly utilized as a versatile tool for genome analysis, genetic mapping and for understanding chromosomal organization. Minisatellites and microsatellites, on the other hand, are dispersed throughout the genome and are highly polymorphic in all populations studied. This has led to their extensive use as genetic markers for fingerprinting, genotyping, and forensic analysis in humans. Based on their arrangement, repetitive DNA sequences are classed into two types (Figure 1).
2.1. Highly repetitive sequences
These are short sequences (5 to 10 bp) amounting to about 10% of the genome and repeated many times, usually occurring as tandem repeats (present in approximately 10^6 copies per haploid genome). They are not interspersed with different non-repetitive sequences. Usually, the sequence of each repeating unit is conserved. Most of the sequences in this class are located in the heterochromatic regions of the centromeres or telomeres of the chromosomes. Highly repetitive sequences, interacting with specific proteins, are involved in organizing chromosome pairing during meiosis and recombination.
2.1.1. Satellite DNA
These are represented by monomer sequences, usually less than 2000 bp long, tandemly reiterated up to 10^5 copies per haploid genome and located in the pericentromeric and/or telomeric heterochromatic regions (Charlesworth et al 1994). Satellite DNA constitutes from 1 to 65% of the total DNA of numerous organisms, including that of animals, plants, and prokaryotes. The term “satellite” in the genetic sense was first coined by the Russian cytologist Sergius Navashin in 1912, initially in Russian (“sputnik”) and Latin (“satelles”), and was later translated to “satellite” (Battaglia, 1999). The more familiar usage of “satellite” relates to a small band of DNA with a density different (usually lower, because of a high AT content) from that of the bulk of the genomic DNA, which separates from the main band following CsCl centrifugation (Kit, 1961). Nucleotide changes and copy number variations fuel the process of their evolution within and across species (Ugarkovic and Plohl, 2002). Satellite fractions, though not conserved evolutionarily (Bhatnagar et al 2004; Amor and Choo, 2002), are unique to a species and usually show similarity among related groups of animals (Pathak et al 2011; Henikoff et al 2001; Ali and Gangadharan, 2000).
2.2. Moderately or dispersed repetitive sequences
These include short (150 to 300 bp) sequences and long ones (5 kbp), amounting to about 40% and 1-2% of the total genome, respectively. They are dispersed throughout the euchromatin, with 10^3-10^5 copies per haploid genome. These sequences are involved in the regulation of gene expression. In some cases, long dispersed repeats of 300 to 600 bp show homology with retroviruses.
On the basis of their mode of amplification, repetitive DNA sequences may be tandemly arranged or interspersed in the genome (Slamovits and Rossi, 2002).
2.2.1. Interspersed repetitive DNA
Interspersed repeat sequences scattered throughout the genome have arisen by transposition, that is, they have the “ability to jump from one place to another in the genome” (Miller and Capy, 2004; Brown, 2002). Even though the individual units of interspersed repetitive non-coding DNA are not clustered, taken together they account for approximately 45% of the human genome. According to the mechanism of their transposition, interspersed repeats are divided into two classes:
2.2.1.1. RNA transposons
RNA transposons, also known as retroelements, are found in eukaryotic genomes and require reverse transcription for their activity. Based on their structural relationship, RNA transposons are divided into two general categories:
2.2.1.1.1. LTR elements
LTR elements include retroviruses, whose genomes are made up of RNA. They infect different types of vertebrates.
Endogenous retroviruses (ERVs)
These are retroviruses integrated into vertebrate chromosomes and inherited from generation to generation as part of the host genome. Some are still active and might, at some stage in a cell's lifetime, direct the synthesis of exogenous viruses. However, the majority of them are decayed relics and no longer have the capacity to form viruses (Patience et al 1997).
Retrotransposons are the biggest class of transposons. An important characteristic of this type of transposable element is that it usually contains sequences with potential regulatory activity. Retrotransposons are features of non-vertebrate eukaryotic genomes (i.e. plants, fungi, invertebrates and microbial eukaryotes). These elements code for an mRNA molecule, which is processed and polyadenylated. Retrotransposons have very high copy numbers. In maize, these elements occupy half of the genome.
2.2.1.1.2. Non-LTR elements
LINEs (Long Interspersed Nuclear Elements)
LINEs are several thousand base pairs in size and make up about 17% of the total human genome (Richard and Batzer, 2009). They contain a reverse-transcriptase-like gene involved in the retrotransposition process. Many LINEs also code for an endonuclease (e.g. RNase H). The most abundant LINE family is the 7-kbp L1 repeat element, which has >500,000 copies and accounts for approximately 15% of the human genome (Lander et al 2001). Despite its abundance, no function of the LINE 1 repeat is yet known. Initial studies in mouse have implicated L1s in shaping the structure and expression of the transcriptome (Han et al 2004; Han and Boeke, 2005).
SINEs (Short Interspersed Nuclear Elements)
SINEs are small elements, usually 100 to 500 bp in length, accounting for 11% of the human genome (Richard and Batzer, 2009). SINEs do not have a reverse transcriptase gene; instead, they borrow reverse transcriptase enzymes from other retroelements. A well-known example of a SINE in the human genome is the Alu sequence (Capy et al 1998), which is 350 base pairs long, does not contain any coding sequence, and is present in over 1 million copies (Roy-Engel et al 2001).
2.2.1.2. DNA transposons
DNA transposons do not require an RNA intermediate and transpose in a direct DNA-to-DNA manner. In eukaryotes, DNA transposons are less common than retrotransposons, but they have a special place in genetics because a family of plant DNA transposons, the Ac/Ds elements of maize, were the first transposable elements to be discovered. There are two types of DNA transposons, both of which require enzymes coded by genes within the transposons.
2.2.2. Tandem repeats
Tandem repeats consist of arrays of two to several thousand sequence units arranged in a head-to-tail fashion. Tandem repeats may be further classified according to the length and copy number of the basic repeat unit as well as its genomic localization.
2.2.2.1. Megasatellite DNA
These are characterized by tandemly repeated DNA in which the repeat unit is reiterated approximately 50-400 times, producing blocks that can be hundreds of kilobases long. Some megasatellites are composed of coding repeats, for example RNA genes and the deubiquitinating enzyme gene USP17.
2.2.2.2. Minisatellite DNA
This comprises tandem copies of repeats that are 6-100 nucleotides in length (Tautz, 1993). Alec Jeffreys first described minisatellites in 1985, from the non-coding (intron) regions of the human myoglobin gene. Since then, similar DNA structures have been reported in many organisms, including bacterial (Skuce et al 2002), avian (Reed et al 1996), higher plant (Sykorová et al 2006; Durward et al 1995), protozoan (Feng et al 2011; Bishop et al 1998), and yeast (Kelly et al 2011; Haber and Louis, 1998) genomes. Comparison of the repeat units in classical minisatellites led to the early notion of consensus or core sequences, which exhibit some behavioral similarities with the Chi sequence of E. coli and λ phage (GCTGGTGG). Also called variable number of tandem repeats (VNTRs) (Brown 2002), the majority of minisatellites are GC rich, with a strong strand asymmetry. Minisatellites often form families of related sequences that occur at many hundreds of loci in the nuclear genome. In the human genome, the number of minisatellite loci is estimated to be approximately 3000, and each locus contains a distinctive repeat unit with respect to size and sequence content. The degree of repetition ranges from two to several hundred. Repeat units within a minisatellite usually display small variations in sequence. Minisatellite mutations usually consist of gains or losses of one or more repeat units. Such mutations at hypervariable minisatellite loci are up to 1000 times more common than mutations in protein coding genes (Debrauwere et al 1997).
In humans, the majority of minisatellites are clustered near the sub-telomeric ends of the chromosomes, limiting their usefulness for extensive gene mapping (Lopes et al 2006; Royle et al 1988), but there are examples of interstitial locations, such as the alpha globin gene cluster (Proudfoot et al 1982) and the type II collagen gene (Stoker et al 1985). Minisatellites of other species, such as mice or bovines (Georges et al 1991), are not always preferentially clustered at chromosomal termini as in the human genome, but are distributed along the entire length of the chromosomes. Unlike microsatellites, which usually alter during the DNA synthesis stage of the mitotic cell cycle, minisatellites alter during meiosis, undergoing changes in overall length and repeat composition (Jarman and Wells, 1989; Jeffreys et al 1998). Minisatellite tracts have proven very useful for genomic mapping (Legendre et al 2007; Jeffreys et al 1985) and linkage studies (Nakamura et al 1987). Examples of human minisatellites used for fingerprinting include the consensus sequences of the 33.6 and 33.15 repeat loci. Other minisatellite sequences, according to Ali and Wallace (1988), are listed in Table 1.
2.2.2.2.1. Telomeric repeats
These are composed of multiple repeats of short sequence elements (typically 5 to 8 bp in length, with a GT-rich strand oriented 5' to 3' toward the end of the chromosome) and range in length from a few repeat units to >10 kbp. Long simple sequence tandem repeats of interstitial TTAGGG arrays form a three-dimensional nuclear network of poorly transcribed domains, which are involved in gene silencing by repositioning. This network, as well as clusters of retroelements properly positioned in the nucleus, forms unique lineage-specific structures that affect gene expression (Tomilin, 2008). The repeated sequence (TTAGGG)n is found at telomeres in all vertebrates, certain slime molds, and trypanosomes; (TTGGGG)n and (TTTTGGGG)n are found in the ciliated protozoans Tetrahymena and Oxytricha, respectively; and (TG1-3)n is found in the yeast Saccharomyces cerevisiae. In organisms whose telomeres have been examined in detail, the GT-rich strand extends 12 to 16 nucleotides (two repeats) beyond the complementary C-rich strand. The unique structure of the telomere is involved in maintaining the integrity of the chromosome ends.
| S.No. | Probe name | Sequence 5'-3' | Total length | Length of repeat unit | No. of repeat units |
2.2.2.2.2. Subtelomeric repeats
Subtelomeric repeats are classes of repetitive sequences that are interspersed within the last 500,000 bases of non-repetitive DNA located adjacent to the telomere. Some sequences are chromosome specific, whereas others seem to be present near the ends of all human chromosomes (Norman, 2001).
2.2.2.3. Microsatellites/Short Sequence Repeats (SSRs)
Tandem repeats made up of short units (1-6 bp), usually di-, tri-, or tetranucleotides, were earlier called simple sequences (Tautz and Renz, 1984). Later, this class of DNA was termed microsatellites (Tautz, 1989). Microsatellites or simple sequence repeats (SSRs) are ubiquitously interspersed in coding and non-coding regions of eukaryotic and prokaryotic genomes (Gur-Arie et al 2000; Toth et al 2000). All the SSRs taken together occupy about 3% of the human genome, in which they are widely dispersed and associated with many genes (Subramanian et al 2003). The significance of specific microsatellites in different regions is not completely understood. However, some microsatellites occurring in the flanking regions of coding sequences are believed to play significant roles in the regulation of gene expression by forming various DNA secondary structures and offering a mechanism of unwinding (Catasti et al 1999). Variation in the length and unit type of simple repeats in upstream activation sequences might influence transcriptional activity (Kim and Mullet, 1995; Epplen et al 1996; Martienssen and Colot, 2001; Zhang et al 2004), and affect interaction with different regulatory proteins during translation (Lue et al 1989).
Microsatellites are usually characterized by a low degree of repetition at a particular locus. However, elements containing identical motifs may be found at many thousands of genomic loci. When the occurrence of SSRs in different functional regions of the genome is considered, most of them show a much higher density in non-coding regions. Exceptions to the rule are trimers and hexamers, which are nearly two times more prevalent in exons than in introns and intergenic regions. Their high frequency in coding regions may be explained by the fact that they do not change the reading frame, and thus are much better tolerated than other SSRs. Their positive selection in exons suggests some function for these repeats.
The high mutation rate of these repeats and their frequent length polymorphism suggest that they may be involved in the regulation of gene expression, thus exerting quantitative effects on the phenotype. A few examples of repeat units used for fingerprinting and transcriptome analysis include (GATA/GACA)n, (CA)n, (AT)n, (GAA)n, (TCC)n, (GGAT)n, (GGCA)n, and (TTAGGG)n.
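The definition of a microsatellite given in this section (a 1-6 bp unit repeated in tandem) can be turned into a simple scan of a DNA string. The following is a minimal sketch only, not the method used in the studies cited: the function name `find_ssrs`, the threshold of four copies, and the toy sequence are assumptions made for illustration, and only perfect (uninterrupted) repeats are detected.

```python
import re

def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=4):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    min_repeats is an illustrative threshold, not a value from the text.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # ([ACGT]{k}) captures one unit; \1{n,} demands further tandem copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves built from a shorter motif
            # (e.g. "CACA" is reported once, as the dinucleotide "CA").
            if any(unit == unit[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)

# Hypothetical toy sequence containing a (CA)8 and a (GATA)5 tract.
toy = "TTGC" + "CA" * 8 + "GGT" + "GATA" * 5 + "CC"
for start, unit, copies in find_ssrs(toy):
    print(start, unit, copies)
```

Real SSR surveys usually also tolerate imperfect repeats and apply unit-specific copy thresholds, which this sketch does not attempt.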
3. Evolution and inter-species variation of repeat sequences
Several mechanisms have been proposed for the evolution of repeat sequences, such as strand slippage during replication, base misalignment and unequal crossing over between homologous chromosomes during meiosis, sister chromatid exchange, or even insertion of a viral genome (Barros, 2008; Jeffreys et al 1985; Tautz, 1989).
Microsatellites tend to be highly polymorphic, suggesting a 'stepwise mutation' model in which most variation is introduced by replication slippage, changing the array length by only one or two repeats at a time, with occasional larger 'jumps' in size at a much lower frequency. Minisatellites evolve more readily by larger-scale mechanisms such as unequal exchange. For all classes, there appears to be a general bias towards an increase in array length through evolutionary time. Highly repetitive DNA tends to accumulate only in regions such as centromeres and telomeres, where recombination is suppressed, while repeats occurring in euchromatin are much more susceptible to crossing over and tend to be more variable in copy number relative to their array length.
As mentioned above, loss or gain of repeats by unequal crossing over and gene conversion can lead to molecular drive of any given variant in a sexually dimorphic population. During the evolution of repetitive elements by unequal crossing over, some variants will be lost whereas others will increase in frequency, eventually replacing all others. These evolutionary changes lead to homogeneity among the repeats of an array within a species and heterogeneity among the units of the corresponding array in different species, giving rise to inter-species variation (Harris and Wright, 1995). This phenomenon, however, is affected by the overall male to female ratio, the population size, the possibility of infusion of new genetic material into a given gene pool, and allele fixation involving evolutionary incubation time.
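The stepwise mutation model described in this section can be illustrated with a small simulation of one repeat array over many generations. This is a rough sketch under stated assumptions, not a fitted model: the function name, the per-generation mutation rate, the 5% jump probability, the maximum jump size and the symmetric gain/loss choice are all hypothetical values chosen for illustration (the bias towards array expansion mentioned in the text is deliberately left out).

```python
import random

def simulate_array_length(start_repeats, generations, mutation_rate,
                          jump_prob=0.05, max_jump=5, seed=0):
    """Track the repeat count of one array under a stepwise mutation model.

    All parameter values are illustrative assumptions, not measured rates.
    """
    rng = random.Random(seed)  # seeded so that a run is reproducible
    length = start_repeats
    history = [length]
    for _ in range(generations):
        if rng.random() < mutation_rate:
            # Most events change the array by a single unit (slippage);
            # a small fraction are larger "jumps".
            if rng.random() < jump_prob:
                step = rng.randint(2, max_jump)
            else:
                step = 1
            direction = rng.choice((-1, 1))  # loss or gain, equally likely here
            # An array is assumed never to shrink below one repeat unit.
            length = max(1, length + direction * step)
        history.append(length)
    return history

trajectory = simulate_array_length(start_repeats=15, generations=10000,
                                   mutation_rate=0.001)
print("final repeat count:", trajectory[-1])
print("range visited:", min(trajectory), "to", max(trajectory))
```

Setting `jump_prob=0` restricts the walk to single-unit steps, which makes the contrast with the larger-scale changes attributed to minisatellites easy to see.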
4. Functional significance of repetitive sequences
Uncertainty about the functional roles of these sequences persisted for a long time, and it was largely believed that they represent the detritus of the genome (Ohno, 1972). However, recent studies have shown that repeat elements influence the structure, function, and evolution of chromosomes in the host genomes (Sinden, 1999; Dey and Rath, 2005; Tang, 2011). Their association with the promoters and coding regions of genes has made them very attractive objects of study. Transcription, mRNA processing, translation, folding, stability and aggregation rates, as well as gross morphology, have been found to be incrementally affected by alterations in tracts of tandem repeats (Fondon and Garner, 2004; Vinces, 2009). The human genome provides many instances of regulatory regions embedded in the remnants of repeat elements (Jordan et al. 2003), and studies have documented the participation of repeat sequences in the regulation of gene expression (Boeva et al., 2006). This suggests that repeat elements play a major architectonic role in the higher-order physical structuring of the genome (Shapiro and Sternberg, 2005; Vermaak et al 2009). More studies on repeat sequences will lead to an increased understanding of the functions and dysfunctions of genomes.
5. Significance of repetitive sequences as marker
Primers based on VNTRs provide an unprecedented opportunity to develop potential molecular markers for a particular species. Where a complete genome sequence is available for an organism, repeats may be annotated with their physical position on the genome. Markers may then be selected either for their location within a specific region of interest or for their even distribution across regions. Where a full genome sequence is unavailable, location may be predicted through synteny with a sequenced genome or through previous mapping exercises. Alternatively, a genome whose sequence is not known can still be analyzed by employing primers from other species for gene amplification. A gene so amplified may then be localized onto the chromosomes by FISH. A similar set of primers may be used to amplify cDNA of the species. This approach circumvents the need for screening a genomic library.
Furthermore, for species which exhibit low levels of polymorphism at repeat loci, candidate polymorphic loci may be predicted through mining large sequence datasets. The presence of short sequence repeat (SSR) polymorphisms within aligned sequences of different origin would be indicative of the level of polymorphism at that locus. These selection strategies could greatly reduce the time and cost associated with the development of repeat markers. Integration of this repetitive sequence data with genome databases would provide further benefits to genome researchers.
6. Bubalus bubalis genome
The world population of water buffalo (Bubalus bubalis) is currently about 168 million head, of which 161 million are found in Asia (95.83 percent), 3.717 million in Africa, chiefly Egypt (2.24 percent), 3.3 million in South America (1.96 percent), 40 000 in Australia (0.02 percent) and 500 000 in Europe (0.30 percent). The Asian buffalo or water buffalo is classified under the genus Bubalus, species bubalis. The Asian buffalo includes two subspecies known as the River and Swamp types, which differ in morphology, purpose and genetics. The River buffalo has 50 chromosomes, of which five pairs are sub-metacentric and 20 are acrocentric; the Swamp buffalo has 48 chromosomes, of which 19 pairs are metacentric. Swamp buffaloes are stocky animals with marshy land habitats. They are primarily used for draught power in paddy fields and for haulage, but are also used for meat and milk production. They produce a valuable milk yield of up to 600 kg per year. Swamp buffaloes are mostly found in South East Asian countries; a few animals can also be found in the northeastern states of India (Sethi, 2003). River buffaloes are generally large in size, with curled horns, and are mainly found in India, Pakistan and some countries of western Asia. They prefer to enter clear water, and are primarily used for milk, meat and draught purposes. Each subspecies includes several breeds. Buffaloes are known to be better at converting poor-quality roughage into milk and meat. They are reported to have a 5 percent higher digestibility of crude fiber than high-yielding cows, and a 4-5 percent higher efficiency of utilization of metabolic energy for milk production (Mudgal, 1988).
India has about 97 million animals, which represents 92% of the world buffalo population. India possesses the best River milk breeds in Asia, e.g. Murrah, Nili-Ravi, Surti, Jaffarabadi, Mehsana, Kundi, Bhadavari and Nagpuri, which originated from the north-western states of India (Sethi, 2003). However, despite the importance of the buffalo to the economic and social fabric of the region, its population has been declining. There are many reasons for this decline, foremost of which are increased agricultural mechanization; increased urbanization, industrialization, and reforestation limiting paddy areas for buffaloes; a growing buffalo slaughter rate to satisfy the meat demands of a fast-growing population; poor reproductive performance; and lack of proper attention from policy makers and researchers. The low reproductive efficiency of the female buffalo can be attributed to delayed puberty, higher age at calving, a long postpartum anoestrus period, a long calving interval, lack of overt signs of heat, and a low conception rate. In addition, female buffaloes have few primordial follicles and a high rate of follicular atresia. Understanding the quantitative trait loci potentially associated with economically important traits will help in segregating genetically superior breeds.
7. Repetitive sequences as molecular markers in bovid genome
Based on repeat sequences, a number of probes of varying length and sequence complexity have been successfully used as genetic markers (Kapur et al 2003; Jobling and Tyler-Smith, 2003; Bashamboo and Ali, 2001; Amos et al 1991; Tourmente et al 1994; Ali et al 1986). Earlier, conventional protein and biochemical markers were used in breeding programs for bubaline species (Wilson and Strobeck, 1999). Subsequently, diallelic Restriction Fragment Length Polymorphisms (RFLPs) for loci homologous to those of cattle (Blott et al 1999) were used. However, due to the low levels of polymorphism detected with these markers, their application remained limited. RFLP technology was followed by Random Amplification of Polymorphic DNA (RAPD) and then by Amplified Fragment Length Polymorphism (AFLP), besides minisatellite markers. A series of synthetic oligonucleotide probes were developed as markers for genetic analysis and molecular systematics of bubaline and related genomes. While probes based on repeat sequences are available, there is no clear-cut experimental approach that could assist in the identification and segregation of elite animals with superior QTLs. This is because most of the physical and physiological attributes recognized as characteristic of elite animals are controlled by several genes, and it is extremely challenging to uncover all the genes implicated in superior germplasm. However, marker-based analysis may bridge the gap and facilitate much-needed advanced research to segregate genetically superior germplasm, in the context of animal genetics in general and animal biotechnology in particular.
8. Restriction Fragment Length Polymorphism (RFLP)
The basic technique for detecting RFLPs involves the fragmentation of genomic DNA by a restriction enzyme. The resulting DNA fragments are then separated by length using agarose gel electrophoresis and transferred to a membrane by the Southern blot procedure. Hybridization of the membrane to a labeled DNA probe then reveals the size of the fragments that are complementary to the probe. An RFLP occurs when the size of a detected fragment varies between individuals. Each fragment size is considered an allele, and can be used in genetic analysis. RFLPs are a quick, simple and inexpensive way to assay DNA sequence differences. They were the first DNA polymorphisms to be widely used for genomic characterization, and detect variations ranging from gross rearrangements to single base changes. The polymorphisms are found through their effects on the sites for restriction enzyme mediated cleavage of preparations of high molecular weight DNA. In buffalo, the RFLP approach has been used to gain insight into the organization and allele length variation of satellite fractions (Chattopadhyay et al 2001; Bhatnagar et al 2004). In our laboratory, the BamHI derived pDS5 and pDS4 and the RsaI derived pDp1-pDp4 fractions were found to be conserved only in buffalo, cattle, goat, and sheep (Pathak et al 2006; 2011).
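The principle that a single base change can create or destroy a restriction site, and so alter the fragment sizes seen on a gel, can be shown with an in-silico digest. This is a minimal sketch on hypothetical toy sequences, not real buffalo satellite DNA; the function name `digest` is an assumption made for illustration. The BamHI recognition site (GGATCC, cut after the first G) is standard, and the digest treats the DNA as a linear molecule and reads one strand only.

```python
def digest(seq, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the site left of the cut,
    e.g. 1 for BamHI (G^GATCC).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    # Fragment boundaries are the two ends plus every cut position.
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

BAMHI_SITE, BAMHI_OFFSET = "GGATCC", 1

# Two hypothetical alleles: allele_b carries a C-to-T change in the second site.
allele_a = "A" * 10 + "GGATCC" + "T" * 20 + "GGATCC" + "C" * 5
allele_b = "A" * 10 + "GGATCC" + "T" * 20 + "GGATTC" + "C" * 5

print("allele A fragments:", digest(allele_a, BAMHI_SITE, BAMHI_OFFSET))
print("allele B fragments:", digest(allele_b, BAMHI_SITE, BAMHI_OFFSET))
```

In an actual RFLP experiment only the fragments complementary to the labeled probe would appear on the blot; here every fragment is listed, so the lost site in allele B shows up as two shorter fragments merging into one longer one.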
9. Minisatellite Associated Sequence Amplification (MASA)
MASA involves random amplification of genomic DNA or cDNA by PCR with primers specific to minisatellites. MASA can be performed with a small quantity of target substrate. The novel part of this approach is that functional, structural and regulatory genes associated with minisatellites are accessed without screening a conventional cDNA library, which makes it highly useful for genome analysis where prior information is absent or inadequate. The MASA-based expression profile of genes under normal and abnormal conditions is envisaged to be of great relevance for the identification of event- or stage-specific mRNA transcripts. In the context of comparative genomics, mRNA transcripts commonly expressed in a large number of species may be segregated. Following this approach, the genes with the highest levels of expression in a given tissue may be easily identified, and the corresponding information for different breeds of animals may be established. In addition, the differential expression of genes accessed by MASA may be used to establish genotype-phenotype correlations in the context of genetic diseases, cancer biology, stem cell research, tissue engineering, organ transplantation, animal cloning, characterization of the genetic integrity of different cell lines, and translational research. The minisatellite sequences 33.6 and 33.15 have been widely used to explore the bubaline genome (Srivastava et al 2006; 2008; Pathak et al 2010). In addition, microsatellite probes (2-6 base pairs) such as (AT)n, (CA)n, (GAA)n, (TCC)n, (GACA)n, (GATA)n, (GGAT)n, (GGCA)n and (TTAGGG)n have been used to analyze the buffalo genome (Rawal et al 2012; Kumar et al 2011). Following this approach, additional oligo primers based on VNTR loci may be used to undertake analysis of any desired species, cell lines or biopsied samples.
10. Technical approaches and methodologies
Here we describe some of our work on the characterization of the buffalo genome. In the context of functional and comparative genomics, DNA from several other species was also used. DNA was largely procured from blood samples, though in some cases solid tissues were also used.
10.1. Collection of blood samples and isolation of genomic DNA
DNA was extracted from the peripheral blood of buffalo Bubalus bubalis, goat Capra hircus, sheep Ovis aries, tiger Panthera tigris, lion Panthera leo, human Homo sapiens, langur Presbytis entellus, Indian rhinoceros Rhinoceros unicornis, fish Heteropneustes fossilis, bird Columba livia, baboon Papio hamadryas, pig Sus scrofa, rat Rattus norvegicus, jungle cat Felis chaus, bonnet monkey Macaca radiata and leopard Panthera pardus. The intactness of the DNA was checked on a 1% agarose gel, and the DNA was PCR amplified using bubaline derived β-actin primers and visualized on a UV transilluminator.
10.2. RNA isolation and synthesis of cDNA
Using buffalo as the experimental animal, total RNA was extracted from testis, kidney, liver, spleen, lung, heart, ovary, brain and sperm using TRIzol (Molecular Research Center, Inc., Cincinnati, OH) following the manufacturer’s instructions. To check for contaminating mRNA from cells other than spermatozoa, the sperm RNA preparations were tested by RT-PCR for both CDH1 (E-cadherin) and CD45 (tyrosine phosphatase). Similarly, the presence of DNA was ruled out by PCR using β-actin primers. Following this, approximately 10 μg of RNA from the different tissues and spermatozoa was reverse transcribed into cDNA using a commercially available high capacity cDNA RT kit (Applied Biosystems, USA). The success of cDNA synthesis was confirmed by PCR with 35 cycles of amplification using buffalo derived β-actin primers.
10.3. Minisatellite Associated Sequence Amplification (MASA)
PCR amplifications were carried out using the oligo primer and cDNA from the different tissues and spermatozoa. The reaction conditions involved denaturation at 95°C for 5 min, followed by 35 cycles each consisting of denaturation at 95°C for 1 min, annealing at the optimal temperature for 1.5 min and primer extension at 72°C for 1 min, and then a final extension at 72°C for 10 min. Approximately 25 μl of amplified product was resolved on a 20-cm-long, 3% (w/v) agarose gel in 1× TBE buffer at a constant voltage. The distinct bands were sliced from the gel, purified and cloned into the pGEM-T Easy vector (Promega, USA). In water buffalo, Bubalus bubalis, MASA using cDNA from spermatozoa and eight different somatic tissues and an oligo primer based on two units of the consensus of the 33.6 repeat loci (5' CCTCCAGCCCTCCTCCAGCCCT 3') identified 29 mRNA transcripts (Figure 2).
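Basic properties of the 33.6-based primer and the nominal length of the cycling program can be worked out with a short Python sketch. It is a rough aid only: the function names are introduced here, the melting temperature uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is a coarse estimate intended for short oligonucleotides, and the result should not be read as the annealing temperature actually used, which the text gives only as "the optimal temperature". The run time ignores temperature ramping between steps.

```python
PRIMER_33_6 = "CCTCCAGCCCTCCTCCAGCCCT"  # primer sequence quoted in the text


def gc_fraction(seq):
    """Fraction of G and C bases in the oligonucleotide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in deg C by the Wallace rule (an approximation)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def cycling_minutes(initial, cycles, step_minutes, final):
    """Nominal program length in minutes, ignoring ramp times."""
    return initial + cycles * sum(step_minutes) + final


if __name__ == "__main__":
    print("length (nt):", len(PRIMER_33_6))
    print("GC fraction:", round(gc_fraction(PRIMER_33_6), 3))
    print("Wallace Tm (deg C):", wallace_tm(PRIMER_33_6))
    # 5 min initial denaturation, 35 cycles of 1 + 1.5 + 1 min, 10 min final extension
    print("program (min):", cycling_minutes(5, 35, (1, 1.5, 1), 10))
```

For a 22-mer this GC-rich, the Wallace estimate is expected to be high, so an empirical gradient PCR would normally be a more reliable way to settle the annealing temperature.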
10.4. Restriction digestion of buffalo genomic DNA
Approximately 4-5 µg of genomic DNA from buffalo, cattle, goat and sheep was subjected individually to restriction digestion using 4-5 units of the BamHI and RsaI enzymes. The digested DNA fragments were resolved on a 0.8% agarose gel in 0.5X TBE for approximately 16-18 hours. In water buffalo, two distinct DNA bands of 1378 and 673 bp obtained with BamHI and four bands of 1331, 651, 603 and 339 bp obtained with RsaI were excised and gel purified (Figure 3). The eluted fragments were cloned and sequenced following standard protocols. For Southern hybridization, DNA was transferred onto a nylon membrane; the membranes were rinsed in 2X SSC, dried and UV cross-linked. Blots were hybridized overnight at 60°C with recombinant plasmid (25 ng) labeled with [α-32P]dCTP by the random priming method (Rediprime™ II kit, Amersham Pharmacia Biotech, USA). The membranes were washed using standard protocols, and signals were recorded by exposing the blots to X-ray film (Pathak et al 2006; 2011).
10.5. Copy number assessment and relative expression using Real Time PCR
The copy number of each desired fragment was calculated by an absolute quantitation assay using SYBR Green dye on the Sequence Detection System 7500 (ABI, USA). Primers specific to each fragment were designed using Primer Express Software V2.0 (ABI). The standard curve was obtained using a 10-fold dilution series of the recombinant plasmids ranging from 3,000,000 to 30 copies, taking 3.36 pg DNA per haploid genome as the standard (assuming haploid genome of farm animals = 3.3 pg, wt per base pair = 1.096 × 10⁻²¹ g). The reactions were performed in triplicate in 96-well plates in a 25 µl reaction volume, each containing 0.5 ng of buffalo genomic DNA and 50 nM of the corresponding primers, under conditions of 50°C for 2 min and 95°C for 10 min, followed by 40 cycles of 95°C for 10 sec and 60°C for 1 min. Real-time PCR analysis uncovered 1234 and 3420 copies of the pDS5 and pDS4 fragments, respectively, per haploid genome, and ~2 × 10⁴ copies of pDp1, ~3000 copies of pDp2 and pDp3 and ~1000 copies of pDp4 in the buffalo, cattle, goat and sheep genomes (Figure 4) (Pathak et al 2006; 2011). Copy number assessment of these repeats in different known and nondescript breeds of buffalo may help to establish a correlation, if any, useful for the delineation of different breeds.
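The arithmetic behind absolute quantitation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis performed by the instrument software: the function names are introduced here, the standard curve is taken to be the usual linear relation Ct = slope × log10(copies) + intercept fitted by ordinary least squares, and no Ct values from the study are used. The constants 3.3 pg per haploid genome and 1.096 × 10⁻²¹ g per base pair are those quoted in the text.

```python
import math

MASS_PER_BP_G = 1.096e-21   # g per base pair, as quoted in the text
HAPLOID_GENOME_PG = 3.3     # pg per haploid genome, as quoted in the text


def genome_size_bp(genome_pg=HAPLOID_GENOME_PG):
    """Haploid genome size in base pairs implied by the two constants."""
    return genome_pg * 1e-12 / MASS_PER_BP_G


def genome_equivalents(dna_ng, genome_pg=HAPLOID_GENOME_PG):
    """Number of haploid genomes contained in a given mass of genomic DNA."""
    return dna_ng * 1000.0 / genome_pg


def dilution_series(top_copies, lowest_copies, factor=10):
    """Copy numbers of the plasmid standards, highest first."""
    series = []
    copies = top_copies
    while copies >= lowest_copies:
        series.append(copies)
        copies //= factor
    return series


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cts) / len(cts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_per_haploid_genome(ct, slope, intercept, dna_ng):
    """Copies read off the standard curve, divided by genomes in the reaction."""
    copies_in_reaction = 10 ** ((ct - intercept) / slope)
    return copies_in_reaction / genome_equivalents(dna_ng)


if __name__ == "__main__":
    print(dilution_series(3_000_000, 30))   # six standards, 3,000,000 down to 30
    print(round(genome_equivalents(0.5)))   # haploid genomes in 0.5 ng of DNA
```

With these constants, the 0.5 ng of genomic DNA in each reaction corresponds to roughly 150 haploid genomes, which is the divisor that converts copies per reaction into copies per haploid genome.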
Relative expression of the desired fragments was determined by Real Time PCR with the SYBR Green assay using cDNA from the different tissues and spermatozoa. The primers were designed using Primer Express 2.0 (Applied Biosystems) software. The cycling conditions were the same as those used for the copy number calculations. The reaction was performed following the standard protocol (Srivastava et al 2006). The specificity of each primer pair and the efficiency of amplification were tested by assaying serial dilutions of the cDNA with oligonucleotides specific for the target and the normalization control (GAPDH). The difference in Ct value between the target cDNA from the different tissues and the control sample (the tissue showing the least expression) was used for calculation. The expression level of the desired fragments was calculated using the formula: expression = (1 + E)^(-ΔCt), where E is the efficiency of the PCR and ΔCt is the difference in threshold cycle value between the test sample and the endogenous control. To achieve the maximum efficiency (E = 1) of the Real Time PCR, the amplicon size was kept small (70-150 bp), so that the expression level of the test gene reduces to 2^(-ΔCt). Each experiment was repeated three times to ensure consistency of the results. Maximum expression of pDS5 and pDS4 was seen in the spleen and liver, respectively. pDp1 showed maximum expression in the lung, pDp2 and pDp3 both in the kidney, and pDp4 in the ovary. Nine of the 33.6 MASA-amplified transcripts showed their highest expression in spermatozoa, and one each in liver and lung (Figure 5).
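The expression formula can be made concrete with a short Python sketch. It is an illustration under stated assumptions: the function names are introduced here, the Ct values are hypothetical numbers chosen for easy arithmetic and are not data from the study, ΔCt is taken as target Ct minus GAPDH Ct, and the tissue with the lowest expression is used as the reference for the fold values.

```python
def relative_expression(delta_ct, efficiency=1.0):
    """Expression = (1 + E) ** (-delta_ct); with E = 1 this is 2 ** (-delta_ct)."""
    return (1.0 + efficiency) ** (-delta_ct)


def expression_by_tissue(ct_table, efficiency=1.0):
    """Map tissue -> expression, where delta_ct = Ct(target) - Ct(control)."""
    return {
        tissue: relative_expression(ct_target - ct_control, efficiency)
        for tissue, (ct_target, ct_control) in ct_table.items()
    }


def fold_over_lowest(expression):
    """Express each tissue as a multiple of the lowest-expressing tissue."""
    lowest = min(expression.values())
    return {tissue: value / lowest for tissue, value in expression.items()}


# Hypothetical Ct values: tissue -> (Ct of target, Ct of GAPDH control).
HYPOTHETICAL_CT = {"spleen": (22.0, 18.0), "liver": (25.0, 18.0)}

if __name__ == "__main__":
    expression = expression_by_tissue(HYPOTHETICAL_CT)
    for tissue, fold in fold_over_lowest(expression).items():
        print(tissue, fold)
```

In this toy table the spleen ΔCt is three cycles smaller than the liver ΔCt, so at E = 1 the spleen comes out at 2³ = 8 times the liver level; with a lower efficiency the same Ct difference would correspond to a smaller fold change.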
The high expression of these nine transcripts in spermatozoa, representing vital genes, supports their involvement in sperm development and possibly in overall testicular function. In the context of animal biotechnology, such a selective, tissue-specific expression profile is very important for segregating genetically superior germplasm or any other physical and physiological attributes. This is particularly true for buffalo, since this species has several breeds. Clearly, the development of MASA-mediated tissue-specific transcript profiles is envisaged to go a long way towards molecular characterization of not only the buffalo genome but also the genomes of other economically important animals.
10.6. Chromosome preparation and Fluorescence In Situ Hybridization (FISH)
Once mRNA transcripts are available, it becomes feasible to fish out full-length cDNA clones employing 3’ and 5’ RACE. This approach enables the isolation of genes without screening a cDNA library, thus circumventing many arduous steps. When clones representing genes are obtained, they can be used for fluorescence in situ hybridization onto metaphase chromosomes. We have used some of the clones to successfully conduct FISH on buffalo chromosomes to localize the genes and also to uncover the distribution of several species of repeat elements.
For chromosome culture, 2 ml of blood was drawn with a sterile syringe into heparinized vacutainer tubes from buffalo, cattle, goat, sheep and human. To a sterile tissue culture flask containing 5 ml RPMI-1640, 20% fetal bovine serum, 2% phytohemagglutinin (PHA; 2 mg/ml), 5 μl concanavalin A (3 μg/ml), 2.5 μl mercaptoethanol (50 μM), 50 μl LPS (10 μg/ml) and 2.2 ml antibiotic/antimycotic (0.15 mg/ml), 500 μl of blood was added, and the whole mixture was incubated for 72 hours in 5% CO2 at 37°C. After 70 hours, colcemid (10 μg/μl) was added to the culture flasks to arrest cells at metaphase, and the cells were incubated for a further 2 hours. The cells were then treated with 0.56% KCl for 30 min, followed by fixative treatment (methanol:glacial acetic acid, 3:1). A few drops of the cell suspension were dropped onto a pre-cleaned, chilled slide and blow-dried. The slides were Giemsa stained (Gibco, BRL) for 20 minutes, washed with PBS / distilled water and observed under a microscope to record metaphases.
For probe preparation, plasmids containing the gene of interest were labeled with the desired fluorochromes using the Nick Translation Kit from Vysis (Illinois, USA) following the supplier’s instructions. Hybridization was carried out in a 20 μl volume containing 50% formamide, 10% dextran sulphate, Cot-1 DNA and 2X SSC, pH 7, for 16 hours at 37°C in a moist chamber. Post-hybridization washes were done in 2X SSC at 37°C (low stringency) and then in 0.1X SSC at 60°C (high stringency). Slides were counterstained with DAPI and screened under an Olympus Fluorescence Microscope (BX51), and images were captured with an Olympus U-CMAD-2 CCD camera. Chromosome mapping was done following the International System for Chromosome Nomenclature. pDS5, representing the 1378-bp fragment, showed FISH signals in the centromeric region of the acrocentric chromosomes only, whereas pDS4, corresponding to the 673-bp fragment, detected signals in the centromeric regions of all the chromosomes. The RsaI derived pDp1, pDp2 and pDp3 showed a distribution of repeats across all the buffalo chromosomes (Pathak et al 2006; 2011) (Figure 6).
11. Applications in animal biotechnology
With the availability of the human genome sequence, much emphasis has been placed on sequencing all the potential farm animals. Despite its importance as a farm animal, research data on the water buffalo are limited. Water buffalo breeders and farmers have been facing many challenges and problems, such as poor reproductive efficiency, sub-optimal production potential, higher than normal incidence of infertility and lower rates of calf survival. Genome research has created a broad basis for promoting and utilizing gene technologies in many fields of livestock production. Genome biotechnology will provide a major opportunity to advance sustainable animal production systems of higher productivity by manipulating the variation within and between breeds to realize more rapid and better-targeted gains in breeding value. This type of research will also make it possible to distinguish molecular phenotypes and thus improve the use of the genetic resources of domestic animals.
To date, researchers have identified several genes or DNA regions that are associated with traits of economic importance, including reproduction, growth, lean body, fat quantity, meat quality, physical traits and disease resistance. Examples include the fatty acid composition of dairy and beef products and increased disease resistance (and thus improved animal welfare). Similarly, decreased methane emissions in cattle help to address the needs of consumers and society for sustainable and cost-effective food production. A number of gene and marker tests are now available commercially from genotyping service companies. Examples are CAST (meat quality), ESR and EPOR (litter size), FUT1 (E. coli disease resistance), HAL (halothane: meat quality, stress), IGF2 (carcass), MC4R (growth and fat), PRKAG3 (meat quality) and RN (meat quality) (Walters, 2011). Many scientists are using the genomic information embedded on the SNP50 BeadChip, a glass slide containing thousands of DNA markers, to identify disease resistance genes in cattle, swine, sheep, poultry and fish, and then selectively mating the animals in order to create disease resistant animals. Understanding all the expressed genes, their organization and their mode of action in bubaline or any other farm animals will help bridge the gap and facilitate the much needed growth of animal biotechnology.
12. Concluding remarks
Genetic improvement of animals warrants continuous and complex processes of sustained research employing cutting edge tools and techniques of modern biology and recombinant DNA technology. A much deeper and more detailed understanding of a given species would eventually prove highly useful for possible manipulation of a desired genome. Improvement of domestic animal traits has been the foremost task of animal breeding. In this pursuit, many techniques have been developed and tested. In recent years, advances in molecular genetics have introduced a new generation of molecular markers for the genetic improvement of animals. However, utilization of marker-based information for genetic improvement depends on the choice and judicious use of an appropriate marker system for a given application. Selection of markers for different applications is influenced by the degree of polymorphism, the reproducibility of the technique, the speed of the experiments and the cost involved.
As the situation stands now, for a given biological phenomenon in which multiple genes are implicated, technical approaches need to be developed to segregate all the possible genes specific to that phenomenon. A good example is spermatogenesis, which putatively involves more than 400 genes. However, their clear-cut involvement and characterization have still not been achieved in any species. Once such information is available, it would provide the much needed basis for functional and comparative genomics. Perhaps then, molecular delineation of the “so-called” elite animals or of specific breeds representing superior germplasm would become feasible.