Centromere Evolution: Digging into Mammalian Primary Constriction

Introduction
Mammalian cell division requires even and complete distribution of the chromosome complement to the daughter cells. At this stage of the cell cycle, segregation fidelity is critical to prevent aneuploidy. Chromosomes attach to the spindle apparatus through a proteinaceous bridge called the kinetochore (Rieder & Salmon, 1994). The chromosomal locus at which the kinetochore is organized and the chromatids align is the centromere, historically and cytologically defined as the primary constriction. Throughout the class Mammalia, and more generally among all higher eukaryotes, a few distinctive features characterize centromeric chromatin, such as the incorporation of the centromeric protein CENP-A, a histone H3 variant (Sullivan, B. A. & Karpen, 2004; Sullivan, K. F. et al., 1994), and of CENP-B, an essential component of the mitotic chromosome scaffold (Masumoto et al., 1989).
Phylogenetic studies show that, in contrast to the high conservation of the chromosome segregation machinery, the primary sequence of centromeric DNA has evolved rapidly and differs strikingly even among closely related mammalian species. These studies suggest that a specialized chromatin structure is more critical for centromeric function than the presence of specific sequences (Torras-Llort et al., 2009), pointing to a role for epigenetic mechanisms in centromeric identity and function (Karpen & Allshire, 1997).
In this chapter we review the variable DNA sequence that forms mammalian centromeres: the satellite DNA. The first pilot studies on mammalian satellite DNA date back to the 1960s and were carried out on the mouse, guinea pig, and bovine genomes. Since then, many analyses have characterized satellite DNA in several mammals, particularly in humans. These studies led to different theories that aim to explain the rapid evolution of alphoid DNA in primates and to clarify a possible role of these sequences in centromeric function. This chapter therefore provides an overview of the state of the art in mammalian centromeric DNA organization and evolution.

Organization of centromeric satellite DNA in non-primate mammals
Mammalia is a class of air-breathing vertebrate animals characterized by the possession of hair, three middle ear bones, a four-chambered heart, and mammary glands functional in mothers with young. The class Mammalia is divided into two subclasses: the Theria, comprising the infraclasses Eutheria (the placentals) and Metatheria (the marsupials), and the Prototheria, comprising the order Monotremata, the mammalian species that lay eggs (Prasad et al., 2008).
The centromeres of all Therian species examined consist of long arrays of head-to-tail tandemly repeated DNA families, the satellite DNA. One exception to this general rule is the sequence of newly formed centromeres (the centromere of horse chromosome 11 is an example), which is devoid of satellite DNA, demonstrating that centromeres can function stably over millions of years and many generations in the absence of satellite DNA (Wade et al., 2009). However, the acquisition and maintenance of satellite DNA is an obligate fate for all mammalian centromeres, since all mature centromeres possess satellite sequences (Piras et al., 2010; Wade et al., 2009).
Centromeric satellite DNA sequences have been characterized in almost all mammalian orders. Here we report exclusively on Therian centromeric satellites, because centromeric satellite DNA has never been isolated from any species of the subclass Prototheria. A recent attempt to isolate centromeric satellite DNA from the Prototherian mammal platypus (order Monotremata) failed, suggesting that in this species centromeres are not enriched in satellite sequences (Alkan et al., 2010). Studies of other Prototherians such as the echidna might clarify whether the lack of satellite DNA at centromeres is specific to the platypus or common to the entire subclass.
This section reports the state of the art on the sequence and organization of centromeric satellite DNA in mammalian species of the Eutherian orders Rodentia, Lagomorpha, Cetartiodactyla, Perissodactyla, Carnivora, Chiroptera, Cingulata, and Proboscidea, and of the infraclass Metatheria, including the orders Diprotodontia and Didelphimorphia. Centromeric satellite DNA in Primates, the order to which the human species belongs, is described in a dedicated section, since primate satellite sequences are the best characterized, both structurally and functionally, and have been studied most extensively.
Despite its conserved function, the centromeric satellite is extraordinarily variable in repeat unit length, sequence, organization, and relative quantity with respect to the total DNA, even among closely related mammalian species. This singularity is known as the "centromere paradox" (Henikoff et al., 2001). In fact, the repeat unit length of the mammalian centromeric satellites ranges from 7 bp in the red-necked wallaby to 2.3 kb in the domestic cattle (Table 1). An exception to this high variability is the CENP-B box, a 17 bp sequence motif that is shared and conserved in all centromeric satellite families involved in centromeric function (Masumoto et al., 1989). Moreover, despite this apparent lack of any rule among the satellite sequences, there is an evolutionarily conserved pattern of sequence arrangement and organization (Sunkel & Coelho, 1995).
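As an illustration of how a conserved 17 bp motif such as the CENP-B box can be located in a satellite monomer, the following minimal sketch scans both strands of a sequence with a regular expression. The core pattern NTTCGNNNNANNCGGGN is not given in this chapter; it is an assumption based on the commonly cited CENP-B box core, and the function name and test sequence are hypothetical.

```python
import re

# ASSUMPTION: commonly cited 17 bp CENP-B box core, NTTCGNNNNANNCGGGN.
# The pattern is not taken from this chapter. A lookahead is used so that
# overlapping candidate sites are all reported.
CENPB_CORE = re.compile(r"(?=([ACGT]TTCG[ACGT]{4}A[ACGT]{2}CGGG[ACGT]))")
BOX_LEN = 17

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cenpb_boxes(seq):
    """Return sorted (start, strand) pairs for candidate CENP-B boxes.

    Start positions are 0-based and always refer to the forward strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+") for m in CENPB_CORE.finditer(seq)]
    rc = reverse_complement(seq)
    for m in CENPB_CORE.finditer(rc):
        # Convert a position on the reverse complement to forward coordinates.
        hits.append((len(seq) - (m.start() + BOX_LEN), "-"))
    return sorted(hits)


# Hypothetical toy sequence: one candidate box flanked by 4 bp on each side.
example = "AAAA" + "CTTCGTTGGAAACGGGA" + "TTTT"
print(find_cenpb_boxes(example))
```

A hit from such a scan only marks a candidate site; whether the site actually binds CENP-B has to be established experimentally.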
The study of centromeric satellites in different mammalian orders has contributed to the knowledge of centromeric satellite organization and evolution. Each mammalian order has its peculiarities, with some orders best revealing a particular aspect of centromeric satellite organization. For example, in a few primates the centromeric satellite is arranged in higher-order repeats (HORs), and in Cetartiodactyla satellite DNA is distributed among chromosomes according to the position of the centromere (acrocentric vs. (sub)metacentric), so that chromosomes sharing the same centromere position have a similar centromeric satellite composition and organization.

Table 1. Satellite families at the centromeres of non-primate mammals. -, data not available.

We start this excursus on mammalian centromeres with the centromeric satellites isolated from mammals of the superorder Euarchontoglires other than primates. This superorder includes the grandorder Euarchonta and the clade Glires. The grandorder Euarchonta ("true ancestors") comprises extant mammals belonging to the orders Primates, Scandentia and Dermoptera; the clade Glires includes the orders Rodentia and Lagomorpha. We illustrate the centromeric satellites of mammals of the orders Rodentia and Lagomorpha, since satellite DNA from mammals of the orders Scandentia and Dermoptera has never been characterized.

Rodentia
The satellite sequences of several rodents have been investigated: Mus musculus (the house mouse), four Acomys species (the spiny mice) (Kunze et al., 1999), Microtus chrotorrhinus (the rock vole) (Modi, 1992), Gerbillus nigeriae (the Nigerian gerbil) (Volobouev et al., 1995), and a chromosome 2-specific centromeric DNA repeat from Cricetulus griseus (the Chinese hamster) (Fatyol et al., 1994). The Microtus and Cricetulus species show satellite sequences with the longest monomer size (>2.5 kb) reported so far among mammals. Here, we focus on the centromeric satellites of the house mouse, the most studied rodent species.
The centromere of all mouse telocentric chromosomes consists of two highly conserved, tandemly repeated sequences discovered by isopycnic centrifugation in CsCl gradients: the minor and major satellites (Pardue & Gall, 1970; Wong & Rattner, 1988). Minor satellite DNA represents 0.5-1% of the mouse genome and comprises an AT-rich 120 bp monomer that contains the CENP-B box motif (Yoda et al., 1992). The minor satellite array occupies 300-600 kb of the terminal region of all mouse telocentric chromosomes (the autosomes and the X chromosome) and coincides with the centric constriction and with centromere function (Kipling et al., 1991). It does not show evidence of higher-order organization. Major satellite DNA is a more abundant, AT-rich, 234 bp tandem repeat (Horz & Altenburger, 1981), organized into long arrays of 240-2000 kb (Vissel & Choo, 1989). It represents 5-10% of the total genomic DNA and shows pericentromeric localization, adjacent to the minor satellite. The major satellite monomer is internally repetitious and consists of eight 28-mer or 30-mer subrepeats, with a set of three related 9 bp ancestral sequence motifs (Horz & Altenburger, 1981). The minor satellite sequence is divided into two similar halves, which are further subdivided into quarter repeats. However, it does not have a primordial sequence similar to the 9 bp motif of the major satellite (Fig. 1).
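The monomer and array sizes quoted above imply approximate monomer copy numbers per array. The short sketch below works through that arithmetic; it assumes perfectly regular arrays with no interruptions, which is a simplification, and the function name is hypothetical.

```python
def copy_number_range(array_kb_min, array_kb_max, monomer_bp):
    """Whole monomer copies that fit in the smallest and largest array.

    ASSUMPTION: arrays consist only of uninterrupted, full-length monomers.
    """
    return (array_kb_min * 1000) // monomer_bp, (array_kb_max * 1000) // monomer_bp


# Values taken from the text: minor satellite, 120 bp monomer, 300-600 kb arrays.
print("minor satellite:", copy_number_range(300, 600, 120))   # (2500, 5000)

# Major satellite, 234 bp monomer, 240-2000 kb arrays.
print("major satellite:", copy_number_range(240, 2000, 234))  # (1025, 8547)
```

Under this simplification, a minor satellite array holds a few thousand monomers, and the largest major satellite arrays hold several thousand more.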
The minor and major satellite sequences are highly conserved across the centromeres of all mouse telocentric chromosomes: the minor satellite monomers share a mean pairwise identity of 95% (Kalitsis et al., 2006); the major satellite monomers show a mean deviation from the consensus sequence of 3.9% (Vissel & Choo, 1989). This high degree of sequence conservation argues strongly for frequent recombinational exchanges between nonhomologous telocentric chromosomes driving sequence homogenization at mouse centromeres (Kalitsis et al., 2006; Vissel & Choo, 1989).
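The two conservation measures used above, mean pairwise identity and mean deviation from the consensus, can be illustrated with a minimal sketch. It assumes monomers that are already aligned and of equal length; the toy sequences and function names are hypothetical and are not real mouse satellite data.

```python
from collections import Counter
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_identity(seqs):
    """Average identity over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(identity(a, b) for a, b in pairs) / len(pairs)


def consensus(seqs):
    """Most frequent base in each alignment column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def mean_deviation_from_consensus(seqs):
    """Average fraction of positions at which a monomer differs from the consensus."""
    cons = consensus(seqs)
    return sum(1 - identity(s, cons) for s in seqs) / len(seqs)


# Hypothetical aligned 10 bp "monomers" (toy data).
monomers = ["ACGTACGTAC", "ACGTACGTAT", "ACGTTCGTAC", "ACGTACGTAC"]
print(round(mean_pairwise_identity(monomers), 3))         # 0.9
print(consensus(monomers))                                # ACGTACGTAC
print(round(mean_deviation_from_consensus(monomers), 3))  # 0.05
```

The two measures are not interchangeable: a set of monomers that each differ from the consensus at different positions has a lower pairwise identity than its deviation from the consensus alone would suggest.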
Besides the two AT-rich major and minor satellites, two further GC-rich satellites have been identified at mouse centromeres: Mouse Satellite 3 (MS3) and Mouse Satellite 4 (MS4). The MS3 monomer is 150 bp long and accounts for 2.2% of the total DNA; the MS4 monomer is 300 bp long and accounts for 1.6% of the total DNA. Both monomers contain the CENP-B box. Divergence between MS3 sequences is 0.7% and is due to single-nucleotide changes (Kuznetsova et al., 2005).
The centromeric sequence of the mouse Y chromosome has recently been investigated (Pertile et al., 2009), highlighting several features never described before at mouse centromeres. The mouse Y centromere has a chromosome-specific sequence and a unique multimeric HOR organization. It comprises a 90 kb array of an AT-rich, minor satellite-like tandem repeat (Ymin), with distant homology (76.8%) to the mouse minor satellite. The Ymin satellite is closely associated with the kinetochore. It has a HOR sequence organization devoid of transposable elements, with a unit size of 2.3 kb (Mus musculus molossinus) or 1.6 kb (Mus musculus domesticus), and a sequence identity among the repeated units of 99-100%. The HOR unit has a remarkably complex structure, consisting of an amalgam of highly diverged (<70% mean pairwise identity) monomers with a periodicity of 60-61 or 121 bp. The majority of the monomers form progressively larger and less diverged repeating HOR subunits, the largest comprising 2.4 copies of an 840 bp periodicity repeat (with 95% pairwise identity) that spans the greater part of the HOR unit domain. The uniqueness of the chromosome Y centromere argues for an intrachromosomal mode of sequence homogenization and an isolated evolution (Smith, 1976).

Lagomorpha
Ekes and collaborators were the first to describe the satellite DNA families of a mammal belonging to the order Lagomorpha, the domestic rabbit, Oryctolagus cuniculus (Ekes et al., 2004). They found two major centromeric satellite DNA sequences, named Rsat I and Rsat II, which are not related to each other, and a divergent Rsat II-related subfamily, Rsat IIE. The Rsat I monomer has an average length of 375 bp, whereas the Rsat II and Rsat IIE repeat units are ~585 bp long. These satellites do not cover the complete rabbit complement, since seven autosome pairs and the sex chromosomes do not contain any of them. The Rsat I, Rsat II and Rsat IIE satellites are each distributed in variable amounts at the centromeres of a subgroup of rabbit chromosomes, with some chromosomes containing both Rsat I and Rsat II, or Rsat II and Rsat IIE. Part of the Rsat I and Rsat II satellites shows a dimeric organization. Further studies are required to isolate the rabbit sequences that constitute the centromeres devoid of the three known satellites, and to elucidate the higher-order repeat organization of rabbit satellites.

Cetartiodactyla
The Laurasiatheria is the superorder evolutionarily closest to the Euarchontoglires, and comprises the orders Cetartiodactyla, Perissodactyla, Carnivora, Chiroptera, Eulipotyphla, and Pholidota. The isolation of satellite DNA from a mammal of the last two orders has never been reported. Among mammals of the order Cetartiodactyla (cetaceans and even-toed ungulates), we describe the centromeric satellites isolated and characterized from two species of the family Bovidae: the domestic cattle (Bos taurus) of the subfamily Bovinae, and the domestic sheep (Ovis aries) of the subfamily Caprinae.
The cattle satellite I DNA (the 1.715 family) is a 1.4 kb tandem repeat that comprises 6-9% of the total genomic DNA (Kurnit et al., 1973). It constitutes the centromeric heterochromatin of all autosomes, but not of the sex chromosomes (Plucienniczak et al., 1982; Taparowsky & Gerbi, 1982) (Fig. 1). However, the primitive form of the bovine X chromosome is acrocentric and has satellite I sequences at the centromere (Chaves et al., 2005). Three additional bovine satellites, satellites II, III and IV, were localized at the centromeric and pericentromeric regions of autosomes: satellite II is mostly localized at the autosomal centromeres; satellite III is present on most autosomes; satellite IV is present on less than half the autosomes (Kopecka et al., 1978; Kurnit et al., 1973). None of them is present on the sex chromosomes. The repeat unit of satellite III is 2,350 bp long and consists of two related and homogeneous 23-mer tandem subrepeats, the Pvu and the Sau motifs (Pech et al., 1979). The three bovine satellites I, III and IV, or a subgroup of them, are always organized on autosomes in the same order: p-ter-sat IV-sat I-sat III-q (Chaves et al., 2003) (Fig. 1).
Two repetitive DNA families, satellite I (Buckland, 1983; Reisner & Bucholtz, 1983) and satellite II (Buckland, 1985), are the major components of the sheep centromeric and pericentromeric heterochromatin, respectively (Burkin et al., 1996; D'Aiuto et al., 1997) (Fig. 1). The sheep satellite I DNA (the 1.714 family) has a repeat unit of 820 bp and, like bovine satellite I, constitutes the centromeric heterochromatin of all autosomes, but not of the sex chromosomes (Buckland, 1983; Burkin et al., 1996; Chaves et al., 2000; Chaves et al., 2005). However, the amount and organization of satellite I DNA differ among the autosomes, with a lower amount at the centromeres of the biarmed chromosomes 1, 2, and 3, particularly chromosome 1 (Burkin et al., 1996; D'Aiuto et al., 1997). The sheep satellite II DNA has a 700 bp monomer and constitutes the pericentromeric heterochromatin of all chromosomes with the exception of the Y chromosome (Burkin et al., 1996; D'Aiuto et al., 1997). Unlike sheep satellite I DNA, the satellite II family has a more variable chromosomal distribution: a few acrocentric chromosomes lack it, whereas it is present in large amounts at the centromeres of the sheep metacentric and X chromosomes (Burkin et al., 1996).
The bovine and ovine satellite I families show 70% sequence similarity and both consist of a degenerate 31 bp GC-rich tandem subrepeat (Novak, 1984; Reisner & Bucholtz, 1983). The presence of this 31 bp motif across the entire length of the satellite I repeat suggests that its present structure could have arisen from tandem amplification of an ancestral ~31 bp unit. The 31-mer motif sequence has also been found in other bovine satellites, such as satellite III (Plucienniczak et al., 1982; Taparowsky & Gerbi, 1982), and in the deer, muntjac, and pronghorn centromeric satellites (Bogenberger et al., 1985; Denome et al., 1994; Lee, C. & Lin, 1996), arguing that the amplification of the ~31 bp unit may have occurred in their common ancestor.

Perissodactyla
The order Perissodactyla (odd-toed ungulates) comprises the family Equidae, with eight living species all belonging to the genus Equus: two horses (E. caballus and E. przewalskii), two Asiatic donkeys (E. kiang and E. hemionus), one African donkey (E. asinus), and three zebras (E. grevyi, E. burchelli, and E. zebra). Although the Equus species can be crossbred and diverged recently, sharing a common ancestor about 2-3 million years ago, their karyotypes differ extensively and their satellite DNA has evolved rapidly (Wijers et al., 1993). Moreover, during the evolution of the genus Equus, centromere repositioning, that is, the shift of the centromere along the chromosome without structural chromosome rearrangements, has occurred frequently (Carbone et al., 2006). This implies that several evolutionarily new centromeres and ancestral, now inactive centromeres are present in the Equus karyotypes.
Several satellite families with centromeric localization have been identified in the horse genome, suggesting great diversity and variability in the structure and organization of horse centromeric sequences. The horse major satellite accounts for 5-10% of the total genome and has a repeat unit of 221 bp. It is localized at the centromeric regions of 30 pairs of chromosomes and is missing at the centromeres of chromosomes 2 and 11, both submetacentric. Repeat units share a sequence identity of 90-100% and have no internal repeat structure (Piras et al., 2010; Wijers et al., 1993).
In 1995, another horse satellite family with a repeat unit of 23 bp was isolated and localized at the centromeres of acrocentric but not metacentric horse chromosomes (Broad et al., 1995b). This pattern is reminiscent of the satellite distribution in Cetartiodactyla, which follows the position of the centromere along the chromosome.
Two further horse satellite families with a repeat unit of ~80 bp were identified, with a likely centromeric localization (Broad et al., 1995a). Alkan and colleagues extracted six distinct satellite consensus sequences from the E. caballus genome, of 221, 221, 419, 450, 451, and 475 bp. Their FISH hybridization patterns included the centromeres of all or some of the horse chromosomes, except chromosome 11 (Alkan et al., 2010). Every horse centromere thus carries one or more satellites, with chromosome 11 the only one lacking any satellite (Alkan et al., 2010; Piras et al., 2010; Wade et al., 2009).

Carnivora
Few studies have addressed centromeric satellites in species belonging to the order Carnivora. In 1988 and 1989, the major centromeric satellites of the domestic dog (Canis familiaris) and of the grey fox (Urocyon cinereoargenteus), two species of the order Carnivora that diverged from a common ancestor 10-12 million years ago (Wayne et al., 1997), were investigated with regard to their sequence and localization. The chromosomes of the domestic dog and grey fox are primarily acrocentric. The dog satellite monomer is 737 bp long and has a GC-content of 51%; the grey fox satellite monomer is 880 bp long and has a GC-content of 54% (Fanning, 1989). Recently, dog fosmid clones containing satellite DNA were mapped to the centromeres of different subgroups of dog chromosomes. These heterogeneous patterns support the existence of a complex patchwork organization of satellites at dog centromeres, similar to the organization of horse centromeric sequences (Alkan et al., 2010).

Chiroptera
Bat genomes (order Chiroptera) are characterized by a low DNA content, with a size approximately 50-87% that of other eutherian genomes (Burton et al., 1989). Centromeric satellite DNA has been isolated from two bat species of the genus Pipistrellus, family Vespertilionidae, suborder Microchiroptera: the common pipistrelle (Pipistrellus pipistrellus) and Kuhl's pipistrelle (Pipistrellus kuhli). The satellite DNA of the common pipistrelle represents approximately 3% of the whole genome and is organized in tandem repeats with a monomer size of 418 bp. The monomer units are highly similar, with a sequence identity of 95-100% and a few base substitutions randomly spread along the sequence. The common pipistrelle satellite has an AT-content of 62% and contains a putative CENP-B box motif. It is localized at the pericentromeric constitutive heterochromatin of all the autosomes and the X chromosome, but it is absent from the Y chromosome (Barragan et al., 2003). The Pipistrellus kuhli satellite represents approximately 5% of the total genomic DNA. The monomer unit is 1100 bp long and contains the CENP-B box as well as subrepeats, palindromes, and AT-rich tracts. The monomers group into two clusters (Fantaccione et al., 2005). Both pipistrelle satellites are absent from the genomes of the other bat species analyzed, suggesting that they might be species-specific.

Cingulata and Proboscidea
The study of centromeric satellite families in mammalian species belonging to the superorders Xenarthra and Afrotheria of the clade Eutheria has started very recently. In 2010, Alkan and colleagues included in their survey of centromeric satellite sequences in sequenced mammalian genomes the armadillo (Dasypus novemcinctus), a species of the superorder Xenarthra, order Cingulata, and the African elephant (Loxodonta africana), a species of the superorder Afrotheria, order Proboscidea. Using the RepeatNet algorithm, they extracted a 173 bp satellite consensus from the armadillo genome and a 1220 bp satellite consensus from the African elephant genome, and localized both satellites at the centromeres of all chromosomes of the corresponding species (Alkan et al., 2010).

Diprotodontia and Didelphimorphia
The infraclass Metatheria comprises the marsupial mammals. Among the extant marsupials, species of the order Diprotodontia, family Macropodidae, and one species of the order Didelphimorphia have been investigated with regard to their centromeric DNA organization.
Marsupial satellites are characterized by an uneven distribution among the centromeres of the different chromosomes. The centromeric satellite families isolated from each marsupial are also present in the other marsupials analyzed, often in a different amount and with a different localization and distribution.
In 1981, Venolia and Peacock isolated a major satellite from the wallaroo (Macropus robustus) genome. It accounts for about 10% of the total DNA and localizes, in different amounts, at the centromeres of all chromosomes and at the nucleolus organizer region (Venolia & Peacock, 1981). This satellite was also localized in other Macropus species. In M. rufus and M. rufogriseus it is present mainly on the X chromosome, in large non-centromeric blocks, and in the region of the nucleolus organizer. In M. rufogriseus the satellite also occurs on the Y chromosome, and in M. rufus at the centromeres of four acrocentric autosomes.
Six different satellite DNA fractions have been isolated from the genome of the red kangaroo (Macropus rufus), each accounting for 1-3% of the total DNA. These satellites localize at the centromeres, with each heterochromatic centromeric block differing in the amount and distribution of these satellites, as well as at interstitial regions and in the telomeric heterochromatin of the X chromosome (Elizur et al., 1982).
The red-necked wallaby (Macropus rufogriseus) karyotype has a distinctive feature: its chromosomes harbour an exceptional amount of centric and pericentric heterochromatin (Hayman & Martin, 1974), comprising almost 30% of the genome (Bulazel et al., 2006). The chromosomes have unusually long pericentromeric regions, up to half the length of the chromosome, with the functional centromere restricted to a discrete point within the larger region. In 2006, Bulazel and collaborators isolated three satellite families, named Mrb-sat1, Mrb-sat23, and Mrb-B29, from the red-necked wallaby genome. These satellites constitute the large centromeric and pericentromeric regions of the wallaby chromosomes and differ in their chromosomal distribution. Mrb-sat23 constitutes the centromeric core as well as the large pericentric heterochromatic region of all chromosomes and is present in tandem arrays at all centromeres of most Macropus species (Fig. 1). In M. rufogriseus, Mrb-sat23 has undergone large-scale amplification, as it resides over the entire Y chromosome and is spread throughout the extensive X chromosome pericentromere. The presence of a CENP-B binding-competent domain on the Y chromosome of a marsupial suggests that ancestral mammalian sex chromosomes used CENP-B to differentiate centromere location, and that the loss of CENP-B protein binding and of CENP-B box DNA on the Y chromosome, where found in eutherian mammals, is a derived condition. In this species the centromeric satellite constitution differs between the autosomes and the sex chromosomes: all autosomes have sequences of Mrb-sat23 only, whereas the X and Y chromosomes harbour sequences of all three satellites, Mrb-sat1, Mrb-sat23, and Mrb-B29, in different amounts (Bulazel et al., 2006).
Besides the satellite sequences, an active retroviral element, the Kangaroo Endogenous Retrovirus (KERV), is localized at the centromere and pericentromere in the genus Macropus (Fig. 1). Since it has undergone amplification at this locus, it is considered a major constituent of Macropus active and latent centromeres (Ferreri et al., 2011; Ferreri et al., 2005; Ferreri et al., 2004; O'Neill et al., 1998). In particular, in M. rufogriseus KERV is localized at the centromeres of all autosomes, but it is absent or present in low copy number at the centromeres of the sex chromosomes (Ferreri et al., 2011; Ferreri et al., 2004). Recently, Alkan and colleagues reported the centromeric satellite of the short-tailed opossum (Monodelphis domestica), a marsupial species of the order Didelphimorphia. They identified a 528 bp satellite, which is an LTR/ERV1 element, and localized it at the centromeres of four homologous opossum chromosomes (Alkan et al., 2010). The finding of a retroviral element at the centromeres of the short-tailed opossum, a marsupial belonging to a superorder different from that of the Macropodidae, suggests that the use of a retroviral element as a centromeric satellite might be ancestral in the infraclass Metatheria.

The centromeric alpha satellite in primates
The order Primates belongs to the subclass Theria, infraclass Eutheria (Fig. 1). It is an ancient group whose size (the number of species) is still uncertain. In fact, depending on whether some closely related groups are considered varieties of the same species or not, most taxonomic classifications refer to a range of 230-270 species.
The order Primates includes two suborders: Strepsirhini (including six families) and Haplorhini. Strepsirhines have a rhinarium, the tapetum lucidum, a bicornuate uterus, a toothcomb (with the exception of the aye-aye), and a toilet-claw for grooming. The so-called "higher primates" compose the suborder Haplorhini, which is in turn divided into two main hyporders: the Anthropoidea, including the Platyrrhini (New World monkeys) and the Catarrhini (Old World monkeys and apes), and the Tarsiiformes (Goodman et al., 1989).
The predominant class of primate centromeric DNA is made up of long stretches of AT-rich, 171 bp monomers, tandemly reiterated in a head-to-tail configuration (Vissel & Choo, 1987; Waye & Willard, 1987) and commonly referred to as "alphoid satellite" DNA (Manuelidis, 1978). These sequences have been identified throughout the order Primates, including great apes, Old World and New World monkeys (Alves et al., 1994; Musich et al., 1980; Willard & Waye, 1987), with the exception of the suborder Strepsirhini (Lee, H. R. et al., 2011). Alpha satellite DNA is the most abundant repetitive DNA in all primate species studied, making up 3-5% of each chromosome; it is pancentromeric in primate chromosomes and appears to be distinctive of the primate lineage (Vissel & Choo, 1987; Waye & Willard, 1987).

Great apes
At human centromeres alphoid DNA extends for ~250 kb up to ~5 Mb (Wevrick & Willard, 1989; Willard, 1990) and is known to exist in two forms, designated (1) monomeric arrays and (2) higher-order repeats (HORs). In monomeric arrays the 171 bp monomers lack further sequence structure, and the head-to-tail configuration provides directionality to each satellite block. A higher-order repeat array, by contrast, is one in which a defined number of monomers has been homogenized as a unit that is itself tandemly repeated many times, spanning up to several megabases and resulting in multiple copies of an alphoid multimer (Willard & Waye, 1987). At human centromeres monomeric alphoid satellites flank HORs (Fig. 1). Sequence analysis revealed that monomers within a higher-order repeat unit share as little as 72% average pairwise sequence identity, whereas adjacent higher-order repeat units are 98-100% identical (Rudd & Willard, 2004). Higher-order alpha satellite within an array is extremely homogeneous and appears to be uninterrupted by other (non-satellite) DNA sequences (Tyler-Smith & Brown, 1987; Warburton & Willard, 1990). In contrast, monomeric alpha satellite is more heterogeneous in sequence and is extensively interspersed with non-alpha-satellite sequences such as transposable elements (Guy et al., 2003; Kazakov et al., 2003; Schueler et al., 2001).
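The contrast between diverged monomers within a higher-order repeat unit and near-identical copies of the whole unit can be made concrete with a minimal sketch. It assumes the array has already been split into aligned, equal-length monomers, and it searches for the smallest monomer offset at which monomers become nearly identical. The 0.95 threshold, the function names, and the toy 10 bp "monomers" are hypothetical illustrations, not real alpha satellite data.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("monomers must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_identity_at_offset(monomers, k):
    """Mean identity between every monomer i and the monomer k positions downstream."""
    pairs = [(monomers[i], monomers[i + k]) for i in range(len(monomers) - k)]
    return sum(identity(a, b) for a, b in pairs) / len(pairs)


def hor_period(monomers, threshold=0.95):
    """Smallest offset (in monomers) at which mean identity reaches the threshold.

    Returns 1 for a homogeneous array with no higher-order structure, n for an
    n-monomer higher-order repeat, and None if no offset up to half the array
    length qualifies. ASSUMPTION: the threshold is an illustrative choice.
    """
    for k in range(1, len(monomers) // 2 + 1):
        if mean_identity_at_offset(monomers, k) >= threshold:
            return k
    return None


# Toy array: three diverged monomers forming one unit, repeated four times.
m1, m2, m3 = "ACGTACGTAC", "TTGCACGTAA", "ACGGTCATAC"
array = [m1, m2, m3] * 4

print(mean_identity_at_offset(array, 1))  # low: neighbouring monomers are diverged
print(mean_identity_at_offset(array, 3))  # 1.0: monomers one HOR unit apart are identical
print(hor_period(array))                  # 3
```

In real data the monomers would first have to be delimited and aligned, and the identity at the HOR offset would be close to, rather than exactly, 100%.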
The human higher-order repeating structures define alphoid subfamilies. These subfamilies are classified according to whether they are specific to a single chromosome or shared by a small group of chromosomes (Choo et al., 1991). As a consequence, (i) some centromeres contain only the chromosome-specific subfamily, while others possess several distinctive alphoid subfamilies (Choo et al., 1991) organized in discrete and homogenized physical blocks (Schueler et al., 2001); (ii) multiple domains located on one chromosome may belong to the same or to different suprachromosomal families (Choo et al., 1991). Human alpha satellite sequences have historically been grouped into five suprachromosomal families (SFs), initially according to the higher-order repeat unit length, revealed by restriction site periodicity, and later on the basis of sequence-based phylogenetic analyses (Alexandrov, I. et al., 2001; Alexandrov, I. A. et al., 1993; Iurov Iu et al., 1988) (Table 2). Human SFs 1-3 likely derived from an ancestral sequence by interchromosomal exchange (Waye & Willard, 1986). The reconstruction of the ancestral monomer sequence revealed that it gave rise to the two present-day phylogenetic homology groups of monomers: J1-D2-W4-W5 belong to group A, and J2-D1-W1-W2-W3 compose group B (Alexandrov, I. A. et al., 1993). Moreover, there are several subsets that appear to form two more homogeneous families, which are characterized by an array of equally related monomers (SF 4) and by an irregular alternation of two different types of monomers (SF 5) (Alexandrov, I. et al., 2001; Alexandrov, I. A. et al., 1993). The SF 4 consensus monomer, M1, is closely related to the D2 and W4 monomeric types of SF 2 and SF 3; likewise, the consensus sequences R1 and R2 (SF 5) clearly belong to groups A and B, respectively. SF 4 is related both structurally and in sequence to the African green monkey alpha component, whose consensus sequence is closely associated with the phylogenetic homology group A; in addition, the consensus sequences derived from groups A and B proved to be nearly identical to R1 and R2. This shows that the amplification of the ancestral sequences that gave rise to the two homology groups of alphoid monomers happened at the very beginning of primate evolution, and that the SF 5 consensus monomers may represent the ancestral form of the primate A-B satellite. Among the primate lineages, the W1-W5 pentamers and D1-D2 dimers are present in gorilla, orangutan, chimpanzee, and human, while J1-J2 sequences are present in gorilla, chimpanzee, and human (Alexandrov, I. A. et al., 1993).
Table 2. Structure, periodicity, and localization of human suprachromosomal families. SF, suprachromosomal family; data from Alexandrov et al., 2001.
Among individuals, the array length of a single satellite can be highly polymorphic. Centromere size varies extensively between nonhomologous as well as homologous human chromosomes. For example, the alpha-satellite array length of human chromosomes X and Y varies roughly threefold: from 1380 to 3730 kb for chromosome X (Mahtani & Willard, 1990) and from 285 to 1020 kb for chromosome Y (Tyler-Smith & Brown, 1987).
Cross-hybridization studies of alphoid sequences among great apes show that the higher-order sequence organization of the chromosome X alpha-satellite subset has been conserved among closely related species (orthologous evolution) (Durfy & Willard, 1990). However, similar relationships were not found for the other subfamilies. In fact, experimental evidence suggests that the majority of human-derived chromosome-specific alphoid satellite DNA probes do not recognize the orthologous chromosomes in great apes (Samonte et al., 1997).

Catarrhini
The first species in which a primate centromeric sequence was identified is the African green monkey (AGM) (Chlorocebus sabaeus). The AGM is an Old World monkey belonging to the family Cercopithecidae, genus Chlorocebus. In 1971, Maio and colleagues identified the so-called component alpha DNA as the homogeneous fraction making up 20% of total AGM genomic DNA, showing a behaviour similar to that of mouse satellite DNA in renaturation kinetics experiments (Maio, 1971). Two other Cercopithecidae species have also been investigated for their main centromeric component: the Rhesus monkey (Macaca mulatta) and the baboon (genus Papio). Both karyotypes comprise 20 pairs of autosomes plus the two sex chromosomes.
In the three species examined, the centromeric satellite DNA is the most abundant repetitive sequence: it comprises up to 24% of the AGM genome and 8 to 10% of the Rhesus monkey and baboon genomes (Musich et al., 1980). In the AGM genome, the tandemly repeated centromeric satellite monomeric unit is 172 bp long, whereas in the baboon its length is doubled to 340 bp; the nucleotide divergence between the satellite sequences of the two species is ~10% (Singer & Donehower, 1979) (Fig. 2). At Rhesus monkey centromeres, the underlying repetitive alphoid DNA is a 343 bp dimer (Fig. 2). The similarity between its two component monomers, 171 and 172 bp long, is as much as 70%. Nevertheless, few high-identity regions have been detected, conserved not only between the two Rhesus monomers but also among all known Cercopithecidae monomers. Homologies between the macaque consensus alphoid sequence and the baboon, AGM, and human alphoid sequences are >98%, 81%, and <70%, respectively (Musich et al., 1980). Among Catarrhini, one species of the superfamily Hominoidea, family Hylobatidae, has recently been characterized in depth for its centromeric satellite: the white-cheeked gibbon (Nomascus leucogenys). In N. leucogenys, alphoid monomers are 171 bp long and show four different hybridization patterns: telomeric, centromeric, telomeric-centromeric, and Y-chromosome-specific (Fig. 2). N. leucogenys is thus the first primate species analysed in which alphoid DNA has been detected at "ectopic" regions (telomeric and interstitial sites); the authors speculate, however, that the different mapping patterns are more likely due to different sequence organizations than to site-specific sequence divergence. In fact, the gibbon karyotype is known to be extensively rearranged relative to the ancestral primate karyotype, leading to the hypothesis that alphoid sequences at telomeres could represent remnants of evolutionary fissions at centromeric breakpoints (Cellamare et al., 2009).
Centromeric sequence analysis of the black bearded saki (Chiropotes satanas), genus Chiropotes, and the Rio Tapajós saki (Pithecia irrorata), genus Pithecia, both of the family Pitheciidae, showed that they have abundant amounts of satellite with monomeric units of 539 and 559 bp, respectively. The difference in size between the two sequences is mainly due to 14 contiguous bases, almost half of which consist of GA dinucleotides, suggesting strand slippage as the mechanism of expansion.
The 539 bp alphoid satellite in Chiropotes satanas consists of four 170 bp subunits, the third of which is incomplete. C. satanas alphoid DNA hybridizes strongly to Pithecia, whereas hybridization to the black-headed uakari (Cacajao melanocephalus) is much less intense. This suggests a loss of satellite content in Cacajao rather than higher sequence divergence, since the divergence time between Chiropotes and Cacajao is thought to be about 5 Myr (million years), while that between Chiropotes/Cacajao and Pithecia is 8 Myr (Schneider et al., 1993). Moreover, in Cacajao melanocephalus a substantial proportion of the satellite mass is composed of a 340 bp alphoid monomer, while the ~550 bp monomer constitutes a small subset. Nevertheless, it is likely that the ~550 bp monomer arose from an array of 340 bp repeats, leading to the conclusion that the ancestor of the Pitheciini harbored both structures (Alves et al., 1998) (Fig. 2).
Insights into the centromeric satellite of New World monkeys also come from the genus Callithrix, more precisely from the common marmoset (Callithrix jacchus). Cellamare and collaborators characterized C. jacchus centromeric sequences by several cytogenetic and molecular approaches. Their analysis showed that, as in the above-mentioned New World monkeys, the alpha satellite monomer in this species is 340 bp long (Fig. 2). Thus, it is likely that two ancestral monomers fused and no further homogenization occurred between the two halves. The similarity between the first and second monomer is reported to be 40-50%, and sequence comparisons with great ape (human, chimpanzee, and gorilla) and Old World monkey (macaque) alphoid sequences yielded no results (Cellamare et al., 2009).

Strepsirhini
Recently, the centromeric sequence of the aye-aye (Daubentonia madagascariensis) has been characterized. Aye-aye centromeres are composed of two different AT-rich, CENP-A-associated classes of repetitive DNA, termed DMA1 and DMA2, ~146 and ~268 bp long, respectively. DMA1 and DMA2 are often adjacent to one another at aye-aye centromeres and are completely unrelated to alphoid DNA in sequence composition, although they include a highly divergent CENP-B box. Moreover, sequence analysis revealed significant homology in the first 100 bp of both monomers, indicating that the two satellite classes share an evolutionary history (Lee, H. R. et al., 2011) (Fig. 2).

The evolution of alphoid DNA
Alphoid DNA in primates evolves rapidly (Mahtani & Willard, 1990; Wevrick & Willard, 1989). Alpha-satellite DNA monomers evolve through a non-independent mechanism named molecular drive (Charlesworth et al., 1994; Smith, 1976; Stephan, 1986), a stochastic process in which mutations can accumulate, spread quickly through a repeat family, and become fixed in a population (Dover, 1982). In this evolutionary process, mutations are homogenized throughout the members of the satellite DNA family and fixed within a species (Dover, 1982). Although Schindelhauer and Schwarz favour gene conversion as an explanation for both intrachromosomal and inter-homologue homogenization (Schindelhauer & Schwarz, 2002), only unequal crossover can explain the generation and maintenance of a multimeric higher-order repeat length, the extensive spread of sequence variants across megabases, and the rapid fall in sequence identity documented at the edge of the centromeric array.
Wu and Manuelidis proposed a two-step evolutionary process for the formation of tandem duplication arrays: once homology between two sequences has been created, an unequal crossover may occur and result in dimer formation from divergent monomers; the dimer may then be amplified into long tandem arrays by subsequent unequal crossovers (Wu & Manuelidis, 1980). Eventually, a subset of monomers may be homogenized together to form a HOR unit in which the former monomers constitute subunits (Warburton & Willard, 1990; Willard & Waye, 1987). As the number of tandem repeats increases, so does the frequency of unequal crossovers between them. By this mechanism, variant nucleotides can spread along tandem repeats at a rate much faster than, and independent of, the mutation rate (Choo, 1990). The two major types of phylogenetically distinct alpha satellite DNA existing in great apes are a consequence of this homogenization process: multimeric higher-order repeats of ~171 bp units form centrally located, chromosome-specific alphoid domains (class A and B alphoid monomers), flanked by domains of more heterogeneous monomeric alpha satellite from which they have evolved (class A alphoid monomers) and which lack any further organization.
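How a single unequal crossover changes array length and copies a variant can be sketched on a toy array. This is an illustrative model only, not a method from the cited studies: arrays are lists of repeat-unit labels, and the function name `unequal_crossover` and its parameters `shift` and `breakpoint` are hypothetical.

```python
def unequal_crossover(a, b, shift, breakpoint):
    """Recombine two tandem arrays that are paired out of register.

    a, b       : lists of repeat-unit labels (sister chromatids or homologues)
    shift      : number of repeat units by which `b` is displaced relative to `a`
    breakpoint : index in `a` at which the exchange takes place

    Returns the two recombinant arrays (expanded, contracted).
    """
    if not (0 < shift <= breakpoint <= len(a)) or (breakpoint - shift) > len(b):
        raise ValueError("shift/breakpoint incompatible with the array lengths")
    j = breakpoint - shift  # position in `b` paired with the breakpoint in `a`
    expanded = a[:breakpoint] + b[j:]
    contracted = b[:j] + a[breakpoint:]
    return expanded, contracted


# Six-unit array carrying one variant repeat ("v") among ancestral repeats ("A").
array = ["A", "A", "v", "A", "A", "A"]

# Misalignment by one repeat unit, exchange after the third unit.
longer, shorter = unequal_crossover(array, array, shift=1, breakpoint=3)

print(longer)   # 7 units, the variant is now present twice
print(shorter)  # 5 units, the variant has been lost
```

One product gains a repeat and a second copy of the variant while the other loses both, so repeated rounds can spread or eliminate a variant without any new mutation, consistent with the argument above.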
The final outcome of molecular drive is concerted evolution, in which HORs show higher identity within a species than with the orthologous array in other species (Dover, 1982; Rudd et al., 2006; Willard & Waye, 1987). This explains the great diversity (24% divergence) seen between the human and chimpanzee alphoid regions on chromosomes 21 and 22. Given the neutral mutation rate of 0.13% per Myr and the estimated divergence of the human and chimpanzee lineages 6-8 Mya (million years ago), such levels of divergence would otherwise be entirely unexpected. The extreme variability of HOR length among individuals may in turn be explained by the fact that unequal crossovers occur more frequently between higher-order repeat units than between monomeric units, because of the exceptionally high homology among HORs (Willard & Waye, 1987). In addition, these mechanisms appear to proceed in a localized, short-range fashion that leads to the formation of large domains of sequence identity, rather than among intra- or interchromosomal repeats (Dover, 2002), thus resulting in the chromosome-specific alphoid subdomains. As a consequence, adjacent monomers display a higher degree of sequence similarity (Durfy & Willard, 1989; Roizes, 2006; Schindelhauer & Schwarz, 2002; Willard & Waye, 1987), and monomers at array ends show lower identity owing to the low efficiency of homogenization mechanisms at the edges of the satellite array (Schueler et al., 2005).
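The gap between the observed 24% divergence and the neutral expectation can be made explicit with a short calculation. It assumes, as a labelled simplification, that the quoted 0.13% per Myr applies to each lineage separately, so that the expected pairwise divergence is twice the rate multiplied by the time since the split; if the rate were already pairwise, the expected values would be half as large and the gap even wider.

```python
NEUTRAL_RATE = 0.13          # % substitutions per Myr, value quoted in the text
OBSERVED_DIVERGENCE = 24.0   # % divergence between human and chimpanzee alphoid regions


def expected_divergence(rate_pct_per_myr, split_mya, lineages=2):
    """Expected % divergence, assuming the rate acts independently on each lineage."""
    return rate_pct_per_myr * split_mya * lineages


low = expected_divergence(NEUTRAL_RATE, 6)    # split 6 Mya
high = expected_divergence(NEUTRAL_RATE, 8)   # split 8 Mya
fold_excess = OBSERVED_DIVERGENCE / high      # conservative: uses the larger expectation

print(f"expected neutral divergence: {low:.2f}% to {high:.2f}%")
print(f"observed divergence exceeds it at least {fold_excess:.1f}-fold")
```

Under this assumption the neutral expectation is roughly 1.6-2.1%, more than tenfold below the observed value.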
Unequal crossover events in alpha satellite arrays may be either interchromosomal or intrachromosomal. In the first case they give rise to suprachromosomal families of higher-order alpha satellite (Alexandrov, I. A. et al., 1993; Waye & Willard, 1986), while in the latter they result in chromosome-specific arrays of higher-order alpha satellite (Durfy & Willard, 1989; Schindelhauer & Schwarz, 2002; Schueler et al., 2001; Willard & Waye, 1987). In summary, the adjacent organization of higher-order and monomeric alpha satellite, together with the fact that lower primates have only monomeric alpha satellite at their centromeres (Alves et al., 1994; Musich et al., 1980; Rosenberg et al., 1978), supports the hypothesis that higher-order alpha satellite evolved from ancestral arrays of monomeric alpha satellite and subsequently transposed to the centromeric regions of all great ape chromosomes (Alexandrov, I. et al., 2001; Kazakov et al., 2003; Schueler et al., 2001; Warburton et al., 1996). This is further supported by the age gradient revealed by L1 elements in alphoid regions. The underlying idea is that the insertion of active LINEs disrupts the centromeric periodicity and thereby compromises centromere function, after which alphoid DNA expands to compensate for the disruption (Schueler et al., 2001; Shepelev et al., 2009). The analysis of LINEs can therefore be used to estimate the age of different satellite blocks. These studies reveal that the most distal alpha-satellite domain is the oldest, with an age gradient advancing proximally through the satellite region.
Finally, it is clear that the monomeric alpha satellite present within the pericentromeric regions of human chromosomes predates higher-order arrays of alpha satellite and may thus represent a direct descendant of the ancestral primate centromere sequence. Monomeric alphoid arrays are therefore likely the remnants of the centromeres of our primate ancestors, once active and homogeneous, that have been replaced by HOR sequences, which are a much more efficient substrate for homogenization.
In the evolution of the order Primates, the 171 bp repeat unit seems to be the starting point. Two 171 bp monomers were first amplified together as a dimer; then, in the Platyrrhini lineage, the two monomers began to accumulate differences owing to reduced homogenization, thus forming the ~342 bp monomeric unit specific to New World monkeys (variation of this unit generated the ~550 bp unit in Chiropotes and Cacajao). In the Catarrhini ancestor, instead, the dimer of ~171 bp monomers continued to be amplified by unequal crossover, thus forming the dimeric structure common to all centromeres, as reported in macaque and baboon. Moreover, in the superfamily Hominoidea the 171 bp monomer amplified and diverged into monomeric arrays in gibbon and into higher-order repeats in orangutan, gorilla, chimpanzee, and human.

Methods for centromeric sequence analysis: History and news
The study and analysis of repetitive elements such as centromeric satellites are particularly complicated owing to the low-variation, high-copy-number nature of these sequences. Both the development of specific techniques and the application of conventional assays have contributed to satellite DNA analysis over the last decades.
The methods of choice for the analysis of repetitive DNA sequences were the measurement of DNA renaturation kinetics (Britten et al., 1974), isopycnic centrifugation in CsCl and Cs2SO4 gradients (Szybalski, 1968), and hybridization techniques. The application of these techniques to the analysis of mammalian genomes revealed the existence of several DNA fractions that differed in their physical and chemical properties. In particular, they allowed repeated DNA sequences to be identified and separated from the bulk of the DNA by taking advantage of their different copy number and density, respectively. In fact, the term "satellite" derives from the satellite bands seen in the genomic DNA profile after CsCl density-gradient centrifugation. Southern blot analyses have been successfully applied to discover and examine the tandem-repetitive nature of satellite DNA, in which cleavage sites for a number of different restriction endonucleases are spaced at highly regular intervals. Further, the development of sequencing strategies has allowed its characterization at the nucleotide level. Most of these historical techniques are still widely used for centromeric satellite sequence analysis, in combination with newly developed ones.
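The regular spacing of restriction sites in a tandem array, and the ladder of fragment sizes that appears when a site is lost, can be illustrated with an in-silico digest. The sketch assumes a hypothetical 12 bp toy monomer carrying one EcoRI recognition site (GAATTC, cut after the first G); real alphoid monomers are ~171 bp, and the enzyme is chosen here only for illustration.

```python
def cut_positions(seq, site="GAATTC", offset=1):
    """Positions at which the enzyme cuts: `offset` bases into each site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, site="GAATTC", offset=1):
    """Fragment lengths after a complete digest of a linear sequence."""
    cuts = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [end - begin for begin, end in zip(cuts, cuts[1:])]


# Hypothetical 12 bp monomer with a single GAATTC site, repeated five times.
monomer = "ACGAATTCTTAG"
array = monomer * 5

# The same array after a point mutation destroys the site in the third copy.
mutated = array[:26] + "GAGTTC" + array[32:]

print(fragment_lengths(array))    # internal fragments all one monomer long
print(fragment_lengths(mutated))  # one internal fragment is two monomers long
```

Internal fragments equal to one, two, or more monomer lengths correspond to the evenly spaced bands that reveal restriction site periodicity in a Southern blot.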
Further advances came from cytogenetics and molecular cytogenetics, which provided insights into the location and organization of satellite sequences. C-banding has been used to specifically stain constitutive heterochromatin, including centromeric regions; the use of satellite sequences as probes, first in ISH (in situ hybridization) and then in FISH (fluorescent in situ hybridization) experiments, has allowed the definition of their centromeric and pericentromeric localization and distribution among chromosomes. The application of these techniques thus clarified the role of satellite DNA as the main constituent of mammalian centromeric DNA. Besides the structural characterization of centromeric heterochromatin, several functional studies of its protein constituents have been developed. The availability of the CREST antiserum, which contains a mixture of antibodies against constitutive centromeric proteins, and of specific anti-CENP-A, anti-CENP-B, anti-CENP-C, and anti-CENP-E antibodies has allowed centromere function to be localized through immunofluorescence assays. The combination of satellite FISH and immunofluorescence has definitively shown the co-localization of satellite DNA and centromere function. Knowledge of the binding between centromeric DNA and proteins suggested the use of anti-CENP antibodies to isolate centromere-competent DNA from the bulk of total DNA through chromatin immunoprecipitation (ChIP) assays. The subsequent FISH localization, cloning, and sequencing of the isolated DNA allow the full characterization of centromere-competent DNA.
The recent development of high-throughput sequencing technologies has greatly increased the number of organisms with a sequenced genome. However, in genome sequencing projects of mammalian species, the assembly and characterization of centromeric regions cannot be achieved directly because of their repetitive and complex nature. In fact, in each chromosome assembly of the human and other primate genomes, the gap between the p and q arms contains no sequence (Rudd & Willard, 2004). Nonetheless, whole-genome shotgun (WGS) sequence reads can still be used to characterize centromeric regions and satellite DNA sequences. Two computational methods were recently developed to isolate and characterize centromeric repeats from WGS sequence data: HORdetect (Alkan et al., 2007) and RepeatNet (Alkan et al., 2010). HORdetect recovers alpha satellite sequences and predicts higher-order repeat structure in primate sequencing projects; RepeatNet identifies higher-order repeat structures with no a priori information about the consensus and is thus applicable to any sequenced organism. This method has expanded the knowledge of mammalian satellite DNA and isolated, for the first time, centromeric satellites from species of the orders Cingulata, Proboscidea, and Didelphimorphia, as reported earlier in this chapter, although further efforts are needed to better characterize the isolated sequences.
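The idea of predicting higher-order repeat structure from sequence can be illustrated with a deliberately simple periodicity test. This sketch is not the HORdetect or RepeatNet algorithm: it assumes the monomers have already been extracted and are of equal length, and the function name `hor_period` and its `threshold` parameter are hypothetical. It reports the smallest spacing k at which monomer i and monomer i + k are nearly identical on average.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def hor_period(monomers, threshold=0.95):
    """Smallest k such that monomers k apart are on average >= threshold identical.

    Returns None when no spacing up to half the array length qualifies,
    as expected for a purely monomeric (non-HOR) array.
    """
    n = len(monomers)
    for k in range(1, n // 2 + 1):
        scores = [identity(monomers[i], monomers[i + k]) for i in range(n - k)]
        if sum(scores) / len(scores) >= threshold:
            return k
    return None


# Hypothetical toy data: a three-monomer HOR unit repeated four times ...
hor_array = ["ACGTACGT", "ACGAACTT", "TCGTAGGT"] * 4
# ... and a short monomeric array of divergent monomers with no periodicity.
monomeric_array = ["ACGTACGT", "ACGAACTT", "TCGTAGGT", "GGGTACCA"]

print(hor_period(hor_array))        # 3
print(hor_period(monomeric_array))  # None
```

On real WGS data the monomers would first have to be recovered from reads and aligned, steps that the published tools address and that are omitted here.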

Conclusion
The data collected so far allow several considerations regarding centromeric satellite DNA in mammals. Several aspects are quite common among mammalian species, whereas others are shared among a subset of species, in some cases evolutionarily distant ones, providing an example of convergent evolution of satellite DNA. The main features described are summarized in Table 1.
Satellite DNA shows high variability across mammalian taxa in monomer size, nucleotide sequence, and quantity relative to the total genome. Although the satellite sequence is not evolutionarily conserved, there are recurrent elements among the satellite sequences of closely related species. It is clear that satellite DNA follows a concerted evolution mechanism, so that monomers are more similar to monomers of the same species than to monomers of other species. The amount of centromeric and pericentromeric satellite DNA is highly variable as well, and it is not related to the total genome size of the species. In fact, bats have a low C value but do not show a small relative amount of satellite DNA, as had been assumed before it was measured: the percentage of satellite DNA relative to total DNA, 3-5%, is very similar to that of other mammals (Table 1).
Quite common features of satellite DNA are: i) the presence of internal direct and inverted subrepeats (Bogenberger et al., 1985; Lee, C. & Lin, 1996; Zhang & Horz, 1984); ii) the presence of the same satellite DNA families at the centromeres of autosomes and the X chromosome, but not at the Y chromosome centromere, as in mouse, sheep, and bat; iii) a greater divergence of the Y centromere sequence in comparison to the other centromeres, as in mouse and primates. Sheep, swine, and horse have a different satellite content in acrocentric and metacentric chromosomes. In two evolutionarily distant mammals, the marsupial Macropus rufogriseus and the rodent Microtus chrotorrhinus, giant sex chromosomes derived from a large block of heterochromatin at the centromeric and pericentromeric regions have been observed. Finally, evidence that autosomes and sex chromosomes accumulate and dissipate centromeric material at different rates has been found in cervid deer, muntjac, and the genus Macropus (Bulazel et al., 2006; Li et al., 2005; Lin & Li, 2006), with the sex chromosomes retaining for longer periods of time tandem arrays of ancestral satellites that are not found in the autosomes.
The centromeric satellite is valuable as a phylogenetic marker for establishing the evolutionary relationships among species when these cannot be found, or are ambiguous, in the fossil record or other data (Saffery et al., 1999). The centromeric satellite has been used as a phylogenetic marker with regard to two different aspects: its nucleotide sequence and its chromosomal localization and distribution. Thanks to its high divergence rate and rapid evolution during speciation, the satellite DNA sequence has been used to define the evolutionary relationships among closely related species that diverged recently. An example is provided by the analysis performed on the spiny mice satellites (Kunze et al., 1999). On the other hand, the comparison of satellite DNA in situ hybridization patterns and the study of the nature and amplification of the satellite DNA families on the autosomes and the X chromosome made it possible to infer phylogeny and increase the resolution of the evolutionary tree of the Artiodactyla (Chaves et al., 2000; Chaves et al., 2005; Modi et al., 1996). In the family Equidae, the phylogeny of four Equus chromosomes was reconstructed by centromere and satellite DNA localization (Alkan et al., 2010; Wade et al., 2009). Moreover, the analysis of satellite DNA sequence, organization, and chromosome distribution, in conjunction with karyotype analysis, is a valuable tool for assessing species relationships while also elucidating important aspects of both genome and repetitive sequence evolution.
All the data collected to date in mammals suggest that centromeric satellite sequences are neither necessary nor sufficient for centromere function, and that repetitive DNA is more likely a consequence than a source of centromere function. Nevertheless, the pancentromeric presence of satellite DNA at all mammalian centromeres clearly indicates that a repetitive, ordered, and homogeneous sequence is important for centromere maintenance during the evolution of the species.

Fig. 1. Organization of centromeric satellites at mammalian centromeres. Top: phylogenetic relationships among the mammalian species discussed in the chapter with satellite DNA at their centromeres. Bottom: spatial organization of satellite families at the centromere and pericentromere of, from right to left, human chromosomes, mouse telocentric chromosomes, cow autosomes, sheep autosomes, and wallaby autosomes. Chromosome arms are coloured blue. In the wallaby, a satellite (Mrb-sat23) and a non-satellite element (KERV) co-exist.

Fig. 2. Centromeric satellite organization in primates. Left: simplified phylogenetic relationships among primate species. Estimated divergence times are reported in millions of years. Right: schematic organization of the alphoid arrays at primate centromeres. Arrows represent monomeric units. All Anthropoidea families have alphoid DNA at centromeres. For the genus Daubentonia (orange arrows), a model of the centromeric DMA1 and DMA2 satellites is proposed.