Phylogenetic Evolution and Phylogeography of Tibetan Sheep Based on mtDNA D-Loop Sequences Phylogenetic Evolution and Phylogeography of Tibetan Sheep Based on mtDNA D-Loop Sequences

The molecular and population genetic evidence of the phylogenetic status of the Tibetan sheep ( Ovis aries ) is not well understood, and little is known about this species’ genetic diversity. Phylogenetic relationship and phylogeography of 636 individual Tibetan sheep which were collected from the Qinghai-Tibetan Plateau area in China and were assessed using 642 complete sequences of the mitochondrial DNA D-loop. Reference data were obtained from the six reference breed sequences available in GenBank. Phylogeography analysis showed that all four previously defined haplogroups were found in the 15 Tibetan sheep populations but that only one haplogroup was found in Linzhou sheep. Furthermore, clustering analysis divided the 636 individual Tibetan sheep into at least two clusters. The estimated genetic distance and genetic differen tiation associate with altitude, suggesting geographic and adaptive effects in Tibetan sheep. These results contribute to the knowledge of Tibetan sheep populations and will help inform future conservation programs about the Tibetan sheep native to the Qinghai-Tibetan Plateau in China.


Introduction
In China, Tibetan sheep (Ovis aries) play an important role in agriculture, economy, culture and religion in Qinghai-Tibetan Plateau areas, and provide meat, wool, and pelts for local populations [1]. Due to the rich Tibetan sheep genetic resources, there are approximately 17 indigenous sheep populations in the Qinghai-Tibetan Plateau area. All indigenous Tibetan sheep adapted to the local environment [2]. Moreover, they are considered as critical genetic resources, so they are one of the most important components in agro-animal husbandry societies. However, in reality, most indigenous Tibetan sheep suffer a serious situation because of the low numbers of each population, which decline steadily since 30 years [3]. Qinghai-Tibetan Plateau areas have special climate and landforms in geography, which provide a different livelihood for Tibetan nomads because of high altitude and cold mountains [4]. Besides, Tibetan sheep bind up with their herders in culture, religion, and social life. The fossil remains indicate that the domestic Tibetan sheep date their wild ancestors from the Pleistocene period [5]. Archeological evidences indicate that domestication of the yak is likely to have been performed approximately 5000 YBP (years before present) by the ancient Qiang people in Northern Tibet [6]. According to the temporal scale, indigenous animals are considered as an ideal model for cold and hypoxia environment adaption studies, on account of their adaptions for high-altitude hypoxia. In addition, Tibetan sheep are isolated from other local sheep in China, due to the unavailable traffic to other parts of China or external countries (Nepal, India, Pakistan, etc.). The severe changes of the ecological environment, value of Tibetan sheep in illegal commerce, and deficiency of animal conservation may result in extinction [7]. So far, the genetic diversity, phylogenetic relationship, and maternal origin of the Qinghai-Tibetan Plateau populations remain uncertain and controversial.
The study of mitochondrial DNA (mtDNA) polymorphisms has been useful for describing the molecular phylogeny and diversity of this sheep [8][9][10][11], due to the extremely low rate of recombination of mtDNA, its maternal lineage heredity and its relatively fast substitution rate as compared to nuclear DNA [12]. In particular, the CR (the D-loop) is the main noncoding regulatory region for the transcription and replication of mtDNA [13]. Based on mtDNA sequence analysis, we investigate the history and phylogenic relationships of modern domestic Tibetan sheep populations [14].
It is possible to describe the genetic polymorphisms and maternal origin of Tibetan sheep, according to the variability and structure of the mtDNA control region because mtDNA merely has no recombination and follows simple maternal inheritance [14,15] and evolves relatively rapidly [16]. The higher substitution rate of CR, compared with the heterogeneity rate in the other parts of mtDNA, can characterize intraspecific and interspecific genetic diversity optimally [17][18][19][20][21]. This high mutation rate is because the CR remains single stranded for long periods during mitochondrial replication and transcription [22][23][24][25][26].
Here, we investigate the mtDNA D-loop variability of Tibetan sheep indigenous to the Qinghai-Tibetan Plateau areas. We increase the number of Tibetan sheep samples by including six available reference genomes from GenBank for our population genetic and phylogenetic analysis of the 15 Tibetan sheep populations, based on completion of the mtDNA control region. Our results provide insight into the phylogenetic evolution and maternal origin of Tibetan sheep and improve the management of sheep genetic resources and conservation of their genetic diversity.

Sample collection
We selected 15 Chinese local Tibetan sheep populations for investigation in this study. For analysis, 10 mL blood samples were collected from the jugular vein of each animal. From the 10 mL samples, 2 mL samples were quickly frozen in liquid nitrogen and stored at −80°C for genomic DNA extraction, as described previously [27]. The total DNA was extracted from the blood using the saturated salt method [28], quantified spectrophotometrically, and adjusted to 50 ng/ μL. Blood samples were collected from 636 sheep from 15 populations, living in the Qinghai-Tibetan Plateau areas in China. The genetic characteristics for these Tibetan sheep populations were analyzed in order to ascertain the phylogenetic evolution and phylogeography for the populations. The investigated populations included the following numbers and corresponding

Data collection
To achieve good coverage of the tested populations, a dataset of six referenced breeds was completed using the six submitted sequences containing the Ovis aries, Ovis vignei, and Ovis ammon mtDNA D-loops for the six individuals in GenBank [29,30].
Primers flanking sequences of the complete mtDNA D-loop was designed by an available genome sequence using the Primer Premier 5.0 software [31] and synthesized by BGI Shenzhen Technology Co., Ltd. (Shenzhen, China). The nucleotide sequence of reverse primer was 5'-GAACAACCAACCTCCCTAAG-3′, and the nucleotide sequence of forward primer was 5'-GGCTGGGACCAAACCTAT-3′. Polymerase chain reaction system (PCRs) took place in a 30 μL reaction system containing 2 μL genomic DNA (10 ng/μL) template, 2 μL dNTP (2.5 mM), 3 μL (3 pM) each primer, 3 μL 10× Ex Taq reaction buffer, 0.2 μL Taq DNA polymerase (5 μL/U) (TaKaRa, China), and 16.8 μL ddH 2 O approximately. The PCR conditions were as follows: initial denaturation for 5 min at 94°C, 36 cycles of denaturation at 94°C for 30 s, annealing at 56°C for 30 s, and extension at 72°C for 1.5 min. The final extension step was followed by a 10 min extension at 72°C. The PCR amplification products were subsequently stored at 12°C until use.
The amplified D-loop fragment was purified using a PCR gel extraction kit from Sangon Biotech Co., Ltd. (Shenzhen, China) and sequenced directly using a BigDye Terminator v3.1 cycle sequencing ready reaction kit (Applied Biosystems, Darmstadt, Germany) in an automatic sequencer (ABI-PRISM 3730 genetic analyzer, Applied Biosystems, Foster City, California, United states of America). PCR for the sequencing was performed in an automatic sequencer with a total reaction volume of approximately 5 μL containing 3 μL genomic DNA (10 ng/μL), 1 μL (3 pM) of each sequencing primer, 0.5 μL BigDye, and 0.5 μL ddH 2 O. The sequencing conditions were as follows: initial denaturation for 2 min at 95°C, 25 cycles of denaturation at 95°C for 10 s, and annealing at 51°C for 10 s. The final extension step was followed by a 190 s extension at 60°C. The PCR sequencing products were subsequently stored at 12°C until use.

Data analysis
The sequences were arranged for multiple comparisons using Clustal Omega [32] and were aligned using ClustalW and BLAST [33]. These results were compared with other sequences obtained from GenBank. The reference sequences for tree construction were taken from the maternal lineages of each tree: haplogroup A (AF039578), haplogroup B (AF039577, AY582801, and AY091487), haplogroup E (AY091490, AJ238300). The diversity parameters, including the haplotype diversity, nucleotide diversity and average number of nucleotide differences, were estimated using DnaSP (Sequence Polymorphism Software) 5.10.01 [34]. G ST , F ST , N m , AMOVA test, and neutrality tests were estimated using Arlequin version 3.5.1.2 [35]. To identify differences between the geographic regions using the AMOVA program, four groups were established. The phylogenetic and molecular evolutionary relationships, D xy , D a , ME phylogenetic haplotype and clustering tree, and genetic distance were assessed using MEGA version 6.0 [36]. We sketched the network and mismatched distribution graphs using the median-joining method implemented in the NETWORK version 4.6.1.2 software to assess the haplotype relationships [37].

Polymorphic site and sequencing analysis of the complete control region
Based on the reference sequences from GenBank accession numbers (AY091487, AY091490, AJ238300, AF039578, AF039577, AY582801), all of the sequences were aligned with 1274 comparative sites, and 350 haplotypes were obtained from the 642 sequenced individuals. The length of the sequences obtained from the 636 individuals varied considerably, between 1031 and 1259 bp, although the majority were between 1180 and 1183 bp [29]. A total of 196 variable sites were obtained from the sequences, including 63 singleton variable sites and 133 parsimony-informative variable sites. There were 158 transitions and 38 transversions within the 196 variable sites, among which 15 sites had both transitions and transversions. Transition mutations were caused by observed substitutions. The variability of the number of 75 bp tandem repeat motifs [38] caused the observed variation in the length of the mtDNA D-loop sequences of the Tibetan sheep, with the exception of the insertion or deletion of several nucleotide sites.
Whole haplotypes' nucleotide composition was 32.96% A, 29.71% T, 22.89% C, 14.44% G, 62.67% A+T, and 37.33% G+C, and the A+T were more common than the G+C haplotype substantially, showing an AT bias [29]. The largest haplogroup A consisted of 490 individuals and 259 haplotypes; the next largest haplogroup B and haplogroup C consisted of 145 individuals and 43 haplotypes. The smallest haplogroup D consisted of 1 individual and 1 haplotype. The number of haplotypes, individuals, and frequency detected in each Tibetan sheep population of haplotype group varied from 1 to 49, from 0 to 62, and from 0 to 0.88, respectively. The haplotype diversity and nucleotide diversity were calculated separately for each Tibetan sheep population and were estimated to be 0.99 ± 0.01 and 0.02 ± 0.00, respectively. The values of haplotype diversity and nucleotide diversity ranged from 0.90 ± 0.16 to 1.00 ± 0.05 and from 0.01 ± 0.00 to 0.03 ± 0.00, respectively, thus demonstrating the high level of genetic diversity in the 15 Tibetan sheep populations. The nucleotide diversity value of the LZ and JZ populations was higher than that of the remaining 13 Tibetan sheep populations, indicating a relatively high level of diversity. Similarly, the haplotype diversity values were highest in LKZ and LZ populations and the lowest in the AW population.

Genetic distance and average number of nucleotide differences
The study presents the genetic distance and average number of nucleotide differences between and within the 15 Tibetan sheep populations. The genetic distance values ranged from 0.01 to 0.04 within the population diagonals, and the genetic distance values ranged from 0.01 to 0.04 among populations in Liu et al. [29]. Among the Tibetan sheep populations, the genetic distance within populations reached a maximum value in LZ and a minimum value in AW. Similarly, the genetic distance between the populations had a maximum value for LZ and JZ and a minimum value for AW and TJ. The average number of nucleotide differences values ranged from 10.00 to 29.81 within populations along the digital diagonal, and the average number of nucleotide difference values ranged from 10.73 to 30.99 between the populations below the diagonals. Among the Tibetan sheep populations, the average number of nucleotide differences within the populations reached its value maximum in LZ and its minimum value in AW. Similarly, the average number of nucleotide differences between populations reached a value maximum in LZ and JZ and a minimum value in AW and TJ populations [29].

Genetic distances and altitude
We test whether genetic distances between populations can be explained by absolute differences between altitudes for the 15 Tibetan sheep populations. Graphically, for the focal population of LZ, Figure 1 plots the genetic distance between population LZ and each of the remaining populations as a function of the absolute difference in altitudes. Genetic distance tends to decrease with absolute difference in altitudes, as estimated by the Pearson correlation coefficient (r = −0.4136, two tailed P = 0.063, square root of 0.1711 indicated in Figure 1). This tendency is observed in 10 among the 15 sheep populations, but is never statistically significant at P < 0.05 (see Table 1). It is strongest (most negative) for high altitude populations and weakest (most positive) for populations living at low altitudes. This association between altitude and Pearson correlation coefficients obtained between genetic distances and absolute differences in altitudes ( Table 1) has itself r = −0.65, one tailed P = 0.0044.

Genetic differentiation
To examine the genetic differentiation between the 15 Tibetan sheep populations, we calculated F ST and G ST . We also calculated N m , D xy , and D a among the 15 studied Tibetan sheep populations [29]. The estimated pairwise F ST values are from Liu et al. [29]. The F ST values ranged from −0.05 to 0.24. DM and LKZ had the closest pairwise F ST value among the 15 Tibetan sheep populations and AW were more distantly related to JZ, compared with other

Genetic differentiation and altitude
We test whether genetic differentiation between populations can be explained by altitude.
Graphically, for the focal population of GD, Figure 2 plots genetic differentiation between GZ and each of the remaining populations as a function of the absolute value of the difference between their altitudes. Genetic differentiation tends to increase with altitude (r = 0.4315, one-tailed P = 0.062) (square root of 0.1862 indicated in Figure 2). Analyses similar to those in Figure 2 showed that genetic differentiation increases with absolute differences in altitude, specifically for nine populations ((significance at P < 0.05 indicated by *) QL*, TJ*, QH*, MX*, GJ, QK*, GN*, DM, and LKZ) and decreases for the remaining five populations (GB,HB, JZ, LZ, and AW), mainly GB* and HB* ( Table 1).
This association between genetic differentiation and absolute difference between altitudes is most positive for populations at high altitudes and most negative for those at low altitudes. This association between altitude and Pearson correlation coefficients obtained between genetic differentiations and absolute differences in altitudes ( Table 1) has itself r = −0.75, one-tailed P = 0.0011.

Phylogenetic relationships
To extend our knowledge of the phylogenetic relationship of the 15 Tibetan sheep populations, a phylogenetic tree was constructed using ME based on the complete mtDNA D-loop sequences of 642 individuals and 350 haplotypes from 15 Tibetan sheep populations and six reference breeds [29]. We determined four distinct cluster haplogroups:  the populations significantly (P < 0.05). The F ST was 0.05, which indicated that 4.50% of the total genetic variation was due to differences between populations, and the remaining 95.50% came from differences among individuals within each population.

Population expansions
The sample size for most of the populations was more than 30, so the detection of population expansion was performed at the individual population level (data not shown) and in all haplotype sequences. Liu [29]. The charts of the mismatch distribution for the samples of the 15 Tibetan sheep populations and the total samples were multimodal. However, the mismatch distribution for LZ was a unimodal function. The mismatch distribution of the complete dataset showed that there were two major peaks, with maximum values at 4 and 27 pairwise differences and two smaller peaks at 45 and 51 pairwise differences [29]. These results suggest that at least two expansion events occurred during the population demographic history of the Tibetan sheep population. The mismatch distribution analysis revealed a unimodal bell-shaped distribution of pairwise sequence differences in lineages A, B and C, but that of the lineage D was a sampling function duo to small sample effects.

High mtDNA D-loop diversity of Tibetan sheep populations
The 15 Tibetan sheep populations in our study showed a high level of genetic diversity. This finding is consistent with archeological data and other genetic diversity studies [21,[40][41][42][43], while in this study, the haplotype diversity was higher than that found in a previous study [44], and the nucleotide diversity was lower compared with the data in a previous study [7]. The genetic diversity among the 15 Tibetan sheep populations was relatively higher compared with other sheep populations [1,44]. For instance, the haplotype diversity values of Turkish sheep breeds distributed in a Turkish population were 0.95 ± 0.01 [44]. However, according to Walsh's work, based on the required sample size for the diagnosis of conservation units [45], a sample of 59 individuals fails necessary to support the hypothesis that individuals with unstamped ("hidden") character states exist in the population size. Thus, the sample size necessary to reject a hidden state frequency of 0.05 is 56 when sampling from a finite population of 500 individuals. Therefore, because of the large sample size, genetic diversity estimation reflects precisely Tibetan sheep. For the LZ, LKZ, HB, QH, GD, TJ, GJ, QK, GB and GN with broad distribution, a high genetic diversity could only be observed within the studies containing large sample size or wide collection areas. However, if more samples were involved in the study, a higher diversity could be found, suggesting further investigation of the genetic diversity of these 15 Tibetan sheep populations. These Tibetan sheep populations had been experiencing a genetic bottleneck during the twentieth century and were classified as the rarest sheep populations [46]. In addition, among the 15 Tibetan sheep populations, the positive Ewens-Watterson and Chakraborty's values were significantly different, suggesting a previous decline in the population size of the mtDNA D-loop diversity. This finding was consistent with the results of a previous study [46]. Such genetic diversity may be caused by an increased mutation rate in the mtDNA D-loop, the maternal effects of multiple wild ancestors, overlapping generations, the mixing of populations from different geographical locations, natural selection favoring heterozygosis or subdivision accompanied by genetic drift [42,43].

Maternal origins of the Tibetan sheep populations
The sequence motifs from the 1180 bp to the 1183 bp region of the mtDNA D-loop form the basis for the four major maternal lineages in the Tibetan sheep mtDNA haplotypes. Maternal lineage D was the rarest. The Tibetan sheep haplotypes belonged to all four major maternal lineages, although only 0.16% belonged to maternal lineages D. This finding demonstrated that populations of Tibetan sheep possess abundant mtDNA diversity, and therefore, there is a widespread origin of their maternal lineages. Furthermore, the thoroughbred Tibetan sheep has been proposed to be shared in the maternal lineages A, and the contribution of Asian sheep breeds to this population has also been reported in Liu et al. [29]. In this study, maternal lineages B and C were found in common among the overall sequences of all 15 Tibetan sheep populations, including the 14 Tibetan sheep populations respectively other than DM and AW. It is generally acknowledged that domestic sheep have two maternal lineages (A and B), based on the earlier mtDNA analysis [7,13,16,47]. Recently, a new maternal lineage (C) was found in Chinese domestic sheep [40,42,43]. The ME phylogenetic tree median-joining analyses, revealed the presence of four maternal lineages in the Tibetan sheep populations. Of these groups, the maternal lineage A was predominant, and the maternal lineage B and C were the second most common. The proportion of maternal lineage D was 0.16%, further demonstrating that lineage D is the rarest among mtDNA sheep lineages [29]. Our findings coincided with previous studies and supported the results on domestic sheep breeds in China [48,49]. Three mtDNA of maternal lineages were identified in both China [39,42,43,49] and other countries [50][51][52]. The four mtDNA maternal lineages found in the Tibetan sheep populations in the Qinghai-Tibetan plateau areas further supported the hypothesis of multiple maternal origins in Chinese domestic sheep.

Genetic differentiation of Tibetan sheep populations
In this study, the AMOVA analysis revealed the distinct population of Qinghai-Tibetan Plateau areas among other Tibetan sheep populations, with a significant positive variance in the meanwhile. Gene flow (N m ), also known as gene migration, refers to the transfer of alleles from one population to another. Poor gene exchange is indicated when N m haplotype values are >1 and N m sequences <1, a, which means that genetic drift results in substantial local differentiation [53,54].
The low G ST value, combined with the low N m of sequences used in this study, indicates that the great differentiation mainly resulted from the independent evolution of each isolated population and substantial local differentiation caused by genetic drift [55]. The lower effective population sizes may contribute too, as the GN, QK, GJ and QL live in canyons and valleys, so lower population sizes were not available for migration compared with other Tibetan sheep. As the effective population size declines, the nucleotide substitutions might reach fixation [48,56].  Figure 2). This association is statistically significant according to Fisher's exact test (an one-tailed P = 0.032). The exceptions to this association with phylogeny are populations GJ, TJ and JZ. These results on associations between genetic differentiation and altitude suggest that part of the genetic differentiation between populations might be adaptive in relation to altitude and/or climate.

Genetic relationships among the Tibetan sheep populations
The study showed that the 15 Tibetan sheep populations from the Qinghai-Tibetan Plateau divide into four maternal lineages: 490 Tibetan sheep represent the maternal origin of the maternal lineage A, 64 Tibetan sheep represent the maternal origin of the maternal lineage B, 81 Tibetan sheep represent the maternal origin of the maternal lineage C, and 1 Tibetan sheep represents the maternal origin of the maternal lineage D. This genetic relationship displayed a high consistency with traditional classification schemes and the results of previous studies [40,[57][58][59][60][61]. Fifteen Tibetan sheep populations derived from four maternal lineages. On the one hand, Tibetan sheep show a great value as portable food and wool resource; on the other hand, the commercial trade and extensive transport of sheep, along with human migratory paths, might promote the observed genetic exchange. Other study methods such as genetic approaches, including the degree method and the phylogenetic relationship clustering method, also indicated that indigenous sheep were the maternal lineages A, B, C, and D [58,60].
The study presents the Pearson correlation coefficient of genetic differentiation with the corresponding altitude difference between the 15 Tibetan sheep populations. The study showed that the correlation between altitude and genetic differentiation increases with altitude, specifically for 10 populations (GJ, GD, TJ, MX, QK, QL, QH, GN, DM and LKZ), with r ranging from 0.089 to 0.584, and decreases for the remaining five populations (GB, HB, JZ, LZ and AW), with r from −0.512 to −0.011. This evidence based on the large-scale mtDNA D-loop sequences analysis of 15 Tibetan sheep populations indicates effects associated to climate or isolation of phylogeography on genetic differentiation. These results indicate an adaptive effect associated with altitude or geography, which coincides with previous studies [29,42,43,62].

Population expansion of Tibetan sheep populations
The mismatch distribution analysis of the complete dataset, maternal lineages A, B, C, D, and 15 Tibetan sheep populations of the mtDNA D-loop, is presented [29]. Neutrality tests (Ewens-Watterson test, Chakraborty's test, Tajima's D test, Fu's FS test) were used to detect population expansion [29]. The complete dataset of all Tibetan sheep populations had a significantly large negative Tajima's D value and F S value. This result shows two large and sudden expansions, consistent with a demographic model, as inferred from the mismatch distribution. The mismatch distribution of the complete dataset suggested that there were two major peaks with maximum values at 4 and 27 pairwise differences and two smaller peaks at 45 and 51 differences. Based on the results, it could be implied that there are at least two expansion events occurred in the population demographic history of the Tibetan sheep, which live on the Qinghai-Tibetan Plateau. The mismatch distribution analysis revealed a unimodal bell-shaped distribution of the pairwise sequence differences in maternal lineages A, B and C. However, the distribution of maternal lineage D was a sambong function, duo to the geographic distribution patterns of species diversity. Mismatch analysis of maternal lineages A, B, and C suggested that it happened in the demographic history of Tibetan sheep populations that single population expansion events occurred before. Similar results were found in previous reports [42,43].

Phylogenetic analysis of the Tibetan sheep populations
Phylogenetic analyses of complete mitogenomes showed a high resolution among wild sheep as well as among the major lineages of domestic sheep [62]. The complete mitogenomes of O. orientalis and O. musimon formed a monophyletic group that was incorporated within lineage B of domestic sheep. However, the analysis of full control region and D-loop fragments showed that O. orientalis is also closely related to other lineages of O. aries. This difference could be ascribed to the small number of O. musimon and O. orientalis complete mitogenomes available in this study.
Full control region from the complete mitogenomes produced similar phylogenies with fully resolved phylogenetic relationships of wild sheep, but they failed to define the phylogenetic relationships among the major lineages of domestic sheep. Our results suggest that partial fragments of the complete mitogenomes would be problematic when making phylogenetic inferences about domestic sheep. This problem arises due to diagnostic substitutions located elsewhere in the mitogenome [62]. Thus, the diagnostic substitutions for species and lineages presented [62] here can serve as an important resource for maternal genetic differentiation between domestic and wild sheep as well as between the lineages within domestic sheep. Also, they might be helpful for addressing certain conflicts described above in future.

Conclusion
High mtDNA genetic diversity in the sheep from the Qinghai-Tibetan Plateau areas is a rich resource for China. The evidences indicate the high diversity of four maternal lineages by doing the large-scale mtDNA D-loop sequences analysis of 15 Tibetan sheep populations. Although the maternal lineage D was only found in a single LZ, phylogenetic analysis showed that four maternal lineages (A, B, C and D), previously defined, could be identified in the 636 tested individuals of the 15 Tibetan sheep populations. The estimation of demographic parameters from the mismatch analyses shows that maternal lineages A, B and C had at least one demographic expansion in the Tibetan sheep of the Qinghai-Tibetan Plateau areas.