Altitude and Pearson correlation coefficients between absolute differences in altitude and each genetic distance and genetic differentiation between 15 Tibetan sheep populations.
The molecular and population genetic evidence of the phylogenetic status of the Tibetan sheep (Ovis aries) is not well understood, and little is known about this species’ genetic diversity. Phylogenetic relationship and phylogeography of 636 individual Tibetan sheep which were collected from the Qinghai-Tibetan Plateau area in China and were assessed using 642 complete sequences of the mitochondrial DNA D-loop. Reference data were obtained from the six reference breed sequences available in GenBank. Phylogeography analysis showed that all four previously defined haplogroups were found in the 15 Tibetan sheep populations but that only one haplogroup was found in Linzhou sheep. Furthermore, clustering analysis divided the 636 individual Tibetan sheep into at least two clusters. The estimated genetic distance and genetic differentiation associate with altitude, suggesting geographic and adaptive effects in Tibetan sheep. These results contribute to the knowledge of Tibetan sheep populations and will help inform future conservation programs about the Tibetan sheep native to the Qinghai-Tibetan Plateau in China.
- Tibetan sheep
- mtDNA D-loop
- maternal lineage
In China, Tibetan sheep (
The study of mitochondrial DNA (mtDNA) polymorphisms has been useful for describing the molecular phylogeny and diversity of this sheep [8, 9, 10, 11], due to the extremely low rate of recombination of mtDNA, its maternal lineage heredity and its relatively fast substitution rate as compared to nuclear DNA . In particular, the CR (the D-loop) is the main noncoding regulatory region for the transcription and replication of mtDNA . Based on mtDNA sequence analysis, we investigate the history and phylogenic relationships of modern domestic Tibetan sheep populations .
It is possible to describe the genetic polymorphisms and maternal origin of Tibetan sheep, according to the variability and structure of the mtDNA control region because mtDNA merely has no recombination and follows simple maternal inheritance [14, 15] and evolves relatively rapidly . The higher substitution rate of CR, compared with the heterogeneity rate in the other parts of mtDNA, can characterize intraspecific and interspecific genetic diversity optimally [17, 18, 19, 20, 21]. This high mutation rate is because the CR remains single stranded for long periods during mitochondrial replication and transcription [22, 23, 24, 25, 26].
Here, we investigate the mtDNA D-loop variability of Tibetan sheep indigenous to the Qinghai-Tibetan Plateau areas. We increase the number of Tibetan sheep samples by including six available reference genomes from GenBank for our population genetic and phylogenetic analysis of the 15 Tibetan sheep populations, based on completion of the mtDNA control region. Our results provide insight into the phylogenetic evolution and maternal origin of Tibetan sheep and improve the management of sheep genetic resources and conservation of their genetic diversity.
2. Materials and methods
2.1. Sample collection
We selected 15 Chinese local Tibetan sheep populations for investigation in this study. For analysis, 10 mL blood samples were collected from the jugular vein of each animal. From the 10 mL samples, 2 mL samples were quickly frozen in liquid nitrogen and stored at −80°C for genomic DNA extraction, as described previously . The total DNA was extracted from the blood using the saturated salt method , quantified spectrophotometrically, and adjusted to 50 ng/μL. Blood samples were collected from 636 sheep from 15 populations, living in the Qinghai-Tibetan Plateau areas in China. The genetic characteristics for these Tibetan sheep populations were analyzed in order to ascertain the phylogenetic evolution and phylogeography for the populations. The investigated populations included the following numbers and corresponding population types: 39 Guide Black Fur sheep (GD), 44 Qilian White Tibetan sheep (QL), 64 Tianjun White Tibetan sheep (TJ), 44 Qinghai Oula sheep (QH), 67 Minxian Black Fur sheep (MX), 58 Ganjia Tibetan sheep (GJ), 71 Qiaoke Tibetan sheep (QK), 52 Gannan Oula sheep (GN), 10 Langkazi Tibetan sheep (LKZ), 46 Jiangzi Tibetan sheep (JZ), 85 Gangba Tibetan sheep (GB), 34 Huoba Tibetan sheep (HB), 8 Duoma Tibetan sheep (DM), 5 Awang Tibetan sheep (AW), and 9 Linzhou Tibetan sheep (LZ) raised in China. The sampling information (population code, sample number, altitude, longitude and latitude, accession number, sampling location, and geographical location) for the 15 indigenous Tibetan sheep populations is shown by Liu et al. .
2.2. Data collection
To achieve good coverage of the tested populations, a dataset of six referenced breeds was completed using the six submitted sequences containing the
Primers flanking sequences of the complete mtDNA D-loop was designed by an available genome sequence using the Primer Premier 5.0 software  and synthesized by BGI Shenzhen Technology Co., Ltd. (Shenzhen, China). The nucleotide sequence of reverse primer was 5’-GAACAACCAACCTCCCTAAG-3′, and the nucleotide sequence of forward primer was 5’-GGCTGGGACCAAACCTAT-3′. Polymerase chain reaction system (PCRs) took place in a 30 μL reaction system containing 2 μL genomic DNA (10 ng/μL) template, 2 μL dNTP (2.5 mM), 3 μL (3 pM) each primer, 3 μL 10× Ex Taq reaction buffer, 0.2 μL Taq DNA polymerase (5 μL/U) (TaKaRa, China), and 16.8 μL ddH2O approximately. The PCR conditions were as follows: initial denaturation for 5 min at 94°C, 36 cycles of denaturation at 94°C for 30 s, annealing at 56°C for 30 s, and extension at 72°C for 1.5 min. The final extension step was followed by a 10 min extension at 72°C. The PCR amplification products were subsequently stored at 12°C until use.
The amplified D-loop fragment was purified using a PCR gel extraction kit from Sangon Biotech Co., Ltd. (Shenzhen, China) and sequenced directly using a BigDye Terminator v3.1 cycle sequencing ready reaction kit (Applied Biosystems, Darmstadt, Germany) in an automatic sequencer (ABI-PRISM 3730 genetic analyzer, Applied Biosystems, Foster City, California, United states of America). PCR for the sequencing was performed in an automatic sequencer with a total reaction volume of approximately 5 μL containing 3 μL genomic DNA (10 ng/μL), 1 μL (3 pM) of each sequencing primer, 0.5 μL BigDye, and 0.5 μL ddH2O. The sequencing conditions were as follows: initial denaturation for 2 min at 95°C, 25 cycles of denaturation at 95°C for 10 s, and annealing at 51°C for 10 s. The final extension step was followed by a 190 s extension at 60°C. The PCR sequencing products were subsequently stored at 12°C until use.
2.3. Data analysis
The sequences were arranged for multiple comparisons using Clustal Omega  and were aligned using ClustalW and BLAST . These results were compared with other sequences obtained from GenBank. The reference sequences for tree construction were taken from the maternal lineages of each tree: haplogroup A (AF039578), haplogroup B (AF039577, AY582801, and AY091487), haplogroup E (AY091490, AJ238300). The diversity parameters, including the haplotype diversity, nucleotide diversity and average number of nucleotide differences, were estimated using DnaSP (Sequence Polymorphism Software) 5.10.01 .
3.1. Polymorphic site and sequencing analysis of the complete control region
Based on the reference sequences from GenBank accession numbers (AY091487, AY091490, AJ238300, AF039578, AF039577, AY582801), all of the sequences were aligned with 1274 comparative sites, and 350 haplotypes were obtained from the 642 sequenced individuals. The length of the sequences obtained from the 636 individuals varied considerably, between 1031 and 1259 bp, although the majority were between 1180 and 1183 bp . A total of 196 variable sites were obtained from the sequences, including 63 singleton variable sites and 133 parsimony-informative variable sites. There were 158 transitions and 38 transversions within the 196 variable sites, among which 15 sites had both transitions and transversions. Transition mutations were caused by observed substitutions. The variability of the number of 75 bp tandem repeat motifs  caused the observed variation in the length of the mtDNA D-loop sequences of the Tibetan sheep, with the exception of the insertion or deletion of several nucleotide sites.
Whole haplotypes’ nucleotide composition was 32.96% A, 29.71% T, 22.89% C, 14.44% G, 62.67% A+T, and 37.33% G+C, and the A+T were more common than the G+C haplotype substantially, showing an AT bias . The largest haplogroup A consisted of 490 individuals and 259 haplotypes; the next largest haplogroup B and haplogroup C consisted of 145 individuals and 43 haplotypes. The smallest haplogroup D consisted of 1 individual and 1 haplotype. The number of haplotypes, individuals, and frequency detected in each Tibetan sheep population of haplotype group varied from 1 to 49, from 0 to 62, and from 0 to 0.88, respectively. The haplotype diversity and nucleotide diversity were calculated separately for each Tibetan sheep population and were estimated to be 0.99 ± 0.01 and 0.02 ± 0.00, respectively. The values of haplotype diversity and nucleotide diversity ranged from 0.90 ± 0.16 to 1.00 ± 0.05 and from 0.01 ± 0.00 to 0.03 ± 0.00, respectively, thus demonstrating the high level of genetic diversity in the 15 Tibetan sheep populations. The nucleotide diversity value of the LZ and JZ populations was higher than that of the remaining 13 Tibetan sheep populations, indicating a relatively high level of diversity. Similarly, the haplotype diversity values were highest in LKZ and LZ populations and the lowest in the AW population.
3.2. Genetic distance and average number of nucleotide differences
The study presents the genetic distance and average number of nucleotide differences between and within the 15 Tibetan sheep populations. The genetic distance values ranged from 0.01 to 0.04 within the population diagonals, and the genetic distance values ranged from 0.01 to 0.04 among populations in Liu et al. . Among the Tibetan sheep populations, the genetic distance within populations reached a maximum value in LZ and a minimum value in AW. Similarly, the genetic distance between the populations had a maximum value for LZ and JZ and a minimum value for AW and TJ. The average number of nucleotide differences values ranged from 10.00 to 29.81 within populations along the digital diagonal, and the average number of nucleotide difference values ranged from 10.73 to 30.99 between the populations below the diagonals. Among the Tibetan sheep populations, the average number of nucleotide differences within the populations reached its value maximum in LZ and its minimum value in AW. Similarly, the average number of nucleotide differences between populations reached a value maximum in LZ and JZ and a minimum value in AW and TJ populations .
3.3. Genetic distances and altitude
We test whether genetic distances between populations can be explained by absolute differences between altitudes for the 15 Tibetan sheep populations. Graphically, for the focal population of LZ, Figure 1 plots the genetic distance between population LZ and each of the remaining populations as a function of the absolute difference in altitudes. Genetic distance tends to decrease with absolute difference in altitudes, as estimated by the Pearson correlation coefficient (r = −0.4136, two tailed P = 0.063, square root of 0.1711 indicated in Figure 1). This tendency is observed in 10 among the 15 sheep populations, but is never statistically significant at P < 0.05 (see Table 1). It is strongest (most negative) for high altitude populations and weakest (most positive) for populations living at low altitudes. This association between altitude and Pearson correlation coefficients obtained between genetic distances and absolute differences in altitudes (Table 1) has itself r = −0.65, one tailed P = 0.0044.
|Population||Altitude, m||Genetic dist, r||Genetic diff, r|
3.4. Genetic differentiation
To examine the genetic differentiation between the 15 Tibetan sheep populations, we calculated
The distribution of the 15 Tibetan sheep populations varied according to their
Liu et al. present the 15 Tibetan sheep populations’
Liu et al. provided the data of the
3.5. Genetic differentiation and altitude
We test whether genetic differentiation between populations can be explained by altitude. Graphically, for the focal population of GD, Figure 2 plots genetic differentiation between GZ and each of the remaining populations as a function of the absolute value of the difference between their altitudes. Genetic differentiation tends to increase with altitude (r = 0.4315, one-tailed P = 0.062) (square root of 0.1862 indicated in Figure 2). Analyses similar to those in Figure 2 showed that genetic differentiation increases with absolute differences in altitude, specifically for nine populations ((significance at P < 0.05 indicated by *) QL*, TJ*, QH*, MX*, GJ, QK*, GN*, DM, and LKZ) and decreases for the remaining five populations (GB,HB, JZ, LZ, and AW), mainly GB* and HB* (Table 1).
This association between genetic differentiation and absolute difference between altitudes is most positive for populations at high altitudes and most negative for those at low altitudes. This association between altitude and Pearson correlation coefficients obtained between genetic differentiations and absolute differences in altitudes (Table 1) has itself r = −0.75, one-tailed P = 0.0011.
3.6. Phylogenetic relationships
To extend our knowledge of the phylogenetic relationship of the 15 Tibetan sheep populations, a phylogenetic tree was constructed using ME based on the complete mtDNA D-loop sequences of 642 individuals and 350 haplotypes from 15 Tibetan sheep populations and six reference breeds . We determined four distinct cluster haplogroups: A, B, C and D. Of the 350 haplotypes, there was no common haplotype identified in all of the Tibetan sheep populations; 98 haplotypes were shared, and 252 haplotypes were singletons, which including 38 in GB, 33 in GJ, 28 in TJ, and 24 in QH. The leading haplotype (Hap 39) was found in 39 individuals. The next most common haplotype was Hap 42, composed of 19 individuals, and the remaining nine haplotypes were composed of 7–10 individuals. Haplotype 42 was composed of JZ, MX, QL and TJ. Haplotype 4 was composed of 14 of the Tibetan sheep populations, but excluding LKZ, indicating close clustering. The majorities of the 490 individuals were grouped in haplogroup A, followed by haplogroups B and C; however, only one animal from the LZ belonged to haplogroup D. The DM was composed of two haplogroups, the AW was composed of one haplogroup and the remaining 13 Tibetan sheep populations were composed of three haplogroups . Moreover, the maximum composite likelihood method was used to analyze the genetic distance between populations, which were in the units of the number of base substitutions per site. More specifically, the neighbor-joining phylogenetic tree of the 642 sequences of the mtDNA D-loop, based on units of the number of base substitutions per site divided the 15 Tibetan sheep populations and six reference breeds into four groups effectively.
3.7. Population expansions
The sample size for most of the populations was more than 30, so the detection of population expansion was performed at the individual population level (data not shown) and in all haplotype sequences. Liu et al. showed the mismatch distribution analysis of the complete dataset (lineages A, B, C, D and 15 Tibetan sheep populations of mtDNA D-loop) . Neutrality tests (Ewens-Watterson test, Chakraborty’s test, Tajima’s D test, Fu’s FS test) were used to detect population expansion . The charts of the mismatch distribution for the samples of the 15 Tibetan sheep populations and the total samples were multimodal. However, the mismatch distribution for LZ was a unimodal function. The mismatch distribution of the complete dataset showed that there were two major peaks, with maximum values at 4 and 27 pairwise differences and two smaller peaks at 45 and 51 pairwise differences . These results suggest that at least two expansion events occurred during the population demographic history of the Tibetan sheep population. The mismatch distribution analysis revealed a unimodal bell-shaped distribution of pairwise sequence differences in lineages A, B and C, but that of the lineage D was a sampling function duo to small sample effects. The complete dataset of 15 Tibetan sheep populations did not produce a significantly negative Ewens-Watterson test, whereas Chakraborty’s neutrality test of JZ was significant (12.63,
4.1. High mtDNA D-loop diversity of Tibetan sheep populations
The 15 Tibetan sheep populations in our study showed a high level of genetic diversity. This finding is consistent with archeological data and other genetic diversity studies [21, 40, 41, 42, 43], while in this study, the haplotype diversity was higher than that found in a previous study , and the nucleotide diversity was lower compared with the data in a previous study . The genetic diversity among the 15 Tibetan sheep populations was relatively higher compared with other sheep populations [1, 44]. For instance, the haplotype diversity values of Turkish sheep breeds distributed in a Turkish population were 0.95 ± 0.01 . However, according to Walsh’s work, based on the required sample size for the diagnosis of conservation units , a sample of 59 individuals fails necessary to support the hypothesis that individuals with unstamped (“hidden”) character states exist in the population size. Thus, the sample size necessary to reject a hidden state frequency of 0.05 is 56 when sampling from a finite population of 500 individuals. Therefore, because of the large sample size, genetic diversity estimation reflects precisely Tibetan sheep. For the LZ, LKZ, HB, QH, GD, TJ, GJ, QK, GB and GN with broad distribution, a high genetic diversity could only be observed within the studies containing large sample size or wide collection areas. However, if more samples were involved in the study, a higher diversity could be found, suggesting further investigation of the genetic diversity of these 15 Tibetan sheep populations. These Tibetan sheep populations had been experiencing a genetic bottleneck during the twentieth century and were classified as the rarest sheep populations . In addition, among the 15 Tibetan sheep populations, the positive Ewens-Watterson and Chakraborty’s values were significantly different, suggesting a previous decline in the population size of the mtDNA D-loop diversity. This finding was consistent with the results of a previous study . Such genetic diversity may be caused by an increased mutation rate in the mtDNA D-loop, the maternal effects of multiple wild ancestors, overlapping generations, the mixing of populations from different geographical locations, natural selection favoring heterozygosis or subdivision accompanied by genetic drift [42, 43].
4.2. Maternal origins of the Tibetan sheep populations
The sequence motifs from the 1180 bp to the 1183 bp region of the mtDNA D-loop form the basis for the four major maternal lineages in the Tibetan sheep mtDNA haplotypes. Maternal lineage D was the rarest. The Tibetan sheep haplotypes belonged to all four major maternal lineages, although only 0.16% belonged to maternal lineages D. This finding demonstrated that populations of Tibetan sheep possess abundant mtDNA diversity, and therefore, there is a widespread origin of their maternal lineages. Furthermore, the thoroughbred Tibetan sheep has been proposed to be shared in the maternal lineages A, and the contribution of Asian sheep breeds to this population has also been reported in Liu et al. . In this study, maternal lineages B and C were found in common among the overall sequences of all 15 Tibetan sheep populations, including the 14 Tibetan sheep populations respectively other than DM and AW. It is generally acknowledged that domestic sheep have two maternal lineages (A and B), based on the earlier mtDNA analysis [7, 13, 16, 47]. Recently, a new maternal lineage (C) was found in Chinese domestic sheep [40, 42, 43]. The ME phylogenetic tree median-joining analyses, revealed the presence of four maternal lineages in the Tibetan sheep populations. Of these groups, the maternal lineage A was predominant, and the maternal lineage B and C were the second most common. The proportion of maternal lineage D was 0.16%, further demonstrating that lineage D is the rarest among mtDNA sheep lineages . Our findings coincided with previous studies and supported the results on domestic sheep breeds in China [48, 49]. Three mtDNA of maternal lineages were identified in both China [39, 42, 43, 49] and other countries [50, 51, 52]. The four mtDNA maternal lineages found in the Tibetan sheep populations in the Qinghai-Tibetan plateau areas further supported the hypothesis of multiple maternal origins in Chinese domestic sheep.
4.3. Genetic differentiation of Tibetan sheep populations
In this study, the AMOVA analysis revealed the distinct population of Qinghai-Tibetan Plateau areas among other Tibetan sheep populations, with a significant positive variance in the meanwhile. Gene flow (
4.4. Genetic relationships among the Tibetan sheep populations
The study showed that the 15 Tibetan sheep populations from the Qinghai-Tibetan Plateau divide into four maternal lineages: 490 Tibetan sheep represent the maternal origin of the maternal lineage A, 64 Tibetan sheep represent the maternal origin of the maternal lineage B, 81 Tibetan sheep represent the maternal origin of the maternal lineage C, and 1 Tibetan sheep represents the maternal origin of the maternal lineage D. This genetic relationship displayed a high consistency with traditional classification schemes and the results of previous studies [40, 57, 58, 59, 60, 61]. Fifteen Tibetan sheep populations derived from four maternal lineages. On the one hand, Tibetan sheep show a great value as portable food and wool resource; on the other hand, the commercial trade and extensive transport of sheep, along with human migratory paths, might promote the observed genetic exchange. Other study methods such as genetic approaches, including the degree method and the phylogenetic relationship clustering method, also indicated that indigenous sheep were the maternal lineages A, B, C, and D [58, 60].
The study presents the Pearson correlation coefficient of genetic differentiation with the corresponding altitude difference between the 15 Tibetan sheep populations. The study showed that the correlation between altitude and genetic differentiation increases with altitude, specifically for 10 populations (GJ, GD, TJ, MX, QK, QL, QH, GN, DM and LKZ), with r ranging from 0.089 to 0.584, and decreases for the remaining five populations (GB, HB, JZ, LZ and AW), with r from −0.512 to −0.011. This evidence based on the large-scale mtDNA D-loop sequences analysis of 15 Tibetan sheep populations indicates effects associated to climate or isolation of phylogeography on genetic differentiation. These results indicate an adaptive effect associated with altitude or geography, which coincides with previous studies [29, 42, 43, 62].
4.5. Population expansion of Tibetan sheep populations
The mismatch distribution analysis of the complete dataset, maternal lineages A, B, C, D, and 15 Tibetan sheep populations of the mtDNA D-loop, is presented . Neutrality tests (Ewens-Watterson test, Chakraborty’s test, Tajima’s D test, Fu’s FS test) were used to detect population expansion . The complete dataset of all Tibetan sheep populations had a significantly large negative Tajima’s D value and
4.6. Phylogenetic analysis of the Tibetan sheep populations
Phylogenetic analyses of complete mitogenomes showed a high resolution among wild sheep as well as among the major lineages of domestic sheep . The complete mitogenomes of
Full control region from the complete mitogenomes produced similar phylogenies with fully resolved phylogenetic relationships of wild sheep, but they failed to define the phylogenetic relationships among the major lineages of domestic sheep. Our results suggest that partial fragments of the complete mitogenomes would be problematic when making phylogenetic inferences about domestic sheep. This problem arises due to diagnostic substitutions located elsewhere in the mitogenome . Thus, the diagnostic substitutions for species and lineages presented  here can serve as an important resource for maternal genetic differentiation between domestic and wild sheep as well as between the lineages within domestic sheep. Also, they might be helpful for addressing certain conflicts described above in future.
High mtDNA genetic diversity in the sheep from the Qinghai-Tibetan Plateau areas is a rich resource for China. The evidences indicate the high diversity of four maternal lineages by doing the large-scale mtDNA D-loop sequences analysis of 15 Tibetan sheep populations. Although the maternal lineage D was only found in a single LZ, phylogenetic analysis showed that four maternal lineages (A, B, C and D), previously defined, could be identified in the 636 tested individuals of the 15 Tibetan sheep populations. The estimation of demographic parameters from the mismatch analyses shows that maternal lineages A, B and C had at least one demographic expansion in the Tibetan sheep of the Qinghai-Tibetan Plateau areas.
The authors appreciated constructive comments from the editor Hervé Seligmann. This work was supported by the special fund from the Major International (Regional) Joint Research Project (NSFC-CGIAR 31461143020), and Gansu Provincial Agricultural biotechnology research and application projects (GNSW-2014-21, GNSW-2016-13), and the Central Level, Scientific Research Institutes for Basic R & D Special Fund Business (1610322016016).
Conflict of interest
This study did not involve endangered or protected Tibetan sheep populations. All experimental and sampling procedures were approved by the Institutional Animal Care and Use Committee, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Peoples Republic of China.
Conceived and designed the experiments: LJB DXZ ZYF GX SXP. Performed the experiments: LJB DXZ ZYF GX SXP. Analyzed the data: GX SXP YC. Wrote the paper: LJB DXZ ZYF GX YC.