Brine Shrimp Diversity in China Based on DNA Barcoding

Introduction
Taxonomy, the science of identifying, grouping, and naming organisms according to their established natural relationships, is the basis of all biological studies. Observation-based biological classification has remained the best-known form of taxonomy since 1735, when Carl Linnaeus published Systema Naturae, and it is an empirical science based mostly on morphological differences. With the development of science and technology, scientists have devised many methods for identifying new species, along with other tools and definitions for species classification, such as biochemical identification (Farmer et al. 1985), cytotaxonomic identification (Le Berre et al. 1985), chromosomal DNA fingerprinting (Owen 1989), restriction fragment length polymorphism (RFLP) (Sakaoka et al. 1992), and PCR-based DNA fingerprinting (Matsuki et al. 2003). Among these, molecular or genetic approaches to species identification have been proposed and used extensively (Yamamoto 1992; Zhou et al. 2003).

DNA barcoding
The study of biodiversity lays the foundation for all biological studies, especially the classification of species, and efforts to improve it have never stopped since Linnaeus. Traditional morphology-based taxonomy has its limitations, for example when facing mimetic polymorphism, and it depends mainly on the expertise of taxonomists; there is little doubt that evidence at the molecular level is a necessary complement. With the development of molecular biology, the idea of molecular taxonomy was put forward and has gradually been accepted by the related scientific communities. The standard molecular identification system was initiated during the 1990s using PCR-based and sequencing-based approaches (Frézal et al. 2008). Taking advantage of the accuracy and convenience of these two powerful technologies, DNA sequence signatures provide adequate "barcodes" for species identification, and "DNA barcoding" has been widely used in studies of speciation (Ghebremedhin et al. 2008; Sullivan et al. 1996), phylogenetics and evolution (Göker et al. 2009; Wood et al. 2000), and molecular ecology (Govan et al. 1996; Valentini et al. 2009), as well as for the classification of both pathogenic microbes (Beckmann 1999) and normal microbiomes (Holzapfel et al. 2001). DNA barcoding is an ultimate and direct approach for molecular taxonomy, depending on the complexity of sequence signatures used, especially in distinguishing species with nearly [...]

Examples of DNA barcoding applications
DNA barcoding has been successfully used for the taxonomy of invertebrate and vertebrate animals as well as microbes, including bacteria (Siddall et al. 2009), fungi (Kelly et al. 2011; Stockinger et al. 2010), Protista (Chantangsi et al. 2007; Evans et al. 2007), and algae (Saunders 2005). In the past three years, an increasing number of studies have focused on DNA barcoding of plants (He et al. 2010; Kress et al. 2007; Lahaye et al. 2008). Since there is not yet a universally accepted DNA barcode for plants, many strategies have been proposed, based either on a single chloroplast segment (Hollingsworth et al. 2009; Lahaye et al. 2008) or on a combination of multiple segments (He et al. 2010; Kress & Erickson 2007).
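The identification step that underlies these applications can be illustrated with a minimal Python sketch: a query barcode is compared with a small reference library and assigned to the closest reference. The function names, the toy 20-bp sequences, and the 3% default threshold are illustrative assumptions, not values or methods taken from the studies cited above.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def identify(query: str, references: dict, threshold: float = 0.03):
    """Return (species, distance) for the closest reference barcode.

    If even the closest reference is farther than `threshold` (a hypothetical
    cut-off), the species is reported as None.
    """
    best_species, best_dist = min(
        ((species, p_distance(query, seq)) for species, seq in references.items()),
        key=lambda item: item[1],
    )
    if best_dist > threshold:
        return None, best_dist
    return best_species, best_dist


# Toy 20-bp "barcodes" (made up); real COI barcodes are about 650 bp long.
references = {
    "species_A": "ATGGCATTAGCCGGTACGTA",
    "species_B": "ATGGCTTTAGCAGGAACTTA",
}
query = "ATGGCATTAGCCGGTACGTT"

print(identify(query, references, threshold=0.10))
print(identify(query, references))  # stricter default threshold
```

A distance threshold is only one possible decision rule; tree-based or character-based assignment could be substituted without changing the overall workflow.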

Advantages and drawbacks of DNA barcoding
There are several obvious advantages to the DNA barcoding system currently in use. First, it uses a standard procedure that can be applied universally across relevant research fields. It is of great utility in conservation biology and can also be applied to samples that traditional morphological methods cannot resolve, including species identification based on eggs and larvae (Wang et al. 2008) and analysis of stomach contents or excreta to determine food webs. Another advantage comes from the rapid and cost-efficient acquisition of molecular data, which enables large-scale species identification (Frézal & Leblois 2008), whereas conventional taxonomy is time-consuming and in some cases almost impossible to apply (Rusch et al. 2007). DNA barcoding is therefore important for improving large surveys aimed at detecting unknown species and identifying pathogenic species of medical, ecological, and agronomical significance (Ball et al. 2008; Barth et al. 2006).
In particular, DNA barcoding becomes necessary when morphological traits do not adequately discriminate species (Caron et al. 2009; Guo et al. 2010; Kauffman et al. 2003; Kumar et al. 2006) or when species have polymorphic life cycles and/or exhibit pronounced phenotypic plasticity (Pegg et al. 2006; Randrianiaina et al. 2007).
However, controversies about DNA barcoding remain. Although DNA barcoding was initially proposed as a method for species identification, it needs to be validated intensively to achieve this goal, especially in choosing the best candidate sequences, which must be both universal and highly variable among species. The first question is what these sequences should be: nuclear, mitochondrial, or chloroplast? The idea of using a single mtDNA sequence has been dismissed, because such a sequence is not adequate as the sole basis for species definition owing to the following genetic factors: reduced effective population size and introgression, maternal inheritance, recombination, inconsistent mutation rates, heteroplasmy, and compounding evolutionary processes (Meier et al. 2006; Rubinoff et al. 2006). To date, there is no universal DNA barcode for all organisms, and no single gene has been found that is conserved enough and yet shows appropriate divergence for all species regardless of their origin (Hickerson et al. 2006; Rubinoff et al. 2006; Song et al. 2008). The validity of DNA barcoding therefore rests on establishing reference sequences from taxonomically confirmed specimens, which will require the integration of morphological and molecular taxonomic data, as well as close cooperation among sample collections, such as museums, zoos, and research institutes (De Hoog et al. 2008). This approach is closest to what has been termed "integrative taxonomy" (Dayrat 2005; Will et al. 2005), in which DNA sequences are used together with traditional character sets in a complementary fashion to define and describe species (Heethoff et al. 2011; Padial et al. 2010; Pereira et al. 2010).

Recent progress in DNA barcoding
Recently, the approach of DNA barcoding has been greatly revised to increase accuracy and sensitivity, and the major improvements focus on using more than one barcoding strategy for better identification of specific species (Aliabadian et al. 2009; Ferri et al. 2009; Lin et al. 2009; Nassonova et al. 2010). Shatters et al. improved DNA barcoding by using different regions of the COI gene for biotype-specific barcoding (Shatters et al. 2009). As sequencing technology has developed rapidly in the past few years, sequence-based DNA barcoding has also advanced rapidly. Examples include cap analysis of gene expression (CAGE) on an ultra-high-throughput sequencer (Maeda et al. 2008), high-throughput sequencing to reveal biodiversity (Creer 2010; Fonseca et al. 2010; Mitsui et al. 2010), and the ArkChip strategy for resolving patterns of intraspecific evolution in multiple species (Carr et al. 2008). Several new techniques have been implemented, all of them based on sequencing individual DNA molecules (with or without an amplification step) in a massively parallel way (Table 2, Figure 1). Their high accuracy, throughput, and efficiency make it easy to identify genome sequences unique to different species and life forms.
Applying next-generation sequencers to DNA barcoding is expected to be more complex than anticipated. For instance, the classical DNA barcode is defined as a fragment of around 650 bp, but the effective read lengths of next-generation sequencers are at present shorter than that. Progress has been made in recent studies, in which smaller DNA fragments of the COI gene or rDNA, called mini-barcodes, were used for accurate species identification (Hajibabaei et al. 2006; Pawlowski & Lecroq 2010).
Research shows that success rates of more than 90% and 95% were achieved using 100-bp and 250-bp barcodes, respectively (Meusnier et al. 2008). Although biodiversity studies based on next-generation sequencing technologies emerged in 2006 (Ley et al. 2006; Sogin et al. 2006), most of the studies have been done with the Roche/454 system (Hajibabaei et al. 2011; Meyer et al. 2007; Porazinska et al. 2009) and mainly for environmental samples (Deagle et al. 2010; Fire et al. 2007; Hajibabaei et al. 2011). More recently, sequencing platforms such as those of Illumina and Life Technologies have been upgraded at an impressive pace, and the read lengths of their new versions are getting longer (Table 2). They may also one day be used for biodiversity studies once their read lengths reach ~100 bp or more.
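Whether a short window of a full-length barcode still separates a given set of species, which is the question behind mini-barcodes, can be explored with a small Python sketch. It scans an alignment for windows of a chosen width in which every pair of sequences remains distinct. The function names and the 12-bp toy alignment are assumptions made for illustration; this is not the procedure used in the studies cited above.

```python
from itertools import combinations


def window_resolves(alignment: dict, start: int, width: int) -> bool:
    """True if every pair of sequences differs somewhere inside the window."""
    pieces = [seq[start:start + width] for seq in alignment.values()]
    return all(a != b for a, b in combinations(pieces, 2))


def resolving_windows(alignment: dict, width: int) -> list:
    """Start positions of all windows of `width` that separate every sequence."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = lengths.pop()
    return [
        start
        for start in range(length - width + 1)
        if window_resolves(alignment, start, width)
    ]


# Toy 12-bp alignment (made up); a real barcode alignment would be ~650 bp.
alignment = {
    "species_1": "AAAACCCCGGGG",
    "species_2": "AAAACCCTGGGG",
    "species_3": "AAAACCCCGGGA",
}

print(resolving_windows(alignment, width=4))  # windows too short to separate all three
print(resolving_windows(alignment, width=5))  # start positions that do separate them
```

With real data the same scan could be run at widths such as 100 bp or 250 bp to see how much resolution is lost relative to the full-length barcode.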

A case study on Artemia (Crustacea, Anostraca) in China
Brine shrimp, or Artemia (Crustacea, Anostraca), are distributed worldwide and well adapted to survive in very harsh hypersaline environments, such as salt lakes and lagoons (Clegg et al. 2009). The genus shows enormous diversity in its ability to survive under different ionic compositions, climatic conditions, and altitudes. In this case study, Artemia species serve as ideal model organisms for biodiversity studies in inland hypersaline lakes (Camargo et al. 2005; Castro et al. 2006; Hand 1982; Maniatsi et al. 2009). In addition, the morphological variations displayed among Artemia populations provide excellent material for studying adaptive genetic polymorphisms at the molecular level. During the past two decades, phylogenetic relationships among Artemia species have been established by combined studies based on cross-breeding, morphological differentiation, cytogenetics, nuclear markers (including allozymes and other nuclear DNA sequences) (Badaracco 1995; Baxevanis et al. 2006; Sun 2000), and mitochondrial DNA (mtDNA) markers (Badaracco 1995). Seven sexual species have been described thus far, as well as numerous parthenogenetic populations.

Biodiversity of Artemia populations in China
The phylogeny of various Artemia samples from different habitats around the world has been reported previously; our focus here is on the biodiversity of Artemia species in China, especially those of the Tibetan Plateau. All strains used in this study are also kept as cysts at the Laboratory of Aquaculture & Artemia Reference Center (ARC) under ARC code numbers (Wang et al. 2008).

Table 3. List of Artemia species in China with their localities and ARC codes.

Phylogenetic analysis of Artemia species in China based on COI gene barcoding
A 648-bp segment of the mitochondrial COI gene was selected as the standard barcode to establish phylogenetic relationships among Artemia species from major habitats, including species from the Tibetan Plateau (Figure 3; Wang et al. 2008). We built a phylogenetic tree based on the COI gene, which separates the populations into five stable clades. Examining the amino-acid variations, we found two consistent amino-acid changes in COI between the high- and low-altitude species we collected in China: 153A/V and 183L/F. These sequence alterations may provide clues for further functional studies, for example to determine whether adaptation to high altitude resulted in the fixation of such mutations. We also used Ka/Ks Calculator (Zhang et al. 2006) to estimate Ka/Ks, the ratio of nonsynonymous to synonymous substitution rates, with the aim of revealing sequence signatures of natural selection in the COI gene. With A. franciscana as the reference, A. tibetiana has significantly higher Ka/Ks ratios, which implies relatively stronger selective pressure on this species. Two variations that alter the amino-acid sequence between the high- and low-altitude populations, shared by the high-altitude group, were also detected. The sequence from sample 1612 has the highest Ka/Ks ratio; its mutation spectrum suggests relatively stronger selection on this population, and its synonymous mutations suggest that the population is diverging rapidly, most likely because of environmental changes during the last three million years rather than genetic drift. We further obtained high-quality sequences from individual adults of the six Tibetan populations and calculated the Kimura 2-parameter distances (Table 4). For phylogenetic tree construction, we used consensus sequences when sequence heterogeneity was encountered among a minor set of samples.

[...] (ARC 1188), at an altitude of ~1,000 m above sea level, with a dry, windy, and sandy climate. We also have one ecotype of A. franciscana, collected from Huangnigou, Shandong, China (ARC 1590). The length variations are mainly found in the non-coding region (known as the D-loop region). All Artemia mitochondrial genomes encode 37 genes, including 2 rRNAs, 22 tRNAs, and 13 polypeptides that are subunits of the respiratory chain complexes residing on the inner mitochondrial membrane.
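The Kimura 2-parameter distance used for the COI barcodes corrects the observed difference between two aligned sequences by treating transitions and transversions separately: with P the proportion of transitional sites and Q the proportion of transversional sites, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The following Python sketch implements that formula; the function name and the 10-bp toy sequences are illustrative assumptions and do not reproduce the software or data behind Table 4.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites with gaps or ambiguous bases in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in BASES or y not in BASES:
            continue
        sites += 1
        if x == y:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


# Toy sequences (made up): one transition (A/G) and one transversion (C/A) in 10 sites.
seq_1 = "AAAAACCCCC"
seq_2 = "GAAAACCCCA"
print(k2p_distance(seq_1, seq_2))
```

Because the correction accounts for multiple substitutions at the same site, the resulting distance is somewhat larger than the raw proportion of differing sites (0.2 for the toy pair).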
Comparative analysis of the mitochondrial DNA (mtDNA) of these Artemia species shows that the nucleotide variation ratio is higher between A. tibetiana and A. franciscana and much lower between A. tibetiana and A. urmiana or A. sinica. Among the 13 protein-coding genes, the ND gene family has more nucleotide variations than the other genes. ND6 varies the most between A. tibetiana and A. franciscana (T-F), between A. tibetiana and A. urmiana (T-U), and between A. tibetiana and A. sinica (T-S). At the amino-acid level, the ATP8 gene has a higher variation rate, second only to the ND gene family. In addition, COI is the most conserved of the 13 polypeptides in amino-acid sequence. Complexes IV and V contain more variations than the other complexes. According to Ka/Ks Calculator, ATP8 has a high Ka/Ks ratio, lower only than that of ND4, when A. tibetiana and A. urmiana are compared with A. franciscana, while ATP6 shows a higher evolutionary rate between A. tibetiana and A. urmiana (data not shown).
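A per-gene comparison of this kind can be sketched in Python as follows: for each gene, the proportion of differing sites between two aligned sequences is computed, and the genes are then ranked from most to least variable. The gene names and sequences below are placeholders invented for illustration; they are not Artemia data, and the function names are assumptions rather than the tools used in this study.

```python
def variation_ratio(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("empty alignment")
    return sum(x != y for x, y in zip(a.upper(), b.upper())) / len(a)


def rank_genes(pairs: dict) -> list:
    """Return (gene, ratio) tuples sorted from most to least variable."""
    ratios = {gene: variation_ratio(a, b) for gene, (a, b) in pairs.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Placeholder genes and 10-bp sequences (made up, not taken from any genome).
pairs = {
    "geneX": ("ACGTACGTAC", "ACGTACGTAC"),
    "geneY": ("ACGTACGTAC", "ACGAACGTTC"),
    "geneZ": ("ACGTACGTAC", "TCGAACCTAG"),
}

for gene, ratio in rank_genes(pairs):
    print(f"{gene}: {ratio:.2f}")
```

The same ranking could be repeated on translated sequences to compare variation at the amino-acid level.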

Conclusion
In summary, our results on DNA barcoding and comparative analysis reveal the current distribution of Artemia species in China and the phylogenetic relationships among them, providing insights into the adaptive evolution of Artemia DNA sequences. Based on phylogenetic and divergence analyses of selected samples from different regions of the world, it is possible that the high-altitude group of Artemia descends from a local ancestral species in the Himalayas that diverged genetically as the Tibetan Plateau rose stepwise over approximately the last three million years (Tapponnier et al. 2001).
The comparative studies among different Artemia species reveal complex sequence diversity that is expected to have functional relevance, for example to energy metabolism and environmental adaptation. The highest number of adaptive variations in ATP8 implies that it has been under selective pressure during the long-term geographical isolation since A. tibetiana separated from the common ancestor as the Himalaya Mountains rose. It was reported that the ATP8 gene encodes a core subunit of F0 in the ATPase, which synthesizes ATP using the proton gradient that results from H+ pumping into the intermembrane space (da Fonseca et al. 2008). It was also suggested that ATP8 may play regulatory roles in ATP synthesis among different species, since it has highly variable sites in the protein-coding sequence (da Fonseca et al. 2008). Moreover, the Ka/Ks ratio of ATP6 is also relatively high when the 13 protein-coding mitochondrial genes of A. tibetiana are compared with those of A. urmiana and A. sinica. It is known that ATP6 plays an important role in the assembly of F0 (Hadikusumo et al. 1988), and the highly variable sites are found in the predicted loop regions, where the sequences are under weaker selective constraint with respect to overall function. The high variation rates found among the ATPase subunits imply strong selective pressure on the Artemia energy metabolism system from the high-plateau environment.

Acknowledgment
This work received financial support from the National Natural Science Foundation of China (30221004) awarded to Weiwei Wang, a grant (KSCX2-SW-331) from the Chinese Academy of Sciences awarded to Jun Yu, and the National Basic Research Program (973 Program) of the Ministry of Science and Technology of the People's Republic of China (2011CB944100) awarded to Jun Yu.

Five species are found in Eurasia: A. salina (Mediterranean area), A. urmiana (Iran), A. tibetiana (Tibet), A. sinica (van Wely et al.), and A. spp. (Old World). The New World species are A. franciscana and A. persimilis; the former is widely distributed across most parts of America, while A. persimilis is restricted to certain locations in Chile and Argentina (Clegg et al. 2009). A. franciscana, A. tibetiana, and A. sinica are the main Artemia species that inhabit China (Figure 2). A. tibetiana dwells on the Tibetan Plateau, at an altitude of ~4,500 m above sea level. Living under the harsh conditions of hypoxia, low temperature, high solar radiation, and scarce biological production, it requires a modified and adapted energy metabolism for survival. In the 1980s, a large quantity of A. franciscana was released into most of the salt fields of Bohai Bay. As a dominant species, A. franciscana rapidly replaced the local species, A. sinica, and has been the primary species in Bohai Bay ever since. As a result, A. sinica has almost completely disappeared from the seashores of eastern China.

Fig. 2. The distribution of Artemia in the world.

Fig. 3. A phylogenetic tree based on the neighbor-joining method. The tree is constructed from a sequence fragment of the COI gene (adapted from Wang et al.).

Examples of DNA barcoding studies are summarized in Table 1, including DNA barcodes for animals, plants, fungi, and protists. As mentioned previously, the barcodes have advantages and limitations with respect to specific applications.

Table 2. Comparison of next-generation sequencing platforms.

Sequencing and comparative analysis of Artemia mitochondrial genomes
Based on the obvious divergence of the COI gene, we speculated that long-term environmental selection may have introduced more variations into other mitochondrially encoded genes involved in energy metabolism, and that these variations may affect the structures and activities of the ATPase subunits or even other components of the mitochondrial respiratory chain complexes. We therefore took Artemia species in Asia as our model, acquired whole mitochondrial genome sequences of Artemia tibetiana collected from the Tibetan Plateau, and carried out a comparative analysis with Artemia species from lower altitudes, A. franciscana, A. urmiana, and A. sinica, with the aim of identifying specific characteristics of the A. tibetiana mitochondrial genome. We acquired and annotated five mitochondrial genomes: two ecotypes of A. tibetiana and one each of A. urmiana, A. franciscana, and A. sinica. The A. tibetiana samples were collected from Nima (ARC 1609) and Yangnapeng (ARC 1610) Counties on the Tibetan Plateau, at altitudes higher than 4,000 m. A. urmiana, which has a very close phylogenetic relationship with A. tibetiana according to the previous DNA barcoding study, was collected from Urmia Lake in Iran (ARC 1227), at an altitude of 1,275 m above sea level. A. sinica, another species native to China, was collected from Yimeng, Inner Mongolia.