An estimated number of GRSPs in different species and the number of corresponding orthologs in
Abstract
A genome‐wide survey across 10 species, from the alga Guillardia theta to mammals, revealed that Caenorhabditis elegans and Caenorhabditis briggsae acquired a large number of glycine‐rich secreted peptides (GRSPs) during evolution: 110 in C. elegans and 93 in C. briggsae. Chromosomal mapping indicated that most GRSPs were clustered on the two genomes [103 (93.64%) in C. elegans and 82 (88.17%) in C. briggsae]. In total, there were 18 GRSP clusters in C. elegans and 13 in C. briggsae. Except for four C. elegans GRSP clusters that lacked matching clusters in C. briggsae, all other GRSP clusters had corresponding orthologous clusters between the two nematodes. Using eight transcriptomic datasets from Affymetrix microarrays, genome‐wide transcriptional analysis identified many co‐expressed GRSP clusters after infection of C. elegans. Highly homologous coding sequences and conserved exon‐intron organizations indicated that tight GRSP clusters might have originated from local DNA duplications. The conserved synteny blocks of GRSP clusters between the two genomes, the co‐expression of GRSP clusters after C. elegans infection, and strong purifying selection on the protein‐coding sequences suggest an evolutionary constraint that allows C. elegans to rapidly launch and fulfill systematic responses against infections through co‐expression, co‐regulation, and co‐functionality of GRSP clusters.
Keywords
- glycine‐rich secreted peptide
- synteny block
- co‐expressed gene cluster
- nematode infection
1. Introduction
According to their primary structure, glycine‐rich proteins can be divided into two classes: (1) large glycine‐rich proteins (GRPs, >200 AA), which typically function as cell wall structural components, and (2) small glycine‐rich secreted peptides (GRSPs, <200 AA), which have a typical signal peptide followed by a mature peptide with a high glycine content. GRSPs represent a unique class of effectors in multicellular organisms, with relatively simple structures but complex biological functions. According to previous research, GRPs are found in almost all animals, plants, and microorganisms, such as glycine‐rich cold‐induced proteins from zebrafish [1], glycine‐rich keratin and keratin‐associated proteins from 22 mammal genomes [2] and RNA‐binding proteins with C‐terminal glycine‐rich domain from
To our knowledge, the total number of GRSPs encoded by a genome differs markedly among species. GRSPs are enriched in some species, whereas none have been identified in others.
The importance of GRSP family in nematodes is further stressed by the fact that expression of certain GRSPs of
2. Materials and methods
2.1. Identification of GRSPs in the two nematode genomes
Comprehensive comparison of GRSPs was conducted across 10 species of genomes:
2.2. C. elegans GRSPs expression at transcriptional level
Gene Expression Omnibus (GEO) data sets in NCBI (http://www.ncbi.nlm.nih.gov/) and the reads of the RNA sequencing project (PRJNA33023) in DRASearch (https://trace.ddbj.nig.ac.jp/DRASearch/) were used to confirm the transcriptional expression of
2.3. Mapping GRSPs to the genomes of the two nematodes
Characteristic parameters of GRSPs were obtained from WormBase (https://www.wormbase.org/). Configuration files were generated, and GRSPs were mapped to the genomes with Circos [9]. Spacing was based on chromosomal units, and the results were then adjusted manually for easier identification. Orthologous pairs were determined by two‐way reciprocal "best hits", combining sequence similarity‐based and synteny‐based approaches. Orthologous GRSP pairs were mapped to their genomes and connected across the chromosomal maps by straight lines to identify conserved orthologous synteny blocks between the two nematode genomes.
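The reciprocal "best hits" step can be illustrated with a minimal Python sketch. It assumes that pairwise similarity scores between the GRSPs of the two species are already available as dictionaries; the gene names and scores below are hypothetical, and the synteny‐based check described above is not included.

```python
def best_hits(scores):
    """Return the highest-scoring subject for each query.

    `scores` maps (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores (placeholder identifiers, not real gene names).
ce_vs_cb = {
    ("ce1", "cb1"): 250.0,
    ("ce1", "cb2"): 90.0,
    ("ce2", "cb2"): 180.0,
    ("ce3", "cb2"): 170.0,
}
cb_vs_ce = {
    ("cb1", "ce1"): 245.0,
    ("cb2", "ce2"): 182.0,
    ("cb2", "ce3"): 169.0,
}

pairs = reciprocal_best_hits(ce_vs_cb, cb_vs_ce)
print(pairs)
```

In this toy input, `ce3` also hits `cb2`, but `cb2` prefers `ce2`, so only two pairs are retained as candidate orthologs.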
2.4. Transcriptomic analysis of C. elegans GRSPs following infection
Eight transcriptomic data sets related to
2.5. Phylogenetic and evolutionary analysis
A phylogenetic tree was built from the signal peptide sequences of the two nematode GRSPs with the Molecular Evolutionary Genetics Analysis package version 6 (MEGA 6) [10] to examine how the nematode GRSP families had evolved by gene duplication. The tree was inferred by the neighbor‐joining (NJ) method under p‐distance [11], and the bootstrap consensus tree inferred from 500 replicates was taken to represent the evolutionary history and to assess the reliability of the tree. All sites bearing alignment gaps or missing information were retained initially and excluded as necessary with the pairwise deletion option.
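As an illustration of the distance measure named above, the following Python sketch computes the p‐distance between two aligned sequences with pairwise deletion. It is a simplified stand‐in for what MEGA does internally, not its implementation; the gap symbols and the example sequences are assumptions made for illustration.

```python
def p_distance(seq_a, seq_b, skip_chars="-?"):
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a site is ignored only for this pair when either
    sequence has a gap or missing-data symbol at that position.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in skip_chars or b in skip_chars:
            continue  # site dropped for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


# Hypothetical aligned signal-peptide fragments.
d = p_distance("MKLF-VLA", "MKIFAV-A")
print(round(d, 4))
```

Two of the eight columns contain a gap in one sequence and are skipped, so the distance is computed over the six remaining sites.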
2.6. Analysis of the nucleotide sequences
Using MEGA 6, we estimated the transition (Ti)/transversion (Tv) ratio (R) among nucleotides, the numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site, and the codon‐based Z‐test for purifying selection. The difference dN‐dS was calculated with the modified Nei‐Gojobori method (assumed Ti/Tv bias = 2,2), and standard errors (SE) were estimated by the bootstrap method (800 replicates; seed = 17,114) (for details, please refer to supplementary materials and methods in [12]).
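The logic of the Z‐test can be sketched in a few lines of Python. This is the usual large‐sample form of the test, assuming that dN, dS and their variances (for example from bootstrap resampling) have already been estimated; the function name and all numeric values are hypothetical, and the sketch does not reproduce the modified Nei‐Gojobori estimation itself.

```python
import math


def z_test_purifying(d_n, d_s, var_n, var_s):
    """One-sided Z-test of H0: dN = dS against HA: dN < dS.

    Returns the statistic Z = (dS - dN) / sqrt(Var(dS) + Var(dN)) and the
    upper-tail probability under a standard normal distribution.
    """
    z = (d_s - d_n) / math.sqrt(var_s + var_n)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z)
    return z, p


# Hypothetical estimates for one group of sequences.
z, p = z_test_purifying(d_n=0.05, d_s=0.30, var_n=0.0004, var_s=0.0036)
print(f"Z = {z:.3f}, P = {p:.5f}")
```

A small probability rejects strict neutrality in favor of purifying selection, which corresponds to a negative dN‐dS.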
3. Results
3.1. Genome‐wide analysis of GRSPs across 10 species
The number of GRSPs in each genome of the 10 species was 4 for human, 6 for zebrafish, 53 for fruit fly, 110 for
| Species name | Genome size (Mb) | RefSeq proteins | Reference BioProject | GRSPs | Orthologs of |
|---|---|---|---|---|---|
| Human | 3200 | 55968 | PRJNA168 | 4 | 0 |
| Zebrafish | 1371 | 47861 | PRJNA13922 | 6 | 2 |
| Fruit fly | 144 | 30275 | PRJNA164 | 53 | 8 |
| C. elegans | 100 | 26047 | PRJNA158 | 110 | 110 |
| C. briggsae | 104 | 17682 | PRJNA20855 | 93 | 92 |
| | 120 | 35378 | PRJNA116 | 52 | 3 |
| | 42 | 9203 | PRJNA28133 | 0 | 0 |
| | 12 | 5907 | PRJNA128 | 0 | 0 |
| | 34 | 13315 | PRJNA13925 | 5 | 2 |
| | 0.67 | 632 | PRJNA210 | 0 | 0 |
Table 1.
3.2. Identification and classification of the two nematode GRSPs
Based on sequence similarity and the conservation of intron position and phase, the 203 GRSPs of the two nematodes were classified into 17 subfamilies (for details, please refer to Figures S1 and S2 in [12]). The mature peptides of GRSPs are enriched for glycine, with a content ranging from 17 to 74% (for details, please refer to Table S3 in [12]). GRSPs with a glycine content from 30 to 40% are the most abundant (62 GRSPs, 30.54%; Figure 1). Among 110

Figure 1.
Statistic description of
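The glycine content of a mature peptide, and its assignment to a 10% interval as in the distribution above, can be computed as in the following Python sketch. It assumes that the length of the signal peptide is already known (for example from a prediction tool); the sequence and the signal length are hypothetical.

```python
def glycine_content(precursor, signal_len):
    """Percentage of glycine (G) in the mature peptide.

    The mature peptide is taken as the precursor without its first
    `signal_len` residues (the signal peptide).
    """
    mature = precursor[signal_len:].upper()
    if not mature:
        raise ValueError("mature peptide is empty")
    return 100.0 * mature.count("G") / len(mature)


def content_bin(percent, width=10):
    """Label of the interval containing `percent`, e.g. '30-40%'."""
    low = min(int(percent // width) * width, 100 - width)
    return f"{low}-{low + width}%"


# Hypothetical precursor: 11-residue signal peptide + 12-residue mature peptide.
precursor = "MKLLVLALVAS" + "GGYGGHGKPGGW"
content = glycine_content(precursor, signal_len=11)
print(round(content, 2), content_bin(content))
```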
3.3. The evidence of transcriptional expression of C. elegans GRSPs
Highly homologous GRSPs are usually clustered together on the two nematode genomes. This is exemplified by GRSPs from
3.4. The clustered distribution of GRSPs on the two nematode genomes
The distribution of GRSPs on the two genomes showed the following features (Figure 2 and Table 2). First, most of the GRSPs were clustered on their genomes. A GRSP cluster was defined by three criteria: (1) the distance between adjacent GRSPs is less than 1 Mb, (2) the cluster contains at least three GRSPs, and (3) the cluster spans less than 3 Mb. The number of GRSPs clustered on their genomes was 103 for

Figure 2.
Mapping of GRSPs to genomes of the two nematodes is shown.

Table 2.
Summary of GRSPs clusters on the chromosomes of the two nematodes.
Third, GRSP clusters were maintained in relatively conserved synteny blocks on the chromosomes of the two nematodes (Figure 2 and Table 2). With the exception of four GRSPs clusters without the matching synteny clusters on
In addition, the order of the orthologous synteny blocks of GRSP clusters was more conserved on chromosome V than on the other chromosomes of the two nematodes. Orthologous GRSP pairs between the two nematodes were linked by straight lines on the genome map, and the lines connecting orthologous GRSP clusters on chromosome V were more likely to cross one another than those on other chromosomes (Figure 2). In this layout, a crossover means that the order of the orthologous synteny blocks of GRSP clusters is maintained between the two genomes.
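The three cluster criteria used in this section (adjacent GRSPs less than 1 Mb apart, at least three members, total span below 3 Mb) can be applied to gene positions on one chromosome as in the Python sketch below. The greedy left‐to‐right grouping is one possible way to combine the criteria and is an assumption, not necessarily the procedure used in this study; the positions are hypothetical.

```python
MB = 1_000_000


def find_clusters(positions, max_gap=1 * MB, min_members=3, max_span=3 * MB):
    """Group gene start positions (bp) on one chromosome into clusters.

    A gene joins the current group if it lies less than `max_gap` from the
    previous gene and keeps the group span below `max_span`; groups with
    fewer than `min_members` genes are discarded.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        too_far = current and pos - current[-1] >= max_gap
        too_wide = current and pos - current[0] >= max_span
        if too_far or too_wide:
            if len(current) >= min_members:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_members:
        clusters.append(current)
    return clusters


# Hypothetical GRSP start positions on a single chromosome.
positions = [100_000, 400_000, 900_000, 5_000_000, 5_200_000,
             9_000_000, 9_500_000, 9_900_000, 10_700_000]
clusters = find_clusters(positions)
clustered = sum(len(c) for c in clusters)
print(len(clusters), "clusters;", clustered, "of", len(positions), "genes clustered")
```

Here the two genes near 5 Mb are close to each other but form a group of only two, so they are counted as unclustered.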
3.5. The transcriptional co‐expression of C. elegans GRSPs clusters after infection
Genome‐wide transcriptional analysis showed that many

Figure 3.
Phylogenetic analysis based on the typical signal peptides of GRSPs in

Table 3.
Differential expression of GRSPs and co‐expression of GRSPs clusters after
3.6. The evolution of GRSPs multigene families by gene duplications
GRSP subfamilies were classified based on the similarity of precursor sequences and the conservation of gene structure, whereas the phylogenetic analysis was performed using the signal peptide sequences only. Because similarity between precursors and similarity between signal peptides are not perfectly consistent among these GRSPs, certain members of the same subfamily were located in different clades of the phylogenetic tree (Figure 3). The orthologous GRSPs of the two nematodes detected above were well resolved by the phylogenetic analysis. Certain members of subfamilies (such as the members of subfamily I) were clustered together on their chromosomes and also fell in the same clade of the phylogenetic tree (Figure 3). Five GRSPs from
3.7. Purifying selection of the two nematode GRSPs
The codon‐based Z‐test of purifying selection was applied to sequence pairs and to the overall average. The probability values were all below 0.05 (Table 4), so the null hypothesis of strict neutrality (dN = dS) was rejected in favor of the alternative hypothesis of purifying selection (dN < dS). The overall average difference dN‐dS was less than zero in every subfamily listed in Table 4, with standard errors below 0.1. Synonymous substitutions clearly prevailed in the protein‐coding sequences of the nematode GRSPs, which indicates purifying selection. With an average ratio R (Ti/Tv) > 1, the patterns of nucleotide substitution also showed a predominance of transitions over transversions (Table 4).
Subfamily | dN‐dS | SE | Probability | R(Ti/Tv) |
---|---|---|---|---|
I | −5.323 | 0.073 | 0.000 | 1.81 |
II | −2.228 | 0.038 | 0.028 | 1.32 |
III | −3.626 | 0.087 | 0.011 | 1.21 |
IV | −3.321 | 0.035 | 0.000 | 5.54 |
V | −4.510 | 0.042 | 0.011 | 1.52 |
VI | −5.326 | 0.036 | 0.000 | 1.26 |
VII | −3.692 | 0.028 | 0.000 | 3.32 |
VIII | −2.649 | 0.053 | 0.022 | 1.78 |
IX | −3.451 | 0.038 | 0.000 | 1.67 |
X | −2.942 | 0.046 | 0.000 | 2.15 |
XI | −3.153 | 0.061 | 0.031 | 1.93 |
XII | −4.324 | 0.049 | 0.000 | 4.34 |
XIII | −3.256 | 0.027 | 0.000 | 1.52 |
XIV | −2.968 | 0.039 | 0.021 | 2.86 |
Table 4.
Estimates of overall average variance and pattern of nucleotide substitution.
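To make the Ti/Tv column concrete, the Python sketch below counts transitions (purine to purine or pyrimidine to pyrimidine) and transversions between two aligned nucleotide sequences. A raw count ratio is only a rough approximation of the model‐based R reported by MEGA, and the sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv_counts(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Sites with gaps or ambiguous bases in either sequence are skipped.
    """
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Hypothetical aligned coding fragments.
ti, tv = ti_tv_counts("ATGGCTGGAGGT", "ATGGCCGGGGGA")
print(ti, tv, ti / tv if tv else float("inf"))
```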
4. Discussion
Soil organisms (
The conserved precursor organization, the unaltered position and phase of introns, and the homologous DNA sequences suggest that the GRSP clusters in the two nematodes might have arisen from local DNA duplications. Local gene duplication gives rise to clusters of paralogous genes whose products have similar functions. Paralogous genes with similar functions and expression patterns are frequent in
With similar variance of dN‐dS, the GRSPs of the two nematodes might have experienced similar selective pressure during evolution, which is in concordance with the neutral mutation‐random drift theory of molecular evolution. The relatively conserved synteny blocks of the orthologous GRSP clusters suggest that these GRSPs were subject to functional constraint. With increasing species complexity, genome size and the number of members of a gene family usually undergo evolutionary expansion for the essential basic cellular mechanisms shared by eukaryotes [20]. The basic physiological process for
This study built a full set of GRSPs from the algae
Acknowledgments
This work was supported by grants from the National Natural Science Foundation of China (31160233) and the Science and Technology Foundation of Jiangxi Province (20142BAB204013).
References
1. Tang SJ, Sun KH, Sun GH, Lin G, Lin WW, Chuang MJ. Cold‐induced ependymin expression in zebrafish and carp brain: implications for cold acclimation. FEBS Letters. 1999;459:95-99. DOI: 10.1016/S0014-5793(99)01229-6
2. Khan I, Maldonado E, Vasconcelos V, O'Brien SJ, Johnson WE, Antunes A. Mammalian keratin associated proteins (KRTAPs) subgenomes: Disentangling hair diversity and adaptation to terrestrial and aquatic environments. BMC Genomics. 2014;15:779. DOI: 10.1186/1471-2164-15-779
3. Ciuzan O, Hancock J, Pamfil D, Wilson I, Ladomery M. The evolutionarily conserved multifunctional glycine‐rich RNA binding proteins play key roles in development and stress adaptation. Physiologia Plantarum. 2015;153:1-11. DOI: 10.1111/ppl.12286
4. Mangeon A, Junqueira RM, Sachetto‐Martins G. Functional diversity of the plant glycine‐rich proteins superfamily. Plant Signaling & Behavior. 2010;5:99-104
5. Mallo GV, Kurz C, Couillault C, Pujol N, Granjeaud S, Kohara Y, Ewbank JJ. Inducible antibacterial defense system in C. elegans. Current Biology. 2002;12:1209-1214. DOI: 10.1016/S0960-9822(02)00928-4
6. Couillault C, Pujol N, Reboul J, Sabatier L, Guichou JF, Kohara Y, Ewbank JJ. TLR‐independent control of innate immunity in Caenorhabditis elegans by the TIR domain adaptor protein TIR‐1, an ortholog of human SARM. Nature Immunology. 2004;5:488-494. DOI: 10.1038/ni1060
7. Pujol N, Zugasti O, Wong D, Couillault C, Kurz CL, Schulenburg H, Ewbank JJ. Anti‐fungal innate immunity in C. elegans is enhanced by evolutionary diversification of antimicrobial peptides. PLoS Pathogens. 2008;4:e1000105. DOI: 10.1371/journal.ppat.1000105
8. O'Rourke D, Baban D, Demidova M, Mott R, Hodgkin J. Genomic clusters, putative pathogen recognition molecules, and antimicrobial genes are induced by infection of C. elegans with M. nematophilum. Genome Research. 2006;16:1005-1016. DOI: 10.1101/gr.50823006
9. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: An information aesthetic for comparative genomics. Genome Research. 2009;19:1639-1645. DOI: 10.1101/gr.092759.109
10. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular evolutionary genetics analysis version 6.0. Molecular Biology and Evolution. 2013;30:2725-2729. DOI: 10.1093/molbev/mst197
11. Pearson WR, Robins G, Zhang T. Generalized neighbor‐joining: More reliable phylogenetic tree reconstruction. Molecular Biology and Evolution. 1999;16:806-816. DOI: 10.1007/978-1-62703-646-7_5
12. Ying M, Qiao Y, Yu L. Evolutionary expansion of nematode‐specific glycine‐rich secreted peptides. Gene. 2016;587:76-82. DOI: 10.1016/j.gene.2016.04.049
13. Bond MR, Ghosh S, Wang P, Hanover JA. Conserved nutrient sensor O‐GlcNAc transferase is integral to C. elegans pathogen‐specific immunity. PLoS One. 2014;9:e113231. DOI: 10.1371/journal.pone.0113231
14. Head B, Aballay A. Recovery from an acute infection in C. elegans requires the GATA transcription factor ELT‐2. PLoS Genetics. 2014;10:e1004609. DOI: 10.1371/journal.pgen.1004609
15. Pukkila‐Worley R, Rhonda FR, Kirienko NV, Larkins‐Ford J, Conery AL, Ausubel FM. Stimulation of host immune defenses by a small molecule protects C. elegans from bacterial infection. PLoS Genetics. 2012;8:e1002733. DOI: 10.1371/journal.pgen.1002733
16. Sun J, Singh V, Kajino‐Sakamoto R, Aballay A. Neuronal GPCR controls innate immunity by regulating noncanonical unfolded protein response genes. Science. 2011;332:729-732. DOI: 10.1126/science.1203411
17. Bargmann CI. Chemosensation in C. elegans. WormBook. 2006;25:1-29. DOI: 10.1895/wormbook.1.123.1
18. C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science. 1998;282:2012-2018. DOI: 10.1126/science.282.5396.2012
19. Sugimoto A. High‐throughput RNAi in Caenorhabditis elegans: genome‐wide screens and functional genomics. Differentiation. 2004;72:81-91. DOI: 10.1111/j.1432-0436.2004.07202004.x
20. Ying M, Huang X, Zhao H, Wu Y, Wan F, Huang C, Jie K. Comprehensively surveying structure and function of RING domains from Drosophila melanogaster. PLoS One. 2011;6:e23863. DOI: 10.1371/journal.pone.0023863