TAR stop codons in the six frames of mitogenome-encoded genes of Aleurodicus dugesii and Aleurodicus dispersus.
Mitochondrial genetic codes evolve as side effects of stop codon ambiguity: suppressor tRNAs with anticodons translating stops transform genetic codes into stopless genetic codes. This produces peptides from frames other than regular ORFs, potentially increasing the number of proteins coded by single sequences. Previous descriptions of mitogenomes of the marine turtle Olive Ridley imply directed stop depletion of noncoding +1 gene frames, while stop creation recodes regular ORFs to stopless genetic codes. In this analysis, directed stop codon depletion in usually noncoding gene frames of the mitogenome of the spiraling whitefly Aleurodicus dispersus produces new ORFs, introduces stops in regular ORFs, and apparently increases coding redundancy between different gene frames. Directed stop codon mutations switch between peptides coded by regular and stopless genetic codes. This process seems opposite to directed stop creation in HIV ORFs within genomes of immunized elite HIV controllers. Unknown DNA replication/editing mechanisms probably direct stop creation/depletion beyond natural selection on stops. Switches between genetic codes regulate translation of different gene frames.
- codon-amino acid reassignment
- Lepidochelys olivacea
- ribosomal RNA
- directional evolution
Mitochondrial genetic code diversification frequently reassigns stop codons to amino acids. This situation probably reflects ambiguity in the roles of stop codons during translation in mitochondria. Indeed, phylogenetic reconstructions of the evolution of genetic codes, based on differences between codons in amino acid and stop codon assignments, resemble known phylogenies of organisms using these genetic codes if the ambiguity of stop codons in mitochondrial (but not nuclear) genetic codes is accounted for.
These observations suggest translational activity of mitochondrial tRNAs with anticodons matching stop codons, templated by antisense strands of regular mitochondrial tRNAs [4, 5]. Predicted occurrences of mitochondrial stop suppressor (antiterminator) tRNAs coevolve with stop codon usages in predicted off-frame protein coding genes in all analyzed mitochondria (primates and Drosophila [6, 7], including their ribosomal RNAs, turtles, and Chaetognaths), and in a peptide detected by specific monoclonal immunolocalization in human mitochondria. Hence, there are probably more mitochondrion-encoded genes than usually accepted [12, 13]. Mitogenome size reduction probably causes gene multifunctionality, including mt tDNAs functioning as replication origins [14, 15, 16, 17, 18, 19, 20, 21].
Translation of stop codons occurs in different organisms [22, 23], but seems particularly widespread in mitochondria [2, 3], as reflected by the evolution of mitochondrial genetic codes. Nuclear genetic codes lacking dedicated stop codons have been described in protists [24, 25, 26, 27] and fungi.
1.1. Alternative coding by expanded codons
Switching between regular and stopless genetic codes is not the only strategy that dramatically increases the information encoded by genomes. Isolated tetracodons, codons expanded by a fourth silent nucleotide, have been known since the dawn of molecular biology. These are sometimes translated by tRNAs with expanded anticodons [30, 31]. It seems probable that systematic frameshifts produce stretches of tetracodons that code for (yet) undetected peptides [32, 33, 34]. Some evidence suggests that tetracoding occurs especially at high environmental temperatures and is predicted by genetic code optimization: codon-anticodon interactions are more stable when four rather than only three base pairs hybridize. Other theoretical considerations suggest that the mitochondrial vertebrate genetic code evolved from a specific subset of 64 tetracodons, the tesserae, chosen on the basis of symmetry principles.
Indeed, analyses of mitochondrial mass spectra searching for peptides matching translations assuming tetra- and pentacodons, codons expanded by one or two silent nucleotides, detected numerous tetra- and pentacoded peptides [38, 39, 40, 41, 42, 43].
1.2. Alternative coding by swinger polymerization
Further little-known mechanisms increase the number of proteins potentially coded by single sequences. Polymerization occasionally exchanges nucleotides systematically during DNA replication [44, 45, 46] or RNA transcription [47, 48, 49, 50, 51, 52, 53] over long sequence stretches (23 exchange rules are possible, nine symmetric, e.g., A<>C, and fourteen asymmetric, e.g., A > C > G > A), producing swinger sequences. Swinger replication, in particular the double symmetric exchange A<>T + C<>G, seems most frequent for mitochondrial ribosomal RNAs. This increases the coding potential of rRNAs, strengthening the hypothesis that rRNAs are modern remnants of protogenomes that templated for translational molecules (tRNA-like and rRNA-like) and protein coding genes [54, 55, 56, 57, 58] by dense overlap coding. This is compatible with the occurrence of protein coding regions within modern rRNAs [8, 59, 60, 61].
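The count of 23 exchange rules follows from treating each rule as a permutation of the four nucleotides other than the identity. The sketch below enumerates them, assuming that "symmetric" means that the rule is its own inverse (applying it twice restores the original sequence); the function names are illustrative.

```python
from itertools import permutations

NUCS = "ACGT"


def exchange_rules():
    """Return all non-identity permutations of the four nucleotides as dicts."""
    rules = []
    for perm in permutations(NUCS):
        rule = dict(zip(NUCS, perm))
        if any(src != dst for src, dst in rule.items()):
            rules.append(rule)
    return rules


def is_symmetric(rule):
    """A rule is symmetric if applying it twice gives back every nucleotide."""
    return all(rule[rule[n]] == n for n in NUCS)


rules = exchange_rules()
symmetric = [r for r in rules if is_symmetric(r)]
asymmetric = [r for r in rules if not is_symmetric(r)]
print(len(rules), len(symmetric), len(asymmetric))  # 23 9 14
```

The nine symmetric rules comprise six single exchanges (such as A<>C) and three double exchanges (such as A<>T + C<>G); the fourteen asymmetric rules are the cyclic exchanges of three or four nucleotides.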
I stress here that the exchange A<>T + C<>G is not trivial: it creates the complement of the template sequence, which is not the regular inverse (or reverse) complement. “Complement” is frequently used as a shortcut for inverse complement, but the A<>T + C<>G transformed sequence is a different sequence because it lacks the 3′-to-5′ inversion combined with nucleotide complementarity. The shortcut has been used because 3′-to-5′ inverted sequences had not been described prior to the description of A<>T + C<>G transformed sequences.
It seems that regular transcription occasionally switches abruptly to swinger transcription (and vice versa), as indicated by chimeric RNAs. One part of these RNAs corresponds to regular DNA, and an adjacent part corresponds to DNA only after accounting for swinger transformation. Corresponding chimeric peptides have also been detected. Chimeric DNA also exists: the mitogenome of the stonefly Kamimuria wangi (NC_024033) is regular, except for its 16S rRNA, which is entirely swinger transformed according to the transformation A<>T + C<>G.
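The distinction can be made concrete with a short sketch; the function names and the toy sequence are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def swinger_complement(seq):
    """A<>T + C<>G exchange: nucleotide complementarity without 3'-to-5' inversion."""
    return seq.upper().translate(COMPLEMENT)


def inverse_complement(seq):
    """Regular inverse (reverse) complement: complementarity plus inversion."""
    return seq.upper().translate(COMPLEMENT)[::-1]


seq = "ATGCCA"  # hypothetical toy sequence
print(swinger_complement(seq))  # TACGGT
print(inverse_complement(seq))  # TGGCAT
```

The two outputs are mirror images of each other, so a search for inverse complements does not detect A<>T + C<>G swinger sequences unless the query is not inverted.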
1.3. Regulation of alternative transcriptions by secondary structure
Secondary structures are important components of RNA function and evolution [64, 65]. Secondary structures formed by tRNA sequences punctuate posttranscriptional processing of mitochondrial transcripts [66, 67, 68]. Palindromes potentially forming secondary structures after swinger transformation of the sequence associate with detected mitochondrial swinger RNAs. This is similar to what is known from regular RNA processing in mitochondria and, surprisingly, in giant viruses, which also bear other striking resemblances to mitogenomes, including similar gene order. Transcription sometimes systematically deletes mono- or dinucleotides after transcribing trinucleotides (del-transcription); the resulting RNAs are translated into peptides that in part converge with peptides translated from regular RNAs by expanded anticodons. Del-transcription, or at least detection of delRNAs, seems downregulated by secondary structures formed after transformation of the sequence by systematic deletions.
1.4. Mechanisms that switch between genetic codes?
Phenomena that systematically exchange nucleotides are reminiscent of more specific mechanisms that alter the genetic code according to which a protein is coded, from a regular genetic code to a stopless genetic code, and/or vice versa. Indeed, previous analyses revealed that in one GenBank mitogenome (from a marine turtle, the Olive Ridley, Lepidochelys olivacea), several protein coding genes, as annotated in GenBank, do not code for the regular proteins essential for mitochondrial metabolism and usually encoded at these genomic locations. These essential proteins are indeed coded by the corresponding sequences, but only after a frameshift, and only if stops in that frame are translated. This explains the erroneous annotation of stopless ORFs (open reading frames) that do not code for recognized mitochondrial proteins.
This was originally interpreted as resulting from directed selection on stop codons. Observations of systematic mutations in contexts creating stop codons in ORFs of HIV genes, specifically in elite controllers immune to HIV [73, 74], suggest that enzymatically directed mutagenesis during DNA replication and/or editing could transform ORFs coded according to a regular genetic code into ORFs coded by a stopless genetic code, and vice versa, as observed in several mitochondrial genes of the Olive Ridley. For HIV, introducing stops presumably drastically reduces viral production and contributes to immunity.
This hypothesis of mutations directed at stop codons is in line with observations that polymerase errors are more frequent in stop codon contexts, interpreted as an adaptational bias to introduce mutations in stops. In the next section, GenBank is explored to detect further mitogenomes in which genetic codes were switched by producing stop codons in ORFs and stop depletion in other frames.
2.1. Exploring GenBank for genetic code switches
Previous Blastp searches found proteins already described in GenBank that align with hypothetical peptides translated from randomly chosen frameshifted vertebrate mitochondrial genes. These analyses detected the unusual proteins translated from ORFs of the mitogenome of Lepidochelys olivacea. In these cases, the regular mitochondrial proteins are coded in frames that include stops, and hence were not recognized as the regular gene. The annotated frame is stopless, but codes for other, unknown peptides. These other peptides are homologous to peptides translated after frameshift from regular mitochondrial protein coding genes of other mitogenomes that did not undergo stop codon depletion in non-ORF frames.
2.2. Choice of seed sequences for BLAST searches
The method described above only detects homologies for sequences sufficiently similar to the “seed” sequences used for BLAST analyses of GenBank. Therefore, using the human mitogenome as seed, mainly vertebrate proteins were detected, as for the above-mentioned Lepidochelys olivacea. A similar situation occurs for detection of swinger DNA/RNA sequences: the original searches using as seed swinger transformed versions of the human mitogenome only detected vertebrate sequences, but BLAST analyses using a randomly chosen invertebrate mitogenome (from the North Pacific krill Euphausia pacifica, NC_016184) detected numerous additional swinger sequences from insect mitogenomes.
This search principle for insect nucleotide sequences can also be applied to proteins. I use as seeds the peptides translated from the five “noncoding” frames of each of the 13 regular protein coding genes of the previously randomly chosen mitogenome of Euphausia pacifica. These 65 peptides (5 frames × 13 genes) were blasted against GenBank to search for proteins already described that have high homology with peptides translated from Euphausia’s noncoding frames.
3.1. Preliminary results from Aleurodicus dispersus
Preliminary BLAST analyses of peptides translated from noncoding frames of Euphausia’s mitochondrial protein coding genes detected GenBank proteins from the mitogenome of Aleurodicus dispersus, a sap-sucking spiraling whitefly. These have high homology levels with peptides translated from the antisense sequence of several of Euphausia’s protein coding genes. These unusual CDSs in this insect mitogenome are reminiscent of previous descriptions of other unusual CDSs in the mitogenome of the marine turtle Lepidochelys olivacea. This justifies detailed analyses of peptides translated from the six frames of the 13 protein coding genes of the mitogenome of Aleurodicus dispersus (JX566506), and, for comparative purposes, of its closest relative with a complete mitogenome in GenBank, Aleurodicus dugesii (NC_005939), whose predicted proteome seems coded according to regular rules.
3.2. Aleurodicus dispersus protein coding genes
All six frames of the 13 mitochondrial protein coding genes of Aleurodicus dispersus were translated according to the regular invertebrate mitochondrial genetic code. First, BLAST analyzed peptides translated from GenBank-annotated, stopless ORFs to verify which of these peptides are “normal,” i.e., have regular homologies with the corresponding protein predicted for the regular ORF of the mitogenome of Aleurodicus dugesii.
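A six-frame translation of this kind can be sketched as follows, assuming NCBI translation table 5 for the invertebrate mitochondrial genetic code. The frame labels (strand, shift) are an illustrative convention and do not necessarily match the +1/+2 frame numbering used in the text; stops are written as asterisks.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AAS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons of seq; codons with ambiguous bases give X."""
    return "".join(CODE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return peptides for the three frames of each strand, keyed by (strand, shift)."""
    seq = seq.upper()
    antisense = seq.translate(COMPLEMENT)[::-1]  # inverse complement
    frames = {}
    for label, strand in (("sense", seq), ("antisense", antisense)):
        for shift in range(3):
            frames[(label, shift)] = translate(strand[shift:])
    return frames


for frame, peptide in six_frames("ATGTAAGGC").items():  # hypothetical toy sequence
    print(frame, peptide)
```

Each of the six peptides can then be submitted separately to Blastp, with the annotated frame corresponding to ("sense", 0) when the GenBank strand and start are used as given.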
These analyses confirm that GenBank annotations of the six Aleurodicus dispersus mitogenes COI, COII, AT6, COIII, ND3, and ND2 code for typical invertebrate proteins homologous with corresponding proteins in regular insect mitogenomes, notably Aleurodicus dugesii. The remaining seven genes follow different coding structures described below, based on frameshifts and/or stop depletion/translation. Blastp does not detect any homologies for proteins predicted according to GenBank annotations for genes AT8, ND1, ND6, ND5, ND4, and ND4l, and only partial homology for CytB.
Mitochondrial metabolism without the regular proteins usually translated from these seven genes seems impossible. Regular analyses of the mitogenome of Aleurodicus dugesii detect these proteins as they are annotated in GenBank. The possibility that these genes were transferred in Aleurodicus dispersus to the nucleus and that proteins are imported to the mitochondrion seems unlikely as ORFs occur at positions corresponding to gene locations coding for the seven missing proteins in the predicted mitoproteome of Aleurodicus dispersus.
3.3. Recoding of mitogenes in Aleurodicus dispersus
3.3.1. Two ORFs on the same strand: AT8
The case of the missing ATP synthetase subunit 8 is solved by Blastp analysis of the peptide coded by the +2 frameshifted sequence of gene AT8. Residues at positions 8–48 (the gene has 49 codons including the stop codon) in frame +2 have 75% similarity with the congeneric mitochondrial ATP synthetase subunit 8 of Aleurodicus dugesii (YP_026055, e value 2 × 10⁻⁹, see alignment in Figure 1). It seems likely that this frame codes for the regular AT8 protein. This frame does not contain stops, implying that this gene has two stopless ORFs. The GenBank-annotated ORF does not correspond to the regular AT8, which is coded by the +2 frameshifted sequence of the annotated sequence.
3.3.2. Stop codon translation after frameshift: ND6
For gene ND6, the stopless ORF annotated in GenBank does not align with any ND6-like protein. This conundrum is solved by Blastp analysis of the peptide translated from the +1 frameshifted sequence of ND6 as it is annotated in GenBank. It aligns with 86% similarity with the mitochondrial NADH dehydrogenase subunit 6 of congeneric Aleurodicus dugesii (positions 33–137 in Aleurodicus dispersus and 32–129 in Aleurodicus dugesii, e value 1 × 10⁻³⁷, not shown). Hence, the annotated gene corresponds to a stopless frame that does not translate into a recognizable mitochondrial protein, while the +1 frame, which contains three stop codons, codes for ND6. Only one of the stop codons is within the alignment, where it corresponds to a tyrosine in Aleurodicus dugesii. This implies translation of at least one stop codon, as previously described for other short mitochondrial protein coding genes where the protein coding region includes a programmed frameshift and translation of stops (ND3 in birds and in turtles). It seems plausible that ND6 translation starts in the 5′ region of the frame as annotated in GenBank, and that a programmed frameshift then occurs in the vicinity of the 5′ starting point of the alignment. Translation of the stop codon as tyrosine is compatible with translation by tRNAs with near-cognate anticodons [78, 79, 80].
3.3.3. Frameshift with stop translation: CytB
The situation in CytB is similar and is again reminiscent of known cases of proteins coded by two frames. The ORF as annotated in GenBank has high homology (96% similarity) from residue 137 to 355 with the regular cytochrome B of Aleurodicus dugesii (YP_026063, e value 5 × 10⁻⁹¹, Figure 2). The 5′ extremity of cytochrome B is coded by the +1 frameshifted sequence of the gene, as indicated by high similarity (88%) in the alignment of residues 6–136 with the regular cytochrome B of Aleurodicus dugesii (YP_026063, e value 5 × 10⁻⁵⁴, Figure 2). Position 131 is a stop that corresponds to a tyrosine in Aleurodicus dugesii. Hence, this gene’s coding structure implies a frameshift and probable stop translation, potentially by a near-cognate anticodon.
3.3.4. Stop codon depletion in antisense strand: ND1
The annotation in GenBank for ND1 does not correspond to a protein homologous to NADH dehydrogenase subunit 1. However, the peptide translated from the +1 frame of the opposite (antisense) strand has high homology with NADH dehydrogenase subunit 1 from Aleurodicus dugesii (YP_026064, 94% similarity for the complete length, e value 0). This implies regular encoding of that protein. The misannotation originates from depletion of all stops in one frame of the antisense strand of that gene. The corresponding antisense frame has 15 TAR stop codons in the regular ND1 of Aleurodicus dugesii. Figure 3 aligns the peptide translated from this presumably noncoding antisense frame in Aleurodicus dugesii with the peptide translated from the GenBank-annotated frame in Aleurodicus dispersus. Stop codons correspond in this alignment mainly to serine (seven cases), then to tryptophan (two cases), and once each to leucine, lysine, methionine, and asparagine. This predicted translation is much less compatible with near-cognate translation, and might be due to specific tRNA(s) with anticodon(s) matching stops.
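The tally of residues aligned with stop codons can be sketched as follows, assuming a pairwise alignment given as two equal-length strings in which stops are written as asterisks and gaps as hyphens. The function name and the toy alignment are hypothetical and do not reproduce Figure 3.

```python
from collections import Counter


def residues_at_stops(aligned_with_stops, aligned_reference, stop="*", gap="-"):
    """Count which residues of the reference peptide align with stops in the other peptide."""
    if len(aligned_with_stops) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    return Counter(
        ref
        for res, ref in zip(aligned_with_stops, aligned_reference)
        if res == stop and ref != gap
    )


# Hypothetical toy alignment: stop-containing frame versus stop-depleted frame
counts = residues_at_stops("M*KL*A*", "MSKLSAW")
print(counts.most_common())  # [('S', 2), ('W', 1)]
```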
The fact that this antisense frame is stop codon depleted in Aleurodicus dispersus, so that its expression does not necessitate any special translational machinery, suggests that this frame is also translated in Aleurodicus dugesii (and in other species), where it would produce an unknown functional protein through stop codon translation by antiterminator tRNAs. Indeed, the entirety of both mitochondrial strands is transcribed to RNA; hence, RNA corresponding to this supposedly noncoding strand necessarily exists and could be translated. The alignment suggests that the amino acid most probably inserted by that stop-suppressor tRNA is serine. This is in line with previous observations from cytochrome c oxidase subunit I from the silkworm Samia ricini [7, 9], where stop codons in a usually noncoding frame systematically mutated to serine. This finding strengthens serine as the likely residue inserted at stop codons in insect mitochondria. This awaits confirmation by translation and tRNA aminoacylation experiments (as done, for example, for giant virus tRNAs).
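Under the assumption that serine is inserted at TAR stops, a stopless variant of the invertebrate mitochondrial genetic code can be sketched by reassigning TAA and TAG in NCBI translation table 5. The serine assignment is the hypothesis discussed above, not an established code table, and the names REGULAR and STOPLESS are illustrative.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
REGULAR = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AAS)}
# Assumption: a stop-suppressor tRNA inserts serine at both TAR codons
STOPLESS = dict(REGULAR, TAA="S", TAG="S")


def translate(seq, code):
    """Translate complete codons of seq with the given codon table."""
    seq = seq.upper()
    return "".join(code[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


dna = "ATGTAAGGCTAG"  # hypothetical toy sequence
print(translate(dna, REGULAR))   # M*G*
print(translate(dna, STOPLESS))  # MSGS
```

Translating a stop-containing frame with both tables gives the peptide expected with and without stop suppression, which can then be compared by Blastp.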
3.3.5. Stop codon depletion in antisense strand and stop codon translation: ND4l and ND5
Annotations in GenBank for Aleurodicus dispersus’ ND4l and ND5 do not match proteins homologous to the corresponding NADH dehydrogenase subunits. For ND4l, the peptide translated from frame 0 of the antisense of that gene (this frame includes five stops) yields a short alignment with the regular protein in Aleurodicus dugesii (24 residues, 87% similarity, e value 8.5, not shown). Hence, the annotated ORF in Aleurodicus dispersus is probably a stop-codon-depleted antisense sequence (the corresponding frame in Aleurodicus dugesii has four stops), which does not code for the regular ND4l gene. This stop codon depletion in Aleurodicus dispersus introduced five stops in the sense strand frame that apparently codes for ND4l according to the above-described alignment.
For ND5, the peptide translated from the +1 frame of the antisense of the GenBank-annotated sequence is homologous over its complete length to NADH dehydrogenase subunit 5 of Aleurodicus dugesii (89% similarity, e value 0, not shown). This frame has a single stop codon that aligns with serine in Aleurodicus dugesii (see discussion of insertion of serine at stops in previous section).
3.3.6. Stop codon depletion in antisense strand, frameshift, and stop translation: ND4
A further mitochondrial gene for which the GenBank annotation does not produce the expected protein for Aleurodicus dispersus is ND4. Alignment analyses detect peptides homologous with regular NADH dehydrogenase subunit 4 when translating frames +1 and +2 of the antisense of the GenBank-annotated ND4 gene. Blastp alignment analyses detect homology with the regular ND4-encoded protein of Aleurodicus dugesii; the regular protein is encoded in antisense frame +1 until residue 297 (89% similarity, e value 3 × 10⁻¹²⁶, not shown). This alignment includes a single stop, matching tyrosine in Aleurodicus dugesii. Part of the remaining protein is coded by a stopless stretch of antisense strand frame +2, where residues 341–427 align with the regular protein from Aleurodicus dugesii (73% similarity, e value 2 × 10⁻²¹). Hence, the annotated ORF in Aleurodicus dispersus is a stop-codon-depleted antisense frame that codes for an apparently different protein; the actual NADH dehydrogenase subunit 4 is encoded by two frames, one containing a stop, on the opposite strand.
4. General discussion
4.1. Genomic stop codon-depletion: overall analysis
Table 1 presents numbers of stops in all six frames of the mitochondrial genes of Aleurodicus dugesii and Aleurodicus dispersus, according to the strand presented in GenBank. In Aleurodicus dugesii, the frame coding for the regular protein is always the only stopless frame. In Aleurodicus dispersus, the frame(s) coding for the regular proteins are underlined; these are not necessarily stopless and are not necessarily the only stopless frame.
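Counts of this kind can be obtained with a sketch such as the following, assuming that only complete codons are considered and that TAR denotes TAA and TAG. The frame labels (strand, shift) and the toy sequence are illustrative and are not taken from Table 1.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
TAR = {"TAA", "TAG"}


def count_tar(seq, shift):
    """Count TAA/TAG codons among complete codons of seq read from offset `shift`."""
    return sum(seq[i:i + 3] in TAR for i in range(shift, len(seq) - 2, 3))


def tar_counts(seq):
    """Return TAR stop counts for the three frames of each strand."""
    seq = seq.upper()
    antisense = seq.translate(COMPLEMENT)[::-1]  # inverse complement
    return {
        (label, shift): count_tar(strand, shift)
        for label, strand in (("sense", seq), ("antisense", antisense))
        for shift in range(3)
    }


print(tar_counts("ATGTAAGGCTAG"))  # hypothetical toy sequence
```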
In order to account for slight variations in gene sizes, I compare percentages of stop codons averaged across all six frames in the two species, gene by gene. Mean stop codon percentages decrease in 11 among 13 protein coding genes of Aleurodicus dispersus, as compared to Aleurodicus dugesii. This is a significant majority of cases according to a one tailed sign test using a binomial distribution and assuming equal probability of getting more or fewer stop codons in any of these mitogenes (P = 0.00562). This overall stop codon depletion occurs in all seven “recoded” genes. Stop codon depletion occurs qualitatively in four among the six genes with regular, unchanged coding structure. This tendency is not statistically significant for this subgroup of genes when using the robust but blunt nonparametric sign test. A paired t test between mean percentages of stop codons averaged across frames also indicates, for these six genes, a statistically significant decrease in stops in Aleurodicus dispersus, as compared to Aleurodicus dugesii. This result suggests that stop codon depletion occurred across all or at least most of this genome, and for most frames, not only for genes whose coding structure was altered, and not only for frames that became ORFs.
Presumably, unknown mechanisms associated with replication depleted stop codons in this species’ mitogenome, perhaps cumulatively over several replication or DNA editing cycles. Total stop codon depletion in some frames produced new ORFs. Natural selection against stop codons presumably enhanced unknown enzymatic phenomena, eliminating stop codons in these frames. It seems plausible that these frames in usual mitogenomes code for proteins translated by stop suppressor tRNAs. Specific unknown conditions in Aleurodicus dispersus may favor enhanced expression of peptides coded by frames that usually include stops in other mitogenomes, such as that of Aleurodicus dugesii. These constraints would have ultimately caused genomic stop codon depletion in Aleurodicus dispersus. In regular mitogenomes, stop codon translation downregulates expression of these unusual peptides in favor of proteins coded by regular ORFs, but in Aleurodicus dispersus, this hierarchy may be nonexistent (when two stopless ORFs occur in a gene) or reversed (as in several mitogenes of Lepidochelys olivacea), with translation of the unusual peptide not necessitating stop codon suppression, and translation of regular mitochondrial proteins requiring tRNAs with anticodons matching stop codons.
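The two quantities used in this comparison can be sketched as follows. The mean stop percentage assumes that all frames of a gene have the same number of codons, and the sign test is computed as the upper tail of a binomial distribution with probability 0.5. This tail convention is an assumption, since the exact computation behind the reported P values is not detailed, so the output need not reproduce them; the stop counts are hypothetical and not taken from Table 1.

```python
from math import comb


def mean_stop_percentage(stop_counts, n_codons):
    """Mean percentage of stop codons across the frames of one gene."""
    return 100 * sum(stop_counts) / (len(stop_counts) * n_codons)


def sign_test_upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5): probability of at least k decreases among n genes."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Hypothetical stop counts in the six frames of a 300-codon gene
print(mean_stop_percentage([0, 12, 9, 15, 7, 11], 300))
# 11 genes with decreased mean stop percentage among 13 genes
print(sign_test_upper_tail(11, 13))
```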
4.2. Coding redundancy between frames and tolerating ribosomal frameshifts
The original hypothesis of frame shiftability suggests that different frames of a gene code for somewhat similar peptides, presumably because the genetic code is optimized to tolerate frameshifts [83, 84, 85]. This hypothesis suggests that redundancy among frames in Aleurodicus dispersus should be greater than in the closely related Aleurodicus dugesii where coding seems regular, assuming that changes in coding structure increase redundancy among frames for coding protein variants with similar functions.
I used ClustalX to align the regular protein with peptides coded by the +1 and +2 frames of the same coding strand, for both Aleurodicus dispersus and Aleurodicus dugesii. Numbers of amino acids that were identical in the alignment were divided by total peptide lengths. This proportion for genes from Aleurodicus dispersus is plotted as a function of the corresponding proportion for Aleurodicus dugesii for alignments between frame 0 and frame +1 (Figure 4). Redundancy between frames 0 and +1 is greater in mitogenes of Aleurodicus dispersus in nine among twelve genes (there was no difference between these species for gene COII), a statistically significant majority according to a one tailed sign test (P = 0.0365). This tendency, however, does not exist for alignments between frames 0 and +2 (redundancy in Aleurodicus dispersus is greater in six among thirteen genes).
This analysis tentatively indicates that stop codon depletion and coding by frameshifting and translation of stop codons might associate with a phenomenon increasing tolerance to frameshifts during translation. Indeed, frequencies of off-frame stop codons in mitochondrial genes are inversely correlated with predicted ribosomal RNA stability [86, 87, 88], suggesting that genes adapt to avoid negative effects of ribosomal frameshifts [44, 89, 90]. Stop codon depletion may enable coding for more proteins, in addition to increasing redundancies between frames. Several effects could explain why results on redundancy between frames are statistically weak. This hypothesis should be further tested, experimentally as done by Wang et al. [83, 84, 85] and by other bioinformatics analyses. For example, one can expect that frameshift tolerance biases exist for identity at amino acids that are not easily replaced by other amino acids (e.g., cysteine), but less for mutable ones (leucine, isoleucine, etc.). The preliminary tests presented here are not incompatible with the frameshift tolerance hypothesis [83, 84, 85].
It is important to understand in this context that the discovery of the genetic code, among the greatest fundamental discoveries, is not over but still in progress. Indeed, coding sequences include much more information than generally believed, even beyond RNA editing (RDD), systematic transformations during replication [44, 45, 46] and transcription [39, 47, 48, 49, 50, 51], and translation along expanded codons [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]. Cryptic codes [92, 93], such as the well-developed theory of the natural circular code [94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112], which regulate the ribosomal translation frame [113, 114, 115, 116] and protein cotranslational folding, remain to be described and decoded.
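The redundancy measure can be sketched as follows, assuming that the two peptides are already aligned (for example by ClustalX) and given as equal-length strings with hyphens for gaps. Taking the ungapped length of the first peptide as "total peptide length" is an assumption, as is the function name.

```python
def identity_proportion(aligned_a, aligned_b, gap="-"):
    """Proportion of identical aligned residues relative to the ungapped length of aligned_a."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(a == b and a != gap for a, b in zip(aligned_a, aligned_b))
    length = sum(a != gap for a in aligned_a)  # assumption: length of the first peptide
    return identical / length


# Hypothetical toy alignment of a frame 0 peptide with a frame +1 peptide
print(identity_proportion("MK-LV", "MRALV"))  # 0.75
```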
4.3. Sequencing artifacts and genome annotation
During the redescription of the recoded Lepidochelys olivacea mitogenome, an anonymous reviewer suggested that sequencing errors mimicked frameshifting mutations (insertion/deletion), producing the impression of frame recoding. This explanation is incompatible with the phenomena described in Lepidochelys and Aleurodicus, because these involve numerous specific changes/mutations in stop codon-specific nucleotide contexts, totally depleting stop codons in usually noncoding frames, and introducing stop codons in usually stopless, regular ORFs. Frameshifting mutations insert/delete a nucleotide within a regular ORF, which due to the frameshifting mutation is split between two frames. This does not deplete stop codons occurring in noncoding frames, nor does it introduce stops in the frameshifted ORF. ORF creation in usually noncoding frames by stop codon depletion in Lepidochelys olivacea and Aleurodicus dispersus probably originates from natural, enzymatic, directed mutations or other processes causing directed mutations, such as transposon-mediated directed mutations [119, 120].
Recoding probably occurs beyond mitogenomes. However, the short, highly conserved mitogenomes are the most suitable for manual reannotation, a first necessary stage to detect events where genes are recoded from one genetic code to another. I suggest that annotations of genomes, and mitogenomes in particular, systematically take into account phenomena such as swinger sequences and directed stop codon depletions that may result in ORFs that do not code for regular recognized proteins as presented here, especially in genomes/genes that seem unusual and remain in an unverified status in GenBank. Figure 5 summarizes the changes in coding structures that occurred in Aleurodicus dispersus due to recoding, as compared to the “ancestral” regular situation in A. dugesii. The proposed stopless genetic code in Aleurodicus dispersus presumably introduces serine at stops TAR and differs from previously described alternative arthropod mitochondrial genetic codes, which usually recode codons AGR [122, 123].
In vertebrate mitochondria, peptides translated from frames that are not recognized ORFs and contain stop codons align with high homology, in BLAST analyses, with proteins translated from regular mitochondrial ORFs in GenBank. Many such ORFs code for peptides matching usually noncoding sequences and occur in the mitogenome of Lepidochelys olivacea. Here, similar analyses of invertebrate mitochondria detected a mitochondrial genome (JX566506, from Aleurodicus dispersus, Yu and Du, submitted to GenBank in 2012, unpublished) that is considered unverified, probably because many of its protein-coding genes are undetectable with usual coding rules.
Several phenomena, and their combination, explain this situation. Alignment analyses detect the coding rules for these genes: frameshift, translation of stop codons, and depletion of stop codons in usually noncoding frames. Previous analyses detected in Lepidochelys olivacea CytB two stop-codon-deprived frames on the sense strand, among which one codes for the regular cytochrome B, and the other for an unknown protein. GenBank annotates erroneously the latter frame as coding for cytochrome B. A similar observation is reported here for the gene AT8 in Aleurodicus dispersus.
Some mitochondrial protein coding genes in Aleurodicus dispersus are unusual in the sense that the stop-codon-depleted frame erroneously annotated as the regular mitochondrial protein coding gene is on the strand opposite to the sense strand coding (with or without stop codons) for the actual usual protein: ND1, ND4l, ND4, ND5. The process depleting stop codons in these antisense frames is unknown and of particular interest. It is probably a combination of natural selection and enzymatically directed mutations to and from stop codons in the adequate nucleotide contexts, perhaps promoted by unknown conditions specific to Aleurodicus dispersus. In some genes, the protein usually coded by the regular genetic code necessitates translating stop codons (a stopless genetic code), while frames including stop codons and therefore not considered as ORFs become stop codon depleted, and hence corresponding peptides are coded by the regular invertebrate mitochondrial genetic code. This situation, where peptides coded by regular and stopless genetic codes are swapped, might reflect a reversal in hierarchies of needs for the expression of the respective peptides, specific to Aleurodicus dispersus. The requirement for tRNAs translating stop codons would regulate their respective expression, de facto swapping between regular and stopless genetic codes. I suggest that the enzymatically directed stop codon depletion is related to the process that caused directed introductions of stop codons in the coding frames of HIV proteins integrated in the nuclear genomes of “elite” HIV controller individuals [73, 74].
This study was supported by Méditerranée Infection and the National Research Agency under the program “Investissements d’avenir,” reference ANR-10-IAHU-03, and the A*MIDEX project (no ANR-11-IDEX-0001-02).
Conflict of interest
The author declares no conflict of interest.