TAR stop codons in the six frames of mitogenome-encoded genes of
Mitochondrial genetic codes evolve as side effects of stop codon ambiguity: suppressor tRNAs with anticodons that translate stops transform genetic codes into stopless genetic codes. This produces peptides from frames other than regular ORFs, potentially increasing the number of proteins coded by single sequences. Previous descriptions of the mitogenome of the Olive Ridley marine turtle imply directed stop depletion of noncoding +1 gene frames, and stop creation that recodes regular ORFs to stopless genetic codes. In this analysis, directed stop codon depletion in usually noncoding gene frames of the mitogenome of the spiraling whitefly Aleurodicus dispersus produces new ORFs, introduces stops in regular ORFs, and apparently increases coding redundancy between different gene frames. Directed stop codon mutations switch between peptides coded by regular and stopless genetic codes. This process seems opposite to the directed stop creation in HIV ORFs within genomes of immunized elite HIV controllers. Unknown DNA replication/editing mechanisms probably direct stop creation/depletion beyond natural selection on stops. Switches between genetic codes regulate translation of different gene frames.
- codon-amino acid reassignment
- Lepidochelys olivacea
- ribosomal RNA
- directional evolution
Mitochondrial genetic code diversification frequently reassigns stop codons to amino acids. This situation probably reflects ambiguity in the roles of stop codons during translation in mitochondria. Indeed, phylogenetic reconstructions of the evolution of genetic codes, based on differences between codons in amino acid and stop codon assignments, resemble known phylogenies of the organisms using these genetic codes if the ambiguity of stop codons in mitochondrial (but not nuclear) genetic codes is accounted for.
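Stop codon reassignment can be made concrete by comparing translation tables. The sketch below is a minimal illustration: the amino acid strings are NCBI genetic code tables 1 (standard), 2 (vertebrate mitochondrial) and 5 (invertebrate mitochondrial, the code usually applied to insect mitogenomes), written in TCAG codon order, and the helper names are hypothetical. It lists codons whose stop status differs from the standard code.

```python
from itertools import product

# Codons in NCBI order: first base varies slowest, bases ordered T, C, A, G.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]

# Amino acid strings of NCBI translation tables in the codon order above ('*' = stop).
TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # standard
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",  # vertebrate mito
    5: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG",  # invertebrate mito
}


def genetic_code(table_id):
    """Map each codon to its amino acid under the given table."""
    return dict(zip(CODONS, TABLES[table_id]))


def stop_codons(table_id):
    """Codons read as stops under the given table."""
    return {c for c, aa in genetic_code(table_id).items() if aa == "*"}


def stop_reassignments(table_id, reference=1):
    """Codons that are stops in exactly one of the two tables: {codon: (ref_aa, new_aa)}."""
    ref, alt = genetic_code(reference), genetic_code(table_id)
    return {c: (ref[c], alt[c]) for c in CODONS if (ref[c] == "*") != (alt[c] == "*")}


for tid in (2, 5):
    print(tid, sorted(stop_codons(tid)), stop_reassignments(tid))
```

Under table 5 only TAA and TAG (the TAR stops) remain, and the standard stop TGA is read as tryptophan; table 2 additionally turns AGA and AGG into stops.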
These observations suggest translational activity of mitochondrial tRNAs with anticodons matching stop codons, templated by antisense strands of regular mitochondrial tRNAs [4, 5]. Predicted occurrences of mitochondrial stop suppressor (antiterminator) tRNAs coevolve with stop codon usage in predicted off-frame protein coding genes in all analyzed mitochondria (primates and Drosophila [6, 7], including their ribosomal RNAs, turtles, and Chaetognaths), and in a peptide detected by immunolocalization with a specific monoclonal antibody in human mitochondria. Hence, there are probably more mitochondrion-encoded genes than usually accepted [12, 13]. Mitogenome size reduction probably causes gene multifunctionality, including mitochondrial tRNA genes (mt tDNAs) functioning as replication origins [14, 15, 16, 17, 18, 19, 20, 21].
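The link between a stop suppressor tRNA and a stopless genetic code can be sketched in a few lines. Assumptions: the invertebrate mitochondrial code (NCBI table 5), anticodons written 5′ to 3′ in RNA letters, and the residue inserted at a suppressed stop written as the placeholder `X`, since the source does not specify it. Function names are hypothetical.

```python
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
# NCBI table 5 (invertebrate mitochondrial code); '*' marks the TAA/TAG stops.
INV_MITO = dict(zip(CODONS, "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"))

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def anticodon(codon):
    """Anticodon (5'->3', RNA letters) that base-pairs with a DNA-alphabet codon."""
    return "".join(PAIR[b] for b in reversed(codon)).replace("T", "U")


def translate(seq, stopless=False):
    """Translate frame 0. Regular code: stop at the first stop codon.
    Stopless code: read stops through, writing the placeholder 'X'."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = INV_MITO[seq[i:i + 3]]
        if aa == "*":
            if not stopless:
                break
            aa = "X"
        peptide.append(aa)
    return "".join(peptide)


print({stop: anticodon(stop) for stop in ("TAA", "TAG")})  # {'TAA': 'UUA', 'TAG': 'CUA'}
print(translate("ATGTAAGCT"), translate("ATGTAAGCT", stopless=True))  # M MXA
```

The same nucleotide sequence yields a one-residue peptide under the regular code and a longer peptide when a suppressor tRNA reads TAA.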
Translation of stop codons occurs in different organisms [22, 23], but seems particularly widespread in mitochondria [2, 3], as reflected by the evolution of mitochondrial genetic codes. Nuclear genetic codes lacking dedicated stop codons have been described in protists [24, 25, 26, 27] and fungi.
1.1. Alternative coding by expanded codons
Switching between regular and stopless genetic codes is not the only strategy that dramatically increases the information encoded by genomes. Isolated tetracodons, codons expanded by a fourth silent nucleotide, have been known since the dawn of molecular biology. These are sometimes translated by tRNAs with expanded anticodons [30, 31]. It seems probable that systematic frameshifts produce stretches of tetracodons that code for (yet) undetected peptides [32, 33, 34]. Some evidence suggests that tetracoding occurs especially at high environmental temperatures and is predicted by genetic code optimization: codon-anticodon interactions are more stable when four rather than only three base pairs hybridize. Other theoretical considerations suggest that the vertebrate mitochondrial genetic code evolved from a specific subset of 64 tetracodons, the tesserae, chosen on the basis of symmetry principles.
Indeed, analyses of mitochondrial mass spectra searching for peptides matching translations assuming tetra- and pentacodons, codons expanded by one or two silent nucleotides, detected numerous tetra- and pentacoded peptides [38, 39, 40, 41, 42, 43].
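Reading a sequence by expanded codons changes which triplets are translated. The sketch below assumes the invertebrate mitochondrial code (NCBI table 5) and places the silent nucleotide(s) at the 3′ end of each expanded codon; other positions of the silent nucleotide are conceivable and are not modeled. The function name is hypothetical.

```python
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
INV_MITO = dict(zip(CODONS, "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"))


def translate_expanded(seq, n_silent=0):
    """Translate frame 0 assuming each codon is followed by n_silent silent nucleotides.
    n_silent=0: regular triplets; 1: tetracodons; 2: pentacodons.
    Stops are kept as '*' and do not terminate translation."""
    step = 3 + n_silent
    return "".join(INV_MITO[seq[i:i + 3]] for i in range(0, len(seq) - 2, step))


seq = "ATGAGCTCTGGG"
for n in (0, 1, 2):
    print(n, translate_expanded(seq, n))  # 0 MSSG, 1 MAW, 2 ML
```

The same 12 nucleotides give three different peptides depending on codon length, which is why mass spectra must be searched against each reading separately.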
1.2. Alternative coding by swinger polymerization
Further, little-known mechanisms increase the number of proteins potentially coded by single sequences. Polymerization occasionally exchanges nucleotides systematically during DNA replication [44, 45, 46] or RNA transcription [47, 48, 49, 50, 51, 52, 53] over long sequence stretches, producing swinger sequences (23 exchange rules are possible: nine symmetric, e.g., A<>C, and fourteen asymmetric, e.g., A > C > G > A). Swinger replication, in particular the double symmetric exchange A<>T + C<>G, seems most frequent for mitochondrial ribosomal RNAs. This increases the coding potential of rRNAs, strengthening the hypothesis that rRNAs are modern remnants of protogenomes that templated, by dense overlap coding, for translational molecules (tRNA-like and rRNA-like) and protein coding genes [54, 55, 56, 57, 58]. This is compatible with the occurrence of protein coding regions within modern rRNAs [8, 59, 60, 61].
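The count of 23 swinger rules follows from treating each rule as a non-identity permutation of the four nucleotides (4! − 1 = 23); symmetric rules are those that undo themselves when applied twice. The sketch below checks the 9/14 split and applies two example rules; the names are hypothetical, and reading "A > C > G > A" as A→C, C→G, G→A is an assumption about the notation.

```python
from itertools import permutations

NUCS = "ACGT"


def swinger_rules():
    """All non-identity bijective nucleotide exchanges (4! - 1 = 23 rules)."""
    rules = []
    for image in permutations(NUCS):
        rule = dict(zip(NUCS, image))
        if any(n != rule[n] for n in NUCS):
            rules.append(rule)
    return rules


def is_symmetric(rule):
    """Symmetric rules (e.g. A<>C) restore the sequence when applied twice."""
    return all(rule[rule[n]] == n for n in NUCS)


def swinger(seq, rule):
    """Apply an exchange rule to every nucleotide of seq."""
    return "".join(rule[n] for n in seq)


rules = swinger_rules()
symmetric = [r for r in rules if is_symmetric(r)]
print(len(rules), len(symmetric), len(rules) - len(symmetric))  # 23 9 14

AT_CG = {"A": "T", "T": "A", "C": "G", "G": "C"}  # double symmetric A<>T + C<>G
A_C_G = {"A": "C", "C": "G", "G": "A", "T": "T"}  # asymmetric A > C > G > A
print(swinger("AACGT", AT_CG), swinger("AACGT", A_C_G))  # TTGCA CCGAT
```

The nine symmetric rules are the six single exchanges plus the three double exchanges; the fourteen asymmetric ones are the eight three-nucleotide cycles and the six four-nucleotide cycles.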
I stress here that the exchange A<>T + C<>G is not trivial: it creates the complement of the template sequence, which is not the regular inverse (or reverse) complement. “Complement” is frequently used as a shortcut for inverse complement, but the A<>T + C<>G transformed sequence is a different sequence because it lacks the 3′-to-5′ inversion combined with nucleotide complementarity. The shortcut was unambiguous only as long as complementary sequences lacking this inversion had not been described, that is, before the description of A<>T + C<>G transformed sequences.
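The difference between the complement (the A<>T + C<>G swinger) and the inverse complement is easy to see on a short sequence. A minimal sketch using only Python's built-in `str.translate`; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """A<>T + C<>G swinger: each base replaced by its complement, order kept."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Inverse (reverse) complement: complement plus inversion of the order."""
    return seq.translate(COMPLEMENT)[::-1]


template = "ACGTTG"
print(complement(template))          # TGCAAC
print(reverse_complement(template))  # CAACGT
```

Both outputs use the same base pairing, but only the second is the antisense strand read in its own 5′-to-3′ direction.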
It seems that regular transcription occasionally switches abruptly to swinger transcription (and vice versa), as indicated by chimeric RNAs: one part of each such RNA corresponds to the regular DNA, and an adjacent part corresponds to the DNA only after accounting for a swinger transformation. Corresponding chimeric peptides have also been detected. Chimeric DNA also exists: the mitogenome of the stonefly
1.3. Regulation of alternative transcriptions by secondary structure
Secondary structures are important components of RNA function and evolution [64, 65]. Secondary structures formed by tRNA sequences punctuate posttranscriptional processing of mitochondrial transcripts [66, 67, 68]. Palindromes potentially forming secondary structures after swinger transformation of the sequence are associated with detected mitochondrial swinger RNAs. This is similar to what is known from regular RNA processing in mitochondria and, surprisingly, in giant viruses, which also bear other striking resemblances to mitogenomes, including similar gene order. Transcription sometimes systematically deletes mono- or dinucleotides after transcribing trinucleotides (del-transcription); the resulting delRNAs are translated into peptides that partly converge with peptides translated from regular RNAs by tRNAs with expanded anticodons. Del-transcription, or at least detection of delRNAs, seems downregulated by secondary structures formed after transformation of the sequence by systematic deletions.
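The convergence between del-transcription and expanded-codon translation can be illustrated in its simplest case: if one nucleotide is deleted after every transcribed trinucleotide, reading the delRNA by regular triplets gives the same peptide as reading the original sequence by tetracodons whose fourth nucleotide is silent. The sketch assumes the invertebrate mitochondrial code (NCBI table 5) and deletions in strict register; function names are hypothetical.

```python
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
INV_MITO = dict(zip(CODONS, "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"))


def del_transform(seq, deleted=1):
    """Simulated del-transcription: copy 3 nucleotides, skip `deleted`, repeat."""
    step = 3 + deleted
    return "".join(seq[i:i + 3] for i in range(0, len(seq), step))


def translate_triplets(seq):
    """Regular reading of complete triplets in frame 0."""
    return "".join(INV_MITO[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def translate_tetracodons(seq):
    """Expanded-codon reading: 3 translated nucleotides + 1 silent 3' nucleotide."""
    return "".join(INV_MITO[seq[i:i + 3]] for i in range(0, len(seq) - 2, 4))


dna = "ATGAGCTCTGGG"
del_rna = del_transform(dna)  # 'ATGGCTTGG'
print(translate_triplets(del_rna), translate_tetracodons(dna))  # MAW MAW
```

Real delRNAs need not keep deletions in register, so the two readings would then converge only over parts of the sequence.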
1.4. Mechanisms that switch between genetic codes?
Phenomena that systematically exchange nucleotides recall more specific mechanisms that alter the genetic code according to which a protein is coded, from a regular genetic code to a stopless genetic code, and/or vice versa. Indeed, previous analyses of mitogenomes revealed that in one GenBank mitogenome (from a marine turtle, the Olive Ridley,
This was originally interpreted as resulting from directed selection on stop codons. Observations of systematic mutations in contexts creating stop codons in ORFs of HIV genes, specifically in elite controllers immune to HIV [73, 74], suggest that enzymatically directed mutagenesis during DNA replication and/or editing could transform ORFs coded according to a regular genetic code into ORFs coded by a stopless genetic code, and vice versa, as observed in several mitochondrial genes of the Olive Ridley. For HIV, introducing stops presumably drastically reduces viral production and contributes to immunity.
This hypothesis of mutations directed at stop codons is in line with observations that polymerase errors are more frequent in stop codon contexts, interpreted as an adaptive bias to introduce mutations in stops. In the next section, GenBank is explored to detect further mitogenomes in which genetic codes were switched by producing stop codons in ORFs and depleting stops in other frames.
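One simple way to make "stop codon contexts" operational is to list every point substitution that would turn a sense codon of an ORF into a stop. The sketch below defaults to the TAR stops of the invertebrate mitochondrial code and uses a hypothetical helper name; it only enumerates candidate sites and says nothing about which mutations a polymerase actually favors.

```python
TAR = frozenset({"TAA", "TAG"})  # stops of the invertebrate mitochondrial code


def stop_creating_mutations(orf, stops=TAR):
    """Return (codon_index, position_in_codon, old_base, new_base, new_codon)
    for every point substitution that turns a sense codon into a stop."""
    hits = []
    for ci in range(len(orf) // 3):
        codon = orf[3 * ci:3 * ci + 3]
        if codon in stops:
            continue  # already a stop
        for pos in range(3):
            for base in "ACGT":
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if mutant in stops:
                    hits.append((ci, pos, codon[pos], base, mutant))
    return hits


print(stop_creating_mutations("ATGTTATGG"))
# [(1, 1, 'T', 'A', 'TAA'), (2, 1, 'G', 'A', 'TAG')]
```

The stop set matters: with the standard code's TGA added, TTA and TGG each gain a further stop-creating substitution.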
2.1. Exploring GenBank for genetic code switches
Previous Blastp searches found proteins already described in GenBank and aligning with hypothetical peptides translated from randomly chosen frameshifted vertebrate mitochondrial genes. These analyses detected the unusual proteins translated from ORFs of the mitogenome of
2.2. Choice of seed sequences for BLAST searches
The method described above only detects homologies for sequences sufficiently similar to “seed” sequences used for BLAST analyses of GenBank. Therefore, using as seed the human mitogenome, mainly vertebrate proteins were detected, as for the above-mentioned
This search principle for insect nucleotide sequences can also be applied to proteins. I use as seed the five peptides translated from the five “noncoding” frames of the 13 regular protein coding genes of
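Preparing such seeds amounts to a six-frame translation that reads through stops, followed by export of the five non-annotated frames. The sketch below is a minimal illustration under stated assumptions: the invertebrate mitochondrial code (NCBI table 5), stops written as `X` so each peptide spans the whole gene, and hypothetical frame labels (`sense+0` for the annotated frame, `antisense+k` read on the reverse complement). It does not run BLAST; the FASTA text would be submitted separately.

```python
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
INV_MITO = dict(zip(CODONS, "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_through(frame):
    """Translate every complete codon, writing stops as 'X'."""
    return "".join(INV_MITO[frame[i:i + 3]] for i in range(0, len(frame) - 2, 3)).replace("*", "X")


def six_frame_peptides(gene):
    """Peptides from the six frames, keyed by hypothetical labels."""
    antisense = gene.translate(COMPLEMENT)[::-1]
    peptides = {}
    for strand, s in (("sense", gene), ("antisense", antisense)):
        for k in range(3):
            peptides[f"{strand}+{k}"] = translate_through(s[k:])
    return peptides


def to_fasta(name, peptides, skip=("sense+0",)):
    """FASTA records for the 'noncoding' frames, e.g. as Blastp queries."""
    return "\n".join(f">{name}|{label}\n{pep}" for label, pep in peptides.items() if label not in skip)


peps = six_frame_peptides("ATGTTATGGAAT")
print(peps["sense+0"], peps["antisense+0"])  # MLWN IPXH
print(to_fasta("toy_gene", peps))
```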
3.1. Preliminary results from
Preliminary BLAST analyses of peptides translated from noncoding frames of
Aleurodicus dispersus protein coding genes
All six frames of the 13 mitochondrial protein coding genes of
These analyses confirm that GenBank annotations of the six
Mitochondrial metabolism without the regular proteins usually translated from these seven genes seems impossible. Regular analyses of the mitogenome of
3.3. Recoding of mitogenes in
3.3.1. Two ORFs on the same strand: AT8
The case of the missing ATP synthetase subunit 8 is solved by Blastp analysis of the peptide coded by the +2 frameshifted sequence of gene AT8. Residues at positions 8–48 (the gene has 49 codons including stop codon) in frame +2 have 75% similarity with congeneric mitochondrial ATP synthetase subunit 8 of
3.3.2. Stop codon translation after frameshift: ND6
For gene ND6, the stopless ORF annotated in GenBank does not align with any ND6-like protein. This conundrum is solved by Blastp analysis of the peptide translated from the +1 frameshifted sequence of ND6 as it is annotated in GenBank. It aligns with 86% similarity with the mitochondrial NADH dehydrogenase subunit 6 of congeneric
3.3.3. Frameshift with stop translation: CytB
The situation in CytB is similar and again recalls known cases of proteins coded by two frames. The ORF as annotated in GenBank has high homology (96% similarity) from residue 137 to 355 with the regular cytochrome B of
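Coding by two frames with stop translation, as described for ND6 and CytB, can be mimicked by translating the annotated frame up to a breakpoint and continuing in a shifted frame with stops read through. This is a toy sketch: the breakpoint, shift size, `X` placeholder for translated stops and the invertebrate mitochondrial code (NCBI table 5) are assumptions, and the function names are hypothetical.

```python
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
INV_MITO = dict(zip(CODONS, "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"))


def translate_through(seq):
    """Translate complete codons of frame 0, reading stops through as 'X'."""
    return "".join(INV_MITO[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3)).replace("*", "X")


def frameshift_translate(seq, shift_at, shift=1):
    """Read the annotated frame up to nucleotide `shift_at`, then skip `shift`
    nucleotides (+1 or +2 frameshift) and continue in the new frame."""
    if shift_at % 3:
        raise ValueError("shift_at must fall on a codon boundary")
    return translate_through(seq[:shift_at]) + translate_through(seq[shift_at + shift:])


gene = "ATGGCTCTAATGG"
print(translate_through(gene))        # MALM (annotated frame only)
print(frameshift_translate(gene, 6))  # MAXW (+1 shift after codon 2, TAA read as X)
```

In a real reanalysis the breakpoint would be inferred from where homology with the reference protein switches frames, not fixed in advance.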
3.3.4. Stop codon depletion in antisense strand: ND1
The annotation in GenBank for ND1 does not correspond to a protein homologous to NADH dehydrogenase subunit 1. However, the peptide translated from the +1 frame of the opposite (antisense) strand has high homology with NADH dehydrogenase subunit 1 from
The fact that this antisense frame is stop codon depleted in
3.3.5. Stop codon depletion in antisense strand and stop codon translation: ND4L and ND5
Annotations in GenBank for
For ND5, the peptide translated from the +1 frame of the antisense of the GenBank-annotated sequence is homologous over its complete length to NADH dehydrogenase subunit 5 of
3.3.6. Stop codon depletion in antisense strand, frameshift, and stop translation: ND4
A further mitochondrial gene for which the GenBank annotation does not produce the expected protein for
4. General discussion
4.1. Genomic stop codon-depletion: overall analysis
Table 1 presents numbers of stops in all six frames of the mitochondrial genes of
To account for slight variations in gene sizes, I compare percentages of stop codons, averaged across all six frames, between the two species, gene by gene. Mean stop codon percentages decrease in 11 of the 13 protein coding genes of
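The per-gene comparison can be sketched as follows: compute the percentage of TAR codons in each of the six frames, average them, then compare species gene by gene. As one possible way to gauge a count such as 11 decreases among 13 genes, an exact one-sided sign test is included; this is an illustration under that assumption, not necessarily the test used in this study. Function names and the toy sequence are hypothetical.

```python
from math import comb

TAR = ("TAA", "TAG")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def stop_percentages(gene, stops=TAR):
    """Percentage of codons that are stops in each of the six frames of a gene."""
    antisense = gene.translate(COMPLEMENT)[::-1]
    result = {}
    for strand, s in (("sense", gene), ("antisense", antisense)):
        for k in range(3):
            codons = [s[i:i + 3] for i in range(k, len(s) - 2, 3)]
            n_stops = sum(c in stops for c in codons)
            result[f"{strand}+{k}"] = 100.0 * n_stops / len(codons) if codons else 0.0
    return result


def mean_stop_percentage(gene):
    """Stop percentage averaged over the six frames (size-independent comparison)."""
    pct = stop_percentages(gene)
    return sum(pct.values()) / len(pct)


def sign_test_upper(decreases, n_genes):
    """Exact one-sided sign test: P(X >= decreases) for X ~ Binomial(n_genes, 0.5)."""
    return sum(comb(n_genes, k) for k in range(decreases, n_genes + 1)) / 2 ** n_genes


print(stop_percentages("TAAGCTTAG"))
# With paired gene sequences (placeholders), one would count:
# decreases = sum(mean_stop_percentage(a) < mean_stop_percentage(b) for a, b in pairs)
print(sign_test_upper(11, 13))  # 92/8192, about 0.0112
```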
Presumably, unknown mechanisms associated with replication depleted stop codons in this species’ mitogenome, perhaps cumulatively over several replication or DNA editing cycles. Total stop codon depletion in some frames produced new ORFs. Natural selection against stop codons presumably enhanced unknown enzymatic phenomena that eliminate stop codons in these frames. It seems plausible that these frames in usual mitogenomes code for proteins translated by stop suppressor tRNAs. Specific unknown conditions in
4.2. Coding redundancy between frames and tolerating ribosomal frameshifts
The original hypothesis of frame shiftability suggests that different frames of a gene code for somewhat similar peptides, presumably because the genetic code is optimized to tolerate frameshifts [83, 84, 85]. This hypothesis suggests that redundancy among frames in
I used ClustalX to align the regular protein with peptides coded by the +1 and +2 frames of the same coding strand, for each
This analysis tentatively indicates that stop codon depletion, coding by frameshifting, and translation of stop codons might be associated with a phenomenon that increases tolerance to frameshifts during translation. Indeed, frequencies of off-frame stop codons in mitochondrial genes are inversely correlated with predicted ribosomal RNA stability [86, 87, 88], suggesting that genes adapt to avoid negative effects of ribosomal frameshifts [44, 89, 90]. Stop codon depletion may enable coding for more proteins, in addition to increasing redundancy between frames. Several effects could explain why the results on redundancy between frames are statistically weak. This hypothesis should be tested further, experimentally as done by Wang et al. [83, 84, 85] and by other bioinformatic analyses. For example, one can expect frameshift tolerance to favor identity between frames at amino acids that are not easily replaced by others (e.g., cysteine), and less so at readily exchangeable ones (leucine, isoleucine, etc.). The preliminary tests presented here are not incompatible with the frameshift tolerance hypothesis [83, 84, 85].
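A crude, alignment-free version of the frame redundancy test can be sketched as follows: translate frames 0, +1 and +2 of the same strand, score the fraction of identical residues position by position, and compare against shuffled peptides. This deliberately replaces the ClustalX gapped alignment with ungapped identity, assumes the invertebrate mitochondrial code (NCBI table 5), and uses hypothetical function names and a fixed random seed.

```python
import random
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
INV_MITO = dict(zip(CODONS, "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"))


def translate_through(seq):
    """Translate complete codons of frame 0, reading stops through as 'X'."""
    return "".join(INV_MITO[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3)).replace("*", "X")


def identity(p1, p2):
    """Percent identical residues over the shorter peptide, without gaps."""
    n = min(len(p1), len(p2))
    if n == 0:
        return 0.0
    return 100.0 * sum(a == b for a, b in zip(p1, p2)) / n


def frame_redundancy(gene):
    """Identity of the +1 and +2 peptides (same strand) to the frame-0 peptide."""
    p0 = translate_through(gene)
    return {k: identity(p0, translate_through(gene[k:])) for k in (1, 2)}


def permutation_p(p1, p2, n_perm=999, seed=1):
    """Share of shuffled versions of p2 at least as identical to p1 as p2 itself."""
    rng = random.Random(seed)
    observed = identity(p1, p2)
    residues = list(p2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(residues)
        if identity(p1, "".join(residues)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


print(frame_redundancy("ATGAAAAAAAAA"))
```

Shuffling preserves amino acid composition, so a low permutation value points to positional correspondence between frames rather than shared composition alone.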
It is important to understand in this context that the discovery of the genetic code, among the greatest fundamental discoveries, is not over but still in progress. Indeed, coding sequences include much more information than generally believed, even beyond RNA editing (RDD), systematic transformations during replication [44, 45, 46] and transcription [39, 47, 48, 49, 50, 51], and translation along expanded codons [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]. Cryptic codes [92, 93], such as the natural circular code, which is backed by a well-developed theory [94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112] and regulates the ribosomal translation frame [113, 114, 115, 116], and codes for protein cotranslational folding, remain to be described and decoded.
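How a circular code could signal the reading frame can be sketched with the 20 trinucleotides commonly listed for the natural circular code X: count X codons in each of the three frames and pick the frame with the most. This frame-retrieval rule is a deliberately crude heuristic chosen for illustration, not the method of the circular code literature; the function names are hypothetical. The test checks that the listed set is self-complementary, a known property of X.

```python
# The 20 trinucleotides commonly listed for the natural circular code X.
X_CODE = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}


def x_counts(seq):
    """Number of X trinucleotides in frames 0, 1 and 2 of seq."""
    return [sum(seq[i:i + 3] in X_CODE for i in range(k, len(seq) - 2, 3)) for k in range(3)]


def likely_frame(seq):
    """Heuristic: the frame richest in X trinucleotides (ties go to the lowest frame)."""
    counts = x_counts(seq)
    return counts.index(max(counts))


print(x_counts("GAAGTCCTG"), likely_frame("GAAGTCCTG"))  # [3, 0, 0] 0
```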
4.3. Sequencing artifacts and genome annotation
During the redescription of the recoded
Recoding probably occurs beyond mitogenomes. However, short, highly conserved mitogenomes are best suited to manual reannotation, a necessary first step for detecting events in which genes are recoded from one genetic code to another. I suggest that annotations of genomes, and of mitogenomes in particular, systematically take into account phenomena such as swinger sequences and directed stop codon depletions, which may result in ORFs that do not code for regular recognized proteins, as presented here, especially in genomes/genes that seem unusual and remain unverified in GenBank. Figure 5 summarizes the changes in coding structures that occurred in
In vertebrate mitochondria, BLAST analyses show that peptides translated from frames that are not recognized ORFs and that contain stop codons align with high homology to proteins translated from regular mitochondrial ORFs in GenBank. Many such ORFs code for peptides matching usually noncoding sequences and occur in the mitogenome of
Several phenomena, and their combination, explain this situation. Alignment analyses detect the coding rules for these genes: frameshift, translation of stop codons, and depletion of stop codons in usually noncoding frames. Previous analyses detected in
Some mitochondrial protein coding genes in
This study was supported by Méditerranée Infection and the National Research Agency under the program “Investissements d’avenir,” reference ANR-10-IAHU-03, and the A*MIDEX project (no ANR-11-IDEX-0001-02).
Conflict of interest
The author declares no conflict of interest.