Percentage of TAR10 and ATR49 triplets in various taxa independently of the V-R size.
In all taxa and genomic systems, numerous trn genes (specifying tRNAs) exhibit, at specific conserved positions, nucleotide triplets corresponding to stop codons (TAG/TAA). Similarly, relatively high frequencies of start codons (ATG/ATA) occur in fungal and metazoan mitochondrial trn genes. The last nucleotide of these triplets is the first one involved in the 5′-D-stem or the 5′-T-stem, respectively. Their frequencies depend on the tRNA species. The products of genes that bear one or both types of these codons are called ss-tRNAs (for stop/start). Metazoan mt-genomes are generally very compact, and many same-strand overlapping sequences may simultaneously code for tRNAs and mRNAs. However, this study suggests that overlaps are not a direct mechanism to substantially reduce genome size. For protein-encoding genes, when possible overlaps are disregarded, only alternative start codons and/or truncated stop codons are found, whereas the first putative in-frame standard initiation codon or complete stop codon lies in the upstream or downstream overlapping ss-trn sequence, respectively. Although experimental data are still missing, stress signals might regulate whether extended proteins are produced. Finally, possible implications of tRNA/mRNA hybrid molecules in the transition from the “RNA world” to the “RNA/protein world” are discussed.
- tRNA origin
- start codon
- stop codon
- origin of life
- RNA world
Transfer RNAs are key partners in the ribosome-translation machinery. Generally, they are composed of c.70–90 nucleotides (nts). Moreover, they are the most abundant nucleic acid species, constituting up to 10% of all cellular RNAs. For example, the number of tRNA molecules is about 2 × 10⁵ in an Escherichia coli cell and 3 × 10⁶ in a yeast cell. Through their anticodon, they read genetic information on mRNAs and deliver codon-specified amino acids attached to their distal 3′-extremity for peptide bond synthesis on the ribosome. In this sense, tRNA is a key molecule that makes it possible to pass from a covalent bond between an RNA and an amino acid (a fossil trace of the transition from the RNA world to the RNA/protein world) to peptide bonds (RNA/protein world). Genes specifying tRNAs (noted trn) are present in prokaryotic and nuclear genomes and in most organelle DNAs (chloroplasts and mitochondria). Usually, tRNAs have a characteristic canonical cloverleaf secondary structure made up of the aminoacyl acceptor-stem, the D-arm (so named because it contains dihydrouridine), the anticodon-arm, and the T-arm (named for the sequence TΨC, where Ψ is pseudouridine); each of these hairpins, or “arms,” consists of a stem (a helical region in 3D) ending in a loop (Figure 1). The length of each arm, as well as the loop “diameter,” varies with the tRNA type and from species to species. Furthermore, deduced trn sequences and even sequenced mature tRNAs may exhibit reduced D-arms or T-arms or even lack at least one of them, and in extreme situations, such as in Enoplea (nematodes), mitochondrial (mt) tRNAs are totally armless. However, around 90% of the mt-tRNAs fold into the canonical cloverleaf structure. In all the genetic systems, the tRNAs can carry a myriad of idiosyncratic posttranscriptional chemical modifications (e.g.,
The ribosome allows the best possible spatial arrangement of the various partners and ensures catalysis, but the adaptor molecule which acts as a link between codes of mRNA and amino acids of polypeptides is the tRNA. In order to fill this major role, tRNAs have two distinct characteristics corresponding to two different genetic codes, the anticodon and the operational codes. The latter which is mainly embodied in the acceptor-stem allows to bind covalently and with high specificity an amino acid to a tRNA, a reaction catalyzed by a specific aminoacyl-tRNA synthetase (aaRS) . The operational code might have actually predated the “classic” code associated with anticodons . Moreover, the tRNAs exhibit diversity in uniqueness, all of them must be similar for entering the ribosome machinery; therefore, they generally look structurally homogeneous, especially in their secondary and tertiary structures even if “non-classical” tRNAs are known . Moreover, cloverleaf structure and especially the tertiary interaction network governing the L-shaped tRNA architecture imply conserved and semiconserved bps and nts. On the other side, each type of tRNA structures must interact specifically with aaRSs and posttranscriptional modification enzymes, which implies that parts of their sequences and of their structures (as the V-R size) allow to distinguish them.
Reduced bacterial and most organelle genomes do not encode the full set of 32 tRNA species required to read all triplets of the standard genetic code according to the conventional wobble rules. Superwobbling where a single tRNA species contains modifications of the anticodon-loop, such as an hypermodified uridine at the wobble position 34 of the anticodon, reads all 4 nts at third codon position and has been suggested as a possible mechanism for how reduced tRNA sets may be functional . Indeed, many metazoan mtDNAs have only a total of 22 tRNAs, apparently sufficient to recognize all codons (two tRNAs each for serine and leucine and one tRNA for each of the other 18 amino acids). However, superwobbling induces a reduced translational efficiency, which could explain why most organisms have adopted pairs of isoaccepting tRNAs over the superwobbling mechanism . Moreover, e.g., in Cnidaria (sea anemones, corals, etc.) or Chaetognatha (marine invertebrates), current mtDNAs have lost several of their trn genes, and the absence of an apparently full set of mt-trn genes has also been mentioned . Studies have investigated the fate of missing tRNAs and their corresponding aaRSs , and in many cases, the lost tRNAs are functionally replaced by imported nucleus-encoded tRNAs . However, recent search strategies suggest that efficient reanalyzes detect several tRNA-like structures (TLS), which can be efficient tRNAs .
Compared to mitochondria found in other eukaryotic kingdoms, those of metazoa are massively reduced in their genetic structure. Their mtDNA is a short, circular molecule that generally contains about 13 intronless protein-coding genes, all of which are involved in aerobic respiration (also called oxidative phosphorylation). Moreover, the coding sequences of genes are usually separated by at most a few nts, and long polycistronic precursor transcripts may be processed into mature mRNAs and rRNAs by precise cleavage at the 5′- and 3′-termini of the flanking tRNAs. This processing, known as the tRNA punctuation model, is mediated by the RNase P and RNase Z endonucleases, respectively. However, this model is not always applicable: some genes are not flanked by trn genes, or the latter may not be involved in the processing of precursor RNAs. Besides, in several taxa, mt-mRNAs, rRNAs, and even tRNAs may be oligoadenylated or polyadenylated. Polyadenylation has potentially dual and opposite roles: it promotes transcript stability or offers a target for initiating degradation. Overlapping genes on the same DNA strand occur throughout metazoa. Therefore, the termination points of protein-encoding genes can be difficult to infer, as stop codons (generally UAA or UAG) may be absent. It is accepted that abbreviated stop codons (U or UA) are converted to UAA codons by polyadenylation after transcript cleavage, and this has been confirmed by analyses of transcripts in some cases. Sometimes, the initiation codon also cannot be detected. For several protein-encoding genes, the question of a possible overlap with adjacent downstream or upstream trn genes is often raised. Moreover, overlaps between adjacent mt-trn genes are frequent, but they are beyond the scope of this chapter [19, 20].
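The completion of an abbreviated stop codon by polyadenylation, as described above, can be illustrated with a minimal sketch. The function name, the toy transcripts, and the default tail length are hypothetical; the only rule taken from the text is that a terminal U or UA left in frame after cleavage becomes UAA once A residues are added.

```python
# Minimal sketch (hypothetical helper): an mRNA cleaved after an in-frame
# "U" or "UA" acquires a complete UAA stop codon when a poly(A) tail is added.
STOP_CODONS = {"UAA", "UAG"}

def stop_completed_by_polyadenylation(mrna, orf_start=0, tail_len=20):
    """Return the stop codon created by the poly(A) tail, or None.

    mrna      -- cleaved transcript (RNA alphabet), before polyadenylation
    orf_start -- 0-based index of the first nt of the reading frame
    tail_len  -- number of A residues added (assumed >= 2)
    """
    remainder = (len(mrna) - orf_start) % 3
    if remainder == 0:
        return None  # the transcript ends on a complete codon
    partial = mrna[len(mrna) - remainder:]      # trailing 1 or 2 nts
    codon = (partial + "A" * tail_len)[:3]      # first codon spanning the tail
    return codon if codon in STOP_CODONS else None

print(stop_completed_by_polyadenylation("AUGGCUU"))   # ends in U
print(stop_completed_by_polyadenylation("AUGGCUUA"))  # ends in UA
```

A transcript ending in any other incomplete codon (for example C) yields no stop codon under this rule, so the helper returns None.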
Incidentally, in 2004, during a search for chaetognath mt-trn genes, it was observed that tRNAs bear nt triplets corresponding to stop or start codons at precise conserved positions, and this observation constitutes the original topic of this chapter.
2. Material and methods
Most of the research was done in two databases which include primary sequences and graphical representations of tRNA 2D structures, tRNAdb (
3. Results and discussion
3.1. Frequencies of TAR10 and ATR49 triplets in various taxa
Visual inspection of deduced tRNA 2D structures suggested that nt triplets that could correspond to stop or start codons are particularly well represented at specific positions. The UAR (R for purine) triplets occupy positions 8–10 in the standard numbering and will therefore be named UAR10, whereas the potential initiation codons whose last nt is at position 49 will be called AUR49 (Figure 1). We chose to number the codons only according to their last nt because nt 47 is frequently missing in metazoan mt-tRNAs. The analyses focus on DNA; hence, the triplets are usually written TAR or ATR instead of UAR or AUR. All tRNAs that bear one or both of these codons are named ss-tRNAs (ss for stop and start), and the corresponding genes ss-trn. Using the tRNAdb and mitotRNAdb databases, the frequencies of these triplets were investigated in different taxa, including nuclear and organelle genomes for eukaryotes (Table 1). Excluding taxa for which the number of trn genes is too low for statistical analysis, TAR10 always occurs at high frequencies, whether in prokaryotic, nuclear, or organelle genomes. Values range from 41.1% for fungi to 81.6% for pseudocoelomates. In all taxa, with all tRNA species combined, the percentage of TAG10 triplets is always significantly higher than that of TAA10. The differences are very large in prokaryotic and nuclear genomes, since the percentage of TAA10 is always less than 1%, while that of TAG10 is at least 40%. Within the organelle genomes, the difference is smaller, but the two percentages still differ by a factor of 2.5–22.
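The two definitions above can be made concrete with a minimal sketch. It is not the procedure used for Table 1. It assumes that each trn sequence is given 5′→3′ on the coding strand, that standard positions 1–10 coincide with the first 10 nts of the sequence, and that the 0-based index of the nt at standard position 49 is supplied from a 2D-structure annotation (because nt 47 is often missing, this index cannot be derived from sequence length alone). All function names and toy sequences are hypothetical.

```python
# Minimal sketch with hypothetical helpers and toy sequences.
TAR = {"TAG", "TAA"}  # stop-like triplets (R = purine)
ATR = {"ATG", "ATA"}  # start-like triplets

def has_tar10(trn_seq):
    """True if the nts at standard positions 8-10 form TAG or TAA.
    Assumes positions 1-10 map directly onto the first 10 nts."""
    return trn_seq[7:10].upper() in TAR

def has_atr49(trn_seq, idx49):
    """True if the triplet whose last nt is at standard position 49 is ATG/ATA.
    idx49 is the 0-based sequence index of position 49 (from an annotation)."""
    if idx49 < 2:
        return False
    return trn_seq[idx49 - 2:idx49 + 1].upper() in ATR

def tar10_percentage(trn_seqs):
    """Percentage of sequences carrying a TAR10 triplet (0.0 for an empty list)."""
    if not trn_seqs:
        return 0.0
    return 100.0 * sum(has_tar10(s) for s in trn_seqs) / len(trn_seqs)

toy_genes = ["GCATTGGTAGCTCAG", "GCATTGGTACCTCAG"]  # TAG vs. TAC at 8-10
print(tar10_percentage(toy_genes))
print(has_atr49("CCCATGCC", 5))  # ATG ends at index 5
```

Passing the position-49 index explicitly keeps the sketch independent of any particular alignment or numbering tool.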
As the TAR10 triplet (principally TAG) is present in at least 40% of the trn genes for all taxa and genomic systems combined, it could have been present in the trn genes of the Last Universal Common Ancestor (LUCA), which presumably lived some 3.5–3.8 billion years ago. It is probably an ancestral character that was present in proto-trn sequences. As the percentage of TAA10 strongly increases in the trn genes of organelles, one can ask whether this character was already present in their bacterial ancestor. It is now assumed that, despite their diversity, all mitochondria derive from an endosymbiotic α-proteobacterium that was integrated into a host cell related to Asgard Archaea approximately 1.5–2 billion years ago. However, the earliest fossils possessing features typical of fungi date to 2.4 billion years ago. Moreover, eukaryotic cells would be chimeras composed of an archaebacterium and one or more Eubacteria. In addition, all current models for the origin of eukaryotes suggest that the eukaryotic common ancestor had mitochondria. As the level of TAA10 is very low in the trn genes of α-proteobacteria, a high TAA10 level could therefore be a derived trait that may be related to the increase in AT% in mtDNA and/or to recognition constraints imposed by mt-aaRSs and modification enzymes. Similarly, it is generally accepted that all chloroplasts and their derivatives derive from a single cyanobacterial ancestor, and in current cyanobacteria, the percentages of TAG10 and TAA10 triplets are 62.5 and 3.6, respectively. The increase in the percentage of TAA10 thus characterizes organelles.
In all the taxa for all tRNA species combined, the ATR49 triplets are always present in smaller numbers than TAR10. Moreover, their numbers are negligible except in organelle genomes, mainly mitochondria. The low level of ATR49 triplets in Pseudocoelomata is due to the frequent absence of T-arm in their mt-tRNAs. In mitochondria, in some taxa, the frequency of ATG49 is higher than of ATA49, while in others, the opposite occurs. The variability is not surprising, given approximately 2 billion years of mtDNA evolution . It must be noted that the nt G is overrepresented at the 5′-end of the 5′-acceptor- and D-stems, quite often at the 5′-end of the T-stem but rarely at the equivalent position of the anticodon-stem. In taxa where the percentage of ATA49 is higher than those of ATG49, G is most often not the nt majority at the 5′-end of the T-stem. Moreover, differences between the relative percentages of ATG49 and ATA49 could be due, at least in part, to variations in the AT% in organelle DNAs. The percentage of ATR49 is very low in α-proteobacteria and weaker in this last taxon compared to all Proteobacteria or Eubacteria, and it is also very weak in cyanobacteria, and so the significant rate of ATR49 triplets would seem to be a derived condition of organelle DNAs rather than a conserved primitive state lost in current prokaryotes. This trait probably appeared during the transition from endosymbiotic bacterium to permanent organelle that implied massive evolutionary changes including genome reduction, endosymbiotic and lateral gene transfers, and emergence of new genes and the retargeting of proteins . The timing of the mt-endosymbiosis and of the proto-mitochondria to mitochondria transition is uncertain, but one might trace the origin of the ATR49 triplets between at least the first eukaryotic common ancestor (FECA) and the last eukaryotic common ancestor (LECA). 
A second event occurred, at least, in the mitochondria of the ancestors of Opisthokonta (i.e., Metazoa and Fungi), which would have led to a net increase in numbers of ATR49. ATR49 means that the last two nts of the V-R are AT. It turns out that this mainly concerns the mt-trn genes, whose V-R has only 4 nts, which are almost exclusively present in the Fungi/Metazoa clade.
There are large differences in the frequencies of the TAR10 and ATR49 triplets depending on the trn gene species (Table 2) and on the taxa (data not shown). The selective variations in some taxa suggest that the increase in frequency for some types of triplets would be much more recent than mentioned above; in addition, decreases are also observed. There are, however, very conservative trends, such as the presence of ATR49 triplets in genes specifying tRNA-Ala. Analyses of mt-trn genes of Deuterostomia, for which a great number of sequences of each type are available (from 1085 to 1382), show that only the tRNA-Cys and tRNA-Glu species have intermediate TAR10 percentages (Table 2). In all other tRNA species, the values are extreme: 9 tRNA species have values ranging from 0.4 to 9.8%, and 10 have values greater than 82.4% (Table 2). In contrast, half of the tRNA species have low ATR49 percentages (≤10.8%), and for only four types are the percentages ≥77.8%. There also appears to be a tendency for tRNA species with very high or very low TAR10 percentages to have low ATR49 percentages (the tRNA species with the 7 highest and the 8 lowest TAR10 percentages account for 10 of the 11 lowest ATR49 percentages).
3.2. Examples of putative implications of TAR10 and ATR49 as stop and start codons
In order to investigate possible implications of TAR10 and ATR49 triplets in translation, searches were performed in GenBank using the keywords “TAA stop codon is completed by the addition of 3' A residues to the mRNA”, “alternative start codon”, or “start codon not determined”, combined with mitochondrion (or mitochondrial DNA) complete genome. It was then checked whether a trn gene lay upstream (for start codons) or downstream (for stop codons) of the protein-encoding gene. When a trn gene was found, TAR10 or ATR49 triplets were searched for, and the same investigation was then made in conspecific mt-genomes. Using this strategy, these triplets have been found only in metazoan mtDNAs, in which overlapping mt-trn genes have long been known.
An example of the putative use of TAR10 triplets as stop codons is presented in Table 3 for a subclass of parasitic flatworms (Platyhelminthes: Eucestoda). Their mt-genetic code has only UAG and UAA as stop codons, which avoids possible bias due to the use of other types of termination codons. In 51 of 66 complete mt-genomes, the first in-frame potential stop codon of the cox1 gene is in the downstream trnT gene (24 cases with TAG10, suggesting a 10 nt overlap between the cox1 and trnT genes). Authors who consider that such a long overlap is impossible have proposed a number of alternative options favoring overlap avoidance (e.g., ). (1) cox1 might use an earlier atypical stop codon. (2) The 3′-end of the cox1 mRNA could have an abbreviated stop codon (U or UA instead of UAG10) upstream of the trnT gene, which is completed by polyadenylation. (3) If, in the potential long transcript, cleavage occurred just after G10, the cox1 mRNA would end with the complete UAG10 as stop codon, and the first 10 nts of the trnT gene would be added by an unknown editing process. (4) The trnT gene would be shorter at its 5′-end, lacking nts 1 to 8 or 9; this has been proposed, e.g., for the mt-trnT of Cyclophyllidea (Echinococcus granulosus, Hymenolepis diminuta, and Taenia crassiceps). If the full stop codon is used, then there is only a single nt (G10) overlap between cox1 and trnT. Moreover, if the end of the cox1 gene is at the level of T9, the stop codon would be completed by polyadenylation, whereas if the protein gene has a complete stop codon, nt G10 would be added by editing. In the alternative structures, the D-arm is absent, whereas it is typical for this tRNA in digeneans (a class of Platyhelminthes) and in other phyla.
However, mt-trnT genes from Cyclophyllidea for which the first potential stop codon is at different positions (upstream or downstream of the trnT gene, or within this gene but upstream of, downstream of, or at TAG10) exhibit similar secondary structures, including a D-arm. In addition, the high level of nt conservation in the 5′-end of the trnT genes of Cestoda (i.e., G1, G2, T7, T8, A9, G10, T11, T12, and A14) strongly suggests that the 5′-acceptor-stem and the D-stem are under positive selection. All this implies that the hypothesis of D-armless tRNAs is, in our view, improbable.
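The overlap arithmetic discussed for cox1 and trnT can be sketched as follows. This is an illustration under stated assumptions, not the procedure behind Table 3: the helper names and the toy sequence are hypothetical, coordinates are 0-based on the coding strand, and only TAA and TAG are treated as stop codons, as stated for the Eucestoda mt-code.

```python
# Minimal sketch (hypothetical helpers, toy data): locate the first in-frame
# stop codon of an ORF and measure its overlap with a downstream trn gene.
STOPS = {"TAA", "TAG"}  # stop codons of the Eucestoda mt-code per the text

def first_in_frame_stop(genome, orf_start):
    """0-based index of the first nt of the first in-frame stop codon, or None."""
    for i in range(orf_start, len(genome) - 2, 3):
        if genome[i:i + 3].upper() in STOPS:
            return i
    return None

def overlap_with_trn(stop_index, trn_start):
    """Number of nts shared by the ORF (stop codon included) and the trn gene."""
    orf_end = stop_index + 2  # last nt of the stop codon
    return max(0, orf_end - trn_start + 1)

# Toy example: 8 nts of ORF followed by a trn gene carrying TAG at positions 8-10.
orf_part = "ATGGCTGC"
trn_gene = "GCATTGGTAGCTCAG"
genome = orf_part + trn_gene
trn_start = len(orf_part)

stop = first_in_frame_stop(genome, orf_start=0)
print(stop, overlap_with_trn(stop, trn_start))
```

In this toy case the first in-frame stop is the TAG at trn positions 8–10, so the ORF and the trn gene share 10 nts, which matches the 10 nt overlap described for TAG10.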
Concerning the putative ATR49 start codon, the number of complete mt-genomes found in GenBank using the keywords previously mentioned was relatively low; moreover, in some cases, the upstream gene encoded a protein or specified an rRNA, and/or there was only one occurrence for a given taxon. A significant example within Deuterostomia (frogs) is presented in Table 4. In the superfamily Hyloidea, the ATA49 triplet is frequently the first potential complete start codon at the level of the gene pair encoding NAD1 and specifying tRNA-Leu2. In two families (Bufonidae, Hylidae), for all the sequences (16 belonging to 14 different species), the first ATR triplet found in frame in the ORF of the nd1 gene is ATA49. For four sequences belonging to three other frog families, the ATR49 triplet is missing from the trnL2 gene; moreover, an ATA triplet is entirely contained in the V-R of the trnL2 gene of Heleophryne regis, but it is not in frame with the following gene. For these last four cases, the authors of the sequences proposed alternative start codons. This seems necessary, but it has not been experimentally verified. According to several authors who have sequenced parts of the mtDNAs of Hylidae, the nd1 gene would start at ATA49 in about 140 sequences (e.g., Roelants and Bossuyt).
In the two studied taxa, BLAST analyses of the NCBI EST and SRA (Sequence Read Archive) databases were performed, but no result supports the proposed hypotheses: transcripts starting at an ATR49 or terminating at a TAR10 were not found. However, for each taxon, few mt-transcripts are available, and fully matured transcripts are even rarer.
3.3. Why do mt-ss-trn genes with TAG10 and ATR49 triplets as putative stop and start codons occur only in Metazoa?
Foremost, biases in the search strategy cannot be excluded, but the important point is that the mt-genomes of animals, fungi, protists, and plants differ drastically in all major characteristics, including gene content and size. Generally, metazoans have ultra-compact mtDNAs (from c.10,000 to c.50,000 bp); usually, nonfunctional sequences are rapidly eliminated, and there are short intergenic regions and frequent overlaps. However, nonbilaterian mt-genomes show greater variation in size, gene content, shape, and genetic code. The mtDNA size ranges from 30,000 to 90,000 bp in fungi, and generally, intergenic regions are relatively long. A broader range of mtDNA sizes is found in higher plants (from 0.2 × 10⁶ to about 11.3 × 10⁶ bp), and the largest known mt-genome in this lineage exceeds the sizes of reduced bacterial and nuclear genomes. The increased sizes of plant mtDNAs are mostly due to noncoding DNA sequences, large inserted nuclear regions, and many introns, and not to a large increase in gene number. The nuclear-derived sequences can amount to nearly half of their size, as in melon, and so the presence of mt-trn genes of nuclear origin cannot be excluded. Although not directly correlated, intergenic distances are generally much greater in larger genomes, which reduces the number of overlaps. In addition, the situation of plant mt-tRNAs is very complex. Indeed, plant mitochondria contain few “native” tRNAs expressed from true mt-trn genes, they possess “chloroplast-like” trn genes inserted into the mtDNA, and they compensate for the loss of mt-trn genes by importing several nucleus-encoded tRNAs. In addition, most often in plants, the standard code applies to the reading of organelle genomes, even if ATA is frequently used as start codon. Metazoan mt-genomes are generally small and very constrained and exhibit several gene overlaps between trn and protein-encoding genes or between trn genes. Their tRNAs have sequence and structural peculiarities and tend to be shortened.
Our exploration is not exhaustive, but this might explain the presence of putative stop or start codons specifically within mt-ss-trn genes of this taxon.
One may wonder why the hypotheses concerning the TAR10 and ATR49 triplets were not proposed before. The presence of these characteristic triplets may well have been observed by some authors but considered as having no connection with the translation of neighboring protein genes. The mt-genomes of Homo sapiens (J01415) and Mus musculus (J01420) were among the first sequenced and are among the most studied. In these genomes, no putative start or stop codon occurs at these positions within trn sequences adjacent to protein genes (data not shown).
3.4. Some known structures in the living world bearing a start or stop codon at least in part of a stem-loop structure
In the living world, many nucleic sequences with secondary structures playing a physiological role involving stop and/or start codons have been discovered. Some representative examples are briefly presented here. (1) The tropism switching of the bacteriophage BPP-1 is mediated by a phage-encoded diversity-generating retroelement, which introduces nt substitutions in a gene that specifies a host cell-binding protein (Mtd) . The nt substitutions are introduced in a variable repeat located at the 3′-end of this gene. Two nts after this region, the UAG stop codon is present, and its last nt is situated at the 5′ beginning of the 5′-stem of a hairpin. Both the UAG codon and hairpin are required for phage tropism switching. (2) Programmed translational bypassing is a process, whereby ribosomes “ignore” a substantial interval of mRNA sequence. In a bacteriophage T4 gene, bypassing requires translational blockage at a “takeoff codon” immediately upstream of the UAG stop codon, and both codons are in the 5′-stem of a hairpin; moreover, this region is mobile . (3) The operon flgFG of the bacterium Campylobacter jejuni can encode two genes (flgF and flgG). Its expression in E. coli produces a fusion protein probably due to ribosomal frameshifting (translational hopping) . The putative hop region contains, among others, a hairpin beginning by the last nt of the UAA stop codon of the first mRNA. The AUG start codon of the second gene is in the loop of the following hairpin. (4) In Eubacteria, riboswitches are regulatory segments of DNA or mRNA that can bind a small molecule (the effector), which repress or activate their cognate genes at transcriptional and/or translational levels. In the riboflavin and cob operons, conformational changes can form a stem loop which sequesters the translational start site, consisting of the Shine-Dalgarno (SD) sequence plus start codon thus preventing gene translation . 
(5) Bacterial transfer-messenger RNAs (tmRNAs) have dual TLS and mRNA-like properties. They rescue stalled ribosomes on mRNAs lacking proper translational stop signal; the tRNA-like structure acts first as an alanine-tRNA, and then the short mRNA reading frame is translated and the product is released . This trans-translation terminates at the stop codon terminating the tmRNA reading frame. This stop can be in a little loop or totally or partially integrated in the stem of a hairpin-like structure. In eukaryotes, structurally reduced tmRNAs (no mRNA-like domain) rarely occur in chloroplasts  and in mt-genomes (in Jakobids, presumably close to the most ancient living eukaryotes with bacterial-like mt-genome) . Moreover, tmRNA TLSs function even without any canonical initiation factors. These examples show that start or stop codons located in hairpin may have various functions, as we suggest for TAR10 and ATR49.
3.5. Multifunctionality of tRNAs
Ancient tRNAs probably had diverse functions in replication and proto-metabolism before protein translation  and modern tRNAs have also various functions in all the living organisms . These functions include cell wall synthesis, protein N-terminal modification, nutritional stress management, porphyrin biosynthesis (heme and chlorophyll), lipid remodeling, and initiation of retrovirus reverse transcription. Accumulating experimental evidence suggests also that they have important regulatory roles in translation, viral infections, and tumor development (reviewed in ). Mt-tRNAs interfere with a cytochrome c-mediated apoptotic pathway and promote cell survival  and function as replication origins . Moreover, nuclear-tRNA abundance and modifications are dynamically regulated, and tRNAs and their tRNA-derived RNA fragments (tRFs) are centrally involved in stress signaling and adaptive translation [47, 48]. This suggests that the choice of cleavage sites of mRNA transcripts with or not part of the neighboring ss-tRNA could be dynamic and also respond to environmental changes. Some of the noncanonical translation functions of tRNAs can also be driven or enhanced by their ability to adopt different complex three-dimensional structures, and these conformational changes can be linked to functional states . Moreover, the tRNA multifunctionality has also been considered to be, at least in part, random due to the high amount of tRNA species within the cell . In addition, the mt-trn genes represent natural pause sites for replication forks and could also prone double-strand breaks , and their role, as “punctuation signals,” for processing of mtDNA polycistronic transcripts has already been mentioned.
Enormous numbers of tRFs in all domains of life were found in the last decade . In the plant Arabidopsis thaliana, nucleus-, plastid-, and mt-encoded tRNAs can produce tRFs . The tRFs are not randomly degraded tRNAs. Experiments showed several functions including regulation of tumor development and viral infections . Degradations resulting from cleavages at TAR10 and ATR49 triplets could produce a conformation exhibiting two loops linked by a forked-stem structure, roughly resembling a pair of cherries, so called “cherry-bob” (Figure 2). Our hypothesis predicts this structure which however has never been observed .
3.6. Other putative roles of TAR10 and ATR49
Metazoan mt-genomes are believed to be optimized for rapid replication and transcription. Potentially, TAG10 and ATR49 make transcription/translation more complex but perhaps more efficient. The examples in Section 3.2 (i.e., Eucestoda) suggest that mt-overlaps appeared hundreds of millions of years (MY) ago, enabling co-evolution between protein-encoding genes and those specifying tRNAs.
Overlaps involve numerous constraints for genes including sequence bias. Constraints are probably less stringent for trn genes, which can evolve rapidly because relatively standard secondary structure coupled with a specific anticodon might suffice for tRNA function . Incomplete cloverleaf structures may also be repaired post-transcriptionally .
Alternative processing might be possible for the production of either a supposed complete mRNA or a complete tRNA. In the first case, the synthesis of new complete mRNAs could be promoted by high mitosolic tRNA numbers. Moreover, amino acid starvation can regulate mt-tRNA levels . However, if mt-tRNAs already present are not destroyed, translation would not immediately stop because mt-tRNA half-life which is lower than that of their cytosolic counterparts can nevertheless exceed 10 h . Moreover, aberrant mt-tRNAs can be corrected by RNA editing during or after transcription, and this process appeared independently several times in a wide variety of eukaryotes . As an extreme example, due to large overlaps between trn genes, up to 34 nts are added post-transcriptionally during the editing process to the mt-tRNA sequences encoded in an onychophora species, rebuilding the acceptor-stem, the T-arm, and in some extreme cases, the V-R and even a part of the anticodon-stem . In that species, several edition types must be combined, including template-dependent editing . This last example suggests that complete tRNA could be restored after a cleavage just upstream of ATR49. However, edition of parts of the 5′-end of tRNAs seems more problematic. Besides, mRNAs with upstream or downstream ss-tRNA can form a partially double strand region with a homologous ss-tRNA at the level of the acceptor-stem. This might induce mRNA degradation via antisense mechanisms. In bacteria, uncharged tRNAs cause antisense RNA inhibition , and small interfering cytosolic tRNA-derived RNAs exist . Modifications (methylation, edition, etc.) of incomplete tRNAs generated after cleavages of polycistronic transcripts at TAR10 or ATR49 triplets would indicate regulatory functions.
Putative use of TAR10 or ATR49 triplets affects protein length. When in frame, this could generate a protein at least 3 or 9 amino acids longer, respectively. Extension length depends on positions of upstream stop codons completed by polyadenylation and/or on downstream (alternative) initiator codons. Not only complete proteins may be functional. Depending on cleavage positions in polycistronic transcripts, consequences may be neutral, disadvantageous, or favorable in specific contexts. In yeast, extended proteins can increase fitness under stress conditions . In addition, in bacteria and in organelles, alternative initiation codons decrease efficiency , and it must be noted that ATR49 triplets are “canonical” start codons.
In other conditions, incomplete mRNAs could be favored. Mitosolic mRNA accumulations can be due to lack of translation because of tRNA paucity. Thus, high mRNA levels might indirectly promote cleavage of entire tRNA transcripts while reducing the synthesis of new functional mRNAs and favoring translation of those which are already present into proteins. Presence/absence of hairpins involving stop or start codons might regulate translation. This regulation could involve proteins that stabilize the hairpins or posttranscriptional modifications. Moreover, translational products of “incomplete” mRNAs might have housekeeping functions.
Regulation of alternative processing producing either complete tRNAs or complete mRNAs requires elucidation. Factors, probably proteins, need characterization. Note that metazoan mt-atp8 and atp6 genes overlap (mainly by 10 bp in vertebrates) and are transcribed as joint bicistronic transcript . This proven overlap is inherent to mt-metabolism. Hence, similar overlaps assumed for TAR10 triplets are plausible.
Overlap conservation might reflect the need to produce bicistronic transcripts (5′-tRNA-mRNA-3′ or 5′-mRNA-tRNA-3′) or functional constraints at the protein level (i.e., preserving specific amino acid patterns upstream or downstream of the ORF). When overlap regions show conserved amino acid sequences at the protein N- or C-terminus, functional constraints at the protein level are probable . In viruses, mutation rates are low in DNA regions coding for multiple protein products in separate reading frames (called overprinted genes) because point mutations compatible with functional products from all frames are rare. In these regions, the frame is said to be “closed off.” Partial overlap between protein-encoding genes and ss-trn genes would present a similar situation, explaining the greater conservation of the extremities of protein and tRNA sequences when the corresponding genes overlap. This lock almost exclusively concerns the ss-tRNA’s “top half,” limiting changes in the region interacting with many processing enzymes. The ss-trn genes could also regulate translation upstream, bicistronic mRNA/ss-tRNA transcripts could be more stable, and ss-trn genes could likewise play roles in replication and transcription.
3.7. Methylation of trn genes and tRNAs and their possible roles in transcription and translation
Methylation is much rarer in mt- than in nuclear-DNA . However, methylations might occur at trn genes (particularly around TAR10 and ATR49) and might have deleterious consequences, especially because differential mtDNA methylations are linked to aging and diseases (including diabetes and cancers) . Methylation of nts of UAR10 and AUR49 is known, such as those of A9 and G10, which can be important for correct tRNA folding . We are unaware whether posttranscriptional modifications occur on bicistronic mt-transcripts containing complete or partial tRNAs. This would be worth investigating, including possible consequences for maturation and translation.
3.8. Reassignments of codons and ss-tRNA
Several codon-amino acid reassignments are known, mainly from mitochondria [62, 63]. Among the 11 different mt-genetic codes, UGA codes for tryptophan instead of stop in 8, and AUA codes for methionine instead of isoleucine in 5 . Both reassignments avoid potential errors under traditional wobble rules. Reassigning UGA-stop to UGA-Trp fits the “capture” hypothesis: UGA codons first mutate to the synonymous UAA codon in AT-rich mt-genomes. Then, UGA occasionally reappears by mutation, free for “capture” by an amino acid such as Trp . AUA is frequently used as an alternative initiation codon. Its reassignment to an internal sense Met codon could also have evolved in AT-rich genomes. Moreover, the standard genetic code assigns six codons to arginine, whereas two would fit arginine’s relatively low frequency in current proteins . In 8 out of 11 mt-codes, different strategies reduce the number of Arg codons to four: AGR reassignments to other amino acids (in six genetic codes), lack of two Arg codons (CGA and CGC in the yeast mt-code), and AGR as terminators in vertebrates. These AGR codons were believed to have been mt-stop codons since early vertebrate evolution . However, at least in humans, AGRs are not recognized as terminators , suggesting that AGRs have no assignment. Hence, the vertebrate mt-genetic code could be the most optimized known genetic code (that of yeast was not retained because four Leu codons were reassigned to Thr). Characteristics of the nt triplets at positions 8–10 and ending at position 49 should be analyzed for each mt-genetic code.
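As an illustration of the reassignments discussed above, the following minimal sketch compares the standard genetic code with the vertebrate mt-code, both written as the usual 64-letter strings in TCAG codon order (NCBI translation tables 1 and 2). The helper names are hypothetical. Note that table 2 conventionally lists AGA and AGG as terminators ("*"), whereas the text above argues that they may simply be unassigned in humans.

```python
from itertools import product

BASES = "TCAG"
# Codons in the conventional order: TTT, TTC, TTA, TTG, TCT, ...
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# Amino acids per codon, in the same order ("*" = stop).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MT = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def code_table(amino_acids):
    """Map each codon to its one-letter amino acid (or '*')."""
    return dict(zip(CODONS, amino_acids))


def reassignments(reference, alternative):
    """Return {codon: (reference meaning, alternative meaning)} for codons that differ."""
    return {c: (reference[c], alternative[c])
            for c in CODONS if reference[c] != alternative[c]}


standard = code_table(STANDARD)
vert_mt = code_table(VERT_MT)

for codon, (old, new) in reassignments(standard, vert_mt).items():
    print(f"{codon}: {old} -> {new}")

# Number of codons assigned to arginine in each code.
print(STANDARD.count("R"), VERT_MT.count("R"))
```

The comparison recovers the four differences mentioned in the text (UGA, AUA, and the two AGR codons, written here with T instead of U) and the reduction of arginine codons from six to four.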
3.9. Origin of the cloverleaf structure of tRNA and ss-tRNA
Various models could explain tRNA origins (see reviews [68, 69, 70]). The modern tRNA cloverleaf structure might result from direct duplication of primordial RNA hairpins (e.g., ). However, studies lend strong support to the “two halves” hypothesis , in which tRNAs consist of two coaxially stacked helices with presumed independent structural and functional domains. These correspond to the “top half,” containing the acceptor-stem and the T-arm, and the “bottom half,” with the D-arm and anticodon-arm (Figure 1). The 2D representation of the latter corresponds to the “cherry-bob” structure (Figure 2). The “top half” of modern tRNA embeds the “operational code” in the identity elements of the acceptor-stem, which interacts with the catalytic domain of specific aaRSs and is recognized by RNases P and Z and the CCA-adding enzyme (therefore mainly RNA end processing reactions) [70, 71]. This domain also interacts with translation elongation factor Tu and one rRNA subunit . The importance of this domain in most macromolecular interactions involving tRNAs (including in vitro, even when it is detached from the “bottom half”) suggests that the specificities of this half were established before the tRNA’s “bottom half,” which was presumably incorporated later . Growing evidence for tRNA elements involved in both RNA and DNA replication, with the 3′-end playing a determinant role, has led to the idea that the “top half” initially evolved for replication in the RNA world before the advent of protein synthesis . The supposedly evolutionarily recent tRNA “bottom half” provides genetic code specificity. This suggests late implementation of the standard genetic code and late appearance of interactions between the tRNA “bottom half” and ribosomes . Whether the “bottom half” derived from a loop or extra loop belonging to the “top half” or was an independent structural and functional domain that was subsequently incorporated into the “top half” remains unresolved . 
Some authors suggest independent evolutionary origins [71, 72].
The study of ss-tRNAs suggests a model partially explaining canonical tRNA origins (Figure 3). The DNA region specifying the “bottom half” would be integrated in a sequence that can specify the “top half” but at the junction between the parts corresponding to the 3′-end of the 5′-acceptor stem and the 5′-end of the 5′-T-stem.
On the other hand, the “bottom half”/“cherry bob” structure could also have been integrated at the RNA level, either in the RNA world by intermolecular RNA-RNA recombination or template switches or later through retrotranscription events. Fujishima and Kanai  also proposed an equivalent model in which a long hairpin corresponding approximately to the “top half” region merged with a viral RNA element corresponding to the “bottom half” to give the TLSs found in modern viral genomes (which, however, possess a pseudoknotted acceptor-stem). Besides, rare pre-tRNA molecules from the three domains of life exhibit an intron. The origin of this intron is debated. The “introns-early” scenario assumes that most introns were lost during evolution, whereas the opposite scenario theorizes that introns were inserted into some trn genes after their emergence . To date, our hypothesis would rather favor the second scenario, even though the “cherry bob” structure could be considered an ancestral intron that became unspliceable.
In tRNAs, the first two nts of UAR10 and AUR49 belong to connectors 1 and 2, respectively. They are thus at the junction between the top and bottom halves and are physically very close in the 3D structure (Figure 1). Whether individual nts of the TAR10 and ATR49 triplets belong to one half or the other is not discussed here because the theoretical model of Figure 3 is applicable independently of the “bottom half” extremities. However, as the V-R is important for aminoacylation , ATR49 triplets could rather belong entirely to the “top half.” The tRNA L-shape is stabilized by various tertiary interactions of the V-R with the D-arm and between the D- and T-loops. Nucleotides of the connectors form contacts with the D-arm, and in some tRNAs, G10 can establish potential tertiary interactions with a nt of the V-R upstream of the putative start codon . At least in cytosolic tRNAs, U8 frequently and U48 sometimes form noncanonical pairs. Moreover, base pair 15–48 is generally more conserved in mt-tRNAs than 8–14, probably because of the fundamental role played by the former in maintaining the tRNA L-shape . UAR10 and AUR49 must first have played a role only in the L-shaped tertiary structure of tRNAs, and their involvement as codons, if it exists, would be only a derived character. It was hypothesized that DNA punctuation evolved from 2D structures signaling polymerization initiation, termination, and/or processing to linear sequence motifs, which further evolved into translational signals . In ss-tRNAs, the UAR10 triplet probably already played a structural role in proto-tRNAs, whereas AUR49 would have appeared only during the evolution of organelle tRNAs, in relation to the L-shaped tertiary structures of organelle tRNAs and to severe genome reduction and extreme base compositions. 
The opposite hypothesis would imply that the AUR49 triplet would have been a plesiomorphic character counter-selected in large genomes but kept in certain bacterial genomes up to mt-ancestors.
3.10. tRNAs at the origin of all the nucleic members of the RNA/protein world
Some authors have hypothesized that tRNAs may be the precursors of mRNAs, rRNAs (and therefore proto-ribosomes), and also of the first genomes. Several authors suggested similar origins for tRNA and rRNA . Analyses of sequences and secondary structures of the ribosome suggested that the ribosomal peptidyl transferase center (PTC), which forms peptide bonds between adjacent amino acids, originates from fused proto-tRNAs . Strikingly, the ribosome is a ribozyme, since only RNA catalyzes peptide bond formation . Moreover, current eubacterial rRNAs themselves could encode several tRNAs , and chaetognath 16S rRNA genes appear to be tRNA nurseries  (or the opposite). Eubacterial 5S rRNAs contain TLSs similar to alanine and arginine tRNAs , exhibiting tRNA-like 2D structures . Some suggest that rRNAs are fused tRNA molecules .
Molecular biology dogmatically assumes that “tRNA genes are of course entirely noncoding” . But in 1981, Eigen and Winkler-Oswatitsch suggested that during the transition from the RNA world to the RNA/protein world, ancestral tRNAs were mRNAs . Assuming that the first mRNAs had been recruited from proto-tRNAs, it follows that TLSs were inside viral and cellular mRNAs . Self-recognition between tRNA-like mRNAs and canonical cloverleaf tRNAs could stabilize these molecules and produce proto-proteins . The first proteins potentially emerged from junctions of ancestral tRNAs, and among modern proteins, the only polymerase that matched tRNAs translated like an mRNA was the RNA-dependent RNA polymerase . Moreover, eubacterial rRNAs could also encode several active sites of key proteins involved in the translation machinery . Analyses of sequences and secondary structures of ribosomes then suggested that these, derived from tRNAs, also functioned as a protogenome . The very parsimonious syncretic model known as the “tRNA core hypothesis” assumes that some proto-tRNAs were classical tRNAs and also functioned as rRNAs and mRNAs; self-recognition between these molecules made it possible to produce proto-proteins .
Assuming that the ATR49 triplets are a primitive character lost during the first genome expansions and that they could already act as initiation codons seems too speculative, but RNA structures having the characteristics of ss-tRNAs could have accumulated many advantages in the RNA/protein world. Structures with both start and stop codons partially in a stem-loop (as in ss-tRNAs), constituting basic signals for translation, could be a missing link of the RNA world hypothesis. Furthermore, in these proto-tRNAs, 3D structures could act as initiation and termination signals before the emergence of standard codons. Moreover, mRNAs in the form of an ss-tRNA or a combination of several of these molecules would have been relatively stable. The cloverleaf structure could facilitate entry into the PTC, and interactions with other factors could then allow a short region to adopt a linear form and thus be read. Upstream and downstream of the linear region, the arrangement in hairpins would protect the proto-mRNA from degradation during reading, and as soon as a long enough region was read, it could resume its original 3D structure. Furthermore, circular proto-mRNAs derived from ss-tRNA-like molecules cannot be excluded; the hypothesis of a circular tRNA-like ancestor (“proto-tRNA”) was first proposed by Ohnishi in 1990 . In addition, nuclear-encoded mt-tRNAs of kinetoplastid protists are imported into the mitochondrion, and circularized mature tRNA molecules are produced, probably by mt-endogenous RNA ligase activity (in vivo or during mt-isolation) . Moreover, in red and green algae and possibly in one archaeon, the maturation of permuted trn genes, in which the sequences encoding the 5′-half and 3′-half of the specific tRNA are separated and inverted in the genome, requires the formation of a characteristic circular RNA intermediate which, after cleavage at the acceptor-stem, generates the typical cloverleaf structure with functional termini . 
If, in an ss-tRNA with a T-loop of 7 nts, nt72 is ligated to nt1, this creates a small ORF starting with a start codon (AUR49), which potentially codes for a peptide of 12 amino acids if UAR10 is used as the stop codon. However, circularization could occur elsewhere than at nts 72 and 1. In that case, UAR10 would not be in frame, which could allow the synthesis of shorter or longer peptides. To date, the formation of this type of structure and its translation remain hypothetical; however, experimental data have shown that circular RNAs can be translated in prokaryotic and eukaryotic systems in the absence of any particular element for internal ribosome entry, such as an SD sequence, poly-A tail, or cap structure . The evolutionary advantage of a circular proto-mRNA is also posited to lie in the simplicity of its replication mechanism and in its resistance to degradation from the extremities, since it has none.
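The effect of the ligation point on the reading frame can be sketched with a small routine that reads codons around a circular RNA from a chosen start position until a stop codon is met. The function name and both toy sequences are hypothetical; they are not tRNA sequences and do not reproduce the 12-amino-acid case, which depends on the tRNA numbering. The cap on the number of laps is likewise an assumption of this sketch, introduced only to guarantee termination.

```python
def circular_orf(rna, start, stops=("UAA", "UAG"), max_laps=3):
    """Read codons on a circular RNA starting at index `start`.

    Returns (sense_codons, stop_found). Reading wraps around the ligation
    point; when the circle length is not a multiple of 3, each lap is read
    in a different frame, so reading is capped at `max_laps` laps.
    """
    n = len(rna)
    codons = []
    for offset in range(0, max_laps * n, 3):
        codon = "".join(rna[(start + offset + j) % n] for j in range(3))
        if codon in stops:
            return codons, True
        codons.append(codon)
    return codons, False


# Toy circle 1: the start codon sits near the 3' end of the linear form,
# so the ORF only exists once the ends are joined.
print(circular_orf("GCUUAACCAUG", start=8))

# Toy circle 2: length 10 (not a multiple of 3); the UAA is out of frame on
# the first lap and is only met in frame during the second lap.
print(circular_orf("AUGCCUAACC", start=0))
```

In the second toy circle the stop codon is skipped on the first lap, which gives a longer product than the linear sequence would suggest; this mirrors the remark that ligation elsewhere than between nts 72 and 1 could yield shorter or longer peptides.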
Besides, the fusion of a tRNA-like mRNA and a classical tRNA could be at the origin of the ancestors of tmRNAs; as a rough indication, the size of the tag peptide encoded in bacteria is of the same order of magnitude as that of the peptide corresponding to the putative translation of an ss-tRNA from the ATR49 triplet. Moreover, self-charging proto-tRNAs may also have been selected during evolution ; it has even been proposed that the activity of the juxtaposed 2′/3′-OHs of the tRNA A76 ribose qualifies tRNA as a ribozyme , and some RNAs (the early tRNA adaptors) must have had the ability to undergo 3′-aminoacylation. It has also been shown previously that many hairpin-structured RNAs bear ribozyme activity. These catalyze self-cleavage and ligation reactions . In addition, it remains possible that circular ss-tRNAs with an amino acid-anchored structure could be at the origin of tmRNAs. Indeed, two-piece bacterial tmRNAs (e.g., in α-proteobacteria) are encoded by a circularly permuted gene sequence, implying that the pre-tmRNA is processed and that the two pieces are held together by noncovalent interactions. Moreover, in line with an α-proteobacterial origin of mitochondria, probable mt-encoded circularly permuted tmRNA genes have been found in the oomycete (water mold) Phytophthora sojae and in the jakobid Reclinomonas americana . A proto-trnA gene could be at the origin of modern tmRNAs . Metazoan mt-trnA genes combine the highest levels of TAR10 and ATR49 triplets (>95% for each), but in the prokaryotic world, although the rate of TAG10 is always higher than 91%, only one ATR49 occurs in Eubacteria and none in Archaea.
Studies strongly suggest that the tRNA cloverleaf structure emerged prior to the appearance of a fully functional ribosomal core, making it one of the most ancient RNAs of the RNA world [70, 97] or even the oldest . Though the “RNA-world” hypothesis is well accepted, the successive events leading to the emergence of the different partners playing a role in translation and the involvement of tRNAs in this evolution remain a highly controversial field . However, some hypotheses, such as the “tRNA core” , strongly suggest that tRNAs would be at the origin of the primitive genetic material and gave rise to mRNA and rRNA, as well as to the conformational structure of the first proto-ribozymes. The base module, a pleiofunctional RNA that can adopt the cloverleaf structure, is found today in various sequences without a direct link to translation. One may conclude that “one should not change a winning secondary structure.” In a precellular context, a molecule with ss-tRNA characteristics (a small ORF associated with a cloverleaf structure) would be advantageous. Putatively, ss-tRNA-like molecules combining both tRNA and mRNA functions would have been the first molecules on Earth to support nonrandom protein synthesis.
The antiquity of ss-tRNAs can be discussed, and it is very likely that the TAR10 (and especially TAG) triplets played a critical role very early in the tertiary folding of some tRNAs. Their involvement in translation termination would be an exaptation, since they were initially part of a structural signal. The origin of ATR49 triplets is less clear, perhaps tracing back to the first endosymbiosis. Hence, they would be apomorphic (a derived character). Analyses by taxa and tRNA species suggest a nonhomogeneous evolution. At the beginning of the RNA/protein world, it quickly became essential to start peptide synthesis at particular codons, and one cannot exclude that ATR49 was an ancestral state that was not retained as intergenic spaces increased. Analyses of known tRNAs of α-proteobacteria and cyanobacteria could suggest that in organelles, ATR49 triplets were selected along with genome reduction. Organelle genomes may be under increased pressure for size reduction, with resulting overlaps (see ). However, several features strongly suggest that overlapping genes are not a direct mechanism to substantially reduce genome size. Gene overlaps allow mtDNA genome compaction while avoiding the loss of tRNA genes . Nevertheless, overlaps may allow more efficient control of gene expression: regulatory pathways are simplified, and the number of proteins (and genes) required decreases . Among other mechanisms, short antiparallel overlaps may be involved in antisense regulation. Consequently, compact genomes putatively enable less flexible but more efficient physiologies.
The selection of tRNAs must have operated mainly on two seemingly opposite criteria, stability and plasticity, making the tRNA a kind of Swiss army knife of the RNA world. This explains why, beyond their central role in protein synthesis, tRNAs have many other crucial functions. To date, it can be hypothesized that ss-tRNAs might regulate gene expression, stress responses, and metabolic processes. Indeed, in silico analyses suggest that several overlapping sequences may code simultaneously for mRNAs and tRNAs in most metazoan mt-genomes. These overlaps can span a variable (sometimes large) number of nts; however, when annotating their genomes, several authors deliberately underestimated the number and size of overlaps, speculating that there would be upstream abbreviated stop codons or downstream alternative start codons, but most often without any direct demonstration so far. However, the high number of possible same-strand overlaps in which the first in-frame complete stop codon or standard start codon is located at specific positions in the sequences of trn genes (TAR10 and ATR49, respectively) strongly suggests an exclusive relationship between the production of tRNAs and the translation of mRNAs and/or the development of a repair system keeping the two genes functional, owing in some cases to co-evolution over several hundred million years. We can therefore speculate that ss-trn genes could allow true tRNA punctuation and initiation. Note that ss-tRNAs seem to be hybrid molecules containing three essential pieces of coding or decoding information in the form of nt triplets (i.e., the anticodon and the stop/start codons), all of which are at least partly integrated into a stem or loop; moreover, the nt triplets following ATR49 play the role of internal sense codons. 
To date, it is unclear which biochemical mechanism would select among alternative cleavage sites, leading to the complete tRNA rather than to the mRNA or vice versa, but reduced or extended proteins can be functional, and various processes including editing suggest the same for incomplete tRNAs. Hence, despite the lack of experimental evidence, TAR10 and ATR49 triplets probably have roles, including in regulation. Future analyses of the processed bicistronic transcripts (tRNA/protein-encoding or the contrary) are required. Moreover, even if mt-trn genes are most often expressed at very low levels , only direct sequencing of tRNAs can validate transcription and epitranscriptomic maturation and pinpoint nt modifications, including post-transcriptionally edited positions. Purified native, or even synthetic, tRNAs should also be tested for their in vitro activity to confirm the functionality of aberrant transcripts. Similar experiments must be performed on the flanking mRNAs and their products. If, as we think, ss-tRNAs play regulatory roles, initial experiments should compare stress and nonstress conditions.
Here, the bias toward metazoan mtDNA does not allow a complete picture of variation across the entire eukaryotic world, and protist mt-genomes should also be considered. Special attention should also be paid to noncanonical base pairings potentially formed by UAR10 and AUR49 nts, in relation to tRNA structure and V-R length. Accounting for the presence of TAR10 and ATR49 triplets in the algorithms predicting tRNAs could improve mt-genome annotations by reducing the numbers of false positives and negatives and by determining tRNA termini more accurately, while taking into account tRNA species, taxa, and genomic systems.
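A minimal sketch of how such a check might be added to the output of a tRNA predictor is given below. It relies on the definition used in this chapter, namely that the last nucleotide of each triplet is the first nucleotide of the 5′-D-stem or 5′-T-stem, respectively. The function name, the argument names, and the toy sequence are hypothetical; the stem coordinates are assumed to be supplied by the secondary-structure prediction, and the sequence is not a real trn gene.

```python
STOP_LIKE = frozenset({"TAA", "TAG"})   # TAR triplets
START_LIKE = frozenset({"ATA", "ATG"})  # ATR triplets


def triplet_ending_at(dna, index):
    """Return the triplet whose last nucleotide is at 0-based `index` ('' if too short)."""
    if index < 2 or index >= len(dna):
        return ""
    return dna[index - 2:index + 1]


def ss_triplets(dna, d_stem_start, t_stem_start):
    """Flag TAR10-like and ATR49-like triplets in a candidate trn sequence.

    `d_stem_start` and `t_stem_start` are the 0-based indices of the first
    nucleotide of the 5'-D-stem and of the 5'-T-stem (assumed inputs coming
    from a structure prediction, not computed here).
    """
    return {
        "TAR10": triplet_ending_at(dna, d_stem_start) in STOP_LIKE,
        "ATR49": triplet_ending_at(dna, t_stem_start) in START_LIKE,
    }


# Toy candidate: TAG ends at index 9 (position 10) and ATG ends at index 48
# (position 49); the remaining nucleotides are arbitrary filler.
toy = "G" * 7 + "TAG" + "C" * 36 + "ATG" + "C" * 20
print(ss_triplets(toy, d_stem_start=9, t_stem_start=48))
```

Flags of this kind could then be weighted by tRNA species, taxon, and genomic system, since the frequencies of these triplets depend on all three.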
MtDNA plays a central role in apoptosis, aging, and cancer . Moreover, mt-diseases are among the most common inherited metabolic and neurological disorders . In addition, as new functions and new mechanisms of action of tRNAs are continuously being discovered  and as ss-trn genes could affect cellular dynamics under normal and stress conditions, leading to pathologies, the potential subtleties of action and regulation of these genes and their products should be investigated more thoroughly.
Conflict of interest
The authors declare no potential commercial or financial conflicts of interest.