Emergence of the Diversified Short ORFeome by Mass Spectrometry-Based Proteomics

Hiroko Ao-Kondo; Hiroko Kozuka-Hata; Masaaki Oyama

doi:10.5772/19433

Author Information

Hiroko Ao-Kondo*
- Medical Proteomics Laboratory, Institute of Medical Science, University of Tokyo, Japan
Hiroko Kozuka-Hata
- Medical Proteomics Laboratory, Institute of Medical Science, University of Tokyo, Japan
Masaaki Oyama
- Medical Proteomics Laboratory, Institute of Medical Science, University of Tokyo, Japan

*Address all correspondence to:

1. Introduction

In proteomics analyses, protein identification by mass spectrometry (MS) is usually performed using protein sequence databases such as RefSeq (NCBI; http://www.ncbi.nlm.nih.gov/RefSeq/), UniProt (http://www.uniprot.org/) or IPI (http://www.ebi.ac.uk/IPI/IPIhelp.html). Because these databases usually target the longest (main) open reading frame (ORF) in the corresponding mRNA sequence, whether shorter ORFs on the same mRNA are actually translated remains largely a mystery. It was originally thought that almost all eukaryotic mRNAs contain only one ORF and function as monocistronic mRNAs. It is now known, however, that some eukaryotic mRNAs have multiple ORFs and are recognized as polycistronic mRNAs. One well-known class of extra ORFs is the upstream ORF (uORF), which functions as a regulator of mRNA translation (Diba et al., 2001; Geballe & Morris, 1994; Morris & Geballe, 2000; Vilela & McCarthy, 2003; Zhang & Dietrich, 2005). To gain clues to the mystery of diversified short ORFs, full-length mRNA sequence databases with complete 5'-untranslated regions (5'-UTRs) were essential (Morris & Geballe, 2000; Suzuki et al., 2001).

The oligo-capping method was developed to construct full-length cDNA libraries (Maruyama & Sugano, 1994), and the corresponding sequences were stored in the database called DBTSS (DataBase of Transcriptional Start Site; http://dbtss.hgc.jp/) (Suzuki et al., 1997, 2002, 2004; Tsuchihara et al., 2009; Wakaguri et al., 2008; Yamashita et al., 2006). A comparison of the DBTSS dataset with the corresponding RefSeq entries showed that about 50% of the RefSeq entries had at least one upstream ATG (uATG) in addition to the functional ATG initiator codon (Yamashita et al., 2003). Although it had been suggested that upstream AUGs (uAUGs) and uORFs play important roles in translation of the main ORF, none of the proteins encoded by these uORFs had been detected in biological experiments in vivo. Our previous proteomics analysis focused on small proteins provided the first evidence for the existence of four novel small proteins translated from uORFs in vivo, using highly sensitive nanoflow liquid chromatography (LC) coupled with electrospray ionization-tandem mass spectrometry (ESI-MS/MS) (Oyama et al., 2004). A large-scale analysis based on in-depth separation by two-dimensional LC also led to the identification of eight additional novel small proteins, not only from uORFs but also from downstream ORFs, and one of them was found to be translated from a non-AUG initiator codon (Oyama et al., 2007). The discovery of these novel small proteins indicates the possibility of diverse control mechanisms of translation initiation.

In this chapter, we first introduce the widely recognized mechanism of translation initiation and the functional roles of uORFs in translational regulation. We then review how we identified novel small proteins with MS, and lastly discuss the progress of bioinformatics analyses for elucidating the diversification of short coding regions defined by the transcriptome.

2. Translational regulation by short ORFs

It is well known that the 5'-UTRs of some mRNAs contain functional elements for translational regulation defined by uAUGs and uORFs. In this section, we show how uAUGs and uORFs have biological consequences for protein synthesis on eukaryotic mRNAs.

2.1. Outline of translation initiation

Initiation of translation on eukaryotic mRNAs occurs roughly as follows (Fig. 1) (Kozak, 1989, 1991, 1999).

A small (40S) ribosomal subunit binds near the 5' end of the mRNA, i.e. the cap structure.
The 40S subunit migrates linearly downstream along the 5'-UTR until it encounters the optimum AUG initiator codon.
A large (60S) ribosomal subunit joins the paused 40S subunit.
The complete ribosomal complex (40S + 60S) starts protein synthesis.

Figure 1.
The proposed procedure for initiation of translation in eukaryotes. The black region indicates the main ORF of the mRNA.
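The scanning steps listed above can be sketched in a few lines of Python. This is only an illustrative model: it treats the first AUG reached from the 5' end as the initiator and ignores sequence context, cap proximity and IRES-dependent initiation. The function name `scan_for_first_orf` and the example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def scan_for_first_orf(mrna):
    """Scan 5'->3' for the first AUG and return (start, end) of its ORF.

    Indices are 0-based; end is exclusive and includes the stop codon.
    Returns None if no AUG is found or no in-frame stop codon follows it.
    """
    seq = mrna.upper().replace("T", "U")  # accept cDNA-style input as well
    start = seq.find("AUG")  # the scanning 40S subunit reaches the first AUG
    if start == -1:
        return None
    # After 60S joining, the ribosome reads codon by codon until a stop codon.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


print(scan_for_first_orf("GGCACCAUGGCUUAAGG"))
```

In this toy sequence the first AUG sits at index 6 and the in-frame UAA ends the ORF at index 15.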

In addition to the above mechanism, initiation of translation without ribosome scanning is also known. It is called “internal initiation” and depends on a particular structure on the mRNA termed the internal ribosome entry site (IRES).

2.2. The relationship between uORF and main ORF

When an mRNA contains a uORF, two models for the initiation of translation have been suggested (Fig. 2) (Hatzigeorgiou, 2002). One is called “leaky scanning” and the other “reinitiation”. If the first AUG codon is in an unfavorable sequence context as defined by Kozak (see section 3.2), a small (40S) ribosomal subunit ignores the first AUG and initiates translation from a more favorable AUG codon located downstream. This phenomenon is known as “leaky scanning” (Fig. 2-(A)). When a ribosomal complex translates the main ORF after terminating translation of the uORF on the same mRNA, the process is termed “reinitiation” (Fig. 2-(B)).

Figure 2.
The irregular models of ribosome scanning on eukaryotic mRNAs. (A) Leaky scanning and (B) Reinitiation. Gray regions indicate uORFs on the mRNA, whereas black ones represent the main ORFs.

The relations between the two ORFs are classified into three types as follows: (1) a distant type (in-frame/out-of-frame), (2) a contiguous type (in-frame) and (3) an overlapped type (in-frame/out-of-frame) (Fig. 3). In-frame means that a uORF and the main ORF are in the same reading frame of the mRNA sequence, whereas out-of-frame means that they are in different frames. According to a previous analysis of the accumulated 5'-end sequence data, the average size of a uORF was estimated at 31 amino acids and 20% of ORFs were categorized into Type (3) (Yamashita et al., 2003).

Figure 3.
The location of a uORF and the main ORF on the mRNA. (1) A distant type, (2) a contiguous type and (3) an overlapped type. Types (1) and (3) have two subtypes based on the frames of the two ORFs: one is defined by the same reading frame (in-frame) and the other by a different one (out-of-frame). Gray and black regions indicate uORFs and the main ORFs on mRNAs, respectively, whereas a blue one represents an overlap.
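The three-way classification can be expressed as a small function. The sketch below assumes 0-based coordinates on the mRNA and interprets "contiguous" as a uORF whose stop codon is immediately followed by the main initiator codon; that interpretation, the function name `classify_uorf` and the example coordinates are assumptions made for illustration, not definitions taken from the cited analysis.

```python
def classify_uorf(uorf_start, uorf_end, main_start):
    """Classify a uORF relative to the main ORF.

    uorf_start: 0-based index of the first base of the uORF start codon
    uorf_end:   0-based index just past the uORF stop codon (exclusive)
    main_start: 0-based index of the first base of the main ORF start codon
    Returns a (type, frame) tuple.
    """
    if uorf_start >= main_start:
        raise ValueError("a uORF must start upstream of the main ORF")
    # Two start codons share a reading frame when their distance is a multiple of 3.
    frame = "in-frame" if (main_start - uorf_start) % 3 == 0 else "out-of-frame"
    if uorf_end < main_start:
        kind = "distant"      # Type (1): a gap separates the two ORFs
    elif uorf_end == main_start:
        kind = "contiguous"   # Type (2): the uORF stop codon abuts the main AUG
    else:
        kind = "overlapped"   # Type (3): the uORF runs past the main AUG
    return kind, frame


print(classify_uorf(10, 40, 100))
print(classify_uorf(10, 40, 40))
print(classify_uorf(10, 70, 50))
```

Because a uORF spans a whole number of codons, a contiguous uORF under this definition is necessarily in-frame, which matches the absence of an out-of-frame subtype for Type (2).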

These different relations might bring about different events in initiating translation. In eukaryotes, the efficiency of reinitiation tends to increase as the distance between a uORF and the main ORF becomes longer (Kozak, 1991; Meijer & Thomas, 2002; Morris & Geballe, 2000). Therefore, ORFs classified as Types (2) and (3) would be difficult to regulate by reinitiation. It is also said that reinitiation occurs only when the uORF is short (Kozak, 1991), whereas the sequence context of the inter-ORF region, the region upstream of the uORF, the uORF itself and even the main ORF can also affect reinitiation (Morris & Geballe, 2000). In contrast, ORFs of Type (3) might easily cause leaky scanning (Geballe & Morris, 1994; Yamashita et al., 2003). As a special case, when the termination codon of the uORF is near the AUG initiator codon of the downstream ORF, within about 50 nucleotides, ribosomes could scan backwards and reinitiate translation from the AUG codon of the downstream ORF (Peabody et al., 1986).

2.3. The role of short ORFs in translation regulation

The 5'-UTR elements such as uAUGs and uORFs are well known as important regulators of translation initiation. For some genes that have multiple uORFs, considerably different effects on the translation of the main ORF can be generated depending on which combination of uORFs is translated. Some uORFs seem to promote reinitiation at the main ORFs, whereas others seem to inhibit it. These effects are supposed to be caused by the nucleotide sequences at the 3' ends of the uORFs, those of the uORFs themselves, or the protein products encoded by the uORFs. Such differential enhancement of translation is considered to be one of the responses of adaptation to the environment (Altmann & Trachsel, 1993; Diba et al., 2001; Geballe & Morris, 1994; Hatzigeorgiou, 2002; Iacono et al., 2005; Meijer & Thomas, 2002; Morris & Geballe, 2000; Vilela & McCarthy, 2003; Wang & Rothnagel, 2004; Zhang & Dietrich, 2005). In addition, various factors or events are known to influence the translational inhibition of the main ORF: the presence of arginine, stalling of a ribosomal complex at termination, or an interaction between a ribosomal complex and the peptide encoded by the uORF. This indicates that down-regulation by uORFs is general (Diba et al., 2001; Geballe & Morris, 1994; Iacono et al., 2005; Meijer & Thomas, 2002; Morris & Geballe, 2000; Vilela & McCarthy, 2003; Zhang & Dietrich, 2005).

As for downstream ORFs, there is also a report that a peptide encoded in the 3'-UTR may be expressed (Rastinejad & Blau, 1993). However, whether and how such peptides control the translation initiation of the main ORF is still unknown.

3. Variability of translation start sites

How a ribosomal complex (40S + 60S) recognizes an initiator codon on the mRNA is a matter of vital importance for defining the proteome. Here we present some of the elements already proposed for regulation of translation initiation.

3.1. The first-AUG rule

Traditionally, the first-AUG rule is widely recognized for initiation of translation (Kozak, 1987, 1989, 1991). It states that ribosomes start translation from the first AUG on the corresponding mRNA. Although this rule is not absolute, 90-95% of vertebrate ORFs were established by the first AUG codon on the mRNA (Kozak, 1987, 1989, 1991). Our previous proteomics analysis of small proteins also indicated that about 84% of proteins in RefSeq were translated from the first AUG of the corresponding mRNAs (Oyama et al., 2004). On the other hand, there are also many reports that contradict the rule: 29% of cDNAs contained at least one ATG codon in their 5'-UTR (Suzuki et al., 2000); 41% of transcripts had more than one uAUG and 24% of genes had more than two uAUGs (Peri & Pandey, 2001); about 50% of the RefSeq entries had at least one uAUG (Yamashita et al., 2003); about 44% of 5'-UTRs had uAUGs and uORFs (Iacono et al., 2005). There are also some reports that the first AUG is skipped if it is too close to the cap structure, within 12 (Kozak, 1991) to 14 (Sedman et al., 1990) nucleotides (see section 3.3). In this chapter, we cite a variety of statistical data on the UTRs. Because they are based on different versions or generations of sequence databases, the data vary widely (Meijer & Thomas, 2002), a point that should be kept in mind.
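Statistics of this kind rest on a simple operation: counting AUG triplets in the 5'-UTR of each transcript. A minimal sketch follows, assuming the position of the annotated initiator codon is already known; the function names and the toy sequences are hypothetical and do not reproduce the pipelines of the cited studies.

```python
def upstream_augs(mrna, main_start):
    """Return 0-based positions of AUG triplets lying entirely in the 5'-UTR.

    main_start is the 0-based index of the annotated initiator codon.
    """
    utr = mrna.upper().replace("T", "U")[:main_start]
    return [i for i in range(len(utr) - 2) if utr[i:i + 3] == "AUG"]


def follows_first_aug_rule(mrna, main_start):
    """True when the annotated initiator is the first AUG on the mRNA."""
    return not upstream_augs(mrna, main_start)


print(upstream_augs("CCAUGCCCAUGGGGAUGAAA", 14))
print(follows_first_aug_rule("GGCACCAUGGCU", 6))
```

Applied over a whole transcript collection, the fraction of entries with a non-empty list gives the kind of "at least one uAUG" percentage quoted above, which is why the result depends so strongly on how complete the 5'-UTR sequences are.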

3.2. Kozak’s consensus sequence

The strongest bias for initiation of translation in vertebrates is the sequence context called “Kozak’s sequence”, known as GCC(A/G)CCATGG (Kozak, 1987). The nucleotides in positions -3 (A or G) and +4 (G) are highly conserved and greatly help a ribosomal complex to start translation (Kozak, 1987, 2002; Matsui et al., 2007; Suzuki et al., 2001; Wang & Rothnagel, 2004). The nucleotide in position -3 is the most highly conserved and functionally the most important; the context of an AUG codon is regarded as strong or optimal only when this position matches A or G, and position +4 is also highly conserved (Kozak, 2002). Some reports mentioned that only 0.86% (Kozak, 1987) to 6% (Iacono et al., 2005) of functional initiator codons lacked Kozak’s sequence in positions -3 and +4, whereas 37% (Suzuki et al., 2000) to 46% (Kozak, 1987) of uATGs would be skipped because of an unfavorable Kozak’s sequence in both positions. In contrast, another report mentioned that most initiator codons were not in close agreement with Kozak’s consensus sequence (Peri & Pandey, 2001).
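Checking positions -3 and +4 is easy to automate. In the sketch below the A of the ATG is position +1, so position -3 is three bases upstream of it and position +4 is the base right after the G. The three-level grading ("strong", "adequate", "weak") and the function name `kozak_strength` are illustrative assumptions; the text above only states that a purine at -3 is required for a strong context.

```python
def kozak_strength(seq, atg_index):
    """Grade the context of an ATG by positions -3 and +4 (A of ATG is +1).

    atg_index is the 0-based index of the A in the ATG.
    """
    s = seq.upper().replace("U", "T")
    if s[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # Positions outside the sequence are treated as non-matching.
    minus3 = s[atg_index - 3] if atg_index >= 3 else ""
    plus4 = s[atg_index + 3] if atg_index + 3 < len(s) else ""
    purine_at_minus3 = minus3 in ("A", "G")
    g_at_plus4 = plus4 == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


print(kozak_strength("GCCACCATGG", 6))
print(kozak_strength("GCCTCCATGT", 6))
```

A uATG graded "weak" under this scheme is the kind of codon that the leaky scanning model expects ribosomes to bypass.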

3.3. The length of the 5'-UTR

The length of the 5'-UTR also matters when translation occurs from an AUG codon near the 5' end of the mRNA (Kozak, 1991; Sedman et al., 1990). About half of ribosomes skip an AUG codon even in an optimal context if the length of the 5'-UTR is less than 12 nucleotides (mentioned in section 3.1), and this type of leaky scanning can be reduced if the length of the 5'-UTR is 20 nucleotides or more (Kozak, 1991). In the traditional analysis based on incomplete 5'-UTR sequences, the distance from the 5' end to the AUG initiator codon in vertebrate mRNAs was generally between 20 and 100 nucleotides (Kozak, 1987). A previous analysis using RefSeq human mRNA sequences indicated that 85% of 5'-UTR sequences shorter than 100 nucleotides contain no uAUGs (Peri & Pandey, 2001). This evidence convinced us that the first-AUG rule was widely supported in eukaryotes. In the more recent analysis based on full-length 5'-UTR sequences, the 5'-UTR is 125 nucleotides long on average (Suzuki et al., 2000) and transcriptional start sites (TSSs) vary widely (Carninci et al., 2006; Kimura et al., 2006; Suzuki et al., 2001). The average scattered length of the 5'-UTR was more than 61.7 nucleotides, with a standard deviation of 19.5 nucleotides (Suzuki et al., 2001), and 52% of the human RefSeq genes contained 3.1 TSS clusters on average (Kimura et al., 2006), each separated by an interval of over 500 nucleotides (Fig. 4). In protein-coding genes, differentially regulated alternative TSSs are common (Carninci et al., 2006). Because the diversity of transcription initiation greatly affects the length of the 5'-UTR, some doubt remains as to whether the length of the 5'-UTR contributes to the efficiency of translation initiation. There is also a report that the degree of leaky scanning is not affected by the length of the 5'-UTR (Wang & Rothnagel, 2004).

Figure 4.
The schematic representation of the 5' ends of the TSSs. Each TSS cluster consists of at least one TSS and is separated from other clusters by an interval of over 500 nucleotides.

3.4. Non-AUG initiator codon

In the general translation model, a non-AUG codon is considered to be ignored by ribosomes unless a downstream AUG codon is in a relatively weak context (Geballe & Morris, 1994; Kozak, 1999). When an upstream non-AUG codon, such as ACG, CUG or GUG, satisfies Kozak’s consensus sequence, it can possibly function as an initiator of translation in addition to the first AUG initiator codon (Kozak, 1999, 2002). Besides Kozak’s consensus sequence, a downstream stem-and-loop and a highly structured GC-rich context in the 5'-UTR could enhance translation initiation from a non-AUG codon (Kozak, 1991, 2002).
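As a rough illustration of how such candidates could be listed, the sketch below scans the 5'-UTR for ACG, CUG or GUG triplets that have a purine at position -3 and a G at position +4. Reducing "satisfies Kozak’s consensus sequence" to these two positions is a simplifying assumption, secondary structure is ignored entirely, and the function name and example sequences are hypothetical.

```python
NON_AUG_CODONS = ("ACG", "CUG", "GUG")


def non_aug_candidates(mrna, main_start):
    """List (index, codon) for upstream non-AUG codons in a favorable context.

    main_start is the 0-based index of the annotated AUG initiator codon.
    A codon qualifies when position -3 is A or G and position +4 is G.
    """
    seq = mrna.upper().replace("T", "U")
    hits = []
    # Start at 3 so that position -3 exists; stop so that position +4 exists.
    for i in range(3, min(main_start, len(seq) - 3)):
        codon = seq[i:i + 3]
        if codon in NON_AUG_CODONS and seq[i - 3] in "AG" and seq[i + 3] == "G":
            hits.append((i, codon))
    return hits


print(non_aug_candidates("GCCACCCUGGAAAUGGCC", 12))
```

Such a list is only a set of hypotheses; as the MS results discussed later show, direct peptide evidence is what confirms that a non-AUG codon is actually used.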

4. Protein identification by MS

The recent progress of proteomic methodologies based on highly sensitive liquid chromatography-tandem mass spectrometry (LC-MS/MS) technology has enabled us to identify hundreds or thousands of proteins in a single analysis.

We succeeded in discovering novel small proteins translated from short ORFs using a direct nanoflow LC-MS/MS system (Oyama et al., 2004, 2007). Among 54 proteins of less than 100 amino acids that were identified by searching several sequence databases with a representative search engine, Mascot (Matrix Science; http://www.matrixscience.com/), four turned out to be encoded in 5'-UTRs (Oyama et al., 2004). This was the first direct evidence that peptide products from uORFs are actually translated in human cells. In the subsequent analysis using a more sophisticated two-dimensional LC system, we discovered eight more novel small proteins (Oyama et al., 2007), five of which were encoded in the 5'-UTR and three in the 3'-UTR of the corresponding mRNA. Even based on the accumulated DBTSS data, two ORFs had no putative AUG codon, which indicated the possibility that they were translated from non-AUG initiator codons. In that article, 197 proteins of less than 20 kDa were identified by Mascot. The procedure for identifying novel proteins by MS is described as follows.

Figure 5.
The procedure for preparing samples for proteomic analyses of small proteins.

4.1. Materials and methods

The proteins in cultured cell lysates were first separated according to their size. Small protein-enriched fractions obtained through acid extraction and SDS-PAGE were treated with enzymes. In the case of SDS-PAGE, the digested peptides were extracted from the gel. The samples were desalted and concentrated for introduction into the MS system. The schematic procedure is shown in Fig. 5.

4.2. Protein identification

The samples were analyzed using a nanoflow LC-MS/MS system. The purified peptides were eluted with a linear gradient of acetonitrile and sprayed into the high-resolution tandem mass spectrometer. Acquired tandem mass (MS/MS) spectra were then converted to text files and searched against sequence databases using Mascot. Based on the principle that each peptide has an MS/MS spectrum with unique characteristics, the search engine compares measured data on precursor/product ions with those theoretically calculated from protein sequence data (Fig. 6). The MS/MS spectrum file contains mass-to-charge ratio (m/z) values of precursor and product ions along with their intensities. The measured spectrum lists are searched against sequence databases to identify the corresponding peptides in a statistical manner. The theoretical spectrum lists are totally dependent on the contents of the sequence databases themselves.

Figure 6.
The principle of protein identification. A search engine compares measured MS/MS spectrum lists with theoretical ones. The acquired MS/MS spectra are converted into a text file that is composed of precursor ion data and product ion data, in a format defined by the search engine. Product ion data usually consist of multiple pairs of m/z and intensity values. The theoretical m/z values are calculated computationally.
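The comparison of measured and theoretical product ions can be illustrated with a toy example. The sketch below computes singly charged b- and y-ion m/z values from monoisotopic residue masses and counts how many measured peaks fall within a tolerance. It is not how Mascot scores matches (Mascot uses probability-based scoring); the simple peak count, the 0.02 tolerance, the function names and the example peaks are all assumptions chosen for illustration, and precursor mass and intensities are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_mz(peptide):
    """Return (b_ions, y_ions): singly charged m/z values for each cleavage site."""
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


def count_matches(measured_mz, peptide, tol=0.02):
    """Count measured peaks lying within tol of any theoretical b or y ion."""
    b_ions, y_ions = fragment_mz(peptide)
    theoretical = b_ions + y_ions
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in measured_mz)


def best_match(measured_mz, candidates, tol=0.02):
    """Pick the candidate peptide that explains the most measured peaks."""
    return max(candidates, key=lambda p: count_matches(measured_mz, p, tol))


peaks = [147.113, 129.066]  # hypothetical product ion m/z values
print(best_match(peaks, ["GAR", "GAK"]))
```

The example makes the dependence on the database explicit: a peptide can only be returned if its sequence is among the candidates, which is why small proteins absent from protein databases require a translated cDNA database to be found.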

4.3. Finding of novel small proteins

To explore novel small proteins, two types of sequence databases were used: one was an artificial database computationally translated from the cDNA sequences in all the reading frames, and the other was an already established protein database. To compare the large-scale protein identification data from the two kinds of databases, several Perl scripts were developed based on the definition that candidates for novel small proteins were those identified only in the cDNA database(s) (Fig. 7). In a result datasheet using RefSeq sequences, each protein was annotated with NM numbers for the cDNA database and with NP numbers for the protein database. The Perl scripts then converted NM numbers to NP numbers and evaluated them.
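Building the translated cDNA database amounts to translating each sequence in every reading frame. A minimal Python sketch is given below, assuming the standard genetic code and the three forward frames of the cDNA (the sense strand of the mRNA); whether the original database also included reverse-strand frames is not stated here, and the function name `translate_frames` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_frames(cdna):
    """Translate a cDNA in the three forward frames.

    Returns {1: ..., 2: ..., 3: ...}; '*' marks stop codons and 'X' marks
    codons containing ambiguous bases.
    """
    seq = cdna.upper().replace("U", "T")
    frames = {}
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        frames[offset + 1] = "".join(CODON_TABLE.get(c, "X") for c in codons)
    return frames


print(translate_frames("ATGGCCTAA"))
```

Splitting each translated frame at the '*' characters would then give the stop-to-stop segments that a search engine can match peptides against, including segments that do not begin with methionine.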

Figure 7.
The algorithm to compare the lists of search results using RefSeq cDNA and protein databases. The proteins identified from the cDNA database are annotated with NM numbers, whereas those from the protein database are annotated with NP numbers. To compare these results, NM numbers need to be converted to NP numbers. The NP numbers annotated only from the cDNA database are considered to be candidates for novel proteins.
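The comparison in Fig. 7 is essentially a set difference after identifier conversion. The original work used Perl scripts; the Python sketch below only illustrates the idea and is not a port of them. The function name, the mapping dictionary and the accession strings are hypothetical placeholders, and keeping cDNA hits that lack an NP mapping as candidates is an additional assumption.

```python
def novel_candidates(cdna_hits, protein_hits, nm_to_np):
    """Return (NM, NP) pairs identified only through the cDNA database.

    cdna_hits:    NM accessions identified in the translated cDNA search
    protein_hits: NP accessions identified in the protein database search
    nm_to_np:     dict mapping each NM accession to its NP accession
    """
    known = set(protein_hits)
    candidates = []
    for nm in cdna_hits:
        np_id = nm_to_np.get(nm)  # convert NM to NP for comparison
        # Assumption: an NM without any NP mapping is also kept for inspection.
        if np_id is None or np_id not in known:
            candidates.append((nm, np_id))
    return candidates


# Hypothetical accessions, for illustration only.
mapping = {"NM_A": "NP_A", "NM_B": "NP_B"}
print(novel_candidates(["NM_A", "NM_B"], ["NP_A"], mapping))
```

A hit that survives this filter still has to be inspected manually, since the matched peptide may lie in a UTR-encoded ORF or may simply reflect a difference between the two database versions.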

5. Bioinformatics approach

To advance MS-based identification of novel coding regions of mRNAs, MS systems, sequence databases and bioinformatics methodologies need to improve together. Regarding bioinformatics, two aspects seem to be in demand: one is retrieving target proteins from an enormous volume of database search results, and the other is constructing platforms to predict novel coding sequences (CDSs).

5.1. Contribution of sequence databases & bioinformatics to MS-based proteomics

The recent advances in MS-based proteomics technology have enabled us to perform large-scale protein identification with high sensitivity. The accumulation of well-established sequence databases has also made a great contribution to efficient identification in proteomics analyses. One representative type of database is a specialized 5'-end cDNA database such as DBTSS, and another is the series of whole genome sequence databases for various species. For investigating the mechanisms of transcriptional control, DBTSS has lately attracted considerable attention because it contains accumulated information on the transcriptional regulation of each gene (Suzuki et al., 2002, 2004; Tsuchihara et al., 2009; Wakaguri et al., 2008; Yamashita et al., 2006). Based on the accumulated data, the diverse distribution of TSSs was clearly indicated (Kimura et al., 2006; Suzuki et al., 2000, 2001). On the other hand, many whole genome sequencing projects are progressing all over the world (GOLD: Genomes Online Database; http://www.genomesonline.org/). Completing and maintaining sequence databases for various species should help to find more novel proteins across species. For example, several reports have used bioinformatics approaches to explore novel functional uORFs by comparing the 5'-UTR regions of orthologs based on multiple sequence alignments (Zhang & Dietrich, 2005), by using ORF Finder (http://bioinformatics.org/sms/orf_find.html) and a machine learning technique, inductive logic programming (ILP), with biological background knowledge (Selpi et al., 2006), or by applying comparative genomics and a heuristic rule-based expert system (Cvijovic et al., 2007). Using advanced sequence databases, new protein CDSs were added as a result of prediction by various algorithms (e.g. Hatzigeorgiou, 2002; Ota et al., 2004). Based on the well-established cDNA databases, MS could evaluate in a high-throughput manner whether these CDSs are actually translated.

Construction of more detailed sequence databases will lead to detection of more novel small proteins in the presumed 5'-UTRs (Oyama et al., 2004). To make good use of those exhaustive sequence databases, bioinformatics techniques, especially data mining tools such as search engines that retrieve target proteins from an enormous volume of database search results, are indispensable.

5.2. Contribution of MS-based proteomics to sequence databases & bioinformatics

In addition to the technological progress of MS, sequence databases and data mining tools, the development of other bioinformatics techniques, called prediction tools, is also important. Ad hoc algorithms for predicting new CDSs, as mentioned above, could be improved by using MS-based novel protein data. Those novel proteins can serve as supervised training data for machine learning, pattern recognition or rule-based manual approaches. There is an interesting bioinformatics report which hypothesized that a uORF in a transcript down-regulates expression of the corresponding RNA via RNA decay mechanisms (Matsui et al., 2007). The authors obtained human and mouse transcripts from RefSeq and UniGene (http://www.ncbi.nlm.nih.gov/unigene) and classified the transcripts into Level 0 (not containing a uORF) and Levels 1-3 (containing uORFs). They then prepared data on expression intensities and half-lives of mRNA transcripts, mainly from SymAtlas (now linked to BioGPS; http://biogps.gnf.org/#goto=welcome) and the Genome Research website (http://genome.cshlp.org/). Although they suggested that not only the expression level but also the half-life of transcripts was clearly lower in the latter group, they did not demonstrate any interaction between uORFs and transcripts.

Advanced MS instruments can not only evaluate whether uORFs are actually translated but also quantify time-course changes in their expression levels. Stable isotope labeling with amino acids in cell culture (SILAC) technology enables us to quantify changes across all the proteins in vivo (Oyama et al., 2009). Based on time-course changes of specific peptides, we could also hypothesize some regulatory interactions. In combination with measurement of the dynamics of the corresponding mRNAs using microarrays or reverse transcription-polymerase chain reaction (RT-PCR), transcriptional regulation by short ORFs will be analyzed at the system level.

6. Conclusion

Although the roles of 5'-UTR elements, especially uORFs, had been well discussed as translational regulators of the main ORFs in the biological context, whether the proteins encoded by the uORFs were translated had not been addressed for a long time. We first unraveled the mystery by demonstrating the existence of novel protein products defined by these ORFs using advanced proteomics technology. Thanks to the progress of nanoLC-MS/MS-based shotgun proteomics strategies, thousands of proteins can now be identified from protein mixtures such as cell lysates. Some of the presumed UTRs are no longer “untranslated”, and other noncoding transcripts are no longer “noncoding”. One of the novel small proteins revealed in our analysis was indeed defined by a short transcript variant generated by the use of downstream alternative promoters (Oyama et al., 2007). Alternative use of diverse transcription initiation, splicing and translation start sites could increase the complexity of short protein-coding regions, and MS-based annotation of these novel small proteins will enable a more detailed, systematic analysis of the real outline of the proteome, along with the translational regulation by the diversified short ORFeome.

References

1. Altmann, M., Trachsel, H. (1993). Regulation of translation initiation and modulation of cellular physiology. Trends in Biochemical Sciences, Vol. 18, No. 11, pp. 429-432, Online ISSN 0167-7640, Print ISSN 0968-0004.
2. Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., Semple, C. A. M., Taylor, M. S., Engstrom, P. G., Frith, M. C., Forrest, A. R. R., Alkema, W. B., Tan, S. L., Plessy, C., Kodzius, R., Ravasi, T., Kasukawa, T., Fukuda, S., Kanamori-Katayama, M., Kitazume, Y., Kawaji, H., Kai, C., Nakamura, M., Konno, H., Nakano, K., Mottagui-Tabar, S., Arner, P., Chesi, A., Gustincich, S., Persichetti, F., Suzuki, H., Grimmond, S. M., Wells, C. A., Orlando, V., Wahlestedt, C., Liu, E. T., Harbers, M., Kawai, J., Bajic, V. B., Hume, D. A., Hayashizaki, Y. (2006). Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genetics, Vol. 38, No. 6, pp. 626-635, Online ISSN 1546-1718, Print ISSN 1061-4036.
3. Cvijovic, M., Dalevi, D., Bilsland, E., Kemp, G. J. L., Sunnerhagen, P. (2007). Identification of putative regulatory upstream ORFs in the yeast genome using heuristics and evolutionary conservation. BMC Bioinformatics, Vol. 8, Article 295, Online ISSN 1471-2105.
4. Diba, F., Watson, C. S., Gametchu, B. (2001). 5’UTR Sequences of the Glucocorticoid Receptor 1A Transcript Encode a Peptide Associated With Translational Regulation of the Glucocorticoid Receptor. Journal of Cellular Biochemistry, Vol. 81, No. 1, pp. 149-161, Online ISSN 1097-4644, Print ISSN 0730-2312.
5. Geballe, A. P., Morris, D. R. (1994). Initiation codons within 5’-leaders of mRNAs as regulators of translation. Trends in Biochemical Sciences, Vol. 19, No. 4, pp. 159-164, Online ISSN 0167-7640, Print ISSN 0968-0004.
6. Hatzigeorgiou, A. G. (2002). Translation initiation start prediction in human cDNAs with high accuracy. Bioinformatics, Vol. 18, No. 2, pp. 343-350, Online ISSN 1460-2059, Print ISSN 1367-4803.
7. Iacono, M., Mignone, F., Pesole, G. (2005). uAUG and uORFs in human and rodent 5’untranslated mRNAs. Gene, Vol. 349, pp. 97-105, Online ISSN 1879-0038, Print ISSN 0378-1119.
8. Kimura, K., Wakamatsu, A., Suzuki, Y., Ota, T., Nishikawa, T., Yamashita, R., Yamamoto, J., Sekine, M., Tsuritani, K., Wakaguri, H., Ishii, S., Sugiyama, T., Saito, K., Isono, Y., Irie, R., Kushida, N., Yoneyama, T., Otsuka, R., Kanda, K., Yokoi, T., Kondo, H., Wagatsuma, M., Murakawa, K., Ishida, S., Ishibashi, T., Takahashi-Fujii, A., Tanase, T., Nagai, K., Kikuchi, H., Nakai, K., Isogai, T., Sugano, S. (2006). Diversification of transcriptional modulation: Large-scale identification and characterization of putative alternative promoters of human genes. Genome Research, Vol. 16, No. 1, pp. 55-65, Online ISSN 1549-5469, Print ISSN 1088-9051.
9. Kozak, M. (1987). An analysis of 5’-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Research, Vol. 15, No. 20, pp. 8125-8148, Online ISSN 1362-4962, Print ISSN 0305-1048.
10. Kozak, M. (1989). The Scanning Model for Translation: An Update. The Journal of Cell Biology, Vol. 108, No. 2, pp. 229-241, Online ISSN 1540-8140, Print ISSN 0021-9525.
11. Kozak, M. (1991). Structural Features in Eukaryotic mRNAs That Modulate the Initiation of Translation. The Journal of Biological Chemistry, Vol. 266, No. 30, pp. 19867-19870, Online ISSN 1083-351X, Print ISSN 0021-9258.
12. Kozak, M. (1999). Initiation of translation in prokaryotes and eukaryotes. Gene, Vol. 234, No. 2, pp. 187-208, Online ISSN 1879-0038, Print ISSN 0378-1119.
13. Kozak, M. (2002). Pushing the limits of the scanning mechanism for initiation of translation. Gene, Vol. 299, No. 1-2, pp. 1-34, Online ISSN 1879-0038, Print ISSN 0378-1119.
14. Maruyama, K., Sugano, S. (1994). Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides. Gene, Vol. 138, No. 1-2, pp. 171-174, Online ISSN 1879-0038, Print ISSN 0378-1119.
15. Matsui, M., Yachie, N., Okada, Y., Saito, R., Tomita, M. (2007). Bioinformatic analysis of post-transcriptional regulation by uORF in human and mouse. FEBS Letters, Vol. 581, No. 22, pp. 4184-4188, Online ISSN 1873-3468, Print ISSN 0014-5793.
16. Meijer, H. A., Thomas, A. A. M. (2002). Control of eukaryotic protein synthesis by upstream open reading frames in the 5’-untranslated region of an mRNA. Biochemical Journal, Vol. 367, No. 1, pp. 1-11, Online ISSN 1470-8728, Print ISSN 0264-6021.
17. Morris, D. R., Geballe, A. P. (2000). Upstream Open Reading Frames as Regulators of mRNA Translation. Molecular and Cellular Biology, Vol. 20, No. 23, pp. 8635-8642, Online ISSN 1098-5549, Print ISSN 0270-7306.
18. Ota, T., Suzuki, Y., (2004). Complete sequencing and characterization of 21,243 full-length human cDNAs. Nature Genetics, Vol. 36, No. 1, pp. 40-45, Online ISSN 1546-1718, Print ISSN 1061-4036.
19. Oyama, M., Itagaki, C., Hata, H., Suzuki, Y., Izumi, T., Natsume, T., Isobe, T., Sugano, S. (2004). Analysis of Small Human Proteins Reveals the Translation of Upstream Open Reading Frames of mRNAs. Genome Research, Vol. 14, No. 10B, pp. 2048-2052, Online ISSN 1549-5469, Print ISSN 1088-9051.
20. Oyama, M., Kozuka-Hata, H., Suzuki, Y., Semba, K., Yamamoto, T., Sugano, S. (2007). Diversity of Translation Start Sites May Define Increased Complexity of the Human Short ORFeome. Molecular & Cellular Proteomics, Vol. 6, No. 6, pp. 1000-1006, Online ISSN 1535-9484, Print ISSN 1535-9476.
21. Oyama, M., Kozuka-Hata, H., Tasaki, S., Semba, K., Hattori, S., Sugano, S., Inoue, J., Yamamoto, T. (2009). Temporal Perturbation of Tyrosine Phosphoproteome Dynamics Reveals the System-wide Regulatory Networks. Molecular & Cellular Proteomics, Vol. 8, No. 2, pp. 226-231, Online ISSN 1535-9484, Print ISSN 1535-9476.
22. Peabody, D. S., Subramani, S., Berg, P. (1986). Effect of Upstream Reading Frames on Translation Efficiency in Simian Virus 40 Recombinants. Molecular and Cellular Biology, Vol. 6, No. 7, pp. 2704-2711, Online ISSN 1098-5549, Print ISSN 0270-7306.
23. Peri, S., Pandey, A. (2001). A reassessment of the translation initiation codon in vertebrates. Trends in Genetics, Vol. 17, No. 12, pp. 685-687, Print ISSN 0168-9525.
24. Rastinejad, F., Blau, H. M. (1993). Genetic Complementation Reveals a Novel Regulatory Role for 3’ Untranslated Regions in Growth and Differentiation. Cell, Vol. 72, No. 6, pp. 903-917, Online ISSN 1097-4172, Print ISSN 0092-8674.
25. Sedman, S. A., Gelembiuk, G. W., Mertz, J. E. (1990). Translation Initiation at a Downstream AUG Occurs with Increased Efficiency When the Upstream AUG Is Located Very Close to the 5’ Cap. Journal of Virology, Vol. 64, No. 1, pp. 453-457, Online ISSN 1098-5514, Print ISSN 0022-538X.
26. Selpi, Bryant, C. H., Kemp, G. J. L., Cvijovic, M. (2006). A First Step towards Learning which uORFs Regulate Gene Expression. Journal of Integrative Bioinformatics, Vol. 3, No. 2, ID 31, Online ISSN 1613-4516.
27. Suzuki, Y., Yoshitomo-Nakagawa, K., Maruyama, K., Suyama, A., Sugano, S. (1997). Construction and characterization of a full length-enriched and a 5′-end-enriched cDNA library. Gene, Vol. 200, No. 1-2, pp. 149-156, Online ISSN 1879-0038, Print ISSN 0378-1119.
28. Suzuki, Y., Ishihara, D., Sasaki, M., Nakagawa, H., Hata, H., Tsunoda, T., Watanabe, M., Komatsu, T., Ota, T., Isogai, T., Suyama, A., Sugano, S. (2000). Statistical Analysis of the 5’ Untranslated Region of Human mRNA Using “Oligo-Capped” cDNA Libraries. Genomics, Vol. 64, No. 3, pp. 286-297, Online ISSN 1089-8646, Print ISSN 0888-7543.
29. Suzuki, Y., Taira, H., Tsunoda, T., Mizushima-Sugano, J., Sese, J., Hata, H., Ota, T., Isogai, T., Tanaka, T., Morishita, S., Okubo, K., Sakaki, Y., Nakamura, Y., Suyama, A., Sugano, S. (2001). Diverse transcriptional initiation revealed by fine, large-scale mapping of mRNA start sites. EMBO reports, Vol. 2, No. 5, pp. 388-393, Online ISSN 1469-3178, Print ISSN 1469-221X.
30. Suzuki, Y., Yamashita, R., Nakai, K., Sugano, S. (2002). DBTSS: DataBase of human Transcriptional Start Sites and full-length cDNAs. Nucleic Acids Research, Vol. 30, No. 1, pp. 328-331, Online ISSN 1362-4962, Print ISSN 0305-1048.
31. Suzuki, Y., Yamashita, R., Sugano, S., Nakai, K. (2004). DBTSS: DataBase of Transcriptional Start Sites: progress report 2004. Nucleic Acids Research, Vol. 32 (suppl. 1), Database issue, pp. D78-D81, Online ISSN 1362-4962, Print ISSN 0305-1048.
32. Tsuchihara, K., Suzuki, Y., Wakaguri, H., Irie, T., Tanimoto, K., Hashimoto, S., Matsushima, K., Mizushima-Sugano, J., Yamashita, R., Nakai, K., Bentley, D., Esumi, H., Sugano, S. (2009). Massive transcriptional start site analysis of human genes in hypoxia cells. Nucleic Acids Research, Vol. 37, No. 7, pp. 2249-2263, Online ISSN 1362-4962, Print ISSN 0305-1048.
33. Vilela, C., McCarthy, J. E. G. (2003). Regulation of fungal gene expression via short open reading frames in the mRNA 5’untranslated region. Molecular Microbiology, Vol. 49, No. 4, pp. 859-867, Online ISSN 1365-2958, Print ISSN 0950-382X.
34. Wakaguri, H., Yamashita, R., Suzuki, Y., Sugano, S., Nakai, K. (2008). DBTSS: database of transcription start sites, progress report 2008. Nucleic Acids Research, Vol. 36 (suppl. 1), Database issue, pp. D97-D101, Online ISSN 1362-4962, Print ISSN 0305-1048.
35. Wang, X.-Q., Rothnagel, J. A. (2004). 5’-Untranslated regions with multiple upstream AUG codons can support low-level translation via leaky scanning and reinitiation. Nucleic Acids Research, Vol. 32, No. 4, pp. 1382-1391, Online ISSN 1362-4962, Print ISSN 0305-1048.
36. Yamashita, R., Suzuki, Y., Nakai, K., Sugano, S. (2003). Small open reading frames in 5′ untranslated regions of mRNAs. Comptes Rendus Biologies, Vol. 326, No. 10-11, pp. 987-991, Online ISSN 1768-3238, Print ISSN 1631-0691.
37. Yamashita, R., Suzuki, Y., Wakaguri, H., Tsuritani, K., Nakai, K., Sugano, S. (2006). DBTSS: DataBase of Human Transcription Start Sites, progress report 2006. Nucleic Acids Research, Vol. 34 (suppl. 1), Database issue, pp. D86-D89, Online ISSN 1362-4962, Print ISSN 0305-1048.
38. Zhang, Z., Dietrich, F. S. (2005). Identification and characterization of upstream open reading frames (uORF) in the 5’ untranslated regions (UTR) of genes in Saccharomyces cerevisiae. Current Genetics, Vol. 48, No. 2, pp. 77-87, Online ISSN 1432-0983, Print ISSN 0172-8083.


Computational Biology and Applied Bioinformatics

