Comparison of RNA-seq methodologies.
The central dogma of molecular biology describes the flow of genetic information from genes to the functions of cells and organisms. It comprises a two-step process: first, DNA, the permanent, heritable repository of genetic information, is transcribed by RNA polymerase enzymes into RNA, a short-lived information carrier; second, a subset of RNAs, the messenger RNAs (mRNAs), is translated into protein.
Importantly, not all RNAs are translated into proteins. Some serve a structural function, for example, rRNAs in the assembly of ribosomes; others are transporters, e.g., tRNAs; yet others serve regulatory functions, for example, the short interfering RNAs (siRNAs) or the long non-coding RNAs (lncRNAs). Nevertheless, these non-coding RNAs can and often do play roles in human diseases such as cancer and cardiovascular and neurological disorders. While transcriptomics is most commonly applied to mRNAs, the coding transcripts, it also provides important data on the cell's non-coding RNA content, including rRNA, tRNA, lncRNA, siRNA, and others. Specific approaches address the analysis of splice variants of the same gene in different tissues.
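As a toy illustration of the two-step flow from DNA to RNA to protein described above, the sketch below transcribes a short DNA sequence and translates the resulting mRNA. It assumes the input is the coding (sense) strand, so transcription reduces to replacing T with U, and it uses a deliberately partial codon table; the sequence and the function names are hypothetical.

```python
# Partial codon table: only the codons used in this toy example.
# None marks a stop codon.
CODON_TABLE = {
    "AUG": "M",   # methionine (start)
    "UUU": "F",   # phenylalanine
    "GGC": "G",   # glycine
    "AAA": "K",   # lysine
    "UAA": None,  # stop
}


def transcribe(coding_dna):
    """Step 1: DNA -> mRNA. Assumes the coding strand is given (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Step 2: mRNA -> protein, reading codons until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if codon not in the partial table
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCAAATAA")  # made-up sequence
protein = translate(mrna)
print(mrna, protein)
```

Non-coding RNAs such as rRNAs, tRNAs, siRNAs, and lncRNAs would stop after the first step; only mRNAs go on to the second.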
1. Transcriptome analysis
Transcriptome analysis is the study of the transcriptome, that is, the complete set of RNA transcripts produced by the genome under specific circumstances or in a specific cell, using high-throughput methods. Transcription profiling, which follows changes in behavior of a cell
2. Uses of transcriptome analysis
Transcriptome analysis is most commonly used to compare specific pairs of samples. The differences may be due to different external environmental conditions, e.g., hormonal effects or toxins. More commonly, healthy and disease states are compared. For example, in cancer, transcriptomic analyses address classification, mechanisms of pathogenesis, and even outcome prediction. Transcriptome studies can classify cancers beyond anatomical location and histopathology. Outcome predictions can establish gene-based benchmarks for tumor prognosis and therapy response. These approaches are already in use in personalized medicine, that is, in therapies individualized for each cancer patient.
Organisms and tissues at various stages of development can be characterized at the molecular level. The transcriptomes of stem cells help us understand the processes of cellular differentiation and embryonic development. Because of its very broad scope, transcriptome analysis is a rich source of candidate treatment targets.
The early approach to studying whole transcriptomes used microarrays, sets of defined sequences arranged on a solid substrate. Microarrays almost exclusively represented mRNAs, that is, transcripts of genes that are translated into proteins.
Nowadays the microarray approach has been supplanted by high-throughput RNA sequencing (RNA-Seq), which detects all transcripts in a sample, including the regulatory siRNA and lncRNA transcripts. In this methodology, the bulk RNA is extracted from the sample and copied into stable double-stranded complementary DNA (ds-cDNA), which is then sequenced using various sequencing methods. The sequences obtained are aligned to reference genome sequences, available in data banks, to identify which genes are transcribed. Quantitatively, the results provide the expression levels of the transcribed genes. Compared to microarrays, RNA-Seq can measure both low-abundance and high-abundance RNAs over a range of five orders of magnitude and, importantly, requires much less starting material (nanograms rather than micrograms, and even as little as 50 pg). This made possible the analysis of transcriptomes in single cells, a great advance over bulk tissue RNA analyses. RNA-Seq can also be used to identify alternative splicing, novel transcripts, and fusion genes (Table 1).
In principle, the assembly of RNA-Seq reads does not depend on reference genomes, so it can be used for gene expression studies of poorly characterized species with limited genomic resources. It can also be used to identify novel protein-coding regions in sequenced genomes. RNA-Seq can be performed on many next-generation sequencing platforms; however, each platform has its own requirements for sample preparation and instrument design.
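The quantification step mentioned above, in which aligned reads become per-gene expression levels, can be sketched as follows. The text does not specify a normalization; this example assumes transcripts per million (TPM), one commonly used unit, and the gene names, read counts, and transcript lengths are invented for illustration.

```python
def tpm(read_counts, lengths_bp):
    """Convert raw per-gene read counts to transcripts per million (TPM).

    read_counts: dict gene -> number of aligned reads (hypothetical input)
    lengths_bp:  dict gene -> transcript length in base pairs
    """
    # Reads per kilobase: corrects for longer transcripts collecting more reads.
    rpk = {g: read_counts[g] / (lengths_bp[g] / 1000) for g in read_counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("no reads were assigned to any gene")
    # Rescale so that the values of one sample sum to one million.
    return {g: value / total * 1_000_000 for g, value in rpk.items()}


# Made-up numbers: gene_b has three times the reads but is three times longer.
counts = {"gene_a": 100, "gene_b": 300, "gene_c": 0}
lengths = {"gene_a": 1000, "gene_b": 3000, "gene_c": 2000}
expression = tpm(counts, lengths)
print(expression)
```

Because TPM values sum to the same total in every sample, they allow a rough comparison of relative transcript abundance across samples; it is only one of several normalizations in use.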
2.2 Data analysis, repositories and presentation
Improved sequencing technologies necessitated improved data analysis methods to deal with the increased volume of data produced by each transcriptome experiment. Importantly, the results are deposited in transcriptome databases, which are essential tools for transcriptome analysis. For example, the Gene Expression Omnibus, www.ncbi.nlm.nih.gov, contains millions of transcription profiling experiments. Such data have potential applications beyond the original aims of an experiment. Typical outputs include quantitative tables of transcript levels. Producing them requires dedicated analysis algorithms, often specific to the methodology used. Software packages exist to bridge data from disparate methodologies, to identify groups of similarly expressed genes, and to identify differentially expressed, functionally significant regulatory or metabolic pathways.
The results of transcriptomic analyses are often presented graphically as heat maps, a color-coding system that represents the expression levels of given genes in different samples (Figure 1A). Such presentations frequently also display a clustering of samples, which helps to identify samples with similar gene expression. Another common graphical presentation uses Venn diagrams, which count the transcripts that are equivalently regulated in multiple samples (Figure 1B).
Transcriptome analyses have become indispensable in basic, translational, and clinical research. In general, transcriptome analysis is a very powerful hypothesis-generating tool, more than a theory-proving one.
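To make the idea of differentially expressed genes concrete, the sketch below compares a pair of samples by log2 fold change. It is a simplification under stated assumptions: a pseudocount of 1 avoids division by zero, the two-fold cutoff is arbitrary, and all names and values are hypothetical. Real studies use replicate-aware statistical tests rather than a bare threshold.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2 ratio of treated over control expression.

    The pseudocount (an assumption of this sketch) keeps the ratio
    defined for genes with zero expression in either sample.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }


def call_regulated(lfc, threshold=1.0):
    """Label genes whose |log2 fold change| reaches the threshold (1.0 = two-fold)."""
    return {
        gene: ("up" if value > 0 else "down")
        for gene, value in lfc.items()
        if abs(value) >= threshold
    }


# Made-up expression values for a healthy/disease pair.
healthy = {"gene_a": 7, "gene_b": 31, "gene_c": 15}
disease = {"gene_a": 31, "gene_b": 7, "gene_c": 15}

lfc = log2_fold_change(healthy, disease)
regulated = call_regulated(lfc)
print(lfc, regulated)
```

The resulting dictionary of up- and down-regulated genes is the kind of list that pathway analysis software would take as input.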
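The counts shown in a two-set Venn diagram can be computed with plain set operations, as in the sketch below. It assumes each comparison has already been reduced to a mapping from gene to direction of regulation, and it treats a gene as "equivalently regulated" only when the direction matches in both; the gene names and calls are invented.

```python
def venn_counts(calls_a, calls_b):
    """Count the three regions of a two-set Venn diagram of regulated genes.

    calls_a, calls_b: dict gene -> "up" or "down" for one comparison each.
    A gene is placed in the overlap only if it is regulated in the same
    direction in both comparisons (a design choice of this sketch); a gene
    regulated in opposite directions is counted on both non-overlapping sides.
    """
    shared = {g for g in calls_a.keys() & calls_b.keys() if calls_a[g] == calls_b[g]}
    return {
        "only_a": len(calls_a.keys() - shared),
        "both": len(shared),
        "only_b": len(calls_b.keys() - shared),
    }


# Made-up calls for two treatments of the same cell type.
treatment_1 = {"gene_a": "up", "gene_b": "up", "gene_c": "down", "gene_d": "up"}
treatment_2 = {"gene_a": "up", "gene_d": "up", "gene_c": "up", "gene_e": "up"}

regions = venn_counts(treatment_1, treatment_2)
print(regions)
```

Here gene_c is regulated in both treatments but in opposite directions, so it does not contribute to the overlap.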
3. Specific example: transcriptome analysis applied to human skin
Easily accessible, skin was among the first targets analyzed using ‘omics’ approaches, and dermatology embraced them very early. A classic example of coordinated transcriptional regulation was observed in cultured fibroblasts after serum stimulation. Serum addition causes not only rapid resumption of the cell cycle but also, characteristically, a wound-healing response, reflecting the physiological role of fibroblasts in wound healing. Transcriptional responses of epidermal keratinocytes to UV light, hormones, vitamins, infections, inflammatory and immunomodulating cytokines, toxins, and allergens have been characterized, as have the changes associated with epidermal differentiation [9, 10].
The expression signatures that distinguish the various cell types in human skin were used to define 20 specific gene signatures, including those for keratinocytes, melanocytes, endothelia, adipocytes, immune cells, hair follicles, and sebaceous, sweat, and apocrine glands. These signatures formed a resource named SkinSig, which was then used to analyze 18 skin conditions, providing in-context interpretation of, for example, the influx of immune cells in inflammation or differentiation changes in disorders of cornification.
In the future, we can anticipate greatly expanded use of transcriptome analysis. Translated to the bedside, it can provide better understanding and more specific diagnoses of diseases. This, of course, requires additional advances in the technology, both in the lab-bench components, to reduce costs and guarantee reproducibility and accuracy, and in the computer-based components, the algorithms that enable physicians to establish a diagnosis quickly and reliably. In a generation, this approach will become routine.