1. Introduction: the RNA world
Genetic information is transcribed into mRNAs, which are then translated into the amino acid sequences of polypeptides. This flow is widely known as the “Central Dogma”, the most fundamental concept in molecular biology. During translation, ribosomes synthesize polypeptides from an mRNA template. In this context, it is tempting to assume that the main function of RNAs is simply to carry genomic information from the nucleus to the cytoplasm. However, if we regard the ribosome as a polypeptide-synthesizing protein-RNA complex that brings together all the major RNAs (mRNA, rRNA, and tRNA), it becomes clear that RNAs play biologically important roles as the enzymatic machinery of protein synthesis. RNAs are required not only for the translation system but also for the splicing/processing of RNAs and for telomere elongation, as components of the protein-RNA complexes, snRNPs, and
Because RNAs carry both information and function, as described above, it has been suggested that they were the most fundamental and primary molecules from the beginning of chemical evolution, before living organisms emerged on Earth. Even in DNA replication, RNAs are required as primers, including for leading-strand synthesis. A recent study using initiation site sequencing (ini-seq) revealed that human DNA replication origins very frequently overlap with transcription start sites (TSSs) and G-quadruplex (G4) motifs. In other words, a number of biologically essential events and reactions depend on the synthesis of RNA molecules. Thus, it has been proposed that life originated from an “RNA World”, which may have provided the opportunity to develop the reactions most essential for life, including replication, transcription, and translation.
Based on the concept that RNAs are the most essential molecules for living things, this book project focuses on how transcription is regulated and what happens when it is dysregulated.
2. Transcriptional controlling system in eukaryotic cells
Three types of RNA polymerases are known to participate in the synthesis of different types of RNAs. RNA polymerases (RNA pol) I and III catalyze the production of rRNAs, tRNAs, and snRNAs. Although these enzymes are essential for generating functional RNAs, much of the interest has been directed to RNA pol II, which catalyzes the transcription of both protein-encoding and non-coding genes. The molecular mechanisms by which each protein-encoding gene is expressed have been well studied. Most molecular biology textbooks describe in detail how the general transcription factors, including TBP, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH, cooperate to recruit RNA pol II appropriately onto the TSSs. Recently, the structure of the eukaryotic RNA pol II complex containing the elongation factors Spt4/5, Elf1, and TFIIS has been revealed. The entire transcription reaction, from initiation to termination on the DNA template, is expected to be elucidated in the near future.
In eukaryotic cells, epigenetic regulation, or chromatin modification, affects gene expression. After the chromosomes are unwound, each gene is transcribed from the TSS in its core promoter region. Promoter activity is regulated by enhancer or proximal promoter regions, to which various transcription factors (TFs) bind. These TFs usually recognize specific DNA sequences or
3. Non-coding RNAs (ncRNAs)
Not all of the information in the genome encodes protein. It is estimated that most of the genome (about 95%) consists of non-coding regions. Recent transcriptome analyses, including Cap Analysis of Gene Expression (CAGE), revealed that total transcripts contain a large amount of ncRNAs. Because ncRNAs do not encode proteins, they were long regarded as “junk” in the genome. However, recent studies have shown that some of them are not junk but rather “jewels” in the nucleus, with essential roles in controlling cell growth, development, and function. The ncRNAs are classified into two groups: short ncRNAs (miRNAs, piRNAs, and snRNAs) and long ncRNAs (lncRNAs), which consist of over 200 ribonucleotides. As methods for sequencing RNA molecules have advanced, the number of identified lncRNAs has grown and is currently estimated at over 35,000. The lncRNAs are transcribed by RNA pol II, and their TSSs are frequently (65%) found at bidirectional promoter regions. In mouse embryonic cells, transcribed lncRNAs recruit TFs and splicing factors to activate the expression of neighboring or bi-directional partner genes. Therefore, lncRNAs may provide accurate platforms or TSSs for bi-directional partner genes. A recent study using a genome editing system found that lncRNA loci regulate neighboring genes. In addition, it was revealed that specific lncRNAs are contained in nuclear bodies, including nuclear speckle (
4. Epigenetic control of gene expression
Gene expression patterns can be altered by epigenetic regulation. Epigenetic control, which mainly regulates the expression of sets of genes, is driven by DNA methylation, histone modifications, chromatin remodeling, and ncRNAs [20, 21]. Many factors involved in the epigenetic control system have been identified and characterized. Methyl groups can be enzymatically transferred to both DNA and histone proteins. DNA methylation plays pivotal roles in the regulation of nuclear events, including gene expression. In particular, silencing of DNA repair genes may increase the mutation rate and result in genome instability. Therefore, DNA methylation is regarded as one of the essential biomarkers in cancer. The reaction is catalyzed by at least three independent DNA methyltransferases (DNMTs), DNMT1, DNMT3A, and DNMT3B, using S-adenosyl-L-methionine (AdoMet) as the methyl group donor. The ten-eleven translocation (TET) family proteins, including TET1, TET2, and TET3, are 5-methylcytosine hydroxylases that promote removal of the methyl group by oxidizing 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) and further oxidized forms [20, 22]. A recent study revealed that intragenic DNA methylation supports the fidelity of transcription initiation across the genome. NRF protein preferentially binds unmethylated genomic regions, indicating that DNA methylation status restricts the binding of methylation-sensitive TFs to their recognition sequences. A methylation-sensitive SELEX analysis indicated that binding of the ETS transcription factor was inhibited by mCpG, whereas homeodomain, POU, and NFAT proteins preferentially bind to methylated sites. These results suggest that the DNA recognition mechanism of several TFs that act mainly in the development of organisms depends on DNA methylation.
Histone proteins can be modified by the attachment of various groups, including methyl, acetyl, hydroxyl, SUMO, and poly(ADP-ribosyl) groups [20, 29]. In addition, they are acylated on Lys residues to regulate the transcription of genes that encode metabolic-response factors. These modifications are recognized by different proteins, such as bromodomain-containing proteins, double PHD finger domain proteins [30, 32], YEATS domain proteins, WD40 proteins, and Ankyrin-repeat proteins.
As described above, epigenetic regulation is tightly linked with genome stability, and it is affected by the molecules or metabolites that supply modifying groups, including acetyl-CoA, AdoMet, and NAD+. These observations suggest that aging is determined not only by genetic information but also by environmental stresses, including nutrient conditions. Interestingly, a recent study indicated that lactate dehydrogenase LDHA promotes IFN-γ expression through an epigenetic mechanism. The results suggest that LDHA-mediated aerobic glycolysis could drive mitochondria to generate more acetyl-CoA for use in histone modification. Poly(ADP-ribosyl)ation is a protein modification reaction catalyzed by PARP enzymes using NAD+ as a substrate. It occurs on histones, non-histone proteins, DNA-repair factors, and TFs to regulate gene expression [39, 40]. In summary, nutrients or food intake may play a role in regulating transcription because they can induce epigenetic changes.
5. Transcription disorders and human diseases
Great progress in whole genome sequencing (WGS) techniques has enabled us to study subtle differences in genomic sequence between cancer and normal cells. Somatic mutations in driver genes have been identified in various cancers, and the statistical data may be applied to diagnosis or even to the prediction of cancer risk and incidence. Very recently, it was proposed that analysis of WGS data from circulating tumor cells could be applied to personalized therapy for malignant cancer. Importantly, mutations in 5′-upstream region of the human
Various diseases may be caused by dysregulation of transcription. For example, Yes-associated protein (YAP) and TAZ proteins, which activate inflammatory gene expression, are involved in atherosclerosis. In several neurological and neuromuscular disorders, including Huntington disease, muscular dystrophy, and amyotrophic lateral sclerosis (ALS), accumulation of repeat-containing RNAs in aberrant nuclear foci has been observed. Moreover, shRNA screening system
6. Future prospects
Overall, most cellular responses to environmental signals and stresses, including DNA damage, nutrient conditions, viral infections, and some specific drugs, can affect transcription or the gene expression profile. It has been shown that introducing several TFs (OKSM factors, or Yamanaka factors) into somatic cells can reprogram them to pluripotency [63, 64]. Very recently, it was experimentally shown that iPS cell-derived dopaminergic neurons could be applied to the treatment of Parkinson’s disease. This experimental evidence suggests that introducing a certain combination of TFs into cancer cells might reprogram their transcription profile so that they stop proliferating and repair DNA more accurately.
Clinical application of gene therapy with an appropriate set of expression vectors to induce genes associated with DNA repair or mitochondrial function is expected to be established soon. For this strategy, not only the vectors that deliver TF-encoding genes into cell nuclei but also nucleic acids, including siRNAs, lncRNAs, and RNA aptamers, should be improved. In addition, genome editing of the promoter or enhancer regions of target genes in patient-derived cells could be an effective approach to treating specific diseases. For example, it was recently reported that deleting the α-globin enhancer by genome editing reduces excess α-globin expression in primary human hematopoietic stem cells, strongly suggesting that the technique could be used clinically as a potential therapy for β-thalassemia. Therefore, as gene delivery systems progress, genome editing of transcription regulatory elements could become a novel alternative gene therapy for leukemia and cancer. In the near future, continued study of transcription control systems is expected to contribute to novel therapeutics for various intractable human diseases, including cancer, immunological diseases, and neurodegenerative diseases.