1. Introduction: the RNA world
We have already learned how genetic information is transcribed into mRNAs and then translated into the amino acid sequences of various polypeptides. This flow of information is widely known as the “Central Dogma”, the most fundamental concept in molecular biology. In translation, ribosomes synthesize polypeptides using an mRNA template. In this context, we tend to take it for granted that the main function of RNA is simply to carry genomic information from the nucleus to the cytoplasm. However, if we regard the ribosome as a polypeptide-synthesizing protein-RNA complex that brings together all the major RNA classes (mRNA, rRNA, and tRNA), we realize that RNAs play biologically important roles as part of the enzymatic machinery that synthesizes proteins. RNAs are required not only for translation but also for RNA splicing/processing and telomere elongation, as components of protein-RNA complexes: the snRNPs and the TERC subunit of telomerase, respectively. Moreover, recent studies have revealed that non-coding RNAs (ncRNAs), including siRNAs and miRNAs, regulate transcription and translation.
Because RNAs carry both information and function, as described above, it has been suggested that they were the most fundamental and primary molecules during chemical evolution, before living organisms emerged on Earth. Even DNA replication requires RNAs, which serve as primers for both leading- and lagging-strand synthesis. A recent study using initiation site sequencing (ini-seq) revealed that human DNA replication origins very frequently overlap with transcription start sites (TSSs) and G-quadruplex (G4) motifs. In other words, a number of biologically essential events or reactions depend on the synthesis of RNA molecules. Thus, it has been proposed that life originated from an “RNA World”, which might have provided the opportunity to develop the reactions most essential for life, including replication, transcription, and translation.
Based on the concept that RNAs are the most essential molecules for living things, this book project focuses on how transcription is regulated and what happens when it is dysregulated.
2. Transcriptional controlling system in eukaryotic cells
It is well established that three types of RNA polymerases synthesize different types of RNAs. RNA polymerases (RNA pol) I and III catalyze the production of rRNAs, tRNAs, and snRNAs. Although they are essential enzymes for generating functional RNAs, much of the interest has been directed to RNA pol II, which catalyzes the transcription of both protein-encoding and non-coding genes. To date, the molecular mechanisms by which individual protein-encoding genes are expressed have been well studied. Most molecular biology textbooks describe in detail how the general transcription factors, including TBP, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH, cooperate to recruit RNA pol II appropriately onto TSSs. Recently, the structure of the eukaryotic RNA pol II complex containing the elongation factors Spt4/5, Elf1, and TFIIS has been revealed. The entire transcription reaction on a DNA template, from initiation to termination, is expected to be elucidated in the near future.
In eukaryotic cells, epigenetic regulation and chromatin modification affect gene expression. After the chromatin is unwound, each gene is transcribed from the TSS in its core promoter region. Promoter activity is regulated by enhancers and proximal promoter regions, to which various transcription factors (TFs) bind. These TFs usually recognize specific DNA sequences, or cis-elements, and the degree of transcriptional enhancement depends on the combination of TFs, their binding sites, and their distances from the TSS. Thanks to the completion of the Human Genome Project and the development of next-generation sequencing (NGS), especially chromatin immunoprecipitation sequencing (ChIP-seq), we can now look up the TF-binding sequences and TSSs of most genes in online resources, including the NCBI, JASPAR, and DBTSS databases [6, 7, 8]. Promoters and enhancers, which may serve as digital landmarks on the genome for TFs, can determine the frequency of transcription initiation. After transcription is completed, RNAs must be appropriately processed and modified. These processes are regulated by RNA-binding helicases and ncRNAs. It should not be overlooked that RNAs are degraded by RNA exosome complexes. This degradation is required for RNA quality control and for the control of gene expression. mRNAs are then incorporated into mRNPs, which are exported from the nucleus to the cytoplasm with the help of export-component proteins, including TREX. Thus, mature mRNA molecules are produced through a complicated multistep process, which may nevertheless be advantageous because it allows cells to fine-tune overall gene expression. Unpredictable and undesired expression of specific genes may lead to dysfunction of mitochondria, the immune response, and DNA-repair/epigenetic control systems. If such dysfunction is deleterious to the organism, it may cause diseases, including cancer.
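The recognition of cis-elements by TFs described above is commonly modeled with a position weight matrix (PWM), the kind of profile that databases such as JASPAR distribute as count matrices. The following is a minimal sketch of such a scan, not the method used by any particular database or tool. The count matrix `COUNTS`, the pseudocount, the uniform background frequency, and the score threshold are all hypothetical values chosen for illustration, and only the forward strand is scanned.

```python
import math

# Hypothetical count matrix for a 6-bp motif (20 aligned sites per column).
# These numbers are invented for illustration; they are NOT a real JASPAR profile.
COUNTS = {
    "A": [0, 0, 18, 0, 1, 2],
    "C": [1, 19, 1, 0, 0, 1],
    "G": [0, 1, 1, 20, 1, 16],
    "T": [19, 0, 0, 0, 18, 1],
}


def build_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    The pseudocount and the uniform background are assumptions of this sketch.
    """
    width = len(next(iter(counts.values())))
    pwm = []
    for i in range(width):
        total = sum(counts[base][i] for base in "ACGT") + 4 * pseudocount
        column = {
            base: math.log2((counts[base][i] + pseudocount) / total / background)
            for base in "ACGT"
        }
        pwm.append(column)
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    pwm = build_pwm(COUNTS)
    # Hypothetical promoter fragment containing the motif consensus.
    for start, score in scan("AAAATCAGTGAAAA", pwm, threshold=5.0):
        print(f"candidate site at offset {start}, score {score:.2f}")
```

In practice the offsets returned by such a scan would be compared with TSS coordinates to relate candidate binding sites to their distance from the TSS. A real analysis would also scan the reverse complement and use a background model matched to the genome.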
3. Non-coding RNAs (ncRNAs)
Not all of the information in the genome encodes proteins. It is estimated that most (about 95%) of the genome consists of non-coding regions. Recent transcriptome analyses, including Cap Analysis of Gene Expression (CAGE), revealed that total transcripts contain large amounts of ncRNAs. Because ncRNAs do not encode proteins, they were long regarded as “junk” in the genome. However, recent studies have discovered that some of them are not junk but rather “jewels” in the nucleus, with essential roles in controlling cell growth, development, and function. ncRNAs are classified into two groups: short ncRNAs (miRNAs, piRNAs, and snRNAs) and long ncRNAs (lncRNAs), which are more than 200 ribonucleotides long. As methods for sequencing RNA molecules have advanced, the number of identified lncRNAs has grown and is currently estimated at over 35,000. lncRNAs are transcribed by RNA pol II, and their TSSs are frequently (65%) found in bidirectional promoter regions. In mouse embryonic cells, transcribed lncRNAs recruit TFs and splicing factors to activate the expression of neighboring or bidirectional partner genes. Therefore, lncRNAs may provide accurate platforms or TSSs for bidirectional partner genes. A recent study using a genome editing system identified lncRNA loci that regulate neighboring genes. In addition, specific lncRNAs were found in nuclear bodies, including nuclear speckles (MALAT1), paraspeckles (NEAT1), and polycomb bodies (TUG1), suggesting that lncRNAs affect chromosomal integrity. More importantly, lncRNAs affect epigenetic gene regulation systems. A famous example is the X-inactive specific transcript (Xist), which silences X chromosome genes by interacting with transcriptional repressor proteins.
Not only Xist but also other lncRNAs, such as HOTAIR, LUNAR1, and MALAT1, are required as scaffolds for DNA methylation/demethylation factors, chromosome looping factors, and splicing factors, respectively. Additionally, enhancer RNA (eRNA), which is transcribed at active enhancers, can function as a scaffold for the histone acetyltransferase CBP to modulate gene expression.
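The observation that lncRNA TSSs often lie in bidirectional promoter regions can be made concrete with a small sketch that searches an annotation for head-to-head TSS pairs. This is an illustration under stated assumptions, not the procedure used in the cited studies: the `TSS` record, the gene names, the coordinates, and the 1,000-bp distance cutoff (`max_distance`) are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class TSS(NamedTuple):
    """One annotated transcription start site (hypothetical record layout)."""
    gene: str
    chrom: str
    pos: int      # genomic coordinate of the TSS
    strand: str   # "+" or "-"


def divergent_pairs(tss_list: List[TSS], max_distance: int = 1000) -> List[Tuple[str, str, int]]:
    """Find head-to-head TSS pairs that may share a bidirectional promoter.

    A pair qualifies when a minus-strand TSS lies upstream (at a smaller
    coordinate) of a plus-strand TSS on the same chromosome and the two are
    at most max_distance bp apart. The cutoff is an assumption of this sketch.
    """
    minus = [t for t in tss_list if t.strand == "-"]
    plus = [t for t in tss_list if t.strand == "+"]
    pairs = []
    for m in minus:
        for p in plus:
            if m.chrom != p.chrom:
                continue
            distance = p.pos - m.pos
            if 0 < distance <= max_distance:
                pairs.append((m.gene, p.gene, distance))
    return pairs


if __name__ == "__main__":
    # Invented annotation: one lncRNA and three protein-coding genes.
    annotation = [
        TSS("lncA", "chr1", 1000, "-"),
        TSS("geneB", "chr1", 1400, "+"),
        TSS("geneC", "chr1", 5000, "+"),
        TSS("geneD", "chr2", 1200, "+"),
    ]
    for minus_gene, plus_gene, distance in divergent_pairs(annotation):
        print(f"{minus_gene} and {plus_gene} are divergent, {distance} bp apart")
```

The nested loop is quadratic and is kept only for readability. For genome-scale annotations, sorting TSSs by coordinate or using an interval index would be the more practical choice.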
4. Epigenetic control of gene expression
Gene expression patterns can be altered by epigenetic regulation. Epigenetic control, which mainly regulates the expression of sets of genes, is driven by DNA methylation, histone modifications, chromatin remodeling, and ncRNAs [20, 21]. Many factors involved in the epigenetic control system have been identified and characterized. Methyl groups can be enzymatically transferred to both DNA and histone proteins. DNA methylation plays pivotal roles in the regulation of nuclear events, including gene expression. In particular, silencing of DNA repair genes may raise the mutation rate and result in genome instability. Therefore, DNA methylation is regarded as one of the essential biomarkers in cancer. The reaction is catalyzed by at least three independent DNA methyltransferases (DNMTs), DNMT1, DNMT3A, and DNMT3B, using S-adenosyl-
Histone proteins can be modified by the attachment of various groups, including methyl, acetyl, hydroxyl, SUMOyl, and poly(ADP-ribosyl) groups [20, 29]. In addition, they are acylated on Lys residues to regulate the transcription of genes that encode metabolic-response factors. These modifications are recognized by different proteins, such as bromodomain-containing proteins, double PHD finger domain proteins [30, 32], YEATS domain proteins, WD40 proteins, and ankyrin-repeat proteins.
As described above, epigenetic regulation is tightly linked with genome stability, and it is affected by the metabolites that supply modifying groups, including acetyl-CoA, AdoMet, and NAD+. These observations suggest that aging is determined not only by genetic information but also by environmental stresses, including nutrient conditions. Interestingly, a recent study indicated that the lactate dehydrogenase LDHA promotes IFN-γ expression through an epigenetic mechanism. The results suggest that LDHA-mediated aerobic glycolysis could drive mitochondria to generate more acetyl-CoA for use in histone modification. Poly(ADP-ribosyl)ation is a protein modification catalyzed by PARP enzymes using NAD+ as a substrate. It occurs on histones, non-histone proteins, DNA-repair factors, and TFs to regulate gene expression [39, 40]. In summary, nutrients or food intake may play a role in regulating transcription because they can induce epigenetic changes.
5. Transcription disorders and human diseases
The great progress in whole genome sequencing (WGS) techniques has enabled us to study subtle differences in genomic sequence between cancer and normal cells. Somatic mutations in driver genes have been identified in various cancers, and the statistical data will be applied to diagnosis or even to the prediction of cancer risk and incidence. Very recently, it was proposed that analysis of WGS data from circulating tumor cells could be applied to personalized therapy for malignant cancer. Importantly, mutations in the 5′-upstream region of the human TERT gene are frequently identified in melanoma [46, 47]. It should be noted that in certain cancers, especially melanomas, the rate of somatic mutation is markedly increased at active TF-binding sites, where bound TFs interfere with access of the nucleotide excision repair (NER) machinery [48, 49]. Thus, cancer-related mutations are present not only in protein-coding regions but also in regulatory regions of the genome that control gene expression, including promoters and enhancers.
Various diseases may be caused by dysregulation of transcription. For example, Yes-associated protein (YAP) and TAZ, which activate inflammatory gene expression, are involved in atherosclerosis. In several neurological and neuromuscular disorders, including Huntington disease, muscular dystrophy, and amyotrophic lateral sclerosis (ALS), accumulation of repeat-containing RNAs in aberrant nuclear foci has been observed. Moreover, an in vitro shRNA screening system showed that transcription elongation factors, including JMJD6, help cells survive in the glioblastoma microenvironment. The result suggests that the transcription elongation machinery could be an effective therapeutic target. It was recently reported that the ENL protein, which possesses an acetyl-lysine-recognizing YEATS domain, activates the expression of genes encoding leukemia-driving factors [53, 54]. Therefore, targeting ENL could be an effective therapeutic strategy against aggressive leukemia. Currently, candidate drugs that target HDACs [55, 56, 57] and DNMTs [56, 57] are in clinical trials and are expected to contribute to the development of novel cancer therapeutics. Epigenetic alterations of genomic DNA are associated not only with cancer development but also with neurologic diseases, autoinflammatory diseases, and metabolic diseases, including type II diabetes. For the establishment of new therapies for these diseases, epigenetic modulators are promising targets for effective treatment with fewer side effects. We should remember that drug resistance in cancer can be caused by compounds that induce epigenetic reprogramming and thereby alter the transcriptional state, which is regulated by SOX proteins, Jun/AP1, and GGAA-recognizing factors. Therefore, the secondary effects of drugs on the transcription system should be examined. In summary, next-generation therapeutics may need to bring gene expression systems under control.
6. Future prospects
Overall, most cellular responses to environmental signals and stresses, including DNA damage, nutrient conditions, viral infections, and certain drugs, can affect transcription or the gene expression profile. It has been shown that introducing several TFs (the OKSM or Yamanaka factors) into somatic cells can reprogram them to pluripotency [63, 64]. Very recently, it was experimentally shown that iPS cell-derived dopaminergic neurons could be applied to the treatment of Parkinson’s disease. This experimental evidence suggests that introducing a certain combination of TFs into cancer cells might force them to reprogram their transcription profile so that they stop proliferating and acquire more accurate DNA repair.
Clinical application of gene therapy with an appropriate set of expression vectors to induce genes associated with DNA repair or mitochondrial function is expected to be established soon. Not only vectors that deliver TF-encoding genes into cell nuclei but also nucleic acids, including siRNAs, lncRNAs, and RNA aptamers, should be improved for this strategy. In addition, genome editing of the promoter or enhancer regions of target genes in patient-derived cells will be an effective approach to treating specific diseases. For example, it was recently reported that deleting an α-globin enhancer by genome editing reduces excess α-globin expression in primary human hematopoietic stem cells, strongly suggesting that the technique could be applied as a potential therapy for β-thalassemia. Therefore, as gene delivery systems progress, genome editing of transcription regulatory elements will be a novel alternative gene therapy for leukemia and cancer. In the near future, continuing studies of transcriptional control systems will contribute to establishing novel therapeutics for various intractable human diseases, including cancer, immunological diseases, and neurodegenerative diseases.