Eukaryotic and prokaryotic regulation databases.
The regulation of gene expression is the process by which the expression of genes is controlled (induced or repressed) within the cell at a particular time and under particular conditions. It is fundamental to many other biological processes that occur within the cell, including cell development and differentiation and the response and adaptation to environmental stresses. Gene regulation has classically been viewed as the interaction of proteins with regulatory elements located in the vicinity of the transcription start site within promoters. However, gene regulation is a more complex process that involves additional layers of control, including chromatin remodeling, nucleosome positioning, histone modifications, DNA-binding regulatory proteins such as transcription factors, and noncoding RNAs [1, 2, 3]. This process requires structural and chemical changes to the genetic material, binding of proteins to specific DNA elements to regulate transcription, or mechanisms that modulate the translation of mRNA.
Indeed, gene expression is controlled at multiple cellular levels: the chromatin level (chromatin modification and remodeling), the mRNA level (transcriptional and posttranscriptional regulation), and the protein level (translational regulation and posttranslational degradation).
This introductory chapter gives a brief overview of transcriptional and posttranscriptional regulation, lists the main database resources that provide transcriptional and/or posttranscriptional regulation data, and finally lists the main tools for predicting the gene targets of TFs and miRNAs.
2. Transcriptional regulation
Regulation at the transcriptional level involves proteins called transcription factors (TFs) that recognize and bind specifically to regulatory elements within promoter regions to control the expression of a downstream gene. TFs regulate their target genes by turning them on and off, ensuring that they are transcribed into mRNA at the right time and in the right amount. TFs are classified into three large families according to their DNA-binding domains:
Basic helix-loop-helix (bHLH) proteins, which are found in organisms from yeast to humans and function in critical developmental processes, particularly neurogenesis, myogenesis, heart development, and hematopoiesis [4, 5].
TFs with basic leucine zipper domains.
TFs with helix-turn-helix (HTH) domains, which are involved in a wide range of functions beyond transcription regulation, including DNA repair and replication, RNA metabolism, and protein-protein interactions in diverse signaling contexts [7, 8]. This group also includes homeobox (zinc finger, HOX-like, TALE, POU, etc.) and homeodomain protein products.
High-throughput techniques including ChIP-on-chip/ChIP-seq and enhanced yeast one-hybrid have been widely employed to uncover protein-DNA interactions [9, 10]. They are convenient methods for identifying and characterizing, respectively, the repertoire of regulatory elements that can be targeted by a protein of interest and the transcription factors that can bind a DNA sequence of interest. Thanks to the ENCODE (Encyclopedia of DNA Elements) project, which aims to build a comprehensive parts list of functional elements in the human genome, including the regulatory elements that control cells, such regulatory data have been made available to the scientific community.
In addition to the ENCODE project, several regulatory databases have been developed that cover regulation data for multiple animals, plants, and microorganisms. Table 1 lists the most widely used transcriptional regulation databases with a brief description, a reference to the original publication, and the current website URL.
|Database||Abbreviation||Website link||Description||References|
|TRANSFAC||TRANSFAC|| ||TRANSFAC® is a maintained and curated database of eukaryotic transcription factors, their genomic binding sites, and DNA-binding profiles.|| |
|Transcription Regulatory Regions database||TRRD|| ||TRRD is an information resource that accumulates information on the structural and functional organization of transcription regulatory regions of eukaryotic genes.|| |
|Ensembl Regulation||Ensembl Regulation|| ||Ensembl Regulation provides resources for studying gene expression and its regulation in human and mouse, with a focus on transcriptional and posttranscriptional mechanisms.|| |
|Regulatory Network Repository of Transcription Factor and microRNA Mediated Gene Regulations||RegNetwork|| ||RegNetwork is built from 25 databases that provide regulatory relationship information, annotation, and other information needed to derive regulatory relationships.|| |
|Transcriptional Regulatory Element Database||TRED|| ||TRED provides training datasets for genome-wide cis-regulatory element prediction, assists detailed functional studies, and helps decipher gene regulatory networks.|| |
|Transcriptional Regulatory Relationships Unraveled by Sentence Based Text mining||TRRUST|| ||The TRRUST database provides information on the mode of regulation (activation or repression).|| |
|Open Regulatory Annotation database||ORegAnno|| ||The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation.|| |
|PRODORIC||PRODORIC2|| ||The PRODORIC2 database hosts one of the largest collections of DNA-binding sites for prokaryotic transcription factors.|| |
|Gene Transcription Regulation Database||GTRD|| ||GTRD is described as the most complete collection of uniformly processed ChIP-seq data for identifying transcription factor binding sites in human and mouse.|| |
|Transcription factor prediction database||DBD|| ||DBD is a database of predicted transcription factors in completely sequenced genomes.|| |
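The DNA-binding profiles held in resources such as those in Table 1 are commonly represented as position weight matrices (PWMs). The sketch below illustrates, under simplifying assumptions, how such a profile can be used to scan a sequence for candidate binding sites. The count matrix, the function names `log_odds_matrix` and `scan`, the uniform background, and the score threshold are all hypothetical choices made for illustration; they are not taken from any of the listed databases or their software.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
# The counts are invented for illustration and come from no real database.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert raw counts into log2 odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


PWM = log_odds_matrix(COUNTS)
hits = scan("TTACGTGGACGTAA", PWM, threshold=4.0)  # threshold chosen arbitrarily
for start, window, score in hits:
    print(start, window, round(score, 2))
```

Only the forward strand is scanned here. A practical analysis would also scan the reverse complement and would choose the threshold from a background score distribution rather than fixing it by hand.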
3. Post-transcriptional regulation
A very large part of the human genome consists of noncoding elements classified as small noncoding RNAs (sncRNAs) and long noncoding RNAs (lncRNAs). These noncoding components are receiving increased attention from researchers because of their predicted role in posttranscriptional regulation. The sncRNA class includes small interfering RNAs (siRNAs), microRNAs (miRNAs), PIWI-interacting RNAs (piRNAs), endogenous small interfering RNAs (endo-siRNAs or esiRNAs), promoter-associated RNAs (pRNAs), small nucleolar RNAs (snoRNAs), and sno-derived RNAs, while the lncRNA class includes long intergenic noncoding RNAs (lincRNAs), natural antisense transcripts (NATs), enhancer RNAs (eRNAs), circular RNAs (circRNAs), competing endogenous RNAs (ceRNAs), and promoter upstream transcripts (PROMPTs). Both lncRNAs and sncRNAs have been identified at regulatory elements [23, 24]. Among these noncoding elements, microRNAs have been the most widely investigated since their discovery in the early 1990s, which underscores their importance in posttranscriptional gene regulation. miRNAs act as posttranscriptional regulators of their messenger RNA (mRNA) targets via mRNA degradation and/or translational repression. miRNA-mediated regulation has long been regarded as a one-way process leading to translational repression and/or target mRNA degradation [27, 28, 29, 30]; however, recent studies have shown that miRNAs can also upregulate gene expression in specific cell types and conditions, with distinct transcripts and proteins.
Immunoprecipitation (pull-down) of microRNA-induced silencing complexes (miRISCs) allows researchers to collect information on microRNAs and their mRNA targets in vivo. Such information has been collected and stored in several public databases. Table 2 lists the most widely used posttranscriptional regulation databases with a brief description, a reference to the original publication, and the current website URL.
|Database||Abbreviation||Website link||Description||References|
|The microRNA database||miRBase||http://www.mirbase.org/||The miRBase database is a searchable database of published miRNA sequences and annotation.|| |
|The experimentally validated microRNA-target interactions database||miRTarBase|| ||miRTarBase has accumulated miRNA-target interactions (MTIs), which are collected by manually surveying the pertinent literature.|| |
|miRDB||miRDB|| ||miRDB is an online database for miRNA target prediction and functional annotation.|| |
|miRNAMap||miRNAMap|| ||An online resource that stores information related to the known miRNAs in metazoans.|| |
|Vir-Mir||Vir-Mir|| ||Contains predicted viral miRNA candidate hairpins.|| |
|Virus miRNA Target||ViTa|| ||ViTa collects virus data from miRBase, ICTV, VirGen, VBRC, etc. and provides annotations, including human miRNA expression, virus-infected tissues, annotation of viruses, and comparisons.|| |
|miRecords||miRecords|| ||miRecords is a resource for animal miRNA-target interactions.|| |
|microRNA Data Integration Portal||mirDIP|| ||Provides several million human microRNA-target predictions collected across 30 different resources.|| |
4. The interplay between TFs and miRNAs
Transcription factors (TFs) and microRNAs (miRNAs) are key regulators of gene expression. Several studies have shown that abnormal miRNA and/or TF expression can be critical for cell survival and development, because these regulators target key genes in the cellular system. In the last decade, several bioinformatic studies have been performed to elucidate transcriptional and posttranscriptional (mostly miRNA-mediated) regulatory interactions. Besides experimental techniques (ChIP-seq, ChIP-chip, yeast two-hybrid, miRISCs), computational tools have been developed to predict TF-target gene and/or miRNA-target interactions. Table 3 lists some bioinformatic tools used to predict transcriptional and posttranscriptional regulation. Using such tools and/or integrating data collected from public databases (Tables 1 and 2), researchers have been able to generate regulatory networks that help explain the mechanisms involved in some phenotypes and/or diseases. Recent studies have focused on mixed miRNA/TF feed-forward regulatory loops (FFLs), integrating genome-wide transcriptional and posttranscriptional regulatory networks to decipher the complex and interlinked cascade of events related to several diseases [46, 47, 48]. Such approaches allow the scientific community to investigate the interplay between TFs and miRNAs in a given system.
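The network integration described above can be illustrated with a minimal sketch that searches two edge lists for mixed miRNA/TF feed-forward loops. It assumes two common loop definitions: a TF-driven loop, in which a TF regulates both a miRNA and a gene that the miRNA also targets, and a miRNA-driven loop, in which a miRNA targets both a TF and a gene that the TF also regulates. The function names and all regulator and gene identifiers (`TF1`, `miR-a`, `GENE1`, and so on) are hypothetical placeholders, not data from any database.

```python
from collections import defaultdict


def build_index(edges):
    """Map each regulator to the set of its targets."""
    index = defaultdict(set)
    for regulator, target in edges:
        index[regulator].add(target)
    return index


def find_mixed_ffls(tf_edges, mirna_edges, mirnas):
    """Return sorted (kind, upstream regulator, intermediate regulator, gene) tuples."""
    tf_targets = build_index(tf_edges)
    mirna_targets = build_index(mirna_edges)
    tfs = set(tf_targets)
    loops = set()

    # TF-driven loop: TF -> miRNA, TF -> gene, miRNA -| gene
    for tf, targets in tf_targets.items():
        for mirna in targets & mirnas:
            shared = (targets - mirnas) & mirna_targets.get(mirna, set())
            for gene in shared - {tf}:
                loops.add(("TF-driven", tf, mirna, gene))

    # miRNA-driven loop: miRNA -| TF, miRNA -| gene, TF -> gene
    for mirna, targets in mirna_targets.items():
        for tf in targets & tfs:
            shared = targets & tf_targets.get(tf, set())
            for gene in shared - {tf} - mirnas:
                loops.add(("miRNA-driven", mirna, tf, gene))

    return sorted(loops)


# Hypothetical toy networks; in practice the edges would come from
# transcriptional and posttranscriptional resources or prediction tools.
tf_edges = [("TF1", "miR-a"), ("TF1", "GENE1"), ("TF1", "GENE2"), ("TF2", "GENE3")]
mirna_edges = [("miR-a", "GENE1"), ("miR-b", "TF2"), ("miR-b", "GENE3"), ("miR-b", "GENE4")]
mirnas = {"miR-a", "miR-b"}

loops = find_mixed_ffls(tf_edges, mirna_edges, mirnas)
for loop in loops:
    print(loop)
```

Set intersections keep the search simple for small networks. The sketch ignores the sign of TF regulation (activation or repression), which a real analysis would need in order to distinguish coherent from incoherent loops.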
|Tool/Web tool||Website link||Description||References|
|TargetFinder|| ||Provides a web-based resource for finding genes that show an expression pattern similar to that of a group of user-selected genes.|| |
|BART: Binding analysis for regulation of transcription|| ||A computational method and software package for predicting functional transcription factors that regulate a query gene set or associate with a query genomic profile, based on more than 6000 existing ChIP-seq datasets for over 400 factors in human or mouse.|| |
|MATCH|| ||Match is a weight matrix-based program for predicting transcription factor binding sites (TFBS) in DNA sequences.|| |
|RNAhybrid|| ||RNAhybrid is a tool for finding the minimum free energy hybridization of a long and a short RNA.|| |
|TargetScan|| ||TargetScan predicts biological targets of miRNAs by searching for the presence of conserved 8mer, 7mer, and 6mer sites that match the seed region of each miRNA.|| |
|miRWalk|| ||Described as supplying the largest available collection of predicted and experimentally verified miRNA-target interactions, with various novel features.|| |
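To make the seed-based search listed in Table 3 more concrete, the following sketch classifies candidate sites in a 3' UTR using the usual site definitions: a 6mer pairs with miRNA positions 2 to 7, a 7mer-m8 additionally pairs with position 8, a 7mer-A1 has an adenine opposite position 1, and an 8mer has both. This is an illustrative reimplementation of the site definitions only, not the TargetScan code. It ignores conservation and context scoring, and the function names, the miRNA sequence, and the UTR are hypothetical examples.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr):
    """Return (start, site_type) for each match to miRNA positions 2-7 in the UTR."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])  # UTR sequence pairing with positions 2-7
    m8_base = COMPLEMENT[mirna[7]]         # UTR base pairing with position 8
    sites = []
    start = utr.find(core)
    while start != -1:
        # Position 8 pairs with the base just upstream (5') of the core match.
        has_m8 = start > 0 and utr[start - 1] == m8_base
        # An A just downstream (3') of the core lies opposite miRNA position 1.
        has_a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites


# Illustrative miRNA (5'->3') and a synthetic UTR built to contain one site of each type.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "".join(["GG", "CUACCUCA", "GGG", "UACCUCA", "GG", "CUACCUCG", "GU", "UACCUCG"])

sites = find_seed_sites(mirna, utr)
for start, kind in sites:
    print(start, kind)
```

Reported positions are 0-based offsets of the 6-nucleotide core match within the UTR. A real predictor would additionally filter or rank sites by conservation and sequence context.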
In recent years, transcriptional and posttranscriptional regulation have been regarded as the most important layers of gene regulation. However, a recent study by the Barna group has challenged our understanding of gene regulation. Although researchers believed for decades that ribosomes are identical and show no preference when translating RNA molecules into proteins, it appears that ribosomes preferentially translate certain types of genes. One type of ribosome, for example, prefers to translate genes involved in cellular differentiation, while another specializes in genes that carry out essential metabolic duties. This study uncovers a new layer of gene expression regulation that will have broad implications for basic science and human disease.