Data Mining Approaches for Understanding of Regulation of Expression of the Urea Cycle Genes

The urea cycle converts ammonia, a waste product of protein catabolism and a neurotoxin, into non-toxic urea. Urea cycle disorders are a group of rare genetic diseases for which a protein-restricted diet is a common treatment modality. Expression of urea cycle genes is regulated in concert by dietary protein intake, but the mechanism of this regulation is not well understood. Data mining of databases such as ENCODE and Cistrome can yield new information about the regulatory elements, transcription factors, and epigenetic mechanisms that regulate expression of urea cycle genes. This can lead to a better understanding of the common mechanism that regulates urea cycle genes and can generate testable hypotheses about regulation of gene expression and new treatments for urea cycle disorders.


Introduction
Transcriptional regulation of gene expression is essential for development, tissue differentiation, and organisms' responses to changes in their environment. Maintenance of homeostasis would be impossible without regulation of the genes that code for enzymes of carbohydrate, fat, and protein metabolism. Omnivorous mammals, such as humans, mice, and rats, can adapt their metabolism to varying proportions of amino acids, fats, and carbohydrates as sources of energy [1,2]. Strict carnivores, such as cats, lack such adaptive mechanisms [3]. A diet rich in carbohydrates and fats triggers utilization of these nutrients as sources of energy and storage of excess sugars and fat in the form of glycogen and adipose tissue, respectively [1,4]. In contrast, a diet rich in proteins imposes changes in nitrogen balance because excess proteins and amino acids cannot be stored [1]. High intake of proteins, combined with low intake of carbohydrates and fats, leads to utilization of amino acids as energy sources, which increases amino acid catabolism [1,4] and the need to dispose of the waste nitrogen generated in this process. Transcriptional regulation of the enzymes in pathways for degradation of nutrients, as well as for biosynthesis of molecules that can be stored, is necessary for adaptation to these dietary changes.
[…] to arginine by the ASL (EC 4.3.2.1). Arginase 1 (EC 3.5.3.1) completes the cycle by hydrolyzing arginine into urea and ornithine; ornithine is then transported into mitochondria, where it is a substrate for OTC (Figure 1). Urea cycle genes and enzymes are not uniformly expressed in the liver; their expression follows a gradient from high in hepatocytes surrounding the portal vein to none in hepatocytes surrounding the central vein [11].
NAGS, CPSI, and OTC are also expressed in the small intestine, where they synthesize citrulline, which is then transported to the kidneys for biosynthesis of arginine by ASS and ASL [6]. Argininosuccinate synthase and lyase also function in nitric oxide (NO) signaling and are present in tissues that express nitric oxide synthase [5].
Long-term changes in dietary protein intake lead to adaptive changes in expression of urea cycle enzymes. Their expression increases in rats and monkeys fed a high-protein diet and decreases in animals fed a low-protein diet [4,12]. These adaptive changes seem to be mediated, at least in part, by the hormones glucagon, glucocorticoids, and insulin [13-25]. Glucagon and glucocorticoids trigger changes in mRNA and protein levels of all five urea cycle enzymes, but the mechanisms responsible for these changes seem to differ for each enzyme [21]. It is not known whether changes in dietary protein intake trigger similar changes in expression of the NAGS gene, because the gene had not yet been identified at the time of these studies. Also unknown are the signaling cascades that mediate the effects of hormones on expression of urea cycle genes, and whether specific amino acids and/or other metabolites act as sensors of dietary protein intake.
Inspection of the regulatory regions of genes for urea cycle enzymes (Figure 2) does not reveal a common regulatory element that would bind one or more transcription factors to coordinately regulate transcription of all urea cycle genes [13,26-36]. Studies of urea cycle gene expression in knockout mice also point to the lack of a common regulatory mechanism. Ureagenesis is defective in mice lacking hepatocyte nuclear factor 4α (HNF4α), owing to the absence of OTC mRNA and protein [37], as well as in mice lacking CCAAT/enhancer binding protein α (C/EBPα), owing to the lack of CPSI mRNA [38]. However, ureagenesis appears normal in mice lacking C/EBPβ, although this transcription factor appears to regulate expression of the arginase 1 gene [39]. It is also unknown whether short-term increases in nitrogen load following a meal trigger any change in expression of urea cycle enzymes.
This chapter focuses on regulation of NAGS, CPS1, and OTC expression because their only known functions are protection of the brain from ammonia toxicity through participation in the urea cycle, and intestinal biosynthesis of citrulline. The three genes share a common expression pattern in the liver and intestine and during development. Because of the role of the urea cycle in protecting the brain from ammonia toxicity, expression of the three genes has been studied in much greater detail in the liver than in intestinal cells. Detailed understanding of transcriptional regulation of the urea cycle genes is important for understanding the body's response to environmental changes, such as changes in diet, as well as to events that trigger increased catabolism of cellular proteins, such as starvation, infections, and invasive medical procedures [6,40-43]. Regulation of the mammalian CPS1 and OTC genes has been studied for more than three decades, whereas the human NAGS gene was identified and cloned only in 2002 and studies of its expression began less than a decade ago; consequently, the approaches taken in these studies differ greatly. Knowledge of transcriptional regulation of mammalian CPS1 and OTC was gained through cloning of genomic DNA, construction of reporter gene plasmids carrying various fragments of the CPS1 and OTC regulatory regions, and expression of these constructs in cultured cells and transgenic mice, whereas regulatory elements of the NAGS gene have been identified using comparative genomics approaches.

Transcriptional regulation of mammalian NAGS gene
Although the existence of mammalian NAGS has been known since the 1950s [44], the gene remained elusive until 2002, when it was identified and cloned in mice and humans [45-48]. The human NAGS gene is located on chromosome 17 within band 17q21.31 and spans approximately 8.5 kb, which includes seven exons that encode a 1605 bp open reading frame, six introns, a promoter, and an enhancer located about 3 kb upstream of the transcription start sites [45,47,49-51]. The mouse Nags gene is located in the syntenic region of chromosome 11. Pairwise BLAST [52] comparison of the regions upstream of the NAGS genes from seven mammals, including human, revealed two conserved elements: one located immediately upstream of the first exon of the NAGS gene, and a putative regulatory element located about 3 kb upstream of the NAGS translation initiation site [50]. The pattern of DNA sequence conservation within the conserved region immediately upstream of the first exon suggested that it might consist of a promoter and a proximal regulatory element, similar to the CPS1 regulatory region described in the next section. Cis-element over-representation (CLOVER) software [53] was then used to identify binding sites for the specificity protein 1 (Sp1), cAMP response element binding/activating transcription factor (CREB/ATF), and CCAAT-enhancer binding protein (C/EBP) transcription factors in the putative NAGS promoter, while binding sites for activator protein-2 (AP2), hepatocyte nuclear factor 1 (HNF1), nuclear factor-Y (NF-Y), and mothers against decapentaplegic homolog 3 (SMAD3) were found in the upstream regulatory element, which was named the −3 kb enhancer. These findings were then verified experimentally.
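The principle behind this comparative genomics step, that functional non-coding elements stand out as blocks of higher-than-background sequence identity between species, can be sketched as a sliding-window identity scan over a precomputed pairwise alignment. This is a simplified stand-in, not BLAST and not the procedure used in [50]; the function name `conserved_windows`, the window size, the identity threshold, and the toy sequences are all hypothetical.

```python
from typing import List, Tuple

def conserved_windows(aln_a: str, aln_b: str, window: int = 50,
                      min_identity: float = 0.7) -> List[Tuple[int, int]]:
    """Hypothetical helper: find conserved blocks in a gapped pairwise alignment.

    Returns merged (start, end) intervals of alignment columns (0-based,
    half-open) in which the fraction of identical, ungapped columns within a
    sliding window is at least `min_identity`.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both sequences carry the same base; gap columns never count as matches
    match = [int(a == b and a != "-") for a, b in zip(aln_a.upper(), aln_b.upper())]
    blocks: List[Tuple[int, int]] = []
    for start in range(len(match) - window + 1):
        if sum(match[start:start + window]) / window >= min_identity:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                # extend the previous block when consecutive windows overlap
                blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
            else:
                blocks.append((start, end))
    return blocks

# Toy alignment: 20 identical columns followed by 20 mismatched columns
toy_a = "A" * 20 + "ACGT" * 5
toy_b = "A" * 20 + "TGCA" * 5
print(conserved_windows(toy_a, toy_b, window=10, min_identity=0.75))  # [(0, 22)]
```

The block extends slightly past the identical stretch because windows that straddle its edge still meet the threshold; real analyses usually rely on alignment scores or phylogenetic conservation models rather than a fixed identity cutoff.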
Reporter gene assays confirmed that the conserved regions located adjacent to and 3 kb upstream of the first NAGS exon function as a promoter and an enhancer, respectively, in HepG2 hepatoma cells [50]. Consistent with the absence of a TATA box in the NAGS promoter, transcription of NAGS mRNA in the liver and intestine initiates at multiple sites located between 50 and 150 bp upstream of the NAGS translation initiation codon [50]. Binding of the Sp1 and CREB transcription factors to the NAGS promoter, and of the HNF1 and NF-Y transcription factors to the −3 kb enhancer, was confirmed with chromatin immunoprecipitation (ChIP) and DNA pull-down assays [50]. Binding of HNF1 to the −3 kb NAGS enhancer is responsible for liver-specific expression of the NAGS gene [50]. The role of HNF1 in NAGS expression was confirmed when a sequence variant that decreased HNF1 binding to its site was found in a patient with NAGS deficiency [49].
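How a single-nucleotide variant can weaken a transcription factor binding site, as reported for the HNF1 site in [49], can be illustrated with position weight matrix (PWM) scoring, the kind of motif model that tools such as CLOVER build on. The count matrix below is a made-up 6-bp motif, not a real HNF1 matrix, and the helper names are hypothetical; a real analysis would use a curated matrix (for example from JASPAR) and an appropriate background model.

```python
import math
from typing import Dict, List

# Made-up counts for a 6-bp motif (NOT a real HNF1 matrix); one dict per position
TOY_COUNTS: List[Dict[str, int]] = [
    {"A": 1, "C": 1, "G": 8, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 1, "C": 7, "G": 1, "T": 1},
]

def log_odds_pwm(counts: List[Dict[str, int]], pseudocount: float = 0.5,
                 background: float = 0.25) -> List[Dict[str, float]]:
    """Convert counts to log2(frequency / background) scores with a pseudocount."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({base: math.log2((column[base] + pseudocount) / total / background)
                    for base in "ACGT"})
    return pwm

def best_site_score(seq: str, pwm: List[Dict[str, float]]) -> float:
    """Highest PWM score over all windows on both strands of `seq`."""
    complement = str.maketrans("ACGT", "TGCA")
    seq = seq.upper()
    width = len(pwm)
    best = float("-inf")
    for strand in (seq, seq.translate(complement)[::-1]):
        for i in range(len(strand) - width + 1):
            word = strand[i:i + width]
            if set(word) <= set("ACGT"):  # skip windows containing N or other symbols
                best = max(best, sum(pwm[j][base] for j, base in enumerate(word)))
    return best

pwm = log_odds_pwm(TOY_COUNTS)
reference = "CCGTAATCCC"  # contains the toy consensus GTAATC
variant = "CCGTCATCCC"    # A>C substitution at motif position 3
print(best_site_score(reference, pwm) > best_site_score(variant, pwm))  # True
```

Scanning both strands matters because a transcription factor can recognize its site in either orientation; the drop in best score for the variant is the computational analogue of the reduced binding measured experimentally.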

Transcriptional regulation of mammalian CPS1 gene
The regulatory region of the rat Cps1 gene was cloned in 1985 [54]. Almost all of our knowledge of transcriptional regulation of the CPS1 gene is based on experiments with the rat Cps1 gene in rat and human hepatoma cell lines and in transgenic mice. The aim of these studies was to elucidate the mechanism of regulation of Cps1 expression by glucagon and glucocorticoids and to identify the regulatory elements that restrict Cps1 expression to periportal hepatocytes [55-57].
Transcriptional regulation of the rat Cps1 gene depends on four elements: a promoter located immediately upstream of the first Cps1 exon, a proximal enhancer immediately adjacent to the promoter, a distal enhancer located about 6 kb upstream of the Cps1 translation initiation codon, and another regulatory element located about 10 kb upstream of the translation initiation site [27,29,56,58]. Transcription of rat Cps1 mRNA is initiated 138-140 bp upstream of the translation initiation site from a promoter that contains TATA and CAAT motifs [59] and binds the C/EBP transcription factor [29]. The distal Cps1 enhancer consists of a cAMP response unit (CRU) and a glucocorticoid response unit (GRU); each response unit binds multiple transcription factors that activate Cps1 expression in response to glucagon and glucocorticoids and are responsible for Cps1 expression in periportal hepatocytes [26,55,56,60]. The CRU binds the CREB, HNF3, and C/EBP transcription factors and a yet to be identified protein P1 [60,61], while the GRU binds the glucocorticoid receptor, hepatocyte nuclear factor 3/forkhead box A (HNF3/FOXA), C/EBP, a 75 kDa protein P3, and a yet to be identified protein P2 [26,61,62]. The distal enhancer activates Cps1 transcription via the proximal enhancer, which binds C/EBP and the glucocorticoid receptor and is located immediately upstream of the Cps1 promoter [63]. These studies of rat Cps1 regulation rest on the premise that the rat gene is a good model for human CPS1. However, the two species differ in metabolic rate because of their different body sizes, so regulation of the CPS1 gene and the urea cycle may differ between them. More recently, a region of the human CPS1 gene that corresponds to the rat Cps1 promoter and proximal enhancer has been shown to bind HNF3 and to direct reporter gene expression in hepatoma cells [64]. The human CPS1 gene is located on chromosome 2, band 2q34, where it spans approximately 125 kb and has 38 exons that encode a 1500-amino-acid protein.
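The modular organization of the rat Cps1 regulatory region described above can be represented as a mapping from each element to the proteins reported to bind it; inverting that mapping shows which factors recur across elements. This is an illustrative data structure with hypothetical names, populated only with the factors listed in the text (the glucocorticoid receptor is abbreviated GR, HNF3 is written as HNF3/FOXA throughout, and the −10 kb element is omitted because no binding proteins are listed for it).

```python
from collections import defaultdict
from typing import Dict, Set

# Rat Cps1 regulatory elements and the proteins the text reports binding to each
RAT_CPS1_ELEMENTS: Dict[str, Set[str]] = {
    "promoter": {"C/EBP"},
    "proximal enhancer": {"C/EBP", "GR"},
    "distal enhancer CRU": {"CREB", "HNF3/FOXA", "C/EBP", "P1"},
    "distal enhancer GRU": {"GR", "HNF3/FOXA", "C/EBP", "P2", "P3"},
}

def elements_by_factor(elements: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert element -> factors into factor -> elements."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for element, factors in elements.items():
        for factor in factors:
            index[factor].add(element)
    return dict(index)

def shared_factors(elements: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Factors listed for both element a and element b."""
    return elements[a] & elements[b]

index = elements_by_factor(RAT_CPS1_ELEMENTS)
print(sorted(index["C/EBP"]))
print(sorted(shared_factors(RAT_CPS1_ELEMENTS, "distal enhancer CRU", "distal enhancer GRU")))
```

In this toy index, C/EBP is the only factor listed for all four elements, and C/EBP and HNF3/FOXA are the factors shared by the CRU and GRU.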

Transcriptional regulation of mammalian OTC gene
The human OTC gene is 70 kb long and has 10 exons that contain a 1062 bp coding sequence [65]. Transcription of the human OTC gene initiates at multiple transcription start sites (TSS) [66], whereas transcription of the mouse and rat Otc genes initiates at a single TSS located 136 and 98 bp upstream of the translation initiation codon, respectively [67,68]. The rat Otc promoter is sufficient for transgene expression in the liver and intestine of transgenic mice [69,70]. An enhancer located approximately 11 kb upstream of the first exon of the rat Otc gene is responsible for the high level of Otc expression in the liver [31]. This −11 kb enhancer has four transcription factor binding sites, designated I-IV [31]; sites I and II bind C/EBPβ, while HNF4 binds to sites I and IV to activate expression of the Otc gene [32,37,38]. Because comparative genomics studies revealed that the distance between the OTC promoter and the region corresponding to the rat −11 kb enhancer varies among mammalian genomes, this region was renamed the liver-specific enhancer (LSE) [71].
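Positions such as "136 bp upstream of the translation initiation codon" or "11 kb upstream of the first exon" are relative, so comparing them across genes and species requires converting them to genomic coordinates with the gene's strand in mind. A minimal sketch, assuming 1-based coordinates, that `atg_pos` is the coordinate of the A of the ATG, and that an offset of 1 means the base immediately 5′ of it; the function name and the coordinates in the example are hypothetical.

```python
def upstream_position(atg_pos: int, offset_bp: int, strand: str) -> int:
    """Genomic coordinate (1-based) lying `offset_bp` bases 5' of the A of the ATG.

    On the plus strand, upstream means lower coordinates; on the minus strand,
    the gene is read from high to low coordinates, so upstream means higher ones.
    """
    if offset_bp < 0:
        raise ValueError("offset must be non-negative")
    if strand == "+":
        return atg_pos - offset_bp
    if strand == "-":
        return atg_pos + offset_bp
    raise ValueError("strand must be '+' or '-'")

# Hypothetical ATG coordinates, used only to illustrate the arithmetic
print(upstream_position(1_000_000, 136, "+"))  # 999864
print(upstream_position(1_000_000, 98, "-"))   # 1000098
```

Because the distance between an enhancer and its promoter can differ between species, as noted for the LSE, naming an element by function rather than by a fixed offset avoids ambiguity when these converted coordinates are compared across genomes.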

Transcriptional regulation of the NAGS, CPS1, and OTC genes in the genomics era
Advances in sequencing technology enabled sequencing of dozens of mammalian genomes, and comparisons of their sequences revealed conserved non-coding regions that could function as regulatory elements [72,73]. This strategy was used to identify the NAGS promoter and enhancer [50]. Next-generation sequencing also enabled examination of the function of non-coding regions in the human and mouse genomes, including their chromatin structure and the binding of transcription factors and chromatin remodeling proteins, to generate the Encyclopedia of DNA Elements (ENCODE). These studies were first carried out in a limited number of cultured cell lines but are now expanding to include tissues and cultured primary cells, and their results are stored in the ENCODE database [74,75]. In addition to these large-scale projects, many individual laboratories perform ChIP-Seq experiments, and the publicly available results of these experiments are being gathered in the Cistrome database [76]. An advantage of the Cistrome database is the ability to compare chromatin states and track changes in transcription factor binding in response to signaling molecules, treatments, and environmental stimuli. Data mining of the ENCODE and Cistrome databases therefore presents an opportunity to identify novel regulatory elements in the NAGS, CPS1, and OTC genes and the transcription factors that bind them. Both databases were queried for chromatin modifications and transcription factor binding at the NAGS, CPS1, and OTC genes and their flanking regions in liver tissue, using the following coordinates of the GRCh38/hg38 human genome assembly: chr17:43,994,682-44,012,832 for the NAGS gene, chr2:210,499,833-210,691,279 for the CPS1 gene, and chrX:38,334,777-38,459,529 for the OTC gene. The following filters were applied to the experiment matrix of the ENCODE database (www.encodeproject.org): organism-Homo sapiens, biosample type-tissue, organ-liver, project-ENCODE, genome assembly-GRCh38, assay category-ChIP-Seq, assay category-DNA binding, and target of assay-transcription factor, histone, broad and narrow histone mark, and chromatin remodeler. Results of the query were visualized using the UCSC Genome Browser. Results of ChIP-Seq experiments for a genomic region of interest can be downloaded as either wiggle or BED files using the Table Browser, accessible from the Tools menu of the UCSC Genome Browser. The ChIP-Seq data for each DNA binding protein and histone modification of interest can be obtained by selecting ENCODE Hub from the group menu, ENCODE ChIP-Seq from the track menu, and the experiment ID from the table menu of the Table Browser page. The Cistrome Data Browser was used to query the Cistrome database (www.cistrome.org), with Homo sapiens selected as the species and hepatocyte as the biological source. Experimental results that passed quality controls were visualized in the UCSC Genome Browser, and results of experiments were obtained in the same way as for the ENCODE database.
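Once peak files for the query regions have been downloaded from the Table Browser, a few lines of code can convert the region strings used above into BED-style coordinates and collect the peaks that overlap each region. This is a sketch under stated assumptions: region strings are treated as 1-based and inclusive, as displayed by the UCSC Genome Browser; the peak file is assumed to be tab-separated BED with chromosome, start, and end in the first three columns; and the fourth column is assumed to carry the factor or mark name (in many downloaded files it does not, in which case the name would have to come from the experiment ID instead). The function names and toy peak lines are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

REGION_RE = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")

def parse_region(region: str) -> Tuple[str, int, int]:
    """Convert a browser-style region ('chr17:43,994,682-44,012,832',
    1-based, inclusive) into BED-style (0-based, half-open) coordinates."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    return chrom, start - 1, end

def peaks_in_region(bed_lines: Iterable[str], region: str) -> Dict[str, List[Tuple[int, int]]]:
    """Group BED peaks overlapping `region` by the name in column 4 (assumed)."""
    chrom, start, end = parse_region(region)
    by_name: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        p_chrom, p_start, p_end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        if p_chrom == chrom and p_start < end and p_end > start:
            by_name[name].append((p_start, p_end))
    return dict(by_name)

regions = {
    "NAGS": "chr17:43,994,682-44,012,832",
    "CPS1": "chr2:210,499,833-210,691,279",
    "OTC": "chrX:38,334,777-38,459,529",
}
for gene, region in regions.items():
    chrom, start, end = parse_region(region)
    print(gene, chrom, start, end, f"{end - start:,} bp")

# Hypothetical peak lines, for illustration only
toy_bed = [
    "chr17\t43994000\t43994700\tSP1",
    "chr17\t44020000\t44020500\tRXRA",
    "chr2\t43995000\t43995500\tHNF4A",
]
print(peaks_in_region(toy_bed, regions["NAGS"]))
```

With these region strings the queried windows are 18,151 bp (NAGS), 191,447 bp (CPS1), and 124,753 bp (OTC), each larger than the gene itself because flanking regions were included.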
The 5′ end of each region was chosen based on binding of RAD21, a component of cohesin, and CTCF, which can indicate boundaries of chromatin domains, whereas the 3′ ends of the NAGS and OTC genomic regions were chosen to lie within or close to their downstream neighboring genes; the 3′ end of the CPS1 genomic region was chosen to include a conserved region downstream of the last exon of the CPS1 gene (Figures 3A, 4A, and 5A). RAD21 and CTCF also bind to additional sites within the NAGS, CPS1, and OTC genes, and the role of cohesin and CTCF in expression of the three genes is yet to be determined (Figures 3A, 4A, and 5A).
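The 5′-boundary criterion described above, the nearest site upstream of a gene that is bound by both CTCF and RAD21, can be sketched as interval intersection followed by a strand-aware nearest-site search. This hypothetical simplification models only the CTCF/RAD21 criterion for the 5′ end; it does not reproduce the authors' exact procedure or the 3′-end choices. The helper names, the 0-based half-open interval convention, and the toy coordinates are assumptions.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open

def co_bound_sites(ctcf: List[Interval], rad21: List[Interval]) -> List[Interval]:
    """Intersections of CTCF peaks with RAD21 peaks (candidate cohesin/CTCF sites)."""
    sites = set()
    for c_start, c_end in ctcf:
        for r_start, r_end in rad21:
            if c_start < r_end and r_start < c_end:
                sites.add((max(c_start, r_start), min(c_end, r_end)))
    return sorted(sites)

def nearest_upstream_site(sites: List[Interval], tss: int,
                          strand: str) -> Optional[Interval]:
    """Closest co-bound site lying entirely 5' (upstream) of the TSS."""
    if strand == "+":
        upstream = [s for s in sites if s[1] <= tss]
        return max(upstream, key=lambda s: s[1]) if upstream else None
    if strand == "-":
        upstream = [s for s in sites if s[0] > tss]
        return min(upstream, key=lambda s: s[0]) if upstream else None
    raise ValueError("strand must be '+' or '-'")

# Hypothetical peak coordinates
ctcf = [(100, 200), (500, 600), (900, 950)]
rad21 = [(150, 250), (520, 580)]
sites = co_bound_sites(ctcf, rad21)  # [(150, 200), (520, 580)]
print(nearest_upstream_site(sites, 1_000, "+"))  # (520, 580)
```

The CTCF peak at 900-950 is ignored because no RAD21 peak overlaps it, mirroring the requirement that both proteins be present.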
Histone H3 lysine 4 trimethylation (H3K4me3), which marks active promoters, and histone H3 lysine 27 acetylation (H3K27ac), which marks active enhancers, are present in the upstream regions of all three genes (Figures 3B, 4B, and 5B). The ENCODE database also contains DNase I hypersensitivity data from human fetal liver tissue that show an open chromatin state for the NAGS gene and closed chromatin for the CPS1 and OTC genes (Figures 3B, 4B, and 5B). This difference could be due to the presence of the ubiquitously expressed TMEM101 gene downstream of the NAGS gene (Figure 3).
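Following the convention stated above (H3K4me3 for active promoters, H3K27ac for active enhancers), candidate regions can be given a rough label by checking which marks overlap them. This is a deliberately crude, hypothetical heuristic: because active promoters often carry H3K27ac as well, any H3K4me3 overlap is called promoter-like, and H3K27ac without H3K4me3 is called enhancer-like. The function names and peak coordinates are illustrative assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def classify_region(region: Interval, h3k4me3: List[Interval],
                    h3k27ac: List[Interval]) -> str:
    """Rough label for a candidate element based on overlapping histone-mark peaks."""
    if any(overlaps(region, peak) for peak in h3k4me3):
        return "promoter-like"
    if any(overlaps(region, peak) for peak in h3k27ac):
        return "enhancer-like"
    return "unmarked"

# Hypothetical peak coordinates
k4 = [(1_000, 2_000)]
k27 = [(1_200, 1_800), (5_000, 5_600)]
for candidate in [(1_500, 1_700), (5_100, 5_300), (9_000, 9_400)]:
    print(candidate, classify_region(candidate, k4, k27))
```

An "unmarked" label does not rule out regulatory function; as the DNase I data illustrate, chromatin state depends on tissue and developmental stage.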
Query of the ChIP-Seq data in the ENCODE database confirmed binding of Sp1, CREB/ATF3, HNF4A, HNF3/FOXA1, HNF3/FOXA2, and COUP-TF to the promoters and enhancers of the NAGS, CPS1, and OTC genes and identified binding of these and several other transcription factors to previously identified as well as novel regulatory elements (Figures 3C, 4C, and 5C). The transcription factors RXRA, COUP-TF, HNF4A, YY1, and JUND/AP1 also appear to bind the NAGS promoter, while RXRA, HNF4A, YY1, and REST bind the −3 kb NAGS enhancer (Figure 3C). The ChIP-Seq data also show binding of transcription factors to regions of the NAGS gene that could be novel regulatory elements. For example, Sp1, RXRA, and HNF4A bind to a region in the first intron of the NAGS gene. Transcription factors also bind to a region located between the NAGS promoter and the −3 kb enhancer, as well as to two regions upstream of the −3 kb enhancer; these sites could also be novel regulatory elements of the NAGS gene (Figure 3C). The map of the NAGS genomic region shows a PYY transcript that initiates at the NAGS promoter but runs in the opposite direction (Figure 3D). This annotation is based on a PYY cDNA (GenBank ID BC041057.1) that was isolated from a brain astrocytoma sample and sequenced [77]. This PYY transcript may have resulted from aberrant expression of the PYY gene in the astrocytoma cells, since PYY is not expressed in the brain according to the Human Protein Atlas [78] and the GTEx track of the UCSC Genome Browser [79].
The transcription factors HNF3/FOXA1, HNF3/FOXA2, and CREB bind to the human CPS1 promoter and to the region upstream of the human CPS1 gene that corresponds to the rat Cps1 distal enhancer, as well as to additional sites located within the first intron of the CPS1 gene and upstream of the distal enhancer (Figure 4C). Moreover, HNF4A, RXRA, SP1, YY1, JUND/AP1, and REST also bind to the CPS1 upstream region and first intron (Figure 4C). It is possible that the yet to be identified proteins P1, P2, and P3 that bind to the rat Cps1 distal enhancer are among these transcription factors. Likewise, the HNF4A and COUP-TF transcription factors, which are known to bind to the OTC promoter and LSE, also bind to sites upstream of the LSE and within the first OTC intron, and additional transcription factors bind to these regions (Figure 5C). The novel regulatory elements identified by transcription factor binding coincide with regions that are conserved in vertebrates, as indicated by the phyloP and phastCons tracks of the UCSC Genome Browser (Figures 3D, 4D, and 5D).
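The observation that newly identified binding sites coincide with conserved regions can be quantified as the fraction of each peak covered by conserved elements, for example phastCons elements downloaded from the Table Browser as BED intervals. A minimal sketch; the helper names and coordinates are hypothetical, and any cutoff used to call a peak "conserved" would be an analysis choice, not something taken from the text.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open

def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so bases are not double-counted."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def conserved_fraction(peak: Interval, conserved: List[Interval]) -> float:
    """Fraction of the peak's bases that fall inside conserved elements."""
    start, end = peak
    covered = sum(max(0, min(end, c_end) - max(start, c_start))
                  for c_start, c_end in merge_intervals(conserved))
    return covered / (end - start)

# Hypothetical peak and conserved elements
print(conserved_fraction((100, 200), [(90, 120), (110, 150), (180, 300)]))
```

Merging first matters: the two overlapping elements at 90-120 and 110-150 would otherwise count bases 110-120 twice and overstate conservation.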
These data mining efforts identified a common set of transcription factors that bind to the regulatory regions of the NAGS, CPS1, and OTC genes in the liver and may be responsible for coordinated changes in their expression in response to dietary protein intake and hormonal signaling. Knowing which transcription factors regulate expression of urea cycle genes could provide clues about the amino acid(s) and metabolites that act as sensors of dietary protein intake. For example, the data mining of the ENCODE and Cistrome databases presented here revealed that the transcription factor RXRA binds to regulatory elements of the NAGS, CPS1, and OTC genes. RXRA regulates gene expression by forming heterodimers with several transcription factors, including peroxisome proliferator-activated receptor gamma (PPARγ), which regulates glucose metabolism [80]. If future studies show that the PPARγ-RXRA heterodimer regulates expression of the NAGS, CPS1, and OTC genes, that would suggest that glucose, rather than amino acid(s), might be a sensor of the balance of protein and carbohydrate intake that regulates expression of the urea cycle genes. Although similar data sets are not yet available for human small intestine cells, the ENCODE project is ongoing, and future data mining efforts will provide more complete information about transcriptional regulation of the NAGS, CPS1, and OTC genes. Similarly, the Cistrome database is growing, and queries of its data may reveal molecular mechanisms of NAGS, CPS1, and OTC regulation through differential binding of transcription factors to their regulatory elements. The utility of the data mining approach goes beyond understanding transcriptional regulation of genes: it can also be used to explain the deleterious effects of sequence variants on expression of genes associated with human diseases and to identify drug targets for diseases that could benefit from increased expression of hypomorphic alleles.
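Finding a "common set" of factors amounts to intersecting, across genes, the sets of factors with peaks in each gene's regulatory regions, and counting how many genes each factor reaches. In the sketch below the per-gene sets are short, illustrative excerpts of factors named in the text, not the complete ENCODE and Cistrome results; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set

def common_factors(factors_by_gene: Dict[str, Set[str]]) -> Set[str]:
    """Factors bound at every gene in the mapping."""
    sets = list(factors_by_gene.values())
    return set.intersection(*sets) if sets else set()

def gene_counts(factors_by_gene: Dict[str, Set[str]]) -> Counter:
    """Number of genes at which each factor is bound."""
    return Counter(f for factors in factors_by_gene.values() for f in factors)

# Illustrative, partial excerpts of factors named in the text
example = {
    "NAGS": {"RXRA", "HNF4A", "YY1", "REST"},
    "CPS1": {"RXRA", "HNF4A", "FOXA2", "CREB"},
    "OTC": {"RXRA", "HNF4A", "COUP-TF"},
}
print(sorted(common_factors(example)))
print(gene_counts(example)["YY1"])
```

Counting alongside the strict intersection is useful because a factor missing from one gene's set may reflect incomplete data for that tissue rather than true absence of binding.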
In NAGS, CPS1, and OTC deficiencies, which share high plasma ammonia (hyperammonemia) as a common symptom, partial defects in any of the three genes decrease the activity or abundance of the corresponding enzyme and thereby the capacity for ureagenesis. A protein-restricted diet, which minimizes ammonia production, is standard therapy for patients with partial defects of the NAGS, CPS1, OTC, and other urea cycle genes not discussed in this chapter. However, a protein-restricted diet also decreases expression of urea cycle genes, including the defective one, further reducing the patient's capacity for ureagenesis and increasing the risk of hyperammonemia. A drug therapy based on transcriptional regulation of the NAGS, CPS1, and OTC genes might be able to increase their expression even when patients are on a protein-restricted diet and thereby decrease their risk of hyperammonemia-induced brain damage.