5 Shared Regulatory Motifs in Promoters of Human DNA Repair Genes



Introduction
This manuscript presents methods used to test, and evidence to support, the hypothesis that specialized transcription factor binding sites coordinate the expression of DNA repair genes. Building on the seminal work of the Elnitski laboratory (Yang et al. 2007), which identified the most complete set of human transcripts under the control of bidirectional promoters and the first putative regulatory networks that make use of the bidirectional promoter structure, the authors present additional details of these regulatory networks. Much of the work on the regulation of DNA repair proteins has focused on protein-protein interactions and post-translational processing events (Hurley et al. 2007, Jensen et al. 2011, Shibata et al. 2010). However, transcriptional activation of DNA repair genes, especially induced activation, is likely to rely on shared factors that have not been thoroughly evaluated. Yang, Koehly and Elnitski reported the discovery and characterization of 5,653 bidirectional promoters in the human genome (Yang et al. 2007). Before that study, bidirectional promoters had been annotated only for protein-coding genes, and only 1,352 examples had been reported in the human genome. The work of Yang et al. also included evidence from all noncoding-RNA genes. Each bidirectional promoter regulates the expression of two genes that are oriented in opposite directions, with transcription start sites within 1000 bp of one another. The authors developed a novel approach to map all bidirectional promoters by analyzing public expressed sequence tag (EST) data. The prevalence of this promoter structure led them to explore the hypothesis that it plays a role in the regulation of certain classes of genes. They discovered that many more DNA repair genes have bidirectional promoters than previously reported, and that many genes with somatic mutations in cancer have bidirectional promoters.
The relevance of DNA repair genes to cancers (Kinsella et al. 2009, Liang et al. 2009, Smith et al. 2010, Kelley et al. 2008, Li et al. 2009, Bellizii et al. 2009, Naccarati et al. 2007, Berwick et al. 2000) and the association of bidirectional promoters with DNA repair genes suggested that bidirectional promoters might indicate a higher-order regulatory structure that could be detected through common features at the DNA sequence level. If true, these features should discriminate between the bidirectional and unidirectional promoters of genes with DNA repair functions.
Thus, this chapter presents additional evidence of these regulatory networks. Specifically, it provides evidence that there are distinct regulatory signatures for (1) genes involved in certain types of cancers, (2) bidirectional versus unidirectional promoters and (3) specific DNA repair pathways. The authors have identified transcription factor binding sites in bidirectional promoters of genes implicated in breast and ovarian (B/O) cancers. Additionally, they have discovered novel transcription factor binding sites that may serve as regulatory elements distinguishing DNA repair genes with bidirectional promoters from those with unidirectional promoters. The work further extends to a collection of novel transcription factor binding sites shared among genes that act as checkpoint factors of DNA repair pathways. These findings provide evidence of novel regulatory mechanisms and new insights into cancer biology (i.e., genomic elements relevant to transcriptional regulation).
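The definition of a bidirectional promoter used here (two genes in opposite orientation, head to head, with transcription start sites within 1000 bp) can be expressed as a simple coordinate test. The sketch below is not the EST-based mapping method of Yang et al. (2007); it assumes TSS coordinates are already known, treats "within 1000 bp" as a distance of at most 1000 bp, and only admits non-overlapping head-to-head arrangements. The `Transcript` type, the function name and the gene names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate


def bidirectional_pairs(transcripts: List[Transcript],
                        max_dist: int = 1000) -> List[Tuple[str, str]]:
    """Return (minus-strand name, plus-strand name) pairs arranged head to head
    whose transcription start sites are at most max_dist bp apart."""
    plus = [t for t in transcripts if t.strand == "+"]
    minus = [t for t in transcripts if t.strand == "-"]
    pairs = []
    for m in minus:
        for p in plus:
            # Head to head: the minus-strand TSS lies at or to the left of the
            # plus-strand TSS, so the two genes are transcribed away from each other.
            if m.chrom == p.chrom and 0 <= p.tss - m.tss <= max_dist:
                pairs.append((m.name, p.name))
    return pairs


# Hypothetical toy transcripts, not data from the study.
example = [
    Transcript("GENE_A", "chr1", "-", 10_000),
    Transcript("GENE_B", "chr1", "+", 10_400),  # 400 bp away, divergent: a pair
    Transcript("GENE_C", "chr1", "+", 15_000),  # too far from GENE_A
    Transcript("GENE_D", "chr2", "+", 10_100),  # no partner on chr2
]
print(bidirectional_pairs(example))  # [('GENE_A', 'GENE_B')]
```

The quadratic scan keeps the logic visible; a genome-wide version would sort TSSs per chromosome and compare only neighbours.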

Regulatory features of genes implicated in breast and ovarian cancers
This section provides evidence to support the hypothesis that bidirectional and unidirectional promoters are subject to distinct regulatory control systems. It also presents transcription factor binding sites discovered in bidirectional promoters of genes implicated in breast and ovarian cancers. As reported in Yang et al. 2007, we identified binding sites for known transcription factors in genes implicated in B/O cancers. The enrichment of bidirectional promoters in several cancer genes, and in additional genes with functions in DNA repair, suggests common mechanisms of regulation. To look for features common among expression clusters that might indicate the presence of regulatory networks, we used genome-wide expression clustering, together with enrichment for genes with bidirectional promoters, to assign the cancer genes to expression groups. The cancer-related genes that were identified and studied are listed below, along with their descriptions from GeneCards (Safran et al. 2010). The Elnitski group was the first to report that this set of genes has bidirectional promoters. For each gene, the most closely related expression profiles in the genome were retrieved with the Gene Sorter tool at the UCSC Genome Browser, using expression data from the Novartis GNF Atlas2 (expression profiles for 96 tissues). Each cluster was then compared with all the others to identify intersection points (shared gene names) among the lists of co-expressed genes. The gene lists were compared using multidimensional scaling, and a putative regulatory network was generated (Figure 1). The MLH1 gene appeared in several co-expression clusters and therefore occupied a central location, with connections to seven other genes (BARD1, FANCA, BRCA1, CHK2, BRCA2, TP53 and FANCF). Two additional genes co-occupied the central position with MLH1: COMMD3 (an uncharacterized protein) and ITGB3BP, a regulator of apoptosis in breast cancer cells.
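The comparison of co-expression clusters by gene name amounts to counting how many lists each gene appears in and intersecting the lists pairwise. A minimal sketch, assuming each list is keyed by its query gene; the function names, list keys and member genes below are hypothetical toy data, not the GNF Atlas2 results.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def membership_counts(clusters: Dict[str, List[str]]) -> Counter:
    """Number of co-expression lists in which each gene name appears."""
    counts: Counter = Counter()
    for members in clusters.values():
        counts.update(set(members))  # count a gene at most once per list
    return counts


def intersections(clusters: Dict[str, List[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Gene names shared by each pair of lists (the intersection points)."""
    sets = {query: set(members) for query, members in clusters.items()}
    result = {}
    for a, b in combinations(sorted(sets), 2):
        shared = sets[a] & sets[b]
        if shared:
            result[(a, b)] = shared
    return result


# Hypothetical lists keyed by query gene.
clusters = {
    "Q1": ["HUB", "G1", "G2"],
    "Q2": ["HUB", "G2", "G3"],
    "Q3": ["HUB", "G4"],
}
counts = membership_counts(clusters)
print(counts.most_common(1))                          # [('HUB', 3)]
print(sorted(intersections(clusters)[("Q1", "Q2")]))  # ['G2', 'HUB']
```

A gene such as `HUB` that appears in every list plays the role MLH1 plays in the real data: it is the most frequent intersection point and so ends up central in the network.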

Network visualization
The bidirectional promoters associated with the breast and ovarian cancer genes were treated as an affiliation network, or bipartite graph. In this network, nodes represent the genes in the co-expression clusters, and edges connect genes appearing in more than one list. The more often a gene appears across the ten co-expression lists, the more central its position in the network. Geodesic distances between genes were computed (i.e., the length of the shortest path between genes through promoters), and the geodesic distance matrix was scaled using metric multidimensional scaling (MDS; Borgatti et al., 2002). The distance between the 10 B/O cancer genes reflects their similarity, based on the number of shared genes found in their co-expression clusters. Genes in the center of the network were present in the largest number of gene clusters (seven out of 10), indicating that co-expression clusters intersect through common regulatory nodes.
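The two computational steps in this paragraph, geodesic distances in a bipartite affiliation graph and metric MDS of the resulting distance matrix, can be sketched as follows. This is not the software cited (Borgatti et al., 2002); it uses breadth-first search for shortest paths and classical (Torgerson) metric MDS via an eigendecomposition, which may not reproduce that package's layouts exactly. It assumes NumPy is available and that the graph is connected. All names and the toy lists are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

import numpy as np


def affiliation_graph(clusters: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Undirected bipartite graph linking each list (query gene) to its members."""
    adj: Dict[str, Set[str]] = {}
    for query, members in clusters.items():
        q = "list:" + query
        adj.setdefault(q, set())
        for gene in members:
            g = "gene:" + gene
            adj[q].add(g)
            adj.setdefault(g, set()).add(q)
    return adj


def geodesic_lengths(adj: Dict[str, Set[str]], source: str) -> Dict[str, int]:
    """Breadth-first search: shortest-path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def classical_mds(d: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) metric MDS of a symmetric distance matrix."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]       # indices of the largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Hypothetical toy lists (assumed to form a connected graph).
clusters = {"Q1": ["HUB", "G1"], "Q2": ["HUB", "G2"], "Q3": ["G2", "G3"]}
adj = affiliation_graph(clusters)
queries = sorted(clusters)
d = np.array([[geodesic_lengths(adj, "list:" + a)["list:" + b] for b in queries]
              for a in queries], dtype=float)
coords = classical_mds(d)
print(d[0, 2])  # 4.0  (Q1 -> HUB -> Q2 -> G2 -> Q3)
```

Because each step between two query genes passes through one shared member gene, two lists that share a gene are at geodesic distance 2; the MDS coordinates then place such lists close together.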

Transcription factor binding site analysis
A systematic search for transcription factor binding sites in the bidirectional promoters was used to assess regulatory connections at the DNA level and revealed several shared sites (using a motif-finding algorithm, we searched for the motifs reported by Xie et al. (2005)). Notably, identical ELK1 binding sites were located at the same distance from the ERBB2, FANCD2, and BRCA2 transcription start sites (Yang et al. 2007). ETS factor binding sites occurred as a trio with SP1 and PAX4/RXR binding sites in the majority of the promoters. The transcription factors whose binding motifs were found in all of the promoters are reported, with their descriptions from GeneCards (Safran et al. 2010), in Table 2.

Sp1
Transcription factor that can activate or repress transcription in response to physiological and pathological stimuli. Regulates the expression of a large number of genes involved in a variety of processes such as cell growth, apoptosis, differentiation and immune responses. May have a role in modulating the cellular response to DNA damage.

NFAT
The nuclear factor of activated T-cells family of transcription factors.

EGR-1
The protein encoded by this gene belongs to the EGR family of C2H2-type zinc-finger proteins. It is a nuclear protein and functions as a transcriptional regulator. Studies suggest this is a cancer suppressor gene.

PAX4
This gene is a member of the paired box (PAX) family of transcription factors. These genes play critical roles during fetal development and cancer growth.

ELK1
ELK1 is a member of the ETS oncogene family. The protein encoded by this gene is a nuclear target for the ras-raf-MAPK signaling cascade.
Table 2. Transcription factor binding sites in the promoters of the B/O cancer genes.
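The observation that identical ELK1 sites lie at the same distance from several transcription start sites can be checked by scanning each promoter for a consensus and expressing every hit as an offset from the TSS. The sketch below is illustrative only: it matches an IUPAC consensus with a regular expression on the forward strand, whereas the study used a motif-finding algorithm with the motifs of Xie et al. (2005). The consensus `CCGGAAR` (an ETS-like GGAA core) and the sequences are hypothetical choices, not the motifs or promoters analyzed.

```python
import re
from typing import Dict, List, Tuple

# IUPAC nucleotide codes used in consensus sequences.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def offsets_from_tss(promoter: str, tss_index: int, consensus: str) -> List[int]:
    """Start positions of consensus matches relative to the TSS (forward strand only).
    A zero-width lookahead lets overlapping matches be reported."""
    pattern = "".join(IUPAC[c] for c in consensus.upper())
    return [m.start() - tss_index
            for m in re.finditer(f"(?={pattern})", promoter.upper())]


def shared_offsets(promoters: Dict[str, Tuple[str, int]], consensus: str) -> List[int]:
    """Offsets at which the motif occurs in every promoter."""
    per_promoter = [set(offsets_from_tss(seq, tss, consensus))
                    for seq, tss in promoters.values()]
    return sorted(set.intersection(*per_promoter)) if per_promoter else []


# Hypothetical sequences; the integer gives the TSS position within each string.
promoters = {
    "P1": ("TTCCGGAAGTTTTTTTTTTT", 15),
    "P2": ("GGGGGCCGGAAAGGGGGGGGGGG", 18),
}
print(shared_offsets(promoters, "CCGGAAR"))  # [-13]
```

A position weight matrix scan on both strands would be the more realistic choice; the consensus version keeps the "same distance from the TSS" comparison easy to follow.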

Unbiased assessment of transcription factor binding sites in two subgroups of genes from DNA repair pathways
The research reported by Yang et al. (2007) provides strong evidence that a unique set of regulatory proteins controls genes with bidirectional promoters; this evidence came from comparing co-expression clusters of genes enriched for bidirectional promoters with clusters depleted for them. This section reports on a study that identified transcription factor binding sites specific to genes in DNA repair pathways (Lichtenberg et al. 2009). The promoters of genes from the DNA repair pathways were partitioned into two groups: bidirectional (32 promoters) and unidirectional (42 promoters).

Assessment of individual sites
Each group of promoters was analyzed to discover putative transcription factor binding sites. The analysis was performed with the WordSeeker motif discovery software (Lichtenberg et al. 2010), which uses high-performance, supercomputer-based algorithms to enumerate motifs and to construct Markov models. The average G+C content of the bidirectional promoters was higher than that of the unidirectional promoters (59.87% versus 50.84%). This difference was controlled for by the Markov model, which estimates the background frequency of each nucleotide from the collection of sequences. Unique sets of binding sites were identified for each group, some of which represent novel binding sites.
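The role of the Markov background model can be illustrated with a small estimator: base-composition bias (such as higher G+C content) raises the expected count of GC-rich words, so only words that exceed their expectation stand out. This is a generic maximum-likelihood order-k Markov model without pseudocounts, assumed for illustration; the model order and estimation details used by WordSeeker are not given here, and the function names and sequences are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def gc_content(seqs: List[str]) -> float:
    """Percentage of G and C bases across all sequences."""
    total = sum(len(s) for s in seqs)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    return 100.0 * gc / total


def train_markov(seqs: List[str], order: int) -> Tuple[Counter, Counter, Counter]:
    """Counts for an order-k Markov background model estimated from the sequences."""
    starts, contexts, transitions = Counter(), Counter(), Counter()
    for s in (x.upper() for x in seqs):
        for i in range(len(s) - order + 1):
            starts[s[i:i + order]] += 1               # k-mers (initial state)
        for i in range(len(s) - order):
            contexts[s[i:i + order]] += 1             # k-mers followed by a base
            transitions[s[i:i + order + 1]] += 1      # (k+1)-mers
    return starts, contexts, transitions


def word_probability(word: str, model, order: int) -> float:
    """P(word) = P(first k bases) * product of P(base | preceding k bases)."""
    starts, contexts, transitions = model
    p = starts[word[:order]] / sum(starts.values())
    for i in range(order, len(word)):
        ctx = word[i - order:i]
        if contexts[ctx] == 0:
            return 0.0
        p *= transitions[word[i - order:i + 1]] / contexts[ctx]
    return p


def expected_count(word: str, seqs: List[str], model, order: int) -> float:
    """Expected number of occurrences of word (one strand) under the background."""
    positions = sum(max(len(s) - len(word) + 1, 0) for s in seqs)
    return positions * word_probability(word, model, order)


seqs = ["GGCCGC", "ATAT"]          # hypothetical promoter fragments
model = train_markov(seqs, 1)
print(gc_content(seqs))                                  # 60.0
print(round(expected_count("GC", seqs, model, 1), 3))    # 1.6
```

Comparing the observed count of a word with `expected_count` is what allows promoter sets with different G+C content to be analyzed on the same footing.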
A statistical analysis of the promoters of the DNA repair genes revealed a number of significant DNA motifs. Some of the discovered motifs correspond to recognition sequences of known proteins. These are listed in Table 3, with their p-values and the transcription factors known to bind them, as determined from the TRANSFAC (Wingender et al. 2000) and JASPAR (Bryne et al. 2008) databases. In addition, novel motifs, representing uncharacterized transcription factor binding sites, were discovered in the bidirectional and unidirectional promoters of DNA repair pathway genes (see Table 4 for the motifs and their p-values).

Assessment of paired binding sites
To identify putative regulatory modules (co-acting regulatory elements), we identified statistically overrepresented pairs of DNA motifs in each set of promoters. The motif pairs are shown in Table 5. The score of a motif pair is the product of (1) the number of sequences, $S$, in which the pair occurs and (2) the natural logarithm of the ratio of $S$ to its expected value, $E_S$; i.e., the score is $S \cdot \ln(S/E_S)$. The genomic signatures (significant DNA motifs and motif pairs) of the bidirectional promoters were virtually non-overlapping with those of the unidirectional promoters. This provides strong support for the hypothesis that the regulatory mechanisms of bidirectional promoters are unique. Additionally, this work substantially extends the available knowledge about transcriptional regulation of genes involved in DNA repair pathways and suggests the presence of a regulatory network.
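The pair score $S \cdot \ln(S/E_S)$ can be computed directly once $S$ has been counted and $E_S$ is available from a background model. In the sketch below, $S$ is counted with exact string matching on one strand, and the value $E_S = 0.5$ is a hypothetical placeholder for the expectation a background model would supply; the function names and sequences are also hypothetical.

```python
from math import log
from typing import List


def sequences_with_pair(seqs: List[str], word_a: str, word_b: str) -> int:
    """S: number of sequences that contain both words (exact match, one strand)."""
    return sum(1 for s in seqs if word_a in s and word_b in s)


def pair_score(s: int, expected_s: float) -> float:
    """Overrepresentation score S * ln(S / E_S); taken as 0 when S is 0."""
    if s == 0:
        return 0.0
    return s * log(s / expected_s)


seqs = ["AACCGGTT", "CCGGAAAA", "TTTTCCGG", "AAAACCGG"]  # hypothetical promoters
s = sequences_with_pair(seqs, "CCGG", "AAAA")
print(s)                              # 2
print(round(pair_score(s, 0.5), 3))   # 2.773
```

Multiplying the log ratio by $S$ favors pairs that are both enriched and present in many promoters; a pair seen less often than expected receives a negative score.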

Unbiased assessment of transcription factor binding sites of checkpoint factor genes from DNA repair pathways
We have performed a focused, detailed characterization of the checkpoint factors in DNA repair pathways (Elnitski et al. 2010). The checkpoint factors (Kanehisa et al. 2008, Wood 2005, Helleday et al. 2008) are activated upon detection of DNA damage and halt the cell cycle so that subsequent DNA repair pathways can mend the damage. In addition to examining the most recognized promoter of each gene (the 5' end of the full-length transcription unit), we treated the alternative start sites of each checkpoint factor gene as independent regulatory units and searched them for putative transcription factor binding sites. In this section we report the DNA motifs that were discovered, along with several clusters of related genes and promoters. We hypothesize that these shared components implicate regulatory networks responsible for co-regulation of the checkpoint factor genes. We studied fourteen checkpoint factor genes, which are listed in Table 6. The number of alternative promoters per gene, shown in parentheses, varied from gene to gene. Because most of the genes have alternative promoters, we analyzed a total of thirty promoters. The complete set of alternative promoters is shown in Table 7. Alternative promoters were identified from gene annotations in the UCSC Human Genome Browser. The putative promoter region of each transcript isoform was defined as 900 bp upstream and 100 bp downstream of its transcription start site. Alternative promoters with significant overlap were truncated or removed from the analysis. DNA sequences were obtained for both the forward and reverse strands of the genome to ensure coverage of words that might have biased nucleotide content and might otherwise be omitted during the Markov model analysis stage.
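The promoter definition above (900 bp upstream and 100 bp downstream of each TSS, with both strands retained) can be sketched as follows. Several details are assumptions not stated in the text: coordinates are 0-based and half-open, the TSS base is counted as the first of the 100 downstream bases, and "significant overlap" is represented by a hypothetical threshold of half the window length (only removal is shown, not truncation). All function names are hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(tss: int, strand: str, upstream: int = 900,
                    downstream: int = 100) -> Tuple[int, int]:
    """Half-open [start, end) interval around a 0-based TSS; the TSS is counted
    as the first downstream base. Upstream lies at higher coordinates on '-'."""
    if strand == "+":
        return max(tss - upstream, 0), tss + downstream
    return max(tss - downstream + 1, 0), tss + upstream + 1


def promoter_sequences(chrom_seq: str, tss: int, strand: str) -> Tuple[str, str]:
    """Promoter oriented 5'->3' along the transcript, plus its reverse complement."""
    start, end = promoter_window(tss, strand)
    seq = chrom_seq[start:end]
    if strand == "-":
        seq = reverse_complement(seq)
    return seq, reverse_complement(seq)


def drop_overlapping(windows: List[Tuple[int, int]],
                     max_fraction: float = 0.5) -> List[Tuple[int, int]]:
    """Keep a window only if it overlaps every kept window by at most
    max_fraction of its length (the threshold is a hypothetical choice)."""
    kept: List[Tuple[int, int]] = []
    for start, end in sorted(windows):
        if all(min(end, e) - max(start, s) <= max_fraction * (end - start)
               for s, e in kept):
            kept.append((start, end))
    return kept


chrom = "A" * 1000 + "C" * 1000          # toy chromosome
fwd, rev = promoter_sequences(chrom, 1000, "+")
print(len(fwd), fwd[899:901])            # 1000 AC
```

With this convention the TSS sits at index 900 of every oriented promoter, whichever strand the gene is on, which keeps positional comparisons between promoters consistent.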

Gene | Description from GeneCards (Safran 2010)

ATM (5)
The protein encoded by this gene (ataxia telangiectasia mutated) belongs to the PI3/PI4-kinase family. This protein functions as a regulator of a wide variety of downstream proteins, including p53, BRCA1, CHK2, RAD17, RAD9, and NBS1. This protein and the closely related kinase ATR are thought to be master controllers of cell cycle checkpoint signaling pathways, required for cell response to DNA damage and for genome stability.

ATR (2)
The protein encoded by this gene (ataxia telangiectasia and Rad3 related) belongs to the PI3/PI4-kinase family, and is most closely related to ATM. Both proteins share similarity with The protein encoded by this gene is a cell cycle checkpoint regulator and putative tumor suppressor. It contains a forkhead-associated protein interaction domain essential for activation in response to DNA damage and is rapidly phosphorylated in response to replication blocks and DNA damage. This protein interacts with and phosphorylates BRCA1, allowing BRCA1 to restore survival after DNA damage. Three transcript variants encoding different isoforms have been found for this gene.

CLK2 (2)
This gene encodes a member of the CLK family of dual specificity protein kinases. CLK family members have been shown to interact with, and phosphorylate, serine- and arginine-rich (SR) proteins of the spliceosomal complex, which is a part of the regulatory mechanism that enables the SR proteins to control RNA splicing.

HUS1 (1)
The protein encoded by this gene is a component of an evolutionarily conserved, genotoxin-activated checkpoint complex that is involved in the cell cycle arrest in response to DNA damage. This protein forms a heterotrimeric complex with checkpoint proteins RAD9 and RAD1. DNA damage induced chromatin binding has been shown to depend on the activation of the checkpoint kinase ATM, and is thought to be an early checkpoint signaling event.

MDC1 (2)
The protein encoded by this gene (mediator of DNA-damage checkpoint) is required to activate the intra-S phase and G2/M phase cell cycle checkpoints in response to DNA damage. This nuclear protein interacts with phosphorylated histone H2AX near sites of DNA double-strand breaks through its BRCT motifs, and facilitates recruitment of the ATM kinase and meiotic recombination 11 protein complex to DNA damage foci.

NBS1 (1)
The encoded protein is a member of the MRE11/RAD50 double-strand break repair complex, which consists of 5 proteins. This gene product is thought to be involved in DNA double-strand break repair and DNA damage-induced checkpoint activation.

P53/TP53 (3)
This gene encodes tumor protein p53, which responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism.

PER1 (1)
This gene is a member of the Period family of genes and is expressed in a circadian pattern in the suprachiasmatic nucleus, the primary circadian pacemaker in the mammalian brain. Genes in this family encode components of the circadian rhythms of locomotor activity, metabolism, and behavior. The specific function of this gene is not yet known. Alternative splicing has been observed in this gene; however, these variants have not been fully described.

RAD1 (2)
This gene encodes a component of a heterotrimeric cell cycle checkpoint complex, known as the 9-1-1 complex, that is activated to stop cell cycle progression in response to DNA damage or incomplete DNA replication. The 9-1-1 complex is recruited by RAD17 to affected sites where it may attract specialized DNA polymerases and other DNA repair effectors. Alternatively spliced transcript variants of this gene have been described.

RAD17 (3)
The protein encoded by this gene is highly similar to the gene product of Schizosaccharomyces pombe rad17, a cell cycle checkpoint gene required for cell cycle arrest and DNA damage repair in response to DNA damage. This protein recruits the RAD1-RAD9-HUS1 checkpoint protein complex onto chromatin after DNA damage. The phosphorylation of this protein is required for the DNA-damage-induced cell cycle G2 arrest, and is thought to be a critical early event during checkpoint signaling in DNA-damaged cells. Eight alternatively spliced transcript variants of this gene, which encode four distinct proteins, have been reported.

RAD9A (2)
This gene product is highly similar to Schizosaccharomyces pombe rad9, a cell cycle checkpoint protein required for cell cycle arrest and DNA damage repair in response to DNA damage. This protein is found to possess 3' to 5' exonuclease activity, which may contribute to its role in sensing and repairing DNA damage. It forms a checkpoint protein complex with RAD1 and HUS1. This complex is recruited by checkpoint protein RAD17 to the sites of DNA damage, which is thought to be important for triggering the checkpoint signaling cascade. Use of alternative polyA sites has been noted for this gene.

Visualization and interpretation of data
Shared words among the checkpoint factor genes suggested the presence of regulatory networks. We assessed these relationships by generating network depictions in the form of an interaction network (Figure 2) and a Circos diagram (Figure 3), both constructed from the summary data in Table 9. To derive Figure 2, metric MDS was applied to the affiliation network defined in Table 9. The resulting graph was then spring-embedded, with node repulsion, to facilitate visualization (Borgatti, 2002). The interaction network depicts the distribution of the DNA words among the genes (each gene appears once, with all of its alternative promoters represented as a single node). Genes are denoted by blue squares and words by red circles. Bold lines indicate multiple occurrences of a word. Reverse complement words are shown independently. The Circos diagram represents the same information in a closed circular space, in which connections run from words on one side of the diagram to genes on the other. The putative nodes of the regulatory networks are those with multiple edges; each such node represents a characterized transcription factor binding site, a novel DNA binding site, or a checkpoint factor gene. Some of the discovered words correspond to known transcription factor binding sites reported in the JASPAR and TRANSFAC databases (see Table 10). The relationships between the top fifteen words and the transcription factors are depicted in the Circos diagram in Figure 4. Note that multiple binding site motifs were discovered for many of the transcription factors, and that several of the sites match the binding patterns of more than one transcription factor.

Table 9. The top-ranked words (rows), based on statistical significance ($S \cdot \ln(S/E_S)$), and the number of occurrences of each word in the promoter regions of each gene (columns).

Table 10. Known transcription factor binding sites (with significance scores and corresponding transcription factors) discovered in the promoters of the checkpoint factor genes.
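An affiliation network like the one in Figure 2 can be derived from a word-by-gene count table such as Table 9: every nonzero cell becomes an edge, edges with more than one occurrence correspond to the bold lines, and nodes with multiple edges are the candidate network nodes. The sketch below covers only this bookkeeping, not the MDS or spring-embedded layout; the counts, word labels and gene labels are hypothetical, not the values in Table 9.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]


def affiliation_edges(counts: Dict[str, Dict[str, int]]) -> List[Edge]:
    """(word, gene, occurrences) for every nonzero cell of a word-by-gene table."""
    return [(w, g, n) for w, row in counts.items() for g, n in row.items() if n > 0]


def degrees(edges: List[Edge]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Number of distinct partners for each word and for each gene."""
    word_deg: Dict[str, int] = {}
    gene_deg: Dict[str, int] = {}
    for w, g, _ in edges:
        word_deg[w] = word_deg.get(w, 0) + 1
        gene_deg[g] = gene_deg.get(g, 0) + 1
    return word_deg, gene_deg


def multi_edge_nodes(degree: Dict[str, int], min_edges: int = 2) -> List[str]:
    """Nodes with multiple edges: candidate nodes of a regulatory network."""
    return sorted(node for node, k in degree.items() if k >= min_edges)


# Hypothetical counts (rows: words, columns: genes).
counts = {
    "W1": {"GENE1": 2, "GENE2": 1, "GENE3": 0},
    "W2": {"GENE1": 0, "GENE2": 1, "GENE3": 0},
    "W3": {"GENE1": 1, "GENE2": 0, "GENE3": 3},
}
edges = affiliation_edges(counts)
word_deg, gene_deg = degrees(edges)
print(multi_edge_nodes(word_deg), multi_edge_nodes(gene_deg))
# ['W1', 'W3'] ['GENE1', 'GENE2']
print([e for e in edges if e[2] > 1])
# [('W1', 'GENE1', 2), ('W3', 'GENE3', 3)]
```

The edge list is the form most graph-layout tools accept as input, so the layout step can be delegated to whichever network software is used.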
Further insight into the regulatory network of the checkpoint factors is provided by Figure 5, in which the DNA binding site motifs are replaced with the names of the implicated transcription factors for each DNA repair gene. The diagram shows the specific transcription factors implicated in the control of each gene and those shared among multiple genes. Up to seven transcription factors were identified for each gene.
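Replacing words with transcription factors, as in Figure 5, is a projection of the word-gene network through a word-to-factor assignment. In this sketch, each word may map to several factors (as noted for Table 10), words without a known factor are simply dropped, and factors reached from more than one gene are reported as shared. The mapping, word labels, gene labels and factor names are hypothetical placeholders, not assignments from TRANSFAC or JASPAR.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def gene_tf_network(word_gene_counts: Dict[str, Dict[str, int]],
                    word_to_tfs: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Replace each word by the transcription factors assigned to it.
    Words with no assignment (novel sites) contribute nothing."""
    network: Dict[str, Set[str]] = {}
    for word, row in word_gene_counts.items():
        for gene, n in row.items():
            if n > 0:
                network.setdefault(gene, set()).update(word_to_tfs.get(word, ()))
    return network


def shared_tfs(network: Dict[str, Set[str]]) -> List[str]:
    """Transcription factors implicated in more than one gene."""
    usage = Counter(tf for tfs in network.values() for tf in tfs)
    return sorted(tf for tf, k in usage.items() if k > 1)


counts = {"W1": {"GENE1": 2, "GENE2": 1},
          "W2": {"GENE2": 1},
          "W3": {"GENE3": 3}}
word_to_tfs = {"W1": {"TF_A"}, "W3": {"TF_A", "TF_B"}}   # W2 left unassigned
network = gene_tf_network(counts, word_to_tfs)
print(sorted(network["GENE3"]), shared_tfs(network))  # ['TF_A', 'TF_B'] ['TF_A']
```

Using sets means a factor reached through several different words is counted once per gene, which matches reporting "transcription factors per gene" rather than motif occurrences.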

Conclusions
This chapter summarizes research into the transcriptional regulatory networks controlling DNA repair pathways, bidirectional versus unidirectional promoters of DNA repair genes, and bidirectional promoters of breast and ovarian cancer genes. DNA words are shared among these promoters, and these words represent both known and unknown transcription factor binding sites. Where possible, we report the highest-scoring assignment of a transcription factor to each DNA word. Our research represents a novel approach to identifying factors involved in the transcriptional regulation of DNA repair genes. Many of these proteins have dual roles in transcription and DNA repair. Although many of the regulatory relationships among them are characterized at the level of protein-protein interactions, little research is available on the transcriptional regulatory networks that control DNA repair gene expression. We present evidence that regulatory networks exist among these genes, and support the claim that bidirectional promoters (implicated in B/O cancers) are regulated by a network distinct from that of unidirectional promoters. The identification of putative binding sites is the first step in elucidating higher-order interdependencies among DNA repair genes in the cell. We also report preliminary findings on pairs of binding sites that represent regulatory modules. Furthermore, we show that there is substantial overlap among the promoters of DNA repair genes, and that shared DNA binding motifs can be distributed among a collection of alternative promoters, each having a distinct combination of regulatory elements. The complexity of the data can be reduced for visual interpretation using techniques such as network modeling and Circos diagrams.

Fig. 2. Model of the checkpoint regulatory network using multidimensional scaling.

Fig. 3. Circos diagram of the top 15 words, based on statistical significance, and their occurrences in gene promoter regions.

Fig. 4. Circos diagram showing the top 15 DNA motifs found in promoters of checkpoint factor genes and their related transcription factors (numbers of occurrences are multiplied by 100).
This work was supported by the Intramural Research Program of NHGRI (LE and LK) and by the Ohio Plant Biotechnology Consortium, the Choose Ohio First Program of the University System of Ohio, and the Ohio University Graduate Research and Education Board (LW).

Table 3. Enriched motifs matching characterized transcription factor binding sites discovered in the bidirectional promoters (columns 1 and 2) and in the unidirectional promoters (columns 3 and 4).

Table 4. Uncharacterized motifs discovered in the promoters of DNA repair genes. Words are ordered alphabetically.

Table 5. Putative transcription factor binding modules discovered in promoters of DNA repair genes.

Table 6. The checkpoint factor genes that were studied. The number of alternative promoters is shown in parentheses next to each gene name.

Table 7. Alternative promoters, indicated by their genomic coordinates, of genes involved in cell-cycle checkpoint factor pathways.

Statistical analysis of the thirty promoters found several interesting DNA words, which predict DNA elements that participate in the regulation of the DNA repair checkpoint factors. The most significant words discovered are listed in Table 8. Words that are shared among the gene sets identify putative regulatory relationships. Reverse complement words are reported separately, as an internal check on the process. Words whose reverse complement does not appear indicate a particular bias in nucleotide content.
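Checking a ranked word list for reverse complements, as described above, reduces to pairing each word with its reverse complement when both are present, noting palindromes (words equal to their own reverse complement), and listing the words whose partner is missing. A minimal sketch of that bookkeeping, assuming the top-ranked words are available as plain uppercase strings; the function names and the word list are hypothetical, not the contents of Table 8.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(word: str) -> str:
    return word.translate(COMPLEMENT)[::-1]


def pair_reverse_complements(
        words: List[str]) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Split ranked words into reverse-complement pairs, palindromes and unpaired words."""
    present = set(words)
    pairs, palindromes, unpaired = [], [], []
    seen = set()
    for w in words:
        if w in seen:
            continue  # already reported as the partner of an earlier word
        rc = revcomp(w)
        seen.update({w, rc})
        if rc == w:
            palindromes.append(w)
        elif rc in present:
            pairs.append((w, rc))
        else:
            unpaired.append(w)
    return pairs, palindromes, unpaired


top_words = ["GGGCGA", "ACGT", "TCGCCC", "AAAAAT"]  # hypothetical ranked list
print(pair_reverse_complements(top_words))
# ([('GGGCGA', 'TCGCCC')], ['ACGT'], ['AAAAAT'])
```

The unpaired list is the set of words that, in the interpretation given above, point to a nucleotide-content bias.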

Table 8. Top 15 enumerated DNA words, ranked by the $S \cdot \ln(S/E_S)$ overrepresentation score, and the alternative promoters in which they occur (identified by subscript).