Bioinformatics tools used for the identification of
The scope of this chapter is to examine how advances in the field of Bioinformatics can be applied in the development of improved therapeutic strategies. In particular, we focus on how algorithms designed to unravel complex gene regulatory networks can then be used in the design of synthetic gene promoters that can be subsequently incorporated in novel gene transfer vectors to promote safer and more efficient expression of therapeutic genes for the treatment of various pathological conditions.
2. Development of synthetic promoters: a historical perspective
A synthetic promoter is a sequence of DNA that does not exist in nature and which has been designed to control gene expression of a target gene.
Construction of synthetic promoters is possible because of the modular nature of naturally-occurring gene regulatory regions. This was cleverly demonstrated by a group that used synthetic promoters to evaluate the role of the TATA box in the regulation of transcription (Mogno et al., 2010). The authors looked at the role of the TATA box in dictating the strength of gene expression. They found that the TATA box is a modular component in that its strength of binding to the RNA polymerase II complex and the resultant strength of transcription that it mediates is independent of the
Synthetic promoters have been used in the study of gene regulation for more than two decades. In one of the first examples a synthetic promoter derived from the lipoprotein gene in
Most of these studies were initially undertaken with a view to establish the important structural features of prokaryotic or eukaryotic promoters so that essential elements could be identified. In one example, the role of the Tat protein in the regulation of HIV gene expression was studied using synthetic promoters (Kamine et al., 1991). In this study a series of minimal promoters containing Sp1- binding sites and a TATA box were constructed and analysed to see if the Tat protein from HIV could activate them. The results demonstrated that Tat could only activate the synthetic promoters containing Sp1 sites and not promoters with the TATA box alone. The observations enabled the authors to propose that
As alluded to above, it was soon realised that synthetic promoter technology had direct implications for improving the efficiency of gene expression. Indeed, one of the most widely used eukaryotic promoters employed for research purposes today is actually a synthetic promoter. The steroid-inducible Glucocorticoid Receptor Element (GRE) is a naturally occurring sequence that regulates the expression of a plethora of genes that are responsive to glucocorticoids. In a relatively early study several of these elements were linked together in order to construct a promoter with enhanced responsiveness to these steroids (Mader et al., 1993). This study detailed the construction of a 5 x GRE synthetic promoter linked to the Adenovirus type 2 major late promoter TATA region that displayed 50-fold higher expression in response to steroid hormones when compared to the natural promoter sequence. This synthetic promoter is now a widely used constituent of a number of reporter constructs adopted in a variety of different research applications.
Finally, synthetic promoters have also been used in prokaryotic systems to reveal that regulation of gene expression follows Boolean logic (Kinkhabwala et al., 2008). In this prototypical study the authors found that two transcription repressors generate NOR logic, i.e. NOT (a OR b), so that expression occurs only when neither repressor is present, while one repressor plus one activator generates ANDN logic, i.e. a AND NOT b, so that expression occurs only when the activator is present and the repressor is absent. This idea was later expanded to demonstrate that various combinations of synthetic promoters could generate 12 of the 16 possible two-input Boolean logic functions (Hunziker et al., 2010). Most interestingly, the results from these studies demonstrated that if a promoter does not follow a specific logic it is more likely to be leaky, in that it will drive gene expression under conditions where it is not expected to.
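The two logic types can be written out as truth tables. The following is a minimal sketch in Python; the function names and the convention that True means a regulator is present are illustrative assumptions and are not taken from the cited studies.

```python
from itertools import product


def nor_promoter(repressor_a: bool, repressor_b: bool) -> bool:
    """Two repressors: expression only when neither is present, NOT (a OR b)."""
    return not (repressor_a or repressor_b)


def andn_promoter(activator: bool, repressor: bool) -> bool:
    """One activator and one repressor: expression when a AND NOT b."""
    return activator and not repressor


def truth_table(gate):
    """Return the gate output for all four combinations of two inputs."""
    return {(a, b): gate(a, b) for a, b in product([False, True], repeat=2)}


if __name__ == "__main__":
    for name, gate in [("NOR", nor_promoter), ("ANDN", andn_promoter)]:
        print(name)
        for (a, b), expressed in truth_table(gate).items():
            print(f"  a={int(a)} b={int(b)} -> expression={int(expressed)}")
```

Each gate is on for exactly one of the four input combinations. A leaky promoter, in these terms, would be one that is expressed for an input combination where the table says it should be off.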
In this chapter we describe the evolution of synthetic promoter technology, its application in the development of improved tissue-specific promoters and its potential use for the development of effective disease-specific gene regulators; thus enabling the development of safer and more effective gene therapies.
3. Recent advances in the design of the synthetic promoter
In recent years some efforts have been made to construct synthetic promoters for tissue-specific transcription based on the linking of short oligonucleotide promoter and enhancer elements in a random (Li et al., 1999; Edelman et al., 2000) or ordered (Chow et al., 1997; Ramon et al., 2010) fashion.
In what can be described as one of the first attempts to rationally design a tissue-specific synthetic promoter, Chow et al. describe the rearrangement of the cytokeratin K18 locus to construct a promoter mediating a highly restrictive pattern of gene expression in the lung epithelium (Chow et al., 1997). In this study the authors describe the generation of transgenic mice with this construct and demonstrate expression only in the lung. They also generated CMV (Cytomegalovirus) and SV40 (Simian Virus 40) promoter-based constructs and found a lack of specificity and no expression in the lung epithelia. This study had important implications for researchers developing lung-based gene therapies: if CMV, one of the most widely used promoters, could not regulate gene expression in the lung epithelia, then it is necessary to identify (or develop) new promoters that can efficiently regulate gene expression in this location. Indeed, it is now becoming increasingly apparent that traditional virus-derived promoters like CMV and RSV (Rous Sarcoma Virus) will have limited application in the development of modern gene therapeutics.
The random assembly of
Retroviral vectors have also been used to screen for synthetic promoters in eukaryotic cells (Edelman et al., 2000). This study was the first description of a retroviral library approach using antibiotic resistance and FACS selection to isolate promoter sequences (illustrated in figure 2). The libraries were generated using random oligonucleotides in an effort to identify new sequences, to examine the effects of combinations of known elements and to uncover new transcriptional regulatory elements. After preparing a Ran18 promoter library comprising random 18mer oligonucleotides, the authors analysed the sequences of the generated synthetic promoters by searching for known transcription factor binding motifs. They found that the highest promoter activities were associated with an increased number of known motifs. They examined eight of the best-known motifs: AP2, C/EBP, GRE, E-box, ETS, CREB, AP1 and Sp1/MAZ. Interestingly, several of the promoter sequences contained none of these motifs, and the authors looked for new transcription factors.
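The motif-counting step described above can be sketched as a simple consensus search over both DNA strands. This is an illustration only, not the procedure used by Edelman et al.: the four consensus strings below are commonly quoted textbook forms chosen as assumptions, the function names are hypothetical, and a real analysis would normally use curated matrices from a TFBS database rather than exact consensus matching.

```python
import re

# IUPAC nucleotide codes used by the consensus strings below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[CG]", "N": "[ACGT]"}

# Illustrative consensus sequences (assumed for this sketch).
MOTIFS = {
    "Sp1": "GGGCGG",     # GC box
    "AP1": "TGASTCA",    # TPA response element
    "CREB": "TGACGTCA",  # cAMP response element
    "E-box": "CANNTG",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(seq: str, consensus: str) -> int:
    """Count distinct positions where the consensus matches on either strand."""
    seq = seq.upper()
    # A lookahead lets overlapping matches be found.
    pattern = re.compile("(?=" + "".join(IUPAC[base] for base in consensus) + ")")
    forward = {m.start() for m in pattern.finditer(seq)}
    # Convert reverse-strand hits back to forward-strand start coordinates so
    # that a palindromic site is counted only once.
    reverse = {
        len(seq) - m.start() - len(consensus)
        for m in pattern.finditer(reverse_complement(seq))
    }
    return len(forward | reverse)


def scan_promoter(seq: str) -> dict:
    """Return the number of sites found for each motif in MOTIFS."""
    return {name: count_sites(seq, consensus) for name, consensus in MOTIFS.items()}


if __name__ == "__main__":
    # Hypothetical candidate sequence containing one GC box and one AP1 site.
    print(scan_promoter("TTGGGCGGAATGACTCATT"))
```

Sequences for which every count is zero would correspond to the promoters in the screen that contained none of the known motifs.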
In a similar effort employed to examine one million clones, Sutton and co-workers adopted the FACS screening approach based on the establishment of a lentiviral vector-based library (Dai et al., 2004). In this study duplex oligonucleotides from binding sites of endothelial cell-specific and non-specific transcription factors were cloned in a random manner upstream of a minimal promoter driving expression of eGFP in a HIV self-inactivating expression vector. A pool of one million clones was then transfected into endothelial cells and the highest expressers were selected by FACS sorting. Synthetic promoters were then rescued from stable transfectants by PCR from the genomic DNA where the HIV vectors had integrated. The results from this study also demonstrated the possibility of isolating several highly active endothelial cell-specific synthetic promoter elements from a random screen.
Synthetic promoters active only in the liver have also been developed (Lemken et al., 2005). In this study transcriptional units from ApoB and OTC genes were used in a controlled, non-random construction procedure to generate a series of multimeric synthetic promoters. Specifically, 2x, 4x, 8x and 16x repeats of the ApoB and OTC promoter elements were ligated together and promoter activity analysed. The results indicated that the promoter based on 4xApoB elements gave the optimal levels of gene expression and that 8x and 16x elements gave reduced levels of expression, thus demonstrating the limitations of simply ligating known promoter elements together in a repeat fashion to achieve enhanced expression.
When adopting this type of methodology in the design of synthetic tissue-specific promoters it is important to use well-designed duplex oligonucleotides. First, each element has to be spaced in such a way that the regulatory elements appear on the same side of the DNA helix when reassembled. Second, relevant minimal promoter elements have to be employed so that the screen produces promoters capable of expressing efficiently only in the tissue of interest. Third, there must be some mechanism, such as the addition of Sp1 sites, for protection against promoter silencing through methylation.
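The same-side spacing requirement can be checked with simple arithmetic, because B-form DNA completes one helical turn roughly every 10.5 bp. The sketch below is a simplified geometric illustration, not a design rule from the studies cited: the 45-degree tolerance and the function names are assumptions, and real constructs may deviate because of sequence-dependent DNA structure.

```python
BP_PER_TURN = 10.5  # approximate helical repeat of B-form DNA


def phase_angle(center_distance_bp: float) -> float:
    """Angular offset (0 to 180 degrees) between two sites around the helix axis."""
    angle = (center_distance_bp % BP_PER_TURN) / BP_PER_TURN * 360.0
    return min(angle, 360.0 - angle)


def same_face(center_distance_bp: float, tolerance_deg: float = 45.0) -> bool:
    """True if two site centres lie on roughly the same face of the helix.

    tolerance_deg is an arbitrary cut-off assumed for this sketch.
    """
    return phase_angle(center_distance_bp) <= tolerance_deg


def aligned_spacings(min_bp: int, max_bp: int) -> list:
    """List centre-to-centre distances in [min_bp, max_bp] that pass same_face."""
    return [d for d in range(min_bp, max_bp + 1) if same_face(d)]


if __name__ == "__main__":
    for distance in (5, 10, 16, 21):
        print(distance, round(phase_angle(distance), 1), same_face(distance))
```

Distances close to a whole number of turns (about 10 or 21 bp) place two elements on the same face, whereas half-turn offsets (about 5 or 16 bp) place them on opposite faces.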
In addition to tissue-specific promoters, cell type-specific synthetic promoters have also been developed. In one study, researchers designed a synthetic promoter to be active in noradrenergic (NA) neurones (Hwang et al., 2001). The authors randomly ligated
In a similar type of study a synthetic promoter was constructed that was specifically active in myeloid cells (He et al., 2006). The promoter comprised myeloid-specific elements for PU.1, C/EBPalpha and AML-1 and myeloid-associated elements for Sp1 and AP-1, which were randomly inserted upstream of the p47-phox minimal promoter. The synthetic promoters constructed showed very high activity. Haematopoietic Stem Cells (HSC) were initially transduced and the expression in differentiated cells was then examined; only myeloid cells were found to express the reporter construct. To test the therapeutic applicability of these promoters, apoE-/- mice were transplanted with HSC transduced with a lentiviral vector expressing apoE from CMV and synthetic promoters. Even though transduced cells containing CMV and synthetic promoters both corrected the atherosclerotic phenotype, the cells derived from lentiviral vectors harbouring the synthetic promoter did so with less variability. This highlights the improved safety features of synthetic promoters for gene therapy applications.
In addition to tissue- and cell type-specific constitutive promoters, inducible synthetic promoters can also be constructed. One group describe a synthetic promoter constructed by placing the EPO enhancer region upstream of the SV40 promoter. The result is a strong promoter that is active only under ischaemic conditions. The authors tested this promoter by developing Neural Stem Cells (NSC) responsive to hypoxia and proposed that this system could be used to deliver therapeutic stem cells to treat ischaemic events. The authors were able to demonstrate that transplantation of NSC modified with a hypoxia-sensitive synthetic promoter resulted in specific expression of the luciferase reporter gene in response to ischaemic events
4. Applications of synthetic promoter technology
Synthetic promoters have direct applications in large-scale industrial processes where enzymatic pathways are used in the production of biological and chemical-based products (reviewed in Hammer et al., 2006). One of the most important limitations in industrial-scale processes that synthetic promoter technology addresses is the inherent genetic instability of synthetically engineered biological systems. For instance, in prokaryotic organisms designed to express two or more enzymes, mutations will invariably arise within very few generations, resulting in the termination of gene expression. This is because there is a lack of evolutionary pressure keeping all the components intact. The result is that mutations accrue over generations, resulting in the deactivation of the circuit. Homologous recombination in natural promoters driving high levels of gene expression is the main reason why this circuitry fails (Sleight et al., 2010). Therefore, the use of synthetic promoters in these systems should serve to lower gene expression and thereby improve genetic stability, allow the avoidance of repeat sequences to prevent recombination and allow the use of inducible promoters (a feature that also reduces genetic instability). In summary, the use of synthetic promoter technology in complex genetically engineered synthetic organisms expressing a variety of components should serve to increase genetic stability and improve the efficiency of the processes that the components control.
One interesting therapeutic application for synthetic promoter technology that has been described is the generation of a class of replication-competent viruses that enable tumour cell-specific killing by specifically replicating in cancer cells. In this study a replication-competent retrovirus was developed to selectively kill tumour cells (Logg et al., 2002). The authors added a level of transcriptional targeting by incorporating the prostate-specific probasin (PB) promoter into the retroviral LTR and designed more efficient synthetic promoters based on the PB promoter to increase the efficiency of retroviral replication in prostate cancer cells. The result was a retrovirus that could efficiently transduce and replicate only in cancer cells. This is an attractive therapeutic strategy for the treatment of cancer, as tumour virotherapy has been examined as a potential therapeutic strategy for several decades.
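Avoiding repeat sequences, as mentioned above, can be checked computationally before a construct is built. The following minimal sketch flags exact direct repeats by indexing k-mers; the function name, the default repeat length of 20 bp and the toy sequences are assumptions made for illustration, and the repeat length that actually matters for recombination depends on the host organism.

```python
from collections import defaultdict


def repeated_kmers(seq: str, k: int = 20) -> dict:
    """Map each k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        positions[seq[start:start + k]].append(start)
    return {kmer: hits for kmer, hits in positions.items() if len(hits) > 1}


if __name__ == "__main__":
    # Hypothetical construct in which the same 20 bp promoter drives two genes.
    promoter = "TTGACAGCTAGCTCAGTCCT"
    construct = promoter + "ATGAAA" + promoter
    for kmer, hits in repeated_kmers(construct).items():
        print(kmer, hits)
```

Replacing one copy of a reused promoter with a synthetic promoter of different sequence would remove the repeat reported here.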
Synthetic promoters that are active only in cycling endothelial cells would be another attractive tool for the development of cancer gene therapies. The rationale is that by targeting new blood vessels growing into tumours we would be able to develop a cancer gene therapy that could cut off the supply of nutrients to the growing cancer. In a study that adopted this approach the cdc6 gene promoter was identified as a candidate promoter active only in cycling cells and was coupled to the endothelin enhancer element to construct a promoter active in dividing endothelial cells (Szymanski et al., 2006). Four endothelin elements conjugated to the cdc6 promoter gave the optimal results
Perhaps one of the most impressive applications of synthetic promoter technology thus far was the development of a liver-specific promoter that could be used to essentially cure diabetes in a transgenic mouse model (Han et al., 2010). In this study a synthetic promoter active in liver cells in response to insulin was constructed. The authors designed 3-, 6- and 9-element promoters based on random combinations of HNF-1, C/EBP and GIRE
Synthetic promoters are increasingly being used in gene therapy studies. In one recent study their potential application to the gene therapy of Chronic Granulomatous Disease (X-CGD; an X-linked disorder resulting from mutations in gp91-phox, whose activity in myeloid cells is important in mounting an effective immune response) was examined (Santilli et al., 2011). The authors cite a clinical trial using a retroviral vector, which was successful at correcting the phenotype, but in which expression was short-lived due to promoter inactivation. In order to address this issue a chimeric promoter was constructed that was a fusion of Cathepsin G and c-Fes minimal promoter sequences, which are specifically active in cells of the myeloid lineage. This promoter was used to drive the expression of gp91-phox in myeloid cells in mice using a SIN lentiviral vector and the results show effective restricted expression to monocytes and subsequent introduction of gp91 results in high levels of expression in target cells and restoration of wild type phenotype
These studies serve to highlight the potential application of synthetic promoter technology in gene therapy. They particularly highlight the importance of achieving cell-type specific gene expression and address the common issue of promoter shutdown that is seen when using stronger viral promoters like those derived from the CMV and RSV. If gene therapy is to be a success in the clinic it will be imperative to develop promoters that are highly specific and which display a restrictive and predictable expression profile. Thus, synthetic promoter technology represents the ideal solution to achieve this goal and its use is likely to become an increasingly popular approach adopted by researchers developing gene therapeutics.
5. Bioinformatic tools and synthetic promoter development
We first described how functional genomics experimentation and bioinformatics tools could be applied in the design of synthetic promoters for therapeutic and diagnostic applications several years ago (Roberts, 2007). Since then a number of scientists have also realised that this approach can be broadly applied across the biotech industry (Venter et al., 2007). In this section we discuss some of the tools that we use to analyse data obtained from large-scale gene expression analyses, which is subsequently used in the smart design of synthetic promoters conveying highly-specific regulation of gene expression.
To design a synthetic promoter it is essential to identify an appropriate number of
There is now a growing trend for researchers to analyse microarray data in terms of ‘gene modules’ instead of presenting lists of differentially regulated genes. By grouping genes into functionally related modules it is possible to identify subtle changes in gene expression that may be biologically important even if they are not statistically significant, to more easily interpret molecular pathways that mediate a particular response and to compare many different microarray experiments from different disease states in an effort to uncover the commonalities and differences in multiple clinical conditions. Therefore, we are moving into a new era of functional genomics, where the large datasets generated by the evaluation of global gene expression studies can be more fully interpreted by improvements in computational methods. The advances in functional genomics made in recent years have resulted in the identification of many more
In cancer the changes in the gene expression profile are often the result of alterations in the cell’s transcription machinery induced by aberrant activation of signalling pathways that control growth, proliferation and migration. Such changes result in the activation of transcription regulatory networks that are not found in normal cells and provide us with an opportunity to design synthetic promoters that should only be active in cancerous cells. If microarray technology is to truly result in the design of tailored therapies to individual cancers or even patients, as has been heralded, it is important that the functional genomics methodology that was designed for the identification of signalling and transcription networks be applied to the design of cancer-specific promoters so that effective gene therapeutic strategies can be formulated (Roberts & Kottaridis, 2007). The development of bioinformatics algorithms for the analysis of microarray datasets has largely been applied in order to unravel the transcription networks operative under different disease and environmental conditions. To date there has been no effort to use this type of approach to design synthetic promoters that are operative only under these specific disease or environmental conditions.
The regulation of gene expression in eukaryotes is highly complex and often occurs through the coordinated action of multiple transcription factors. The use of
Different bioinformatics tools, examples of which are given in table 1, may be used to screen for
The ability to use gene expression data to identify gene modules, which mediate specific responses to environmental stimuli (or to a diseased state) and to correlate their regulation to the
It is also possible to identify the higher-level regulator that controls the expression of the genes in each module (Segal et al., 2003). Examination of the upstream regulatory sequences of each gene in a module may reveal the presence of common
For nearly two decades scientists have been compiling databases that catalogue the
Databases of known TFBSs can be used to detect the presence of protein-recognition elements in a given promoter, but only when the binding site of the relevant DNA-binding protein and its tolerance to mismatches
At present these tools are mainly applied in the study of lower eukaryotes where the genome is less complex and regulatory elements are easier to identify; extending these algorithms to the human genome has proven somewhat more difficult. In order to redress this issue a number of groups have shown that it is possible to mine the genome of higher eukaryotes by searching for conserved regulatory elements adjacent to transcription start site motifs such as TATA and CAAT boxes, e.g. as catalogued in the
Phylogenetic footprinting, or comparative genomics, is now being applied to identify novel promoter elements by comparing the evolutionarily conserved untranslated elements proximal to known genes from a variety of organisms. The availability of genome sequences from multiple species has notably advanced comparative genomics and the understanding of evolutionary biology in general. The neutral theory of molecular evolution provides a framework for the identification of DNA sequences in genomes of different species. Its central hypothesis is that the vast majority of mutations in the genome are neutral with respect to the fitness of an organism. Whilst deleterious mutations are rapidly removed by selection, neutral mutations persist and follow a stochastic process of genetic drift through a population. Therefore, non-neutral DNA sequences (functional DNA sequences) must be conserved during evolution, whereas neutral mutations accumulate. Initial studies demonstrated that the human genome could be adequately compared to the genomes of other organisms, allowing for the efficient identification of homologous regions in functional DNA sequences.
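The core idea of phylogenetic footprinting, that functional sequence stays conserved while neutral sequence drifts, can be illustrated with a sliding-window identity scan over two orthologous upstream regions. The sketch below assumes the two sequences have already been aligned by an external tool (gaps written as "-"); the function name, window size, identity threshold and toy sequences are all hypothetical choices for illustration and do not reproduce any of the tools discussed in this chapter.

```python
def conserved_windows(aln_a: str, aln_b: str, window: int = 10,
                      min_identity: float = 0.8) -> list:
    """Return (start, identity) for alignment windows at or above min_identity.

    aln_a and aln_b are two rows of a pairwise alignment of equal length.
    A column counts as a match only if both rows carry the same base.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    matches = [a == b and a != "-" for a, b in zip(aln_a.upper(), aln_b.upper())]
    hits = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


if __name__ == "__main__":
    # Toy aligned upstream regions sharing one conserved six-base block.
    species_1 = "ACGTGGGCGGTTACGA"
    species_2 = "TTCAGGGCGGACGTTC"
    print(conserved_windows(species_1, species_2, window=6, min_identity=1.0))
```

Windows reported by such a scan are only candidates; whether a conserved block is a genuine regulatory element still has to be assessed against known binding-site data or by experiment.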
Subsequently, a number of bioinformatics tools have emerged that operate by comparing non-coding regulatory sequences between the genomes of various organisms to enable the identification of conserved TFBSs that are significantly enriched in promoters of candidate genes or from clusters identified by microarray analysis; examples of these software suites are discussed below. Typically these tools work by aligning the upstream sequences of target genes between species thus identifying conserved regions that could potentially function as
A comprehensive list of all the databases described above, with a summary of their features and a reference to the original citation, is shown in table 1.
Each of the aforementioned databases can be used when searching for potential regulatory sequences for inclusion in the design of synthetic promoters. Indeed, these resources can be used in order to identify
|Resource||Description||Reference|
|DBTSS||Database of transcriptional start sites||Suzuki et al., (2002)|
|TRAFAC||Conserved ||Jegga et al., (2002)|
|TRANSCompel||Database of composite regulatory elements||Kel-Margoulis et al., (2002)|
|TRANSFAC||Eukaryotic transcription factor database||Matys et al., (2003)|
|Phylofoot||Tools for phylogenetic footprinting purposes||Lenhard et al., (2003)|
|CORG||Multi-species DNA comparison and annotation||Dieterich et al., (2003)|
|CONSITE||Explores ||Lenhard et al., (2003)|
|CONFAC||Conserved TFBS finder||Karanam et al., (2004)|
|CisMols||Identifies ||Jegga et al., (2005)|
|TRED||Catalogue of transcription regulatory elements||Zhao et al., (2005)|
|Oncomine||Repository and analysis of cancer microarray data||Rhodes et al., (2005)|
|ABS||Database of regulatory elements||Blanco et al., (2006)|
|JASPAR||Database of regulatory elements||Sandelin et al., (2004)|
|HTPSELEX||Database of composite regulatory elements||Jagannathan et al., (2006)|
|PReMod||Database of transcriptional regulatory modules in the human genome||Blanchette et al., (2006)|
|CisView||Browser of regulatory motifs and regions in the genome||Sharov et al., (2006)|
|BEARR||Batch extraction algorithm for microarray data analysis||Vega et al., (2004)|
|VISTA||Align and compare sequences from multiple species||Dubchak et al., (2006)|
|PromAn||Promoter analysis by integrating a variety of databases||Lardenois et al., (2006)|
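The notion of a binding site being significantly enriched in the promoters of a gene cluster can be made concrete with an over-representation test. The sketch below uses a one-sided hypergeometric test, which is a common choice but is an assumption here and not necessarily the statistic implemented by any resource in table 1; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_tail(population: int, with_motif: int,
                   module_size: int, module_with_motif: int) -> float:
    """Probability of seeing at least module_with_motif motif-bearing promoters
    in a module of module_size genes drawn at random from population genes,
    of which with_motif carry the motif."""
    total = comb(population, module_size)
    upper = min(with_motif, module_size)
    favourable = sum(
        comb(with_motif, i) * comb(population - with_motif, module_size - i)
        for i in range(module_with_motif, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 promoters, 1,000 carrying the motif,
    # and a 50-gene module in which 12 promoters carry it.
    p_value = hypergeom_tail(20000, 1000, 50, 12)
    print(f"p = {p_value:.3g}")
```

When many motifs are tested against many modules, the resulting p-values would additionally need correction for multiple testing.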
Importantly, synthetic promoters often mediate a level of gene expression with much greater efficiency than that seen with viral promoters, such as CMV, or compared to naturally occurring promoters within the genome. Given that the entire Biotech industry is centred on the regulation of gene expression, it is likely that synthetic promoters will eventually replace all naturally-occurring sequences in use today and help drive the growth of the synthetic biology sector in the coming decades.
In summary, synthetic promoters have emerged over the past two decades as excellent tools facilitating the identification of important structural features in naturally occurring promoter sequences and allowing enhanced and more restrictive regulation of gene expression. A number of early studies revealed that it was possible to combine the
Recent advances in bioinformatics and the emergence of a plethora of tools specifically designed for unravelling transcription programs have also facilitated the design of highly-specific synthetic promoters that can drive efficient gene expression in a tightly regulated manner. Changes in a cell’s gene expression profile can be monitored and the transcription programs underpinning that change delineated and the corresponding
A number of institutions, such as Synpromics, have taken advantage of these advances and are now working to apply synthetic promoter technology to the enhanced production of biologics for use in biopharmaceutical, greentech and agricultural applications; the development of new gene therapies; and in the design of a novel class of molecular diagnostics. As the synthetic biology field continues to develop into a multi-billion dollar industry, synthetic promoter technology is likely to remain at the heart of this ever-expanding and exciting arena.