Table 1. Genetic elements used as components of synthetic regulatory networks (adapted from McArthur IV & Fong, 2010 and Purnick & Weiss, 2009). Legend: CFP, cyan fluorescent protein; GFP, green fluorescent protein; YFP, yellow fluorescent protein.
Cancer is the second leading cause of mortality worldwide, with an expected 1.5-3.0 million new cases and 0.5-2.0 million deaths in 2011 for the US and Europe, respectively (Jemal et al., 2011). It is therefore an enormously important health risk, and progress leading to enhanced survival is a global priority. Strategies pursued over the years include the search for new biomarkers, drugs or treatments (Rodrigues et al., 2007). Synthetic biology, together with bioinformatics, represents a powerful tool for the discovery of novel biomarkers and the design of new biosensors.
Traditionally, the majority of new drugs have been generated from compounds derived from natural products (Neumann & Neumann-Staubitz, 2010). However, advances in genome sequencing, together with the possibility of manipulating biosynthetic pathways, constitute important resources for screening and designing new drugs (Carothers et al., 2009). Furthermore, the development of rational approaches through the use of bioinformatics for data integration will enable the understanding of mechanisms underlying the anti-cancer effect of such drugs (Leonard et al., 2008; Rocha et al., 2010).
Besides biomarker development and the production of novel drugs, synthetic biology can also play a crucial role in specific drug targeting. Cells can be engineered to recognize specific targets or conditions in our bodies that are not naturally recognized by the immune system (Forbes, 2010).
Synthetic biology is the use of engineering principles to create, in a rational and systematic way, functional systems based on the molecular machines and regulatory circuits of living organisms, or to re-design and fabricate existing biological systems (Benner & Sismour, 2005). The focus is often on taking parts of natural biological systems, characterizing and simplifying them, and using them as components of a highly unnatural, engineered biological system (Endy, 2005). In principle, synthetic biology can deliver solutions for the unmet needs of humankind, notably in the field of drug discovery. Indeed, synthetic biology tools enable the elucidation of disease mechanisms, identification of potential targets, discovery of new chemotherapeutics or design of novel drugs, as well as the design of biological elements that recognize and target cancer cells. Furthermore, through synthetic biology it is possible to develop economically attractive microbial production processes for complex natural products.
Bioinformatics is used in drug target identification and validation, and in the development of biomarkers and tools to maximize the therapeutic benefit of drugs. Now that data on cellular signalling pathways are available, integrated computational and experimental projects are being developed, with the goal of enabling
In this chapter, synthetic biology approaches for cancer diagnosis and drug development are reviewed. Specifically, examples of the design of RNA-based biosensors, of bacteria and viruses as anti-cancer agents, and of engineered microbial cell factories for the production of drugs are presented.
2. Synthetic biology: tools to design, build and optimize biological processes
Synthetic biology uses biological insights combined with engineering principles to design and build new biological functions and complex artificial systems that do not occur in Nature (Andrianantoandro et al., 2006). The building blocks used in synthetic biology are the components of molecular biology processes: promoter sequences, operator sequences, ribosome binding sites (RBS), termination sites, reporter proteins, and transcription factors. Examples of such building blocks are given in Table 1.
Major advances in DNA synthesis technologies have opened new perspectives for the design of very large and complex circuits (Purnick & Weiss, 2009), making it now affordable to synthesize a given gene instead of cloning it. It is possible to synthesize
Currently, the design and synthesis of biological systems are not decoupled. For example, the construction of metabolic pathways or any circuit from genetic parts first requires a collection of well-characterized parts, which does not yet fully exist. Nevertheless, this limitation is being addressed through the development and compilation of standard biological parts (Kelly et al., 2009). When designing individual biological parts, the base-by-base content of that part (promoter, RBS, protein coding region, terminator, among others) is explicitly dictated (McArthur IV & Fong, 2010). Rules and guidelines for designing genetic parts at this level are being established (Canton et al., 2008). In particular, an important issue when designing protein-coding parts is codon optimization, i.e. encoding the same amino acid sequence with an alternative, preferred nucleotide sequence. Although a particular sequence may be theoretically functional when expressed, its expression may be far from optimal or even completely suppressed due to codon usage bias in the heterologous host.
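The idea of codon optimization described above can be sketched in a few lines of Python. This is only a minimal illustration: the `PREFERRED` table is an assumed set of host-preferred codons chosen for the example (not measured usage data), and the function names are hypothetical. Real tools also weigh factors such as restriction sites and RNA secondary structure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed preferred codon per amino acid for a hypothetical host (illustrative only).
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def translate(dna):
    """Translate a coding sequence codon by codon (trailing partial codon ignored)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def codon_optimize(dna):
    """Re-encode the same amino acid sequence using the preferred codons."""
    return "".join(PREFERRED[aa] for aa in translate(dna))


original = "ATGAGGCTAAGATAA"          # encodes M R L R stop with rarely used codons
optimized = codon_optimize(original)   # same protein, different nucleotides
```

The key property is that `translate(optimized)` equals `translate(original)`: only the nucleotide sequence changes, not the encoded protein.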
| Genetic element | Examples | Function |
|---|---|---|
| Constitutive promoters | | “Always on” transcription |
| Regulatory regions | | Repressor and activator sites |
| Inducible promoters | | Control of the promoter by induction or by cell state |
| Cell fate regulators | GATA factors | Control cell differentiation |
| RNA interference (RNAi) | Logic functions, RNAi repressor | Genetic switch, logic evaluation and gene silencing |
| Riboregulators | Ligand-controlled ribozymes | Switches for detection and actuation |
| Ribosome binding site | Kozak consensus sequence mutants | Control the level of translation |
| Phosphorylation cascades | Yeast phosphorylation pathway | Modulate genetic circuit behavior |
| Protein receptor design | TNT, ACT and EST receptors | Control detection thresholds and combinatorial protein function |
| Protein degradation | SsrA tags, peptides rich in Pro, Glu, Ser and Thr | Protein degradation at varying rates |
| Localization signals | Nuclear localization, nuclear export and mitochondrial localization signals | Import or export from nucleus and mitochondria |
| Reporter genes | GFP, YFP, CFP, LacZ | Detection of expression |
| Antibiotic resistance | Ampicillin, chloramphenicol | Selection of cells |
Codon optimization of coding sequences can be achieved using freely available algorithms such as Gene Designer (see section 3). Besides codon optimization, compliance with standard assembly requirements and part-specific objectives including activity or specificity modifications should be considered. For example, the BioBrick methodology requires that parts exclude four standard restriction enzyme sites, which are reserved for use in assembly (Shetty et al., 2008). Extensive collections of parts can be generated by using a naturally occurring part as a template and rationally modifying it to create a library of that particular genetic part. Significant progress in this area has been recently demonstrated for promoters and RBS (Ellis et al., 2009; Salis et al., 2009). Ellis and co-workers (2009) constructed two promoter libraries that can be used to tune network behavior a
Design at the pathway level is not only concerned with including the necessary parts, but also with controlling the expressed functionality of those parts. Parts-based synthetic metabolic pathways will require tunable control, just like their natural counterparts, which often employ feedback and feed-forward motifs to achieve complex regulation (Purnick & Weiss, 2009). Using a synthetic biology approach, the design of DNA sequences encoding metabolic pathways (e.g. operons) should be relatively straightforward. Synthetic scaffolds and well-characterized families of regulatory parts have emerged as powerful tools for engineering metabolism by providing rational methodologies for coordinating control of multigene expression, as well as decoupling pathway design from construction (Ellis et al., 2009). Pathway design should not overlook the fact that exogenous pathways interact with native cellular components and have their own specific energy requirements. Therefore, modifying endogenous gene expression may be necessary in addition to balancing cofactor fluxes and installing membrane transporters (Park et al., 2008).
After designing parts, circuits or pathways, the genomic constructs must be manufactured through DNA synthesis. The nucleotide sequence can be sent to synthesis companies (e.g. DNA2.0, GENEART or Genscript, among others). The convenience of this approach over traditional cloning allows for the systematic generation of genetic part variants such as promoter libraries. It also provides a way to eliminate restriction sites or undesirable RNA secondary structures, and to perform codon optimization. The ability to make large changes to DNA molecules has resulted in standardized methods for assembling basic genetic parts into larger composite devices, which facilitate part-sharing and faster system-level construction, as demonstrated by the BioBrick methodology (Shetty et al., 2008) and the Gateway cloning system (Hartley, 2003). Other approaches based on type IIS restriction enzymes, such as Golden Gate Shuffling, provide ways to assemble many more components together in one step (Engler et al., 2009). A similar one-step assembly approach, circular polymerase extension cloning (CPEC), avoids the need for restriction-ligation or single-stranded homologous recombination altogether (Quan & Tian, 2009). This is useful not only for cloning single genes, but also for assembling parts into larger sequences encoding entire metabolic pathways and for generating combinatorial part libraries. On a chromosomal level, disruption of genes in
The majority of the synthetic biology advances has been achieved purely
3. Bioinformatics: a rational path towards biological behavior predictability
In order to evolve as an engineering discipline, synthetic biology cannot rely on endless trial and error methods driven by verbal description of biomolecular interaction networks. Genome projects identify the components of gene networks in biological organisms, gene after gene, and DNA microarray experiments discover the network connections (Arkin, 2001). However, these data cannot adequately explain biomolecular phenomena or enable rational engineering of dynamic gene expression regulation. The challenge is then to reduce the amount and complexity of biological data into concise theoretical formulations with predictive ability, ultimately associating synthetic DNA sequences to dynamic phenotypes.
3.1. Models for synthetic biology
The engineering process usually involves multiple cycles of design, optimization and revision. This is particularly evident in the process of constructing gene circuits (Marguet et al., 2007). Due to the large number of participating species and the complexity of their interactions, it becomes difficult to intuitively predict the behavior of a design. Therefore, only detailed modeling can allow the investigation of dynamic gene expression in a way fit for analysis and design (Di Ventura et al., 2006). Modeling a cellular process can highlight which experiments are likely to be the most informative in testing model hypotheses and, for example, allow testing for the effect of drugs (Di Bernardo et al., 2005) or mutant phenotypes (Segre et al., 2002) on cellular processes, thus paving the way for individualized medicine.
Data are the precursor to any model, and the need to organize as much experimental data as possible in a systematic manner has led to several excellent databases as summarized in Table 2. The term “model” can be used for verbal or graphical descriptions of a mechanism underlying a cellular process, or refer to a set of equations expressing in a formal and exact manner the relationships among variables that characterize the state of a biological system (Di Ventura et al., 2006). The importance of mathematical modeling has been extensively demonstrated in systems biology (You, 2004), although its utility in synthetic biology seems even more dominant (Kaznessis, 2009).
| Database | URL |
|---|---|
| BIND (Biomolecular Interaction Network Database) | http://www.bind.ca/ |
| BRENDA (a comprehensive enzyme information system) | http://www.brenda.uni-koeln.de/ |
| CSNDB (Cell Signaling Networks Database) | http://geo.nihs.go.jp/csndb/ |
| DIP (Database of Interacting Proteins) | http://dip.doe-mbi.ucla.edu/ |
| EcoCyc/MetaCyc/BioCyc (Encyclopedia of E. coli genes and metabolism) | http://ecocyc.org/ |
| EMP (Enzymes and Metabolic Pathways Database) | http://www.empproject.com/ |
| GeneNet (information on gene networks) | http://wwwmgs.bionet.nsc.ru/mgs/systems/genenet/ |
| KEGG (Kyoto Encyclopedia of Genes and Genomes) | http://www.genome.ad.jp/kegg/kegg.html |
| SPAD (Signaling Pathway Database) | http://www.grt.kyushu-u.ac.jp/eny-doc/ |
| ExPASy-beta (Bioinformatics Resource Portal) | http://beta.expasy.org/ |
Model-driven rational engineering of synthetic gene networks is possible at the level of topologies or at the level of molecular components. At the first level, molecules are considered to control the concentration of other molecules, e.g. DNA-binding proteins regulate the expression of specific genes by either activation or repression. By combining simple regulatory interactions, such as negative and positive feedback and feed-forward loops, one may create more complex networks that precisely control the production of protein molecules (e.g. bistable switches, oscillators, and filters). Experimentally, these networks can be created using existing libraries of regulatory proteins and their corresponding operator sites. Examples of such models are the toggle switch described by Gardner et al. (2000) and the repressilator, an oscillator, described by Elowitz and Leibler (2000). At the second level, the kinetics and strengths of molecular interactions within the system are described. By altering the characteristics of the components, such as DNA-binding proteins and their corresponding DNA sites, one can modify the system dynamics without modifying the network topology. Experimentally, the DNA sequences that yield the desired characteristics of each component can be engineered to achieve the desired protein-protein, protein-RNA, or protein-DNA binding constants and enzymatic activities. For example, Alon and co-workers (2003) showed how simple mutations in the DNA sequence of the lactose operon can result in widely different phenotypic behavior.
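The distinction between the two levels can be made concrete with a small deterministic sketch of a two-repressor toggle switch. The dimensionless equations below follow the commonly used mutual-repression form (each repressor is produced at a rate that decreases with the other and is removed at unit rate). The parameter values, function names and the simple Euler integrator are assumptions chosen for illustration, not values from the cited studies.

```python
def toggle_rhs(u, v, alpha, n):
    """Mutual repression: each repressor inhibits synthesis of the other."""
    du = alpha / (1.0 + v ** n) - u
    dv = alpha / (1.0 + u ** n) - v
    return du, dv


def simulate_toggle(u0, v0, alpha, n=2.0, dt=0.01, t_end=50.0):
    """Integrate the toggle switch with a fixed-step Euler scheme."""
    u, v = u0, v0
    for _ in range(int(t_end / dt)):
        du, dv = toggle_rhs(u, v, alpha, n)
        u, v = u + dt * du, v + dt * dv
    return u, v


# Same topology, strong promoters: the final state depends on the initial state.
strong_a = simulate_toggle(5.0, 1.0, alpha=10.0)
strong_b = simulate_toggle(1.0, 5.0, alpha=10.0)

# Same topology, weak promoters: both initial states reach one common state.
weak_a = simulate_toggle(5.0, 1.0, alpha=1.0)
weak_b = simulate_toggle(1.0, 5.0, alpha=1.0)
```

Only the component-level parameter `alpha` (an assumed lumped promoter strength) changes between the two cases. The wiring of the network is identical, yet the qualitative behavior switches from bistable to monostable.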
Various mathematical formulations can be used to model gene circuits. At the population level, gene circuits can be modeled using ordinary differential equations (ODEs). In an ODE formulation, the dynamics of the interactions within the circuit are deterministic. That is, the ODE formulation ignores the randomness intrinsic to cellular processes, and is convenient for circuit designs that are thought to be less affected by noise or when the impact of noise is irrelevant (Marguet et al., 2007). An ODE model facilitates further sophisticated analyses, such as sensitivity analysis and bifurcation analysis. Such analyses are useful to determine how quantitative or qualitative circuit behavior will be affected by changes in circuit parameters. For instance, in designing a bistable toggle switch, bifurcation analysis was used to explore how qualitative features of the circuit may depend on reaction parameters (Gardner et al., 2000). Results of the analysis were used to guide the choice of genetic components (genes, promoters and RBS) and growth conditions to favor a successful implementation of the designed circuit function. However, in a single cell, the gene circuit’s dynamics often involve small numbers of interacting molecules, which results in highly noisy dynamics even for the expression of a single gene. For many gene circuits, the impact of such cellular noise may be critical and needs to be considered (Di Ventura et al., 2006). This can be done using stochastic models (Arkin, 2001). Different rounds of simulation using a stochastic model will lead to different results each time, which presumably reflect aspects of noisy dynamics inside a cell. For synthetic biology applications, the aim of such analysis is not necessarily to accurately predict the exact noise level at each time point. This is not possible even for the simplest circuits due to the “extrinsic” noise component for each circuit (Elowitz et al., 2002). Rather, it is a way to determine to what extent the designed function can be maintained and, given a certain level of uncertainty or randomness, to what extent additional layers of control can minimize or exploit such variations. Independently of the model that is used, these can be evolved
In most attempts to engineer gene circuits, mathematical models are purposefully simplified to accommodate available computational power and to capture the qualitative behavior of the underlying systems. Simplification is beneficial partly because of the limited quantitative characterization of circuit elements, and partly because simpler models may better reveal key design constraints. The limitation, however, is that a simplified model may fail to capture richer dynamics intrinsic to a circuit. Synthetic models combine features of mathematical models and model organisms. In the engineering of genetic networks, synthetic biologists start from mathematical models, which are used as blueprints to engineer a model out of biological components that has the same materiality as a model organism but is much less complex. The specific characteristics of synthetic models allow one to use them as tools for distinguishing between different mathematical models and for evaluating results gained from experiments with model organisms (Loettgers, 2007).
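The contrast between deterministic and stochastic formulations discussed in this section can be illustrated with the simplest possible circuit, a single gene whose product is made at a constant rate and degraded in proportion to its copy number. The sketch below uses the Gillespie stochastic simulation algorithm. The rate constants, seeds and function names are assumptions for illustration only.

```python
import random


def gillespie_birth_death(k_prod, k_deg, t_end, seed):
    """Simulate production (rate k_prod) and degradation (rate k_deg * n)."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    times, counts = [0.0], [0]
    while True:
        a_prod = k_prod            # propensity of the production reaction
        a_deg = k_deg * n          # propensity of the degradation reaction
        a_total = a_prod + a_deg
        t += rng.expovariate(a_total)   # waiting time to the next reaction
        if t >= t_end:
            break
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_end):
    """Average copy number, weighting each state by how long it lasted."""
    total = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += n * (t_next - times[i])
    return total / t_end


K_PROD, K_DEG, T_END = 10.0, 1.0, 200.0
ode_steady_state = K_PROD / K_DEG   # deterministic prediction for dn/dt = 0

run_a = gillespie_birth_death(K_PROD, K_DEG, T_END, seed=1)
run_b = gillespie_birth_death(K_PROD, K_DEG, T_END, seed=2)
mean_a = time_average(*run_a, T_END)
mean_b = time_average(*run_b, T_END)
```

Each seed yields a different trajectory, while the time-averaged copy number of either run stays close to the single value predicted by the ODE. This is the sense in which the deterministic model describes the population-level behavior but hides cell-to-cell variability.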
3.2. Computational tools for synthetic biology
Computational tools are essential for synthetic biology to support the design procedure at different levels. Due to the lack of quantitative characterization of biological parts, most design procedures are iterative, requiring experimental validation to enable subsequent refinements (Canton et al., 2008). Furthermore, stochastic noise, uncertainty about the cellular environment of an engineered system, and little insulation of components complicate the design process and require corresponding models and analysis methods (Di Ventura et al., 2006). Many computational standards and tools developed in the field of systems biology (Wierling et al., 2007) are applicable to synthetic biology as well.
As previously discussed, synthetic gene circuits can be constructed from a handful of basic parts that can be described independently and assembled into interoperating modules of different complexity. For this purpose, standardization and modularity of parts at different levels are required (Canton et al., 2008). The Registry of Standard Biological Parts constitutes a reference point for current research in synthetic biology and provides relevant information on several DNA-based synthetic or natural building blocks. Most computational tools that specifically support the design of artificial gene circuits use information from this registry. Moreover, many of these tools share standardized formats for their input/output files. The Systems Biology Markup Language (SBML) (http://sbml.org) defines a widely accepted, XML-based format for the exchange of mathematical models in biology. It provides a concise representation of the chemical reactions that make up a biological system. These can be translated into systems of ODEs or into reaction systems amenable to stochastic simulations (Alon, 2003). Despite its wide applicability to simulations, SBML currently lacks modularity, which makes it poorly aligned with parts registries in synthetic biology. Alternatively, synthetic gene systems can be described using the CellML language, which is more modular (Cooling et al., 2008).
One important feature that enables the assembly of standard biological parts into gene circuits is that they share common inputs and outputs. Endy (2005) proposed RNA polymerases and ribosomes as the molecules that physically exchange information between parts. Their fluxes, measured in PoPS (Polymerases Per Second) and in RiPS (Ribosomes Per Second), represent biological currents (Canton et al., 2008). This picture, however, does not seem sufficient to describe all information exchanges even in simple engineered gene circuits, since other signal carriers such as transcription factors and environmental “messages” should be explicitly introduced and not indirectly estimated by means of PoPS and RiPS (Marchisio & Stelling, 2008). Based on the assumption that parts share common input/output signals, several computational tools have been proposed for gene circuit design, as presented in Table 3. A comparison of these circuit design tools makes it clear that an ideal solution is still far off. The software tools differ in many aspects, such as the scope of parts and circuit descriptions, the mode of user interaction, and the integration with databases or other tools.
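The translation of a reaction list into a system of ODEs can be sketched as follows. The example does not read SBML. It uses a plain Python list as a stand-in for the reactions such a file would contain, and the species names, rate constants and function names are assumptions for illustration. Mass-action kinetics is assumed for every reaction.

```python
# Each reaction: (reactants, products, rate constant), with stoichiometries.
REACTIONS = [
    ({}, {"mRNA": 1}, 2.0),                           # transcription
    ({"mRNA": 1}, {"mRNA": 1, "protein": 1}, 5.0),    # translation
    ({"mRNA": 1}, {}, 0.5),                           # mRNA degradation
    ({"protein": 1}, {}, 0.1),                        # protein degradation
]


def mass_action_rate(reactants, k, state):
    """Rate = k times the product of reactant amounts raised to their order."""
    rate = k
    for species, order in reactants.items():
        rate *= state[species] ** order
    return rate


def ode_rhs(state, reactions=REACTIONS):
    """Build dx/dt for every species by summing stoichiometry times rate."""
    dxdt = {species: 0.0 for species in state}
    for reactants, products, k in reactions:
        v = mass_action_rate(reactants, k, state)
        for species, n in reactants.items():
            dxdt[species] -= n * v
        for species, n in products.items():
            dxdt[species] += n * v
    return dxdt


derivatives = ode_rhs({"mRNA": 2.0, "protein": 5.0})
at_steady_state = ode_rhs({"mRNA": 4.0, "protein": 200.0})
```

The same reaction list could equally feed a stochastic simulator, because the mass-action rates computed here play the role of reaction propensities. This is why a reaction-based exchange format serves both kinds of simulation.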
| Tool | URL |
|---|---|
| Vienna RNA package | http://www.tbi.univie.ac.at/~ivo/RNA/ |
| Zinc Finger Tools | http://www.scripps.edu/mb/barbas/zfdesign/zfdesignhome.php |
BioJADE was one of the first tools reported for circuit design (Goler, 2004). It provides connections to both parts databases and simulation environments, but it considers only one kind of signal carrier (RNA polymerases). It can invoke the simulator TABASCO (Kosuri et al., 2007), thus enabling genome-scale simulations at single base-pair resolution. CellDesigner (Funahashi et al., 2003) has similar capabilities for graphical circuit composition. However, parts modularity and consequently circuit representation do not appear detailed enough. Another tool in which parts communicate only by means of PoPS, but which is not restricted to a single mathematical framework, is TinkerCell. In contrast, in Asmparts (Rodrigo et al., 2007a) circuit design is less straightforward and intuitive because the tool lacks a graphical user interface. Nevertheless, each part exists as an independent SBML module, and the model kinetics for transcription and translation make it possible to limit the number of parameters necessary for a qualitative system description. Marchisio and Stelling (2008) developed a new framework for the design of synthetic circuits where each part is modeled independently following the ODE formalism. This results in a set of composable parts that communicate by fluxes of signal carriers, whose overall amount is constantly updated inside their corresponding pools. The model also considers transcription factors, chemicals and small RNAs as signal carriers. Pools are placed among parts and devices: they store free signal carriers and distribute them to the whole circuit. Hence, polymerases and ribosomes are present in finite amounts, which makes it possible to estimate circuit scalability with respect to the number of parts. Mass action kinetics is fully employed and no approximations are required to depict the interactions of signal carriers with DNA and mRNA. The authors implemented the corresponding models in ProMoT (Process Modeling Tool), software for the object-oriented and modular composition of models for dynamic processes (Mirschel et al., 2009). GenoCAD (Czar et al., 2009) and GEC (Pedersen & Phillips, 2009) introduce the notions of a grammar and of a programming language for genetic circuit design, respectively. These tools use a set of rules to check the correct composition of standard parts. Relying on libraries of standard parts that are not necessarily taken from the Registry of Standard Biological Parts, these programs can translate a circuit design into a complete DNA sequence. The two tools differ in capabilities and possible connectivity to other tools.
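The idea of rule-based checking of part composition, followed by translation into a DNA sequence, can be sketched with a toy grammar. This is not the grammar of GenoCAD or the language of GEC. The rule used here (one promoter, one or more RBS-CDS pairs, one terminator, repeated), the part names and the placeholder sequences are all assumptions made for the example.

```python
import re

# Hypothetical part library: name -> (part type, placeholder sequence).
PART_LIBRARY = {
    "pA": ("promoter", "TTGACAGCTAGC"),
    "rbs1": ("rbs", "AGGAGG"),
    "cdsA": ("cds", "ATGAAATAA"),
    "cdsB": ("cds", "ATGCCGTAA"),
    "t1": ("terminator", "GCGCCGAAAGGCGC"),
}

TYPE_CODES = {"promoter": "P", "rbs": "R", "cds": "C", "terminator": "T"}

# Toy rule: a design is one or more cassettes of promoter, (RBS, CDS)+, terminator.
DESIGN_GRAMMAR = re.compile(r"(P(RC)+T)+")


def compile_design(part_names):
    """Check the part order against the toy grammar, then concatenate sequences."""
    codes = "".join(TYPE_CODES[PART_LIBRARY[name][0]] for name in part_names)
    if not DESIGN_GRAMMAR.fullmatch(codes):
        raise ValueError(f"invalid part order: {codes}")
    return "".join(PART_LIBRARY[name][1] for name in part_names)


operon = compile_design(["pA", "rbs1", "cdsA", "rbs1", "cdsB", "t1"])
```

A design such as `["rbs1", "pA", "cdsA", "t1"]` is rejected with a `ValueError` before any sequence is produced. Separating the validity check from sequence generation mirrors the division of labor the text describes: rules first, DNA second.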
The ultimate goal of designing a genetic circuit is that it works, i.e. that it performs a given function. For that purpose, optimization cycles are required to establish an appropriate structure and a good set of kinetic parameter values. These optimization problems are extremely complex since they involve the selection of adequate parts and appropriate continuous parameter values. Stochastic optimization methods (e.g. evolutionary algorithms) attempt to find good solutions by biased random search. They have the potential to find globally optimal solutions, but the optimization is computationally expensive. Deterministic methods (e.g. gradient descent), on the other hand, are local search methods with a lower computational cost, but at the risk of missing good solutions.
The optimization problem can be tackled by tools such as Genetdes (Rodrigo et al., 2007b) and OptCircuit (Dasika & Maranas, 2008). They rely on different parts characterizations and optimization algorithms. Genetdes uses a stochastic method termed “Simulated Annealing” (Kirkpatrick et al., 1983), which produces a single solution starting from a random circuit configuration. As a drawback, the algorithm is more likely to get stuck in a local minimum than an evolutionary algorithm. OptCircuit, in contrast, treats the circuit design problem with a deterministic method (Bansal et al., 2003), implementing a procedure towards a “local” optimal solution. Each of these optimization algorithms requires a very simplified model of gene dynamics where, for instance, transcription and translation are treated as a single-step process. Moreover, the current methods can cope only with rather small circuits. Another tool, RoVerGeNe, described by Batt and co-workers (2007), addresses the problem of parameter estimation more specifically. This tool makes it possible to tune the performance and to estimate the robustness of a synthetic network with a known behavior and for which the topology does not require further improvement.
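The trade-off between local deterministic search and stochastic search can be seen on a toy one-dimensional objective with two minima. The sketch below is not the algorithm of Genetdes or OptCircuit. The objective function, the cooling schedule, the proposal width and all names are assumptions chosen only to make the contrast visible.

```python
import math
import random


def objective(x):
    """Toy cost with a local minimum near x = 1.35 and a global one near x = -1.47."""
    return x ** 4 - 4 * x ** 2 + x


def gradient(x):
    return 4 * x ** 3 - 8 * x + 1


def gradient_descent(x, step=0.01, n_steps=2000):
    """Deterministic local search: always move downhill."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


def simulated_annealing(x, t_start=5.0, cooling=0.999, n_steps=5000, seed=0):
    """Stochastic search: uphill moves are accepted with a temperature-dependent probability."""
    rng = random.Random(seed)
    best, temp = x, t_start
    for _ in range(n_steps):
        candidate = x + rng.gauss(0.0, 1.0)
        delta = objective(candidate) - objective(x)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x = candidate
            if objective(x) < objective(best):
                best = x          # remember the best point visited so far
        temp *= cooling
    return best


x_gd = gradient_descent(2.0)       # stays in the basin it started in
x_sa = simulated_annealing(2.0)    # can cross the barrier between basins
```

Started from the same point, gradient descent settles in the nearer, shallower minimum, whereas the annealing run can escape it at the cost of many more function evaluations. In circuit design, each evaluation would be a circuit simulation, which is what makes the stochastic route expensive.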
Detailed design of synthetic parts that reproduce the estimated circuit kinetics and dynamics is a complex task. It requires computational tools in order to achieve error-free solutions in a reasonable amount of time. Other than the placement/removal of restriction sites and the insertion/deletion of longer motifs, mutations of single nucleotides may be necessary to tune part characteristics (e.g. promoter strength and affinity toward regulatory factors). Gene Designer (Villalobos et al., 2006) is a complete tool for building artificial DNA segments and optimizing codon usage. GeneDesign (Richardson et al., 2006) is another tool for designing long synthetic DNA sequences. Many other tools are available for specific analysis of the DNA and RNA circuit components. The package UNAFold (Markham & Zuker, 2008) predicts the secondary structure of nucleic acid sequences to simulate their hybridization and to estimate their melting temperature according to physical considerations. A more accurate analysis of the secondary structure of ribonucleic acids can be performed with the Vienna RNA package (Hofacker, 2003). Binding sites along a DNA chain can be located using Zinc Finger Tools (Mandell & Barbas, 2006). These tools allow one to search DNA sequences for target sites of particular zinc finger proteins (Kaiser, 2005), whose structure and composition can also be arranged. Thus, gene control by a class of proteins with either regulatory or nuclease activity can be improved. Furthermore, tools that enable promoter prediction and primer design are available, such as BDGP and Primer3. Another relevant task in synthetic biology is the design and engineering of new proteins. Many tools have been proposed for structure prediction, homology modeling, function prediction, docking simulations and evaluation of DNA-protein interactions. Examples include the Rosetta package (Simons et al., 1999); RAPTOR (Xu et al., 2003); PFP (Hawkins et al., 2006); Autodock 4.2 (Morris et al., 2009) and Hex 5.1 (Ritchie, 2008).
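One of the simplest sequence-level checks mentioned above, locating restriction sites that must be removed from a part, can be sketched as follows. The four enzymes listed are assumed here to be the ones reserved by the original BioBrick assembly standard. The function name and the test sequence are hypothetical. Because all four recognition sites are palindromic, scanning one strand is sufficient.

```python
# Assumed reserved sites (EcoRI, XbaI, SpeI, PstI); all are palindromic.
BIOBRICK_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
}


def find_sites(sequence, sites=BIOBRICK_SITES):
    """Return {enzyme: [0-based start positions]} for every site found."""
    sequence = sequence.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start)
            start = sequence.find(motif, start + 1)   # allow overlapping hits
        if positions:
            hits[enzyme] = positions
    return hits


part = "ATGGAATTCAAACTGCAGTAA"      # hypothetical coding part
forbidden = find_sites(part)
```

A non-empty result flags the part as incompatible with the assembly standard until the sites are removed, for example by synonymous codon changes in a coding region.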
Further advances in computational synthetic biology will result from tools that combine and integrate most of the tasks discussed, from the choice and assembly of biological parts to the compilation and modification of the corresponding DNA sequences. Examples of such tools include SynBioSS (Hill et al., 2008), Clotho and Biskit (Grunberg et al., 2007). Critical elements are still lacking, such as tools for automatic information integration (literature and databases), and tools that re-use standardized model entities for optimal circuit design. Overall, providing an extended and integrated information technology infrastructure will be crucial for the development of the synthetic biology field.
4. A roadmap from design to production of new drugs
Biological systems are dynamic, that is, they mutate, evolve and are subject to noise. Currently, knowledge of how these systems work is still limited. As previously discussed, synthetic biology approaches involve breaking down organisms into a hierarchy of composable parts, which is useful for conceptualization purposes. Reprogramming a cell involves the creation of synthetic biological components by adding, removing, or changing genes and proteins. Nevertheless, it is important to note that the assembly of parts largely depends on the cellular context (the so-called chassis), thus restraining the abstraction of biological components into devices and modules, and their use in the design and engineering of new organisms or functions.
One level of abstraction above DNA synthesis and manipulation is parts production, whose optimization can be accomplished through either rational design or directed evolution. Applying rational design to parts alteration or creation is advantageous in that it can not only generate products with a defined function, but also produce biological insights into how the designed function comes about. However, it requires prior structural knowledge of the part, which is frequently unavailable. Directed evolution is an alternative method that can effectively address this limitation. Many synthetic biology applications will require parts for genetic circuits, cell–cell communication systems, and non-natural metabolic pathways that cannot be found in Nature, simply because Nature is not in need of them (Dougherty & Arnold, 2009). In essence, directed evolution begins with the generation of a library containing many different DNA molecules, often by error-prone DNA replication, DNA shuffling or combinatorial synthesis (Crameri et al., 1998). The library is then subjected to high-throughput screening or selection methods that maintain a link between genotype and phenotype in order to enrich the molecules that produce the desired function. Directed evolution can also be applied at other levels of biological hierarchy, for example to evolve entire gene circuits (Yokobayashi et al., 2002). Rational design and directed evolution should not be viewed as opposing methods, but as alternative ways to produce and optimize parts, each with its own strengths and weaknesses. Directed evolution can complement rational design by using mutagenesis and subsequent screening for improved synthetic properties (Brustad & Arnold, 2010). In addition, methods have been developed to incorporate unnatural amino acids in peptides and proteins (Voloshchuk & Montclare, 2009). This will expand the toolbox of protein parts, and add beneficial effects, such as increased
For the design, engineering, integration and testing of new synthetic gene networks, tools and methods derived from experimental molecular biology must be used (for details see section 2). Nevertheless, progress on these tools and methods is still not enough to guarantee the complete success of the experiment. As a result, the design of synthetic biological systems has become an iterative process of modeling, construction, and experimental testing that continues until a system achieves the desired behavior (Purnick & Weiss, 2009). The process begins with the abstract design of devices, modules, or organisms, and is often guided by mathematical models (Koide et al., 2009). Afterwards, the newly constructed systems are tested experimentally. However, such initial attempts rarely yield fully functional implementations due to incomplete biological information. Rational redesign based on mathematical models improves system behavior in such situations (Koide et al., 2009; Prather & Martin, 2008). Directed evolution is a complementary approach, which can yield novel and unexpected beneficial changes to the system (Yokobayashi et al., 2002). These retooled systems are once again tested experimentally and the process is repeated as needed. Many synthetic biological systems have been engineered successfully in this fashion because the methodology is highly tolerant to uncertainty (Matsuoka et al., 2009). Figure 1 illustrates the above-mentioned iterative approach used in synthetic biology.
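The directed evolution cycle described in this section (generate a library by mutagenesis, screen it, keep the best variant, repeat) can be sketched as a short loop. This is a deliberately artificial illustration. The fitness function, which simply counts matches to a hypothetical target sequence, stands in for an experimental screen, and the mutation rate, library size and names are assumptions.

```python
import random

ALPHABET = "ACGT"


def mutate(seq, rate, rng):
    """Error-prone copying: each base is replaced by a random base with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base for base in seq)


def fitness(seq, target):
    """Stand-in for a screen: number of positions matching a hypothetical target."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(parent, target, rounds=30, library_size=200, rate=0.05, seed=1):
    """Repeat mutagenesis and screening, carrying the best variant to the next round."""
    rng = random.Random(seed)
    history = [fitness(parent, target)]
    for _ in range(rounds):
        library = [mutate(parent, rate, rng) for _ in range(library_size)]
        library.append(parent)     # keep the parent so fitness never decreases
        parent = max(library, key=lambda s: fitness(s, target))
        history.append(fitness(parent, target))
    return parent, history


start = "A" * 20
target = "ACGTACGTACGTACGTACGT"   # hypothetical optimum, unknown in a real screen
evolved, history = directed_evolution(start, target)
```

The loop never uses knowledge of why a variant scores better, only that it does. This is the property that lets directed evolution proceed without the structural information that rational design requires.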
Since its inception, metabolic engineering has aimed to optimize cellular metabolism for a particular industrial process application through the use of directed genetic modifications (Tyo et al., 2007). Metabolic engineering is often seen as a cyclic process (Nielsen, 2001), where the cell factory is analyzed and an appropriate target is identified. This target is then experimentally implemented and the resulting strain is characterized experimentally and, if necessary, further analyses are conducted to identify novel targets. The application of synthetic biology to metabolic engineering can potentially create a paradigm shift. Rather than starting with the full complement of components in a wild-type organism and piecewise modifying and streamlining its function, metabolic engineering can be attempted from a bottom-up, parts-based approach to design by carefully and rationally specifying the inclusion of each necessary component (McArthur IV & Fong, 2010). The importance of rationally designing improved or new microbial cell factories for the production of drugs has grown substantially since there is an increasing need for new or existing drugs at prices that can be affordable for low-income countries. Large-scale re-engineering of a biological circuit will require systems-level optimization that will come from a deep understanding of operational relationships among all the constituent parts of a cell. The integrated framework necessary for conducting such complex bioengineering requires the convergence of systems and synthetic biology (Koide et al., 2009). In recent years, with advances in systems biology (Kitano, 2002), there has been an increasing trend toward using mathematical and computational tools for the
Current models in both synthetic and systems biology emphasize the relationship between environmental influences and the responses of biological networks. Nevertheless, these models operate at different scales, and to understand the new paradigm of rational systems re-engineering, synthetic and systems biology fields must join forces (Koide et al., 2009). Synthetic biology and bottom-up systems biology methods extract discrete, accurate, quantitative, kinetic and mechanistic details of regulatory sub-circuits. The models generated from these approaches provide an explicit mathematical foundation that can ultimately be used in systems redesign and re-engineering. However, these approaches are confounded by high dimensionality, non-linearity and poor prior knowledge of key dynamic parameters (Fisher & Henzinger, 2007) when scaled to large systems. Consequently, modular sub-network characterization is performed assuming that the network is isolated from the rest of the host system. The top-down systems biology approach is based on data from high-throughput experiments that list the complete set of components within a system in a qualitative or semi-quantitative manner. Models of overall systems are similarly qualitative, tending toward algorithmic descriptions of component interactions. Such models are amenable to the experimental data used to develop them, but usually sacrifice the finer kinetic and mechanistic details of the molecular components involved (Price & Shmulevich, 2007). Bridging systems and synthetic biology approaches is being actively discussed and several solutions have been suggested (Koide et al., 2009).
A typical synthetic biology project is the design and engineering of a new biosynthetic pathway in a model organism (chassis). Generally,
5. Novel strategies for cancer diagnosis and drug development
Cancer is a major issue for modern society and, according to the World Health Organization, it is among the top 10 leading causes of death in middle- and high-income countries. Several possibilities to further improve existing therapies and diagnostics, or to develop novel alternatives that still have not been foreseen, can be drawn using synthetic biology approaches. Promising future applications include the development of RNA-based biosensors to produce a desired response
5.1. RNA-based biosensors
Synthetic biology seeks new biological devices and systems that regulate gene expression and metabolic pathways. Many components of a living cell, such as DNA, RNA and proteins, among others, possess the ability to carry genetic information. RNA has a critical role in several functions (genetic translation, protein synthesis, signal recognition of particles) due to its functional versatility, which ranges from genetic blueprint (e.g. mRNA, RNA virus genomes) to enzyme (e.g. ribozymes, rRNA) and regulator of gene expression (e.g. miRNA, siRNA). This versatility makes it stand out among other biopolymers with a more specialized scope (e.g. DNA, proteins) (Dawid et al., 2009). Moreover, non-coding RNA molecules can form complex structures that interact with DNA, other RNA molecules, proteins and other small molecules (Isaacs et al., 2006).
Natural biological systems contain transcription factors and regulators, as well as several RNA-based mechanisms for regulating gene expression (Saito & Inoue, 2009). A number of studies have been conducted on the use of RNA components in the construction of synthetic biological devices (Topp & Gallivan, 2007; Win & Smolke, 2007). The interaction of RNA with proteins, metabolites and other nucleic acids is affected by the relationship between sequence, structure and function. This is what makes the RNA molecule so attractive and malleable for engineering complex and programmable functions.
Among the most promising elements are riboswitches, genetic control elements that allow small molecules to regulate gene expression. They are structured elements typically found in the 5'-untranslated regions of mRNA that recognize small molecules and respond by altering their three-dimensional structure. This, in turn, affects transcription elongation, translation initiation, or other steps of the process that lead to protein production (Beisel & Smolke, 2009; Winkler & Breaker, 2005). Biological cells can modulate gene expression in response to physical and chemical variations in the environment, allowing them to control their metabolism and preventing wasted energy expenditure or inappropriate physiological responses (Garst & Batey, 2009). There are currently at least twenty classes of riboswitches that recognize a wide range of ligands, including purine nucleobases (purine riboswitch), amino acids (lysine riboswitch), vitamin cofactors (cobalamin riboswitch), amino sugars, metal ions (mgtA riboswitch) and second messenger molecules (cyclic di-GMP riboswitch) (Beisel & Smolke, 2009). Riboswitches are typically composed of two distinct domains: a metabolite receptor known as the aptamer domain, and an expression platform whose secondary structure signals the regulatory response. Embedded within the aptamer domain is the switching sequence, a sequence shared between the aptamer domain and the expression platform (Garst & Batey, 2009). The aptamer domain is part of the RNA and forms precise three-dimensional structures. It is considered a structured nucleotide pocket belonging to the riboswitch, in the 5'-UTR, which when bound regulates downstream gene expression (Isaacs et al., 2006). Aptamers specifically recognize their corresponding target molecule, the ligand (such as dyes, biomarkers, proteins, peptides, aromatic small molecules, antibiotics and other biomolecules), with the appropriate affinity within the complex group of other metabolites.
Both the nucleotide sequence and the secondary structure of each aptamer remain highly conserved (Winkler & Breaker, 2005). Therefore, aptamer domains are the operators of the riboswitches.
A strategy for finding new aptamer sequences is the use of SELEX (Systematic Evolution of Ligands by Exponential Enrichment). SELEX is a combinatorial chemistry technique for producing oligonucleotides of either single-stranded DNA or RNA that specifically bind to one or more target ligands (Stoltenburg et al., 2007). The process begins with the synthesis of a very large oligonucleotide library consisting of randomly generated sequences of fixed length flanked by constant 5' and 3' ends that serve as primer-binding sites. The sequences in the library are exposed to the target ligand and those that do not bind the target are removed, usually by affinity chromatography. The bound sequences are eluted and amplified by PCR to prepare for subsequent rounds of selection in which the stringency of the elution conditions is increased to identify the tightest-binding sequences (Stoltenburg et al., 2007). SELEX has been used to evolve aptamers of extremely high binding affinity to a variety of target ligands. Clinical uses of the technique are suggested by aptamers that bind tumor markers (Ferreira et al., 2006). The aptamer sequence must then be placed near the RBS of the reporter gene, and inserted into
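The selection rounds described above can be sketched as a deterministic toy simulation in Python. Everything specific here is an assumption made for illustration: the primer sequences, the `binding_score` function (which rewards matches to a hypothetical motif rather than modelling real binding), and the rule that stringency equals the round number. Partitioning is modelled by weighting each sequence by its score raised to the stringency, and PCR amplification by renormalizing abundances.

```python
import random

FWD_PRIMER = "GGGAGA"  # hypothetical constant 5' region
REV_PRIMER = "TCTCCC"  # hypothetical constant 3' region


def make_library(n, length, rng):
    """Random fixed-length cores flanked by the constant primer regions."""
    return [
        FWD_PRIMER
        + "".join(rng.choice("ACGT") for _ in range(length))
        + REV_PRIMER
        for _ in range(n)
    ]


def binding_score(seq, motif):
    """Toy affinity in (0, 1] (assumption): share of core positions matching
    `motif`; the random core is assumed to have the same length as `motif`."""
    core = seq[len(FWD_PRIMER):len(seq) - len(REV_PRIMER)]
    matches = sum(a == b for a, b in zip(core, motif))
    return (matches + 1) / (len(motif) + 1)


def mean_score(abundance, motif):
    """Abundance-weighted mean binding score of the pool."""
    return sum(a * binding_score(s, motif) for s, a in abundance.items())


def selex(library, motif, rounds=5):
    """Return final relative abundances and the mean score before round 1
    and after every round."""
    abundance = {}
    for seq in library:
        abundance[seq] = abundance.get(seq, 0.0) + 1.0 / len(library)
    means = [mean_score(abundance, motif)]
    for stringency in range(1, rounds + 1):
        # Partition: weak binders are depleted more strongly as stringency rises.
        retained = {s: a * binding_score(s, motif) ** stringency
                    for s, a in abundance.items()}
        # Amplification: restore the pool to a total abundance of 1.
        total = sum(retained.values())
        abundance = {s: a / total for s, a in retained.items()}
        means.append(mean_score(abundance, motif))
    return abundance, means


if __name__ == "__main__":
    rng = random.Random(0)
    pool, means = selex(make_library(500, 8, rng), "ACGTACGT")
    print([round(m, 3) for m in means])
```

In this model the abundance-weighted mean score cannot fall from one round to the next, which mirrors the enrichment of tight binders; a stochastic version that samples bound molecules would be closer to a real experiment but would no longer be reproducible without fixing a seed for every draw.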
Synthetic riboswitches represent a powerful tool for the design of biological sensors that can, for example, detect cancer cells, or the microenvironment of a tumor, and in the presence of a given molecule perform a desired function, like the expression
5.2. Bacteria as anti-cancer agents
Bacteria possess unique features that make them powerful candidates for treating cancer in ways that are unattainable by conventional methods. The moderate success of conventional methods, such as chemotherapy and radiation, is related to their toxicity to normal tissue and inability to destroy all cancer cells. Many bacteria have been reported to specifically target tumors, actively penetrate tissue, be easily detected and/or induce a controlled cytotoxicity. The possibility of engineering interactions between programmed bacteria and mammalian cells opens the way to unforeseen progress in the medical field. Emerging applications include the design of bacteria to produce therapeutic agents (
Ideally, an engineered bacterium for cancer therapy would specifically target tumors enabling the use of more toxic molecules without systemic effects; be self-propelled enabling its penetration into tumor regions that are inaccessible to passive therapies; be responsive to external signals enabling the precise control of location and timing of cytotoxicity; be able to sense the local environment allowing the development of responsive therapies that can make decisions about where and when drugs are administered; and be externally detectable, thus providing information about the state of the tumor, the success of localization and the efficacy of treatment (Forbes, 2010). Indeed some of these features naturally exist in some bacteria, e.g. many genera of bacteria have been shown to preferentially accumulate in tumors, including
Ultrasound is one of the techniques often used to treat solid tumors (e.g. breast cancer); however, this technique is not always successful, as sometimes it just heats the tumor without destroying it. Therefore, we are currently engineering the heat shock response machinery from
5.3. Alternative nanosized drug carriers
The design of novel tumor-targeted multifunctional particles is another extremely interesting and innovative approach that makes use of synthetic biology principles. The modest success of the traditional strategies for cancer treatment has driven research towards the development of new approaches underpinned by mechanistic understanding of cancer progression and targeted delivery of rational combination therapy.
5.3.1. Viral drug delivery systems
The use of viruses, in the form of vaccines, has been common practice ever since their first use to combat smallpox. Recently, genetic engineering has enlarged the applications of viruses, since it allows the removal of pathogen genes encoding virulence factors that are present in the virus coat. As a result, the engineered virus can elicit immunity without causing serious health effects in humans. In the light of gene therapy, the use of virus-based entities holds a promising future since, by nature, they are delivered to human target cells and can be easily manipulated genetically. As such, they may be applied to target and lyse specific cancer cells, delivering therapeutics
5.4. Microbial cell factories for the production of drugs
From a different perspective, as exemplified in section 4, synthetic biology approaches can be used for the large-scale production of compounds with pharmaceutical applications. One of the easily employable approaches to develop synthetic pathways is to combine genes from different organisms, and design a new set of metabolic pathways to produce various natural and unnatural products. The host organism provides precursors from its own metabolism, which are subsequently converted to the desired product through the expression of the heterologous genes (see section 4). Existing examples of synthetic metabolic networks make use of transcriptional and translational control elements to regulate the expression of enzymes that synthesize and break down metabolites. In these systems, metabolite concentration acts as an input for other control elements (Andrianantoandro et al., 2006). An entire metabolic pathway from
Many polyketides and nonribosomal peptides are being used as antibiotic, anti-tumor and immunosuppressant drugs (Neumann & Neumann-Staubitz, 2010). In order to produce them in heterologous hosts, assembly of all the necessary genes that make up the synthetic pathways is essential. The metabolic systems for the synthesis of polyketides are composed of multiple modules, in which an individual module consists of either a polyketide synthase or a nonribosomal peptide synthetase. Each module has a specific set of catalytic domains, which ultimately determine the structure of the metabolic product and thus its function. Recently, Bumpus et al. (2009) presented a proteomic strategy to identify new gene clusters for the production of polyketides and nonribosomal peptides, and their biosynthetic pathways, by adapting mass-spectrometry-based proteomics. This approach allowed identification of genes that are used in the production of the target product in a species for which a complete genome sequence is not available. Such newly identified pathways can then be copied into a new host strain that is more suitable for producing polyketides and nonribosomal peptides at an industrial scale. This exemplifies that the sources of new pathways are not limited to species with fully sequenced genomes.
The use of synthetic biology approaches in the field of metabolic engineering opens enormous possibilities, especially toward the production of new drugs for cancer treatment. Our goal is to design and model a new biosynthetic pathway for the production of natural drugs in
Despite all the scientific advances that humankind has seen over the last centuries, there are still no clear and defined solutions to diagnose and treat cancer. In this sense, the search for innovative and efficient solutions continues to drive research and investment in this field. Synthetic biology uses engineering principles to create, in a rational and systematic way, functional systems based on the molecular machines and regulatory circuits of living organisms, or to re-design and fabricate existing biological systems. Bioinformatics and newly developed computational tools play a key role in the improvement of such systems. Elucidation of disease mechanisms, identification of potential targets and biomarkers, design of biological elements for recognition and targeting of cancer cells, and discovery of new chemotherapeutics or design of novel drugs and catalysts are some of the promises of synthetic biology. Recent achievements are thrilling and promising; yet some of these innovative solutions are still far from real application due to technical challenges and ethical issues. Nevertheless, many scientific efforts are being conducted to overcome these limitations, and it is expected that synthetic biology, together with sophisticated computational tools, will pave the way to revolutionize the cancer field.