Proposal for a Minimal DNA Auto-Replicative System


Introduction
DNA replication allows cell division and population growth of living organisms. Here we will focus on DNA replication in prokaryotic single-celled microorganisms. Several excellent reviews of the molecular processes that carry out DNA replication in bacteria already exist, E. coli being the model described in most detail (Langston et al., 2009; Quiñones-Valles et al., 2011). Briefly, the process begins when DnaA (the replication initiator protein) in its activated form (DnaA-ATP) recognizes and binds oriC (the origin of replication on the bacterial chromosome). In the following step, the replisome is assembled and binds to the DnaA-ATP complex at oriC. Next, the DNA strands are separated and synthesis of the complementary strands begins, followed by the elongation steps. The molecular mechanisms of elongation differ depending on the strand used as a template: the leading strand is replicated continuously starting from a single RNA primer, whereas on the lagging strand DNA polymerase III must recognize several RNA primers, previously synthesized by DnaG, and then replicate each DNA fragment (Okazaki fragments). This is followed by the replacement of the RNA primers by DNA polymerase I and the sealing of nicks by a DNA ligase. The whole process concludes when the replisomes reach the ter sites, almost opposite oriC on the circular DNA molecule. Tus proteins are bound to the ter sites, and when the replisomes reach these complexes they collide with them and are finally disassembled (see Figure 1 for an overview of the whole process).
From another perspective, one of the more challenging areas of Synthetic Biology is the design and construction of minimal cells. The accomplishment of this aim might contribute to answering basic questions about the minimal components necessary to sustain living systems, as well as questions about cell self-organization, function and evolution. As a practical application, minimal cells can be used as a background chassis for the generation of dedicated biological systems designed for the synthesis or degradation of diverse compounds of interest.
In recent years, the essential properties and capabilities necessary to develop minimal cells have been widely discussed (MacDonald et al., 2011). Among these characteristics, it is evident that DNA replication should be a fundamental property of these biosystems. Many genes for DNA replication are found to be conserved when comparative analyses of bacterial genomes are carried out. These genes are considered informational genes, in charge of maintaining the genetic information, and are among the genes least frequently transferred horizontally (Jain et al., 1999). Therefore, by genomic comparisons and functional analyses it is possible to propose a minimal core of genes capable of supporting the process of DNA replication.
From a genetic point of view, and for the purpose of this study it is important to state our definition of a minimal DNA auto-replicative system (MiDARS) as: a genetic system comprising the minimum number of DNA components, including regulatory elements and gene products necessary for the auto-replication of the DNA molecule on which they are encoded, functioning in an in vitro condition.
In this chapter we develop a proposal for the construction of such an auto-replicative DNA system. This system is designed to serve as a scaffold for the incorporation of additional biological functions such as transcription and translation. For the scaffold design we use information on genes necessary for replication in E. coli that are highly conserved in bacteria with extremely reduced genomes, and analyze their functional role in DNA replication, in order to propose a minimal genetic system with a DNA auto-replicative function.

Minimal cells and minimal genetic systems
A minimal cell can be defined as a biological system that has the minimal number of genetic parts and molecular components for supporting life functions under defined growth conditions. In other words, it includes only the necessary number of genes and derived biomolecular machinery that are considered basic to support life functions (Jewett and Forster, 2010).
The concept of life is intrinsically complex; in biochemical terms it could be defined by three basic characteristics (Luisi et al., 2006):
1. auto-regulation of metabolism,
2. auto-replication of the genetic material, and
3. controlled evolution of their components and functions.
The design and synthesis of minimal cells depend on the environmental conditions to which the systems will be exposed. Initially, we might consider that a minimal cell should be exposed to the most favorable conditions in order to facilitate its design and function. These favorable conditions require an environment in which the cell does not suffer any kind of environmental stress. Nonetheless, even in this ideal scenario the rational design of the components of a minimal cellular system is challenging, since the genes for many cellular functions are not yet fully defined. What we can do is start by reconstructing minimal biological functions that are reasonably well defined. These might be the processes of the central dogma of molecular biology: DNA replication, transcription and mRNA translation (Figure 2). Some of these functions have been the object of different studies; e.g., transcription and translation were successfully recreated in the experiment of Asahara & Chong (2010) by separately expressing the components of the E. coli RNA polymerase, including the sigma70 factor, and reconstituting the function of the complete enzyme in vitro.
Since one of the fundamental characteristics of living systems is the replication of their own genetic material, we can consider the design of minimal genetic systems that sustain DNA auto-replication as an important starting point.

Approaches for the development of minimal genetic systems
Currently there are two approaches for the study of minimal biological systems: the top down and the bottom up strategies (Delaye & Moya, 2009; Murtas, 2009). The top down approach analyzes existing biological systems and, following a reductionist logic, seeks to minimize the number of components, either by searching for conserved genetic elements or by experimentally reducing the genome without losing functionality. This strategy was used to reduce the E. coli genome by 15% by deleting nonessential genes, recombinogenic and mobile DNA elements, and cryptic genes. The resulting cells had good growth profiles and showed improved performance for protein production (Pósfai et al., 2006). Another focus of this approach is to carry out comparative genomics and define a set of conserved genes, such as those in charge of specific functions (Gil et al., 2004; Forster & Church, 2006).
On the other hand, the bottom up approach involves the construction of complex systems starting from relatively simple molecular precursors. A classical example is the experiment of Miller, who obtained amino acids from a mixture of simple organic and inorganic molecules (Miller, 1953).
Considering the design and construction of minimal genetic systems, benefits should be obtained by employing both complementary top down and bottom up approaches.

Escherichia coli as a model organism for the design of a minimal DNA auto-replicative system
Escherichia coli is a Gram-negative, rod-shaped, facultatively anaerobic and non-sporulating organism. It was discovered in 1885 by the physician Theodor Escherich and is now classified as part of the family Enterobacteriaceae within the Gammaproteobacteria (Blattner et al., 1997). This bacterium lives in the intestine of mammals, where it assists its hosts with the assimilation of nutrients, provides some vitamins and helps prevent the establishment of bacterial pathogens. Since its discovery, E. coli has been widely used as a working model in the laboratory to study biochemistry and diverse molecular processes. In addition, it has been widely used in biotechnology as a vehicle for the expression of multiple recombinant proteins and whole metabolic pathways.
Arthur Kornberg was one of the most prominent investigators in molecular biology and a pioneer in the description of the replicative process using E. coli as a model. For his accomplishments in the field he was awarded the Nobel Prize in Physiology or Medicine in 1959. He discovered DNA polymerase I (Bessman et al., 1958; Lehman et al., 1958a) and described the synthesis of DNA as a process based on the use of a single strand of DNA as a template (Lehman et al., 1958b). Later, Kornberg and his collaborators discovered additional enzymes involved in DNA replication: DNA primase, DNA helicase, DnaA and PriA, among others. Nowadays, the replication process and the replicative enzymes of E. coli are the best understood and characterized of any organism.
From a biotechnological standpoint, E. coli shows three important characteristics that make it an ideal organism to serve as the platform for the design of a synthetic cellular program (Foley & Shuler, 2010):
1. functionally it is the organism best characterized at the molecular and biochemical levels in terms of components of metabolism,
2. it has proven to be a robust vehicle for the expression of multiple biotechnological processes,
3. it has a short growth cycle and is easy to manipulate genetically.
Additionally, the genome of E. coli serves as the principal source of standardized genetic parts for the construction of genetic circuits, the "BioBricks", in a project whose aim is to standardize genetic parts to facilitate biological engineering (http://partsregistry.org; Smolke, 2009). Most BioBricks are designed to function in E. coli; therefore, we consider E. coli the best organism of choice for the design of a DNA auto-replicative system.

Comparison of the DNA replicative machinery of E. coli with that of bacteria with reduced genomes
Comparative genomics is a powerful approach that allows the identification of genetic sequences sharing identity/similarity among different organisms. Through these comparisons it is possible to identify conserved genes and predict the components of the replicative machinery in several different organisms.
For our purpose, among the organisms of interest to consider in our design are those with extremely reduced genomes. A characteristic of these organisms is that they are incapable of growth in a free-living manner. The genomes of organisms with these characteristics correspond to those having the minimum number of genes possible in nature. From these we chose the 25 organisms with the most reduced genomes known to date (Table 1). All of these genomes contain less than 1,200 kbp of DNA and all are endosymbiotic bacteria, most of which are thought to survive at the expense of the host.
In these organisms, we searched for genes encoding enzymes involved in DNA replicative functions with orthology to the replicative machinery of E. coli (Table 2). To find orthologous genes we followed two complementary strategies: we looked for Clusters of Orthologous Groups (COGs; Tatusov et al., 2003) [...] had only 5, 3 and 5 genes related to replication, respectively, that were orthologous to those of E. coli. These three organisms are strict endosymbionts of insects, with the smallest genomes known to date (Table 2). The fact that these bacteria showed fewer genes related to DNA replication than bacteria with larger genomes (Figure 3) indicates that the minimal replicative machinery in these organisms might be composed of a small number of constituents. This observation raises many open questions, for instance:

i. Are these genes sufficient to sustain the process of replication of an entire chromosome?
ii. Does the host supply the missing elements for replication of the endosymbiont's DNA?
iii. Do these organisms use additional proteins in comparison to those currently described for the process of DNA replication?
The apparent requirement of only a handful of genes for DNA replication in extremely reduced genomes, compared with the 228 annotated in E. coli, might suggest a parsimonious mechanism of DNA replication in endosymbiotic bacteria, since they always live in stable environments. The genes that are most highly conserved in both reduced genomes and E. coli are those whose products form the replisome (dnaE, dnaB, dnaN, dnaG, dnaX, dnaQ, ssb, holA and holB), the genes encoding DNA topoisomerase type II (gyrA and gyrB), and the gene for the NAD(+)-dependent DNA ligase, ligA (Figure 4).
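The conservation analysis described above amounts to a presence/absence count of E. coli orthologs across genomes. The following Python sketch illustrates the idea under stated assumptions: the genome labels and gene sets are invented placeholders (they are not the data of Table 2), and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical presence/absence data: placeholder genome labels mapped to the
# replication genes that have an E. coli ortholog. NOT the values of Table 2.
orthologs_by_genome = {
    "genome_A": {"dnaE", "dnaB", "dnaN", "dnaQ", "dnaX"},
    "genome_B": {"dnaE", "dnaB", "dnaN"},
    "genome_C": {"dnaE", "dnaB", "dnaG", "ssb", "ligA"},
}


def conservation_counts(orthologs):
    """Return, for each gene, the number of genomes in which it is present."""
    counts = Counter()
    for genes in orthologs.values():
        counts.update(genes)
    return counts


def conserved_core(orthologs, min_fraction=1.0):
    """Return genes present in at least min_fraction of the genomes, sorted by name."""
    counts = conservation_counts(orthologs)
    threshold = min_fraction * len(orthologs)
    return sorted(gene for gene, n in counts.items() if n >= threshold)


counts = conservation_counts(orthologs_by_genome)
core = conserved_core(orthologs_by_genome)           # genes found in every genome
relaxed = conserved_core(orthologs_by_genome, 0.5)   # genes found in at least half
print(core, relaxed)
```

Lowering `min_fraction` relaxes the criterion from strictly universal genes to genes that are merely frequent, which is one simple way to explore how the candidate core changes with the stringency of the conservation threshold.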

Components of a Minimal DNA Auto-Replicative System (MiDARS)
Of the three organisms with the most reduced genomes in nature, Carsonella ruddii is the most closely related phylogenetically to E. coli (Nakabachi et al., 2006). For this reason, in our design we used the information on the replicative machinery of Carsonella ruddii and the functions known in E. coli. For the physical construction of the systems, however, we will use genes from E. coli for two main reasons:
1. it is difficult to obtain genomic DNA from Carsonella ruddii since it cannot be cultured in vitro.
2. E. coli is the best chassis for applications in synthetic biology, as mentioned above, and is therefore adequate for the incorporation of additional functions.
For the design of the minimal DNA auto-replicative system we will attempt to include the minimal elements present in Carsonella and, in a conservative manner, those we presume to be necessary to perform the process of DNA replication in E. coli. In addition to the coding sequences, it is also necessary to define the regulatory regions of the genes, and we propose to conserve the operator regions as defined for E. coli, with the future aim of expanding the minimal functions of E. coli to include the regulatory functions. Other important regions to include in the design are the DNA replication origin (oriC) and the signals for termination of replication. Below we propose the genetic components that would constitute a MiDARS.

The DNA initiator protein (dnaA)
At the beginning of the replication process, check-point proteins have to recognize and unfold the initiation site for replication at oriC. In E. coli, DnaA is the principal protein employed for this purpose and is highly conserved among bacteria with reduced genomes. Therefore, dnaA should be present in the MiDARS.

The DNA helicase (dnaB)
The next candidate gene is dnaB, which encodes a DNA helicase. The role of the product of this gene is to unwind the DNA strands, a very important process during the elongation stage of replication.

The DNA primase (dnaG)
The gene that encodes the primase (dnaG) should also be considered. It is important for the synthesis of the RNA primers that permit the elongation of new DNA strands.

The single strand stabilization protein (ssb)
Another important function is the stabilization of single strands, carried out by the SSB protein, encoded by the ssb gene.

The core components of DNA polymerase III (dnaE and dnaQ)
The gene for the α subunit (dnaE) of DNA polymerase III is present in all 25 organisms with reduced genomes and the gene for the ε subunit (dnaQ) in twenty-one. These proteins form part of the core of DNA polymerase III, which carries out the essential polymerization and proofreading activities during DNA synthesis.

The clamp components (dnaX, holA, holB, dnaN)
During the elongation stage, two very important structures are formed: the clamp loader and the sliding clamp. The first anchors DNA polymerase III to the DNA helicase (Reyes-Lamothe et al., 2010), allowing DNA synthesis to proceed in a synchronized manner on the leading and lagging strands. It is composed of the following subunits (genes): τ (dnaX), γ (dnaX), δ (holA) and δ' (holB). The circular sliding clamp consists of two β subunits (both products of the dnaN gene) that recognize and bind to DNA-RNA hybrids (Georgescu et al., 2010). The sliding clamp assists the core of DNA Pol III in binding the lagging strand and allows the extension of the Okazaki fragments.

The DNA ligase (ligA)
The function of a ligase is needed for sealing nicks formed when the RNA primers are removed and replaced by DNA in the Okazaki fragments on the lagging strand.

Type II DNA topoisomerase (gyrA and gyrB)
We consider that a system to relax the torsional stress produced by the DNA helicase may be necessary. This could be provided by the DNA gyrase complex (type II topoisomerase), composed of the A (gyrA) and B (gyrB) subunits.

Protein for termination of replication (tus)
Although there are several proteins that could contribute to termination of DNA replication, we think that in a minimal system the action of Tus could be enough to ensure this.

Origin of DNA replication (oriC)
This DNA sequence of around 245 bp in E. coli (Tabata et al., 1983) is needed to enable the DnaA protein to initiate the process of DNA replication.

Termination of DNA replication (terB and terC)
These sequences are used by the Tus proteins to form the trap which terminates DNA replication.
The proposed elements that constitute the auto-replicative system are also listed in Table 3. This proposal is somewhat similar to previous reports, where genes that could constitute a minimal cell based on a comparative genomics study among various endosymbionts are described (Gil et al., 2004). In the present study, however, we also considered the inclusion of the DNA regions for initiation and termination of replication, as well as the dnaA, ssb and tus genes.
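As a compact summary, the components named in the subsections above can be written as a simple inventory. The Python sketch below is only an illustration: the gene and site names come from the text, whereas the grouping labels and helper functions are hypothetical. The resulting totals (14 genes plus 3 non-coding sites) agree with the 17 DNA elements stated in the Conclusions.

```python
# Components proposed in the text, grouped by role. Gene and site names are
# taken from the subsection headings; the grouping labels are illustrative.
MIDARS_COMPONENTS = {
    "initiator protein": ["dnaA"],
    "helicase": ["dnaB"],
    "primase": ["dnaG"],
    "single-strand binding protein": ["ssb"],
    "Pol III core": ["dnaE", "dnaQ"],
    "clamp components": ["dnaX", "holA", "holB", "dnaN"],
    "ligase": ["ligA"],
    "type II topoisomerase": ["gyrA", "gyrB"],
    "termination protein": ["tus"],
    "origin of replication": ["oriC"],
    "termination sites": ["terB", "terC"],
}

# DNA sites that are part of the design but do not encode a protein.
NON_CODING_SITES = {"oriC", "terB", "terC"}


def all_elements(components):
    """Flatten the grouped inventory into a single list of DNA elements."""
    return [name for group in components.values() for name in group]


def coding_genes(components):
    """Return only the protein-coding genes of the inventory."""
    return [name for name in all_elements(components) if name not in NON_CODING_SITES]


elements = all_elements(MIDARS_COMPONENTS)
genes = coding_genes(MIDARS_COMPONENTS)
print(len(elements), "DNA elements, of which", len(genes), "are genes")
```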

Expression of the replicative proteins of the MiDARS
A primary condition for the operation of an auto-replicative system is that the protein machinery encoded in it should be expressed. For transcription of the assembled group of genes, we propose to use the E. coli RNA polymerase and its transcription factor sigma70, since all the genes of the system have a sigma70-dependent promoter. The essential components of the RNA polymerase and its sigma70 factor have previously been expressed separately and their activity reconstituted, as mentioned above (Asahara & Chong, 2010). We propose that these components can be assembled as an additional functional module whose activity can be assayed separately and subsequently integrated into the system. The resulting mRNA (16) could be translated in an in vitro system such as the Pure System (Ueda et al., 1992; Shimizu & Ueda, 2010), which contains ribosomes, aminoacyl-tRNAs, chaperones, and initiation, elongation and termination factors, among other elements essential for translation. Once protein synthesis is completed, the products could initiate replication of the DNA molecule, for which the addition of deoxynucleotide triphosphates (dNTPs) and the appropriate buffers will be necessary. The source of energy for the system will be creatine phosphate, with the enzyme creatine kinase as the regenerator (Shimizu et al., 2006). An outline for the operation of the DNA auto-replicative system is shown in Figure 5.
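The staged operation described above (transcription, translation, replication, with an energy regeneration system) can be summarized as a checklist of what each stage requires. The following Python sketch is a bookkeeping illustration only, not a protocol: the reagent labels are plain strings chosen for this example, the assignment of reagents to stages is a simplification of the text, and the function name is hypothetical.

```python
# Requirements per stage, as listed in the text. The reagent labels are plain
# strings chosen for this illustration, not catalogue names.
REQUIREMENTS = {
    "transcription": {"MiDARS DNA", "RNA polymerase", "sigma70"},
    "translation": {"ribosomes", "aminoacyl-tRNAs", "translation factors"},
    "replication": {"dNTPs", "buffer"},
    "energy regeneration": {"creatine phosphate", "creatine kinase"},
}


def missing_by_stage(available):
    """Map each stage to the sorted list of required items absent from `available`."""
    have = set(available)
    report = {}
    for stage, required in REQUIREMENTS.items():
        missing = sorted(required - have)
        if missing:
            report[stage] = missing
    return report


# Example mix that contains everything except dNTPs: only the replication
# stage is reported as incomplete.
complete_mix = set().union(*REQUIREMENTS.values())
mix = complete_mix - {"dNTPs"}
print(missing_by_stage(mix))
```

A check of this kind is useful mainly as a reminder that the modules are meant to be assayed separately before integration: an empty report for one stage says nothing about whether the other stages are ready.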

Perspectives
Previous efforts have been made to propose the design of minimal cells; however, this objective is still far from being accomplished. From the standpoint of Synthetic Biology, biological systems that are robust, predictable in performance and highly efficient are desired (Jewett & Forster, 2010). In this work, we present a proposal to build an auto-replicative DNA system as the first step toward the development of synthetic biosystems. Additional cellular processes will need to be designed and constructed in a modular way, including transcriptional and translational functions and a minimal metabolism, in order to maintain cell growth and produce energy.
Once this first prototype has been constructed and tested for performance, some further reduced combinations of the proposed number of genes could be tested to determine the absolute minimum set of genes sufficient to sustain DNA auto-replication; e.g. the few genes present in Carsonella ruddii PV.
The system proposed in this work can be assembled using methodologies such as those used when working with BioBricks (Smolke, 2009). Once the mini-chromosome is assembled, it could function in cell-free systems, in anucleate mini-cells (Adler et al., 1967), in spores that lack DNA (Siccardi et al., 1975), in micelles or lipid vesicles, and in some commercial systems. An important achievement in this sense has previously been reported by another research group, namely DNA replication achieved by using the Phi29 DNA polymerase inside a lipid vesicle. In this report only one strand was linearly replicated and circularized (Kurihara, 2011).
The successful development of a DNA auto-replicative system as proposed here could provide a very important platform for the development of synthetic biology, and the potential of such a system is great: 1. in the refinement of biotechnological processes, since cellular energy could be directed to the desired biosynthetic pathways; 2. in the study of synthetic or natural circuits at a higher resolution and sharpness, due to the minimization of cellular noise; and 3. in testing some evolutionary hypotheses, such as the proposed components of last common ancestors and of rudimentary first cells, among others.

Conclusions
Here we propose a design for the construction of a minimal genetic system for DNA auto-replication. This proposal is based on the latest knowledge of the mechanisms and controls of DNA replication in E. coli and takes into account the conservation of the replicative machinery in bacteria with extremely reduced genomes, particularly those elements present in Carsonella ruddii PV.
The proposed auto-replicative device consists of 17 DNA elements (27822 bp including their operator regions) taken from the E. coli genome and incorporating the most conserved elements of the replicative machinery found in bacteria with extremely reduced genomes. These genetic elements will maintain their native operator and termination regions. They encode proteins encompassing the minimal number of predicted activities involved in DNA replication. Finally, we propose some conditions in which the system might function.

Figure 1. Main steps of DNA replication in bacteria. a) Initiation of DNA replication; the datA locus has a high affinity for binding DnaA (1). DnaA binds to ATP, and homo-multimers of DnaA-ATP are formed (2). These homo-multimers bind to oriC, and once replication is initiated SeqA binds this region and prevents initiation of a new replication event (3). The SSB (single strand binding) protein and DnaB assist the complex to open the DNA strands and release DnaC (4). A DNA topoisomerase helps to further unfold the DNA strands (5). b) The elongation phase; the replication fork is formed and the replisome is assembled (6). DNA polymerase III replicates the leading strand (7). DnaG synthesizes RNA primers for replication of the lagging strand (8). Polymerase III can now replicate Okazaki fragments on the lagging strand (9). DNA polymerase I replaces RNA nucleotides with DNA nucleotides (10). A DNA ligase (LigA) seals the nicks between contiguous DNA fragments (11). c) Termination of DNA replication; the protein Tus binds to the ter sites, and when replisomes reach Tus, replication ceases (12). The recombinases XerC and XerD resolve the replicated DNA strands (13). Finally, FtsK translocates the DNA strands and each double-stranded DNA molecule can be liberated (14).

Figure 2. Representation of a hypothetical minimal auto-replicative system. One of the key features of the minimal cell is that it should perform basic functions such as transcription, translation and replication of the genetic information contained in its genome.

Figure 3. Conservation of DNA replicative machinery in bacteria with reduced genomes. Graph showing the relationship between the number of genes annotated with DNA replicative functions and genome size.

Figure 4. Conservation of genes for DNA replication in 25 reduced genomes and in E. coli. Grey bars represent the genes proposed to be essential for DNA auto-replication in a minimal genetic system.

Figure 5. Proposal for the minimal components of a MiDARS and their function. a) The genetic system is a simplified version of a prokaryotic DNA mini-chromosome (25432 bp). The system contains the initiation region (oriC) and termination (ter) sites for DNA replication as well as a set of genes from E. coli (Table 4). The genes can be organized in the same order as in the native chromosome and contain their native operator regions to control expression. b) Transcription and c) translation can be carried out in solution using commercial kits (e.g., Pure System), RNA polymerase and the E. coli sigma70 factor. d) The initiation of replication is regulated by DnaA-ATP, and the helicase will join the lagging strand in order to form the replication forks. The primase will bind to the helicase to carry out the synthesis of RNA primers that permit the activity of DNA Pol III. The SSB protein stabilizes single strands of DNA. e) Two core subunits (α and ε) of DNA Pol III perform the elongation and proofreading of DNA. The DNA ligase and DNA Pol I replace the RNA primers, sealing the nicks between contiguous DNA fragments on the lagging strand. Topoisomerase II will relax the DNA template as the replication fork progresses. f) The Tus protein bound to the ter sites serves as a trap for the replicative machinery headed by the DNA helicase, stopping its movement and promoting the separation of the new MiDARS.

Table 2. Conservation of replicative genetic machinery in bacteria with less than 1200 kbp

Table 3. Components of a minimal DNA auto-replicative system.