The long association and intense competition between bacteria and their viruses have created a fertile ground for evolution to develop numerous tools for DNA modification, assembly and degradation. Many of these tools underpin the past 50 years of molecular biology, and others show great potential in shaping the next 50 years of the field. Here, I present some of the tools that have come out of the bacteria-bacteriophage arms race and discuss some of the concepts that may shape their future use. Molecular biology remains a fast-growing area increasingly limited solely by researcher ingenuity.
- molecular biology tools
- DNA modifications
The relationship between bacteriophages and bacteria is often explained in terms of an arms race: each ‘developing’ measures and countermeasures for attacking and defending itself from the other [1, 2]. The imagery of an arms race is a powerful metaphor to summarise the relationship between possibility and availability that has constrained the emergence, evolution and diversification of life on Earth.
Life is limited by what is possible. For instance, life can only exist because chemical information storage is possible. On the other hand, life as we know it has evolved around DNA and RNA because they are informational molecules that could function in the environment of the early Earth and whose building blocks were readily available.
The relationship between bacteria and bacteriophages is similarly constrained. The emergence (or availability) of bacterial cells capable of establishing a rich internal environment (compared to the outside of the cell) creates the possibility for other organisms, including bacteriophages, to evolve predatory or parasitic survival strategies. Once phages emerge, they alter the dynamics of the ecological niche and create an advantage for bacterial hosts that can reduce the success of phage infection, whether by hindering phage access to the cell cytoplasm, by interfering with phage survival or replication in the cell, or by interfering with phage maturation and release.
Bacterial defences can arise from functions already available in the host (e.g. uracil-DNA glycosylase involved in DNA repair) or from functions co-opted from genetic resources available in the cell or in the environment (e.g. restriction-modification systems). Defences can also emerge from loss of function (e.g. mutations to the maltose porin LamB in E. coli, which make it resistant to bacteriophage λ infection).
Given the prevalence of bacteria and phages in the environment, and given the evolutionary timescale of their arms race, the variety, complexity and efficiency of these attack and defence strategies are huge, ranging from silent integration into the host genome (i.e. lysogeny) to a hostile molecular takeover of the bacterial host cell. Despite current efforts to map the genetic diversity available on Earth, it remains likely that new strategies are still to be identified and characterised.
Nonetheless, many of these defence and attack strategies have also been harnessed for biotechnology applications, significantly beyond the simple use of bacteriophages (or bacteriophage proteins) as bacterial control agents [5, 6, 7]. Restriction-modification (RM) systems found in bacteria were among the very first tools isolated from the bacteria-phage arms race [8, 9]. They de facto represent the start of modern molecular biology, and they have remained key tools for over 50 years (Figure 1).
More recently, another tool derived from bacterial defence has been harnessed, with a potentially equally transformative impact on how we interact with biology: clustered regularly interspaced short palindromic repeats (CRISPR) [26, 27, 28]. CRISPR forms part of an adaptive immunity system in prokaryotes, but it is being harnessed to deliver a wide range of research and therapeutic tools.
Although RM systems and CRISPR are deservedly acknowledged as having a significant impact on molecular biology and biotechnology, many other tools have been or are being developed based on components isolated from the bacteria-phage arms race. This chapter focuses on some of those tools: their mechanisms and their current and potential applications.
2. Common molecular biology tools and orthogonality
Because of the wide range of bacteriophage infection strategies available, it is difficult to introduce a simple classification without recreating the complexity of approaches taken by phages. For instance, bacteriophage promoters can rely exclusively on host proteins (e.g. T4 early promoters), on a mixture of host and phage proteins (e.g. T4 middle promoters or the PL promoter from λ phage) or exclusively on phage-derived factors (e.g. T7 RNA polymerase promoters). This provides a continuum that can be further dissected by analysing the mechanism of the hybrid promoters, with their specific host and phage dependencies.
That continuum maps how independent a phage system is from the host while still active within the host, i.e. it is a measure of the orthogonality of the system. Having evolved to survive in a changing environment, bacteria have complex layers of gene expression regulation with multiple feedback systems which are not necessarily easy to control independently, despite our advances in understanding bacterial metabolism [29, 30]. In that context, phage systems that have reduced dependencies on the host machinery (i.e. increased orthogonality) provide isolated systems that can be simpler to regulate and are, at least in part, shielded from variations in the cellular machinery—an approach that has dominated biotechnology until recently.
Many of the common phage-derived biotechnology tools have been developed from such systems, none more so than T7 RNA polymerase. Isolated from T7 bacteriophage, this monomeric RNA polymerase recognises a specific promoter sequence. The core T7 promoter sequence (TAATACGACTCACTATAG) is sufficient to trigger transcription in cells harbouring a T7 RNA polymerase gene. This is the strategy behind pET vectors, which contain a T7 promoter and rely on an E. coli host carrying a T7 RNA polymerase gene under an inducible promoter (e.g. lacUV5), usually as the result of the introduction of a DE3 prophage [32, 33].
Nevertheless, the context of the T7 promoter can have a significant impact on the expression level of the downstream genes, and at high polymerase concentrations, it is possible to drive transcription from suboptimal promoters—highlighting that the orthogonality of the system is limited.
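As an illustration of how promoter specificity and its limits might be explored computationally, the following minimal Python sketch scans both strands of a sequence for the core T7 promoter quoted above and, optionally, for near matches that stand in for suboptimal promoters. The function names and the simple mismatch count are assumptions made for this example; mismatch counting is a crude proxy and not a model of polymerase binding.

```python
# Core T7 promoter sequence as quoted in the text.
T7_CORE = "TAATACGACTCACTATAG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_t7_promoters(seq: str, max_mismatches: int = 0):
    """List (start, strand, mismatches) for each window close to the core promoter.

    Hypothetical helper: `max_mismatches` > 0 reports near matches, used here
    only as a stand-in for suboptimal promoters.
    """
    seq = seq.upper()
    width = len(T7_CORE)
    hits = []
    for strand, target in (("+", T7_CORE), ("-", reverse_complement(T7_CORE))):
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return sorted(hits)
```

Calling `find_t7_promoters(sequence)` reports only exact matches, whereas raising `max_mismatches` widens the search to degenerate sites of the kind that may be transcribed at high polymerase concentrations.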
Given its monomeric structure and orthogonal role in cellular transcription, T7 RNA polymerase became not only a useful tool in biotechnology but also an important model system for the study of transcription. Because of its orthogonality, T7 RNA polymerase (and its promoter) can be harnessed for the regulation of transcription in a wide range of hosts beyond E. coli, including Gram-positive bacterial hosts, yeast [36, 37] and human cells [38, 39].
Its role in the regulation of transcription has also been expanded through the creation of more complex systems using split T7 RNA polymerase proteins. Surprisingly for a mesophilic, highly dynamic enzyme, T7 RNA polymerase can be expressed in two [40, 41] or more fragments that are able to reassemble in vivo and function as viable RNA polymerases.
While orthogonality can be a desirable feature for in vivo applications, it is wholly unnecessary for in vitro applications, where the key constraint lies in identifying reaction conditions in which expressed and purified proteins are sufficiently active to carry out the desired function. That is the case with T4 DNA ligase and T4 polynucleotide kinase, which were originally isolated from the E. coli phage T4 and remain important tools in molecular biology.
T4 DNA ligase has a central role in the replication and repair of the phage genome during its infection of E. coli. This also entails coping with DNA modifications that naturally occur in vivo, such as the full substitution of cytosine with 5-hydroxymethylcytosine and glucosyl-5-hydroxymethylcytosine [44, 45], a phage defence mechanism discussed below. Although its structure has only recently been determined, the mechanism of action of this enzyme has long been characterised. Even in the absence of other phage genes, it is active in vivo, but its main application in molecular biology remains its in vitro activity, which, coupled with restriction endonucleases, has underpinned modern molecular biology by allowing a molecular cut and paste approach to DNA assembly.
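The cut and paste logic can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a cloning simulator: it tracks only the top strand of linear molecules, uses EcoRI (recognition site GAATTC, cut after the first G) as an example enzyme chosen for this sketch, and treats ligation as plain concatenation of compatible ends. All function names are hypothetical.

```python
ECORI_SITE = "GAATTC"  # example enzyme chosen for this sketch
CUT_OFFSET = 1         # EcoRI cuts the top strand after the G: G^AATTC


def digest(seq: str, site: str = ECORI_SITE, offset: int = CUT_OFFSET):
    """Cut a linear top-strand sequence at every recognition site."""
    seq = seq.upper()
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + offset])
        start = pos + offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def ligate(left: str, right: str) -> str:
    """Join two fragments; a stand-in for ligase sealing compatible ends."""
    return left + right


def cut_and_paste(vector: str, donor: str) -> str:
    """Move the fragment flanked by two sites in `donor` into the single site of `vector`."""
    vector_parts = digest(vector)
    donor_parts = digest(donor)
    if len(vector_parts) != 2 or len(donor_parts) != 3:
        raise ValueError("expected one site in the vector and two in the donor")
    insert = donor_parts[1]
    return ligate(ligate(vector_parts[0], insert), vector_parts[1])
```

In this simplified picture the recognition site is regenerated at both junctions, which is why the product of `cut_and_paste` carries two sites flanking the insert.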
Ligase in vitro activity, particularly its ability to accept modified substrates, has been extensively explored for the assembly of heavily modified DNA sequences for aptamer selection [49, 50] and to explore a wider range of nucleic acid modifications, such as sugar-modified nucleic acids.
3. Second-generation tools and applications
The combination of different enzymatic functions has created novel applications, such as large-scale DNA assembly by Gibson assembly, which combines exonuclease, polymerase and ligase activities. However, a whole range of novel applications is possible by harnessing additional bacteriophage proteins that have not yet been extensively explored.
Recombinases and integrases, enzymes that catalyse the sequence-specific insertion of a phage genome into the host chromosome [53, 54], were identified early in bacteriophage research (e.g. the λ integrase) but were not immediately harnessed for applications. Applications came substantially later, once recombinases were shown to facilitate DNA assembly, whether by increasing the efficiency of subcloning, as in Gateway cloning, or by enabling the multipart assemblies needed for metabolic engineering.
In general, recombinases bind DNA specifically, as dimers, on recognition sites that are relatively short (usually between 30 and 50 bases) and partially palindromic, termed attP and attB (originally to distinguish phage and bacterial origins). These two sites have different sequences, which contributes to making the recombination process unidirectional. The recombinases facilitate the break and religation of double-stranded DNA within the att sites, resulting in chimeric sites that are then labelled attR and attL (from right and left sides, respectively). Insertion of a phage genome into the host is efficient and stable, but it can also be reversed with the contribution of a single host factor.
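The strand exchange described above can be sketched as follows. This is a minimal illustration assuming each att site can be written as two arms flanking a shared core where the crossover occurs; the `AttSite` class, the function names and any sequences passed to them are hypothetical and are not real att sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttSite:
    """An att site written as left arm + shared core + right arm (a simplification)."""
    left_arm: str
    core: str
    right_arm: str

    def sequence(self) -> str:
        return self.left_arm + self.core + self.right_arm


def recombine(att_b: AttSite, att_p: AttSite):
    """Exchange arms across the shared core, giving the chimeric attL and attR."""
    if att_b.core != att_p.core:
        raise ValueError("attB and attP must share the crossover core")
    att_l = AttSite(att_b.left_arm, att_b.core, att_p.right_arm)
    att_r = AttSite(att_p.left_arm, att_p.core, att_b.right_arm)
    return att_l, att_r


def integrate(host: str, phage: str, att_b: AttSite, att_p: AttSite) -> str:
    """Insert a circular phage genome (written from any start point) into a linear host."""
    host_parts = host.split(att_b.sequence(), 1)
    phage_parts = phage.split(att_p.sequence(), 1)
    if len(host_parts) != 2 or len(phage_parts) != 2:
        raise ValueError("attB or attP not found")
    att_l, att_r = recombine(att_b, att_p)
    # The circular phage genome is opened at attP and flanked by attL and attR.
    return (host_parts[0] + att_l.sequence() + phage_parts[1]
            + phage_parts[0] + att_r.sequence() + host_parts[1])
```

Because attL and attR differ from both attB and attP, the product no longer contains the substrates of the forward reaction, which is one way of picturing why integration is effectively unidirectional.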
Because of the high efficiency of integration, recombinases have also been developed as systems to facilitate site-specific recombination in higher eukaryotes, such as mammalian cells, Drosophila and plants. In the latter, recombinases were of particular interest because of their potential to remove transformation and selection markers from engineered crops, leaving behind only the genes responsible for the engineered trait. This idea of using recombinases to directly alter an organism’s genome has been vastly expanded in the Yeast 2.0 project, through the designed placement of multiple loxP sites (the equivalent of att sites for Cre recombinase) in the synthetic yeast genome. Activating recombination at these sites leads to large-scale genome rearrangement, termed synthetic chromosome rearrangement and modification by loxP-mediated evolution (SCRaMbLE).
By enabling controllable chromosome rearrangements between the designed loxP sites, a synthetic yeast can delete, duplicate and reorder many of its genes, allowing in vivo selection for desirable traits such as increased alkali tolerance. Alternatively, the system can be coupled to heterologously expressed genes, allowing the rapid optimisation of pathways.
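A toy model can make the scale of the resulting diversity easier to picture. The sketch below assumes a chromosome represented as a list of (gene, orientation) pairs with a recombination site at every boundary, and applies random deletions, inversions and duplications between pairs of sites. The representation, the function names and the equal weighting of the three outcomes are assumptions made for illustration and do not reflect measured SCRaMbLE event frequencies.

```python
import random


def delete(chrom, i, j):
    """Remove the segment between sites i and j."""
    return chrom[:i] + chrom[j:]


def invert(chrom, i, j):
    """Reverse the segment between sites i and j and flip each gene's orientation."""
    flipped = [(name, -strand) for name, strand in reversed(chrom[i:j])]
    return chrom[:i] + flipped + chrom[j:]


def duplicate(chrom, i, j):
    """Insert a second copy of the segment between sites i and j."""
    return chrom[:j] + chrom[i:j] + chrom[j:]


def scramble(chrom, n_events, rng):
    """Apply up to n_events random rearrangements between randomly chosen sites."""
    operations = (delete, invert, duplicate)
    current = list(chrom)
    for _ in range(n_events):
        if not current:  # everything deleted; nothing left to rearrange
            break
        i, j = sorted(rng.sample(range(len(current) + 1), 2))
        current = rng.choice(operations)(current, i, j)
    return current
```

Running `scramble` many times with different seeds and scoring each outcome with a fitness function of choice would mimic, very loosely, the selection step described above.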
Like SCRaMbLE, protein-directed evolution relies on cycles of mutagenesis (to introduce diversity into a population) and selection (to reduce diversity towards functional proteins), which in some platforms can be achieved continuously in vivo, e.g. in phage-assisted continuous evolution (PACE) or in some continuous culture approaches. In both examples, mutation can be controlled by stressors or by the induction of error-prone replication but is not necessarily limited to the area of interest (e.g. a single gene). Greater control of targeting is possible and has been reported through the use of an error-prone PolI, which is necessary for the replication of some bacterial plasmids and can be used to drive diversification in vivo in the vicinity of plasmid replication initiation, and through protein fusions that target an error-prone polymerase to a particular region of the genome (e.g. EvolvR and MutaT7).
One such targeted system, termed diversity-generating retroelements (DGRs), has nonetheless emerged naturally in bacteriophages. It was first implicated in the tropism switching of Bordetella phages [68, 69] and has since been identified in a wide range of bacterial and archaeal genomes [70, 71]. The system relies on an error-prone reverse transcriptase (RT) and on two DNA repeats: one operating as a template (template repeat) and the other as the target (variable repeat). RNA synthesised from the template is used to guide the error-prone synthesis of a DNA by the RT, which also coordinates its insertion at the target site. The error profile of DGR RTs results in adenines being replaced with other bases, creating a directionality to the evolution that is always anchored by the template repeat. Nevertheless, changes in the DNA sequences involved in the targeting of the retroelement, termed initiation of mutagenic homing (IMH), can lead to both template and variable repeats being allowed to evolve at different rates, removing some of the directionality in evolution and freeing the system to explore the sequence space more thoroughly.
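The adenine-specific, template-anchored nature of this diversification can be sketched in Python. The model below is a minimal illustration under stated assumptions: repeats are addressed by their start positions in a genome string, each adenine in the template copy is replaced by a randomly drawn base with a fixed probability, and the function names and the `a_mutation_rate` parameter are hypothetical rather than measured properties of any DGR.

```python
import random


def mutagenic_copy(template_repeat: str, a_mutation_rate: float, rng) -> str:
    """Copy the template repeat, randomising adenines only.

    A randomised adenine may by chance be replaced by A again; other bases
    are always copied faithfully.
    """
    copied = []
    for base in template_repeat.upper():
        if base == "A" and rng.random() < a_mutation_rate:
            copied.append(rng.choice("ACGT"))
        else:
            copied.append(base)
    return "".join(copied)


def mutagenic_homing(genome: str, tr_start: int, vr_start: int,
                     length: int, a_mutation_rate: float, rng) -> str:
    """Overwrite the variable repeat with a mutagenised copy of the template repeat."""
    template = genome[tr_start:tr_start + length]
    new_variable = mutagenic_copy(template, a_mutation_rate, rng)
    return genome[:vr_start] + new_variable + genome[vr_start + length:]
```

Because the template repeat is never modified, every round of homing restarts from the same sequence, and only positions that are adenines in the template can vary in the target.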
Despite its potential, it remains to be seen whether such a system can be harnessed for protein engineering. If DGR systems can themselves be engineered, their targeting and error rate may be amenable to modulation, opening possibilities to compete with the most recent Cas9-derived gene editors.
4. Xenobiotic nucleic acids
Chemical modification of the phage’s own genome is a widespread strategy that emerged multiple times in evolution to circumvent (or at least slow down) bacterial defence mechanisms that target the invading DNA for degradation: restriction endonucleases, exonucleases and CRISPR-Cas systems [44, 74]. Those modifications have been reported not only on the nucleobases, akin to eukaryotic epigenetic markers, but also on the nucleic acid backbone.
Reported nucleobase modifications suggest that the overwhelming majority of such modifications are targeted to the C5 position in pyrimidines. They range from small modifications, such as methylation, to bulky modifications, such as putrescine and even carbohydrate moieties. While such modifications have long been known, new sequencing platforms capable of reading the DNA sequence without an amplification step, such as nanopore sequencing, hold great promise in enabling a more systematic mapping and characterisation of DNA modifications in phage genomes (Figure 2).
DNA modifications, particularly those that bring chemical functionality not available in natural bases, such as glycosylation in Bacillus subtilis SP-15 phage, can be harnessed for function, as has been achieved through the chemical modification of DNA bases combined with systematic evolution of ligands by exponential enrichment (SELEX) [50, 77]. Despite characterisation of the biosynthetic pathways for multiple phage DNA modification systems, none has been implemented in vitro for applications.
One potential bottleneck lies in how the phage and the host handle those chemically modified DNAs. Upon infection, the mature phage DNA needs to be modified if that is an evolutionary strategy being exploited to slow down or avoid in vivo degradation. On the other hand, those chemical modifications can affect DNA structure and biophysical properties, which may also be detrimental to bacteriophage replication, since this would require a DNA polymerase capable of processing such heavily modified genomes.
It is known that at least in some cases, such as in SP-15, this chemical cloaking is removed upon infection before unmodified DNA is replicated in vivo. However, given the more permissive substrate specificity of viral polymerases, it is possible that some systems can be replicated directly by highly adapted phage polymerases, either to unmodified DNA, followed by reinstallation of the chemical modifications, or directly from modified DNA to modified DNA. In the case of SP-15, the bulkier modifications are rapidly removed prior to replication of DNA. T4 seems to follow a similar pattern, where glycosylation is removed and DNA is replicated containing only the simpler 5-hydroxymethylation modification. This is further supported by the biochemical evidence that glycosylation is ‘installed’ on the replicated T4 DNA [44, 74]. Nevertheless, early T4 transcription occurs rapidly, and it is carried out by the host RNA polymerase, suggesting that natural RNA polymerases can still use the hypermodified bases in that template.
Notably, although phage polymerases replicate phage DNA harbouring simple chemical modifications in vivo, such as 2-aminoadenine (in Synechocystis phage S-2L) and uracil (in Bacillus phages AR9, PBS1 and PBS2, Yersinia phage PhiR1-37 and Staphylococcus phage S6), no viral polymerase has been described that is capable of selectively incorporating the modified bases. That is, although some bacteriophages make use of modified nucleobases and have evolved systems that lead to 100% incorporation of the modified bases in their genomes, their DNA polymerases have not specialised towards incorporating only the modified nucleobases: they remain able to recognise unmodified triphosphates.
Still, the increased substrate flexibility of phage DNA polymerases may at least in part explain why the DNA polymerase of Bacillus subtilis phage Phi29 required a single mutation for the synthesis of anhydrohexitol nucleic acids (HNA), while an archaeal enzyme required in excess of seven mutations.
Bacteriophages remain a rich source of novel functionalities that can be harnessed to advance molecular biology (and synthetic biology). The examples provided here represent only a small fraction of the potential applications available, which also include medical applications from phage proteins [82, 83] and engineered phages [10, 11, 84].
In addition, bacteriophages have had a close relationship with directed evolution, either as a vehicle, such as in phage display [85, 86], or by providing (in addition to the examples above) proteins to accelerate strain engineering, such as in multiplex automated genome engineering (MAGE).
Finally, bacteriophages may also become an important tool in harnessing new non-model organisms in synthetic biology, as pre-optimised DNA delivery nanomachines for custom circuits.