PCR Advances Towards the Identification of Individual and Mixed Populations of Biotechnology Microbes

Introduction
Public health and safety, diagnostics and surveillance are aided by knowledge of the identity and genetic content of biotechnology microbes and their close relatives. Both types of information allow recognition and prediction of the virulence and pathogenicity of microbes. Since its first technical descriptions in the mid-1980s (Mullis et al., 1986; Mullis and Faloona, 1987), PCR has played an important role in identifying micro-organisms and in distinguishing pathogenic from non-pathogenic species. This DNA amplification technology generates large quantities of template, a prerequisite for cloning and for dideoxy DNA "Sanger" sequencing (Sanger et al., 1977). As such, PCR has been integral to first generation and phylogenetic marker sequencing projects (Bottger, 1989).
During the last decade, PCR has remained a cornerstone of microbial genetic characterization. Marker sequencing remains a component of the polyphasic characterization of microbial genomes, in which genetic, morphological and biochemical data are reconciled. At the same time, great progress has been made in single cell microbial genetics, and miniaturized PCR has been implemented in second generation sequencing platforms (Metzker, 2010). Collectively, these developments have increased the number of whole genome sequences obtained from individual cells of "unculturable" microorganisms and from outbreak strains such as the Shiga toxin-producing E. coli strain O104:H4 detected in Europe during 2011 (Mellmann et al., 2011). High throughput sequencing has also provided insights into natural and human environments and their mixed bacterial populations (Hamady and Knight, 2009; Mardis, 2011; Sapkota et al., 2010).
This chapter highlights PCR advances that have enabled microbial identification during the last decade. At the level of single species, identification can involve phylogenetic marker sequencing, or whole genome sequencing from individual cells or cultures. Mixed microbial populations may be sorted and individually identified by sequencing, or collectively sequenced using high throughput platforms. The potential and challenges of these new platforms, as well as their application to novel microbial strains that will be produced by synthetic biology approaches, will be discussed.

Current challenges in microbial identification
Collectively, microbes occupy a vast range of ecological niches and feature intrinsically diverse metabolic potential. Microbial biotechnology has enabled the screening and enhancement of strains for commercial applications such as the preservation and harvest of natural resources (bio-pesticides and bio-mining of metals), environmental remediation (improved soil/air/water quality) and sustainable development. Often, biotechnology microbes are used as single species, while other commercial products involve mixtures of a few or many different species and strains.
Bacterial strains that feature desirable phenotypic traits have traditionally been isolated by high-throughput screening, or have been improved by random mutagenesis followed by screening. Currently, consensus identification and classification of bacterial strains is carried out by a polyphasic approach, in which phenotypic data (biochemical tests, fatty acid composition), genotypic data and phylogenetic information, including sequences derived from PCR amplification of marker genes, are reconciled (Vandamme et al., 1996).
Discrimination of beneficial and harmful species is challenging in a number of genera that contain closely related species: Burkholderia, Bacillus, Acinetobacter and Pseudomonas. For example, in the Burkholderia genus, B. cepacia is a non-pathogenic soil bacterium that is being developed for phytoremediation (Barac et al., 2004), yet clinically, B. cepacia bacteria have been associated with infections and cystic fibrosis (reviewed in Coenye and Vandamme, 2003). Another prime example concerns the Bacillus cereus sensu lato group of bacteria. This group comprises Bacillus cereus sensu stricto, B. anthracis, strains and subspecies of B. thuringiensis, B. mycoides, and B. weihenstephanensis. Most Bacillus cereus organisms are common soil bacteria that are pathogenic to insects and invertebrates. Some species may cause contamination problems in the dairy industry and paper mills and may also be causative agents of food poisoning. Select strains of Bacillus anthracis and a few Bacillus cereus sensu stricto strains are the only ones reported to cause fatal pulmonary or intestinal infections (Dixon et al., 2000).
Bacterial mixtures that have been created or isolated in order to carry out a function pose technical challenges to polyphasic characterization. Phenotypic data are typically derived for individual microbial isolates. However, current culture techniques cannot support a substantial fraction of microbial species (Handelsman, 2004), so there is a risk of bias towards culturable species. In these cases, molecular methods that directly acquire genetic information may remove this hindrance.

Phylogenetic marker amplification and analysis
Of all global markers, the genes encoding small subunit ribosomal RNA (16S rRNA) are the best characterized for microbial systematics. The 16S rRNA gene is ubiquitous and highly conserved, but possesses enough variability to allow taxon-specific discrimination. The gene is composed of nine hypervariable regions separated by conserved regions, and sequences are available for numerous organisms via public databases such as NCBI, the Ribosomal Database Project (Cole et al., 2009) and Greengenes (Desantis et al., 2006). Because the nine variable regions are flanked by conserved nucleotide stretches in bacteria (Neefs et al., 1993), these conserved stretches can be used as targets for PCR primers with near-universal specificity.
It was during the mid-1980s that PCR first enabled molecular microbial ecology studies involving the 16S rRNA gene. Pace and colleagues first amplified 16S rRNA from bulk nucleic acid extractions using nearly "universal" primers, in order to sequence and classify the products and compare them to phylogenetic trees (Pace, 1997; Woese, 1987). At this time it was observed that not all environmental microorganisms were capable of colony formation and that new microbial species could be revealed by sequencing cloned ribosomal DNA (Stahl et al., 1984).
Over the last few decades, a large number of primer sequences have been designed for amplification and sequencing of 16S rRNA genes (reviewed in Baker et al., 2003), and a number of databases of primer sequences are available. Some of these primers have been designed to be taxon-specific, whereas others have been designed to amplify all prokaryotic rRNA genes and are referred to as "universal".
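The distinction between taxon-specific and "universal" primers can be illustrated with a small sketch that scans a template for primer binding sites. This is only an illustration under stated assumptions: the function name `find_primer_sites`, the toy primer and the toy template are hypothetical and do not come from any primer database, and real primer evaluation would also consider the reverse strand, melting temperature and 3' end mismatches. Degenerate positions are expanded with the standard IUPAC nucleotide codes.

```python
# Standard IUPAC nucleotide codes: each symbol maps to the bases it permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_primer_sites(primer, template, max_mismatches=0):
    """Return 0-based start positions where the primer matches the template.

    Forward strand only; a position counts as a mismatch when the template
    base is not among the bases allowed by the primer's IUPAC code.
    """
    primer = primer.upper()
    template = template.upper()
    sites = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        mismatches = sum(
            base not in IUPAC[code] for code, base in zip(primer, window)
        )
        if mismatches <= max_mismatches:
            sites.append(start)
    return sites


# Hypothetical toy example: "M" (A or C) lets one primer match two variants.
toy_template = "TTACATTACCGG"
print(find_primer_sites("ACM", toy_template))
```

A primer with more degenerate positions, or a larger `max_mismatches`, matches more templates, which is the trade-off between near-universal coverage and taxon specificity described above.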
16S rRNA sequences may offer limited taxonomic resolution, particularly for genera that feature close phylogenetic relationships. Burkholderia cepacia complex reference strains feature high similarity values (above 98%), which reflects a close phylogenetic relationship (Coenye and Vandamme, 2003). Because up to 2% intraspecies diversity has also been observed in B. cepacia rRNA sequences, these strains cannot be identified at the species level by simple comparison of 16S rRNA sequences. Similarly, for the Bacillus cereus group, there is insufficient divergence in 16S rRNA to allow for resolution of strains and species (Bavykin et al., 2004). In these cases, other global markers have been explored for strain discrimination, such as the genes that encode RNA polymerase subunits, DNA gyrases, heat shock proteins, recA proteins and hisA. The strong functional and structural constraints on these gene products limit the number of mutations that can occur in the genes and render them useful as markers of relatedness.
Identification of distinct strains of a prokaryotic species can take place by multi-locus sequence typing, in which sequence mismatches in a small number of housekeeping genes are analyzed (reviewed in Maiden, 2006). For identification of closely related prokaryotic species, a similar strategy designated multi-locus sequence analysis has been used in several studies. It involves a two-step process: first, rRNA sequencing assigns an unknown strain to a group (either genus or family); second, that assignment defines the particular genes and primers to be used for analysis. This two-tiered approach has allowed discrimination of Burkholderia strains and those of the Bacillus cereus group (reviewed in Gevers et al., 2005).
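The resolution argument above can be made concrete with a short sketch. It assumes the two 16S rRNA sequences have already been aligned to equal length (with "-" for gaps); the function names `percent_identity` and `species_resolvable` are hypothetical, and the rule of comparing interspecies identity against the intraspecies diversity is a simplification for illustration, not a published classification criterion.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns.

    Columns where both sequences carry a gap are ignored; a gap opposite
    a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def species_resolvable(aligned_a, aligned_b, max_intraspecies_diversity=2.0):
    """Illustrative rule: two species are separable by this marker only if
    their identity falls below what is seen within a single species."""
    identity = percent_identity(aligned_a, aligned_b)
    return identity < 100.0 - max_intraspecies_diversity


# Toy sequences (not real 16S data): 3 differences in 200 columns = 98.5%.
ref_a = "A" * 200
ref_b = "A" * 197 + "CCC"
print(percent_identity(ref_a, ref_b), species_resolvable(ref_a, ref_b))
```

With identity above 98% between species and up to 2% diversity within a species, the sketch reports the pair as not resolvable, which is why additional markers are needed.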

PCR as a component of genomic methods
During the last decade, various applications of DNA microarrays have been used to assess the risk of a particular microbe by enabling detection and/or identification at the species, subspecies or strain level, or by detecting the presence of virulence genes (reviewed in Shwed et al., 2007). DNA amplification is rarely a technical component of these studies. In contrast, as will be described in section 3.1, novel PCR amplification strategies are a component of the workflow for high throughput sequencing platforms.

Miniaturization of PCR
Arguably, the major PCR advancement of the last decade has been the development of miniaturized and parallelized platforms. Whereas previously PCR reactions were typically carried out at the microlitre scale, new configurations have enabled femtolitre scale reactions. In turn, higher throughput and cost efficiencies have been achieved.
One form of miniaturization has been achieved by reaction entrapment in thermodynamically stable "water-in-oil" nanoreactor microemulsion systems, such as reverse micelles, as described for enzymatic reactions (Klyachko and Levashov, 2003). These emulsions are easily prepared and are stable under a wide variety of temperatures, pH values and salt concentrations. The smallest droplets rival the scale of bacteria, with diameters of less than one micrometre and volumes on the femtolitre scale.
Emulsion PCR was first reported for the directed evolution of heat-stable, heparin-insensitive variants of Taq DNA polymerase (Ghadessy et al., 2001). The concept of emulsion PCR was to disperse template DNA into a water-in-oil emulsion such that most droplets contain a single template and only a few droplets contain more than one template. Amplification was carried out within the droplets by PCR, so that each droplet generated many clonal copies of its template.
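The dilution logic behind emulsion PCR can be sketched numerically. The example below assumes that templates distribute randomly and independently among droplets, so that the number of templates per droplet follows a Poisson distribution; this assumption, the function names and the example loading value are illustrative and are not taken from the cited study.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k templates in a droplet for a given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def droplet_occupancy(mean_templates):
    """Fractions of droplets that are empty, hold one template, or hold more."""
    empty = poisson_pmf(0, mean_templates)
    single = poisson_pmf(1, mean_templates)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


def clonal_fraction(mean_templates):
    """Among droplets containing template, the fraction holding exactly one."""
    occ = droplet_occupancy(mean_templates)
    return occ["single"] / (1.0 - occ["empty"])


# Hypothetical loading of 0.1 templates per droplet on average.
for name, value in droplet_occupancy(0.1).items():
    print(f"{name}: {value:.4f}")
print(f"clonal fraction of occupied droplets: {clonal_fraction(0.1):.3f}")
```

Under this model, lowering the average loading raises the share of template-containing droplets that are clonal, at the cost of leaving more droplets empty.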

Convergence of miniaturized PCR with other technologies
During the last decade, advancements have been made in the engineering of microfluidic scale devices that integrate multiple analytical steps into "laboratory on chip" systems (reviewed in Liu and Mathies, 2009). These devices allow aqueous microdroplets to be generated at high rates and manipulated with high fidelity in microfluidic channels. PCR-based genetic analysis and sequencing can now be carried out at the picolitre to nanolitre volume scale, with the advantages of decreased thermal cycling times and reagent consumption along with increased throughput.
Microfluidic droplet PCR has been reported to allow 1.5 million parallel amplifications for target enrichment of loci in the human genome (Tewhey et al., 2009). In this instance, microfluidic chips were designed to merge 20 picolitre droplets containing about 3 picograms of biotinylated template DNA fragments (2-4 kb) with droplets containing a pair of PCR primers that amplify specific sequences. This platform yielded more than one million merged droplets, which were then subjected to PCR. At the end of the amplification reaction, the emulsion was broken. After centrifugation, the aqueous phase, which contained the PCR products from all the droplets, was subjected to a second generation sequencing strategy.
During the last decade, several commercial second-generation sequencing platforms have been developed. These feature cyclic array sequencing strategies that involve new variations of PCR. In each case, densely arrayed amplicons are generated to serve as features for in situ sequencing and imaging-based sequencing-by-synthesis data collection (second generation sequencing platforms are reviewed in more detail in Shendure et al., 2011). Common to all strategies, the first step is the in vitro generation of a shotgun genomic library by random fragmentation of DNA and ligation of universal adaptor sequences. Afterwards, in vitro clonal amplification is carried out by one of two principal types of PCR, emulsion PCR or bridge PCR, to generate template for sequencing. Table 1 shows how various commercial platforms use PCR to derive the features that are sequenced.
Emulsion PCR is carried out as described above (section 3.0 and Fig. 1 A, B), with the exception that paramagnetic beads bearing one of the PCR primers on their surface are used (Dressman et al., 2003). These beads allow the solid-phase capture of clonally amplified PCR amplicons from each emulsion PCR compartment. For some commercial pyrosequencing platforms, beads are then deposited on microfabricated arrays of picolitre scale wells to allow immobilization and in situ pyrosequencing.
Bridge PCR (Adessi et al., 2000; Fedurco et al., 2006) involves the use of spatially distributed oligonucleotides that are covalently attached to a support (Fig. 1 C, D). A DNA library is hybridized as single stranded DNA to the support. Immobilized copies of the library are synthesized by extension from the immobilized primers. After denaturation, the template copies are able to loop over and hybridize to an adjacent oligonucleotide on the support. Additional copies of the template are synthesized and the process is repeated on each template, so that clonal clusters, each with about 2000 molecules, are generated.

Second generation sequencing from microbial mixtures
In recent years, complex microbial communities, such as those of the human gastrointestinal tract or those associated with biofilm infections, have been analyzed by second generation sequencing of shotgun libraries derived either from metagenomic DNA or from variable 16S regions that were PCR-amplified from metagenomic DNA prepared from a microbial mixture (Arumugam et al., 2011; Dowd et al., 2008).
Second generation platforms allow economies of scale in sequencing. PCR-amplified products can be characterized without cloning, which saves time and costs. Also, the estimated costs per megabase of derived sequence are lower for the new platforms than for first generation sequencing (Shendure et al., 2011). Lastly, multiplexed runs, in which 16S rRNA coding sequences from several communities are sequenced together, are feasible by using unique sequence barcodes during amplification (Hamady et al., 2008).
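The barcode-based multiplexing described above can be sketched as a simple sorting step applied to the pooled reads. This is a minimal illustration that assumes each read begins with its barcode, that all barcodes share one length, and that only exact matches are accepted; published barcoding schemes may instead use error-correcting codes. The barcodes, sample names and reads below are hypothetical.

```python
from collections import defaultdict

UNASSIGNED = "unassigned"


def demultiplex(reads, barcode_to_sample):
    """Group reads by the sample whose barcode prefixes them.

    The barcode is trimmed from assigned reads; reads with an unknown
    prefix are kept untrimmed under the UNASSIGNED key.
    """
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes barcodes of one shared length")
    width = lengths.pop()
    bins = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:width])
        if sample is None:
            bins[UNASSIGNED].append(read)
        else:
            bins[sample].append(read[width:])
    return dict(bins)


# Hypothetical barcodes and reads for two communities.
barcodes = {"ACGT": "soil", "TGCA": "gut"}
pooled = ["ACGTGGGG", "TGCACCCC", "ACGTTTTT", "NNNNAAAA"]
print(demultiplex(pooled, barcodes))
```

Each community's reads can then be analyzed separately even though all amplicons were sequenced in a single run.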
It has been proposed that sequencing of individual variable regions is sufficient for taxonomic differentiation of bacterial mixtures (Liu et al., 2007). The sequence read lengths of second generation platforms are generally short, but several new models have shown greater read lengths (Liu et al., 2008). On the other hand, direct sequencing of metagenomic DNA has been proposed to be less biased than that of PCR amplified DNA, due to lack of 16S primer bias (von Mering et al., 2007).

Fig. 1. PCR advancements towards second-generation sequencing.
Panels A, B: Emulsion PCR. Panel A) A shot-gun DNA library is ligated to adaptors (blue and red bars), diluted, and PCR amplified in a water in oil emulsion, within aqueous microdroplets. The droplets contain streptavidin coated beads that carry one of the biotinylated PCR primers. Panel B) Where DNA is amplified in the presence of a bead, several thousand copies of the template will be captured.
Panels C, D: Bridge PCR. Panel C) A shot-gun DNA library is ligated to adaptors, made single stranded and hybridized to PCR primers that are immobilized with flexible linkers on a substrate. Bridge amplification occurs when primer extension occurs from immediately adjacent primers. Panel D) Immobilized clusters of about one thousand amplicons are formed after successive cycles of extension and denaturation.
The critical analytical step in taxonomic analysis of microbial diversity is known as binning, in which the sequences from a mixture of organisms are assigned to phylogenetic groups. However, the outcome of binning may range from kingdom level to genus level assignment, depending on the quality and read length of the data (Yang et al., 2010). One binning strategy classifies DNA fragments on the basis of sequence homology, using publicly available reference databases searched with tools such as the Basic Local Alignment Search Tool (Huson et al., 2007; Meyer et al., 2008). The second strategy involves similarity to protein families and domains, as in the phylogenetic algorithm CARMA (Krause et al., 2008).

Collectively, these identification approaches are limited by the use of reference databases of known species and genes from readily cultivated microbes. As a consequence, species within a microbial community that lack a reference sequence will remain unidentified.
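One way to see why homology-based binning can end anywhere between kingdom and genus level is to assign each read to the lowest common ancestor of its database hits. The sketch below assumes that each hit has already been translated into a lineage listed from the broadest rank to the narrowest; the function name, the input format and the example lineages are illustrative assumptions, and the text above does not specify the assignment rule used by any particular tool.

```python
def lowest_common_ancestor(lineages):
    """Return the ranks shared by all lineages, from the root downwards.

    Each lineage is a list of taxon names ordered from broadest to
    narrowest. An empty input gives an empty assignment.
    """
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Hypothetical hits for one short read that matches two close relatives.
hits = [
    ["Bacteria", "Firmicutes", "Bacilli", "Bacillales",
     "Bacillaceae", "Bacillus", "Bacillus cereus"],
    ["Bacteria", "Firmicutes", "Bacilli", "Bacillales",
     "Bacillaceae", "Bacillus", "Bacillus anthracis"],
]
print(lowest_common_ancestor(hits))
```

A read that matches several closely related species equally well is binned only to their shared genus, and a read with no hits at all receives no assignment, which reflects the dependence on reference databases noted above.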

PCR analysis of single cells
The analysis of complex mixtures of environmental bacteria will benefit from microfluidic digital PCR analysis that involves single cell sorting from mixtures of bacteria. Single bacterial cells can be isolated by various technologies, including optical tweezers, micromanipulation, FACS, serial dilution and laser capture microdissection. In turn, experiments that involve retrieving "needles in a haystack", such as searches for microbes featuring particular genes, are facilitated by microfluidics technologies (Baker, 2010).
Characterization of the environmental bacteria of the termite hindgut, a model microenvironment with a volume of 1 microlitre, exemplifies the potential of cell sorting and PCR. This microenvironment contains about 10^6 to 10^8 microbial cells, comprised of unculturable species not detected in other environments (reviewed in Hongoh, 2010).
Ottesen et al. (2006) applied a microfluidic digital PCR characterization approach to the termite bacteria. In this study, individual cells were partitioned in a microfluidic array panel and served as templates for the simultaneous amplification of both rRNA and metabolic genes of interest. The digital PCR aspect involved ensuring that the sample was partitioned into reactions that contained an average of one template (bacterial cell) or less (Sykes et al., 1992). PCR products retrieved from individual chambers allowed sequence analysis of both genes by standard methods and allowed the identification of new bacterial species that contribute to metabolism. More recently, microfluidic digital PCR was used to associate particular viruses with the bacteria that they infect in the termite gut, without culturing either the viruses or the hosts (Tadmor et al., 2011). Here, both an rRNA gene and a viral marker gene were amplified from a PCR array panel containing individual microbes.
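The partitioning requirement of digital PCR, and the reasoning that links two genes to one cell, can be sketched with basic probability. The example assumes that cells distribute randomly among chambers (a Poisson model) and that, if two genes resided in different cells, their signals would fall into chambers independently. The function names and the chamber counts are hypothetical and are not data from the cited studies.

```python
import math


def mean_templates_per_chamber(positive, total):
    """Estimate the average number of templates per chamber from the
    fraction of positive chambers, assuming Poisson loading."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def expected_chance_colocalization(positive_a, positive_b, total):
    """Expected number of chambers positive for both genes if the two
    genes were carried by unrelated, independently distributed cells."""
    return positive_a * positive_b / total


# Hypothetical panel of 1000 chambers.
total = 1000
rrna_positive, gene_positive, both_positive = 300, 40, 30
print(f"mean templates per chamber: "
      f"{mean_templates_per_chamber(rrna_positive, total):.3f}")
print(f"co-positive chambers expected by chance: "
      f"{expected_chance_colocalization(rrna_positive, gene_positive, total):.1f}")
print(f"co-positive chambers observed: {both_positive}")
```

When the observed number of chambers positive for both genes clearly exceeds the chance expectation, the two genes are more plausibly carried by the same cell, and the rRNA sequence from those chambers can then be used to identify it.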

Whole genome sequencing from individual cells
Genomic sequences provide the most absolute indication of genetic variation and virulence potential for a bacterial strain. Documenting the complete nucleic acid sequences of high priority beneficial and detrimental microorganisms in public databases is an effort that can greatly aid the identification of unknown strains. In studies involving closely related bacterial strains, shotgun library sequences can be assembled by mapping the reads to a reference genome.
Direct genome sequencing from a single bacterial cell can be carried out by multiple displacement amplification, in which the few femtograms of DNA present in an individually lysed bacterial cell are amplified to generate template for shotgun sequencing. This reaction uses φ29 DNA polymerase and random primers to amplify DNA templates under isothermal conditions (Dean et al., 2001).
Genomic sequencing from individual uncultured bacterial cells was first shown by Raghunathan et al. (2005), using E. coli cells that had been isolated by flow cytometry. This report illustrated contamination as a technical challenge when working with individual microbial cells. Because the reaction uses random primers to initiate polymerization, contaminating DNA can also be amplified. In the case of poorly characterized or novel biotechnology microbes, the non-target DNA could confound conclusions about the target organism. In addition, multiple displacement amplification introduces biases, particularly when small input quantities of DNA are used. Segments of the chromosome have been observed to be preferentially amplified. As well, chimeric rearrangements of DNA result from the linking of noncontiguous chromosomal regions (Zhang et al., 2006).
Despite these challenges, recent reports are more encouraging about the acquisition of finished genomic sequence derived from a single bacterial cell (Woyke et al., 2010). Multiple displacement amplification artifacts have been overcome with new computational algorithms that can compensate for amplification bias and chimeric sequences using short sequence reads (Chitsaz et al., 2011).

Conclusions and future challenges
The safe use of biotechnology microbes for public health and in the environment requires knowledge of the identity and genetic potential of these organisms. In the first decade of the 21st century, PCR remains a cornerstone among the tools available for genetic characterization. Advances in miniaturization and parallelism of PCR have enhanced throughput and enabled second generation sequencing platforms. These technological advancements have been linked to progress in single cell microbial genomics, whole genome sequencing and the characterization of microbial mixtures. Collectively, these developments have direct implications for the safety assessments that are carried out by industry and governments.
These recent technological advances will allow new human and environmental surveys. As an example, movements of genes amongst microbes by horizontal gene transfer mechanisms may be tracked. Environmental surveys of the movements of particular nucleotide sequences are now possible by metagenomic methods. Culture-independent methodology for genetic analysis will allow greater throughput. However, at present, computational hurdles remain for the widespread implementation of such technology.
Miniaturization has been a hallmark of progress in electronics and computing. By this measure, PCR miniaturization that has taken place to date is of relatively low order. At the same time, the complexity of biotechnology microbes developed for commercial applications is increasing. The advances in PCR and genomic technologies must be considered in parallel with the technical advancements that have been made towards the de novo construction of synthetic microbes. High throughput, high efficiency microfluidic devices can enable the encapsulation of novel genetic material in abiotic chassis (Szita et al., 2010). PCR and sequencing advancements will remain important for microbial genetic characterization.

Acknowledgements
Drs. Guillaume Pelletier and Azam Tayabali are thanked for constructive criticism of the manuscript. Open access charges are supported by the Canadian Regulatory System for Biotechnology Fund.