Techniques commonly used for bacterial genotyping and identification.
The identity of bacterial populations and the clonal differences within them have been broadly explored through PCR-based techniques. Bacterial identification and DNA fingerprinting have thus provided insights into phenotypic and genotypic variation. Indeed, differences in diversity may reflect changes among subpopulations that have their own ecological dynamics and individual traits among coexisting genotypes. Identifying polymorphic regions in nucleic acid sequences therefore relies on recognizing both conserved and variable regions. The advantages of PCR-based methods are high sensitivity, specificity, speed, cost-effectiveness, and the opportunity to detect many microbial agents or variants simultaneously. Fingerprint information, stored in reference databases of genotyping data, may allow certain outbreaks to be tracked globally. In this chapter, we review Web resources and online computational tools for designing PCR-based methods to identify bacterial species. We also focus on laboratory applications and key conditions for technique standardization.
- molecular biology
- molecular typing
- DNA fingerprinting
- bacterial identification
- DNA amplification
Bacterial culture, based on the isolation and growth of live specimens, is the conventional test for identifying a microorganism. Culture-based methods are currently considered the gold standard for assessing the validity of new diagnostic methods. However, phenotypic methods are time-consuming, difficult to interpret, poorly reproducible between laboratories, expensive, and laborious. Several commercial polymerase chain reaction (PCR)-based methods are now available and, like in-house PCR assays, they target different specific genes to identify a pathogen in a clinical sample.
In many cases, bacterial identification is performed by comparing a fingerprint against reference genotyping databases. Many organisms can thus be taxonomically classified and specifically differentiated according to conserved genes that are recognized as “molecular clocks”. One well-known example is the ribosomal RNA (rRNA) gene, which is a good candidate because of its universal distribution and reasonably good sequence conservation across evolution. A good method should be feasible, rapid, and cheap, and it should be implementable in local settings in areas where a given infectious disease is highly endemic. Designing a proper PCR-based method is addressed here as a series of steps toward that aim.
2. Step one: choosing the PCR-based method
The first step is to select an accurate technique that provides enough genetic information about your model organism. It is therefore pivotal to outline a molecular method with high reproducibility, specificity, sensitivity, and discriminatory power. Nucleic acid-based fingerprinting techniques with similar taxonomic ranges for strain characterization include restriction fragment length polymorphism (RFLP), pulsed-field gel electrophoresis (PFGE), amplified fragment length polymorphism (AFLP), random amplified polymorphic DNA (RAPD), multilocus sequence typing (MLST), arbitrarily primed PCR (AP-PCR), repetitive sequence-based PCR (rep-PCR), and internal transcribed spacer-PCR (ITS-PCR) (Table 1). These techniques are regularly used for classification at the genus, species, and subspecies levels and even for strain characterization.
|Sequencing of 16S rRNA genes|
|ARDRA, tRNA-PCR, DNA-DNA reassociation|
|RFLP, PFGE, AFLP, RAPD, MLST, AP-PCR, rep-PCR, ITS-PCR|
Molecular biologists have a repertoire of tools that provide good molecular discrimination and can resolve questions in clinical settings and infectious disease control. These methods are regularly performed for pathogen characterization and give higher positive detection of target species than conventional methods. Interlaboratory reproducibility is therefore an important feature for producing highly valuable information [8, 9]. Accurate genotyping data allow the implementation of multi-user international libraries, for example, MLST for multiple bacterial species and spoligotyping for mycobacteria [10, 11, 12]. Through DNA fingerprinting, surveillance studies have identified certain species that can acquire drug-resistance determinants, as well as clones prone to global dissemination [13, 14]. Tracking certain genotypes is key to controlling them in specific geographic areas.
Techniques based on DNA amplification follow two main approaches. The first involves amplification of one or a few fragments of specific regions followed by digestion with specific restriction enzymes (e.g., ARDRA). The resulting fragments can be resolved by gel electrophoresis, which provides a specific pattern for each strain. Differences in the DNA sequences or genomic regions thus reveal polymorphisms that are specific at the genus, species, or strain level. In the second approach, nonspecific primers or oligonucleotide sets are used to amplify multiple sites across the chromosomal DNA. The fragments are also separated by gel electrophoresis, resulting in a band pattern or fingerprint that represents a specific genotype. Available protocols include arbitrary primers (e.g., RAPD), double digest selective label (DDSL), extended primers in combination with low annealing temperatures (e.g., AP-PCR), and those based on repetitive DNA sequences found in several bacterial genera (e.g., rep-PCR).
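The first approach (amplify, then digest) can be illustrated with a small sketch that predicts the fragment sizes a restriction enzyme would produce from a linear amplicon. This is only an illustration under stated assumptions: the function name `digest_fragments` is hypothetical, the enzyme (EcoRI, which cuts G^AATTC) is chosen purely as an example, and the toy sequence is synthetic rather than a real 16S rRNA amplicon.

```python
def digest_fragments(sequence, site, cut_offset):
    """Predict fragment sizes (bp) for a linear DNA molecule.

    site       -- recognition sequence, e.g. "GAATTC" (EcoRI, example only)
    cut_offset -- cut position within the site on the top strand
                  (1 for EcoRI, which cuts between G and AATTC)
    """
    sequence = sequence.upper()
    site = site.upper()
    cut_positions = []
    pos = sequence.find(site)
    while pos != -1:
        cut_positions.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    # Fragment sizes are the distances between consecutive cut points
    bounds = [0] + cut_positions + [len(sequence)]
    sizes = [end - start for start, end in zip(bounds, bounds[1:])]
    # Largest first, as bands would be read from the top of a gel
    return sorted(sizes, reverse=True)


# Synthetic 362 bp "amplicon" carrying two EcoRI sites (illustrative only)
toy_amplicon = "A" * 100 + "GAATTC" + "C" * 200 + "GAATTC" + "T" * 50
pattern = digest_fragments(toy_amplicon, "GAATTC", 1)
print(pattern)
```

Two strains whose amplicons differ in the number or position of sites would give different size lists, which is the basis of the band patterns described above.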
3. Step two: defining a gene target
To analyze genetic variability, it is necessary to select an appropriate target that supports the required hierarchical level. An important consideration is the choice of a “molecular clock” that retains traces of the evolutionary record of microbial diversity. Ribosomal genes are ancient molecules that harbor information with high phylogenetic value and can differentiate organisms at the genus and species levels. However, alternative core (housekeeping) genes have also been proposed for bacterial species definition, such as RNA polymerase beta subunit (rpoB), DNA gyrase alpha subunit (gyrA), glyceraldehyde 3-phosphate dehydrogenase A (gapA), GroEL genes (groE, groL), outer membrane protein A (ompA), and glucose-6-phosphate isomerase (pgi) [18, 19, 20]. Some of these genes are included in multilocus sequence typing schemes (see further information in step ten).
Typing methods based on the 16S rRNA gene represent an accurate strategy for strain characterization because this gene harbors both conserved and variable regions; changes at specific positions can therefore lead to strain differentiation [4, 17]. It is important to consider, however, that many bacterial genomes carry multiple copies of the 16S rRNA gene. The 16S rRNA gene has been the subject of many phylogenetic studies, including those related to bacterial species definition.
Sequencing of rRNA genes is the preferred method for phylogenetic reconstruction, nucleic acid-based detection, and quantification of microbial diversity. Many genotyping approaches thus remain based on 16S rRNA gene analysis or ribosomal gene sequencing, which still constitutes a gold standard for bacterial taxonomy. A sequence can be explored by searching against the Ribosomal Database Project (RDP) release 11.5, which contained 3,356,809 16S rRNA and 125,525 fungal 28S rRNA gene sequences as of November 2018.
4. Step three: primer design for PCR-based methods
Once a gene target has been chosen, the next step is to design specific primers to detect it in the DNA sample. In recent decades, PCR-based techniques have been successfully employed for the genetic characterization of many pathogen taxa [2, 27]. Primers for DNA amplification are short synthetic oligonucleotides that are complementary to target sites on the template DNA. PCR is performed at different temperatures (denaturation, annealing, and extension), and its efficiency depends largely on primer annealing. Some essential features have to be taken into consideration for accurate primer design:
- Primers should contain guanine or cytosine, or both, at the 3′ end to increase the efficiency of oligonucleotide binding. Primers must form a stable duplex with the target DNA at the annealing temperature.
- Oligonucleotides should not be self-complementary, to avoid the formation of secondary structures such as hairpin loops.
- There should be no complementarity between the forward and reverse primers.
- The melting temperature (Tm) is the temperature at which 50% of the primer is bound to the target DNA and 50% remains unbound. A Tm between 42 and 65°C is recommended, with an ideal of 62°C. This parameter is key: an annealing temperature that is too high relative to the Tm can decrease the amount of PCR product (amplicon), whereas one that is too low can yield nonspecific products due to mismatched base pairing. The annealing temperature is usually set within about 5°C of the Tm, most often about 5°C below it.
- Primers should be long enough to ensure specificity (usually between 18 and 30 bases).
- A balanced distribution of G/C and A/T content in the primers is recommended (40–60% GC).
- For standard PCR, the amplified product should be between 200 and 1000 bp long; for quantitative PCR, between 75 and 150 bp. PCR products larger than 1000 bp may require additional time during the extension step (1 min/kb of PCR product).
- Primers must be designed specifically for the target gene to avoid amplification of unwanted DNA sequences in the PCR. Typically, primer sequences are analyzed in silico, using BLAST or similar tools, to check their specificity.
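Several of the criteria listed above can be screened with a few lines of code before turning to a dedicated design tool. The sketch below is a rough first-pass check, not a replacement for such tools: the function names are hypothetical, the Tm uses the simple Wallace rule (2°C per A/T plus 4°C per G/C), which is only a coarse estimate compared with the nearest-neighbor calculations that design software typically uses, and the example sequences are illustrative.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) * 100.0 / len(p)


def wallace_tm(primer):
    """Rough Tm estimate in degrees C: 2*(A+T) + 4*(G+C). Approximation only."""
    p = primer.upper()
    return 2.0 * (p.count("A") + p.count("T")) + 4.0 * (p.count("G") + p.count("C"))


def has_3prime_dimer(primer_a, primer_b, n=4):
    """True if the last n bases of both primers can pair with each other."""
    return primer_a.upper()[-n:] == reverse_complement(primer_b[-n:])


def check_primer(primer):
    """Screen one primer against the length, GC, and 3' end criteria in the text."""
    p = primer.upper()
    return {
        "length_ok": 18 <= len(p) <= 30,        # 18-30 bases
        "gc_ok": 40.0 <= gc_content(p) <= 60.0,  # 40-60% GC
        "gc_clamp": p[-1] in "GC",               # G or C at the 3' end
        "tm_wallace": wallace_tm(p),             # coarse estimate only
    }


# Illustrative 20-mer; any candidate primer could be screened the same way
report = check_primer("AGAGTTTGATCCTGGCTCAG")
print(report)
```

A primer pair that passes these checks would still need a specificity search (for example with BLAST) and a proper thermodynamic Tm calculation.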
Many resources are available for primer and probe design to optimize the PCR method; the researcher should consider parameters including molecular weight, millimolar extinction coefficient, Tm, predicted secondary structure formation, and magnesium chloride (MgCl2) concentration (Table 2). For some molecular biology procedures, it is recommended to place the forward primer less than 35 bp downstream of the start site of the coding sequence; the same applies to the reverse primer with respect to the stop site. In a sequencing protocol, for example, the fragment should not be too large because of artifacts introduced into the sequence read. Likewise, unclear results tend to increase with larger fragments.
|Oligo®.net||This primer analysis software is useful to design sequencing and PCR primers. Also, user can check DNA and RNA secondary structure, dimer formation, false priming, and homology. Available at|
|CLC Main Workbench||It is commercial software from Qiagen. It provides basic and advanced sequence analysis in a user-friendly platform. Access at:|
|Primer Premier||It is a PCR Primer Design Software from Premier Biosoft company. Primer Premier’s search algorithm finds optimal PCR, multiplex and SNP genotyping primers with the most accurate melting temperature using the nearest neighbor algorithm. Available at:|
|Gene Runner||It is a free program for multiple sequence analysis including PCR, restriction analysis, sequencing primers, and hybridization probes. Available at:|
|Serial Cloner||Serial Cloner 2.5 (free) lets the user find restriction sites, define ORFs, calculate the Tm of selected fragments and %GC, and scan sequence features. Download at:|
|Primer designing tool||A tool for finding specific primers on a PCR template developed by the National Center for Biotechnology Information (NCBI), U.S. Available at:|
|Primer3web version 4.1.0||Since the first release, this popular Web interface has assisted many researchers with PCR primer design. The newer version of Primer3 available at|
|Genefisher2||Genefisher2 is an interactive Web-based program for designing degenerate primers. Access at:|
|PrimerExplorer||This software is specifically for designing the primer sets for loop-mediated isothermal amplification (LAMP) method.|
|Tools for calculation of oligo properties||The following tools calculate Tm, molecular weight, and millimolar extinction coefficient (OD/μmol, μg/OD) for oligos: Oligo Calc (Oligonucleotide Properties Calculator) and Oligo Calculation Tool|
|Interactive tools provided by suppliers||New England Biolabs (multiple tools); Thermo Fisher Scientific (Oligo Design Tools); Eurofins (PCR Primer Design Tool); Sigma-Aldrich (OligoEvaluator™); Integrated DNA Technologies, IDT (OligoAnalyzer 3.0); GenScript (Real-time PCR (TaqMan) Primer Design)|
|Directory of different tools||This directory contains links for: calculation of oligonucleotide physicochemical parameters, PCR primers based upon protein sequence, PCR and cloning, PCR primers based upon multi-alignments, genomic scale primers, overlapping primer sets, short interfering RNA (siRNA) design, real-time PCR primer design, introduction of mutations, and primer presentation on the DNA sequence. Available at:|
5. Step four: in silico simulation of molecular biology experiments
Nowadays, many resources (Web servers and programs) are available to simulate PCR results, predicting the expected bands and successful primer annealing [30, 31]. In silico simulation of several PCR-based methods is possible with tools that produce theoretical PCR results for the many bacterial species sequenced to date. The list of target genomes is updated according to their availability at NCBI. Many experiments can be performed against prokaryotic genomes, such as PCR amplification, restriction digestion, PFGE, PCR-RFLP, double digestion fingerprinting, AFLP-PCR, and other DNA fingerprinting techniques.
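The core idea behind an in silico PCR can be sketched in a few lines: locate the forward primer on the template, locate the site where the reverse primer anneals (the reverse complement of the reverse primer), and report the sequence in between. The sketch below is a deliberately simplified illustration and not how any particular Web server works: it assumes exact primer matches, a single template strand orientation, and a hypothetical function name `in_silico_pcr`; real tools also tolerate mismatches and search both strands.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the predicted amplicon, or None if either primer has no exact site.

    Simplifying assumptions: exact matching, first binding site only,
    forward primer on the given strand and reverse primer on the opposite one.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Synthetic template (illustrative only): flank + forward site + insert + reverse site + flank
forward_primer = "ACGTACGTAC"
reverse_primer = reverse_complement("TTGACCATGA")
toy_template = "TTTT" + "ACGTACGTAC" + "GGGGGCCCCC" * 3 + "TTGACCATGA" + "AAAA"

amplicon = in_silico_pcr(toy_template, forward_primer, reverse_primer)
print(len(amplicon), "bp expected band")
```

The predicted amplicon length is the band size one would look for on the gel when the experiment is later run at the bench.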
6. Step five: guidelines for standardization of PCR mix
Once you have your DNA target and PCR primer sequences, you should set the experimental conditions for the reaction components: water (HPLC-grade), 10× reaction buffer, MgCl2, dNTPs, primers (forward and reverse), sample DNA, and DNA polymerase. The 10× reaction buffer often already includes magnesium, in which case adding MgCl2 separately is optional. If it is added, the typical MgCl2 concentration in a standard PCR should be between 1.5 and 2.0 millimolar (mM). When magnesium is too low, no amplicon may appear; if it is too high, undesired amplicons may be observed as extra bands in the agarose gel. The dNTP concentration should be 200 μM of each nucleotide. For Taq DNA polymerase (obtained from Thermus aquaticus; further information in step six), between 0.5 and 2.0 units per 50 μL of mix are recommended (preferably 1.25 units). Primers work well at the default concentration (50 nM), but concentrations between 0.1 μM and 1 μM of each primer are recommended. Last, between 1 ng and 1 μg of genomic template DNA should be used, because higher amounts might reduce PCR product specificity.
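The final concentrations above translate into pipetting volumes through the dilution relation C1·V1 = C2·V2. The following sketch works this out for one reaction. The final concentrations come from the text; the stock concentrations (25 mM MgCl2, 10 mM each dNTP, 10 μM primers, 5 U/μL Taq), the 0.2 μM primer concentration chosen within the recommended range, and the 1 μL template volume are assumptions made only for the example and should be replaced with the values on your own reagent tubes.

```python
def dilution_volume(stock_conc, final_conc, total_volume_ul):
    """Volume of stock needed (uL), from C1*V1 = C2*V2 (same units for both concentrations)."""
    return final_conc * total_volume_ul / stock_conc


def pcr_mix(total_volume_ul=50.0, template_ul=1.0):
    """Per-reaction volumes in uL. Stock concentrations are assumed, not prescribed."""
    mix = {
        "10x reaction buffer": dilution_volume(10.0, 1.0, total_volume_ul),      # 10x -> 1x
        "MgCl2 (25 mM stock)": dilution_volume(25.0, 1.5, total_volume_ul),      # 1.5 mM final
        "dNTP mix (10 mM each)": dilution_volume(10.0, 0.2, total_volume_ul),    # 200 uM each
        "forward primer (10 uM)": dilution_volume(10.0, 0.2, total_volume_ul),   # 0.2 uM final
        "reverse primer (10 uM)": dilution_volume(10.0, 0.2, total_volume_ul),   # 0.2 uM final
        "Taq polymerase (5 U/uL)": 1.25 * (total_volume_ul / 50.0) / 5.0,        # 1.25 U per 50 uL
        "template DNA": template_ul,
    }
    # Water makes up the remaining volume
    mix["water"] = total_volume_ul - sum(mix.values())
    return mix


mix = pcr_mix()
for reagent, volume in mix.items():
    print(f"{reagent:26s} {volume:6.2f} uL")
```

When the buffer already contains magnesium, the MgCl2 entry would be dropped and its volume returned to water. For a master mix, each volume is multiplied by the number of reactions plus a small excess.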
7. Step six: choosing the DNA polymerase enzyme
DNA polymerase is the enzyme that synthesizes the new DNA strands. Taq DNA polymerase was first isolated from T. aquaticus in 1976. This enzyme has an optimum temperature between 75 and 80°C, has a half-life of 9 min at 97.5°C, and can polymerize 150 nucleotides per second. When choosing a DNA polymerase, the researcher must consider key aspects such as specificity, thermostability, fidelity, and processivity. First, if specificity is low, a low-quality amplicon will affect product yield and sensitivity and may cause problems in downstream applications (e.g., cloning or protein expression). Second, regarding thermostability, consider using enzymes with a long half-life above 90°C, because of the denaturation step. Third, if the researcher needs amplicons identical in sequence to the DNA target, consider using high-fidelity DNA polymerases with significant proofreading activity. Last, processivity reflects how many nucleotides the enzyme incorporates per binding event and therefore influences the speed of the reaction. Processivity should be considered for long templates, self-complementary targets, high G/C content, and samples containing PCR inhibitors, as well as when amplicon accumulates during later PCR cycles [29, 36].
8. Step seven: setting the PCR conditions
PCR runs in cycles composed of three steps: denaturation, annealing, and extension. By default, a PCR includes between 25 and 35 cycles per reaction. The denaturation step produces single-stranded DNA; an initial denaturation is usually performed at 95°C for 2 min. The next step is primer annealing, in which the primers base-pair with the complementary DNA template; its temperature is generally chosen according to the Tm of both PCR primers. Third, for the extension step, 1 min per 1000 bp (or 30 s per 500 bp) of amplicon is recommended. Larger PCR products (>3 kb) may require longer extension times. Extension is usually performed at 72°C, which is considered the optimum temperature for thermostable DNA polymerases. A standard PCR protocol for a 500 bp amplicon includes an initial denaturation at 95°C for 2 min, followed by 25 cycles of denaturation at 95°C for 15 s, annealing at 55°C for 15 s, and extension at 68°C for 45 s, and a final extension at 68°C for 5 min [28, 29].
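The extension-time rule and the example protocol above can be written down as a small calculation, which is handy when adapting the program to a different amplicon size. This is a minimal sketch with hypothetical function names; it counts only the programmed hold times, so the real run will be longer once the thermocycler's ramping between temperatures is included.

```python
import math


def extension_seconds(amplicon_bp, seconds_per_kb=60):
    """Extension time from the 1 min per 1000 bp rule of thumb, rounded up."""
    return math.ceil(amplicon_bp * seconds_per_kb / 1000)


def total_hold_seconds(initial_s, cycles, cycle_steps_s, final_s):
    """Sum of programmed hold times; temperature ramping is ignored."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s


# Protocol quoted in the text for a 500 bp amplicon:
# 95 C 2 min; 25 x (95 C 15 s, 55 C 15 s, 68 C 45 s); 68 C 5 min
run_seconds = total_hold_seconds(
    initial_s=120,
    cycles=25,
    cycle_steps_s=[15, 15, 45],
    final_s=300,
)

print("minimum extension for 500 bp:", extension_seconds(500), "s")
print("programmed hold time:", run_seconds / 60, "min")
```

The 45 s extension in the quoted protocol sits above the 30 s minimum that the rule gives for 500 bp, which leaves a margin for slower enzymes or difficult templates.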
9. Step eight: setting specificity and sensitivity of the PCR method
Assay specificity should be tested against many related microorganisms. Potential cross-reactivity with contaminating DNA in the sample should also be investigated, especially when the method is applied to natural populations. This is essential, particularly when the new method is compared with traditional techniques. Specificity is first tested in silico using the BLAST tool. It can then be assessed in vitro by PCR amplification of genomic DNA purified from taxonomically related species. Sensitivity defines the detection limit, that is, the minimum amount of target DNA that can be detected in a sample. This is relevant when cultures are difficult to obtain or when a low number of bacteria cannot be detected by other diagnostic techniques.
Identifying the bacterial species related to clinical phenotypes requires a method to cluster fingerprints into groups that are likely to share most genotypic and phenotypic traits. Species have also been genotyped by measuring variation in the number of repetitive genetic elements in the direct repeat region or by detecting polymorphic sequences. These techniques have identified clonal groups of isolates, each of which appears to be related through a common ancestor.
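A detection limit is often easier to interpret as genome copies than as a mass of DNA. The sketch below converts nanograms of genomic DNA into an approximate copy number and lists a tenfold dilution series of the kind used to find the lowest detectable amount. It rests on assumptions that are not in the text: an average mass of about 650 g/mol per base pair of double-stranded DNA (a common approximation), a hypothetical genome size of 4.4 Mbp used only for the example, and hypothetical function names.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
AVG_BP_MASS_G_PER_MOL = 650.0   # approximate mass of one base pair of dsDNA


def genome_copies(dna_ng, genome_size_bp):
    """Approximate number of genome copies in a given mass of genomic DNA."""
    grams = dna_ng * 1e-9
    return grams * AVOGADRO / (genome_size_bp * AVG_BP_MASS_G_PER_MOL)


def tenfold_series(start_ng, steps):
    """DNA amounts (ng) for a tenfold serial dilution with the given number of steps."""
    return [start_ng / (10 ** i) for i in range(steps)]


# Hypothetical 4.4 Mbp bacterial genome, diluted from 1 ng down to 1 fg
example_genome_bp = 4_400_000
for amount_ng in tenfold_series(1.0, 7):
    copies = genome_copies(amount_ng, example_genome_bp)
    print(f"{amount_ng:.0e} ng  ~ {copies:,.1f} genome copies")
```

The lowest dilution that still yields the expected band (ideally in replicate reactions) would then be reported as the analytical sensitivity of the assay.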
10. Step nine: evaluation of the amplicons
Experimental validation of PCR results entails two options. The first is to load and run the PCR product on an agarose gel, checking the expected size against a suitable molecular weight marker. The second is to sequence the amplicon and evaluate its sequence identity with the DNA target [28, 29].
11. Step ten: comparing a query band pattern or DNA sequence against databases
Genotyping programs rely on the collection and analysis of large quantities of data. Infection control programs are implementing genotyping so that isolates can be compared against a database. Central databases for isolate tracking, laboratory results, and epidemiologic data are essential. Because cluster investigations are an epidemiologic activity, infectious disease programs should maintain the principal databases for analyzing spread and planning control measures. The information in these databases can enable infectious disease programs to easily identify patients with matching genotypes and epidemiologic links. Today, bacterial genotyping information from both traditional MLST and whole-genome sequencing (WGS) is available (Table 3) [10, 15, 45, 46, 47]. For further applications, public resources also store primer information for quantitative gene expression analysis or for comparison with previous reports [48, 49, 50].
|PubMLST||This database contains more than 140 MLST allelic profiles and sequences. BIGSdb software runs in PubMLST to store and analyze sequence data for bacterial isolates.|||
|MLST 2.0||Multilocus sequence typing from an assembled genome or from a set of reads. It is brought by the Center for Genomic Epidemiology|||
|LOCUST||It is an automated classification program that allows users to customize the typing of microbial isolates from whole-genome sequencing data.|||
|BacWGSTdb||This database was designed for genotyping and source tracking bacterial pathogens.|||
|PrimerBank||Databank of PCR primers for gene expression detection and quantification (real-time PCR). User can search by using GenBank Accession, NCBI protein accession, NCBI Gene ID, Gene Symbol, Primer-Bank ID or Keyword (gene description) or blast a gene sequence.|||
|RTPrimerDB||It is a public database for primer and probe sequences used in real-time PCR assays.||[48, 49]|
|RUCS 1.0||Rapid identification of PCR primers pairs for unique core sequences.|||
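Comparing a query band pattern against stored patterns needs a similarity measure. One widely used option, which the text does not prescribe, is the Dice coefficient: twice the number of shared bands divided by the total number of bands in both patterns. The sketch below is an illustration only: the function names, the 10 bp matching tolerance, and the reference patterns are all hypothetical, and dedicated fingerprint analysis software handles gel normalization and band matching far more carefully.

```python
def dice_similarity(bands_a, bands_b, tolerance_bp=10):
    """Dice coefficient between two band patterns (lists of fragment sizes in bp).

    Two bands count as shared if their sizes differ by at most tolerance_bp;
    each band can be matched only once.
    """
    unmatched = sorted(bands_b)
    shared = 0
    for band in sorted(bands_a):
        for i, other in enumerate(unmatched):
            if abs(band - other) <= tolerance_bp:
                shared += 1
                del unmatched[i]
                break
    total = len(bands_a) + len(bands_b)
    return 2.0 * shared / total if total else 0.0


def best_match(query, reference_patterns):
    """Name of the stored pattern most similar to the query."""
    return max(reference_patterns,
               key=lambda name: dice_similarity(query, reference_patterns[name]))


# Hypothetical reference fingerprints (band sizes in bp)
reference_patterns = {
    "type_A": [250, 500, 1200],
    "type_B": [300, 800],
}
query_pattern = [255, 505, 1190]

closest = best_match(query_pattern, reference_patterns)
print(closest, dice_similarity(query_pattern, reference_patterns[closest]))
```

Sequence-based schemes such as MLST avoid the tolerance problem altogether, because allele numbers either match or do not, which is one reason their data are easier to share between laboratories.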
12. Variants of the PCR methods
Beyond routine PCR use in the lab, the availability of improved thermocyclers, dyes, primers, probes, and DNA polymerases has extended its applications. Many PCR variants make it possible to quantify gene expression, improve diagnostic sensitivity, and genotype without further procedures such as restriction analysis (Table 4) [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]. Advanced techniques such as digital PCR (dPCR) have been shown to improve sensitivity and reliability, down to single-cell applications, and to increase tolerance to PCR inhibitors such as chelating agents. Finally, another common application of several PCR-based techniques (conventional PCR, RT-PCR, qPCR, and LAMP assays) is the detection of resistance to first-line antimicrobial drugs or even last-resort antibiotics.
|In house PCR||Conventional PCR targeting a single gene||Detection of a pathogen in a clinical sample|||
|Nested PCR||Two pairs of primers are used to amplify a fragment. First pair amplifies similar to a conventional PCR. Second pair is nested within the first fragment||Increase specificity of an amplification reaction when targeting a gene|||
|Multiplex PCR||This PCR variation enables the simultaneous amplification of many targets in a single reaction by using over one pair of primers. Amplicon sizes should be different or be labeled||Screening for a set of genes at once in a DNA sample. Analysis of microsatellites and SNPs|||
|Real-time PCR or quantitative PCR||It is an assay that monitors the fluorescence emitted during the reaction as an indicator of amplicon production at each PCR cycle (real-time) as opposed to the endpoint detection||DNA quantification in a sample. Level of gene expression. Copy number variation. Genotyping. Multi-species analysis|||
|Inverse PCR||In this PCR, primers are oriented in the reverse direction of the usual orientation. The template for the reverse primers is a restriction fragment that has been self-ligated||Cloning of sequences flanking a known sequence. Amplification and identification of flanking sequences|||
|Arbitrary Primed PCR (AP-PCR)||This technique uses primers whose nucleotide sequence is randomly chosen. Amplification occurs under low stringency conditions||DNA fingerprinting||[55, 56, 57]|
|Reverse transcriptase PCR||RNA is first reverse-transcribed into complementary DNA (cDNA), which is then amplified by PCR||Evidence regarding the transcription of a gene. Complementary DNA synthesis|||
|Loop Mediated Isothermal Amplification (LAMP)||This isothermal amplification method runs at a single temperature (usually 63–67°C), uses four to six primers, and is amenable to visual detection. Amplification occurs in less than 30 min||Point of care methods for pathogen screening|||
|Droplet PCR||It includes a separation step of sample into multiple compartments so that only few molecules are present in each partition. Thus, each droplet will be an independent PCR||Rare species detection and mutations with low frequency|||
13. Educational tools for beginners
Biotechnology students face challenges in understanding molecular biology principles and techniques. E-learning resources simulate a real environment in which students gain practical experience of running protocols in the lab [66, 67]. The Genetic Science Learning Center provides a Flash simulation of PCR principles that is useful for beginners.
14. Case study: identification at species and subspecies level
PCR-based detection targeting conserved regions of the 16S rRNA sequence of bacterial pathogens is currently performed by several groups [4, 18, 21]. The small subunit (SSU) rRNA contains segments that are conserved at the species, genus, and kingdom levels. In this case, Klebsiella pneumoniae is divided into three subspecies: K. pneumoniae pneumoniae, K. pneumoniae rhinoscleromatis, and K. pneumoniae ozaenae. These subspecies are phenotypically closely related and difficult to differentiate with conventional tests.
The 16S rRNA gene sequences from K. pneumoniae subspecies were retrieved in FASTA format and aligned by using MACAW program (download link
Another case was the implementation of an in-house PCR method for tuberculosis diagnosis. This molecular method may be an important tool in high-incidence areas because its speed, sensitivity, and discriminatory power surpass those of conventional methods (acid-fast stain and culture). The method was designed around the IS6110 insertion sequence, which is specific for the Mycobacterium tuberculosis complex, and was successfully tested in sputum, bronchoalveolar lavage fluid, blood, gastric fluid aspirate, urine, cerebrospinal fluid, ascitic fluid, and abscess secretions. The method improved diagnostic accuracy, proved to be fast, low cost, and feasible, and can be implemented in a middle-income resource setting.
Some bacterial pathogens may be undetectable by traditional culture methods because of their nutrient requirements, growth conditions, or the bacterial inoculum per sample. PCR has therefore emerged as an effective method that overcomes the detection limit for certain pathogens in clinical samples. A successful PCR experiment requires planning and prediction of results with the available tools and resources. Web-based tools and programs are useful for designing primers, calculating accurate thermodynamic and physicochemical parameters, adjusting the thermal cycling protocol, and producing a good experimental design. Success in a PCR-based protocol depends on an accurate in silico simulation, which allows optimal selection of reagents and test conditions and avoids troubleshooting of inefficient reactions. The recommendations in this chapter may enable the researcher to customize and troubleshoot a wide variety of PCR-based methods. PCR thus remains a versatile technique in molecular biology: a standard protocol can be adjusted to any gene target, choosing the most suitable option for pathogen identification.
We would like to acknowledge the Office for Research Funding from Universidad de Cundinamarca.
Conflict of interest
The authors declare no conflict of interest.