Nucleic Acid-based Diagnosis and Epidemiology of Infectious Diseases

In this chapter, the immense contribution of nucleic acid discovery to the diagnosis and molecular epidemiology of pathogenic microorganisms, and its relevance for veterinary and human health, will be discussed. The development of nucleic acid detection, amplification, and sequencing techniques, principally after the introduction of the polymerase chain reaction (PCR), allowed the improvement of different strategies to diagnose and quantify infectious microbiological agents in a variety of organisms and biological samples. Post-PCR techniques such as restriction enzyme fragment analysis and sequence analysis permit the determination of nucleic acid sequence diversity in order to detect drug resistance, to associate pathogen genetic markers with disease outcome, and to predict the temporal and spatial distribution of microorganisms, all of which can be used to prevent and treat infectious diseases efficiently. The principal methods used in the detection of nucleic acids, the advantages and drawbacks of single- and multiple-copy genes for use in diagnosis by amplification, and the application of post-PCR techniques in drug resistance identification are discussed in Section 1.1. Section 1.2 discusses the sequencing methods used to recognize genetic variability, the implications of this variability for pathology and virulence, and the importance of determining genetic variability for disease control and vaccines. The contribution of molecular diagnosis and epidemiology to the treatment and prevention of infectious diseases is also considered.

Classical clinical microbiological diagnostics for protozoa and bacteria rely on microscopic examination with different staining methods, culture isolation, and morphological and physiological/biochemical characterization. For viruses, conventional diagnosis is based on culture isolation in cell monolayers, serological assays, and examination by electron microscopy. These standard diagnostic methods are very useful, and culture isolation combined with other analytical procedures to identify microorganisms continues to be the gold standard, since it enables drug sensitivity tests. However, these techniques are unsuitable for microorganisms that are fastidious to grow, that show low morphological and physiological specificity, or that require a specific biosafety infrastructure. The same applies to viruses that, even after culture isolation, can only be visualized by electron microscopy, which is expensive and requires specialized personnel to operate. Thus, nucleic acid detection by hybridization and amplification technologies opened a new and innovative period for microbial diagnosis. After the first report on the application of the polymerase chain reaction (PCR) to the clinical diagnosis of the human immunodeficiency virus (HIV) [1], several other infectious organisms were detected by the same technique and its variations (Section 1.1.1).
In all molecular detection techniques, the target gene is the central element, and its choice depends on the infectious agent and on the genomic and epidemiological characteristics of the host. For a specific diagnosis, the chosen gene has to be specific to the infectious agent and should not cross-hybridize with the host genome or with other organisms living in the same microhabitat. A sensitive diagnosis depends on the number of target gene copies in the biological sample and on the physicochemical characteristics of the components involved in the detection and amplification of the target. The use of single- and multiple-copy genes for specific target amplification is discussed in Section 1.1.2.
Drug resistance is conventionally detected by sensitivity tests performed on the isolated microorganism, which are time consuming and are not feasible for the several pathogenic agents that cannot be cultured. The investigation of nucleotide mutations associated with drug resistance allows target gene amplification and post-amplification analytical techniques, such as restriction enzyme analysis and sequencing, to be applied directly to the infected biological sample, thereby enabling fast detection of drug resistance and, consequently, efficient treatment. The same strategy can be used to distinguish organisms from closely related biological groups that have identical morphological characteristics on microscopic examination but different genetic features. Examples of nucleic acid-based drug resistance detection techniques used in clinical microbiology laboratories are presented in Section 1.1.3.

Nucleic acid detection methods
Originally, nucleic acids were detected mainly by gene cloning strategies and hybridization procedures [2], which are laborious and time consuming and were therefore restricted to scientific investigation. The use of nucleic acid detection for the diagnosis of genetic and infectious diseases in clinical laboratories became possible after the advent of PCR [3], a technique based on the amplification of nucleic acids by means of thermostable polymerase enzymes and a thermocycler. In this method, typically, duplex DNA templates are melted at high temperature, and two oligonucleotides complementary to the sequences flanking the target gene are specifically annealed at a stringent temperature that depends on the primer sequence and length; the polymerase then extends the annealed primers, and the cycle is repeated. Variations of the technique include the use of a set of oligonucleotides in order to identify different organisms or variants in a single reaction on a biological sample [4][5][6][7]. Also, to increase sensitivity and specificity, a nested or hemi/semi-nested PCR can be performed using an initial PCR product as template [8,9,6,10].
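The dependence of the annealing temperature on primer sequence and length can be illustrated with the Wallace rule, a rough approximation (2 °C per A/T and 4 °C per G/C) that is reasonable only for short oligonucleotides. The following minimal sketch is not taken from the works cited here; the primer sequences are hypothetical, and subtracting about 5 °C from the lower melting temperature is an assumed starting point that would have to be optimized empirically.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (deg C) of a short primer (Wallace rule)."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def suggested_annealing(forward: str, reverse: str, offset: int = 5) -> int:
    """Assumed rule of thumb: anneal a few degrees below the lower primer Tm."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


if __name__ == "__main__":
    # Hypothetical primer pair, for illustration only.
    fwd = "AGCTTGACCTGAACGTCA"
    rev = "TGCAGGTCAATCCGTAGC"
    print("Tm forward:", wallace_tm(fwd))
    print("Tm reverse:", wallace_tm(rev))
    print("Suggested annealing temperature:", suggested_annealing(fwd, rev))
```

More accurate estimates use nearest-neighbor thermodynamics and salt corrections, which is what dedicated primer design tools generally implement.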
Besides PCR, the most widely used nucleic acid amplification technique, isothermal amplification techniques based on enzymes that act during the cellular processes of DNA/RNA synthesis were also developed and are available for diagnosis and scientific investigation. These include transcription-mediated amplification (TMA), nucleic acid sequence-based amplification (NASBA), signal-mediated amplification of RNA technology (SMART), strand displacement amplification (SDA), rolling circle amplification (RCA), loop-mediated isothermal amplification of DNA (LAMP), isothermal multiple displacement amplification (IMDA), helicase-dependent amplification (HDA), single primer isothermal amplification (SPIA), and circular helicase-dependent amplification (cHDA). The description of these methods is beyond the scope of this chapter, and several review articles can be consulted for more information [11][12][13].
After amplification, PCR products are traditionally visualized by electrophoresis in agarose or acrylamide gels, followed by staining with fluorescent dyes and exposure to UV light. With this method, the specificity of the amplified nucleic acid fragment is determined by its size, either directly or after digestion with restriction enzymes in order to obtain more accurate information about the product. The specificity of PCR products can also be determined by sequencing (Section 1.2.2) and by hybridization with specific probes labeled chemically, fluorescently, or radioactively, as exemplified by human papillomavirus (HPV) genotyping [14,15].
A further improvement in nucleic acid amplification was achieved with the development of PCR in the presence of fluorescent dyes, which enables the detection of products at each amplification cycle, in real time: real-time PCR [16]. Thermocyclers developed for real-time PCR are coupled to a fluorescence detection system and to software that facilitates interpretation of the data in real time. Although conventional PCR allows the amplification of DNA fragments as long as 20 kb, the DNA fragment obtained by real-time PCR is no longer than 150 bp and cannot be used for post-PCR analysis. The specificity of a real-time PCR product is determined by Carl Wittwer's melting curve analysis [17] or by using dual fluorescently labeled probes [18]. These improvements allowed real-time PCR to be used not only to detect genotypes [19,20] but also to quantify the amplification product, to determine gene copy numbers of pathogenic microorganisms, and to measure the expression of genes associated with infection reactivation, virulence, genetic modification, etc. [21][22][23][24].
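Quantification by real-time PCR is commonly done against a standard curve, in which the threshold cycle (Ct) of serially diluted standards is a linear function of the logarithm of the starting copy number, Ct = slope × log10(N0) + intercept, and the amplification efficiency is estimated as 10^(-1/slope) - 1. The sketch below illustrates that arithmetic only; the dilution series and Ct values are invented for the example and do not come from the studies cited here.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    if len(copies) != len(cts) or len(copies) < 2:
        raise ValueError("need at least two (copies, Ct) pairs")
    xs = [math.log10(c) for c in copies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cts) / len(cts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency as a fraction; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Interpolate the starting copy number of an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical ten-fold dilution series of a quantified standard.
    standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
    standard_cts = [31.36, 28.04, 24.72, 21.40, 18.08]
    slope, intercept = fit_standard_curve(standard_copies, standard_cts)
    print(f"slope={slope:.2f} intercept={intercept:.2f}")
    print(f"efficiency={amplification_efficiency(slope):.1%}")
    print(f"unknown at Ct 26.0: {copies_from_ct(26.0, slope, intercept):.0f} copies")
```

A slope near -3.32 corresponds to an efficiency near 100%; samples whose Ct falls outside the range of the standards should not be extrapolated from the curve.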

Advantages and drawbacks of single and multiple copy genes
The specificity and sensitivity of nucleic acid amplification techniques depend on the selection of the target gene and on the design of the oligonucleotides. Once the target gene is chosen, several freely available bioinformatic tools can be used to find the best sequence for PCR and real-time PCR reactions, mainly to avoid the formation of oligonucleotide hairpin structures and primer-dimer arrangements. Thus, the principal challenge in obtaining an accurate molecular diagnosis based on nucleic acid amplification is the selection of the target genomic sequence.
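The kind of check that primer design tools perform for hairpins and primer-dimers can be sketched with simple string matching on reverse complements. This is a deliberately crude illustration, not the algorithm of any particular tool: it looks only for perfectly complementary stretches, and the stem length, loop length, and 3' window (function parameters below) are assumed values.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_hairpin(primer: str, stem: int = 4, min_loop: int = 3) -> bool:
    """True if a stretch of `stem` bases can pair with a downstream stretch
    of the same primer, leaving at least `min_loop` unpaired bases between."""
    seq = primer.upper()
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        if reverse_complement(left) in seq[i + stem + min_loop:]:
            return True
    return False


def three_prime_dimer(primer_a: str, primer_b: str, window: int = 4) -> bool:
    """True if the last `window` bases at the 3' end of primer_a are perfectly
    complementary to some stretch of primer_b (antiparallel pairing)."""
    tail = primer_a.upper()[-window:]
    return reverse_complement(tail) in primer_b.upper()


if __name__ == "__main__":
    # Hypothetical sequences chosen to trigger each check.
    print(has_hairpin("GGGGAAAACCCC"))                # stem GGGG/CCCC, loop AAAA
    print(three_prime_dimer("ACGTGGCC", "TTGGCCAA"))  # 3' GGCC pairs with GGCC
```

Real tools score partial complementarity thermodynamically rather than requiring a perfect match, so a primer that passes this check is not guaranteed to be free of secondary structure.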
The biological features of the microorganism to be diagnosed, including its life cycle, specific metabolic pathways, genomic organization, and evolution, are of critical importance in guiding the choice of an accurate target gene. Knowledge of the life cycle is particularly important when the purpose is the detection of stage-specific forms of a microorganism, such as gametocytes of Plasmodium in human blood [25]. In this case, real-time PCR can detect the expression of gametocyte-specific genes, and patients at risk of transmitting malaria can be identified.
Particular metabolic pathways of a microorganism are usually associated with specific genes, which can be used as markers for the recognition of infectious agents [26-28]. Commonly, these genes are present in a single copy in the genome and are highly expressed and evolutionarily conserved, making them excellent targets for diagnosis by real-time PCR [29], which is a highly sensitive nucleic acid detection method.
Genomic organization and evolution are of principal interest in selecting target genes that are not located in strongly structured regions of the genome [30] and whose sequence is under strong selective pressure and therefore highly conserved. Usually, multiple-copy genes such as ribosomal and transfer RNA coding sequences are used as templates for diagnostic methods in order to obtain high sensitivity with specificity. In this case, it is important to consider that RNA coding sequences have a high capacity to form hairpins and lie in a structured region of the genome; thus, even at a high copy number, they can show low sensitivity in PCR reactions. Also, as ribosomal and transfer RNA coding sequences are highly conserved among all groups of organisms, the specificity of the diagnostic test can decrease depending on the selected gene region. It is therefore very important to perform a comparative analysis of the selected diagnostic gene among phylogenetically close groups of organisms and the host. Mitochondrial genes are also widely used as targets because of their high copy number in eukaryotes; however, their similarity to bacterial genomes also has to be taken into account in order to obtain specificity [31].
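The comparative analysis recommended above can be illustrated, in a very reduced form, by counting mismatches between a candidate primer and the corresponding binding site in the pathogen, the host, and related organisms. The sketch assumes that the binding sites have already been extracted from an alignment; the sequences, organism labels, and the two-mismatch threshold are all hypothetical.

```python
def mismatches(primer: str, site: str) -> int:
    """Number of positions at which primer and an aligned binding site differ."""
    if len(primer) != len(site):
        raise ValueError("primer and site must be aligned to the same length")
    return sum(a != b for a, b in zip(primer.upper(), site.upper()))


def potential_cross_reactions(primer, sites, max_mismatches=2):
    """Names of organisms whose binding site is close enough to the primer
    that amplification might occur (assumed threshold, not a validated one)."""
    return sorted(
        name for name, site in sites.items()
        if mismatches(primer, site) <= max_mismatches
    )


if __name__ == "__main__":
    primer = "ACGTTGCAAGGCTTAC"
    # Hypothetical aligned binding sites for a conserved rRNA region.
    aligned_sites = {
        "target pathogen": "ACGTTGCAAGGCTTAC",
        "related species": "ACGTTGCAAGGCTTGC",
        "host":            "TCGATGCTAGCCTAAC",
    }
    print(potential_cross_reactions(primer, aligned_sites))
```

In this invented example the related species differs from the target at a single position, which is the situation in which a highly conserved multiple-copy gene compromises specificity; mismatches near the 3' end of a primer generally matter more than those at the 5' end, which this simple count ignores.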

Identification of drug resistance
After amplification of a target gene by PCR, several molecular strategies can be used to detect nucleotide mutations, such as sequencing, restriction enzyme analysis, hybridization, etc. This approach is useful for detecting mutations associated with drug resistance directly in biological samples, without the need for culture isolation. Some antibiotics act on the bacterial ribosome, and the investigation of point mutations in ribosomal RNA coding sequences of cultivable bacteria can be extended by homology to fastidious pathogenic species. One example is the fastidious bacterium Helicobacter pylori, the etiological agent of gastritis and peptic ulcer disease. Several real-time PCR and post-PCR methods are available for clinical use to determine resistance to the most important antibiotics used in H. pylori treatment, including clarithromycin [32,33] and tetracycline [34,35].
The same kind of methodology is used to investigate genotypes of microorganisms associated with pathogenicity, virulence, drug resistance, etc. As an example, the identification, directly in clinical samples, of genotypes of HIV [36,37] and hepatitis C virus (HCV) [38] that are resistant to treatment with different viral protease inhibitors allows correct treatment of the patients. Another example is the differentiation of the protozoan Entamoeba histolytica (pathogenic) from E. dispar (nonpathogenic) and E. moshkovskii (nonpathogenic) in the human intestinal tract, which can only be performed genetically [39]. Correct identification of E. histolytica avoids unnecessary treatment, which is expensive, can cause side effects in the patient, and favors the selection of drug-resistant organisms.
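The principle of post-PCR restriction analysis described in this section can be sketched in silico: a point mutation that creates or abolishes a restriction site changes the fragment sizes seen on the gel. The example uses the EcoRI recognition site (GAATTC, cut after the first G); the amplicon sequences are invented and do not represent any of the resistance mutations cited above.

```python
def digest(amplicon: str, site: str, cut_offset: int):
    """Return fragment lengths after cutting `amplicon` at every occurrence
    of `site`; the cut falls `cut_offset` bases into the recognition site."""
    seq = amplicon.upper()
    site = site.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - start)
    return fragments


if __name__ == "__main__":
    # Hypothetical amplicons: the variant differs by one base (A -> G)
    # inside the EcoRI site, which abolishes the cut.
    wild_type = "ACCTGATCGT" + "GAATTC" + "TTGACCGATA"
    variant = "ACCTGATCGT" + "GAGTTC" + "TTGACCGATA"
    print("wild type:", digest(wild_type, "GAATTC", 1))
    print("variant:  ", digest(variant, "GAATTC", 1))
```

In practice the enzyme is chosen so that the resistance-associated mutation itself alters the recognition site, and the resulting band pattern is read directly from the gel.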

Epidemiology
Nucleic acid approaches have improved epidemiology, the science that deals with the etiology, distribution, natural history, and control of diseases in humans [40]. In this section, some aspects of the molecular epidemiology of infectious diseases will be discussed, including the determination of the etiological agents of diseases, the association of genetic markers with transmission, treatment efficiency, and clinical outcome patterns, the applicability of knowledge of genetic diversity to vaccine development, and the control and prevention of the transmission of infectious agents.
Several genotyping techniques are mentioned in Section 1.1, and the same approaches can be used to determine genetic variability. However, it was the development by Sanger of enzymatic nucleic acid sequence determination with chain-terminating inhibitors [41], and its rapid automation, that made gene sequences of several infectious agents and their hosts promptly available. More recently, next-generation sequencing methods combined with powerful bioinformatic tools have made complete sequences of the small genomes of pathogenic viruses and bacteria accessible during an epidemic or outbreak. These techniques are briefly presented in Section 1.2.1.
Epidemiological studies on the association of pathogen and host genetic variability with disease susceptibility and pathogenicity/virulence can help to prevent and treat several infectious diseases (Section 1.2.2). Moreover, determining the genetic variability of infectious agents, antigens, and host populations is useful for vaccine development (Section 1.2.3).

Classic genotyping and next-generation sequencing methods
Unlike diagnostic methods, for which molecular markers should be conserved among all individuals of a species, population genetic epidemiological studies require markers with high variability among individuals and neutral evolution. Conventional nucleic acid-based genotyping of microorganisms can be carried out by the hybridization and amplification approaches already described in Section 1.1. Moreover, the most classical molecular approaches used in epidemiological studies include restriction fragment length polymorphism (RFLP), macrorestriction analysis [42], and alternative PCR and post-PCR techniques such as random amplification of polymorphic DNA (RAPD), analysis of variable number of tandem repeats (VNTR) [43], variation in repeated short motifs (microsatellites), multilocus microsatellite typing (MLMT) [44,45], single-strand conformation polymorphism (SSCP) [46], etc. These techniques are valuable for epidemiological studies of bacteria and protozoa, considering the organization, structure, and large size of their genomes. Single-nucleotide polymorphisms (SNPs) associated with specific population characteristics can be easily identified by sequencing. Also, for the majority of viruses, which generally have a small genome and lack tandemly organized repetitive sequences, complete genomic sequencing is the genotyping method of choice for epidemiological studies.
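The idea behind VNTR and microsatellite markers, namely that isolates differ in the number of copies of a short tandemly repeated motif at a locus, can be sketched as follows. In the laboratory the repeat number is usually inferred from amplicon size; here it is counted directly on invented sequences, and the isolate names and the CA motif are assumptions made for the example.

```python
import re


def max_tandem_repeats(sequence: str, motif: str) -> int:
    """Largest number of consecutive copies of `motif` found in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(sequence.upper())]
    return max(run_lengths, default=0) // len(motif)


if __name__ == "__main__":
    # Hypothetical alleles of one microsatellite locus in three isolates.
    locus_sequences = {
        "isolate_1": "GGTCACACACATTG",
        "isolate_2": "GGTCACACACACACATTG",
        "isolate_3": "GGTCACACACATTG",
    }
    for name, seq in locus_sequences.items():
        print(name, max_tandem_repeats(seq, "CA"))
```

Repeating the count over several loci gives the multilocus profile on which MLMT comparisons are based.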
Genomic sequencing became possible for many biological applications after the development of first-generation sequencing methods, comprising Maxam-Gilbert base-specific chemical cleavage [47] and the widely used Sanger enzymatic reaction with chain-terminating nucleotides [41]. The improvement of the Sanger method through the use of thermostable proofreading DNA polymerases, fluorescent chain-terminating nucleotides, and laser-based equipment coupled to capillary electrophoresis enabled its rapid automation and commercial availability. The Sanger method therefore brought about a revolution in genome sequencing projects of pathogenic microorganisms and other organisms, with high accuracy and a relatively low rate [48][49][50][51][52][53]. The search for a higher-throughput sequencing technique able to obtain large complete genomic sequences more rapidly, and without purification or cloning of the nucleic acid of a specific organism, led to the development of second- and third-generation sequencing methods.
A second-generation method was commercialized by Roche as the 454 Life Sciences platform, which is based on pyrophosphate detection during DNA synthesis performed on an array of wells in a fiber-optic slide [54]. In this method, each well is filled with one bead carrying a single DNA fragment produced by genomic shearing and flanked at its 3' and 5' ends by short adaptor sequences. The DNA fragment attached to a bead is immersed in an emulsion and amplified by PCR in order to obtain enough molecules to generate a luminescence signal that can be captured by a charge-coupled device (CCD) imager coupled to the fiber-optic array. Before it is deposited in the well, each amplified DNA fragment is denatured with specific enzymes. Each sequencing cycle consists of the detection, in each fiber-optic well, of the pyrophosphate released after the addition of one of the four DNA bases, which is captured by a luminescent reaction system and read in real time. Each well of the array produces a collection of short sequences of approximately 40 bp, which are assembled by software with the help of a reference genomic sequence generated by the Sanger method and available in several public databases. Roche discontinued the original 454 Life Sciences instrument, which was replaced by the improved FLX Titanium XLR and XL+, capable of individual read lengths of 450 and 700 bp, respectively. Other second-generation systems, characterized by the production of arrays of short sequences, are based on DNA synthesis with fluorescent reversible terminator nucleotides and exonucleases. A CCD imager records the specific fluorescent terminator nucleotide that is incorporated, and enzymatically cleaved, in each sequencing cycle on primed DNA fragments, which are produced by shearing and attached to a solid platform, where they are also amplified by PCR [55].
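The read-out of pyrophosphate-based sequencing described above can be sketched as a flowgram: nucleotides are flowed one at a time in a fixed cyclic order, and the light signal in a well is proportional to the number of bases incorporated in that flow, so a homopolymer produces a single stronger signal. The simulation below is idealized (no noise, no incomplete extension); the flow order TACG and the example sequence are assumptions made for illustration.

```python
def flowgram(synthesized: str, flow_order: str = "TACG", max_flows: int = 400):
    """Simulate ideal signals for the strand being synthesized in one well.

    Returns a list of (flowed_base, bases_incorporated) pairs.
    """
    seq = synthesized.upper()
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains a base that is never flowed")
    signals = []
    position = 0
    for flow in range(max_flows):
        if position >= len(seq):
            break
        base = flow_order[flow % len(flow_order)]
        incorporated = 0
        # All consecutive identical bases are added within a single flow.
        while position < len(seq) and seq[position] == base:
            incorporated += 1
            position += 1
        signals.append((base, incorporated))
    return signals


def base_call(signals) -> str:
    """Rebuild the read from ideal integer signal intensities."""
    return "".join(base * count for base, count in signals)


if __name__ == "__main__":
    read = "TTAGGGCA"  # hypothetical fragment
    signals = flowgram(read)
    print(signals)
    print(base_call(signals) == read)
```

With real data the intensities are not integers, and deciding whether a signal corresponds to, for example, six or seven identical bases is a known source of insertion and deletion errors in homopolymer runs.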
These technologies have certain limitations, such as a relatively high cost, low throughput, and low sequence accuracy for large genomes, although they are valuable for virus genomes [56].
Third-generation sequencing methodologies are in development, based on a technology called single molecule real time (SMRT), which uses biological nanopores and does not require amplification of the target template [57]. All these next-generation sequencing techniques continue to improve, and more accessible and accurate devices will certainly become available in the near future.

Genetic variability and its impact on pathology and virulence
The determination of genetic variability in microorganisms, performed by the different methodologies exemplified in Section 1.2.1, allows the distribution of infectious diseases in time and place to be identified, as well as the investigation of transmission patterns and of clinical manifestation and progression in affected populations. Knowledge of these factors is used in intervention and prevention strategies for infectious diseases, for example dengue fever, which is caused by four distinct serotypes of dengue virus (DENV1-4) belonging to the genus Flavivirus, family Flaviviridae.
The clinical manifestation and progression of dengue vary from the absence of symptoms to severe disease, mostly characterized by plasma leakage with or without hemorrhagic signs [58]. According to several studies, the symptoms and severity of dengue disease are associated with viral load, host immune status, and the occurrence of antibody-dependent enhancement by heterologous DENV antibodies, which is related to the number of previous infections with other serotypes. DENV serotypes are related to genetic variation in the major antigenic envelope and pre-membrane structural proteins. Thus, the characterization of DENV genetic variation and evolution in these specific regions is of crucial importance for understanding disease progression in an affected population.
Molecular epidemiological investigations based on sequencing of 240 nucleotides of the DENV envelope and nonstructural 1 coding regions revealed distinct genomic groups of DENV serotypes 1 and 2 according to the geographical and temporal circulation of the virus population [59], which was confirmed by complete genome sequencing [60]. The same strategy was used to genetically characterize groups of DENV-3 and DENV-4, which have distinct evolutionary constraints that interfere with transmission and disease manifestation [61,62]. Knowledge of these characteristics has implications not only for control and disease management strategies but also for vaccine development.
Microsatellite genomic sequences are useful for the treatment and control of infectious agents because they allow the identification of individual microorganisms and of groups of clonally disseminated microorganisms. This kind of study enables the identification of relapses, drug-resistant pathogens, and clinically distinct variants, as investigated in protozoan parasites such as Leishmania [45,63] and Plasmodium [64,65]. The geographical dissemination of microorganisms can also be characterized by microsatellite studies [66,67], which helps in characterizing transmission patterns that can be used in control strategies.
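How multilocus microsatellite profiles separate clonal groups, or help to tell a relapse from a new infection, can be sketched by comparing the repeat numbers of isolates locus by locus. The proportion of mismatching loci used below is a deliberately simple measure chosen for illustration, not the distance used in the studies cited above; the locus names and profiles are invented.

```python
from collections import defaultdict


def mismatch_distance(profile_a: dict, profile_b: dict) -> float:
    """Proportion of shared loci at which two isolates carry different alleles."""
    loci = profile_a.keys() & profile_b.keys()
    if not loci:
        raise ValueError("profiles share no loci")
    return sum(profile_a[l] != profile_b[l] for l in loci) / len(loci)


def clonal_groups(profiles: dict) -> list:
    """Group isolates whose multilocus profiles are identical."""
    groups = defaultdict(list)
    for isolate, profile in profiles.items():
        groups[tuple(sorted(profile.items()))].append(isolate)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical repeat numbers at four microsatellite loci.
    profiles = {
        "patient_A_first_episode":  {"L1": 12, "L2": 8, "L3": 15, "L4": 9},
        "patient_A_second_episode": {"L1": 12, "L2": 8, "L3": 15, "L4": 9},
        "patient_B":                {"L1": 10, "L2": 8, "L3": 17, "L4": 9},
    }
    print(clonal_groups(profiles))
    print(mismatch_distance(profiles["patient_A_first_episode"],
                            profiles["patient_B"]))
```

In this invented example the two episodes of patient A share an identical profile, which is compatible with a relapse, whereas the isolate from patient B differs at half of the loci.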
Domestic and sylvatic animal sources of pathogenic microorganisms that infect humans, with or without clinical outcomes depending on the genetic variant, can also be investigated by genotyping. Toxoplasmosis, an example of a zoonosis, is caused by Toxoplasma gondii, a protozoan parasite of the Apicomplexa group with high genetic variability. Attempts to identify genetic variants associated with specific animal hosts or clinical outcomes have failed; however, it is possible to identify the source of parasite infection by genotyping with multilocus PCR and RFLP [68]. Using this methodology, a study of German patients revealed that ocular, cerebral, and systemic toxoplasmosis showed a predominance of T. gondii lineage type II, the same genotype found in parasite oocysts isolated from cats of the same geographic region [69]. The prevalence of T. gondii lineages shows a specific regional distribution and is highly variable in South America [70], where the parasites have been isolated from domestic and wild animals, including chiropterans [71]. Based on these studies, it is possible to recognize several potential intermediate hosts of the parasite and to map the risk areas for toxoplasmosis transmission.
Chagas disease, caused by Trypanosoma cruzi, a protozoan parasite of the Kinetoplastida group, is also an excellent example of an eco-epidemiological study using genotyping. The parasites are transmitted naturally to humans by exposure of abraded skin to the feces of triatomine bugs, and they circulate in the wild in different sylvatic animals and triatomine species. There are six principal genetic variants, or discrete typing units, of T. cruzi, determined by multilocus PCR, RFLP, and sequencing analysis and associated with geographical distribution and transmission cycles [72][73][74]. The association between the outcome of Chagas disease and specific parasite genotypes is still controversial; however, comparative investigation of T. cruzi genotypes in human infections, sylvatic bugs, and animal reservoirs in rural areas near forest, which are common in South America, can reveal potential vectors and animal sources of infection.

Disease control and vaccines
The most effective means of controlling an infectious disease is a vaccine, which is still not available for malaria, trypanosomatid diseases (Chagas disease, sleeping sickness, leishmaniasis), and most arboviral diseases (dengue fever, chikungunya fever, etc.). The genetic variability of infectious agents, the rapid genomic evolution of viruses, and the several mechanisms of escape from the host immune system selected during parasite-host interactions constitute intricate obstacles to vaccine development. The problem of the rapid genetic variability of virus genomes can be circumvented by systematic sequencing during epidemic periods, as is done successfully each year for the influenza vaccine recommendation.
The influenza vaccine is composed of attenuated viruses produced in eggs every year, following the virus strain recommendation of the World Health Organization, which is made after monitoring genetic variation in the gene encoding the surface protein hemagglutinin because of its antigenic properties (http://www.who.int/influenza/vaccines/virus/recommendations/201502_recommendation.pdf?ua=1). Molecular epidemiological studies performed in different geographical regions are used to arrive at the recommended vaccine strains. There are also geographical and population differences in viral antigenic change that should be taken into consideration, as described in China, where the dominant antigenic influenza clusters change more frequently than in the USA/Europe [75].
Dengue virus is a particularly challenging agent for vaccine development, since its genetic variation is directly related to the severity of disease progression because of antibody-dependent enhancement by a previous heterologous DENV infection. Thus, knowledge of the regional geographical circulation of DENV is essential to the success of a vaccine [61,76]. Considering the high speed of data acquisition, second- and third-generation sequencing of DENV circulating during interepidemic periods could help to predict the serotypes and genetic variants of the next outbreak, contributing to vaccine development and testing.

Final Considerations
The economic and social impact of infectious diseases should be considered worldwide, principally in developing countries, and includes the need for medical care, vector control, morbidity with loss of working hours, and reduced tourism in affected areas. Population growth, high human mobility, and the lack of an effective vaccine against the most widespread infectious diseases, such as dengue fever, malaria, trypanosomiasis, etc., make these human threats an important public health concern. Nucleic acid technologies have been contributing to the knowledge of infectious disease transmission, distribution, clinical manifestation, and progression, helping in disease control and management. Several examples are discussed in this chapter, and hopefully they will stimulate readers to investigate more deeply these microbiological intricacies and magnificent parasite-host interactions.