Open access peer-reviewed chapter - ONLINE FIRST

Metagenomics and Pandemic Viruses

By Paulo Vitor Marques Simas and Clarice Weis Arns

Submitted: June 25th 2020. Reviewed: August 22nd 2020. Published: September 21st 2020.

DOI: 10.5772/intechopen.93687



Human history records many pandemics, and the scientific community now has the means to identify pathogens before a disease emerges. From this perspective, it is essential to carry out large-scale epidemiological studies in key animals that are in constant contact with humans. For this purpose, next-generation sequencing (NGS) with a metagenomic approach, in which genetic material is recovered directly from samples without previous amplification, can reveal hidden microbial diversity. Metagenomic work aims to facilitate epidemiological studies by adopting simple, effective strategies for identifying pathogens and understanding their evolutionary dynamics before a pandemic occurs. Here, we present some examples of successful metagenomic approaches and of researchers' repeated recommendations to identify viruses and other potential pandemic pathogens.


  • environmental genetic material
  • phylogenetic network
  • biogeography
  • one health

1. Introduction

Emerging pathogens are those that have recently been introduced, discovered, or recognized; that have recently evolved; that have increased in incidence through geographic expansion or adaptation to a greater diversity of hosts (spillover); or that have shown changes in their pathogenic properties. Such pathogens, especially viral agents, present a unique challenge for science and medicine because little is known about them before they cause epidemics from zoonotic sources. Zoonotic transmission can occur through a spillover event from one animal species to another, eventually causing infections in humans as well. For most of these viruses, therapies and vaccination strategies have not been developed, and clinical treatment options for infected patients are therefore limited to nonspecific supportive therapy (adapted from [1]).

In general, epidemiological studies are conducted passively, after horizontal transmission of a viral infectious disease has been established or after spillover events. Such an approach will certainly not be sufficient to meet the new demands posed by infectious diseases. One Health, a field that promotes ecological health through the interaction between human, animal, and environmental health, therefore proposes that all health be considered together so that it can be sustained.

From this perspective, it is essential to carry out large-scale epidemiological studies in key animals such as bats and rodents. For this purpose, next-generation sequencing (NGS) with a metagenomic approach, in which genetic material is recovered directly from samples without previous amplification, can reveal hidden microbial diversity. Metagenomic work aims to facilitate epidemiological studies by adopting simple, effective strategies for identifying pathogens and understanding their evolutionary dynamics before a pandemic occurs.

2. Viral metagenomics: understanding the evolutionary history of viruses

Metagenomics using NGS can serve many purposes. Thomas et al. [2] described that this methodology can be a source of genetic information about possible new biocatalysts or enzymes in bioprospecting; can reveal genomic links between function and phylogeny in “non-cultivable” organisms; can identify evolutionary profiles and relate function to structure in microbial communities; and can support inferences about new hypotheses of microbial function.

The Baltimore classification organizes viruses into seven groups: I (double-stranded DNA viruses), II (single-stranded DNA viruses), III (double-stranded RNA viruses, dsRNA), IV (positive-sense single-stranded RNA viruses, (+) ssRNA), V (negative-sense single-stranded RNA viruses, (−) ssRNA), VI (single-stranded RNA viruses with a DNA intermediate in their life cycle, ssRNA-RT), and VII (double-stranded DNA viruses with an RNA intermediate in their life cycle, dsDNA-RT). However, metagenomic approaches have contributed to understanding the real, universal viral taxonomy.
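As an illustration only, the seven groups listed above can be written as a small lookup table. The sketch below is in Python; the function names and the short labels "dsDNA" and "ssDNA" are assumptions made for this example (the text gives abbreviations only for groups III to VII), and an ASCII hyphen stands in for the minus sign in "(-) ssRNA".

```python
# Baltimore groups as listed in the text: (genome description, short label).
# "dsDNA" and "ssDNA" are labels assumed for this sketch.
BALTIMORE_GROUPS = {
    "I": ("double-stranded DNA", "dsDNA"),
    "II": ("single-stranded DNA", "ssDNA"),
    "III": ("double-stranded RNA", "dsRNA"),
    "IV": ("positive-sense single-stranded RNA", "(+) ssRNA"),
    "V": ("negative-sense single-stranded RNA", "(-) ssRNA"),
    "VI": ("single-stranded RNA with a DNA intermediate", "ssRNA-RT"),
    "VII": ("double-stranded DNA with an RNA intermediate", "dsDNA-RT"),
}


def baltimore_group(label: str) -> str:
    """Return the Baltimore group (Roman numeral) for a short genome label."""
    for group, (_description, short) in BALTIMORE_GROUPS.items():
        if short == label:
            return group
    raise KeyError(f"unknown genome label: {label}")


def is_rna_genome(group: str) -> bool:
    """True for the groups whose genome is RNA (III, IV, V, and VI)."""
    return group in {"III", "IV", "V", "VI"}
```

For example, `baltimore_group("dsRNA")` returns `"III"`, and `is_rna_genome("VII")` is false because group VII viruses carry a DNA genome.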

RNA viruses are the most abundant viruses infecting eukaryotes, and all of them carry a specific gene, the RNA-dependent RNA polymerase (RdRp), which encodes the protein responsible for their replication; this made it possible to reconstruct their evolutionary history. An analysis of 4617 different RdRps found five key subdivisions: two enclosing dsRNA and (−) ssRNA viruses, another containing (+) ssRNA and dsRNA viruses, and two including only (+) ssRNA viruses. These analyses showed that the (+) ssRNA viruses possibly had a single ancestor, that the dsRNA viruses emerged from (+) ssRNA viruses in two independent events, and that the (−) ssRNA viruses could have emerged later from dsRNA viruses. In addition, phylogenetic analysis including other genes showed broad horizontal transfer between remotely related hosts, including between animals and plants. All these findings allow the International Committee on Taxonomy of Viruses (ICTV) to update the arrangement of viruses continuously [3]. Given this continuous host switching, which has also been analyzed using a diversity of data previously unavailable and now uncovered by metagenomic assays, the evolution of RNA viruses is possibly associated with the evolutionary history of their hosts, which dates back hundreds of millions of years, from the ocean to the terrestrial environment. In this sense, it would be imprudent to attribute the origin of some vertebrate RNA virus families to mosquitoes and ticks [4].

In general, when unrecognized zoonotic viral pathogens emerge from wildlife, they can have a strong impact on public health and generate a pandemic risk. In these cases, no specific molecular or serological diagnostic assays are available to detect them, and metagenomics appears to be the only strategy for starting the development of a new diagnostic test. Temmam et al. [5], in a review paper, noted that predicting emerging zoonotic infections has always been an important challenge for public health. They also pointed to viral metagenomics as a promising alternative for surveillance, especially to identify viromes in selected animals and arthropods that are in close contact with humans.

For the diagnosis of viral diseases, the gold standard test is viral isolation. However, it has the disadvantages of being laborious and taking many days, in addition to the biosafety issues of handling the virus in the laboratory. Since the 1980s, with the advent of the molecular techniques of polymerase chain reaction (PCR) and sequencing, the scientific community has gradually adopted these tests for the investigation and diagnosis of pathogens, mainly emerging ones. The international scientific community’s decision to sequence the human genome drove the rapid evolution of all molecular techniques and allowed the emergence of techniques such as real-time PCR and high-throughput sequencing (NGS).

Before metagenomics gained acceptance in the scientific community, the most common routine molecular diagnostic methods relied on PCR and its variations and on Sanger sequencing. As a result, several commercial molecular diagnostic tests for viral, bacterial, and fungal pathogens became available, based on real-time PCR, loop-mediated isothermal amplification (LAMP), and conventional PCR, among others.

Metagenomic approaches are increasingly used to detect new viral pathogens as well as to generate complete genomes of non-cultivable viruses. The in silico identification of complete viral genomes from sequence data would allow a rapid phylogenetic characterization of these new viruses [6]. The methodology thus emerges as an alternative for discovering new viral agents in animals, expanding knowledge of viral diversity as well as of potentially emerging zoonoses. The identified material can provide insights into viral evolution and phylogeny. These data, combined with studies on the biology of the hosts, can contribute to understanding the eco-epidemiology of several viruses and, through complementary analyses, help to identify potentially pathogenic agents, an action in line with the One Health perspective.

Important advances in the knowledge of microbial diversity through metagenomic tools have made it possible to associate sequence data with the complex biological characteristics of an ecosystem, whether the human intestine, for example, or an aquatic or terrestrial environment. This can be an essential source for understanding microbial ecology and biogeography. Different metagenomic studies have shown that the biogeography of viruses is, in general, associated with the characteristics of habitats. As such, similar viral biogeography can be found in similar habitats that are geographically far apart. On the other hand, substantial sharing of the human virome between unrelated populations has also been identified, as have viral quasi-species widespread across distinctly dissimilar habitats [7]. It is therefore important to implement metagenomic assays in many places around the world in order to deepen the understanding of this complex ecological dynamic. Many antibiotic and antiviral molecules are produced in these ecosystems. Identifying the biosynthetic origin of these compounds can improve treatment alternatives for existing diseases or even for emerging infectious diseases.

Viromes of different animal species as well as of different human organs have been obtained through this methodological approach. Brotman et al. [8] described the temporal relationship between vaginal microbiota and detection of human papillomavirus (HPV). A small DNA virus, Torque Teno Virus (TTV), was unexpectedly found in samples from patients with ophthalmitis (negative bacterial and fungal cultures) [9]. The enteric virome of shrews (genus Crocidura, one of nine genera of the shrew subfamily Crocidurinae), small insectivorous mammals similar to rodents, was determined, and new viruses were identified, including insect viruses, cyclovirus, picornavirus, and picorna-like virus. In addition, several cycloviruses, including human variants, were detected in wild shrews with a high prevalence rate. Complete or almost complete genomic sequences of these new viruses were determined and subjected to genetic characterization [10].

Metagenomic analyses have shown that bats can be reservoirs of several species of RNA viruses. Many of these viruses are highly pathogenic and exhibit broad cell tropism, being able to infect a wide variety of cells and hosts (Hendra virus, HEV; Nipah virus, NIV; Ebola virus, EBOV; Middle East Respiratory Syndrome virus, MERS; Severe Acute Respiratory Syndrome Coronavirus type 1, SARS-CoV; Severe Acute Respiratory Syndrome Coronavirus type 2, SARS-CoV-2). Viruses such as HEV, rabies virus (RABV), and NIV show high genomic conservation within their bat hosts, which suggests that they are under strong selective constraints [7].

Considering that bats can host a great diversity of viruses and that little is known about their virome, Dacheux et al. [12] determined the viral diversity of five different species of French insectivorous bats (nine specimens). They detected viruses from many viral families that infect bacteria, plants, fungi, insects, and vertebrates, including mammals (Retroviridae, Herpesviridae, Bunyaviridae, Poxviridae, Flaviviridae, Reoviridae, Bornaviridae, and Picobirnaviridae). They described new mammalian viruses, including rotavirus, gammaretrovirus, bornavirus, and bunyavirus, as well as the first identification of a nairovirus in bats.

The virome of bats from Myanmar revealed the presence of new mammalian viruses. The analysis was conducted using organs of 853 bats of six species and identified known sequences belonging to 24 viral families. Of the viral contigs (2% of the total sequences), 45% were related to vertebrate viruses, 28% to insect viruses, 27% to phages, and 95 contigs to plant viruses. Validation by PCR followed by phylogenetic analyses led to the discovery of some new bat viruses of the genera Mamastrovirus, Bocavirus, Circovirus, Iflavirus, and Orthohepadnavirus [13].
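A breakdown of this kind can be computed by tallying per-contig assignments. The following is a minimal Python sketch, not the procedure used in [13]: the function name, the category labels, and the toy input are all hypothetical and are chosen only to show the arithmetic.

```python
from collections import Counter


def category_percentages(assignments):
    """Map each host category to its percentage of all classified contigs."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical toy input (one label per contig); not data from the study.
toy_assignments = ["vertebrate"] * 9 + ["insect"] * 6 + ["phage"] * 5
shares = category_percentages(toy_assignments)
```

With this toy input of 20 contigs, the vertebrate share is 9 of 20. In a real analysis the labels would come from a similarity search against a reference database, a step this sketch leaves out.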

In populations of African fruit bats (Eidolon helvum), metagenomics identified a great abundance and diversity of new herpesviruses and papillomaviruses. The authors also described a new adenovirus and detected, for the first time in Chiroptera, sequences of a poxvirus closely related to molluscum contagiosum virus [14].

Tse et al. [15], studying 156 rectal swab samples from apparently healthy bats, also with a metagenomic approach, discovered a new papillomavirus strain, Miniopterus schreibersii papillomavirus type 1 (MscPV1), with a 7.5 kb genome. In addition to characterizing the new agent, the researchers carried out several phylogenetic studies that allowed them to infer that MscPV1 and Erethizon dorsatum papillomavirus (EdPV1) are the most closely related, with an approximate divergence of 60.2–91.9 million years.

He and collaborators (2014) identified hundreds of sequences related to Alphacoronavirus and Betacoronavirus by sequencing 268 rectal swabs from 68 bats from four counties in Yunnan province. They also reported the complete genome of a new SARS-like coronavirus (LYRa11), 29,805 nucleotides in length with 13 ORFs, 91% nucleotide identity with human and civet SARS-CoVs, and 89% similarity to another bat SARS-like CoV. One of the most interesting findings came from recombination analyses, which indicated that LYRa11 is a probable recombinant descended from parental strains that evolved from bat SARS-like CoVs.

An outbreak of respiratory infection of unknown origin began to affect many people in Wuhan, Hubei, China, in late 2019. Difficulties in controlling the disease with conventional treatments suggested a new infectious disease with viral characteristics and effective person-to-person transmission. Shortly afterward, with the support of the international scientific community, it was confirmed that the new disease, called Coronavirus Disease 2019 (COVID-19), was caused by a new coronavirus initially named 2019 Novel Coronavirus (2019-nCoV). This was not the first time that a human coronavirus had caused a major disease with a risk of global spread. Severe Acute Respiratory Syndrome Coronavirus type 1 (SARS-CoV-1) and Middle East Respiratory Syndrome Coronavirus (MERS-CoV) had emerged earlier, in 2002 and 2012 respectively, in two different places; both are of zoonotic origin and evolved from bat viruses, with civets and camelids, respectively, as intermediate hosts. Using metagenomic approaches, the scientific community obtained the complete genome of the new virus within a few days. Based on genetic, evolutionary, and molecular studies, 2019-nCoV was named Severe Acute Respiratory Syndrome Coronavirus type 2 (SARS-CoV-2), a member of the subgenus Sarbecovirus, genus Betacoronavirus, and a sister virus to SARS-CoV-1.

However, one important question remains without a conclusive answer: what is the real reservoir of SARS-CoV-2? So far, the virus has not been successfully isolated from any animal source, but Lam et al. [16], using metagenomic approaches, identified several pangolin coronavirus lineages, suggesting that these animals could be considered possible hosts in the emergence of new coronaviruses and should be removed from wet markets to prevent zoonotic transmission.

In the COVID-19 pandemic, by August 2020, more than 20,000,000 people had been infected and almost 800,000 had died in the more than 190 countries that had detected the virus [17]. Peddu et al. [18] described metagenomic analysis as a good alternative for investigating and responding to this and future viral pandemics; they evaluated, by metagenomic sequencing, positive and negative samples from Seattle, WA. Some of these samples showed superinfection or colonization with human parainfluenza virus 3 or Moraxella species, which emphasizes how essential it is to conduct molecular tests using a viral respiratory panel. In addition, samples that were negative for SARS-CoV-2 by RT-PCR were positive for Rhinovirus A and C, showing that metagenomic analysis of these SARS-CoV-2-negative samples was able to identify candidate etiological agents for the respiratory signs in those patients.

Did we learn from past epidemics? Are we prepared for the worst? Gonzalez et al. [19] posed these key questions about what we have learned from human history. According to them, the ultimate goal should be to develop a resilient global health infrastructure; bio-surveillance using geographic information systems (GIS) and metagenomics to trace molecular changes in pathogens during their emergence, together with mathematical models to assess risk, should be a “critical point” for preventing a pandemic.

3. Epidemiological surveillance in bats by metagenomic approaches: a powerful tool for conducting large-scale studies

In recent years, emerging and serious infectious diseases have caused worldwide fear. It is also known that many of these diseases are caused by viruses from bats, such as Ebola, Marburg, SARS coronavirus (SARS-CoV), MERS coronavirus (MERS-CoV), Nipah (NIV), and Hendra (HEV) [20], and now SARS-CoV-2. Bats are increasingly recognized as important reservoirs of new diseases because they constitute 20% of known mammal species, have unique and diverse lifestyles, including the ability to fly, often form gregarious social structures that reach remarkable abundance and density (some cave bats reach up to 500 individuals per square meter), and are long-lived [21].

As more information has been obtained about the factors or causes of emergence, an expectation has grown that the emergence of new pathogens can be predicted. These and other factors have significantly increased the demand for the discovery of new viral pathogens, especially at the human-animal interface in species of wild and domestic animals [22].

With the exception of studies focusing on lyssavirus, most investigations of viruses in bats have been limited to one particular zoonotic agent involved in a geographically localized disease outbreak [23]. COVID-19 showed the need to form an international front for active surveillance of different bat populations, in order to detect potential zoonotic agents as well as unknown viruses of low pathogenicity that can recombine or mutate and become pathogenic.

The emergence of highly pathogenic viruses such as SARS-CoV and MERS-CoV has marked coronaviruses as agents of high interest in epidemiological surveillance. In addition to the conclusion that SARS-CoV may have originated from bats, it has been suggested that several other new viruses exist in animals and that some of them pose a risk to public health [24, 25, 26, 27].

Although great advances have been made in the knowledge of these viruses, there is much to learn about the evolution of highly pathogenic agents in reservoir animals such as bats [1]. Several studies have pointed to a great diversity of coronaviruses belonging to the genera Alphacoronavirus and Betacoronavirus of the subfamily Orthocoronavirinae, which occur widely in bat species around the world, including Africa, Europe, the Americas, and Asia. Interestingly, an analysis of viruses isolated from bats in Mexico showed that host species were driving forces in the evolution of coronaviruses and that a single bat species can harbor several coronaviruses. In addition, the phylogenetic association of CoVs with the host species or genus was particularly evident in allopatric populations separated by significant geographical distances [22].

Simas and Arns [28] described a metagenomic methodology using a bat species common in urban areas of the Americas, Tadarida brasiliensis, in order to establish a rapid method for active epidemiological surveillance, with bats as the best reservoir animal model. The assay aimed to identify viral agents in oral and rectal swabs collected from asymptomatic T. brasiliensis bats from a colony in Campinas-SP, Brazil. The researchers then described the diversity and abundance of the identified viral agents and established their phylogenetic relationships. The workflow is described in Figure 1.

Figure 1.

Workflow described by Simas and Arns [28] to conduct active epidemiological surveillance from model reservoir animals like bats using metagenomic approach.

The most important steps are the pretreatment and the validation: the former removes host genetic material and the latter confirms the sequences identified in the dataset. For the pretreatment, the researchers used filtration and treatment with DNase and proteinase K enzymes. These procedures help to eliminate contaminating genetic material and to ensure that viral genetic material predominates in the sequence dataset.

Using these assays, a large number of high-quality paired-end sequences were obtained on the Illumina HiSeq 2500 platform (345,409,110 paired-end reads; 76.47% with Q ≥ 30). In read assembly with the MetaVelvet and Metavir 2 platforms, genetic material from several pathogenic viruses was identified. The different platforms provided complementary data, indicating the need to carry out similar procedures in studies that use the same metagenomic methodology.
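For readers unfamiliar with the Q ≥ 30 threshold: a Phred quality score Q corresponds to a base-call error probability P through Q = −10 log10(P), so Q30 means an estimated error rate of 1 in 1000. The Python sketch below shows this conversion and how a Q ≥ 30 fraction could be computed from a FASTQ quality string. It assumes Phred+33 encoding, and the function names and the example string are hypothetical; it is not the quality-control pipeline used in [28].

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to the base-call error probability."""
    return 10 ** (-q / 10)


def fraction_at_least_q(quality_string: str, threshold: int = 30, offset: int = 33) -> float:
    """Fraction of bases in a FASTQ quality string with Q >= threshold.

    Assumes Phred+33 encoding: each character's code point minus 33 is Q.
    """
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(score >= threshold for score in scores) / len(scores)


# Hypothetical quality string: 'I' encodes Q40 and '5' encodes Q20 in Phred+33.
example_quality = "IIII5"
q30_fraction = fraction_at_least_q(example_quality)
```

Here four of the five bases in the example string reach Q30. The text does not say whether the reported 76.47% refers to reads or to bases, so this sketch illustrates only the per-base version of the calculation.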

Although the similarity search carried out with MetaVelvet against different databases returned a small number of viral matches (97; 2 for coronavirus), these results were validated and allowed the identification of a coronavirus with a strong phylogenetic relationship to Porcine Epidemic Diarrhea virus (PEDV), a highly pathogenic swine virus, and to the human coronavirus HCoV-NL63. PEDV has been reported in many countries, including Germany, France, Switzerland, Hungary, Italy, China, South Korea, Thailand, and Vietnam, and was first identified in the United States in May 2013. The US outbreak occurred in 23 states, with 2692 confirmed cases leading to major economic losses. Studies have shown that all American PEDV strains are closely related to a strain from China, AH2012 [29]. The identification of a PEDV-related coronavirus in wild animals common in the Americas, such as Tadarida brasiliensis bats, can help to understand the evolution of these agents in animal reservoirs and, through genetic diversity studies like this one, their eco-epidemiology.

Metavir 2 identified sequences of viruses associated with various diseases in humans. Many sequences were classified as belonging to the Herpesviridae family. Several viral agents in this family are known to cause a wide variety of human diseases, including various types of cancer. In addition, since herpesviruses have a great capacity to infect many types of cells or tissues [30], bats may be serving as a reservoir for recombination and for the emergence of new strains capable of infecting other animals and even of causing human infections.

Several viruses of the order Caudovirales were also identified, most of them from the Siphoviridae family. These phages are capable of infecting several species of human pathogenic bacteria (Enterobacteria, Shigella, Mycobacterium, and Bacillus), so their detection is indirect evidence that these bacteria are also present in bats. The concomitant detection of herpesviruses and phages indicates that bats can play an important role in the evolution of these viral agents, since recombination between them has already been described [31].

Many betaretroviruses, viruses that cause various types of tumors in primates, sheep, and rats, were detected. Sano et al. [32] also identified several viral agents from the Retroviridae and Herpesviridae families in bats in the Philippines. Dacheux et al. [12] likewise detected both families in their study of five species of French insectivorous bats (nine specimens). All of these results suggest that retroviruses and herpesviruses are widely distributed in bat populations.

Several dsRNA virus sequences were also detected. This group includes viruses that cause gastroenteritis in children (rotavirus) and others that are pathogenic for cattle and sheep, and their identification in bats contributes to understanding their circulation in ecosystems. Another Brazilian study also reported the presence of rotavirus in bat feces. Phylogenetic analyses indicated the formation of a clade with sequences of bovine and human origin, suggesting recombination between strains in animal hosts, an event that precedes transmission to humans in zoonotic viral diseases [33].

4. Conclusions

The metagenomic procedure is a fast and highly sensitive way to assess genetic diversity in ecosystems in general. Using the metagenomic approach, the scientific community obtained the complete SARS-CoV-2 genome within a few days. Because epidemiological studies are still conducted only after diseases appear, efforts remain focused on responding to outbreaks rather than on prevention. From this perspective, the methodology has proved suitable for epidemiological surveys, and it should be widely applied to understand, through genetic diversity, the molecular eco-epidemiology of viral agents before a pandemic occurs.

How to cite and reference


Paulo Vitor Marques Simas and Clarice Weis Arns (September 21st 2020). Metagenomics and Pandemic Viruses [Online First], IntechOpen, DOI: 10.5772/intechopen.93687. Available from:
