Metabarcoding approaches for the study of human vector-borne diseases using natural populations of vectors as biological samples.
The implementation of sustainable control strategies aimed at disrupting the transmission of vector-borne pathogens requires a comprehensive knowledge of the vector ecology in the different eco-epidemiological contexts, as well as the local pathogen transmission cycles and their dynamics. However, even when focusing only on one specific vector-borne disease, achieving this knowledge is highly challenging, as the pathogen may exhibit a high genetic diversity and multiple vector species or subspecies and host species may be involved. In addition, the development of the pathogen and the vectorial capacity of the vectors may be affected by their midgut and/or salivary gland microbiome. The recent advent of Next-Generation Sequencing (NGS) technologies has brought powerful tools that can allow for the simultaneous identification of all these essential components, although their potential is only just starting to be realized. We present a metabarcoding approach that can facilitate the description of comprehensive host-pathogen networks, integrate important microbiome and coinfection data, identify at-risk situations, and disentangle the transmission cycles of vector-borne pathogens. This powerful approach should be generalized to unravel the transmission cycles of any pathogen and their dynamics, which in turn will help the design and implementation of sustainable, effective, and locally adapted control strategies.
- vector-borne diseases
- transmission cycles
- vector ecology
- next-generation sequencing (NGS)
- blood meals
- One Health
Vector-borne diseases affecting human health are caused by pathogens transmitted by “living organisms” between humans or from animals to humans. These “living organisms” are known as “vectors” and are generally bloodsucking arthropods, such as mosquitoes, ticks, flies, sandflies, fleas, or triatomine bugs. These arthropods ingest disease-producing microorganisms during a blood meal from an infected host (human or animal) and later transmit them to a new host during subsequent blood meals. According to the World Health Organization (WHO), vector-borne diseases, such as malaria, dengue, human African trypanosomiasis, leishmaniasis, Chagas disease, yellow fever, Japanese encephalitis, or onchocerciasis, account for almost 20% of all infectious diseases worldwide. They cause more than 700,000 deaths annually, and more than half of the world’s population is estimated to be at risk of these diseases. They are a major obstacle to development, and the poorest segments of societies and the least-developed countries are the most affected. The deadliest vector-borne disease, malaria, causes more than 400,000 deaths annually, mainly among children under 5 years of age. However, the world’s fastest-growing vector-borne disease is dengue, whose incidence has increased 30-fold over the last 50 years [1, 2]. Currently, an estimated 96 million dengue cases occur per year, and more than 3.9 billion people in over 128 countries are at risk of contracting this disease [1, 3]. Chagas disease, one of the primary study models of our research group and classified by the WHO as a Neglected Tropical Disease (NTD), is a major public health problem in Latin America, where 6–7 million people are currently infected [4, 5].
The control of vector-borne diseases relies mainly on programs targeting the different vectors. Nevertheless, the efficiency of vector control strategies is closely linked to the local ecology of the vectors, which in turn defines local transmission cycles. Consequently, the implementation of sustainable control strategies aimed at disrupting the transmission of vector-borne pathogens requires comprehensive knowledge of vector ecology and behavior in the different eco-epidemiological contexts, as well as of the local transmission cycles of the pathogens and their dynamics. However, even when focusing on a single vector-borne disease, achieving this knowledge is challenging. Indeed, the pathogen may exhibit high genetic diversity, and multiple vector species or subspecies and host species may be involved. In addition, the development of the pathogen and the vectorial capacity of the vectors may be affected by their midgut and/or salivary gland microbiome. Sometimes, several pathogen species may also be involved; for example, leishmaniases are caused by more than 20 Leishmania species.
The recent advent of Next-Generation Sequencing (NGS) technologies has brought powerful tools with enormous potential for understanding the eco-epidemiology of vector-borne diseases, as they allow all these components to be identified simultaneously. Nevertheless, this potential is only just starting to be realized. Here, we present a metabarcoding approach based on NGS that can facilitate the creation of comprehensive host-pathogen networks, integrate important microbiome and coinfection data, identify at-risk situations, and disentangle the transmission cycles of vector-borne pathogens.
2. Complexity of vector-borne pathogen transmission cycles and their dynamics
The transmission cycles of vector-borne pathogens are shaped by the ecology and behavior of hosts and vectors in their specific environments and defined by the specific interactions between the vectors, the pathogens, and their hosts (which also act as blood-feeding sources of the vectors). Consequently, the comprehensive identification of these interactions is critical to disentangle transmission cycles and understand their dynamics. In most cases, an extraordinary diversity of organisms is involved, making the identification of those interactions challenging. In the case of Chagas disease, for example, the causative agent, a protozoan parasite called Trypanosoma cruzi, exhibits very high genetic diversity and has been classified into seven discrete typing units (DTUs). These DTUs are transmitted by more than 140 triatomine species, which live in a very wide variety of ecotopes and bioclimatic conditions, to more than 180 mammalian species, including wild animals, domestic animals, and humans [11, 12]. In parallel, triatomines also take blood meals from animals that are refractory to T. cruzi infection, called incompetent hosts, such as birds, reptiles, and amphibians [13, 14] (Figure 1). Finally, the establishment and development of the parasite and the vectorial capacity of the triatomines could be affected by the composition of their midgut microbiome, as has been shown for other vectors. For example, the development of Trypanosoma brucei, the agent of African trypanosomiasis, in its tsetse fly vector is directly influenced by a microbiome-regulated gut immune barrier. In the same way, the sand fly midgut microbiome is a critical factor for Leishmania growth and differentiation to its infective state prior to disease transmission. The gut microbiome similarly modulates dengue virus infection in Aedes aegypti mosquitoes [18, 19], and microbiome manipulation may be used to control virus transmission [20, 21].
Similar observations exist for other vector/pathogen systems, such as ticks and the causative agent of Lyme disease, or malaria vectors, in which the salivary gland microbiome may also play a role.
3. Metabarcoding: a highly sensitive and integrative approach to disentangle vector-borne pathogen transmission cycles
NGS technologies can generate millions of sequencing reads in parallel. This massively parallel sequencing capacity can produce reads from fragmented libraries of a specific genome (i.e., genome sequencing) or from a pool of PCR products. Metabarcoding approaches rely on the latter to sequence a large number of different amplicons of taxonomically informative genes (barcodes). While metagenomics refers to the identification of all genomes within a particular ecosystem or sample, metabarcoding aims to identify only a subset of them (those of interest for a particular question) by sequencing millions of amplicons of these barcodes, without the need for cloning (i.e., sequences are obtained directly from a mix of different amplicons of the barcodes of interest).
Consequently, in the case of vector-borne pathogens, starting only from the vectors as biological samples, it is possible to target and amplify well-chosen molecular markers (barcodes) of interest with universal primer sets to identify the different actors of transmission cycles (e.g., vertebrate blood sources, midgut microbiome, pathogen diversity, and vector diversity). Other ecological interactions that are not directly involved in the transmission cycles but are relevant for understanding vector ecology and the dynamics of the transmission cycles (e.g., plant-feeding sources, sometimes required as a source of energy for routine activities such as flight, mating, and walking or as a source of protein for egg maturation) can also be identified. A schematic representation of the metabarcoding approach for the identification of ecological interactions of disease vectors is given in Figure 2. After purification of the total DNA (and RNA if working with RNA pathogens) contained in each vector midgut (and salivary glands, depending on the kind of vector) (1), the molecular markers (barcodes) of interest are PCR amplified (by RT-PCR if working with RNA pathogens) (2). Then, to identify samples, a tag/index is added to each PCR product (amplicon); the same tag is used for all the amplicons obtained from a single sample (3). After high-throughput sequencing (4), the millions of reads (5) are sorted per sample thanks to the tags added to each amplicon (6).
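Step (6), sorting reads per sample by tag, can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each read starts with an in-line tag of fixed length that must match exactly; the tag sequences, specimen names, and the `demultiplex` function are hypothetical. On Illumina platforms, for example, indexes are commonly read in separate index reads and demultiplexing is done by dedicated software that tolerates sequencing errors, so this is a simplification of the principle rather than a production pipeline.

```python
from collections import defaultdict

# Hypothetical in-line tags (6 nt) mapped to specimen names; real tag sets
# depend on the library preparation kit used.
TAGS = {
    "ACGTAC": "bug_01",
    "TGCATG": "bug_02",
    "GATCGA": "bug_03",
}
TAG_LENGTH = 6  # assumption: all tags share the same length


def demultiplex(reads, tags=TAGS, tag_length=TAG_LENGTH):
    """Sort reads per specimen by exact match of the tag at the 5' end.

    Returns a dict {specimen: [reads without tag]} and a list of reads
    whose tag matched no specimen.
    """
    per_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tags.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            per_sample[sample].append(insert)
    return dict(per_sample), unassigned


reads = ["ACGTACTTAGGC", "TGCATGCCATTA", "ACGTACGGATCC", "NNNNNNAAAA"]
per_sample, unassigned = demultiplex(reads)
print(per_sample)   # {'bug_01': ['TTAGGC', 'GGATCC'], 'bug_02': ['CCATTA']}
print(unassigned)   # ['NNNNNNAAAA']
```

Because the same tag is attached to all amplicons of one specimen, each per-specimen list still mixes reads from every marker; splitting them by marker is a separate step.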
Currently, the most common systems provide up to 384 different tags and 25 million reads per sequencing run. The depth (i.e., the number of reads, or sequences) obtained per molecular marker and sample depends on the number of labeled samples and the number of markers amplified per sample. For instance, if we amplify 10 molecular markers for 100 vector specimens and run at a depth of 25 million sequences, about 250,000 reads per vector specimen and 25,000 reads per marker and specimen will theoretically be obtained. This kind of multiplexing considerably lowers sequencing costs per sample. Downstream analyses with bioinformatics tools, such as those provided on the open-access Galaxy platform, then allow the sequences corresponding to each targeted marker to be retrieved and identified for each vector specimen. This approach is thus extremely powerful for reconstructing pathogen transmission cycles and understanding their dynamics since, after adequate analyses, it can reveal the ecological interactions of each specimen by simultaneously identifying its species or subspecies, its blood-feeding source(s), the pathogen(s) of interest and their species or lineage(s), the composition of its midgut and salivary gland microbiomes, its plant-feeding source(s), mutations associated with insecticide resistance, etc.
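The depth arithmetic in the example above can be written as a small helper. This is a sketch assuming one tag per specimen and reads split evenly across specimens and markers, with the 384-tag ceiling taken from the text; the `expected_depth` name is hypothetical, and in practice read counts vary considerably between samples and markers.

```python
MAX_TAGS = 384  # distinct tags per run quoted in the text for common systems


def expected_depth(total_reads, n_specimens, n_markers, max_tags=MAX_TAGS):
    """Theoretical reads per specimen and per marker and specimen.

    Assumes one tag per specimen and an even split of reads, which real
    runs only approximate.
    """
    if n_specimens > max_tags:
        raise ValueError("more specimens than distinct tags available in one run")
    per_specimen = total_reads / n_specimens
    per_marker_and_specimen = per_specimen / n_markers
    return per_specimen, per_marker_and_specimen


# The example from the text: 10 markers, 100 specimens, 25 million reads.
per_specimen, per_marker = expected_depth(25_000_000, 100, 10)
print(per_specimen, per_marker)  # 250000.0 25000.0
```

The same helper makes the trade-off explicit: doubling the number of specimens or markers halves the expected depth per marker and specimen.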
4. Unraveling T. cruzi transmission cycles in the Yucatan peninsula (Mexico): an example of the use of the metabarcoding approach
As a proof of concept, we recently performed a pilot study of the metabarcoding approach presented above, using Chagas disease in the Yucatan peninsula (Mexico) as a model. In this region, Triatoma dimidiata is the main vector, and different genetic subgroups of this species [30, 31, 32] live in sympatry. The molecular markers selected for our metabarcoding approach were the following: (i) to classify T. dimidiata into its different genetic subgroups, we used primers targeting the Internal Transcribed Spacer 2 (ITS-2), as previously described; (ii) for blood-feeding source identification, we used vertebrate universal primers targeting the 12S rRNA gene; (iii) for T. cruzi, we used primers targeting the mini-exon gene, which allow further classification of the parasite into its different DTUs; and (iv) finally, we used universal primers targeting the bacterial 16S rRNA gene to identify the composition of the bacterial microbiome. In this way, we aimed to determine whether there were detectable interaction patterns between the genetic subgroups of T. dimidiata, their blood-feeding hosts, infection with T. cruzi, the parasite DTUs, and the microbiome composition, which would allow the T. cruzi transmission cycles in the study area to be elucidated at finer scales.
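Since four markers are amplified from each specimen, the reads of one specimen also need to be split by marker. One common way is to recognize the primer at the start of each read; the sketch below does this with exact prefix matching. The primer sequences are placeholders, not the published ITS-2, 12S, mini-exon, or 16S primers, and the sketch assumes that no primer is a prefix of another. Dedicated tools such as cutadapt additionally handle sequencing errors and degenerate primer bases.

```python
# Placeholder forward-primer sequences: NOT the published primers for
# ITS-2, 12S, mini-exon or 16S; replace them with the real ones.
PRIMERS = {
    "ITS-2": "AAAACCCC",
    "12S": "GGGGTTTT",
    "mini-exon": "ACACACAC",
    "16S": "TGTGTGTG",
}


def assign_marker(read, primers=PRIMERS):
    """Return (marker, amplicon without primer) for the first primer that
    matches the start of the read exactly, or (None, read) otherwise."""
    for marker, primer in primers.items():
        if read.startswith(primer):
            return marker, read[len(primer):]
    return None, read


def split_by_marker(reads, primers=PRIMERS):
    """Group the reads of one demultiplexed specimen by targeted marker."""
    by_marker = {marker: [] for marker in primers}
    unassigned = []
    for read in reads:
        marker, amplicon = assign_marker(read, primers)
        if marker is None:
            unassigned.append(read)
        else:
            by_marker[marker].append(amplicon)
    return by_marker, unassigned


reads = ["GGGGTTTTACGT", "TGTGTGTGCCAA", "GGGGTTTTACGA", "CCCCCCCC"]
by_marker, unassigned = split_by_marker(reads)
print(by_marker)
# {'ITS-2': [], '12S': ['ACGT', 'ACGA'], 'mini-exon': [], '16S': ['CCAA']}
print(unassigned)  # ['CCCCCCCC']
```

Each per-marker list can then be compared against the appropriate reference database (vertebrate 12S sequences, T. cruzi mini-exon sequences, bacterial 16S sequences, and so on).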
This study, which was based on 14 T. dimidiata bugs collected in both wild and domestic ecotopes, demonstrates the feasibility and high sensitivity of the proposed approach. For example, we identified an average of 4.9 ± 0.7 blood-feeding species per bug, and up to 7 blood-feeding species and 11 blood-feeding individuals in a single bug. By contrast, current techniques based on direct sequencing of PCR products can only identify the dominant sequence/host in each sample, while adding a cloning step prior to sequencing generally allows up to three to five host species to be detected in some bugs [14, 39, 40, 41]. Similarly, we easily identified different DTUs infecting single bugs, whereas most studies to date have relied on conventional Sanger sequencing approaches, which can only detect the dominant genotype in biological samples and thus practically preclude the detection of multiclonal infections. For this reason, NGS approaches capable of inventorying multiclonal infections are now being progressively adopted [42, 43, 44, 45, 46]. Regarding the midgut microbiome, we detected 23 bacterial orders and observed that its composition differed according to blood-feeding sources (Figure 3). Finally, all 14 bugs belonged to the same genetic subgroup.
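The difference between direct sequencing, which reports only the dominant sequence, and metabarcoding, which inventories every sufficiently supported sequence, can be sketched from the per-host read counts of a single bug. The counts below are hypothetical, and the minimum read and fraction thresholds are illustrative assumptions rather than the values used in the study; in real data such thresholds are chosen to filter out sequencing errors and cross-contamination between samples.

```python
def dominant_host(counts):
    """Roughly what direct sequencing of a mixed PCR product reports:
    only the most abundant host (ties resolved by dictionary order)."""
    return max(counts, key=counts.get)


def detected_hosts(counts, min_reads=10, min_fraction=0.01):
    """Every host whose reads pass both an absolute and a relative threshold.
    Threshold values are illustrative assumptions."""
    total = sum(counts.values())
    return sorted(
        host
        for host, n in counts.items()
        if n >= min_reads and n / total >= min_fraction
    )


# Hypothetical 12S read counts for a single bug.
counts = {
    "Homo sapiens": 12000,
    "Canis lupus": 6000,
    "Mus musculus": 1500,
    "Bos taurus": 400,
    "Sciurus sp.": 90,   # about 0.45% of reads: below min_fraction
    "Artibeus sp.": 5,   # below min_reads
}
print(dominant_host(counts))   # Homo sapiens
print(detected_hosts(counts))  # ['Bos taurus', 'Canis lupus', 'Homo sapiens', 'Mus musculus']
```

The choice of thresholds directly affects how many blood-feeding species are reported per bug, so it should be stated alongside any such average.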
To further assess potential transmission cycles of T. cruzi parasites by T. dimidiata among the identified blood source species, a feeding and parasite transmission network was constructed (Figure 4). Nodes of the network represent the species identified as blood meal sources, and the size of each node indicates the feeding frequency on that species. Edges link species found together within individual bugs (i.e., in their multiple blood meals). Since birds cannot carry T. cruzi parasites, they only play a role as blood sources for triatomines, which is indicated by dotted edges between hosts. The solid lines between mammals indicate potential parasite transmission pathways. This network highlights the mammals that would play the main role in T. cruzi transmission to humans in the study area. Humans (Homo sapiens in Figure 4) may thus become infected by T. cruzi parasites originating from dogs (Canis lupus), cows (Bos taurus), and mice (Mus musculus), as well as from sylvatic hosts such as porcupines (Coendou spp.), squirrels (Sciurus spp.), and fruit bats (Artibeus spp.). In particular, dogs appear to be key actors that may favor parasite transmission to humans. This kind of network is very informative, as it can reveal the animals that would play the main roles in the transmission of any pathogen to humans (although complementary studies focused directly on these animals may be necessary) and that should be targeted as part of integrated control strategies aimed at disrupting parasite transmission. For example, the management of dogs and other peridomestic animals can be part of EcoHealth/One Health approaches. The network presented here results from a pilot study based on a limited sample and is only used to illustrate the potential of the proposed metabarcoding approach.
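The network construction described above can be sketched in a few lines of Python: node size counts the bugs that fed on each host, and an edge joins every pair of hosts found in the same bug, drawn dotted when a bird (or other incompetent host) is involved and solid otherwise. The input format, the `build_network` function, its `min_bugs` filter, and the example data are assumptions for illustration, not the data behind Figure 4; the resulting nodes and edges could then be passed to any graph-drawing tool.

```python
from collections import Counter
from itertools import combinations


def build_network(blood_meals, incompetent_hosts, min_bugs=1):
    """Build a feeding network from the hosts detected in each bug.

    blood_meals: iterable of sets, one set of host taxa per bug.
    incompetent_hosts: taxa that cannot carry the pathogen (e.g., birds).
    Returns node sizes {host: number of bugs} and edges
    {(host_a, host_b): (number of bugs, style)}, where style is "dotted"
    if either host is incompetent and "solid" (potential transmission
    pathway between competent hosts) otherwise.
    """
    node_size = Counter()
    edge_count = Counter()
    for hosts in blood_meals:
        node_size.update(hosts)
        # Every pair of hosts co-occurring in the same bug gets an edge.
        for pair in combinations(sorted(hosts), 2):
            edge_count[pair] += 1
    edges = {
        pair: (n, "dotted" if set(pair) & incompetent_hosts else "solid")
        for pair, n in edge_count.items()
        if n >= min_bugs
    }
    return dict(node_size), edges


# Hypothetical data for three bugs; "Aves sp." stands for any bird.
meals = [
    {"Homo sapiens", "Canis lupus", "Aves sp."},
    {"Canis lupus", "Mus musculus"},
    {"Homo sapiens", "Canis lupus"},
]
nodes, edges = build_network(meals, incompetent_hosts={"Aves sp."})
print(nodes["Canis lupus"])                    # 3
print(edges[("Canis lupus", "Homo sapiens")])  # (2, 'solid')
print(edges[("Aves sp.", "Homo sapiens")])     # (1, 'dotted')
```

Sorting each bug's hosts before pairing ensures that a given pair always maps to the same edge key, regardless of detection order.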
Increasing the sample size across a wide variety of ecotopes and integrating vector, microbiome, and coinfection data will undoubtedly allow at-risk situations to be identified and transmission cycles to be disentangled. It may also help to identify bacteria that are part of the normal microbiota of triatomine bugs, bacteria associated with the presence or absence of T. cruzi infection in the bugs, or bacteria of vital importance to the bugs. This knowledge can have important applications for the development of innovative control strategies [48, 49, 50]. The information provided by the approach can also be used to feed models that include the hosts involved in transmission, to help assess the effects of different host community management strategies on T. cruzi transmission to humans and to understand transmission dynamics over time [51, 52]. Transmission models are becoming increasingly important in vector-borne disease control programs, as they allow different control strategies, or combinations of them, to be evaluated, and their cost-effectiveness and likelihood of success to be assessed.
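As a first exploratory step toward identifying bacteria associated with T. cruzi infection, one could compare the mean relative abundance of each bacterial order between infected and uninfected bugs. The sketch below performs only that descriptive comparison on hypothetical counts, and the `compare_by_infection` name is illustrative; it is not a statistical test, and a real analysis would also need to account for sequencing depth, the compositional nature of the data, and sample size.

```python
from statistics import mean


def relative_abundance(order_counts):
    """Convert read counts per bacterial order into proportions."""
    total = sum(order_counts.values())
    return {order: n / total for order, n in order_counts.items()}


def compare_by_infection(bugs):
    """bugs: list of (infected, {bacterial_order: read_count}) tuples.

    Returns {order: (mean proportion in infected bugs, mean proportion in
    uninfected bugs)}; an order absent from a bug counts as 0. Both groups
    must contain at least one bug.
    """
    profiles = {True: [], False: []}
    for infected, counts in bugs:
        profiles[infected].append(relative_abundance(counts))
    orders = sorted({order for _, counts in bugs for order in counts})
    return {
        order: tuple(mean(p.get(order, 0.0) for p in profiles[group])
                     for group in (True, False))
        for order in orders
    }


# Hypothetical 16S read counts aggregated at the order level.
bugs = [
    (True, {"Enterobacterales": 75, "Corynebacteriales": 25}),
    (True, {"Enterobacterales": 50, "Corynebacteriales": 50}),
    (False, {"Enterobacterales": 25, "Corynebacteriales": 75}),
]
for order, (infected, uninfected) in compare_by_infection(bugs).items():
    print(order, infected, uninfected)
# Corynebacteriales 0.375 0.75
# Enterobacterales 0.625 0.25
```

Orders whose mean proportions differ markedly between groups would only be candidates for follow-up, not evidence of an association on their own.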
Consequently, the approach presented here provides high-value information that can be used in multiple ways for the design and implementation of sustainable, effective, and locally adapted control strategies, and it deserves to be extended to other eco-epidemiological contexts and to any vector-borne pathogen. To date, metabarcoding approaches for the study of human vector-borne diseases using natural populations of vectors are progressively being adopted, but they remain little used [54, 55]. Moreover, they generally still focus on only one component of transmission cycles, such as blood-feeding hosts [56, 57, 58, 59], plant-feeding hosts, microbiome composition [60, 61], or vector diversity (Table 1), thus providing limited information, whereas, as illustrated here, the approach can easily be made more integrative to simultaneously identify the different actors involved in transmission.
| Vector | Geographic origin | Target DNA | Main findings | Reference |
|---|---|---|---|---|
| Mosquitoes (Anopheles punctulatus) | Different villages in Papua New Guinea | Mammalian blood-feeding hosts | Unbiased characterization of mammalian blood-feeding hosts, including unsuspected hosts and mixed blood meals. Human, dog, and pig were the most common host-feeding sources. The approach can also be adapted to evaluate interindividual variation among human blood meals | |
| Mosquitoes (Culex and Anopheles spp.) | Different sites on the Caspian Sea coast of northern Iran | Vertebrate blood-feeding hosts | The four most common mosquito species had similar host-feeding patterns. The most commonly detected hosts in these species were humans, cattle, and ducks | |
| Mosquitoes and sand flies | Forest sites in French Guiana | Mammalian blood-feeding hosts | Accuracy of the short 12S marker proposed for the identification of Amazonian mammals. The accuracy of taxonomic assignments depends strongly on the comprehensiveness of the reference library | |
| Triatomine bugs (Rhodnius pallescens) | Two sampling sites in Panama | Vertebrate blood-feeding hosts | Reliability of the proposed metabarcoding approach for the identification of vertebrate blood-feeding hosts | |
| Phlebotomine sand flies (Phlebotomus and Lutzomyia spp.) | Different sampling sites in Brazil, Israel, and Ethiopia | Plant-feeding hosts | Sand flies preferentially feed on Cannabis sativa plants. Potential utility for sand fly control | |
| Mosquitoes (Aedes and Culex spp.) | Different habitats across central Thailand | Bacterial and eukaryotic microbiome | Patterns of microbial composition and diversity that affect pathogen prevalence appeared to differ by vector species and, within a given species, by habitat. Microbial composition was less diverse in urban areas | |
| Tsetse flies (Glossina palpalis palpalis) | Two trypanosomiasis foci in Cameroon | Bacterial microbiome | The endosymbiont Wigglesworthia was highly prominent. Potential role of Salmonella and Serratia in fly refractoriness to trypanosome infection. The V4 region of the small subunit 16S ribosomal RNA gene was more efficient than the V3V4 region at describing the totality of the bacterial diversity | |
| Phlebotomine sand flies (Lutzomyia and Brumptomyia spp.) | Various locations in French Guiana | Sand flies | Efficiency of metabarcoding based on the mitochondrial 16S rRNA gene for identifying sand fly diversity in bulk samples | |
| Mosquitoes and sand flies (various species) | Three sites along a gradient of anthropogenic pressure in French Guiana (Saint-Georges de l’Oyapock area) | Vectors and vertebrate blood-feeding hosts | Contrasting ecological features and feeding behavior among dipteran species, which allowed the detection of arboreal and terrestrial mammals, as well as birds, lizards, and amphibians. Lower vertebrate diversity was found in sites undergoing higher levels of human-induced perturbation | |
| Triatomine bugs (Triatoma dimidiata) | Different habitats in rural Yucatan (Mexico) | Vertebrate blood-feeding hosts, Trypanosoma cruzi parasite, midgut bacterial microbiome, triatomine bug | Ecological associations of triatomines that shape T. cruzi transmission cycles. Different DTUs infecting single bugs. Identification of 14 blood-feeding species. Up to 7 blood-feeding species and 11 blood-feeding individuals identified in a single bug. Human, dog, cow, and mouse were the most common host-feeding sources. Dog was highlighted as the main host involved in the pathway of T. cruzi transmission to humans. Dynamic midgut microbiome, including 23 bacterial orders, whose composition differed according to blood sources | |
In this chapter, we presented a metabarcoding approach to study vector-borne pathogen transmission cycles and their dynamics, and we illustrated its feasibility and high sensitivity with a recent study that used Chagas disease in the Yucatan peninsula (Mexico) as a study model. NGS technologies are quickly becoming more affordable and cost-effective, and many bioinformatics tools have greatly simplified analyses in recent years. Consequently, this powerful approach deserves to be generalized to other eco-epidemiological contexts to unravel the transmission cycles of any vector-borne pathogen and their dynamics, which in turn will help the implementation of sustainable, effective, and locally adapted strategies to control their transmission.
This work received financial support from CONACYT (National Council of Science and Technology, Mexico) Basic Science (Project ID: CB2015-258752) and National Problems (Project ID: PN2015-893) Programs. This work was also funded by the Louisiana Board of Regents through the Board of Regents Support Fund [# LESASF (2018-2021)-RD-A-19] and grant #632083 from Tulane University School of Public Health and Tropical Medicine.