Metabarcoding: A Powerful Yet Still Underestimated Approach for the Comprehensive Study of Vector-Borne Pathogen Transmission Cycles and Their Dynamics

The implementation of sustainable control strategies aimed at disrupting the transmission of vector-borne pathogens requires a comprehensive knowledge of the vector ecology in the different eco-epidemiological contexts, as well as the local pathogen transmission cycles and their dynamics. However, even when focusing only on one specific vector-borne disease, achieving this knowledge is highly challenging, as the pathogen may exhibit a high genetic diversity and multiple vector species or subspecies and host species may be involved. In addition, the development of the pathogen and the vectorial capacity of the vectors may be affected by their midgut and/or salivary gland microbiome. The recent advent of Next-Generation Sequencing (NGS) technologies has brought powerful tools that can allow for the simultaneous identification of all these essential components, although their potential is only just starting to be realized. We present a metabarcoding approach that can facilitate the description of comprehensive host-pathogen networks, integrate important microbiome and coinfection data, identify at-risk situations, and disentangle the transmission cycles of vector-borne pathogens. This powerful approach should be generalized to unravel the transmission cycles of any pathogen and their dynamics, which in turn will help the design and implementation of sustainable, effective, and locally adapted control strategies.


Introduction
Vector-borne diseases affecting human health are caused by pathogens transmitted by "living organisms" between humans or from animals to humans. These "living organisms" are known as "vectors," which are generally bloodsucking arthropods, such as mosquitoes, ticks, flies, sandflies, fleas, or triatomine bugs. These arthropods ingest disease-producing microorganisms during a blood meal from an infected host (human or animal) and later transmit them to a new host during subsequent blood meals [1]. According to the World Health Organization (WHO), vector-borne diseases, such as malaria, dengue, human African trypanosomiasis, leishmaniasis, Chagas disease, yellow fever, Japanese encephalitis, or onchocerciasis, account for almost 20% of all infectious diseases worldwide. They cause more than 700,000 deaths annually, and more than half of the world's population is estimated to be at risk of these diseases [1]. They are a major obstacle to development, and the poorest segments of societies and least-developed countries are the most affected. The deadliest vector-borne disease, malaria, causes more than 400,000 deaths annually, mainly among children under 5 years of age. However, the world's fastest-growing vector-borne disease is dengue, with a 30-fold increase in disease incidence over the last 50 years [1,2]. Currently, an estimated 96 million cases of dengue occur per year, and more than 3.9 billion people in over 128 countries are at risk of contracting this disease [1,3]. Chagas disease, which is one of the primary study models of our research group and is classified by the WHO within the group of Neglected Tropical Diseases (NTDs), is a major public health problem in Latin America, where 6-7 million people are currently infected [4,5].
The control of vector-borne diseases relies mainly on control programs targeted against the different vectors. Nevertheless, the efficiency of the different vector control strategies is closely linked to the local ecology of the vectors [6], which in turn defines local transmission cycles. Consequently, implementing sustainable control strategies aimed at disrupting the transmission of vector-borne pathogens requires comprehensive knowledge of vector ecology and behavior in the different eco-epidemiological contexts, as well as of the local transmission cycles of the pathogens and their dynamics. However, even when focusing on only one specific vector-borne disease, achieving this knowledge is challenging. Indeed, the pathogen may exhibit high genetic diversity, and multiple vector species or subspecies and host species may be involved. In addition, the development of the pathogen and the vectorial capacity of the vectors may be affected by their midgut and/or salivary gland microbiome. Sometimes, many pathogen species can also be involved. For example, leishmaniases are caused by more than 20 Leishmania species [7].
The recent advent of Next-Generation Sequencing (NGS) technologies has brought powerful tools, with enormous potential, allowing the simultaneous identification of all these components for the understanding of the eco-epidemiology of vector-borne diseases. Nevertheless, their potential is only just starting to be realized. Here, we present a metabarcoding approach based on NGS that can facilitate

Complexity of vector-borne pathogen transmission cycles and their dynamics
The transmission cycles of vector-borne pathogens are shaped by the ecology and behavior of hosts and vectors in their specific environments and defined by the specific interactions between the vectors, the pathogens, and their hosts (which also act as blood-feeding sources of the vectors) [8]. Consequently, the comprehensive identification of these interactions is critical to disentangle transmission cycles and understand their dynamics. In most cases, an extraordinary diversity of organisms is involved, making the identification of those interactions challenging. In the case of Chagas disease, for example, the causative agent, a protozoan parasite called Trypanosoma cruzi, presents a very high genetic diversity and has been classified into seven discrete typing units (DTUs) [9]. These DTUs are transmitted by more than 140 triatomine species, which live in a very wide variety of ecotopes and bioclimatic conditions [10], to more than 180 mammalian species, including wild animals, domestic animals, and humans [11,12]. In parallel, triatomines also take blood meals from animals that are refractory to T. cruzi infection, called incompetent hosts, such as birds, reptiles, and amphibians [13,14] (Figure 1). Finally, the establishment and development of the parasite and the vectorial capacity of the triatomines could be affected by the composition of their midgut microbiome [15], as has been shown for other vectors. For example, the development of Trypanosoma brucei, the agent of African trypanosomiasis, in its tsetse fly vector is directly influenced by a microbiome-regulated gut immune barrier [16]. In the same way, the sand fly midgut microbiome is a critical factor for Leishmania growth and differentiation to its infective state prior to disease transmission [17]. The gut microbiome similarly modulates dengue virus infection in Aedes aegypti mosquitoes [18,19], and microbiome manipulation may be used to control virus transmission [20,21]. Similar observations exist for other vector/pathogen systems, such as ticks and the causative agent of Lyme disease [22], or malaria vectors [23], in which the salivary gland microbiome may also play a role [24].

Figure 1. Complexity of T. cruzi transmission cycles. The parasite is divided into seven genetic subgroups (DTUs), which are transmitted by more than 140 triatomine species to more than 180 mammalian species, including wild animals, domestic animals, and humans. In parallel, triatomines also take blood meals from animals that are refractory to T. cruzi infection (incompetent hosts). Figure adapted from [25].

Metabarcoding: a highly sensitive and integrative approach to disentangle vector-borne pathogen transmission cycles
NGS technologies can generate millions of sequencing reads in parallel. This massive sequencing throughput can produce sequence reads from fragmented libraries of a specific genome (i.e., genome sequencing) or from a pool of PCR products. Metabarcoding approaches rely on this technology, in which a large number of different amplicons of taxonomically informative genes (barcodes) can be sequenced. While metagenomics refers to the identification of all genomes within a particular ecosystem or sample, metabarcoding aims to identify only a subset of them (those that are of interest for a particular question) by sequencing millions of different amplicons of these barcodes, without the need for cloning (i.e., sequences are obtained directly from a mix of different amplicons of different barcodes of interest) [26].
Consequently, in the case of vector-borne pathogens, starting only from the vectors as biological samples, it is possible to target and amplify well-chosen molecular markers (barcodes) of interest with universal primer sets to identify the different actors of transmission cycles (e.g., vertebrate blood sources, midgut microbiome, pathogen diversity, and vector diversity [27]). Other ecological interactions that are not directly involved in the transmission cycles but are relevant for understanding vector ecology and the dynamics of the transmission cycles (e.g., plant-feeding sources, sometimes required as a source of energy for routine activities such as flight, mating, and walking or as a source of protein for egg maturation [28]) can also be identified. A schematic representation of the metabarcoding approach for the identification of ecological interactions of disease vectors is given in Figure 2. After purification of the total DNA (and RNA if working with RNA pathogens) contained in each vector midgut (and salivary glands, depending on the kind of vector) (1), molecular markers (barcodes) of interest are PCR amplified (after RT-PCR if working with RNA pathogens) (2). Then, to identify samples, a tag/index is added to each PCR product (amplicon). The same tag is used for all the amplicons obtained from a single sample (3). After high-throughput sequencing (4), the millions of reads (5) are sorted per sample using the tags added to each amplicon (6).
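The per-sample sorting of reads in steps (3) to (6) can be illustrated with a minimal sketch. It assumes, for illustration only, that each read starts with a fixed-length tag that matches one sample exactly; the tag sequences, sample names, and the function demultiplex are hypothetical, and real pipelines also handle sequencing errors in tags and paired indexes.

```python
from collections import defaultdict

# Hypothetical tag-to-sample table; real index sequences depend on the
# library preparation kit and are not taken from the study.
TAG_TO_SAMPLE = {
    "ACGT": "bug_01",
    "TGCA": "bug_02",
}
TAG_LENGTH = 4  # assumption: fixed-length tag at the start of each read


def demultiplex(reads, tag_to_sample, tag_length):
    """Sort reads per sample using the tag at the start of each read.

    Returns (per_sample, unassigned): per_sample maps a sample name to its
    reads with the tag trimmed off; unassigned keeps reads whose tag is
    not in the table.
    """
    per_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            per_sample[sample].append(insert)
    return dict(per_sample), unassigned


# Toy reads: tag followed by a short made-up amplicon fragment.
reads = ["ACGTTTGACC", "TGCAGGATCC", "ACGTCCCAAA", "NNNNGGGTTT"]
per_sample, unassigned = demultiplex(reads, TAG_TO_SAMPLE, TAG_LENGTH)
```

Because the same tag is shared by all amplicons of a sample, reads sorted this way still mix the different markers; separating them by marker is a later step.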
Currently, the most common systems provide up to 384 different tags and 25 million reads per sequencing run. The depth (i.e., the number of reads or sequences) obtained per molecular marker and sample depends on the number of labeled samples and the number of markers amplified per sample. For instance, if we amplify 10 molecular markers for 100 vector specimens and run at a depth of 25 million sequences, about 250,000 reads per vector specimen and 25,000 reads per marker and specimen will theoretically be obtained. This kind of multiplexing considerably lowers sequencing costs per sample. Downstream analyses with bioinformatics tools, such as those provided on the open-access Galaxy platform [29], make it possible to obtain and identify the sequences corresponding to each targeted marker for each vector specimen. This approach is thus extremely powerful for reconstructing pathogen transmission cycles and understanding their dynamics, since it can reveal, after adequate analyses, all the existing ecological interactions through the simultaneous identification, for each specimen, of its species or subspecies, its blood-feeding source(s), the pathogen(s) of interest, the species or lineage(s) of the pathogen(s) of interest, the composition of its midgut microbiome and of its salivary gland microbiome, its plant-feeding source(s), mutations associated with insecticide resistance, etc.
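The depth estimate above is a simple division, sketched below with the figures from the text. The function name expected_depth is hypothetical, and the calculation assumes that reads are spread evenly across specimens and markers, which is why the text calls the result theoretical; in practice amplification efficiency varies between markers and samples.

```python
def expected_depth(total_reads, n_specimens, n_markers):
    """Theoretical read depth, assuming reads are evenly distributed.

    Returns (reads per specimen, reads per marker and specimen).
    """
    reads_per_specimen = total_reads / n_specimens
    reads_per_marker = reads_per_specimen / n_markers
    return reads_per_specimen, reads_per_marker


# Figures from the text: 25 million reads, 100 specimens, 10 markers.
per_specimen, per_marker = expected_depth(25_000_000, 100, 10)
print(f"{per_specimen:,.0f} reads per specimen")
print(f"{per_marker:,.0f} reads per marker and specimen")
```

Changing the number of specimens or markers in the call shows directly how multiplexing more samples trades depth per marker against cost per sample.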

Unraveling T. cruzi transmission cycles in the Yucatan peninsula (Mexico): an example of the use of the metabarcoding approach
As a proof of concept, we recently performed a pilot study of the metabarcoding approach presented above using Chagas disease in the Yucatan peninsula (Mexico) [27]. In this region, Triatoma dimidiata is the main vector, and different genetic subgroups of this species [30-32] live in sympatry [33]. The molecular markers we selected for our metabarcoding approach were as follows: (i) to classify T. dimidiata into its different genetic subgroups, we used primers targeting the Internal Transcribed Spacer ITS-2 as previously described [34]; (ii) for blood-feeding source identification, we used vertebrate universal primers targeting the 12S rRNA gene [35]; (iii) for T. cruzi, we used primers targeting the mini-exon gene, allowing further classification of the parasite into its different DTUs [36]; and (iv) finally, we used universal primers targeting the bacterial 16S rRNA gene to identify bacterial microbiome composition [37]. In this way, we aimed to determine whether there were detectable interaction patterns between the genetic subgroups of T. dimidiata, their blood-feeding hosts, infection with T. cruzi, the parasite DTUs, and the microbiome composition, in order to elucidate the T. cruzi transmission cycles in the study area at finer scales.
This study, which was based on 14 T. dimidiata bugs collected in wild as well as in domestic ecotopes, demonstrates the feasibility and high sensitivity of the proposed approach [27]. For example, we identified an average of 4.9 ± 0.7 blood-feeding species per bug, and up to 7 blood-feeding species and 11 blood-feeding individuals in a single bug. In contrast, current techniques based on direct sequencing of PCR products can only identify the dominant sequence/host in each sample [38], while the addition of a cloning step prior to sequencing generally allows detecting up to three to five host species in some bugs [14,39-41]. In the same way, we easily identified different DTUs infecting single bugs, whereas to date most studies have relied on conventional Sanger sequencing approaches that are only capable of detecting the dominant genotype in biological samples, which almost precludes the possibility of detecting multiclonality. For this reason, NGS approaches capable of inventorying multiclonal infections are now being progressively adopted [42-46]. Regarding the midgut microbiome, we were able to detect 23 bacterial orders and observed that its composition differed according to blood-feeding sources (Figure 3). Finally, all 14 bugs belonged to the same genetic subgroup.
To further assess potential transmission cycles of T. cruzi parasites by T. dimidiata among the identified blood source species, a feeding and parasite transmission network was constructed (Figure 4). Nodes of the network represent the species identified as blood meal sources, and the size of each node indicates the feeding frequency on that species. Edges link species that are found together in multiple blood meals within individual bugs. Since birds cannot carry T. cruzi parasites, they only play a role as blood sources for triatomines, which is indicated by dotted edges. Solid lines between mammals indicate potential parasite transmission pathways. This network highlights the mammals that would play the main role in T. cruzi transmission to humans in the study area. Humans (Homo sapiens in Figure 4) may thus become infected by T. cruzi parasites originating from dogs (Canis lupus), cows (Bos taurus), and mice (Mus musculus), as well as from sylvatic hosts such as porcupines (Coendou spp.), squirrels (Sciurus spp.), and fruit bats (Artibeus spp.). In particular, dogs appear as key actors that may favor parasite transmission to humans. This kind of network is very informative, as it highlights the animals that would play the main roles in the transmission of a pathogen to humans (complementary studies focused directly on these animals may nevertheless be necessary) and that should be targeted as part of integrated control strategies aimed at disrupting parasite transmission. For example, management of dogs and other peridomestic animals can be part of EcoHealth/One Health approaches [47]. The network presented is the result of a pilot study based on a limited sample and is only used here to illustrate the potential of the proposed metabarcoding approach. Increasing the sample size across a wide variety of ecotopes and integrating vector, microbiome, and coinfection data will undoubtedly make it possible to identify at-risk situations and disentangle transmission cycles. It may also help to identify bacteria that are part of the normal microbiota of triatomine bugs, bacteria associated with the presence/absence of infection of the bugs with T. cruzi, or bacteria of vital importance to the bugs. This knowledge can have important applications for the development of innovative control strategies [48-50]. The information provided by the approach can also be used to feed models that include the hosts involved in transmission, to help assess the effects of different host community management strategies on T. cruzi transmission to humans and to understand transmission dynamics over time [51,52]. Transmission models are becoming increasingly important in vector-borne disease control programs. They make it possible to evaluate different control strategies, or combinations of them, and to assess their cost-effectiveness and likelihood of success [53].
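The construction of the feeding network described above (node size as feeding frequency, edges for hosts found together in the same bug, dotted edges when a bird is involved) can be sketched as follows. This is only an illustration under stated assumptions: the blood meal records are invented, Gallus gallus is a placeholder bird, the function build_network is hypothetical, and this is not the code used in the pilot study [27].

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: the set of host species detected in each bug.
BLOOD_MEALS = {
    "bug_01": {"Homo sapiens", "Canis lupus", "Gallus gallus"},
    "bug_02": {"Homo sapiens", "Canis lupus"},
    "bug_03": {"Canis lupus", "Mus musculus"},
}
# Birds do not carry T. cruzi, so they are flagged separately.
BIRDS = {"Gallus gallus"}


def build_network(blood_meals, birds):
    """Build a host co-occurrence network from per-bug blood meal hosts.

    node_size: number of bugs that fed on each host species.
    edge_weight: number of bugs in which two host species co-occur.
    edge_style: "dotted" if the pair involves a bird, else "solid"
    (a potential parasite transmission pathway between mammals).
    """
    node_size = Counter()
    edge_weight = Counter()
    for hosts in blood_meals.values():
        node_size.update(hosts)
        # Sort so each pair has one canonical (alphabetical) orientation.
        for pair in combinations(sorted(hosts), 2):
            edge_weight[pair] += 1
    edge_style = {
        pair: "dotted" if birds & set(pair) else "solid"
        for pair in edge_weight
    }
    return node_size, edge_weight, edge_style


node_size, edge_weight, edge_style = build_network(BLOOD_MEALS, BIRDS)
```

The three returned mappings hold what a graph-drawing tool would need to render a figure of this kind: node sizes, edge widths, and line styles.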

Consequently, the approach presented here provides very high-value information that can be used in multiple ways for the further design and implementation of sustainable, effective, and locally adapted control strategies, and it deserves to be extended to other eco-epidemiological contexts and to any vector-borne pathogen. To date, metabarcoding approaches for the study of human vector-borne diseases using natural populations of vectors are being progressively adopted, but their use remains limited [54,55]. Moreover, they still generally focus on only one of the components of transmission cycles, such as blood-feeding hosts [56-59], plant-feeding hosts [28], microbiome composition [60,61], or vector diversity [62] (Table 1), thus providing limited information, whereas the approach can easily be made more integrative, as we illustrated here, to simultaneously identify the different actors involved in transmission.

Conclusions
In this chapter, we presented a metabarcoding approach to study vector-borne pathogen transmission cycles and their dynamics and illustrated the feasibility and high sensitivity of the proposed approach with a recent study performed using Chagas disease in the Yucatan peninsula (Mexico) as a study model. NGS technologies are quickly becoming more affordable and cost-effective. Moreover, many bioinformatics tools have greatly simplified analyses in recent years. Consequently, this powerful approach deserves to be generalized to other eco-epidemiological contexts to unravel the transmission cycles of any vector-borne pathogen and their dynamics, which in turn will help the implementation of sustainable, effective, and locally adapted strategies to control their transmission.

Table 1. Metabarcoding approaches for the study of human vector-borne diseases using natural populations of vectors as biological samples.

Figure 2. Schematic representation of the metabarcoding approach for the identification of ecological interactions of disease vectors. Figure adapted from [25].

Figure 3. Gut microbiome composition of Triatoma dimidiata. The average composition of the microbiome from 14 individuals is shown to the level of bacterial order (A). There are significant differences between male and female microbiomes, with females presenting a greater diversity of orders. (B) Microbiome composition is also significantly different depending on the dominant blood meal present in the triatomine gut, which was identified by the analysis of 12S rRNA vertebrate sequences. Figure taken from [27].

Figure 4. Feeding and possible parasite transmission network of Triatoma dimidiata. Blood source nodes correspond to domestic (green symbols) and sylvatic (orange symbols) host species, as well as humans (blue), with the size proportional to the feeding frequency on each host. Diamond-shaped nodes represent birds, which do not carry Trypanosoma cruzi parasites, and circles represent mammals, which can be infected by T. cruzi. Edges link species which are found together in multiple blood meals within individual bugs, and the width of the lines is proportional to the frequency of the association between species. Solid dark gray lines link mammalian species, among which T. cruzi may circulate, while dotted light gray lines involve bird species, which only serve as blood sources for the bugs. Humans may thus become infected by T. cruzi parasites originating from dogs, cows, and mice, as well as from sylvatic hosts such as porcupines, squirrels, and fruit bats. Dogs can play a key role as domestic host/reservoir favoring parasite transmission to humans. On the other hand, cats, rats, and pigs play a secondary role in parasite transmission. Figure taken from [27].