Genetic and Phenetic Approaches to Anopheles Systematics

The Anopheles genus is probably one of the best studied genera among insects of medical importance. Of more than 500 species currently listed in the world, about sixty are vectors of malaria agents and about thirty are responsible for most of the transmission [1-3]. The important epidemiological role of anopheline species has motivated many studies of taxonomy and systematics with traditional tools. With the advent of molecular tools, the development of informatics databases and new mathematical concepts on shape, characterizing insects has become more and more accurate. Molecular tools based on the nucleotide polymorphism of DNA have allowed the identification of cryptic diversity, confirming and refining previous findings suggesting the existence of many species groups or complexes of sibling species. The development of international genetic sequence database collaborations (http://www.insdc.org, http://www.barcodinglife.com/) allowed the use of reference sequences for species identification. Although not well developed, the same need for informatic databases arose for traditional taxonomy.


Introduction
The Anopheles genus is probably one of the best studied genera among insects of medical importance. Of more than 500 species currently listed in the world, about sixty are vectors of malaria agents and about thirty are responsible for most of the transmission [1-3]. The important epidemiological role of anopheline species has motivated many studies of taxonomy and systematics with traditional tools. With the advent of molecular tools, the development of informatics databases and new mathematical concepts on shape, characterizing insects has become more and more accurate. Molecular tools based on the nucleotide polymorphism of DNA have allowed the identification of cryptic diversity, confirming and refining previous findings suggesting the existence of many species groups or complexes of sibling species. The development of international genetic sequence database collaborations (http://www.insdc.org, http://www.barcodinglife.com/) allowed the use of reference sequences for species identification. Although not well developed, the same need for informatic databases arose for traditional taxonomy.
Despite the general acknowledgement that traditional taxonomy is important, the decline in taxonomy and in the skills base for identifying and describing biodiversity is a striking reality. Retiring taxonomists are leaving orphan reference collections (most often not digitized) and associated catalogues or literature. Because taxonomy is not considered "big science", few students are entering the field. This has particularly negative effects when dealing with arthropod pests, nuisances or vector species, because taxonomic expertise is lost and would be sorely missing if the sanitary situation required it. Alongside the development of molecular identification tools and recent mathematical developments motivated by the need for quantifying morphological characters [4], a new field has progressively

Molecular identification of Anopheline species
Before the development and use of molecular assays for the identification of individual specimens, cytogenetic techniques were widely used for anopheline species. This approach has proved to be extremely informative, not only for species identification but also for analysing population structure and determining the existence of sibling species. However, the expertise required for cytogenetics has limited its large-scale application. Allozymes have also been widely used, but the need to store individuals in liquid nitrogen constrained collection. Since the 1990s, the development of DNA amplification techniques in research laboratories, primarily the Polymerase Chain Reaction (PCR), together with the analysis of DNA polymorphism, has taken precedence over all other techniques for identification to the species level. The huge expansion of molecular identification assays is related to their sensitivity, reliability and speed, which allow a high number of identifications to be generated. Moreover, these assays can be applied to all developmental stages, to both sexes, and to whole specimens or parts (e.g. legs). The first complex for which biologists designed and validated species-specific probes was the An. gambiae Complex, because of its obvious epidemiological importance in the Afrotropical region. Over time, a host of techniques have been developed that share a common principle: a species-specific amplification that determines an individual's membership in a taxon (Table 1). We do not intend to give an exhaustive presentation of all the molecular identification assays developed to date for Anopheles species and complexes, but rather to provide guidance on those most employed, with their advantages and disadvantages, and to review in detail the relative merits of three different tools for species identification.

RFLP-PCR assays
The RFLP-PCR (Restriction Fragment Length Polymorphism PCR) assay is based on the amplification of a known locus of the genome and its subsequent digestion by a restriction enzyme (Fig. 1A). Each species is characterized by a digestion profile with bands of different sizes. The need for two steps (amplification and digestion) makes the assay time-consuming (digestion can take 1-3 hours) and expensive. However, an identification assay based on this method is particularly appropriate for entomological surveys where the anopheline fauna of a region is not known. Indeed, such an assay is a priori non-selective, and all species encountered give a digestion profile. Examples of RFLP-PCR assays include work on the M and S molecular forms of An. gambiae, the An. funestus Group, the An. punctulatus Group, the An. minimus Complex, the An. oswaldoi Group and the Arribalzagia Series (Table 1).
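The two-step logic of an RFLP-PCR assay (amplify, digest, read the band profile) can be illustrated with a minimal Python sketch. The amplicon sequences, species names and the 5 bp tolerance below are invented for illustration and are not taken from any assay in Table 1; the only biological fact used is that EcoRI recognizes GAATTC and cuts after the first G.

```python
def digest(amplicon, site, cut_offset):
    """Fragment lengths (bp, largest first) after cutting a linear amplicon.

    `site` is the recognition sequence and `cut_offset` the cut position
    inside the site on the top strand (1 for G^AATTC).
    """
    amplicon = amplicon.upper()
    site = site.upper()
    cuts = []
    start = amplicon.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:]) if b > a), reverse=True)


# Toy 100 bp amplicons (hypothetical sequences, not real mosquito loci).
AMPLICONS = {
    "species_A": "A" * 30 + "GAATTC" + "T" * 64,                        # one site
    "species_B": "A" * 100,                                             # no site
    "species_C": "C" * 20 + "GAATTC" + "C" * 24 + "GAATTC" + "C" * 44,  # two sites
}

# Reference digestion profiles, one per species.
PROFILES = {name: digest(seq, "GAATTC", 1) for name, seq in AMPLICONS.items()}


def identify(observed_bands, tolerance=5):
    """Species whose reference profile matches the observed band sizes."""
    observed = sorted(observed_bands, reverse=True)
    return [
        name
        for name, ref in PROFILES.items()
        if len(ref) == len(observed)
        and all(abs(r - o) <= tolerance for r, o in zip(ref, observed))
    ]


print(PROFILES)
print(identify([70, 30]))
```

Because every amplicon yields some profile, even an undigested one, the sketch also reflects why such an assay is non-selective: an unexpected species produces an unmatched profile rather than no result.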

SSCP-PCR assays
SSCP-PCR (Single Strand Conformation Polymorphism PCR) detects nucleotide mutations in PCR products [59]. After PCR amplification, the products are heat-denatured and then cooled very quickly so that the single-stranded DNA forms secondary structures. These structures migrate differentially according to their size and conformation, which depend on the polymorphism of the targeted region. The migration profile is species-specific and thus allows species identification. However, this method is also time-consuming (in particular, electrophoresis takes several hours) and can pose problems of reproducibility. It requires special equipment and polyacrylamide gels, which are more expensive than agarose gels. This kind of assay is not recommended for the identification of a large number of specimens. Examples of SSCP-PCR tests include work on the An. funestus Group, including Asian species of the An. minimus and An. aconitus Subgroups (Table 1).

AS-PCR or PASA assays
The generalization of partial or complete sequencing of many genomes allowed the development of single-step identification assays that are easier to implement and, above all, faster. These assays are called allele-specific PCR (AS-PCR) or PCR amplification of specific alleles (PASA) (Fig. 1B). This kind of assay is very specific and robust. It allows a large number of specimens to be screened quickly and is the most common technique currently being developed (Table 1). These assays rely on the amplification of a target region whose size is known and specific to each of the taxa studied. They therefore require the prior development of primers specific to each taxon and an appropriate evaluation of the intraspecific variation of the targeted DNA region. Most recently developed identification tests are AS-PCR assays based on ITS2 differences [10, 15, 18, 25, 27-29, 31, 34, 38-40, 53, 58, 60, 61]; older assays targeted the IGS region (An. gambiae Complex) [5,6,12,13].
Usually, assays are developed to identify several species in a single PCR. When the primers are combined in a single amplification reaction, the assay is called "multiplex PCR". When developing a molecular identification assay, primers must first be checked for specificity. Moreover, an internal positive control is highly recommended, so that every outcome is an "amplification" rather than a "no-amplification": a non-amplification is indistinguishable from a technical failure and may be a false negative. The primer hybridization sites can be chosen either from systematic sequencing of the regions of interest in the species studied, or from random screening of non-localized regions of the genome. In the first case, prior sequencing of the DNA regions studied is necessary. Primers are then chosen on the basis of the nucleotide differences observed between taxa in the target area, so as to obtain fragments of a size specific to each species (more than 25 bp difference). Identification is thus based on the length polymorphism of the amplified DNA fragments. In the second case, specific primers are selected by screening random, non-localized regions of the genome. Screening can reveal amplified fragments whose size is specific to a taxon and can therefore be used for identification. Once species-specific bands have been recognized, they are cloned and sequenced. The fragment generated is called a SCAR (Sequence Characterized Amplified Region). From these nucleotide sequences, pairs of primers specific to the species to be identified are defined. Primers may be combined in different ways: 1) two pairs of primers for two different amplifications [62], 2) a pair of universal external primers and internal specific primers [63], 3) a universal primer and several species-specific primers [10,13,15,31], or 4) several amplifications with species-specific pairs of primers [37] (Fig. 2).
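The scoring rules described above (species-specific fragment sizes separated by more than 25 bp, and an internal positive control so that a blank lane is never read as a result) can be written down as a short Python sketch. The species names, band sizes, control size and 10 bp reading tolerance are hypothetical values chosen for illustration, not those of any published assay.

```python
from itertools import combinations

# Hypothetical multiplex design: expected amplicon size (bp) per species.
EXPECTED_BANDS = {"species_A": 150, "species_B": 250, "species_C": 400}
CONTROL_BAND = 600   # hypothetical internal positive control (bp)
MIN_GAP = 25         # minimum size difference between diagnostic fragments (bp)


def design_is_resolvable(bands=EXPECTED_BANDS, control=CONTROL_BAND, min_gap=MIN_GAP):
    """True if every pair of expected fragments differs by more than min_gap."""
    sizes = list(bands.values()) + [control]
    return all(abs(a - b) > min_gap for a, b in combinations(sizes, 2))


def call_species(observed_bands, tolerance=10):
    """Score one lane from its observed band sizes (bp)."""
    def seen(size):
        return any(abs(band - size) <= tolerance for band in observed_bands)

    # Without the control band, a blank lane cannot be told from a failed PCR.
    if not seen(CONTROL_BAND):
        return "failed reaction"
    matches = [name for name, size in EXPECTED_BANDS.items() if seen(size)]
    if not matches:
        return "unidentified"
    if len(matches) > 1:
        return "ambiguous: " + ", ".join(matches)
    return matches[0]


print(design_is_resolvable())
print(call_species([152, 598]))
print(call_species([]))
```

Keeping the reading tolerance below half of the minimum gap ensures that one observed band cannot match two expected sizes at once.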

Anopheles mosquitoes -New insights into malaria vectors 88

Molecular identification using quantitative assays
The methods described above are qualitative and determine to which species a given individual belongs. However, despite their usefulness for species identification, they are not suitable for quantifying samples with large numbers of mixed species. Moreover, the disadvantages of conventional PCR approaches include the requirement for post-PCR processing (gel electrophoresis of PCR products) and the manual scoring of test samples, which can be prone to error because certain species generate amplicons of similar size. For Anopheles species, different high-throughput methods based on real-time PCR have been described. These recent assays are based on TaqMan single nucleotide polymorphism (SNP) genotyping and are "closed tube" approaches that require only a single step to characterize a mosquito DNA sample. Unlike conventional AS-PCR, these assays do not require processing of samples by agarose gel electrophoresis, which is time-consuming, restricts throughput and requires the use of ethidium bromide, a safety hazard (Table 1, An. funestus Group and An. gambiae Complex). Further development of such assays can be expected.

Species identification and barcode database
The initiative to barcode living forms was set out in [64], and since then the debate on "DNA taxonomy" has not ended, with serious concerns about the empirical approaches associated with DNA barcode data and their potential to impede rather than enhance the practice of taxonomy and the dissemination of reliable taxonomic information [65]. DNA barcoding is a technique that uses the variation in short, standardized gene regions (the Folmer region of cytochrome oxidase I, COI) to identify known species and to discover new ones. This is possible because the variation within each species is low relative to the differences among species. Since its development in 2003, the application of this technology has grown from straightforward taxonomic identification to fields such as biodiversity monitoring and ecosystem reconstruction, with new uses emerging in public health, agriculture, economics and trade, and law enforcement. If a specimen is damaged or fragmented, at an immature stage of development, or part of an undiscovered cryptic species, even specialists may be unable to identify it. Barcoding addresses these problems because non-specialists can obtain barcodes from tiny amounts of tissue, in many cases even when it has been digested. The principle relies on specimen identification using a partial sequence of COI. Investigators identify a specimen by first extracting its DNA, then amplifying and sequencing COI, and finally comparing the query sequence with the COI sequences of all known species. The use of DNA sequences in Diptera predates the formal proposal of DNA barcoding, and it is particularly extensive for the Anopheles genus. DNA barcoding aims at providing a new identification tool for unidentified specimens or cryptic diversity (see also http://www.barcodinglife.com/index.php/Taxbrowser_Taxonpage?taxid=7809).
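The comparison step of this workflow (query sequence against a reference library) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequences are invented 40 bp strings rather than real COI barcodes, they are assumed to be already aligned, the distance is a plain uncorrected p-distance, and the 3% acceptance threshold is an arbitrary placeholder, not a recommended value.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def identify(query, references, max_distance=0.03):
    """Return (best matching name or None, distance to the closest reference)."""
    name, dist = min(
        ((n, p_distance(query, s)) for n, s in references.items()),
        key=lambda item: item[1],
    )
    return (name if dist <= max_distance else None), dist


def with_substitutions(seq, changes):
    """Helper for building toy data: copy of seq with bases replaced at given indices."""
    bases = list(seq)
    for index, base in changes.items():
        bases[index] = base
    return "".join(bases)


# Hypothetical reference library (toy sequences, hypothetical species names).
REF_A = "ACGT" * 10
REF_B = with_substitutions(REF_A, {4: "T", 11: "A", 18: "C", 28: "T", 39: "A"})
REFERENCES = {"species_A": REF_A, "species_B": REF_B}

query = with_substitutions(REF_A, {20: "G"})   # one mismatch with species_A
print(identify(query, REFERENCES))
```

A query that is far from every reference returns `None` rather than the nearest name, which mirrors the case of a specimen belonging to a species absent from the library.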
DNA barcoding is now pursued by the Consortium for the Barcode of Life (CBOL) (see also http://www.barcodeoflife.org/). To maximize the adherence of barcoding projects to the global barcoding landscape, guidelines for DNA extraction, amplification and sequencing (especially for high-throughput studies) have been released on the CBOL website. Moreover, the consortium created a reserved keyword, BARCODE, which is applied when new sequence submissions to the International Nucleotide Sequence Database meet the standards established by the consortium.

Modern morphometrics applied to mosquitoes with emphasis on the Anopheles genus
In modern morphometrics, size and shape are derived from a configuration of landmarks collected on a non-articulated part, often a single organ. Species diagnosis in mosquitoes using geometric morphometrics generally makes use of the wings because these structures are almost two-dimensional and relatively rigid, which reduces digitizing error. The most common technique is the landmark-based approach. A few anatomical landmarks available on a wing (or any measurable part of the body) are submitted to specialized analyses to provide size and shape information, with the further possibility of visualizing shape changes. A few landmarks do not completely describe the wing, nor do they describe the complete body. However, provided there is anatomical correspondence among individual landmarks, only a partial capture of shape is needed to allow valid comparisons among populations and species. There are also other technical approaches, in particular for cases where landmarks are not conspicuous. The reader should refer to the following references for detailed information on morphometrics, such as mathematical approaches and statistical procedures [4, 66-75].

Why morphometrics?
Various arguments, not only related to cost-effectiveness, should convince most laboratories to apply modern morphometrics. The method is inexpensive. Modern morphometric techniques (at least 2D techniques) require no more equipment than is already present in any entomology laboratory: optical devices (binocular microscope), computers and an internet connection. They do not require entomologists to adopt any new practice other than the usual dissecting and mounting, so new personnel are not necessary. The method is fast. While dissection and sample preparation may be time-consuming, depending on the group of insects or the organ under study, the morphometric analysis itself is fast. Several hundred specimens can be measured (digitized) in one week, and the analytical steps can be performed in a few days or less. Although fast, the method cannot claim to identify thousands of specimens quickly. This could be improved with the progress of specialized software aiming at the automatic digitization of mosquito wings [76]. Although some entomological knowledge is required, there is no need to be an expert in the insect group under study, a skill which has in any case been disappearing for a few decades, as stated above [77,78]. The required skill in morphometrics is the same whatever the taxonomic group under study: it is mainly the ability to use specialized software. Morphometric study is a non-traumatic approach, in the sense that it does not prevent the application of most other characterization techniques, including molecular techniques, to the same specimens. In fact, the technique could be applied as a complement to almost any other kind of study. There are indeed many circumstances in which morphologically distinct species can no longer be identified because diagnostic characters were destroyed by the capture technique or lost in transport from field to laboratory.
Some diagnostic morphological characters are just a few scales at a given place on the body, and these precious scales are no longer visible on damaged specimens. As an example, the Asian anopheline species An. dirus and An. cracens, or the

In our experience, the obstacles to adopting modern morphometrics as a complement to other identification techniques, or as a main approach, are relatively easy to overcome. Modern morphometrics relies on sophisticated mathematical developments, but these only require an intuitive understanding to allow a biological interpretation of the data. In the same way as molecular biologists have learned to use different specialized software, morphometricians have to learn to use one or more dedicated programs.
Unexpectedly, image capture may be a problematic step. It is often the only financial investment needed in some laboratories to start applying modern morphometrics. There is, however, no need for a sophisticated optical and computing device to capture the images. Current digital cameras fitted to the binocular microscope provide enough resolution and are simple to use; even a simple scanner can provide reliable pictures [83]. The resolution, or size, of the picture must be identical for each image. It should be as high as possible, but there is no rigid rule: the picture has to be taken so that the anatomical landmarks of interest are visible. An important point is to keep accurate information on size: a size scale should be associated with the pictures. Unless a clear scale can be associated with each picture (Fig. 3), optical zooms should be avoided. Finally, there is no need for complex imaging software: specialized and free software exists which only needs the picture file as input.

A. Figure presented in [84]. Landmarks 8, 9, 10, 11 and 12 are homologous landmarks. The remaining landmarks (from 1 to 7) are defined by the transition between black and white scales. Courtesy of Nicolas Jaramillo (University of Antioquia, Medellin, Colombia). C. The centroid size is computed from the distances (in pixels) between the centroid of the configuration (black square) and each one of the landmarks. The coordinates (x,y) of the centroid position are the arithmetic average of all the x and y coordinates. Modified from [84].

Shape and size in modern morphometrics
The numerical data of shape are the x,y coordinates of anatomical landmarks. Depending on the kind of landmarks, homologous landmarks or pseudo-landmarks, shape is the relative position of anatomical points (Fig. 3) or a sequence of points describing the contour of an organ (Fig. 4). Accordingly, different statistics apply, involving the Procrustes superimposition to the consensus configuration [72] or elliptic Fourier analysis [85], respectively. In both approaches, shape changes can be visualized (Fig. 4). The size estimator in modern morphometrics is a single variable which is separate from the set of shape variables. It is thus possible to test for a statistical relationship between size and shape (allometry). The landmark-based approach provides a global estimator of size using all the wing landmarks, which is called "centroid size" (Fig. 3). It provides information about size changes in as many directions as there are landmarks, each direction running from the centroid to one landmark. The centroid size of the wing is highly correlated with the traditional length and width of the wing [86], but not well correlated with smaller inter-landmark distances of the wing [66]. The size of an outline can be estimated in various ways, for instance as the perimeter of the outline or, better, as the square root of its area.
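The size and shape quantities named above can be made concrete with a small Python sketch. It assumes the usual definition of centroid size (square root of the sum of squared distances from the centroid to each landmark) and, for brevity, superimposes one configuration on a single reference rather than iterating to a consensus as a full generalized Procrustes analysis would. Function names and coordinates are illustrative only.

```python
import math


def centroid_size(landmarks):
    """Square root of the summed squared distances from the centroid."""
    n = len(landmarks)
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in landmarks))


def outline_size(outline):
    """Square root of the area enclosed by an ordered outline (shoelace formula)."""
    closed = list(outline) + [outline[0]]
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(closed, closed[1:]))
    return math.sqrt(abs(twice_area) / 2)


def to_shape(landmarks):
    """Remove position and size: center on the centroid, scale to unit centroid size."""
    z = [complex(x, y) for x, y in landmarks]
    center = sum(z) / len(z)
    z = [p - center for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / size for p in z]


def superimpose(reference, target):
    """Rotate the target shape onto the reference (2D, no reflection).

    Returns the aligned target (as complex numbers) and the residual
    distance between the two superimposed shapes.
    """
    ref, tgt = to_shape(reference), to_shape(target)
    s = sum(r * t.conjugate() for r, t in zip(ref, tgt))
    rotation = s / abs(s) if abs(s) > 0 else 1   # optimal least-squares rotation
    aligned = [rotation * t for t in tgt]
    distance = math.sqrt(sum(abs(r - a) ** 2 for r, a in zip(ref, aligned)))
    return aligned, distance


wing = [(0, 0), (4, 0), (0, 3)]
same_wing_moved = [(5, 5), (5, 17), (-4, 5)]   # rotated 90 degrees, enlarged x3, shifted
print(centroid_size(wing), centroid_size(same_wing_moved))
print(superimpose(wing, same_wing_moved)[1])   # close to zero: same shape
```

In the example, the two configurations differ threefold in centroid size but have a residual shape distance of essentially zero, which is the separation of size from shape that the text describes.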
In spite of providing many Type I landmarks, the mosquito wing is not easy to digitize because of the presence of scales on the veins. Scales can hide the area where two veins cross, so that the user has to guess the likely anatomical point of interest. One strategy is to estimate the digitizing error and to consider that, with good scores, the results can be submitted for publication. The digitizing error can be reduced by using the mean value of repeated measurements [87]; this can also be done by taking the mean of the left and right wings [88]. Phase contrast microscopy can improve the relative transparency of the scales, helping to localize the junction of two veins [89,90]. Scales can also be removed before digitizing the wings. Different techniques are used, from mechanical (Fig. 5) to chemical treatment [87, 91-93]. A fourth possible response to that problem is to consider that scale color could define landmarks. Indeed, especially in some groups of the Anopheles genus, scales have markedly different colors at specific locations, producing a black and white pattern of the utmost importance as a taxonomic character. As long as the transition between black and white scales can be considered as the junction of different tissues, these landmarks could be assimilated to Type I landmarks. Calle et al. [84] obtained remarkable results making use of these scale-defined landmarks together with the more classical landmarks (see landmarks 1 to 7 of Fig. 3B).
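Averaging repeated digitizations of the same wing, as suggested above, is straightforward to express in Python. The sketch below also reports a crude scatter value (root mean square deviation of the repeated landmarks around their mean position); this is only a simple indicator chosen for illustration, not the repeatability statistic used in the cited studies, and the coordinates are invented.

```python
import math


def mean_configuration(sessions):
    """Landmark-by-landmark mean of repeated digitizations of one wing."""
    n_landmarks = len(sessions[0])
    if any(len(s) != n_landmarks for s in sessions):
        raise ValueError("every session must contain the same landmarks")
    n_sessions = len(sessions)
    return [
        (
            sum(s[i][0] for s in sessions) / n_sessions,
            sum(s[i][1] for s in sessions) / n_sessions,
        )
        for i in range(n_landmarks)
    ]


def digitizing_scatter(sessions):
    """Root mean square distance of repeated landmarks from their mean position."""
    mean = mean_configuration(sessions)
    squared = [
        (s[i][0] - mean[i][0]) ** 2 + (s[i][1] - mean[i][1]) ** 2
        for s in sessions
        for i in range(len(mean))
    ]
    return math.sqrt(sum(squared) / len(squared))


# Two hypothetical digitizing sessions of the same two landmarks (pixels).
sessions = [
    [(0, 0), (10, 0)],
    [(2, 0), (10, 2)],
]
print(mean_configuration(sessions))
print(digitizing_scatter(sessions))
```

The same averaging function can be applied to left and right wings of one specimen, provided one side is mirrored first so that corresponding landmarks have comparable coordinates.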

Distinguishing groups: size or shape?
Taxonomists know of many species that are consistently larger or smaller than others, which gives size an undisputed importance for species recognition. Moreover, the size of the wing has acquired renewed importance because of its likely association with the wing beat frequencies that mediate assortative mating [94,95]: Stanford et al. [96] found an agreement between the size differences among incipient species of An. gambiae and their known level of assortative mating. In species recognition or distinction, a good discrimination between groups means not only revealing statistically significant differences, but also allowing little overlap between them, and this is generally best achieved by comparing shapes rather than sizes. In addition to being more discriminant, shape is generally a more stable feature than size with regard to environmental variation. For these reasons, less overlap and more stability, interspecific differences revealed by shape are generally of more taxonomic utility than size differences between species. As long as shape variation is not the passive consequence of size variation, i.e. an allometric effect of size differences, shape should be the main source of taxonomic information. However, the shape differences observed between groups after Procrustes analysis are not transferable to other groups [73], which makes it difficult to export the results. Even if they are not independent, size and shape can also be combined to improve species delimitation [97].

The need for morphometrics database
For taxonomic use, it is not only necessary to adopt powerful tools for exploring morphological similarities; it is also important to share the results. Whenever a piece of DNA distinguishes two taxa, it can be published, stored as a sequence in GenBank and shared with other biologists or taxonomists. When a qualitative morphological character that distinguishes two taxa is discovered, it can be published and shared with other people. Unfortunately, shape variables are sample dependent and cannot be shared in the same way as genetic or morphological characters [73]. As shape variables are derived from the raw coordinates of landmarks, the temptation would be to use the raw coordinates as reference data. However, when the objective is to distinguish very similar or cryptic species, the measurement error (ME) may represent a significant obstacle. ME is always higher between two users than between two measurements from the same user, so that in any circumstance a one-user data set is the most reliable set of data [73]. Two solutions are presently being developed to adapt modern morphometrics to a more acceptable taxonomic use: (i) to share machine-computed coordinates [76], or (ii) to share images instead of coordinates. The latter initiative is already running for bees (http://apiclass.mnhn.fr). It is in development for mosquitoes as a bank of reference images at http://mom-clic.com/clic-bank under the name CLIC (Collection of Landmarks for Identification and Characterization). The need for such a database is underestimated because, as can be deduced from the low number of works on Anopheles, the power of morphometrics to identify taxa is itself probably underestimated. The chances of successful identification would then depend on the relevance of the reference images, on their level of shape divergence and on the classification techniques.
Apparently, mosquito wings show very similar venation patterns among different species and higher taxa, including different tribes. However, Dujardin [66] showed that Anopheles sp. could be distinguished from other genera of mosquitoes on the basis of their venation pattern, using 13 landmarks. Regarding species complexes, some attempts were made to separate the species of the An. dirus Complex (former An. dirus species A from Thailand, An. dirus species B from Malaysia) using traditional morphometric techniques applied to pupae and larvae [100]. However, slightly damaged specimens may become impossible to identify. In spite of similar size, the separation based on the wing venation pattern was satisfactory in both sexes, even with rough mounting of wings on Scotch tape. The latter study used old laboratory strains, so that an additional effect of morphological divergence could have enhanced the results. Similar studies are required on field specimens. Vincente et al. [87] studied the intraspecific variation of An. atroparvus in various countries of Europe at 21 landmarks, adding one Portuguese population of An. maculipennis. The authors showed an overlap on the first principal component of shape between allopatric An. maculipennis and An. atroparvus. The objective of the study was not to examine the discriminating power of landmark-based morphometrics to separate the two species, a task for which discriminant analysis would be more appropriate. Notably, size could show drastic differences among populations, interfering with interspecific shape variation.
Five members of the Nyssorhynchus Subgenus were compared for wing and leg dimensions with promising results [98]. A few years later, Calle et al. [84] used the landmark-based approach to compare 11 members of the same subgenus, some of which were cryptic species living in sympatry. A notable feature of this study was the combined use of standard wing landmarks with landmarks at the transition between black and white scales. The technique correctly assigned 97% of individuals to their respective species in the Argyritarsis Section (An. braziliensis, An. darlingi and An. marajoara) and 86% of individuals in the Albimanus Section (An. albimanus, An. aquasalis, An. benarrochi, An. nuneztovari, An. oswaldoi, An. rangeli, An. strodei, and An. triannulatus). These results are noteworthy since some of these species are cryptic species, or species with overlapping variability of diagnostic characters, a few of them living in sympatry. In the Argyritarsis Section, shape-based reclassification scores were very high (97% for An. darlingi and An. braziliensis, and 100% for An. marajoara). An. braziliensis and An. marajoara differ by the presence or absence of a tuft of scales on abdominal segment II as well as by the color of the scales of abdominal segment VIII, with some other characters presenting overlapping variation. As for many morphologically close species of mosquitoes, identification can be very difficult on damaged specimens. The three species were collected from different geographic areas, which could also explain their significant size differences. In the Albimanus Section, An. triannulatus and An. rangeli did not show any overlap in the morphospace described by shape, but they also differed strongly in size, with An. triannulatus being the smallest species and An. rangeli the largest. Distinguishing An. rangeli from An. nuneztovari may be much more difficult and may require the examination of immature stages.
The wing venation pattern could recognize 84% of An. rangeli and 90% of An. nuneztovari. High reclassification scores were also obtained when comparing An. aquasalis and An. nuneztovari. An. aquasalis is an important vector in Venezuela, though not in Colombia, and without a very detailed morphological examination it could be confused with the Venezuelan vector An. nuneztovari. The wing venation pattern could distinguish these two species with scores as high as 90% [84]. In some parts of its distribution in Brazil, An. (Kerteszia) cruzii is sympatric with secondary vectors such as An. homunculus and An. bellator. Identification of these species based on female specimens is often jeopardised by polymorphisms, overlapping morphological characteristics and damage caused to specimens during collection. Pairwise cross-validated reclassification showed that geometric morphometrics could distinguish between the three species with a reliability rate varying from 78 to 88% [105].
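The reclassification scores quoted above are percentages of individuals reassigned to their own species. A minimal Python sketch of a cross-validated version of such a score is given below. It is deliberately simplified: each individual is left out in turn and assigned to the species with the nearest mean using plain Euclidean distance, whereas the cited studies rely on discriminant analysis. The species names and the two-dimensional "shape variables" are invented.

```python
import math


def cross_validated_scores(samples):
    """Leave-one-out reclassification score per group.

    `samples` is a list of (label, vector) pairs. Each individual is removed,
    group means are recomputed without it, and it is assigned to the group
    whose mean is closest (Euclidean distance).
    """
    correct, total = {}, {}
    for i, (label, vector) in enumerate(samples):
        groups = {}
        for j, (other_label, other_vector) in enumerate(samples):
            if j != i:
                groups.setdefault(other_label, []).append(other_vector)
        means = {
            name: [sum(column) / len(vectors) for column in zip(*vectors)]
            for name, vectors in groups.items()
        }
        predicted = min(means, key=lambda name: math.dist(vector, means[name]))
        total[label] = total.get(label, 0) + 1
        correct[label] = correct.get(label, 0) + (predicted == label)
    return {name: correct[name] / total[name] for name in total}


# Hypothetical shape variables for two species, four individuals each.
samples = [
    ("species_A", (0.0, 0.0)), ("species_A", (1.0, 0.0)),
    ("species_A", (0.0, 1.0)), ("species_A", (1.0, 1.0)),
    ("species_B", (4.0, 4.0)), ("species_B", (5.0, 4.0)),
    ("species_B", (4.0, 5.0)), ("species_B", (1.5, 1.5)),   # lies close to species_A
]
print(cross_validated_scores(samples))
```

Leaving each individual out before computing the group means avoids the optimistic bias of classifying specimens with a rule they helped to build, which is the point of reporting cross-validated rather than raw reclassification.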

Conclusions
The taxonomy of Anopheles greatly benefits from the powerful information provided by DNA sequences. The identification and detection of Anopheles species, especially cryptic and sibling species, are readily achieved using molecular identification assays. DNA sequences are an invaluable source of phylogenetic information, which is not to say that, for species recognition, they should be the only alternative to traditional morphological approaches.
We have presented here the case for considering the modern morphometric alternative, for its ability to separate morphologically indistinguishable species as well as for its speed and low cost. Despite promising outcomes, recent morphometric techniques have not often been applied to distinguish anopheline species, and other possibilities, for instance those making use of artificial intelligence, were not even considered.
As long as a phenetic approach provides satisfactory scores of species classification, and when the objective is to identify species, its combination with molecular methods could help reduce costs. An integrative approach would not only be less expensive; it would also preserve the interest of biologists in the interaction of morphology with environmental changes and speciation events.