Table 1. Evaluation of 2Rj genotyping with a PCR identification method. Samples were karyotyped microscopically before being assayed with the PCR protocol of Coulibaly et al. *NA stands for ‘no amplification’.
The African malaria vector Anopheles gambiae is characterized by multiple polymorphic chromosomal inversions and has been widely studied as a system for exploring models of ecological speciation. An attempt to develop a molecular diagnostic for the chromosomal forms of A. gambiae s.s. led to a PCR-based diagnostic that differentiates the M and S molecular forms on the basis of a marker located on the X chromosome. Nearly complete reproductive isolation between the M and S molecular forms has led to the suggestion that A. gambiae is in the early stages of speciation. Comparative genomic studies have been applied to understand the evolutionary processes that produced these forms, but the models derived from them currently lack consensus. Various studies also suggest further subdivisions within each molecular form. These topics are discussed, and suggestions for the further research needed to elucidate the population structure of A. gambiae are presented.
2. Anopheles gambiae species complex
Among the global vectors of human malaria, arguably the most important species belong to the Anopheles gambiae complex, which includes the most widespread and potent vectors of malaria in sub-Saharan Africa. The Anopheles gambiae species complex includes eight sibling species: A. gambiae s.s. Giles, A. arabiensis Patton, A. bwambae White, A. melas Theobald, A. merus Dönitz, A. quadriannulatus Theobald, A. amharicus Hunt, Coetzee and Fettene and A. comorensis Brunhes, le Goff and Geoffroy [1-4]. The status of these species was established on the basis of F1 hybrid sterility in crosses between populations [4-8], morphological features and fixed differences in chromosomal inversions [5, 10]. Although the species cannot be reliably distinguished morphologically, they differ in their ecology and geographic distributions (Figure 1). Two species, A. merus and A. melas, are associated with saltwater larval habitats and are therefore restricted to brackish-water breeding sites along the east and west coasts, respectively. A third saltwater species, A. bwambae, is known only from hot springs in Semliki Forest National Park in western Uganda. The species A. quadriannulatus and A. amharicus are primarily zoophilic and are not considered to be involved in malaria transmission. A. quadriannulatus occurs in southeastern Africa and A. amharicus in Ethiopia [2, 4]. A population on the island of Grande Comore in the Indian Ocean was described as a distinct species, A. comorensis, on the basis of morphological characters. Little is known about the biology of A. comorensis. The two remaining freshwater species, A. gambiae sensu stricto (hereafter referred to as A. gambiae) and A. arabiensis, have the broadest geographic distributions and are the most important vectors of human malaria (Figure 1) [11, 12]. A. gambiae has been the most studied with respect to molecular and population genetics, and its whole genome sequence was published in 2002.
Natural populations of A. gambiae have an extremely complex genetic structure that has been the subject of a great deal of research, and summarizing that research is the focus of this chapter. Populations of A. gambiae are thought to be undergoing speciation and have been the focus of numerous studies aimed at evaluating speciation models [14-16]. Discrete subpopulations of A. gambiae have been defined in two ways: by chromosomal form and by molecular form. Recently the M molecular form of A. gambiae was elevated to species status and designated Anopheles coluzzii Coetzee et al. We retain the M and S form designations to facilitate discussion of the recent literature.
3. Chromosomal forms of Anopheles gambiae
Chromosomal forms. The A. gambiae genome is organized into three chromosomes: two submetacentric autosomes and the X/Y sex chromosomes, with males being the heterogametic sex. For descriptive purposes, each autosome is divided at the centromere into two “arms”: the longer is referred to as the right arm and the shorter as the left arm. A high degree of chromosomal polymorphism, in the form of paracentric inversions, has been described in populations of A. gambiae. In a recent study, Pombi et al. describe 82 rare and 7 common inversions observed in natural populations. Inversions are not randomly distributed among chromosomes but occur most often on the right arm of chromosome 2 (2R). Cytogenetic analysis is facilitated by the presence of giant polytene chromosomes in the cells of certain tissues. Early studies used the salivary glands of larvae, but ovarian nurse cells are now more commonly used because they are easier to obtain and yield better preparations for microscopic examination. Polytene chromosomes show light and dark banding patterns that serve as critical landmarks for determining karyotypes (Figure 2). Protocols for preparing polytene chromosomes for karyotyping are available online at .
There is general agreement that inversions represent coadapted gene complexes that may enable the individuals carrying them to occupy different ecological niches. Two patterns strongly suggest that at least some inversions are maintained by selection. The first is the nonrandom distribution of inversion breakpoints along the chromosomes. The second is the distribution of inversion frequencies across the geographic ranges of the species. Such selection would allow different species, and in the case of A. gambiae different populations, to survive in and exploit a wide variety of habitats [21-23]. The best example is the strong association of inversions 2La and 2Rb with aridity. These inversions reach their highest frequencies in drier areas, and they even increase in frequency during the dry season at single sites that experience distinct wet and dry seasons [21, 23, 24]. Specific inversion configurations are associated with specific habitats, which has led to the term “ecophenotype” being frequently applied to individuals carrying certain combinations of inversions. Chromosomal forms have been defined by the configuration of five paracentric inversions on the right arm of chromosome 2 (2Rj, b, c, d and u) and one on the left arm of chromosome 2 (2La). On this basis, five chromosomal forms of A. gambiae have been described: Mopti, Bamako, Bissau, Forest and Savanna. The names refer to the geographic regions from which the forms were first collected and indicate an association of each form with a particular type of habitat, as illustrated in Figure 3.
Chromosomal forms are defined as follows:

- Forest: characterized by the typical non-inverted arrangement 2R+/+, 2L+/+, or by a single inversion polymorphism involving 2Rb, 2Rd or 2La.
- Bissau: characterized by high frequencies of the 2Rd inversion and the standard 2L+ arrangement.
- Savanna: shows high frequencies of the 2Rb and 2La inversions, polymorphism involving the 2Rcu arrangements, and polymorphism in the j, d and rare k inversions.
- Bamako: characterized by the fixed 2Rjcu arrangement and polymorphism in the 2Rb inversion.
- Mopti: shows high frequencies of 2Rbc and 2Ru and is nearly fixed for 2La (Figure 2).

The Savanna form has the broadest distribution, occurring throughout sub-Saharan Africa. The Mopti form predominates in drier habitats in West Africa, and the Forest form occurs in wetter habitats in Africa. The Bamako form occurs in habitats along the Niger River in West Africa, and the Bissau form is restricted to West Africa (Figure 3) [26, 27].
It has also been suggested that the chromosomal forms are to some extent reproductively isolated. On this view they represent distinct or incipient species that have evolved, or are evolving, via a process described as “ecotypic speciation” [15, 25]. Studies of karyotype frequencies at sites where the Bamako, Mopti and Savanna forms occur in sympatry have revealed significant departures from Hardy-Weinberg (H-W) equilibrium [10, 28-30]. Specifically, heterokaryotypes representing hybrids between the Savanna form and the other two were under-represented, and Bamako/Mopti hybrids were never encountered. This observation led to three suggestions: that there is partial reproductive isolation between Savanna and the other forms, that isolation between the Bamako and Mopti forms is nearly complete, and that these forms represent incipient species. However, hybridization experiments involving crosses between the Bamako and Mopti forms produced viable offspring, demonstrating a lack of post-mating reproductive barriers between them [29, 31]. An estimate of genetic distance between the Bamako and Mopti forms, based on allozyme frequencies, was reported as 0.015. This value is no higher than that typically found between local populations of a single mosquito species. We found that genotypic frequencies in a population composed of three chromosomal forms in Mali did not depart from Hardy-Weinberg expectations, suggesting that this population represents a single gene pool (Lanzaro, unpublished).
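H-W departures like those described above are usually assessed by comparing observed karyotype counts with the counts expected from the arrangement frequencies. The following is a minimal sketch of the standard chi-square goodness-of-fit test for a single biallelic inversion (standard vs. inverted arrangement). The function name and the counts in the usage line are hypothetical, not data from the studies cited. With small samples, an exact test is generally preferable.

```python
from math import erfc, sqrt


def hwe_chi_square(n_std, n_het, n_inv):
    """Chi-square test of one inversion's karyotype counts against H-W proportions.

    n_std: standard homokaryotypes, n_het: heterokaryotypes,
    n_inv: inverted homokaryotypes. Returns (chi2, p_value).
    """
    n = n_std + n_het + n_inv
    # Frequency of the standard arrangement, estimated from the sample itself.
    p = (2 * n_std + n_het) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("arrangement is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_std, n_het, n_inv)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = 3 classes - 1 - 1 estimated frequency = 1; for one degree of
    # freedom the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical sample with no heterokaryotypes at all.
chi2, p_value = hwe_chi_square(50, 0, 50)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2g}")
```

A deficit of heterokaryotypes, the pattern reported for sympatric forms, inflates the middle term of the sum.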
It should be emphasized that although these studies do not support reproductive isolation among chromosomal forms, they do not disprove it. Pre-mating isolating mechanisms may act as a barrier between subpopulations even if post-mating mechanisms have not evolved. Isolation may also be so recent that there has not been enough time for substantial allozyme divergence to accumulate between the forms. Lanzaro et al. conducted a study based on 21 microsatellite loci distributed across the genome, examining genetic differentiation between the Bamako and Mopti forms in the villages of Banambani and Selinkenyi, Mali. This study revealed strong genetic differentiation between A. gambiae and A. arabiensis, which was used as an outgroup. Within A. gambiae, the pattern of genetic differentiation depended on the genomic location of the microsatellite loci. No genetic differentiation was found on chromosome 3 or the X chromosome. By contrast, loci on chromosome 2 associated with the inversions that occur there showed strong linkage disequilibrium and low levels of genetic differentiation. Similar results were obtained in a study that also used microsatellites distributed on all three chromosomes, with samples collected in the villages of Selinkenyi, Soulouba and Kokouna, Mali.
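Microsatellite differentiation of this kind is commonly summarized with an F_ST-type statistic. The sketch below computes Nei's G_ST for one multi-allelic locus. It assumes that allele frequencies are already known for each population and weights populations equally. It is an illustrative estimator without sample-size correction, not necessarily the one used in the studies cited. The function name and the allele frequencies are hypothetical.

```python
def nei_gst(pop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus.

    pop_freqs: one dict per population mapping allele -> frequency
    (frequencies within each dict should sum to 1).
    """
    k = len(pop_freqs)
    alleles = set().union(*pop_freqs)
    # H_S: mean expected heterozygosity within populations.
    h_s = sum(1.0 - sum(f * f for f in pop.values()) for pop in pop_freqs) / k
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies.
    pooled = [sum(pop.get(a, 0.0) for pop in pop_freqs) / k for a in alleles]
    h_t = 1.0 - sum(f * f for f in pooled)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


# Hypothetical microsatellite allele frequencies (keys are allele sizes in bp).
pop_a = {152: 0.6, 154: 0.3, 156: 0.1}
pop_b = {152: 0.5, 154: 0.3, 158: 0.2}
print(round(nei_gst([pop_a, pop_b]), 3))
```

Values near 0 indicate little differentiation and values near 1 indicate populations fixed for different alleles. Per-locus values can then be compared across chromosomes.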
Gene flow, like other evolutionary forces, may be higher in some parts of the genome and lower in others. For example, favorable genes can still be exchanged successfully even when barriers to gene flow are strong. Such genes could be at loci that confer local adaptations, or at any loci linked to them. Consequently, a single genome-wide estimate of gene flow, even if accurate, may fail to capture variation among different parts of the genome. This effect may be particularly strong for genes contained within inversions. One reason is potentially strong selection on those genes; the other is the linkage imposed by the reduced recombination associated with inversions. Tripet et al. explored this effect by comparing divergence at microsatellite loci within the j and b inversions with divergence at loci outside inversions. They found elevated divergence at loci within the inversions relative to those outside. This strongly non-random distribution of divergence over the genome was later described as a ‘mosaic genome architecture’ by Wang-Sattler et al. As we shall see, this concept was later refined by high-resolution genome-wide analysis, ultimately leading to the recognition of ‘islands of speciation’ in the A. gambiae genome.
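A comparison like the one described above asks whether loci inside an inversion are more diverged than loci elsewhere. One simple way to test this is a permutation test on per-locus differentiation values. The sketch below assumes per-locus F_ST (or similar) values have already been estimated. It is not presented as the method used by Tripet et al., and the function name and input values are hypothetical.

```python
import random


def inside_vs_outside_test(inside, outside, n_perm=10_000, seed=1):
    """One-sided permutation test: is mean divergence higher inside inversions?

    inside, outside: per-locus divergence values (e.g. F_ST).
    Returns (observed difference in means, permutation p-value).
    """
    def mean_diff(values, n_in):
        first, rest = values[:n_in], values[n_in:]
        return sum(first) / len(first) - sum(rest) / len(rest)

    n_in = len(inside)
    pooled = list(inside) + list(outside)
    observed = mean_diff(pooled, n_in)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign loci to "inside" / "outside"
        # Small tolerance so a relabelling identical to the observed one counts.
        if mean_diff(pooled, n_in) >= observed - 1e-12:
            hits += 1
    # Adding 1 to numerator and denominator keeps the p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical per-locus values for loci inside and outside an inversion.
diff, p = inside_vs_outside_test([0.30, 0.25, 0.40, 0.35],
                                 [0.01, 0.02, 0.00, 0.03, 0.01, 0.02])
print(f"difference = {diff:.3f}, p = {p:.4f}")
```

Shuffling loci between the two groups preserves the overall distribution of values. It therefore tests only whether the inside/outside split matters, without assuming normality.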
Using the chromosomal form concept to define genetically discrete populations is problematic because the inversions that define the forms overlap substantially among them, probably because of some level of contemporary gene flow. This creates ambiguity in assigning individuals to forms, which diminishes the usefulness of the chromosomal form concept for defining reproductive boundaries among populations. For example, in a recent survey of populations in Mali, 26% of 2,459 individuals could not be assigned to a chromosomal form. In Cameroon, 39% of 632 individuals likewise could not be assigned (Figure 3, data available at PopI ).
The role of chromosome inversions in A. gambiae evolution: Ecotypic Speciation. The chromosomal, or ecotypic, model of speciation was first described for anopheline mosquitoes by Coluzzi and is the prevailing model applied to the chromosomal forms of A. gambiae [14, 15]. This model is founded on the observation that certain paracentric inversions that are polymorphic in A. gambiae are non-randomly distributed in nature. These inversions are thought to contain multi-locus genotypes that are adaptive to the specific aquatic habitats occupied by the immature stages of the mosquito. Under this model, populations carrying alternative gene arrangements would inhabit different, spatially isolated habitats. Genetic divergence, enhanced by the reduced recombination associated with the inversions, would then evolve. Ultimately, divergence would include genes causing reproductive isolation, such as reduced hybrid fitness or behavioral differences that prevent between-form mating. This would explain the observed deficiency of inversion heterozygotes. This model was initially adopted to describe the evolution of the chromosomal forms of A. gambiae [15, 21, 28]. It has since become the model for explaining the evolution of the molecular forms, as described below [16, 39-42].
The most thorough evaluation of the ecotypic speciation model has been its application to the Bamako and Savanna forms in Mali. Central to this evaluation is the observation of niche partitioning with respect to larval habitat. This observation was based on a PCR identification method developed for detecting the 2Rj inversion. The method was applied to larval samples collected in rock pools and at more typical larval sites (puddles/ponds) in the village of Banambani, Mali. We evaluated this PCR method on a set of 85 field-collected adults previously scored cytogenetically for the 2Rj genotype. In total, we selected 25 2Rj homozygotes (j/j), 40 2Rj heterozygotes (+j/j) and 20 2Rj standard homozygotes (+j/+j) from the villages of Banambani, Selinkenyi, Tinko and Seroume, Mali. The 2Rj PCR correctly called 2Rj homozygotes (j/j) (100%) in all villages, regardless of the presence of the 2Rc and u inversions (Table 1). However, the PCR was much less accurate for the standard arrangement (+j/+j). It consistently misidentified standard individuals, as 2Rj heterozygotes (+j/j) in 11 cases and as 2Rj homozygotes (j/j) in 5 cases. Moreover, all true heterozygotes (+j/j) were misidentified as either j/j (N=13) or +j/+j (N=7). The low overall accuracy (48.2%) of the 2Rj diagnostic PCR casts doubt on this, the sole example of niche partitioning in larval habitat (rock pools vs. other sites) distinguishing the two forms.
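Overall accuracy figures such as the one above come from a tally of cytogenetic karyotypes, taken as the truth, against PCR calls. Below is a minimal sketch that assumes ‘NA’ (no amplification) calls count as errors. The function name is hypothetical, and the counts in the example are invented for illustration; they are not the Table 1 data.

```python
def call_accuracy(confusion):
    """Overall and per-karyotype accuracy of PCR calls.

    confusion[true_karyotype][pcr_call] = number of individuals.
    Any call that differs from the true karyotype, including "NA"
    (no amplification), is counted as an error.
    """
    total = sum(sum(calls.values()) for calls in confusion.values())
    correct = sum(calls.get(true, 0) for true, calls in confusion.items())
    per_karyotype = {
        true: calls.get(true, 0) / sum(calls.values())
        for true, calls in confusion.items()
    }
    return correct / total, per_karyotype


# Hypothetical counts: outer keys are cytogenetic karyotypes, inner keys PCR calls.
example = {
    "j/j": {"j/j": 10},
    "+j/j": {"j/j": 3, "+j/+j": 2, "NA": 1},
    "+j/+j": {"+j/+j": 4, "+j/j": 2},
}
overall, by_karyotype = call_accuracy(example)
print(f"overall accuracy = {overall:.1%}", by_karyotype)
```

Reporting the per-karyotype rates alongside the overall figure shows where a diagnostic fails, for example with heterozygotes.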
The 2Rj inversion polymorphism in Mali shows two different mating patterns in different parts of the species' range within the country. At sites along the Senegal River (e.g. the villages of Sebetou, Seroume, Bantinngoungou and Tinko), 2Rj inversion heterozygotes are commonly found and 2Rj karyotype frequencies conform to Hardy-Weinberg expectations (HWE). By contrast, at sites along the Niger River and its tributaries (e.g. the villages of Banambani, Doneguebougou, Senou, Kela, Selinkenyi, Soulouba, Yorobougoula and Kokouna), a severe deficiency of 2Rj heterozygotes is observed and 2Rj genotype frequencies depart from HWE (Figure 4).
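The contrast between the Senegal River and Niger River sites can be expressed as a single number. Wright's inbreeding coefficient, F_IS = 1 - H_obs/H_exp, is near zero under HWE and approaches 1 as heterozygotes disappear. Below is a minimal sketch for one biallelic inversion; the function name and counts are hypothetical.

```python
def f_is(n_std, n_het, n_inv):
    """Wright's F_IS for one biallelic inversion from karyotype counts.

    Returns 1 - H_obs / H_exp: about 0 under Hardy-Weinberg expectations,
    positive when heterokaryotypes are deficient, 1 when none are found.
    """
    n = n_std + n_het + n_inv
    p = (2 * n_std + n_het) / (2 * n)  # frequency of the standard arrangement
    h_exp = 2 * p * (1 - p)            # expected heterokaryotype frequency
    if h_exp == 0:
        raise ValueError("arrangement is monomorphic; F_IS is undefined")
    h_obs = n_het / n                  # observed heterokaryotype frequency
    return 1 - h_obs / h_exp


# Hypothetical counts: a site matching HWE and a site with few heterokaryotypes.
print(f_is(25, 50, 25), round(f_is(45, 10, 45), 2))
```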
In the literature, the Bamako form includes three genotypes, jcu/jcu, jcu/jbcu and jbcu/jbcu, all homozygous for j. Some individuals carry the 2Rj inversion but not the c and u inversions, such as jbd/jbd and jb/b, which are commonly found along the Senegal River. These cannot be classified under the current definitions of chromosomal forms. Along the Niger River, 94% of 2Rj homozygotes are Bamako forms, whereas no Bamako forms are found along the Senegal River.
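The Bamako definition above can be expressed as a lookup: an individual is Bamako only if its two 2R arrangements form one of the three listed karyotypes. The sketch below writes each arrangement as a string of single-letter inversion codes, with ‘+’ for the standard arrangement, so jb/b becomes ('jb', 'b'). This encoding and the function names are hypothetical.

```python
# The three 2R karyotypes that define the Bamako form (order-independent).
BAMAKO_KARYOTYPES = {
    frozenset({"jcu"}),          # jcu/jcu
    frozenset({"jcu", "jbcu"}),  # jcu/jbcu
    frozenset({"jbcu"}),         # jbcu/jbcu
}


def is_bamako(arr_a, arr_b):
    """True if the pair of 2R arrangements is one of the Bamako karyotypes."""
    return frozenset({arr_a, arr_b}) in BAMAKO_KARYOTYPES


def is_j_homozygote(arr_a, arr_b):
    """True if both 2R arrangements carry the j inversion."""
    return "j" in arr_a and "j" in arr_b


def bamako_share_of_j_homozygotes(karyotypes):
    """Fraction of j/j individuals that qualify as Bamako (None if no j/j)."""
    j_hom = [k for k in karyotypes if is_j_homozygote(*k)]
    if not j_hom:
        return None
    return sum(is_bamako(*k) for k in j_hom) / len(j_hom)


# Hypothetical sample: jbd/jbd is a j homozygote that is not Bamako,
# and jb/b is not a j homozygote at all.
sample = [("jcu", "jcu"), ("jbcu", "jcu"), ("jbd", "jbd"), ("jb", "b"), ("+", "+")]
print(bamako_share_of_j_homozygotes(sample))
```

Applied to field karyotypes, this share corresponds to the proportion of 2Rj homozygotes classified as Bamako.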
Overall, these results weaken the argument that paracentric inversions play a role in the evolution of reproductive isolation via divergent selection (ecotypic speciation). They cast doubt both on the association of inversions with distinct larval habitats and on the evidence for reproductive isolation between individuals that carry different inversions (e.g. a lack of j inversion heterozygotes). Genome-wide comparisons of individuals with and without inversions have also been conducted, and these cast doubt on the role of inversions as “coadapted gene complexes”. These results are described in detail below.
The role of chromosome inversions in A. gambiae evolution: Comparative Genomics. Central to the “ecotypic speciation” model as applied to A. gambiae is the notion that inversions contain multi-locus genotypes that are adaptive to different environments. These “coadapted gene complexes” arise and are maintained as a consequence of reduced recombination within and around the inversion. Ultimately, they become associated, directly or indirectly, with reproductive isolation. If reproductive isolation is incomplete or has evolved recently, one expectation is higher genetic divergence within the inversion than elsewhere in the genome. Indeed, a genome-wide scan compared individuals with and without the 2La inversion and observed significantly higher divergence in a 3 Mb region within and proximal to the inversion. However, a subsequent study compared inverted and uninverted genomes for the four common 2R inversions (j, b, c and u), which span a region of ~26 Mb. In that study, divergence was limited to just one small region (~100 kb) in the 2Ru inversion. Both studies used the Affymetrix Plasmodium/Anopheles Genome Microarray (P/A array), which contains 142,065 25-bp probes representing roughly 13,000 predicted genes. The lack of divergence associated with the inversions hypothesized to drive the “ecotypic speciation” process was unexpected. Several explanations were offered for why divergence between the inversion arrangements escaped detection: shared ancestral polymorphism, extensive recombination within the inversions (gene flux) and the limited resolution of the microarray used.
In a more recent study, the genomes of individuals homozygous for the jbcu arrangement (Bamako form) were compared with those of individuals homozygous for the standard arrangement, +j+b+c+u (Savanna form). In this case all individuals were of the S molecular form, unlike the comparisons in the White et al. study, which included a mixture of M and S form individuals. In addition, Lee et al. used an A. gambiae whole genome tiling microarray (WGTM), which provides far higher resolution than the P/A array (probe density = 1 probe per 100,000 bp for the P/A array; 1 probe per 17 bp for the WGTM). As in the White et al. study, very little divergence was associated with the chromosome 2R inversions. However, a 3 Mb region of divergence was observed on the X chromosome, proximal to the centromere. This is the same region of the genome that contains the sequence divergence used to define the M and S molecular forms (discussed in detail in the following sections). X chromosome divergence is thus associated with the reproductive isolation observed both between the M and S molecular forms and between the Bamako and Savanna chromosomal forms. These results suggest that the 2R inversions may be involved in neither the evolution nor the maintenance of reproductive isolation among A. gambiae populations.
4. Molecular forms of Anopheles gambiae
Defining Molecular Forms. An attempt to develop a molecular diagnostic for the chromosomal forms of A. gambiae identified 10 nucleotide positions that differ between the Mopti and the Savanna or Bamako chromosomal forms. These lie in a 2.3 kb fragment at the 5’ end of the rDNA IGS region, located on the X chromosome. These findings led to a PCR-based diagnostic that differentiates the Mopti chromosomal form from the Bamako and Savanna forms. The diagnostic is based on a single base-pair substitution at nucleotide position 540 of a 28S rDNA amplimer sequence. Mopti form individuals carry a C/C genotype, and both Bamako and Savanna individuals carry a T/T genotype (GenBank accession numbers AF470112-6). Individuals carrying C/C are referred to as the M molecular form, and those carrying T/T as the S molecular form. There is good correspondence between the M molecular form and the Mopti chromosomal form in Burkina Faso and Mali. However, the Bamako and Savanna chromosomal forms cannot be distinguished, because both are of the S molecular form. The association between molecular and chromosomal forms breaks down at other locations in West Africa. For example, in western Senegal and The Gambia the association between the Savanna chromosomal form and the S molecular form does not hold, and the Forest form contains both M and S individuals. The M and S molecular forms therefore largely fail as a diagnostic for chromosomal form. However, the significance of the M and S forms of A. gambiae goes well beyond their utility as proxies for identifying chromosomal forms. The molecular form concept has now largely replaced the chromosomal form concept for defining discrete subpopulations of A. gambiae that are, to some extent, reproductively isolated.
M and S forms occur in sympatry at many sites in West and Central Africa, and there is typically a high degree of reproductive isolation between them. M/S hybrids (C/T genotype) produced in the laboratory yielded clearly distinguishable hybrid patterns in females. Males carry a single X chromosome and so cannot show the heterozygous pattern. Surprisingly, however, field-collected individuals carrying “hybrid” karyotypes (putative hybrids between different chromosomal forms) did not produce results consistent with hybrid status. Instead, they produced either M or S patterns. This observation suggests that certain karyotypes thought to be fixed in one chromosomal form or another are in fact shared among forms. Such karyotypes occur commonly in one form and rarely in another, because of ancestral polymorphism and/or ongoing gene flow [40, 50]. This diagnostic now forms the basis for recognizing two distinct subpopulations of A. gambiae, known as molecular forms (M and S).
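The diagnostic site is X-linked and males are the heterogametic sex, so a heterozygous C/T pattern can appear only in females. A male carries one X chromosome and types as M or S even if his parents belonged to different forms. Below is a minimal sketch of the resulting decision rule. It assumes the PCR readout has already been reduced to the base(s) present at the diagnostic site. The function name and input encoding are hypothetical.

```python
def molecular_form(sex, bases):
    """Assign M, S or M/S hybrid from the X-linked diagnostic SNP.

    sex: "female" (XX, two X copies) or "male" (XY, one X copy).
    bases: tuple of bases observed at the site, one per X copy;
    C marks the M form and T the S form.
    """
    copies = {"female": 2, "male": 1}[sex]
    if len(bases) != copies:
        raise ValueError(f"a {sex} should have {copies} X-linked copies")
    observed = set(bases)
    if not observed <= {"C", "T"}:
        raise ValueError("unexpected base at the diagnostic site")
    if observed == {"C"}:
        return "M"
    if observed == {"T"}:
        return "S"
    return "M/S hybrid"  # C/T, possible only with two X copies


print(molecular_form("female", ("C", "T")), molecular_form("male", ("T",)))
```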
Alternative methods for distinguishing M and S forms. The original PCR-based diagnostic for the M and S forms was further developed into a method using restriction digestion of PCR amplimers. This method distinguishes A. gambiae from its sibling species A. arabiensis while simultaneously distinguishing M from S. It was useful in the field because A. arabiensis and both the M and S forms are morphologically indistinguishable and commonly occur in sympatry at study sites throughout West and Central Africa. In 2008, a new method for distinguishing the M and S forms was developed. It takes advantage of polymorphism in the insertion sites of a group of retrotransposons known as short interspersed elements (SINEs). One SINE insertion site, located on the X chromosome and referred to as SINE X6.1, was found to be fixed in the M form and absent in the S form. Subsequent studies that employed multiple M/S diagnostic methods observed some discrepancies among the results. These were most common in populations where M/S hybridization is common, for example in Guinea-Bissau.
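Discrepancies between diagnostics of the kind reported above are easiest to see when the same individuals are typed with two assays. The samples whose calls disagree can then be listed. Below is a minimal sketch. It assumes each assay's results have already been reduced to ‘M’, ‘S’ or ‘M/S hybrid’ per sample ID. The function name, sample IDs and calls are hypothetical.

```python
def discordance(calls_a, calls_b):
    """Compare two assays' M/S calls on the individuals typed by both.

    calls_a, calls_b: dicts mapping sample ID -> "M", "S" or "M/S hybrid".
    Returns (fraction of shared samples that disagree, sorted discordant IDs).
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no samples were typed by both assays")
    discordant = sorted(s for s in shared if calls_a[s] != calls_b[s])
    return len(discordant) / len(shared), discordant


# Hypothetical calls from an rDNA-based assay and a SINE X6.1-based assay.
rdna = {"s1": "M", "s2": "S", "s3": "M/S hybrid", "s4": "S"}
sine = {"s1": "M", "s2": "S", "s3": "M", "s5": "S"}
print(discordance(rdna, sine))
```

Listing the discordant IDs, rather than only the rate, makes it possible to check whether disagreements cluster among putative hybrids.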
Relationships between the M and S forms. Understanding the relationship between the two molecular forms has been the focus of an intense and ongoing research effort. The S form has the broadest distribution, occurring throughout sub-Saharan Africa, whereas the M form occurs throughout West Africa and parts of Central Africa. With the exception of a single site in northern Zimbabwe, M is absent from eastern Africa (Figure 5).
Although the M and S forms are largely reproductively isolated in most places where they occur together, this is not true everywhere. Hybridization between the forms occurs rarely (~1%) in Mali, and reproductive isolation between M and S appears to be complete in Cameroon. In The Gambia, M/S hybrids were identified at a number of sites, at frequencies as high as 16.7% of the A. gambiae individuals sampled. In Guinea-Bissau, hybrids made up over 20% of the individuals assayed [58, 59]. A cryptic subgroup of A. gambiae from Burkina Faso, known as the "Goundry" population, was recently found to be composed of 36% M/S hybrids. The Goundry population was discovered in larval collections in the Sudan Savanna zone of Burkina Faso but was absent from indoor adult collections at the same locality. This suggests that Goundry adults mostly rest outdoors [60, 61]. These results suggest that the linkage between the M and S alleles and the genes that directly affect reproductive isolation has broken down over a much broader geographic area than previously suggested. The notion of M and S forms that are largely reproductively isolated (incipient species), with hybridization occurring only in the "Far-West" region of Africa, is therefore an oversimplification.
In the laboratory, chromosomal and molecular forms, including the Bamako and Savanna forms, appear to display no post-zygotic isolation [31, 63, 64]. Analysis of sperm recovered from inseminated females  and the composition of mating swarms  support the existence of strong, but not complete, pre-mating reproductive isolation between the M and S molecular forms in nature.
The two molecular forms display phenotypic divergence in different parts of their geographic range. The most notable of these phenotypic differences are in insecticide resistance, desiccation resistance, larval habitat segregation and wing morphology. It has been proposed that the mechanism promoting divergence is pre-zygotic and associated with mate selection, either during swarm formation [71, 72] or within a swarm. Diabate et al. found evidence that swarms composed of a single molecular form were clustered within the village of Donéguébougou, Mali. Mixed swarms of M and S forms were found elsewhere (Burkina Faso), but they occurred less often than expected by chance. Manoukis et al. analyzed the shape of male swarms and suggested that differences in swarm organization between the M and S forms may enhance behavioral isolation between the two forms.
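Whether mixed swarms are rarer “than expected by chance” depends on a null model. One simple null, assumed here rather than taken from the studies cited, is that males join swarms independently of their molecular form. Under it, a swarm sample of n males from a population with M-form frequency p is mixed with probability 1 - p^n - (1 - p)^n. The function names and numbers below are hypothetical.

```python
def prob_mixed(freq_m, n_males):
    """P(a swarm sample of n_males contains both forms) under random joining."""
    if not 0.0 <= freq_m <= 1.0 or n_males < 1:
        raise ValueError("need 0 <= freq_m <= 1 and n_males >= 1")
    return 1.0 - freq_m ** n_males - (1.0 - freq_m) ** n_males


def expected_mixed_swarms(freq_m, sample_sizes):
    """Expected number of mixed swarms, given how many males were typed per swarm."""
    return sum(prob_mixed(freq_m, n) for n in sample_sizes)


# Hypothetical survey: M-form frequency 0.5, five swarms with 2-10 males typed.
print(expected_mixed_swarms(0.5, [2, 3, 5, 8, 10]))
```

Using the number of males actually typed per swarm matters, because small samples can miss the minority form even in a truly mixed swarm.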
5. Evolution of the M and S forms
Comparative genomics. Early studies aimed at describing patterns of genetic divergence among chromosomal forms revealed what was termed a “mosaic genome architecture”, with divergence distributed non-randomly over the genome (as described above). Comparisons of the M and S forms revealed a similar pattern. Initial work on the distribution of microsatellite DNA polymorphism showed exceptionally high divergence in a region proximal to the centromere on the X chromosome, near the rDNA locus used to define the two forms [35, 73]. High levels of M/S divergence on the X chromosome were substantiated by detailed DNA sequencing of the centromeric region [74, 75].
The first high-density genome-wide comparison of M and S was conducted by Turner et al. using samples collected in Cameroon. They used an Affymetrix Plasmodium/Anopheles Genome Microarray, which contains 142,065 25-bp probes representing roughly 13,000 predicted genes. Divergence between the M and S genomes was very low and restricted to three discrete regions. One was on the X chromosome, corresponding to the location identified in the microsatellite studies. The other two were on chromosome 2: one on 2L and one very small (37 kb) region on 2R. In total, these diverged regions covered less than 2.8 Mb, roughly 1% of the genome. A subsequent study used the same microarray but samples collected in Mali, and it did not observe the small 2R region of divergence. This region was therefore considered not to contribute to reproductive isolation between the two forms. Later, a third diverged region was observed on the left arm of chromosome 3 (3L). Like the X and 2L regions, it was proximal to the centromere. Taken together, these studies revealed that the M and S forms are diverged over only about 3% of their genomes. This divergence is organized into three small regions near the centromeres of the X chromosome and chromosome arms 2L and 3L, and the remainder of the genome is essentially undifferentiated. These regions of divergence have been considered islands of speciation because they are thought to contain genes directly involved in reproductive isolation.
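Statements such as “divergence restricted to three discrete regions covering less than 2.8 Mb” rest on a common procedure. The genome is scanned in windows, windows with unusually high divergence are flagged, and adjacent flagged windows are merged into regions. The sketch below illustrates that logic for one chromosome arm. It assumes a per-marker divergence score is already available. The microarray studies above derived divergence from hybridization signal, which this sketch does not model. The window size, threshold, function names and toy inputs are hypothetical.

```python
def find_islands(positions, scores, window, threshold):
    """Merge adjacent non-overlapping windows whose mean score exceeds threshold.

    positions: marker coordinates (bp) on one chromosome arm.
    scores: divergence score for each marker.
    Returns a list of (start, end) intervals in bp, end exclusive.
    """
    if len(positions) != len(scores):
        raise ValueError("positions and scores must have the same length")
    bins = {}
    for pos, score in zip(positions, scores):
        bins.setdefault(pos // window, []).append(score)
    islands = []
    for b in sorted(bins):
        values = bins[b]
        if sum(values) / len(values) <= threshold:
            continue
        start, end = b * window, (b + 1) * window
        if islands and islands[-1][1] == start:
            islands[-1] = (islands[-1][0], end)  # extend the previous island
        else:
            islands.append((start, end))
    return islands


def fraction_covered(islands, arm_length):
    """Fraction of the arm's length that lies within islands."""
    return sum(end - start for start, end in islands) / arm_length


# Toy data on a 60 bp "arm" with 10 bp windows.
isl = find_islands([5, 15, 25, 35, 45, 55], [0.0, 0.9, 0.8, 0.0, 0.7, 0.0], 10, 0.5)
print(isl, fraction_covered(isl, 60))
```

Summing covered lengths across arms and dividing by genome size gives percentages of the kind quoted above. The result depends strongly on the chosen window size and threshold.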
Islands of speciation model. The widely held interpretation of this work is that A. gambiae forms represent incipient species, but with enough gene flow to prevent their genomes from diverging in all but a few, relatively small regions [34, 35, 37, 76, 77]. This interpretation is consistent with recent genic models of speciation that predict the existence of small regions of divergence between incipient species in the presence of some degree of gene flow (Figure 6) [78, 79]. The observation that putative “islands of speciation” in A. gambiae are located proximal to centromeres, where levels of recombination are known to be low, is likewise consistent with models that consider speciation to be driven by genes located in regions of the genome with reduced crossing-over [80, 81].
Incidental islands model. White et al. developed PCR-RFLP assays to detect SNPs located in each of the three islands of speciation that were diagnostic for the M and S forms. They genotyped a total of 517 M and S form individuals from Mali, Burkina Faso, Cameroon and Kenya. They found complete association among the three unlinked islands in 512 of them (275 M form and 237 S form). Of the five exceptional genotypes, three were heterozygous at all three loci, suggesting that they were F1 hybrids. To account for the nearly complete association among the three diverged islands, the authors suggested that gene flow between M and S must be nearly zero. F1 hybrids were present, but the recombinant genotypes expected in later hybrid generations were nearly absent. This suggests that F1 hybrids have such low fitness that they contribute little to gene flow between the forms. As mentioned above, F1 hybrids generated in the laboratory show no evidence of intrinsically low fitness, so it is assumed that they are maladapted to conditions in nature. Additional support for very low between-form gene flow comes from comparisons of M and S based on high-density, genome-wide SNP genotyping and on whole genome sequences. These comparisons revealed widespread divergence between the M and S genomes. Collectively, these studies propose an alternative, the “incidental islands” model [82, 83], which holds that reproductive isolation between M and S is complete. Under this model, the observed islands of divergence may be incidental. Divergence in regions proximal to centromeres does not necessarily mark the location of genes underlying reproductive isolation. Rather, the pattern of divergence reflects segregating ancestral variation, not contemporary gene flow.
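The logic of the three-island test can be sketched as a classification of each individual's genotypes at the three diagnostic SNPs. Concordant homozygotes are pure M or S, and individuals heterozygous at all three loci are F1-like. Any other combination requires a gamete carrying alleles of both origins, i.e. recombination in a hybrid ancestor. The sketch assumes genotypes are coded ‘MM’, ‘SS’ or ‘MS’ by the form with which each allele is associated. For the X-linked island in males, it also assumes the single allele is recorded twice (e.g. ‘MM’). The coding and function names are hypothetical.

```python
from collections import Counter

VALID = {"MM", "SS", "MS"}


def classify_individual(genotypes):
    """Classify one individual from its genotypes at the three island SNPs."""
    if len(genotypes) != 3 or not set(genotypes) <= VALID:
        raise ValueError("expected three genotypes coded 'MM', 'SS' or 'MS'")
    if all(g == "MM" for g in genotypes):
        return "M"
    if all(g == "SS" for g in genotypes):
        return "S"
    if all(g == "MS" for g in genotypes):
        return "F1-like"
    # Mixed combinations imply recombination between islands in a hybrid ancestor.
    return "recombinant"


def tally(individuals):
    """Count individuals in each class."""
    return Counter(classify_individual(g) for g in individuals)


# Hypothetical sample of four individuals.
sample = [("MM", "MM", "MM"), ("SS", "SS", "SS"), ("MS", "MS", "MS"), ("MM", "SS", "MM")]
print(tally(sample))
```

Under this scheme, near-zero counts in the “recombinant” class alongside a few F1-like individuals is the pattern interpreted as very low between-form gene flow.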
In summary, two opposing models exist that describe the relationship between the M and S forms. The “genomic islands of speciation” model suggests that divergence between the M and S genomes is restricted to small regions (~3% of the genome) that may contain the genes responsible for reproductive isolation between forms and that ongoing gene flow is responsible for very low levels of divergence over the remaining 97% of the genome. The second model, the “incidental islands of divergence” model, suggests that divergence between the two forms is far more extensive and widely distributed over the genome, that gene flow between the two forms is nearly zero and that the M and S forms therefore represent distinct species (Figure 6D).
6. Further sub-divisions within molecular forms
Although most discussions consider M and S to be the major, biologically relevant subdivisions of A. gambiae, there is evidence that each can be further subdivided into significantly diverged population groups.
Subdivision within the S form. In a continent-wide survey based on microsatellite DNA, Lehmann et al. found that S form populations fall into two well-defined clades, which they refer to as the Northwest (Nigeria, Gabon, Democratic Republic of Congo, NW Kenya) and Southeast (SW Kenya, Tanzania, Malawi) divisions. Wang-Sattler et al. also analyzed microsatellite DNA and likewise report that the S form in eastern Africa (Kenya) is distinct from S form populations in the west (Mali). In addition to this East vs. West division between allopatric S form populations, sympatric S form populations in Mali are themselves divided. These divisions are described in detail above (Section 2). In brief, the S form in Mali is divided into the Bamako and Savanna chromosomal forms, which display strong assortative mating where they occur in sympatry at sites along the Niger River (see also Figure 4). These two populations can be distinguished by the 2Rj inversion, which is fixed in the Bamako form and absent from the Savanna form. Interestingly, although the two share the X-linked allele that defines them as the S molecular form, a detailed analysis revealed that they are strongly diverged in a 3 Mb region of the X chromosome proximal to the centromere.
Subdivision within the M form. A comparison of M form populations in Mali and Cameroon revealed that the two are very different genetically; in fact, divergence between them is higher than divergence between the M and S forms. This observation has led to the recognition of two distinct M form groups: the Mopti-M form, which is polymorphic for the 2Rb, 2Rc and 2Ru inversions, and the Forest-M form, which lacks inversions on chromosome arms 2L and 2R [23, 84]. In addition to their genetic divergence, the Forest-M and Mopti-M forms differ in ecology. Mopti-M in Mali is most common in the dry northern part of the country, whereas Forest-M is absent from the dry northern part of Cameroon and is restricted to the wet southern part of the country. This contrast lends support to the notion that chromosomal inversions are involved in adaptation to arid environments.
The Goundry form. Genetic analysis of A. gambiae larvae from roadside pools in Burkina Faso and of adults collected inside nearby houses revealed a genetically distinct population that was present in the larval sample but absent from adult collections. The larval population differed from the adult population in the distribution of microsatellite alleles (FST = 0.15), in the frequency of M/S hybrids (35% in the larval population, <1% in adults) and in the frequency of the 2La inversion (58% in the larval population, 96% in adults). This distinct larval population is called the Goundry form, after one of the village collection sites. Based on these results, the Goundry form is thought to be a distinct form whose adults rest almost exclusively outdoors (exophilic) and which, although it carries the X-linked markers that distinguish the M and S forms, lacks the assortative mating associated with these markers. Adults of the Goundry form have never been collected in the field; however, in laboratory experiments, adults reared from Goundry form larvae were found to have increased susceptibility to infection with P. falciparum.
7. Future directions
Reconciliation of the opposing speciation models and clarification of the new “forms” await the resolution of a number of outstanding questions concerning interactions between the M and S forms. Determining the frequency of hybrid individuals clearly requires that individuals be identified using multi-locus genotypes at unlinked loci, such as those employed by White et al., rather than the widely used single-locus X-linked markers. This would allow recognition not only of F1 hybrids but also of backcross individuals, and the frequencies of F1 and backcross genotypes together would provide information on the level of introgression. A multi-locus approach would also allow identification of hybrid males, which a single X-linked marker cannot detect because males carry only one X chromosome. Applying this approach to populations throughout the sympatric range of M and S would allow spatial heterogeneity in levels of introgression to be described and related to key environmental parameters, including the mating cues that sustain assortative mating within forms and the conditions that favor the survival of hybrid genotypes.
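Why the number of unlinked loci matters can be shown with a short calculation. The Python sketch below assumes Mendelian segregation, free recombination between loci, fully diagnostic autosomal markers and no selection; under these assumptions a first-generation backcross (F1 × M) is independently heterozygous or M-homozygous at each locus with probability 1/2, whereas a true F1 is heterozygous at every locus. The function name `bc1_class_probs` is hypothetical.

```python
from fractions import Fraction


def bc1_class_probs(k):
    """Expected genotype-class frequencies for a first-generation backcross
    (F1 x pure M) typed at k unlinked, fully diagnostic autosomal loci.

    Assumes Mendelian segregation, free recombination and no selection, so
    each locus is independently MS or MM with probability 1/2.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    p_all_same = Fraction(1, 2) ** k  # chance that all k loci share one state
    return {
        "looks pure M": p_all_same,          # MM at every locus
        "looks F1": p_all_same,              # MS at every locus
        "recombinant": 1 - 2 * p_all_same,   # mixture of MM and MS loci
    }


for k in (1, 3, 6):
    probs = bc1_class_probs(k)
    print(k, {cls: str(p) for cls, p in probs.items()})
```

With a single locus, a backcross individual is equally likely to look like a pure M individual or like an F1, and no recombinant class exists. With three loci, 1/8 of backcross individuals would still look pure M and another 1/8 would look like F1 hybrids, and the misclassified fraction halves with each additional unlinked locus. By symmetry, the same figures apply to backcrosses to S.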