A Study of Schistosomiasis Prevalence and Risk of Snail Presence Spatial Distributions Using Geo-Statistical Tools


Introduction
Schistosomiasis mansoni is an endemic disease typical of developing countries. In Brazil, schistosomiasis is caused by the etiological agent Schistosoma mansoni, whose intermediate hosts are molluscs of the genus Biomphalaria. Almost all researchers accept that S. mansoni was introduced into Brazil by the African slave trade during the sixteenth century (Almeida Machado, 1982). Sugar cane found fertile and favourable soil in the northeast of Brazil, especially in the hot and humid coastal plains where the states of Pernambuco and Bahia are located today. The scarce manpower obtained from the native Indians did not meet demand, and it was more profitable to import slave labour from Africa. From the mid-sixteenth century (1551-1575) until the mid-nineteenth century (1851-1860), about four million slaves arrived in Brazil. This migration started in the main African regions of the west, the east, the southwest and Mozambique. Although many regions of Africa supplied slave labour to Brazil, the majority originated from the Congo and Angola. The Portuguese colonization of Angola in the early sixteenth century enabled the migration of more than two thirds of the Africans from the ports of Luanda, Benguela and Cabinda. During the sixteenth and early seventeenth centuries, there was a large influx of Africans from the ports of the Bay of Benin (region of Ghana/Nigeria). The Brazilian ports of Salvador and Recife received most of the slaves (Klein, 2002), who originated from regions endemic for both S. mansoni and S. haematobium infections. However, the absence of an intermediate host for S. haematobium in Brazil was a limiting factor that minimized the latter problem in the country (Camargo, 1980). The wetlands used for planting sugar cane almost always follow river banks and streams, and the presence there of molluscs of the genus Biomphalaria, susceptible to S. mansoni, provided the ideal environmental conditions for the introduction of schistosomiasis into the country (Camargo, 1980). The endemic area remained unchanged for several years, probably because the shortage of roads and transportation at that time hampered population movement (Camargo, 1980).
With the entrance of other countries into the sugar trade, sugar production in the northeast of Brazil declined in the early eighteenth century, reducing the demand for slave labour. At that time the gold and diamond rush began in the state of Minas Gerais and, with the urgent need for workers in the mines, the first great migratory flow brought slave labour from the northeast to Minas Gerais. It is estimated that one fifth of the population at that time moved to Minas Gerais (Prado Junior, 1986), using the "ways of São Francisco" (Rey, 1956) as the main access route. Schistosomiasis probably came along with these early migrants. Nowadays, Minas Gerais has 853 municipalities and schistosomiasis is present in 518 of them. According to official data, approximately 12 million people are at risk of the disease (SES, 2006).

Schistosomiasis prevalence
Pirajá Silva made the first report of the presence of S. mansoni in Brazil in 1908. This researcher noted parasite eggs in the faeces of a patient treated at the Medical School of Bahia (Silva, 1908). Teixeira (1919) observed the first cases in the city of Belo Horizonte, Minas Gerais. On that occasion, eggs of S. mansoni were found in the faeces of 49 (0.5%) of 9,995 people considered "of all ages and conditions". Among those infected, 36 (73.5%) were children under the age of 15. Martins (1938) worked in the cities of Montes Claros, Salinas, Jequitinhonha, Espinosa, Brejo das Almas, Rio Pardo, Fortaleza and Tremedal. Using the sedimentation method, 348 people of different ages were examined, of whom 100 (28.7%) had a stool examination positive for S. mansoni. Versiani et al. (1945) examined 2,352 schoolchildren of both sexes, aged 7 to 15, in the city of Belo Horizonte. Screening by the sedimentation method showed that 294 (12.5%) excreted eggs of S. mansoni in their stools.
Pellon and Teixeira (1950) published the most comprehensive Helminthological School Survey ever conducted in Brazil. A total of 440,786 schoolchildren were examined in eleven states: Maranhão, Piauí, Ceará, Rio Grande do Norte, Paraíba, Pernambuco, Alagoas, Sergipe, Bahia, Espírito Santo and Minas Gerais. The screening method used was qualitative (Lutz, 1917). The survey revealed that 44,478 people had schistosomiasis mansoni, a prevalence of 10.09%. In the state of Minas Gerais, across thirteen physiographic zones, 7,991 people were diagnosed with schistosomiasis, a prevalence of 4.92%. Pellon and Teixeira (1953) presented a new survey conducted in five other states (Goiás, Mato Grosso, Paraná, Rio de Janeiro and Santa Catarina), supposedly non-endemic areas for schistosomiasis. The Lutz method diagnosed 145 cases of schistosomiasis (0.08%) among 174,206 people examined. Other surveys are described in Pellegrino et al. (1975) and Katz et al. (1978).
The Ministry of Health implemented the Special Programme for Schistosomiasis Control (PECE) in the northeast of Brazil in 1976, covering the states of Alagoas, Ceará, Paraíba, Pernambuco, Rio Grande do Norte and Sergipe. Because of the diversity of methodologies used in diagnosis, the results are difficult to evaluate; however, it is accepted that PECE was effective in reducing disease morbidity (Almeida Machado, 1982; Lima e Costa et al., 1996). In 1980 PECE became a routine programme named the Schistosomiasis Control Programme (PCE) and was extended to the states of Bahia and Minas Gerais (Massara, 2005). The PCE started in 1983, with management and execution by the Superintendent of Public Health Campaigns (SUCAM) of the Ministry of Health, in five municipalities in the north and northeast. From 1990 these responsibilities were transferred to the National Health Foundation (FUNASA), and in 1993 the Municipal Health Agents started implementing PCE activities (Lima e Costa et al., 1996). The evaluation and monitoring of PCE activities, done manually until 1996, are now computerized following the creation of the Localities System (SISLOC) and the Schistosomiasis Control Programme System (SISPCE) (Guimarães, 2010). Between 1988 and 2004, the PCE identified low prevalence (less than 5.0%) in 24.5% of the municipalities studied and average prevalence (between 5% and 15%) in 35.0%. High prevalence (greater than 15.0%) was found in 40.5% of the studied municipalities (SES, 2006). Carvalho et al. (1987, 1988, 1989, 1994, 1997, 1998a), Katz & Carvalho (1983), Campos & Briques (1988), Kloss et al. (2004), Gazzinelli et al. (2006), Enk et al. (2008, 2009), Massara et al. (2008) and Tibiriça (2008) [...]
[...] (Paraense & Deslandes, 1962), B. amazonica (Paraense, 1966), B. oligoza (Paraense, 1974), B. occidentalis (Paraense, 1981), B. cousini (Paraense, 1966) and B. tenagophila guaibensis (Paraense, 1984). Among these, only the first three are found naturally infected with S. mansoni. Another three species, B. amazonica, B. peregrina and B. cousini, have been infected experimentally and are considered potential hosts of this trematode (Corrêa & Paraense, 1971; Paraense & Correa, 1973; Caldeira et al., 2010). Seven species have been reported in Minas Gerais state. Based on work by Souza et al. (2001), supplemented with information obtained from the Laboratory of Helminthology and Medical Malacology, René Rachou Research Center, Oswaldo Cruz Foundation (LHMM-CPqRR/Fiocruz) and the Regional Health Management of Juiz de Fora (GRS-JF), 484 of the 853 municipalities of the state were surveyed and molluscs were found in 345 of them, with the following distribution: B. glabrata (216 municipalities), B. straminea (160), B. tenagophila (86), B. peregrina (80), B. schrammi (28), B. intermedia (21) and B. occidentalis (5). The three host species have been found simultaneously in 35 municipalities.
Observations on the biology and population structure of molluscs of the genus Biomphalaria are important, especially for studying the epidemiology and prevention of schistosomiasis (Kawazoe, 1975). Molluscs of the family Planorbidae have been known since the Jurassic period, reported in the United States and Europe and occupying large tracts of land between latitudes 70° N and 40° S.
Altitude does not influence the survival of the molluscs, since they are observed from sea level up to 3,000 metres in the Rocky Mountains and 4,280 metres at Lake Titicaca (Baker, 1945). The study of the habitat of these molluscs, and of their behaviour in relation to climate, yields valuable information when the goal is to control disease transmission. They are commonly found in small bodies of water, both natural (streams, creeks, ponds, swamps) and artificial (irrigation ditches, small dams), with wind speed less than 30 cm/s and the vegetation necessary for their nourishment and for the protection of eggs under aquatic foliage. Most habitats show the presence of microflora and organic matter, little turbidity, good sunshine, pH between 6 and 8, NaCl content below 3 per 1,000 and an average temperature between 20 and 25 °C. However, molluscs can tolerate wide variations in the physical, chemical and biological characteristics of their environments, and some specimens may migrate slowly against the current, occupying other breeding sites upstream of the original colonies (WHO, 1957; Paraense, 1972; Rey, 2001).
Biomphalaria molluscs have developed a wide repertoire of survival mechanisms. In the rainy season, mainly due to flooding, the mollusc population decreases. Repopulation and breeding occur mostly in the late dry season, when the number of areas with standing (lentic) water increases. In dry regions, drying reduces the number of individuals in each rainy season and the population is re-established from the few survivors. Desiccation resistance is a physiological adaptation of Biomphalaria molluscs, which enter a state of dormancy, reducing their needs and water loss. Another adaptation is the acceleration of development during the rainy season, which ensures the production of new individuals so that the colony can withstand the next dry season (Paraense, 1955, 1972; Grisolia & Freitas, 1985; Juberg et al., 1987).
The molluscs' reproductive mechanism has a fundamental role in the perpetuation of the species. Because they are hermaphrodites, both self-fertilization and cross-fertilization occur. The eggs are wrapped in elastic, gelatinous, tough and transparent capsules. The average number of eggs per capsule is 20 and can reach a hundred. Under favourable conditions molluscs opt for cross-fertilization; under unfavourable conditions, however, a few individuals can use self-fertilization to initiate a new population (founder effect). The molluscs are highly prolific: a single individual is capable of generating nearly 10 million descendants within three months (Barbosa, 1970; Paraense, 1972; Thomas, 1995). Excluding fortuitous factors, molluscs do not survive for more than one year. Their persistence in a focus stems from the multiplication rate, which depends on several ecological factors influencing fertility, laying and egg viability (Paraense, 1972; Baptista & Juberg, 1993).
Of the three host species of S. mansoni, B. glabrata is the most important, due to its extensive geographic distribution, high infection rates and efficiency in shedding cercariae, and consequently in spreading the agent. Moreover, its distribution in Brazil is almost always associated with the occurrence of schistosomiasis (Lutz, 1917). The species was found in the municipality of Esteio, in the metropolitan region of Porto Alegre, RS (Carvalho et al., 1998b). Biomphalaria tenagophila has epidemiological importance in the south of Brazil, being responsible for outbreaks in São Paulo. In the state of Minas Gerais it is responsible for maintaining the focus in the city of Itajubá (Katz & Carvalho, 1983). B. straminea has the widest distribution among the three species, being found in almost all river basins. However, it has greater epidemiological importance in the northeast of Brazil, where in some areas it is solely responsible for maintaining foci with high disease prevalence. This species has been epidemiologically linked to the occurrence of cases of schistosomiasis in Paracatu, Minas Gerais (Carvalho et al., 1988). Work in non-endemic regions in the west of Minas Gerais (Carvalho et al., 1994, 1997, 1998a) found two districts with B. glabrata (Araxá and Sacramento), 20 with B. straminea (Bonfinópolis de Minas, Cachoeira Dourada, Cascalho Rico, Centralina, Conceição das Alagoas, Douradoquara, Grupiara, Ipiaçu, Ituiutaba, João Pinheiro, Lagamar, Lagoa Grande, Monte Alegre de Minas, Paracatu, Sacramento, Santa Vitória, Uberaba, Uberlândia, Unaí and Vazante) and 4 with B. tenagophila (Agua Comprida, Patos de Minas, Uberaba and Uberlândia).
The distribution of schistosomiasis in Minas Gerais is not regular. The disease is endemic in the northern (including parts of Médio São Francisco and Itacambira), eastern and central areas (Alto Jequitinhonha, Metalúrgica, Oeste and Alto São Francisco). The highest infection rates are found in the northeast and east of the state, which encompass the areas of Mucuri, Rio Doce and Zona da Mata (Pellon & Teixeira, 1950; Katz et al., 1978; Carvalho et al., 1987; Lambertucci et al., 1987). In endemic areas, large concentrations of these molluscs, together with other risk factors, favour the existence of cities with a high prevalence of schistosomiasis.

Schistosomiasis and geoprocessing
Schistosomiasis is a parasitosis determined in space and time by environmental factors and by the behaviour of residents in endemic areas. Its distribution in the state of Minas Gerais is not regular, since areas of high prevalence lie close to non-endemic regions. Thus, despite advances in knowledge about schistosomiasis, the disease remains a major public health problem in the country, requiring larger investments in preventive measures such as sanitation and health education, as well as in studies that support disease control through geoprocessing methodologies (Amaral et al., 2006; Guimarães et al., 2009). Another aspect of this problem is the increasing occurrence of repeated small outbreaks of acute schistosomiasis related to rural tourism, especially in Minas Gerais. This phenomenon involves middle- and upper-class sections of the population who have their first contact with the disease during leisure activities in nearby endemic rural areas (Enk et al., 2003; Massara et al., 2008; Enk et al., 2010). Under these circumstances, geoprocessing can be applied to characterize and better understand the interconnection of these factors and to provide a more complete picture of disease transmission.
Computational resources such as Geographic Information Systems (GIS) allow complex analysis of a large amount of information and display of the results as maps. Data generated by GIS have an important role in the study of schistosomiasis, especially in relation to the interaction of the disease with environmental conditions (Guimarães et al., 2006). GIS has been used to study schistosomiasis in several Brazilian states: Bahia (Bavia et al., 1999, 2001); Minas Gerais (Brooker et al., 2006; Fonseca et al., 2007a; Fonseca, 2009; Freitas et al., 2006; Gazzinelli et al., 2006; Guimarães et al., 2006, 2008, 2009, 2010a, 2010b; Guimarães, 2010; Martins, 2008; Martins-Bedê et al., 2009, 2010; Carvalho et al., 2010; Tibiriça et al., 2011); Pernambuco (Barbosa et al., 2004; Araújo et al., 2007; Galvão et al., 2010).
This work has two main objectives. The first is to estimate the probability of occurrence of each mollusc species (B. glabrata, B. tenagophila, B. straminea) in the state of Minas Gerais, Brazil, and to determine which of these species is most closely related to the prevalence of the disease. The second is to estimate the prevalence of schistosomiasis for the entire state of Minas Gerais, using the estimated probability of snail presence together with socioeconomic and environmental variables.

Materials and methods
This study was carried out in Minas Gerais state, Brazil. Minas Gerais covers 586,520.368 km² and is located in the west of the southeastern region of Brazil, which also contains the states of São Paulo, Rio de Janeiro and Espírito Santo. It borders Bahia and Goiás (north), Mato Grosso do Sul (far west), the states of São Paulo and Rio de Janeiro (south) and the state of Espírito Santo (east). It also shares a short boundary with the Brazilian Federal District. It is situated between 14°13'58" and 22°54'00" S latitude and between 39°51'32" and 51°02'35" W longitude. Minas Gerais is one of the 26 states of Brazil, of which it is the second most populous, the third richest and the fourth largest in area. The landscape of the state is marked by mountains, valleys and large areas of fertile land, with altitudes between 198.88 and 1,573.18 metres.
According to the IBGE (2010), 19,597,330 people resided in the state. The last census revealed the following figures: urbanization (85.3%), population growth between 2000 and 2010 (1.1%), men (49.2%), women (50.8%). Ethnic groups found in Minas Gerais include Caucasian (45.39%), Multiracial (44.28%), African (9.22%), Asian (0.95%) and Amerindian people (0.16%) (IBGE, 2010). The population was distributed by age as follows: 0-4 (6.52%), 5-14 (15.91%), 15-19 (8.77%), 20-39 (32.91%), 40-59 (24.1%) and above 60 (11.79%) years (IBGE, 2010). The Human Development Index (HDI) of Minas Gerais was 0.800, varying between 0.568 (Angelândia) and 0.841 (Poços de Caldas) (SNIU, 2000). On average, 66.36% of households have access to the main water supply, ranging from 3.91% in the worst municipality to 99.26% in the best. Also, on average about 49.2% of households have a bathroom or toilet connected to the general sewage network, with 0% in the worst municipality and 97.5% in the best (SNIU, 2000).
Minas Gerais is composed of 853 municipalities. Of these, 523 have active transmission of schistosomiasis, with a population of 10,870,063 living in endemic areas (Drummond et al., 2010). The Schistosomiasis Control Programme has been implemented in 440 municipalities. It includes health education activities that emphasize the importance of schistosomiasis, stool examinations and treatment of positive cases; a sanitation programme has also been implemented in some areas with the help of local communities interested in the control of the disease and willing to work as a team (Drummond et al., 2006). Figure 1 shows the location of all municipalities of Minas Gerais and the endemic region.

Materials
The following data were used to achieve the objectives:

Schistosomiasis prevalence data
Schistosomiasis prevalence values (Pv) were obtained from the Brazilian Schistosomiasis Control Programme (PCE) through the annual reports of the Secretary of Public Health Surveillance (SVS) and the Secretary of Health in the State of Minas Gerais (SESMG). The PCE in Minas Gerais began in 1986, and since 2000 it has been under the coordination of the SESMG in collaboration with Municipal Health Systems. The aim of the PCE is to prevent the occurrence of the hepatosplenic form and to prevent transmission in focus areas (SESMG, 2006). Prevalence is determined with the Kato-Katz technique, examining one slide per person. Among the 853 municipalities of Minas Gerais state, only 255 presented information on disease prevalence. The disease is considered non-endemic in the west, northwest and south of the state, and endemic in the northern, eastern and central areas. The highest infection rates are found in the northeastern and eastern areas of the state (Pellon & Teixeira, 1950; Katz et al., 1978; Carvalho et al., 1987). The spatial distribution of schistosomiasis prevalence is presented in Figure 2a.

Intermediate hosts' data
Data on the distribution of Biomphalaria molluscs were provided by the Laboratory of Helminthiasis and Medical Malacology of the René Rachou Research Center (CPqRR/Fiocruz-MG). Molluscs were collected in breeding places in different municipalities of Minas Gerais at different periods, using scoops and tweezers, and then packed for transport to the laboratory (Souza & Lima, 1990). Specific identification was performed according to the morphology of the shells, reproductive system and renal ridge of the molluscs (Deslandes, 1951; Paraense & Deslandes, 1955a, 1955b, 1959; Paraense, 1975, 1981), and more recently by low stringency polymerase chain reaction and restriction fragment length polymorphism (Vidigal et al., 2000). Among the 853 municipalities of Minas Gerais state, 194 municipalities did not present any of the three species [...]
The Linear Spectral Mixture Model (LSMM) is an image processing algorithm that generates fraction images giving the proportion of each component (vegetation, soil and shade) inside the pixel, estimated by minimizing the sum of squared errors. In this work the vegetation, soil and shade fraction images were generated from the MODIS data, and the estimated values of the spectral reflectance components were also used as input to the model (Guimarães et al., 2010b). Some indices related to the presence and quantification of water, as defined in Fonseca et al. (2007b), were also used. The median of the water accumulation (WA) was used to measure the amount of water that may exist in the municipality. Based on the declivity and water accumulation data, the mobility of water (MW) in the same seasons was calculated. The meteorological variables consisted of total precipitation (PC) and the average minimum (TN) and maximum (TX) temperatures for the summer and winter seasons, obtained from the Center for Weather Forecast and Climate Studies (CPTEC) for the same dates as the MODIS images. To characterize the disease using climate-related aspects, the daily temperature difference (dT) was computed in the present work for the summer and winter seasons. This variable, proposed and used by Malone et al. (1994) and Bavia et al. (2001), is the difference between the maximum and minimum temperatures in each season.
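The least-squares idea behind the LSMM can be illustrated with a minimal sketch. It assumes each pixel reflectance is a linear combination of three endmember spectra (vegetation, soil, shade) and solves the unconstrained normal equations; the endmember values, the function names `solve` and `unmix`, and the absence of sum-to-one or non-negativity constraints are all assumptions made for illustration, not the configuration used in the study.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def unmix(pixel, endmembers):
    """Return fractions f minimizing sum over bands of
    (pixel[b] - sum_j f[j] * endmembers[j][b]) ** 2 (unconstrained)."""
    k = len(endmembers)
    # Normal equations: (E E^T) f = E p, one endmember spectrum per row of E.
    gram = [[sum(a * b for a, b in zip(endmembers[i], endmembers[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(a * p for a, p in zip(endmembers[i], pixel)) for i in range(k)]
    return solve(gram, rhs)


# Hypothetical four-band endmember spectra (illustrative values only).
vegetation = [0.04, 0.06, 0.45, 0.20]
soil = [0.10, 0.20, 0.30, 0.40]
shade = [0.01, 0.01, 0.02, 0.01]

# A synthetic pixel mixed from known fractions, then unmixed again.
pixel = [0.5 * v + 0.3 * s + 0.2 * d for v, s, d in zip(vegetation, soil, shade)]
fractions = unmix(pixel, [vegetation, soil, shade])
```

Applying `unmix` to every pixel of an image would give one fraction image per component; operational implementations usually add constraints on the fractions, which this sketch omits.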

Socioeconomic data
Three socioeconomic variables, obtained from the João Pinheiro Foundation (FJP) in 2004, were used in the work: the index of need in health (INH), the economic index (IE) and the Factor of Allocation of Financial Resources for Attention to Health (FA). Eighteen socioeconomic variables supplied by the Brazilian National System of Urban Indicators (SNIU) were also used, including the human development index (HDI) and its longevity (HDIL), income (HDII) and education (HDIE) components for the years 1991 and 2000. Three variables on water supply from 2000 were also included, giving the percentage of households with access to the general water supply network (WaterNet), with access to water through wells or springs (WaterWellNasc) and with other forms of access to water (WaterOther). Eight more variables from 2000 described the sanitary conditions of the municipalities studied: the percentage of households with a bathroom connected to rivers or lakes (SanRiverLake), to a ditch (SanDitch), to rudimentary sewage (SanSewageR), to septic sewage (SanSewageS), to a general network (SanNet) or to another type of sewerage (SanOther), and the percentage with (WithSan) and without (WithoutSan) a bathroom or toilet.

Methods
To achieve the first objective, two approaches were considered: indicator kriging and logistic regression. The logistic regressions were based on information about the snail species (B. glabrata, B. tenagophila and B. straminea) together with environmental variables, whereas indicator kriging uses only the snail distribution information. The second objective was achieved by applying multiple regression.

Indicator kriging
Geostatistical methods such as indicator kriging may be defined as techniques of statistical inference that allow the estimation of values, and of the uncertainties associated with the attribute, during the spatialization of a sample property (Felgueiras et al., 1999). Indicator kriging is a nonlinear estimator, applied to a sample set of the attribute whose values are modified by a nonlinear transformation. According to Felgueiras (1999), indicator kriging is considered non-parametric because it does not assume any a priori probability distribution for the random variable. Instead, it enables the construction of a discretized approximation of the cumulative distribution function of the random variable. The mollusc attributes (class of species and location) were distributed along the drainage network of 15 river basins, according to the methodology used by Guimarães et al. (2009). Variogram models were fitted for each class (Biomphalaria species) in each basin through exploratory analysis, using geostatistical procedures. These procedures involved creating experimental semivariograms and fitting them to theoretical mathematical models. After model fitting, indicator kriging procedures were applied to obtain an approximation of the conditional distribution function of the random variables. Based on the estimated function, maps of the spatial distribution of molluscs, along with the corresponding uncertainties, were built for the entire basin.
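The first two steps described above, the indicator transformation and the experimental semivariogram, can be sketched as follows. This is a minimal illustration assuming point samples with planar coordinates and equal-width lag bins; the function names, the toy coordinates and the class labels are hypothetical, and the model fitting and kriging steps are not shown.

```python
import math


def indicator(classes, target):
    """Nonlinear indicator transform: 1 where the sample belongs to the target class, else 0."""
    return [1 if c == target else 0 for c in classes]


def experimental_semivariogram(coords, values, lag, n_lags):
    """Return gamma for bins [k*lag, (k+1)*lag), k = 0..n_lags-1, where
    gamma = sum of (z_i - z_j)**2 over pairs in the bin / (2 * number of pairs).
    Bins without pairs are reported as None."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            distance = math.dist(coords[i], coords[j])
            k = int(distance // lag)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Toy example: four collection points along a stream, two species classes.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
classes = ["BG", "BG", "BS", "BS"]
bg_indicator = indicator(classes, "BG")
gamma = experimental_semivariogram(coords, bg_indicator, lag=1.0, n_lags=3)
```

In an indicator kriging workflow, one such semivariogram would be computed per class and then fitted with a theoretical model before kriging.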

Logistic regression
Logistic regression was applied in this work to predict the probability of existence of Biomphalaria species in Minas Gerais state, Brazil, and then to determine which of the three species (B. glabrata, B. tenagophila and B. straminea) has a greater influence on the risk of schistosomiasis in the state. Logistic regression predicts a qualitative dependent variable with two possible answers (0 or 1). In this work, the dependent variables are the species B. glabrata (BG), B. tenagophila (BT) and B. straminea (BS). Each species variable receives the value '0' for the municipalities where the species does not exist and the value '1' where it does exist. The socioeconomic and environmental variables mentioned above were considered as explanatory variables. The variable selection for the logistic regression models was performed in steps:
1. data collection and variable preparation: transformations of the explanatory variables (quadratic, inverse, logarithm and square root) were tested with the purpose of normalizing them;
2. reduction of the number of explanatory variables: all variables whose correlation with the dependent variable was not significant at the 5% level, and explanatory variables with a correlation higher than 0.8 with another explanatory variable, were discarded;
3. production of logistic regression models: all possible regressions were generated with the remaining variables;
4. selection of the best model: goodness-of-fit and the ROC curve were analyzed.
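Step 2 of the selection procedure can be sketched as below. The sketch rests on several assumptions that are not stated in the text: significance is approximated with a t statistic compared against 1.96 (a large-sample stand-in for the 5% level), and when two explanatory variables correlate above 0.8 the one more strongly correlated with the dependent variable is kept. The function and variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, t_crit=1.96):
    """Approximate 5% test: t = |r| * sqrt((n - 2) / (1 - r^2)) > t_crit.
    t_crit = 1.96 is a large-sample approximation (assumption)."""
    if abs(r) >= 1.0:
        return True
    t = abs(r) * math.sqrt((n - 2) / (1.0 - r * r))
    return t > t_crit


def screen_variables(X, y, max_inter_corr=0.8):
    """X maps variable name -> list of values; y is the 0/1 presence variable.
    Keep variables significantly correlated with y, then greedily drop any
    variable correlated above max_inter_corr with an already kept one."""
    n = len(y)
    with_y = {name: pearson(col, y) for name, col in X.items()}
    candidates = [name for name in X if is_significant(with_y[name], n)]
    candidates.sort(key=lambda name: abs(with_y[name]), reverse=True)
    kept = []
    for name in candidates:
        if all(abs(pearson(X[name], X[other])) <= max_inter_corr for other in kept):
            kept.append(name)
    return kept
```

The surviving variables would then feed step 3, where all possible logistic models are generated and compared.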

Multiple regression
The prevalence data were generated at the municipality level; therefore all the input variables were integrated within the municipalities' boundaries using GIS and exported to a standard spreadsheet for statistical analysis and modelling. A logarithmic transformation was applied to the dependent variable (prevalence, denoted by Pv) to increase its correlation with the independent variables. Multicollinearity effects among the independent variables were detected. A variable selection technique was used to choose the set of variables that best explains the dependent variable. It was based on the R² criterion, using the all-possible-regressions procedure (Neter et al., 1996). This selection technique consists of identifying a subset with few variables whose coefficient of determination R² is sufficiently close to that obtained when all variables are used in the model. Interaction effects were also included in the model. The dependent variable was randomly divided into two sets: one with 123 cases for variable selection and model definition, and another with 132 cases for model validation.
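The all-possible-regressions search by R² can be sketched in a few lines. This is a minimal illustration under stated assumptions: ordinary least squares with an intercept is solved through the normal equations, the candidate variables are not perfectly collinear, and the names `r_squared` and `best_subsets` are hypothetical. The dependent variable passed in would be the log-transformed prevalence.

```python
from itertools import combinations


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def r_squared(columns, y):
    """Fit y on the given columns (plus intercept) by OLS and return R^2."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    y_mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def best_subsets(X, y, max_size):
    """Return {subset size: (R^2, variable names)} for the best subset of each size."""
    best = {}
    for k in range(1, max_size + 1):
        for names in combinations(sorted(X), k):
            r2 = r_squared([X[name] for name in names], y)
            if k not in best or r2 > best[k][0]:
                best[k] = (r2, names)
    return best
```

Comparing the best R² at each subset size against the R² of the full model shows where adding further variables stops paying off, which is the criterion the paragraph describes.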
The multiple linear regressions followed two approaches: the global model, where one linear regression model was established to estimate the disease throughout the state (Guimarães et al., 2006, 2008), and the regional model (Guimarães et al., 2010b), where a regression model was generated to estimate the disease in each region determined by the SKATER algorithm (Assunção et al., 2006). The purpose of regionalization is to divide Minas Gerais state into four regions within which the selected variables are considered uniform, making it possible to obtain better models for estimating schistosomiasis prevalence. The regionalization consisted of two steps. In the first step, homogeneous and contiguous regions of Minas Gerais state were determined using the following variables: calcareous areas, percentage of water in the city, Biomphalaria species, index of need for health, digital elevation model, vegetation (obtained by the mixture model), accumulated rainfall and average temperature. The second step consisted of fitting a different linear regression model for each region. The resulting models were then used to build the risk map for all municipalities of Minas Gerais state.

Results and discussion
All the variables previously mentioned were generated and transferred to a database using the software TerraView/TerraLib (http://www.dpi.inpe.br/geoschisto/).

Indicator kriging
Indicator kriging was used to make inferences about the presence of the Biomphalaria species. As described in section 2.2.1, the methodology yields eight classes. The result was a map of the species' distribution and a map of the uncertainties associated with the classification. Figure 3a shows the classes associated with the Biomphalaria species, with a maximum level of uncertainty of 0.78. The map of uncertainties (Figure 3b) shows that the higher uncertainties are concentrated along class transition areas. Consequently, in regions where several classes may occur, more transitions are found and the uncertainties are therefore higher.
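How a class map and an uncertainty map can be derived from per-class probabilities is sketched below. The uncertainty measure used here, one minus the largest class probability, is an assumption (a common choice for categorical predictions) and may differ from the measure actually used in the study. The function name and the example probabilities are hypothetical; only the eight classes and the 0.78 cut-off come from the text.

```python
import numpy as np


def classify_with_uncertainty(probs, max_uncertainty=0.78):
    """Assign each location to its most probable class.

    Uncertainty is taken as 1 - max(p), an assumed definition.
    Locations above `max_uncertainty` are left unclassified (-1).
    """
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum(axis=1, keepdims=True)  # rows sum to 1
    best = probs.argmax(axis=1)
    uncertainty = 1.0 - probs.max(axis=1)
    labels = np.where(uncertainty <= max_uncertainty, best, -1)
    return labels, uncertainty


# Three hypothetical locations, eight classes each.
probs = [
    [0.90, 0.10, 0, 0, 0, 0, 0, 0],   # well inside class 0
    [0.125] * 8,                      # no information: all classes equal
    [0.45, 0.55, 0, 0, 0, 0, 0, 0],   # transition between classes 0 and 1
]
labels, uncertainty = classify_with_uncertainty(probs)
print(labels, uncertainty)
```

Under this definition the transition location receives a much higher uncertainty than the location well inside a class, which mirrors the pattern reported for Figure 3b.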

Logistic regression
Unlike indicator kriging, logistic regression estimates the occurrence probability for each species separately. Variables representing vegetation, temperature, precipitation and topography entered the logistic regression models, indicating the conditions that are important for the mollusc's development. The first model generated by logistic regression, for the B. glabrata species, has five variables and is presented in Equations 1 and 2.
$$\mathrm{Prob}_{BG} = \frac{G}{1 + G}, \qquad G = \exp\Big(b_0 + \sum_i b_i X_i\Big) \qquad (1)$$
where Prob_BG is the estimated probability of existence of the B. glabrata species in Minas Gerais state, b are the estimated parameters, X are the explanatory variables and G is given by Equation 2.
$$G = \exp\left(1.837 - 0.003\,DEM + 11.173\,NDVI_W - 0.121\,PC_W + 0.015\,PC_S - 0.390\,TN_W\right) \qquad (2)$$
where DEM is the digital elevation model, NDVI_W is the winter normalized difference vegetation index, PC_W and PC_S are the mean winter and summer precipitation and TN_W is the winter minimum temperature.
The DEM variable has an inverse relationship with the presence of the B. glabrata species; in other words, the lower the elevation, the greater the presence of B. glabrata. The PC_W (winter precipitation) and TN_W (winter minimum temperature) variables also have an inverse relationship with the presence of the species. Conversely, the larger the vegetation index (NDVI) and the summer precipitation, the greater the presence of B. glabrata. This can be explained by the natural habitat of the mollusc, which lives close to vegetation and water, seeking protection against solar radiation and high temperatures.
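The B. glabrata model can be evaluated numerically with the short sketch below. Two points are assumptions rather than statements from the source: the probability is taken to be the standard logistic form G / (1 + G), and the five coefficients are paired with DEM, NDVI, winter precipitation, summer precipitation and winter minimum temperature by matching their signs to the relationships described above. The input values and their units are hypothetical.

```python
import math

# Coefficients of Equation 2; the variable pairing is inferred from the
# sign discussion in the text and should be treated as an assumption.
COEF = {
    "intercept": 1.837,
    "dem": -0.003,    # elevation: inverse relationship
    "ndvi": 11.173,   # vegetation index: direct relationship
    "pc_w": -0.121,   # winter precipitation: inverse relationship
    "pc_s": 0.015,    # summer precipitation: direct relationship
    "tn_w": -0.390,   # winter minimum temperature: inverse relationship
}


def prob_b_glabrata(dem, ndvi, pc_w, pc_s, tn_w):
    """Estimated presence probability, assuming Prob = G / (1 + G)."""
    eta = (
        COEF["intercept"]
        + COEF["dem"] * dem
        + COEF["ndvi"] * ndvi
        + COEF["pc_w"] * pc_w
        + COEF["pc_s"] * pc_s
        + COEF["tn_w"] * tn_w
    )
    # 1 / (1 + exp(-eta)) is algebraically equal to G / (1 + G), G = exp(eta)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical municipality; only the elevation changes between the calls.
low = prob_b_glabrata(dem=500, ndvi=0.6, pc_w=10, pc_s=200, tn_w=12)
high = prob_b_glabrata(dem=1200, ndvi=0.6, pc_w=10, pc_s=200, tn_w=12)
print(low, high)
```

With all other inputs fixed, the lower elevation yields the higher probability, in line with the inverse relationship reported for DEM.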
The second model generated by logistic regression, for the B. tenagophila species, has four variables and is presented in Equations 3 and 4, where Prob_BT is the estimated probability of the existence of the B. tenagophila species in Minas Gerais state, b are the estimated parameters, X are the explanatory variables and T is given by Equation 4.
According to Equations 3 and 4, all the variables have an inverse relationship with the presence of the species. This coincides with the ideal conditions for the snail's development: low elevation, as defined by DEM; water concentration or low topographical variation, which can be associated with the Shad_S variable; the moisture of the plants, captured by MIR_W; and the average maximum temperature in summer, associated with the TX_S variable.
Equations 5 and 6 emphasize that environmental and socioeconomic aspects contribute to the presence of the B. straminea species. The smaller the SanRiverLake, DEM and PC_W variables, the greater the presence of B. straminea, while the WithoutSan and PC_S variables have a direct relationship. The models selected for each of the snail species were assessed for goodness of fit using the Receiver Operating Characteristic (ROC) curve. The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies. The larger the area under the curve, the better the model fit.
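The area under the ROC curve can be computed directly from observed presence/absence labels and model probabilities. The following is a minimal sketch with hypothetical data and function names, not the software used in the study: it sweeps the decision threshold over the sorted scores, accumulates true and false positives, and integrates with the trapezoidal rule.

```python
import numpy as np


def roc_points(y_true, scores):
    """False and true positive rates for every distinct threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")   # highest score first
    y_sorted, s_sorted = y_true[order], scores[order]
    # keep only the last position of each group of tied scores
    distinct = np.r_[np.diff(s_sorted) != 0, True]
    tps = np.cumsum(y_sorted)[distinct]
    fps = np.cumsum(~y_sorted)[distinct]
    tpr = np.r_[0.0, tps / y_true.sum()]
    fpr = np.r_[0.0, fps / (~y_true).sum()]
    return fpr, tpr


def auc(y_true, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_points(y_true, scores)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Hypothetical presence (1) / absence (0) records and model probabilities.
presence = [1, 1, 0, 1, 0, 0]
probability = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(presence, probability)
print(area)  # 8 of the 9 presence/absence pairs are ranked correctly
```

The area equals the fraction of presence/absence pairs in which the presence record receives the higher probability, so 0.5 corresponds to a random ranking and 1.0 to a perfect one.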
The ROC curves of the models selected for each of the snail species are presented in Figure 4. The model for the B. glabrata species presented the best fit, with 98% of area under the ROC curve. The B. tenagophila and B. straminea models obtained 73% and 82%, respectively. Based on the selected models, the probabilities of presence of the Biomphalaria species were estimated for the whole of Minas Gerais state. Figure 5 presents the probability of the presence of B. glabrata (Figure 5a), B. tenagophila (Figure 5c) and B. straminea (Figure 5e) estimated by logistic regression. Additionally, Figure 5 shows the estimated presence of B. glabrata (Figure 5b), B. tenagophila (Figure 5d) and B. straminea (Figure 5f) obtained by indicator kriging, derived from the results of section 3.1 by merging the probabilities of the joint classes.
The estimated presence of each species was correlated with the historical data of schistosomiasis prevalence (Pv) in order to determine which of the Biomphalaria species is most closely related to the disease prevalence (Table 1). The values marked with an asterisk (*) are the estimates with a significant correlation at the 95% confidence level (p<0.05). According to Table 1, B. glabrata is the Biomphalaria species most strongly correlated with the disease prevalence. This result agrees with the study of Lutz (1917), which states that the distribution of B. glabrata is the one most associated with the occurrence of schistosomiasis in Brazil.

Multiple regression
The purpose of this section is to use the Biomphalaria presence probabilities, as estimated by logistic regression, as explanatory variables together with the environmental variables. The analysis of the correlation matrix showed that some variables had non-significant correlations with the prevalence of schistosomiasis (Pv) at a 95% confidence level and that some variables were highly correlated among themselves, indicating that the model could be further simplified. Variable selection was performed by the R² criterion using all possible regressions (Neter et al., 1996).

Global model
The four variables selected were the estimated probability of B. glabrata, precipitation, minimum temperature and the human development index (HDI). Figure 6a shows the estimated Pv for all municipalities in Minas Gerais state using the estimated regression equation (Equation 7). Figure 6b presents the residuals, that is, the difference between the observed (Figure 2a) and estimated Pv for the 255 municipalities. In Figure 6b, dark colours (red and blue) represent overestimated values, light colours (red and blue) represent underestimated values, and white marks the municipalities where the estimated prevalence differs very little from the observed values.

Regional model
The state of Minas Gerais was divided into four regions using the SKATER algorithm. The result of the regionalization is shown in Figure 7a. Figure 7b shows the estimated values of Pv for the whole state of Minas Gerais using Equations 8 to 11. In addition, Figure 7c shows the residuals for the 255 municipalities. In this figure, gray represents "no prevalence", dark colours represent overestimated values, light colours represent underestimated values and white marks the municipalities with good estimates. The regional model for Region 1 (R_1) reflects the effect of sanitation (housing with bathroom or toilet connected to a ditch), topography (DEC), vegetation (NDVI_S) and the influence of molluscs (B. tenagophila). Region 1 achieved an R² value of 0.91. The result obtained by Guimarães (2010) suggested that a detailed study in the Sul Meso region, which is part of Region 1, would be worthwhile to determine the prevalence and the transmitter of schistosomiasis.
The model for Region 2 (R_2) shows the effect of vegetation (NDVI_W) and temperature. The same relationship between temperature and vegetation was also obtained by Bavia et al. (2001) in Bahia state and by Guimarães et al. (2006, 2010b) in Minas Gerais state. The R² found for this model was 0.45. The models for Regions 3 and 4 (R_3 and R_4) show that Pv was associated with topography (DEM), weather (temperature) and socioeconomic effects. Among the regional models, Region 3 had the lowest R² (0.35). Region 4 had an R² of 0.51. The variables obtained in Regions 3 and 4 were the same as those found by Guimarães et al. (2010b) in the same regions. In all models (global and regional), the presence of Biomphalaria, socioeconomic effects, vegetation index and temperature were the most important variables. These characteristics match the environmental conditions for the presence and development of molluscs (infection of the intermediate host) and the sanitation conditions (water contamination, i.e. presence of S. mansoni cercariae) obtained by Guimarães et al. (2010b). Bavia et al. (2001), in Bahia, showed that the distribution of schistosomiasis is related to the vegetation index and temperature using data from NOAA/AVHRR.

Conclusion and further work
The combined use of GIS, remote sensing, geostatistical and statistical techniques proved to be well suited to the study of schistosomiasis. This study explored the relationship between schistosomiasis prevalence and the snail species existing in Minas Gerais state, Brazil (B. glabrata, B. tenagophila and B. straminea). The results showed that the snail species most strongly correlated with the disease prevalence is B. glabrata.
The results showed that logistic regression and indicator kriging are consistent tools and that the resulting maps can be used as an auxiliary tool to formulate proper public health strategies and to guide fieldwork, considering the places with a higher occurrence probability of the most important species. Although not shown here, it is interesting to note that the models generated by logistic regression for the three Biomphalaria species coincided in the variables selected. Variables representing vegetation, temperature, precipitation and topography were present in the models, indicating the important conditions for the mollusc's development. Most of the selected explanatory variables in the global and regional models were related to the environmental conditions for the presence and development of the mollusc, as well as to the transmission of schistosomiasis.
The results of the regression models show that regionalization improves the estimation of the schistosomiasis prevalence in Minas Gerais state. Based on this model, a schistosomiasis risk map was built for Minas Gerais using an interpolator. Martins-Bedê et al. (2009) and Guimarães et al. (2010b) also obtained a better model with the use of regionalization.
To conclude, this study can contribute significantly to the choice of actions by health decision makers, allowing them, on the one hand, to narrow the set of municipalities in the state of Minas Gerais for which treatment and sanitation should be a priority and, on the other hand, to focus on preventive measures in the municipalities where transmission can occur.
The use of GPS is recommended in field surveys to obtain the coordinates of the Biomphalaria foci. The results can then be compared with the logistic regression estimates, and the accuracy of the predictive models can be improved.
The methodology used in this study can support control and preventive measures against schistosomiasis transmission in the areas where the disease occurs.

Fig. 1. Spatial localization of Minas Gerais in Brazil, with the endemic region highlighted.
(B. glabrata, B. tenagophila and B. straminea), 216 municipalities presented information on B. glabrata, 86 on B. tenagophila and 160 on B. straminea; 60 municipalities reported both B. glabrata and B. tenagophila, 101 reported B. glabrata and B. straminea, 44 reported B. tenagophila and B. straminea, and 35 municipalities presented information on all three Biomphalaria species. The spatial distribution of the Biomphalaria species data is presented in Figure 2b.

Fig. 2. (a) Distribution of the schistosomiasis prevalence and (b) distribution of the Biomphalaria species in Minas Gerais state, Brazil.

2.1.4 Environmental data
Environmental data were derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) and from the Shuttle Radar Topography Mission (SRTM). Nine MODIS variables were used, collected in two seasons, summer (17/Jan/2002 to 01/Feb/2002) and winter (28/Jul/2002 to 12/Aug/2002), together with two SRTM variables acquired in February 2000. The MODIS product comprises the Blue, Red, Near Infrared (NIR) and Middle Infrared (MIR) bands, the Normalized Difference Vegetation Index (NDVI), the Enhanced Vegetation Index (EVI) and the indices derived from the mixture model: vegetation (Veg), soil (Soil) and shadow (Shad). The SRTM variables are the digital elevation model (DEM) and the declivity (DEC), derived from the DEM. The Linear Spectral Mixture Model (LSMM) is an image processing algorithm that generates fraction images with the proportion of each component (vegetation, soil and shade) inside the pixel, estimated by minimizing the sum of squared errors. In this work the vegetation, soil and shade fraction images were generated from the MODIS data, and the estimated values of the spectral reflectance components were also used as input to the model (Guimarães et al., 2010b). Some indices related to the presence and quantity of water, as defined in Fonseca et al. (2007b), were also used. The median of the water accumulation (WA) was used to measure the amount of water that may exist in the municipality. Based on the declivity and water accumulation data, the mobility of water (MW) in the same seasons was calculated. The meteorological variables consisted of total precipitation (PC) and the average minimum (TN) and maximum (TX) temperatures for the summer and winter seasons, obtained from the Center for Weather Forecast and Climate Studies (CPTEC) for the same dates as the MODIS images. To characterize the disease using climate-related aspects, the daily temperature difference (dT) was computed for the summer and winter seasons. This variable, proposed and used by Malone et al. (1994) and Bavia et al. (2001), is the difference between the maximum and minimum temperatures in each season.
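The least-squares estimation behind the Linear Spectral Mixture Model can be sketched as follows. This is an illustration only: the endmember reflectances are invented values for four bands (Blue, Red, NIR, MIR), the function name is hypothetical, and the fit is unconstrained, whereas operational implementations often also impose non-negative fractions that sum to one.

```python
import numpy as np


def unmix_pixel(reflectance, endmembers):
    """Estimate component fractions for one pixel.

    `endmembers` has shape (n_bands, n_components); the fractions minimise
    the sum of squared errors between observed and modelled reflectance.
    """
    fractions, *_ = np.linalg.lstsq(endmembers, reflectance, rcond=None)
    return fractions


# Hypothetical endmember spectra; rows are bands (Blue, Red, NIR, MIR),
# columns are components (vegetation, soil, shade).
endmembers = np.array([
    [0.03, 0.10, 0.02],
    [0.04, 0.20, 0.02],
    [0.45, 0.30, 0.03],
    [0.20, 0.35, 0.02],
])

# A pixel built as 60% vegetation, 30% soil and 10% shade.
true_fractions = np.array([0.6, 0.3, 0.1])
pixel = endmembers @ true_fractions

fractions = unmix_pixel(pixel, endmembers)
print(fractions)
```

Because the synthetic pixel is an exact mixture of the three endmembers, the estimated fractions recover the mixing proportions; with real imagery the residual of the fit indicates how well the three-component model describes each pixel.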

Fig. 3. (a) Estimated Biomphalaria species distribution with a maximum level of uncertainty of 0.78; (b) uncertainties associated with the classification.
Fig. 4. ROC curve of the models selected for each one of the Biomphalaria species: (a) ROC curve of the B. glabrata species, (b) ROC curve of the B. tenagophila species, (c) ROC curve of the B. straminea species.

Fig. 6. (a) Global model: prevalence (%) 0.001-5.000 (green), 5.001-15.000 (yellow) and above 15.001 (red); (b) residuals of the model.
The precipitation, the minimum temperature and the B. glabrata estimate were positively correlated with Pv. The human development index was negatively correlated with Pv. This is consistent with the environmental conditions that favour the transmission of schistosomiasis. Transmission depends on the presence of B. glabrata, places with temperature above 15 °C, water collections and low economic status. This model contains the same variables (precipitation, temperature and HDI) obtained by Guimarães et al. (2006). The difference is that the remaining variable obtained by Guimarães et al. (2006) was related to the type of vegetation (forest, savannah and caatinga), whereas in this study it is related to the presence of B. glabrata. However, this variable (presence of B. glabrata) had already been obtained by Guimarães et al. (2010b) using indicator kriging.
T (estimated probability of B. tenagophila using logistic regression), DEC (slope from SRTM), NDVI_S (summer normalized difference vegetation index), C (percentage of housing with bathroom or toilet connected to a ditch), NDVI_W (winter normalized difference vegetation index), TN_S (summer minimum temperature), TN_W (winter minimum temperature), TX_W (winter maximum temperature), DEM (digital elevation model from SRTM), TX_S (summer maximum temperature), NHI (need of health index), HDIE_00 (HDI education for the year 2000).
also published results of schistosomiasis prevalence studies in Minas Gerais state. Finally, Drummond et al. (2010) reported that the PCE examined 2,643,564 stool samples in the period 2003-2007 and obtained 141,284 positives for S. mansoni in Minas Gerais.

Table 1. Correlation of the estimated presence of Biomphalaria species with the historical data of schistosomiasis prevalence.
Regression models were developed for each of the four regions with the same 255 variables used in the global model and the same selection procedure (123 cases for variable selection and model definition, and 132 cases for model validation). Different numbers of variables were selected in each region to determine the best regression model.