Use of geoprocessing techniques in the study of schistosomiasis.
Schistosomiasis, caused by
In Brazil, there are eleven species and one subspecies of
In Minas Gerais state, the presence of seven species:
The snails of the
The distribution of the parasite's intermediate hosts in Minas Gerais, combined with favorable eco-epidemiological conditions, gives schistosomiasis an expansive character not seen even in non-endemic regions [6, 10, 11].
Public health and the environment are influenced by patterns of space occupation. Geoprocessing techniques for analyzing the spatial distribution of health problems therefore make it possible to determine local risks and to delimit the areas that concentrate the most vulnerable situations (occurrence of disease, characteristics of the environment and habitat of the intermediate host or vector). Geographic information systems also make it possible to plan, schedule, control, monitor, and evaluate diseases in groups according to their risk of transmission.
The use of Geographic Information Systems (GIS) and statistical tools in health has been facilitated by access to epidemiological databases, enabling the production of thematic maps that contribute to the formulation of hypotheses about the spatial distribution of diseases and their relation to socioeconomic variables.
GIS and Remote Sensing (RS) are powerful tools for performing complex analyses of large volumes of information and for displaying the results as maps. RS has been applied to the social sciences and health since the 1970s. RS data provide a wide range of information describing biotic and abiotic factors. RS and GIS techniques have been applied to mapping the risk of parasitic diseases, including schistosomiasis, over the past 15 years.
GIS was first used to estimate schistosomiasis prevalence in the Philippines and the Caribbean [17, 18]. In Brazil, GIS was first applied to schistosomiasis in the state of Bahia. The authors constructed maps with environmental characteristics (total precipitation for three consecutive months, the annual maximum and minimum temperature and diurnal temperature differences), prevalence of
Table 1 shows a brief history of the use of GIS techniques in the study of schistosomiasis in several countries.
The main objective of the present study is to establish a relationship between the schistosomiasis positivity index and environmental and socioeconomic variables in Minas Gerais State, Brazil, using multiple linear regression at the level of small communities and cities.
|Location||Sensor||Variables and techniques||Reference|
|Philippines, the Caribbean||Landsat (MSS)||climate||[17, 18]|
|China||NOAA (AVHRR), Landsat (TM)||ecological zones|||
|Egypt||NOAA (AVHRR)||temperature, NDVI|||
|Southeast Asia||NOAA (AVHRR)||NDVI|||
|Kenya||-||linear regression, mapping techniques, cluster analysis|||
|Egypt||NOAA (AVHRR), Landsat (TM)||dT, NDVI, DEM|||
|Brazil||-||temperature, precipitation, DEM, soil type, vegetation type|||
|China||Landsat (TM)||classification, GIS|||
|Tanzania||-||GIS, logistic regression|||
|Egypt||NOAA (AVHRR), Landsat (TM)||dT, BED, NDVI|||
|Brazil||NOAA (AVHRR)||NDVI, dT|||
|Tanzania||NOAA (AVHRR)||LST, NDVI, DEM, precipitation, logistic regression|||
|Ethiopia||NOAA (AVHRR)||LST, NDVI|||
|Ethiopia||NOAA (AVHRR)||NDVI, temperature, logistic regression|||
|China||NOAA (AVHRR), Landsat (TM)||TNDVI|||
|Chad, Cameroon||NOAA (AVHRR)||ecology|||
|Sub-Saharan Africa||NOAA (AVHRR)||GIS|||
|Cameroon||NOAA (AVHRR), EROS||logistic regression|||
|Uganda||Landsat (TM)||ecological zones|||
|Côte d'Ivoire||Landsat (ETM), NOAA (AVHRR), EROS, MODIS||environmental and socioeconomic data||[16, 48]|
|Brazil||-||logistic regression models and Bayesian spatial models|||
|Brazil||MODIS, SRTM||regression, elevation, mixture model, NDVI|||
|Brazil||-||spatial analysis, GPS, immunological data|||
|Brazil||-||social and environmental data, regression|||
|Brazil||-||GPS and GIS|||
|Africa||-||ecology, GIS, RS, geostatistics|||
|Côte d'Ivoire||-||socioeconomic data, logistic regression, Bayesian model|||
|Tanzania||-||social and ecological data, Bayesian models, logistic regression, NDVI, elevation, cluster analysis|||
|Brazil||MODIS, SRTM||social and environmental data, RS, NDVI, temperature, regression|||
|Brazil||MODIS, SRTM||linear regression, imprecise classification, regionalization and pattern recognition|||
|China||SPOT||ecological data, land use, land cover, classification, Bayesian model, RS, NDVI, slope, LST|||
|China||-||GIS, spatial analysis and clustering, Bayesian model|||
|Mali||NOAA (AVHRR)||Bayesian models, NDVI, LST, GIS, logistic regression|||
|Brazil||-||kriging, spatial distribution||[67, 68]|
|China||-||GIS, spatial analysis, clustering, kernel|||
|Brazil||MODIS||meteorological data, socioeconomic, sanitation, RS, regression|||
|Brazil||-||kernel, GPS, spatial distribution|||
|Brazil||MODIS, SRTM||social and environmental data, sanitation, biological, RS, NDVI, temperature, regression, kriging||[1, 74]|
|Brazil||MODIS||decision tree, environmental data, RS|||
|East Africa||-||Bayesian geostatistics, logistic regression, Markov chain Monte Carlo simulation|||
|Africa||MODIS||Climate change, spatial distribution, temperature, precipitation, MaxEnt, soil|||
|Ethiopia, Kenya||NOAA (AVHRR)||geostatistics, LST, NDVI, elevation, environmental data, LQAS, LpCP|||
|Brazil||-||GPS, GIS, spatial distribution|||
2. Material and methods
The study area includes 4,846 small communities (called localities) in the entire State of Minas Gerais, Brazil. The dependent variable is the schistosomiasis positivity index (
2.2. Schistosomiasis positivity index
Schistosomiasis positivity index (
2.3. Intermediate hosts
Information about the existence of
The distribution of Biomphalaria snails used for this study was defined as:
The spatial distribution of the schistosomiasis
2.4. Environmental data
Twenty-eight environmental variables were obtained from remote sensing and meteorological sources.
The remote sensing variables were derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) and from the Shuttle Radar Topography Mission (SRTM).
The MODIS variables were collected in two seasons: summer (17 January to 1 February 2002) and winter (28 July to 12 August 2002). The MODIS data comprised the blue, red, near-infrared and middle-infrared bands, as well as the vegetation indices (NDVI and EVI).
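For readers unfamiliar with the two vegetation indices, the sketch below computes them from surface reflectance using the commonly published formulas. The EVI coefficients are the standard MODIS values; the function names and the sample reflectances are illustrative assumptions and are not taken from this study.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy=1.0):
    """Enhanced Vegetation Index with the standard MODIS coefficients."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy)


# Hypothetical reflectances for a vegetated pixel (values in the 0-1 range).
example_ndvi = ndvi(nir=0.5, red=0.1)
example_evi = evi(nir=0.5, red=0.1, blue=0.05)
```

Both indices increase with green vegetation cover; EVI additionally uses the blue band to reduce atmospheric and soil-background effects.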
The Linear Spectral Mixture Model (LSMM) is an image processing algorithm that generates fraction images with the proportion of each component (vegetation, soil, and shade) inside the pixel, estimated by minimizing the sum of squared errors. In this work, the so-called vegetation, soil, and shade fraction images were generated from the MODIS data, and the estimated values of the spectral reflectance components were also used as inputs to the regression models.
Other variables obtained from SRTM were also used in this study: the digital elevation model (
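The least-squares idea behind the LSMM can be sketched as follows. This is a minimal, unconstrained illustration and not the procedure used to produce the fraction images in this study: operational implementations usually also constrain the fractions (for example, to be non-negative and to sum to one). The endmember spectra below are hypothetical numbers chosen only to make the example run.

```python
import numpy as np


def unmix_pixel(reflectance, endmembers):
    """Estimate component fractions for one pixel by ordinary least squares.

    reflectance: array of shape (bands,)
    endmembers:  array of shape (bands, components), one spectrum per column
    """
    fractions, *_ = np.linalg.lstsq(endmembers, reflectance, rcond=None)
    return fractions


# Hypothetical endmember spectra; rows are bands (blue, red, NIR, MIR),
# columns are components (vegetation, soil, shade).
endmembers = np.array([
    [0.03, 0.10, 0.01],
    [0.04, 0.20, 0.01],
    [0.50, 0.30, 0.02],
    [0.20, 0.40, 0.01],
])

# Build a synthetic mixed pixel from known fractions, then recover them.
true_fractions = np.array([0.6, 0.3, 0.1])
pixel = endmembers @ true_fractions
estimated_fractions = unmix_pixel(pixel, endmembers)
```

Applying `unmix_pixel` to every pixel of an image yields one fraction image per component.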
2.5. Socioeconomic data
Socioeconomic variables from the Brazilian Institute of Geography and Statistics (IBGE) census for the year 2000 were also used as explanatory variables. The variables used in this work relate to water quality (percentage of households with access to the general water supply network, with access to water through wells or springs, and with other forms of access to water) and to sanitary conditions (percentage of households with a bathroom connected to rivers or lakes, to a ditch, to rudimentary sewage, to septic sewage, to a general network, or to another type of sewerage, and percentage of households with and without a bathroom or toilet).
Indicator kriging and multiple linear regression were employed to estimate the presence of the intermediate host and the schistosomiasis disease, respectively.
2.7. Indicator Kriging
Since information about existence of
The categorical attributes (classes) used for this study were defined as:
The snail attributes (class of species and localization) were distributed along the drainage network of 15 River Basins (Buranhém, Doce, Grande, Itabapoana, Itanhém, Itapemirim, Jequitinhonha, Jucuruçu, Mucuri, Paraíba do Sul, Paranaíba, Pardo, Piracicaba/Jaguari, São Francisco and São Mateus), according to the methodology used by .
In that work, however, indicator kriging was applied only at the municipality level, whereas in this study it was applied at the locality level.
Indicator kriging procedures were applied to obtain an approximation of the conditional distribution function of the random variables. Based on the estimated function, maps of the snail spatial distributions, with the corresponding uncertainties, were built for the entire state, together with a map of the estimated prevalence of schistosomiasis.
The indicator kriging result was used as a variable in multiple regression models.
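As a rough illustration of indicator kriging, the sketch below codes snail presence as a 0/1 indicator and applies ordinary kriging to it, so that the estimate can be read as an approximate probability of presence. The exponential variogram, its sill and range, the function names and the sample coordinates are all assumptions made for this example; the variogram models fitted in this study are not given in the text.

```python
import numpy as np


def exp_variogram(h, sill=1.0, range_=50.0):
    """Exponential semivariogram without nugget (assumed model)."""
    return sill * (1.0 - np.exp(-h / range_))


def indicator_kriging(coords, indicators, target, variogram=exp_variogram):
    """Ordinary kriging of 0/1 indicators at one target location.

    Returns the estimated probability (clipped to [0, 1]) and the weights.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(indicators, dtype=float)
    n = len(z)
    # Pairwise distances between sample locations.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Kriging system with a Lagrange multiplier forcing weights to sum to 1.
    system = np.ones((n + 1, n + 1))
    system[:n, :n] = variogram(dists)
    system[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = variogram(
        np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    )
    weights = np.linalg.solve(system, rhs)[:n]
    probability = float(np.clip(weights @ z, 0.0, 1.0))
    return probability, weights


# Hypothetical survey points: snail found at the two southern points only.
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
snail_present = [1, 1, 0, 0]

p_center, w_center = indicator_kriging(coords, snail_present, (5.0, 5.0))
p_at_sample, _ = indicator_kriging(coords, snail_present, (0.0, 0.0))
p_south, _ = indicator_kriging(coords, snail_present, (5.0, 0.0))
```

Evaluating the function on every node of a regular grid gives a probability map; the distance of each probability from 0 or 1 is one simple way to express the associated uncertainty.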
2.8. Multiple linear regressions
Multiple linear regression is a form of regression analysis in which the data are modeled by a function that is a linear combination of the model parameters and depends on one or more independent variables; the parameters are estimated by least squares.
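In generic notation (the symbols here are assumptions for illustration, not the notation of the original equations), the model and its least-squares estimate can be written as:

```latex
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
\qquad i = 1, \dots, n,
\qquad
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}
\Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
```

Here $y_i$ is the dependent variable for case $i$, $x_{ij}$ is the value of the $j$-th independent variable, $\beta_j$ are the model parameters, and $\varepsilon_i$ is the error term.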
The regression analysis was applied with the schistosomiasis
The data set was randomly divided into two sets: one with 852 cases (localities) for variable selection and model definition, and another with 738 cases for model validation.
Because of the large number of independent variables, several procedures were performed for variable selection. The relations between the dependent and the independent variables were analyzed in terms of correlation, multicollinearity, and possible transformations that better explain the dependent variable.
A logarithmic transformation for the dependent variable (denoted by
The analysis of the correlation matrix showed that some variables had non-significant correlations with
Since multicollinearity effects were detected among the remaining independent variables, variable selection techniques were used to choose a set of variables that better explains the dependent variable. Variable selection was performed by the R2 criterion using all possible regressions.
This selection technique consists of identifying a best subset with few variables whose coefficient of determination R2 is sufficiently close to that obtained when all variables are used in the model.
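The all-possible-regressions idea can be sketched as below: fit every subset of candidate variables, keep the best R2 for each subset size, and pick the smallest subset whose R2 is within a tolerance of the full model. The tolerance, the function names and the synthetic data are assumptions for illustration only; exhaustive search is only practical for a modest number of candidate variables.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R2 of an ordinary least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def best_subsets(X, y, names, max_size):
    """For each subset size, return (best R2, names of the variables)."""
    best = {}
    for size in range(1, max_size + 1):
        scored = [
            (r_squared(X[:, list(cols)], y), cols)
            for cols in combinations(range(X.shape[1]), size)
        ]
        r2, cols = max(scored)
        best[size] = (r2, [names[i] for i in cols])
    return best


def smallest_adequate_subset(best, full_r2, tolerance=0.01):
    """Smallest subset whose R2 is within `tolerance` of the full model."""
    for size in sorted(best):
        r2, chosen = best[size]
        if full_r2 - r2 <= tolerance:
            return chosen
    return None


# Synthetic data: only x0 and x2 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.1, size=200)
names = ["x0", "x1", "x2", "x3"]

best = best_subsets(X, y, names, max_size=3)
full_r2 = r_squared(X, y)
selected = smallest_adequate_subset(best, full_r2)
```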
Interaction effects were also analyzed to be included in the model. After performing the residual analysis, the chosen regression model was then validated. The final estimated regression function was computed using the entire data set (definition and validation), and it was applied to all localities to build a risk map for schistosomiasis positivity index.
The multiple regressions were developed based on two approaches: a global model (throughout the state) and a regional model (regionalization).
Regionalization is a classification procedure using the SKATER algorithm (Spatial ‘K’luster Analysis by Tree Edge Removal) applied to spatial objects with an areal representation (municipalities), which groups them into homogeneous contiguous regions.
Regionalization was applied in Minas Gerais to divide the state into four homogeneous regions. The choice of the number of regions was based on the spatial distribution of localities (Figure 1b) in order to achieve an adequate number of localities in each region.
The regional model was developed by fitting a separate regression model in each of the four regions obtained by applying the SKATER algorithm to environmental variables.
Model validation was performed using the Root Mean Square Error (RMSE) and the Mean Squared Prediction Error (MSPR), given by .
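The core of SKATER can be sketched in a few steps: build a minimum spanning tree over the contiguity graph, with edge costs given by attribute dissimilarity, then repeatedly remove the tree edge that most reduces the within-region sum of squared deviations until the desired number of regions is reached. The code below is a simplified, brute-force sketch under those assumptions; it omits refinements of the published algorithm (such as minimum region size), and the area identifiers, attributes and adjacency are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def minimum_spanning_tree(attrs, neighbors):
    """Prim's algorithm on the contiguity graph (assumed connected)."""
    start = next(iter(attrs))
    in_tree, edges = {start}, []
    while len(in_tree) < len(attrs):
        _, u, v = min(
            (sq_dist(attrs[u], attrs[v]), u, v)
            for u in in_tree
            for v in neighbors[u]
            if v not in in_tree
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges


def components(nodes, edges):
    """Connected components of the forest defined by `edges`."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, groups = set(), []
    for node in nodes:
        if node in seen:
            continue
        seen.add(node)
        stack, group = [node], []
        while stack:
            current = stack.pop()
            group.append(current)
            for nb in adjacency[current] - seen:
                seen.add(nb)
                stack.append(nb)
        groups.append(group)
    return groups


def within_ssd(groups, attrs):
    """Total sum of squared deviations from each region's mean vector."""
    total = 0.0
    for group in groups:
        dim = len(attrs[group[0]])
        mean = [sum(attrs[n][d] for n in group) / len(group) for d in range(dim)]
        total += sum(sq_dist(attrs[n], mean) for n in group)
    return total


def skater(attrs, neighbors, k):
    """Partition areas into k contiguous regions by pruning the tree."""
    nodes = list(attrs)
    edges = minimum_spanning_tree(attrs, neighbors)
    for _ in range(k - 1):
        best_edge = min(
            edges,
            key=lambda e: within_ssd(
                components(nodes, [x for x in edges if x != e]), attrs
            ),
        )
        edges.remove(best_edge)
    return components(nodes, edges)


# Hypothetical chain of six areas with one environmental attribute each.
attrs = {0: [0.0], 1: [0.1], 2: [0.2], 3: [5.0], 4: [5.1], 5: [5.2]}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
regions = skater(attrs, neighbors, k=2)
```

Because only tree edges are removed, every resulting region is spatially contiguous by construction.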
where represent, respectively, the observed and predicted positivity index in the
The RMSE measures the variation of the observed values around the estimated values. Ideally, the values of RMSE are close to zero. The MSPR is computed the same way as the RMSE, but using validation samples.
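Since the original equation is not reproduced here, the sketch below assumes the standard definition of RMSE: the square root of the mean squared difference between observed and predicted values. Following the description above, the MSPR is obtained by applying the same function to the validation samples. The sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical positivity indices (%) for model-definition localities.
fit_observed = [10.0, 20.0, 5.0]
fit_predicted = [12.0, 18.0, 5.0]
rmse_fit = rmse(fit_observed, fit_predicted)

# The same computation on held-out validation localities gives the MSPR.
val_observed = [0.0, 0.0]
val_predicted = [3.0, 4.0]
mspr = rmse(val_observed, val_predicted)
```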
The final models were applied in all 4,846 localities to estimate the positivity index.
2.9. Simple average interpolator
The simple average interpolator (SAI) algorithm of the software SPRING  was used to estimate the value of
where is the estimated positivity index of the 8 neighbors of the point (
The file generated by interpolation was a grid with spatial resolution of 1 km. The purpose of using this tool was to determine which of the mesoregions presented estimated values above 15% (class with high positivity index).
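A simple average interpolator of this kind can be sketched as follows: each grid node receives the unweighted mean of the values at its 8 nearest sample points, and nodes above 15% are flagged as high. This is an illustrative reading of the description above and not the SPRING implementation; the function names and sample data are assumptions.

```python
import math


def simple_average(samples, x, y, k=8):
    """Mean value of the k sample points nearest to (x, y).

    samples: list of (x, y, value) tuples.
    """
    nearest = sorted(
        samples, key=lambda s: math.hypot(s[0] - x, s[1] - y)
    )[:k]
    return sum(s[2] for s in nearest) / len(nearest)


def interpolate_grid(samples, x_coords, y_coords, k=8):
    """Regular grid (rows follow y_coords) of simple-average estimates."""
    return [
        [simple_average(samples, x, y, k) for x in x_coords]
        for y in y_coords
    ]


# Hypothetical localities: eight close together with a 20% index,
# two far away with 0%.
samples = [(float(i), 0.0, 20.0) for i in range(8)]
samples += [(100.0, 100.0, 0.0), (200.0, 200.0, 0.0)]

grid = interpolate_grid(samples, x_coords=[3.5], y_coords=[0.0])
high_class = [[value > 15.0 for value in row] for row in grid]
```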
3. Results and discussion
The GeoSchisto Database (http://www.dpi.inpe.br/geoschisto/) was created containing all variables used in this study.
The indicator kriging result was a regular grid of 250 x 250 meters with the estimate of
3.1. Global model
The five variables selected were: presence or absence of the
The final model, with R2 = 0.18, was:
Fig. 3a shows the estimated
The precipitation, minimum temperature, EVI and sanitation were positively correlated with
The result of this model has the same variables (
3.2. Regional model
Minas Gerais State was divided into four regions using the SKATER algorithm. Table 2 presents the number of localities in each region used for model generation and for model validation. The regionalization is shown in Figure 4a.
|Region||Model Generation||Model Validation||Total|
|Region 1 (R1)||104||66||170|
|Region 2 (R2)||428||262||690|
|Region 3 (R3)||220||338||558|
|Region 4 (R4)||100||72||172|
Regression models were developed for each of the four regions with the same 94 variables used in the global model, and the same selection procedure. Different numbers of variables were selected in each region to determine the best regression model.
The final models generated for each region (Fig. 4c) and their R2 were:
Figure 4a shows the estimated values of
The regional model for Region 1 (R1) reflects the effect of sanitation (households with forms of access to water other than tap water, wells or springs) and the influence of weather (summer precipitation and temperature). Region 1 achieved an R2 value of 0.35. A model previously obtained for the same Region 1 also includes the same sanitation variable (percentage of homes with another type of access to water). A relationship between temperature and the disease has also been reported [55, 74].
The models for Regions 2 and 3 (R2 and R3) show the presence of
The model for Region 4 (R4) shows that
In all models the presence of
 also showed that the distribution of schistosomiasis in Bahia, at municipalities level, is related to the vegetation index (
3.3. Simple Averages Interpolator (SAI)
Table 3 presents the Root Mean Square Error (RMSE) and the Mean Squared Prediction Error (MSPR) for the global and regional models, for each region. The table shows that the mean square error decreased from 10.739 to 9.979 when separate models were used for each region. The RMSE of the regional model was also smaller than that of the global model in all four regions, highlighting the importance of using different equations and different variables for each region. Since the regional model can be considered the better model, the Simple Averages Interpolator (SAI) was applied using the known positivity index of the 1,590 localities (Fig. 5a) and using the regression-estimated positivity index of all 4,842 localities (Fig. 5b). The objective of applying SAI to all estimated
Figure 5a shows the presence of clusters in six mesoregions (Norte de Minas, Jequitinhonha, Vale do Mucuri, Vale do Rio Doce, Metropolitana de Belo Horizonte and Zona da Mata) with the highest
Thus, the Norte de Minas, Jequitinhonha, Vale do Mucuri, Vale do Rio Doce, Metropolitana de Belo Horizonte and Zona da Mata mesoregions are endemic areas.
The Sul/Sudoeste de Minas and Triângulo Mineiro/Alto Parnaíba mesoregions are not endemic areas, but have a schistosomiasis focus (Itajubá municipality in the Sul mesoregion). The Sul/Sudoeste de Minas mesoregion has 146 municipalities, about 20% of the municipalities in Minas Gerais State, and is a non-endemic area for schistosomiasis. Because of the high concentration of cities in an area of 49,523.893 km2 (less than 10% of the area of Minas Gerais State) and its strongly agricultural economy, it is a region with a high risk of schistosomiasis transmission. Therefore, it would be worthwhile to carry out a detailed study in the Sul mesoregion to determine the schistosomiasis
Also, it would be interesting to keep surveillance in the municipalities of the Triângulo Mineiro/Alto Parnaíba mesoregion that presented
4. Conclusions and future work
This study shows the importance of a joint use of GIS and RS to study the risk of schistosomiasis. Moreover, it can be concluded that the combined use of GIS and statistical techniques allowed the estimation of schistosomiasis
The results of the regression models show that regionalization improves the estimation of the disease in Minas Gerais. Based on this model, a schistosomiasis risk map was built for Minas Gerais. Previous studies also obtained a better model with regionalization when estimating schistosomiasis at the municipality level.
The Simple Averages Interpolator is a technique that may indicate possible locations for the transmission and surveillance of schistosomiasis.
We recommend using GPS in field surveys and applying this methodology with images of higher spatial resolution (10-30 m) in other states for validation. We also recommend estimating schistosomiasis over smaller areas (municipality or mesoregion).
The methodology used in this study can support schistosomiasis control in areas where the disease occurs, and it can also guide preventive measures against disease transmission.
The next step will be to use PCE data by locality to study other diseases, such as ascariasis, hookworm and trichuriasis, using data from CBERS and/or Landsat and new methodologies (for example, Geographically Weighted Regression and Generalized Additive Models).
The authors would like to acknowledge the support of Sandra da Costa Drummond (Fundação Nacional de Saúde) and the support of CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) (grants # 300679/2011-4, 384571/2010-7, 302966/2009-9, 308253/2008-6).