Spatial Modeling in Epidemiology

María Guzmán Martínez; Eduardo Pérez-Castro; Ramón Reyes-Carreto; Rocio Acosta-Pech

doi:10.5772/intechopen.104693

Abstract

The objective of this chapter is to present the methodology of some of the models used in epidemiology to study, understand, model and predict diseases (infectious and non-infectious) occurring in a given region. These models, which belong to the area of geostatistics, are usually composed of a fixed part and a random part. The fixed part includes the explanatory variables of the model, and the random part includes, in addition to the error term, a random term that generally has a multivariate Gaussian distribution. This random effect describes the spatial correlation (or covariance) structure of the data. In this way, the spatial variability of the data in the region of interest is accounted for instead of being absorbed into the model error term. The chapter begins by introducing Gaussian processes and then looks at their inclusion in generalized linear spatial models, in spatial survival analysis and, finally, in the generalized extreme value distribution for spatial data. The review also mentions some of the main packages available in the R statistical software that help implement these spatial models.

Keywords

Geostatistics
Gaussian process
spatial GLM
spatial survival analysis
spatial extremes

Author Information


María Guzmán Martínez*
- Universidad Autónoma de Guerrero, México
Eduardo Pérez-Castro
- Universidad Autónoma de Guerrero, México
Ramón Reyes-Carreto
- Universidad Autónoma de Guerrero, México
Rocio Acosta-Pech
- Colegio de Postgraduados, México

*Address all correspondence to: manguzgm@gmail.com

1. Introduction

The term spatial statistics is used to describe a wide range of statistical models and methods for the analysis of geo-referenced data [1]. Its use has grown rapidly in many fields of science, such as biology, image processing, environmental and earth sciences, ecology, epidemiology, agronomy and forestry [2]. In epidemiology, spatial statistics are used to study the occurrence of health-disease events or deaths in a region of interest. Several public health problems are now known to exhibit spatial dependence (spatial autocorrelation, spatial variability), and these problems are sometimes related to climatic factors, which are generally spatially continuous, or to factors specific to the study region. Using classical statistical techniques to model spatial data generally leads to an overestimation of model parameters [1]. Although such models may still be informative, they lack the structure needed to capture the spatial variability of the data, so this valuable information is sent to the model error and cannot be used to explain the nature of the phenomenon under study.

Recent studies have shown that spatial models can help identify spatial patterns in infectious and non-infectious diseases. These models also help determine the factors that favor these diseases, such as sociodemographic and environmental factors, generate maps to visualize the distribution of morbidity or mortality, and identify critical points in the spatial distribution [3, 4].

Generalized linear spatial models (GLSM), a particular class of multilevel or hierarchical models, have been used to study certain diseases (infectious and non-infectious). GLSM parameters can be estimated under the frequentist or the Bayesian approach [1]; some examples follow. A spatial Poisson regression model, with parameters estimated under the frequentist approach, was used to study esophageal cancer incidence rates [5] and the sociodemographic risk factors for diabetes [6]. Under the Bayesian approach, these models have been used to study the relationship between visceral leishmaniasis incidence rates and climatological variables [7], and to identify risk factors associated with nontuberculous mycobacterial infections [8]. Spatial binomial regression models, under the Bayesian approach, have been used to describe patterns of occurrence of dengue and chikungunya [9] and of filariasis [10]. Under the classical approach, spatial binomial regression models have been used to investigate environmental and sociodemographic factors associated with leptospirosis [11] and to study risk factors associated with HIV infection among drug users [12].

Survival analysis under the spatial approach has also received considerable attention in recent years, because geographic location can play a relevant role in predicting disease survival [13]. Frailty models (spatial survival models) are an option for analyzing heterogeneity in the data that cannot be explained by the covariates of a classical survival model. In spatial survival models, in addition to covariates, a random effect known as frailty is added, which modifies the hazard function of an individual or of spatially correlated individuals [14]. This random effect, which is generally assigned a multivariate normal distribution, plays an important role in modeling survival times, since it can account for differences in socioeconomic level, access to medical care, population density and weather conditions, among others. Spatial survival models have been applied to, for example, recovery time in patients with COVID-19 [15], hospitalization time in dengue patients [16], HIV/AIDS survival [17] and breast cancer survival [18]. In all these works, the model parameters were estimated under the Bayesian approach.

Extreme events in public health (for example, hospital saturation) are generally analyzed through measures of central tendency or time series. However, these approaches are not the most appropriate for understanding extreme (unusual) events, which, when they occur, strongly impact the health care network and often collapse the system [19]. Extreme value theory (EVT) studies the probability of occurrence of extreme events (values) of a phenomenon of interest over time; in one common approach, these are the values that exceed a high threshold. Although applications of EVT in public health are scarce, some exist: EVT was used to predict extreme events of annual seasonal influenza mortality and the number of emergency department visits in a network of hospitals [20], and to model elevated cholesterol levels with the peaks-over-threshold model [21]. In both cases, the parameters were estimated under the frequentist approach. Given the advantages of spatial models, it would be useful to study extreme events in the health sector in space, for which a methodology already exists: spatial modeling of extreme values [22].

The objective of this work is to provide a general review of the theoretical framework of the spatial statistical models developed in geostatistics that have been used in epidemiology to analyze, model and predict phenomena of interest. Some of the packages available in the statistical software R [23] to carry out these spatial analyses are also mentioned.

2. Gaussian processes

A stochastic process $\{W(t), t \in T\}$ is a collection of random variables; that is, for each $t \in T$, $W(t)$ is a random variable [24]. If the stochastic process is indexed by a coordinate space $s \in A \subset \mathbb{R}^d$, it is called a random field [25]. A realization of the random field $\{W(s), s \in A\}$ is given by $W(s_1) = y(s_1), \dots, W(s_n) = y(s_n)$. Generally, the sample $y(s_1), \dots, y(s_n)$ is used to learn about the characteristics of the process $W$ at $s_i$, $i = 1, \dots, n$, and this information is then used to make inference about the process $W(s)$ over all of $A \subset \mathbb{R}^d$, a convex set in which $s$ varies continuously. The geo-referenced data $y(s_1), \dots, y(s_n)$ are often referred to as geocoded, geostatistical or point-referenced data. The study of this type of data is known as geostatistics, the part of spatial statistics that studies phenomena with continuous variation over a convex region $A$ of space [26].

A process $W$ is second-order stationary if it has finite variance, constant mean and a covariance function that depends only on the separation $s - s'$ between locations. Second-order stationarity implies intrinsic stationarity; that is, second-order stationarity is the stronger of the two conditions. Weak stationarity and second-order stationarity are equivalent [27]. A Gaussian process (field) is defined as follows.

Definition 1. A stochastic process $\{W(s): s \in A \subset \mathbb{R}^2\}$, where $s$ varies continuously over a fixed subset $A$ of $\mathbb{R}^2$, is a Gaussian process if, for any collection of locations $s_1, \dots, s_n$ with $s_i \in A$, the joint distribution of $(W(s_1), \dots, W(s_n))$ is multivariate Gaussian [1].

A stationary Gaussian process is defined below.

Definition 2. A Gaussian process $\{W(s): s \in A \subset \mathbb{R}^2\}$ is stationary if, for all $s \in A$,

$$E[W(s)] = 0, \tag{E1}$$

$$\operatorname{Var}[W(s)] = \sigma^2, \tag{E2}$$

and its correlation function depends only on the distance, i.e.

$$\operatorname{Corr}[W(s), W(s')] = \rho(h), \tag{E3}$$

where $h = \|s - s'\|$ is the Euclidean distance between $s$ and $s'$.

That is, the mean and variance of $W(s)$ are constant and its correlation function depends only on the distance, which is written as

$$W \sim N(0, \sigma^2 \rho(h)). \tag{E4}$$

Given $W = (W_1, \dots, W_n)'$, where $W_i = W(s_i)$, the distribution of $W$ is multivariate normal (NM), i.e.

$$W \sim NM(0, \sigma^2 R), \tag{E5}$$

where the $(i, j)$ element of $R$ is $R_{ij} = \operatorname{Corr}[W(s_i), W(s_j)] = \rho(h_{ij})$, with $h_{ij} = \|s_i - s_j\|$ the Euclidean distance between $s_i$ and $s_j$. Note that the covariance matrix of the Gaussian process is $\operatorname{Cov}(W) = \sigma^2 R$.
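As a minimal Python sketch of (E5) (the chapter itself works in R; this is an illustration only), the correlation matrix $R$ can be built from pairwise Euclidean distances and a draw of $W$ obtained from a Cholesky factor of $\sigma^2 R$. The exponential correlation from Table 1, the simulated coordinates and the values $\sigma^2 = 2$, $\phi = 1.5$ are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import numpy as np


def exponential_corr(h, phi):
    """Exponential correlation rho(h; phi) = exp(-h / phi), one family from Table 1."""
    return np.exp(-h / phi)


def distance_matrix(coords):
    """Matrix of Euclidean distances h_ij = ||s_i - s_j|| between all pairs of sites."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def simulate_gaussian_process(coords, sigma2, phi, rng, corr=exponential_corr, jitter=1e-10):
    """Draw W = (W(s_1), ..., W(s_n))' ~ NM(0, sigma2 * R), with R_ij = corr(h_ij, phi)."""
    n = len(coords)
    cov = sigma2 * corr(distance_matrix(coords), phi) + jitter * np.eye(n)  # jitter aids stability
    L = np.linalg.cholesky(cov)          # cov = L L'
    return L @ rng.standard_normal(n)    # L z with z ~ N(0, I) has covariance L L'


rng = np.random.default_rng(42)
coords = rng.uniform(0.0, 10.0, size=(50, 2))   # 50 hypothetical sites in a 10 x 10 square
R = exponential_corr(distance_matrix(coords), phi=1.5)
W = simulate_gaussian_process(coords, sigma2=2.0, phi=1.5, rng=rng)
```

Any other correlation function can be passed through the `corr` argument, provided it yields a positive definite $R$ for the chosen locations.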

In this way, the correlation structure of a stationary Gaussian process can be studied through the function $\rho(h)$. Several parametric forms of this function are shown in Table 1. In these correlation functions, $\phi > 0$ is a range parameter controlling how fast the correlation decays with distance; $h = \|s - s'\| \ge 0$ is the Euclidean distance between $s$ and $s'$; and $\Gamma(\cdot)$ denotes the gamma function. The shape parameter $\kappa > 0$ determines the analytic smoothness of the underlying process $W$ [1]. $J_\kappa(\cdot)$ is the Bessel function of the first kind of order $\kappa$, and $K_\kappa(\cdot)$ is the modified Bessel function of order $\kappa$; it is called the modified Bessel function of the third kind in the spatial extremes literature [28] and of the second kind in the literature on spatial survival analysis and generalized linear models [29], but both names refer to the same function. In the powered exponential correlation function $0 < \kappa \le 2$, and in the Bessel correlation function $\kappa \ge 0$.

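Family	Correlation function
Exponential	$\rho(h; \phi) = \exp(-h/\phi)$
Gaussian	$\rho(h; \phi) = \exp(-h^2/\phi^2)$
Spherical	$\rho(h; \phi) = 1 - 1.5(h/\phi) + 0.5(h/\phi)^3$ for $h < \phi$; $0$ otherwise
Circular	$\rho(h; \phi) = 1 - \frac{2}{\pi}\left(a\sqrt{1 - a^2} + \sin^{-1} a\right)$, with $a = h/\phi$, for $h < \phi$; $0$ otherwise
Cubic	$\rho(h; \phi) = 1 - 7(h/\phi)^2 + \frac{35}{4}(h/\phi)^3 - \frac{7}{2}(h/\phi)^5 + \frac{3}{4}(h/\phi)^7$ for $h < \phi$; $0$ otherwise
Wave	$\rho(h; \phi) = \frac{\phi}{h}\sin(h/\phi)$
Matérn	$\rho(h; \phi, \kappa) = \frac{1}{2^{\kappa - 1}\Gamma(\kappa)}(h/\phi)^\kappa K_\kappa(h/\phi)$
Powered exponential	$\rho(h; \phi, \kappa) = \exp\{-(h/\phi)^\kappa\}$
Cauchy	$\rho(h; \phi, \kappa) = \{1 + (h/\phi)^2\}^{-\kappa}$
Stable	$\rho(h; \phi, \kappa) = \exp\{-(h/\phi)^\kappa\}$
Bessel	$\rho(h; \phi, \kappa) = (2\phi/h)^\kappa \Gamma(\kappa + 1) J_\kappa(h/\phi)$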

Table 1.

Models for the spatial correlation structure of a spatial process.
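Several of the families in Table 1 take only a few lines to write down. The Python sketch below (function names are hypothetical) follows the parametrization of the table; software packages may scale $\phi$ differently, so parameter values should not be carried across packages without checking their documentation. The general Matérn and Bessel families need Bessel functions (available, for instance, as scipy.special.kv and scipy.special.jv, not used here), so only the Matérn case $\kappa = 3/2$ is included, which under this parametrization reduces to $(1 + h/\phi)e^{-h/\phi}$.

```python
import numpy as np


def exponential(h, phi):
    return np.exp(-h / phi)


def gaussian(h, phi):
    return np.exp(-(h / phi) ** 2)


def spherical(h, phi):
    a = np.minimum(h / phi, 1.0)          # rho(h) = 0 for h >= phi
    return 1.0 - 1.5 * a + 0.5 * a ** 3


def cubic(h, phi):
    a = np.minimum(h / phi, 1.0)          # rho(h) = 0 for h >= phi
    return 1.0 - 7.0 * a**2 + 8.75 * a**3 - 3.5 * a**5 + 0.75 * a**7


def wave(h, phi):
    # (phi / h) sin(h / phi), written with np.sinc so that rho(0) = 1 without dividing by zero
    return np.sinc(h / (np.pi * phi))


def powered_exponential(h, phi, kappa):
    if not 0.0 < kappa <= 2.0:
        raise ValueError("powered exponential requires 0 < kappa <= 2")
    return np.exp(-(h / phi) ** kappa)


def cauchy(h, phi, kappa):
    return (1.0 + (h / phi) ** 2) ** (-kappa)


def matern_3_2(h, phi):
    # Matérn with kappa = 3/2 in the parametrization of Table 1
    a = h / phi
    return (1.0 + a) * np.exp(-a)


h = np.linspace(0.0, 4.0, 5)   # distances 0, 1, 2, 3, 4
print(spherical(h, 2.0))        # zero from h = phi = 2 onwards
```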

3. Gaussian spatial model

Generally, the process $\{W(s): s \in A \subset \mathbb{R}^2\}$ is observed only through a noisy version, that is, a set of observations $y(s_1), \dots, y(s_n)$ of the random variables $Y(s_1), \dots, Y(s_n)$, $s_i \in A$. In this way, $Y(s)$ is a measurement process of $W(s)$, $s \in A$ [1, 26].

The Gaussian geostatistical model, in the absence of explanatory variables, is given by

$$Y(s) = \mu + W(s) + Z(s), \quad s \in A, \tag{E6}$$

where $\mu$ is a constant mean effect, $W(s)$ is a stationary Gaussian process as in Definition 2, and $Z(s)$ is the error term of the model, with $Z(s) \sim N(0, \tau^2)$; $\tau^2$ is the nugget effect variance. $Z(s)$ is interpreted as measurement error, micro-scale variation or a non-identifiable combination of the two [22, 26].

Thus, for a realization of a stationary Gaussian spatial process, $Y = (Y(s_1), \dots, Y(s_n))'$ with $s_i \in A$, $i = 1, \dots, n$, and

$$Y(s_i) = \mu + W(s_i) + Z(s_i), \tag{E7}$$

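where

$W(s_i) \sim N(0, \sigma^2)$;
the $Z(s_i)$ are mutually independent and identically distributed, $Z(s_i) \sim N(0, \tau^2)$, $i = 1, \dots, n$;
the $Z(s_i)$ are independent of the process $W(\cdot)$ [26];
conditional on $W(\cdot)$, the random variables $Y(s_i)$, $i = 1, \dots, n$, are mutually independent with normal distribution

$$Y(s_i) \mid W(\cdot) \sim N(\mu + W(s_i), \tau^2). \tag{E8}$$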

The joint distribution of $Y$ is multivariate normal,

$$Y \sim NM(\mu \mathbf{1}, \sigma^2 R(\phi) + \tau^2 I), \tag{E9}$$

where

$\mu$ is the constant mean of $Y(s)$ and $\mathbf{1}$ is an $n \times 1$ vector of ones;
$\sigma^2$ is the variance of the process $W(\cdot)$;
$R(\phi)$ is the $n \times n$ correlation matrix whose elements are

$$R(\phi)_{ij} = \rho(h_{ij}; \phi), \tag{E10}$$

where $h_{ij} = \|s_i - s_j\|$ is the Euclidean distance between $s_i$ and $s_j$ in $A$, and $\phi$ is a spatial scale parameter;

$\tau^2$ is the variance of $Z$ and $I$ is the $n \times n$ identity matrix.
Note that the covariance matrix of $Y$ is $\operatorname{Cov}(Y) = \sigma^2 R(\phi) + \tau^2 I$.
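Reading the diagonal and off-diagonal entries of (E9) separately makes the role of the nugget effect explicit. Since $\rho(0; \phi) = 1$,

$$\operatorname{Var}[Y(s_i)] = \sigma^2 + \tau^2, \qquad \operatorname{Cov}[Y(s_i), Y(s_j)] = \sigma^2 \rho(h_{ij}; \phi), \quad i \neq j,$$

so that

$$\operatorname{Corr}[Y(s_i), Y(s_j)] = \frac{\sigma^2}{\sigma^2 + \tau^2}\, \rho(h_{ij}; \phi), \quad i \neq j.$$

For the correlation functions of Table 1, $\rho(h; \phi) \to 1$ as $h \to 0$, so two distinct observations taken very close together have correlation close to $\sigma^2/(\sigma^2 + \tau^2)$, which is strictly less than 1 whenever $\tau^2 > 0$.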

When $Y(s)$ can be explained by a set of covariates that also depend on location, $X(s) = (1, X_1(s), \dots, X_p(s))$, the model becomes

$$Y(s) = X(s)\beta + W(s) + Z(s), \quad s \in A, \tag{E11}$$

with

$$Y \sim NM(X\beta, \sigma^2 R(\phi) + \tau^2 I), \tag{E12}$$

where $X$ is the $n \times (p + 1)$ matrix whose $i$-th row is $X(s_i)$ and $\beta = (\beta_0, \dots, \beta_p)'$ is a vector of unknown regression parameters; in this case also $\operatorname{Cov}(Y) = \sigma^2 R(\phi) + \tau^2 I$. The unknown parameters of this model are $\beta$, $\sigma^2$, $\tau^2$ and $\phi$. The parameters of the models (E6) and (E11) can be estimated under the classical approach (maximum likelihood or restricted maximum likelihood) and under the Bayesian approach [1, 30]. Two of the most important issues in geostatistics are the modeling of the covariance structure of the spatial process and the choice of the interpolation method used to predict the process at non-sampled points in $A$. Regarding the latter, [31] compiled the most commonly used criteria for assessing the performance of spatial interpolation methods.
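The likelihood that ML estimation maximizes under (E12) can be evaluated directly. The Python sketch below (not the geoR implementation) computes the log-likelihood through a Cholesky factor, together with the generalized least squares (GLS) estimate of $\beta$ for fixed $(\sigma^2, \tau^2, \phi)$. A full ML fit would additionally maximize over $\sigma^2$, $\tau^2$ and $\phi$, and REML uses a modified likelihood that is not shown here. The exponential correlation, the simulated data and the single covariate are illustrative assumptions.

```python
import numpy as np


def exp_corr_matrix(coords, phi):
    """R(phi) with exponential correlation, R_ij = exp(-h_ij / phi)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.exp(-np.sqrt((diff ** 2).sum(axis=-1)) / phi)


def gls_beta(y, X, cov):
    """GLS estimate (X' C^{-1} X)^{-1} X' C^{-1} y for a fixed covariance matrix C."""
    Ci_X = np.linalg.solve(cov, X)
    Ci_y = np.linalg.solve(cov, y)
    return np.linalg.solve(X.T @ Ci_X, X.T @ Ci_y)


def gaussian_loglik(y, X, coords, beta, sigma2, tau2, phi):
    """Log-density of y under (E12): Y ~ NM(X beta, sigma2 R(phi) + tau2 I)."""
    n = len(y)
    cov = sigma2 * exp_corr_matrix(coords, phi) + tau2 * np.eye(n)
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, y - X @ beta)        # whitened residuals: z'z = r' C^{-1} r
    logdet = 2.0 * np.sum(np.log(np.diag(L)))   # log |C|
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)


# Hypothetical data: 40 sites, an intercept and one covariate (the first coordinate)
rng = np.random.default_rng(1)
n = 40
coords = rng.uniform(0.0, 10.0, size=(n, 2))
X = np.column_stack([np.ones(n), coords[:, 0]])
cov_true = 1.0 * exp_corr_matrix(coords, 2.0) + 0.2 * np.eye(n)
y = X @ np.array([1.0, 0.5]) + np.linalg.cholesky(cov_true) @ rng.standard_normal(n)

beta_hat = gls_beta(y, X, cov_true)
ll = gaussian_loglik(y, X, coords, beta_hat, sigma2=1.0, tau2=0.2, phi=2.0)
```

For fixed covariance parameters, the GLS estimate is the value of $\beta$ that maximizes this log-likelihood, which is why ML routines often profile $\beta$ out and search only over the covariance parameters.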

The geoR package [32] contains the function likfit, which estimates the parameters of a Gaussian process by maximum likelihood (ML) or restricted maximum likelihood (REML); likfit can fit the models (E6) and (E11).

The function krige.conv of the same package performs spatial prediction of a Gaussian process using simple kriging (SK), ordinary kriging (OK), kriging with external trend (KTE) and universal kriging (UK) [33]. The glmmfields package [34] fits Gaussian models under the Bayesian approach.
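To show what a kriging predictor computes, the sketch below implements simple kriging, in which $\mu$ and the covariance parameters are treated as known. The prediction of the signal $\mu + W(s_0)$ at a new site $s_0$ is $\mu + c_0'\Sigma^{-1}(y - \mu\mathbf{1})$, with prediction variance $\sigma^2 - c_0'\Sigma^{-1}c_0$, where $\Sigma = \sigma^2 R(\phi) + \tau^2 I$ and $c_0$ holds the covariances $\sigma^2\rho(\|s_0 - s_i\|; \phi)$. This is a Python illustration under an assumed exponential correlation and hypothetical data, not the code of the geoR functions; ordinary and universal kriging additionally estimate the mean.

```python
import numpy as np


def _dist(a, b):
    """Euclidean distances between the rows of a (n x 2) and b (m x 2)."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))


def simple_kriging(coords, y, new_coords, mu, sigma2, tau2, phi):
    """Simple kriging of the signal mu + W(s0), assuming rho(h; phi) = exp(-h / phi)."""
    Sigma = sigma2 * np.exp(-_dist(coords, coords) / phi) + tau2 * np.eye(len(y))
    c0 = sigma2 * np.exp(-_dist(coords, new_coords) / phi)  # n x m: Cov(Y(s_i), W(s0))
    weights = np.linalg.solve(Sigma, c0)                      # Sigma^{-1} c0, one column per s0
    pred = mu + weights.T @ (y - mu)                          # mu + c0' Sigma^{-1} (y - mu 1)
    var = sigma2 - np.sum(c0 * weights, axis=0)               # sigma2 - c0' Sigma^{-1} c0
    return pred, var


coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y = np.array([1.2, 0.8, 1.5, 0.3])
new = np.array([[0.5, 0.5], [50.0, 50.0]])   # one site among the data, one far away
pred, var = simple_kriging(coords, y, new, mu=1.0, sigma2=1.0, tau2=0.1, phi=1.0)
```

Far from all data the prediction falls back to $\mu$ and the variance to $\sigma^2$. To predict a new noisy observation $Y(s_0)$ rather than the signal, $\tau^2$ would be added to the variance.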

With the function glmmfields of the glmmfields package, the coefficients of the models (E6) and (E11) can be estimated under the Bayesian approach. The function reports the posterior median of the parameters with their 95% credible intervals, as well as the values of the Gelman and Rubin statistic [35]; values below 1.20 are taken to indicate convergence of the chains.
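The Gelman and Rubin statistic compares the variability between several chains with the variability within each chain. The sketch below computes the basic potential scale reduction factor $\hat R = \sqrt{\hat V / W}$ from $m$ chains of length $n$, where $W$ is the mean within-chain variance, $B$ is $n$ times the variance of the chain means, and $\hat V = \frac{n - 1}{n}W + \frac{1}{n}B$. Software such as glmmfields may report a refined variant (for example, one based on split chains), so the numbers need not match exactly; the simulated chains are hypothetical.

```python
import numpy as np


def gelman_rubin(chains):
    """Basic Gelman-Rubin potential scale reduction factor for an (m, n) array of m chains."""
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled estimate of the posterior variance
    return float(np.sqrt(var_hat / W))


rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(4, 2000))              # four chains exploring the same target
stuck = mixed + np.array([0.0, 0.0, 0.0, 3.0])[:, None]   # one chain stuck in another region
print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))
```

Values near 1 are consistent with convergence, while the stuck chain pushes $\hat R$ well above the 1.20 threshold mentioned in the text.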

4. Generalized linear spatial models

Generalized linear models (GLM) [36, 37] are very useful when the response variable does not follow a normal distribution, as with counts or proportions. The assumptions of a GLM are:

1. $Y_i$, $i = 1, \dots, n$, are mutually independent with expectations $\mu_i$.
2. The $\mu_i$ are specified by $g(\mu_i) = \eta_i$, where $g(\cdot)$ is a known link function.
3. The linear predictor is given by $\eta_i = x_i'\beta$, where $x_i$ is a vector of explanatory variables associated with the response $Y_i$, and $\beta$ is a vector of unknown parameters.
4. The $Y_i$ follow a common distributional family, indexed by their expectations $\mu_i$ and possibly by additional parameters common to all $n$ responses.

An important extension of this basic class of models is the generalized linear mixed model (GLMM) [38], in which $Y_1, \dots, Y_n$ are mutually independent conditional on the realized values of a set of latent random variables (random effects) $U_1, \dots, U_n$, and the conditional expectations satisfy $g(\mu_i) = U_i + x_i'\beta$. A generalized linear spatial model is a GLMM in which $U_1, \dots, U_n$ are derived from a spatial process. Diggle and Ribeiro (2007) [1] refer to these models as generalized linear geostatistical models (GLGM). Following Diggle et al. [39], the assumptions of the generalized linear spatial model are as follows:

$W(\cdot)$ is a stationary Gaussian process, $W \sim N(0, \sigma^2 \rho(h))$, as in (E4).
Conditional on $W$, the random variables $Y_i$, $i = 1, \dots, n$, are mutually independent, with distributions $f_i(y \mid W(s_i)) = f(y; M_i)$ specified by the conditional expectations $M_i = E[Y_i \mid W(s_i)]$.
$g(M_i) = x_i'\beta + W(s_i)$ for a known link function $g$ and explanatory variables $x_i = x(s_i)$.

Then $M_i = g^{-1}(x_i'\beta + W(s_i))$, where the linear predictor is $\eta_i = x_i'\beta + W(s_i)$.
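For the Poisson case with log link, $g(M_i) = \log M_i$, the model above can be simulated and its conditional log-likelihood $\log p(y \mid W)$ evaluated as in the Python sketch below. This conditional likelihood is the building block that MCMC or Monte Carlo likelihood methods combine with the Gaussian density of $W$; the sketch itself does not fit the model. The exponential correlation, the parameter values and the covariate are assumptions for illustration, and the function names are hypothetical.

```python
import math

import numpy as np


def simulate_poisson_glsm(coords, X, beta, sigma2, phi, rng):
    """Draw (y, W) from a Poisson GLSM with log link: log M_i = x_i' beta + W(s_i)."""
    n = len(coords)
    h = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    cov = sigma2 * np.exp(-h / phi) + 1e-10 * np.eye(n)   # exponential correlation (assumed)
    W = np.linalg.cholesky(cov) @ rng.standard_normal(n)  # W ~ NM(0, sigma2 R)
    M = np.exp(X @ beta + W)                               # M_i = g^{-1}(eta_i) with g = log
    return rng.poisson(M), W                               # Y_i | W independent Poisson(M_i)


def poisson_cond_loglik(y, X, beta, W):
    """log p(y | W) = sum_i [y_i eta_i - exp(eta_i) - log(y_i!)]."""
    eta = X @ beta + W
    return float(np.sum(y * eta - np.exp(eta)) - sum(math.lgamma(k + 1) for k in y))


rng = np.random.default_rng(3)
n = 30
coords = rng.uniform(0.0, 10.0, size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + hypothetical covariate
beta = np.array([1.0, 0.3])
y, W = simulate_poisson_glsm(coords, X, beta, sigma2=0.5, phi=2.0, rng=rng)
ll = poisson_cond_loglik(y, X, beta, W)
```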

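Building on Diggle et al. (1998) [39], Jing and De Oliveira (2015) [40] state the GLSM as follows:

$$Y_i \mid W_i \sim p(\cdot \mid \mu_i), \tag{E13}$$

where

$$W \sim NM(X\beta, \sigma^2 R), \tag{E14}$$

$R$ has the same form as in the Gaussian process (E5), with $R_{ij} = \rho(h_{ij}; \phi)$;

$Y(s_i)$, $i = 1, \dots, n$, are conditionally independent given $W$, with probability density or mass functions $p(\cdot \mid \mu_i)$;
$E[Y_i \mid W_i] = \mu_i$ and $g(\cdot)$ is a known one-to-one link function;
$X = (\mathbf{1}, x_1, \dots, x_p)$ is a known $n \times (p + 1)$ design matrix of full rank, where $\mathbf{1}$ is an $n \times 1$ vector of ones and $x_j = (x_j(s_1), \dots, x_j(s_n))'$, with $x_j(s_i)$ the value of the $j$-th covariate at the $i$-th sampling location; and $\beta = (\beta_0, \beta_1, \dots, \beta_p)'$ is the vector of unknown regression parameters.

Since $g$ is the link function, $g(\mu_i) = \eta_i$ and $\mu_i = g^{-1}(\eta_i)$, $i = 1, \dots, n$. Here the linear predictor is $\eta_i = W_i$, so $\mu_i = g^{-1}(W_i)$. The unknown parameters of the GLSM are $\beta$, $\sigma^2$ and $\phi$.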

The two most widely used GLSMs for spatial count data are the Poisson and binomial spatial models [39, 41].
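In the parametrization of Jing and De Oliveira, the latent vector $W$ already contains the mean $X\beta$, and for a binomial GLSM with logit link $\mu_i = 1/(1 + e^{-W_i})$. The Python sketch below (hypothetical function names and data) evaluates the conditional log-likelihood $\log p(y \mid W)$ when $Y_i \mid W_i \sim \text{Binomial}(N_i, \mu_i)$, where the number of trials $N_i$ at site $i$ (for example, people tested) is an assumption of the sketch.

```python
import math

import numpy as np


def inv_logit(eta):
    """Inverse of the logit link, mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + np.exp(-eta))


def binomial_cond_loglik(y, trials, W):
    """log p(y | W) when Y_i | W_i ~ Binomial(N_i, mu_i) and logit(mu_i) = W_i, as in (E13)-(E14)."""
    y = np.asarray(y)
    trials = np.asarray(trials)
    mu = inv_logit(np.asarray(W, dtype=float))
    log_choose = sum(math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
                     for k, N in zip(y.tolist(), trials.tolist()))
    return float(np.sum(y * np.log(mu) + (trials - y) * np.log1p(-mu)) + log_choose)


# Hypothetical example: 5 sites, N_i people tested, y_i positives, latent W_i on the logit scale
y = np.array([3, 0, 7, 2, 5])
trials = np.array([10, 8, 12, 9, 10])
W = np.array([-0.5, -2.0, 0.4, -1.0, 0.0])
ll = binomial_cond_loglik(y, trials, W)
```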

The geoCount package [40] implements the GLSM; its function runMCMC generates posterior samples of the Gaussian process and of the GLSM parameters, from which the parameter estimates and their credible intervals can be obtained.

In the geoRglm package [42, 43], the functions glsm.krige, pois.krige and binom.krige implement GLSMs with parameter estimation under the frequentist approach, while the functions krige.bayes, pois.krige.bayes and binom.krige.bayes estimate the parameters under the Bayesian approach. These functions report estimates of $\beta$, $\sigma^2$ and $\phi$.

The glmmfields package implements Gamma, Poisson, negative binomial, binomial and lognormal models through the function glmmfields [34], with parameters estimated under the Bayesian approach. The function reports the posterior median of each parameter with its 95% quantile-based credible interval, together with the Gelman and Rubin diagnostic values.

5. Spatial survival models

Survival analysis models are generally specified through their hazard function $h(t)$, whose intuitive interpretation is that, for small $\delta t$, $h(t)\,\delta t$ is approximately the conditional probability that a patient dies in the interval $(t, t + \delta t)$, given that they have survived until time $t$. The most widely used approach to modeling $h(t)$, at least in medical applications, is a semi-parametric formulation [44], in which the hazard for the $i$-th patient is modeled as

$$h(t_i) = h_0(t_i) \exp(x_i'\beta), \tag{E15}$$

where $x_i$ is a vector of explanatory variables for patient $i$ and $h_0(t)$ is an unspecified baseline hazard function. This is known as a proportional hazards (PH) model because, for any two patients $i$ and $j$, the baseline hazard cancels in the ratio $h_i(t)/h_j(t) = \exp\{(x_i - x_j)'\beta\}$, which therefore does not change over time [1].
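A quick numerical check of the proportional hazards property: whatever the baseline, the ratio $h_i(t)/h_j(t)$ equals $\exp\{(x_i - x_j)'\beta\}$ at every $t$. The Weibull baseline $h_0(t) = \lambda k t^{k - 1}$ and the covariate and coefficient values below are hypothetical choices for illustration.

```python
import numpy as np


def weibull_baseline_hazard(t, lam=0.1, k=1.5):
    """Hypothetical Weibull baseline hazard h0(t) = lam * k * t**(k - 1)."""
    return lam * k * t ** (k - 1)


def ph_hazard(t, x, beta):
    """Proportional hazards model (E15): h(t) = h0(t) exp(x' beta)."""
    return weibull_baseline_hazard(t) * np.exp(x @ beta)


beta = np.array([0.7, -0.3])
x_i = np.array([1.0, 2.0])
x_j = np.array([0.0, 1.0])
t = np.linspace(0.5, 10.0, 20)
ratio = ph_hazard(t, x_i, beta) / ph_hazard(t, x_j, beta)
```

Here $(x_i - x_j)'\beta = 0.7 - 0.3 = 0.4$, so every element of `ratio` equals $e^{0.4}$, regardless of $t$.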

Another key idea in survival analysis is frailty, which corresponds to the random-effects term. Time-to-event data are often grouped into strata, such as clinical sites or geographic regions. This gives rise to mixed models that include a random effect (the frailty) corresponding to a stratum's overall health status [30]. To illustrate, let $t_{ij}$ be the time to death or censoring for subject $j$ in stratum $i$, $j = 1, \dots, n_i$, $i = 1, \dots, m$, and let $x_{ij}$ be a vector of individual-specific covariates. Then

$$h(t_{ij}; x_{ij}) = h_0(t_{ij}) \exp(x_{ij}'\beta + W_i), \tag{E16}$$

where $W_i$ is the stratum-specific frailty term, designed to capture differences among strata. In the spatial setting, strata are typically locations denoted by $s_i$, $i = 1, \dots, m$, so $s_i$ denotes the location of the $i$-th stratum (and of its patients) and $W_i = W(s_i)$. It can be assumed that the $W_i$ are independent and identically distributed (iid),

$$W_i \sim N(0, \sigma^2). \tag{E17}$$

Alternatively, it can be assumed that the $W_i$ arise from a Gaussian process; that is, if $W = (W_1, \dots, W_m)'$, then

$$W \sim NM(0, \sigma^2 R(\phi)). \tag{E18}$$

Suppose, then, that subjects are observed at $m$ distinct spatial locations $s_1, \dots, s_m \in A$. Let $t_{ij}$ be the random event time of the $j$-th subject at $s_i$, assume that $t_{ij}$ lies in the interval $[a_{ij}, b_{ij}]$, $i = 1, \dots, m$, $j = 1, \dots, n_i$, and let $x_{ij}$ be the associated $p$-dimensional vector of covariates. In this setting, proportional hazards (PH) frailty models, accelerated failure time (AFT) frailty models and proportional odds (PO) frailty models are defined as follows.

PH frailty models extend the proportional hazards model, best known as the Cox model [44], which is widely used in survival analysis. In a PH frailty model, the hazard of an individual also depends on an unobserved random variable $W$, introduced as an additive frailty term $W_i$ in the exponent of the hazard function:

$$h(t_{ij}; x_{ij}) = h_0(t_{ij})\, e^{x_{ij}'\beta + W_i}. \tag{E19}$$

The corresponding survival function and density are

$$S(t_{ij}; x_{ij}) = S_0(t_{ij})^{e^{x_{ij}'\beta + W_i}}, \qquad f(t_{ij}; x_{ij}) = e^{x_{ij}'\beta + W_i}\, S_0(t_{ij})^{e^{x_{ij}'\beta + W_i} - 1} f_0(t_{ij}), \tag{E20}$$

where $S_0(\cdot)$, $f_0(\cdot)$ and $h_0(\cdot)$ are, respectively, the baseline survival function, baseline density and baseline hazard function, assumed to be common to all individuals in the study population.
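To check (E19) and (E20) numerically, the Python sketch below uses a hypothetical Weibull baseline, $S_0(t) = \exp(-\lambda t^k)$ with $\lambda = 0.1$ and $k = 1.5$, and a fixed value of $\eta = x_{ij}'\beta + W_i$. It returns $S$, $f$ and $h$, and the last lines compare $f$ with a finite-difference $-dS/dt$ and $h$ with $f/S$.

```python
import numpy as np

LAM, K = 0.1, 1.5   # hypothetical Weibull baseline: S0(t) = exp(-LAM * t**K)


def S0(t):
    return np.exp(-LAM * t ** K)


def h0(t):
    return LAM * K * t ** (K - 1)


def f0(t):
    return h0(t) * S0(t)   # baseline density f0 = h0 * S0


def ph_frailty(t, eta):
    """PH frailty model (E19)-(E20); eta = x_ij' beta + W_i. Returns (S, f, h)."""
    c = np.exp(eta)
    S = S0(t) ** c
    f = c * S0(t) ** (c - 1.0) * f0(t)
    h = c * h0(t)
    return S, f, h


t = np.linspace(0.5, 5.0, 10)
eta = 0.5   # e.g. x'beta = 0.2 plus a frailty W_i = 0.3
S, f, h = ph_frailty(t, eta)
eps = 1e-6
dS_dt = (ph_frailty(t + eps, eta)[0] - ph_frailty(t - eps, eta)[0]) / (2 * eps)
print(np.max(np.abs(f + dS_dt)), np.max(np.abs(h - f / S)))   # both close to 0
```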

The accelerated failure time frailty model extends the AFT model in the same way: the hazard of an individual also depends on an unobserved random variable $W$ [45, 46, 47]. Introducing an additive frailty term $W_i$ in the exponent, the hazard function becomes

$$h(t_{ij}; x_{ij}) = h_0\!\left(e^{x_{ij}'\beta + W_i} t_{ij}\right) e^{x_{ij}'\beta + W_i}. \tag{E21}$$

The survival function and density are

$$S(t_{ij}; x_{ij}) = S_0\!\left(e^{x_{ij}'\beta + W_i} t_{ij}\right), \qquad f(t_{ij}; x_{ij}) = e^{x_{ij}'\beta + W_i} f_0\!\left(e^{x_{ij}'\beta + W_i} t_{ij}\right), \tag{E22}$$

where $S_0(\cdot)$, $f_0(\cdot)$ and $h_0(\cdot)$ are, respectively, the baseline survival function, baseline density and baseline hazard function, common to all individuals in the study population.
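The same kind of check applies to (E21) and (E22): the AFT frailty term rescales time by $e^{\eta}$, with $\eta = x_{ij}'\beta + W_i$. The Python sketch below again assumes a hypothetical Weibull baseline with $\lambda = 0.1$ and $k = 1.5$.

```python
import numpy as np

LAM, K = 0.1, 1.5   # hypothetical Weibull baseline: S0(t) = exp(-LAM * t**K)


def S0(t):
    return np.exp(-LAM * t ** K)


def h0(t):
    return LAM * K * t ** (K - 1)


def f0(t):
    return h0(t) * S0(t)


def aft_frailty(t, eta):
    """AFT frailty model (E21)-(E22): time is rescaled by exp(eta), eta = x_ij' beta + W_i."""
    c = np.exp(eta)
    return S0(c * t), c * f0(c * t), c * h0(c * t)


t = np.linspace(0.5, 5.0, 10)
eta = 0.5
S, f, h = aft_frailty(t, eta)
eps = 1e-6
dS_dt = (aft_frailty(t + eps, eta)[0] - aft_frailty(t - eps, eta)[0]) / (2 * eps)
```

With a Weibull baseline, $S_0(e^{\eta} t) = S_0(t)^{e^{k\eta}}$, so this AFT model is also a PH model with coefficient $k\eta$; the Weibull is the usual example of a baseline that is closed under both formulations.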

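Finally, in the proportional odds frailty model the odds of an event by time $t$, $\{1 - S(t)\}/S(t)$, are $e^{x_{ij}'\beta + W_i}$ times the baseline odds $\{1 - S_0(t)\}/S_0(t)$; its hazard function is

$$h(t_{ij}; x_{ij}) = \frac{h_0(t_{ij})}{1 + \left(e^{-x_{ij}'\beta - W_i} - 1\right) S_0(t_{ij})}. \tag{E23}$$

The survival function and density are

$$S(t_{ij}; x_{ij}) = \frac{S_0(t_{ij})\, e^{-x_{ij}'\beta - W_i}}{1 + \left(e^{-x_{ij}'\beta - W_i} - 1\right) S_0(t_{ij})}, \qquad f(t_{ij}; x_{ij}) = \frac{f_0(t_{ij})\, e^{-x_{ij}'\beta - W_i}}{\left[1 + \left(e^{-x_{ij}'\beta - W_i} - 1\right) S_0(t_{ij})\right]^2}, \tag{E24}$$

where $S_0(\cdot)$, $f_0(\cdot)$ and $h_0(\cdot)$ are, respectively, the baseline survival function, baseline density and baseline hazard function, common to all individuals in the study population.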
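A Python sketch of (E23) and (E24), under the same hypothetical Weibull baseline ($\lambda = 0.1$, $k = 1.5$), can confirm that the three functions are mutually consistent and that the odds of an event are proportional to the baseline odds with factor $e^{\eta}$, $\eta = x_{ij}'\beta + W_i$.

```python
import numpy as np

LAM, K = 0.1, 1.5   # hypothetical Weibull baseline: S0(t) = exp(-LAM * t**K)


def S0(t):
    return np.exp(-LAM * t ** K)


def h0(t):
    return LAM * K * t ** (K - 1)


def f0(t):
    return h0(t) * S0(t)


def po_frailty(t, eta):
    """PO frailty model (E23)-(E24); eta = x_ij' beta + W_i. Returns (S, f, h)."""
    a = np.exp(-eta)
    denom = 1.0 + (a - 1.0) * S0(t)
    return a * S0(t) / denom, a * f0(t) / denom ** 2, h0(t) / denom


t = np.linspace(0.5, 5.0, 10)
eta = 0.5
S, f, h = po_frailty(t, eta)
odds_ratio = ((1.0 - S) / S) / ((1.0 - S0(t)) / S0(t))   # equals exp(eta) at every t
```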

Frailty models can handle left-, right- and interval-censored data. Among the R packages for spatial survival analysis is spBayesSurv [48]; its function survregbayes estimates the parameters of the PH, AFT and PO spatial models under the classical and Bayesian approaches, and reports the posterior mean and median of the regression coefficients and of the parameters of the covariance function of the Gaussian process, $\sigma^2$ and $\phi$, with their 95% credible intervals. The spBayesSurv package uses the powered exponential function (Table 1) to model the spatial correlation of the data.

Also in R, the spatsurv package [49] provides the function survspat, which fits parametric PH spatial survival models. This function reports the estimates and posterior medians of the parameters $\beta$, $\sigma^2$ and $\phi$ with their credible intervals.

6. Spatial generalized extreme value model

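According to Coles (2001) [50], let $Y_1, \dots, Y_n$ be a sequence of independent random variables with a common distribution function $F$, and let $M_n = \max\{Y_1, \dots, Y_n\}$. If there exist sequences of constants $a_n > 0$ and $b_n$ such that

$$P\{(M_n - b_n)/a_n \le z\} \to G(z) \tag{E25}$$

as $n \to \infty$, for a non-degenerate distribution function $G$, then $G$ is a member of the generalized extreme value (GEV) distribution family

$$G(y; \eta, \tau, \xi) = \exp\left\{-\left[1 + \xi\left(\frac{y - \eta}{\tau}\right)\right]^{-1/\xi}\right\}, \tag{E26}$$

defined on $\{y: 1 + \xi(y - \eta)/\tau > 0\}$, where $-\infty < \eta < \infty$, $\tau > 0$ and $-\infty < \xi < \infty$. The case $\xi = 0$ is interpreted as the limit $\xi \to 0$, which gives the Gumbel distribution $G(y) = \exp\{-\exp[-(y - \eta)/\tau]\}$.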
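The Python sketch below evaluates (E26), treating $\xi = 0$ as the Gumbel limit, and illustrates (E25) with a case that can be computed exactly: for $n$ independent Exp(1) variables, taking $a_n = 1$ and $b_n = \log n$ gives $P(M_n - \log n \le z) = (1 - e^{-z}/n)^n$, which converges to the Gumbel distribution function as $n \to \infty$. The choice of the exponential distribution is purely illustrative, and the function names are hypothetical.

```python
import numpy as np


def gev_cdf(y, eta, tau, xi):
    """GEV distribution function (E26); xi == 0 is treated as the Gumbel limit."""
    z = (np.atleast_1d(np.asarray(y, dtype=float)) - eta) / tau
    if xi == 0.0:
        return np.exp(-np.exp(-z))
    t = 1.0 + xi * z
    # Outside the support {1 + xi z > 0}, G = 0 below the lower endpoint (xi > 0)
    # and G = 1 above the upper endpoint (xi < 0)
    G = np.full(z.shape, 0.0 if xi > 0 else 1.0)
    inside = t > 0
    G[inside] = np.exp(-t[inside] ** (-1.0 / xi))
    return G


def exp_max_cdf(z, n):
    """Exact P(M_n - log n <= z) for the maximum M_n of n iid Exp(1) variables (z > -log n)."""
    return (1.0 - np.exp(-z) / n) ** n


z = np.array([-1.0, 0.0, 1.0, 2.0])
for n in (10, 100, 10_000):
    print(n, np.round(exp_max_cdf(z, n), 4))
print("Gumbel limit", np.round(gev_cdf(z, 0.0, 1.0, 0.0), 4))
```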

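Davison et al. (2012) [51] describe the spatial GEV model as follows. For each $s$ in $\mathbb{R}^2$, suppose that $Y(s)$ is GEV-distributed with parameters $\eta(s)$, $\tau(s)$ and $\xi(s)$ that vary smoothly over $s \in \mathbb{R}^2$ according to a stochastic process $W(s)$. The processes for the three GEV parameters are assumed to be mutually independent Gaussian processes [52]. Then

$$\eta(s) = f_\eta(s; \beta_\eta) + W_\eta(s; \sigma_\eta, \phi_\eta, \kappa_\eta), \quad \tau(s) = f_\tau(s; \beta_\tau) + W_\tau(s; \sigma_\tau, \phi_\tau, \kappa_\tau), \quad \xi(s) = f_\xi(s; \beta_\xi) + W_\xi(s; \sigma_\xi, \phi_\xi, \kappa_\xi), \tag{E27}$$

where $f_\eta$, $f_\tau$ and $f_\xi$ are deterministic functions depending on regression parameters $\beta_\eta$, $\beta_\tau$ and $\beta_\xi$, respectively, and $W_\eta$, $W_\tau$ and $W_\xi$ are zero-mean stationary Gaussian processes with correlation functions $\rho(h; \phi_\eta, \kappa_\eta)$, $\rho(h; \phi_\tau, \kappa_\tau)$ and $\rho(h; \phi_\xi, \kappa_\xi)$, respectively, i.e.

$$W_\eta \sim N(0, \sigma_\eta^2 \rho(h; \phi_\eta, \kappa_\eta)), \quad W_\tau \sim N(0, \sigma_\tau^2 \rho(h; \phi_\tau, \kappa_\tau)), \quad W_\xi \sim N(0, \sigma_\xi^2 \rho(h; \phi_\xi, \kappa_\xi)). \tag{E28}$$

Then, conditional on the values of the three Gaussian processes at the sites $s_1, \dots, s_k$, the maxima are assumed to follow GEV distributions,

$$Y_i(s_j) \mid \eta(s_j), \tau(s_j), \xi(s_j) \sim \operatorname{GEV}(\eta(s_j), \tau(s_j), \xi(s_j)), \tag{E29}$$

independently for each location $s_1, \dots, s_k$ and each replicate, $j = 1, \dots, k$, $i = 1, \dots, n$.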
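The hierarchy (E27)-(E29) can be sketched by simulation. In the Python example below, which is an illustration under simplifying assumptions rather than the SpatialExtremes implementation, only $\eta(s)$ receives a Gaussian process (with exponential correlation and a constant trend $f_\eta = 20$), while $\tau$ and $\xi \neq 0$ are held constant across sites. Maxima are drawn by inverting (E26), $y = \eta + \tau\{(-\log U)^{-\xi} - 1\}/\xi$ with $U$ uniform, and the conditional log-likelihood of (E29) is summed over sites and replicates.

```python
import numpy as np


def gev_logpdf(y, eta, tau, xi):
    """GEV log-density for xi != 0 (assumed here); -inf outside the support 1 + xi (y - eta) / tau > 0."""
    y, eta, tau, xi = np.broadcast_arrays(
        *(np.atleast_1d(np.asarray(a, dtype=float)) for a in (y, eta, tau, xi)))
    t = 1.0 + xi * (y - eta) / tau
    out = np.full(t.shape, -np.inf)
    ok = t > 0
    out[ok] = -np.log(tau[ok]) - (1.0 + 1.0 / xi[ok]) * np.log(t[ok]) - t[ok] ** (-1.0 / xi[ok])
    return out


def spatial_gev_loglik(Y, eta_s, tau_s, xi_s):
    """Conditional log-likelihood of (E29): Y[i, j] ~ GEV(eta(s_j), tau(s_j), xi(s_j)) independently."""
    return float(gev_logpdf(Y, eta_s[None, :], tau_s[None, :], xi_s[None, :]).sum())


# Hypothetical set-up: k sites, n replicates (e.g. annual maxima) per site
rng = np.random.default_rng(7)
k, n = 15, 30
coords = rng.uniform(0.0, 10.0, size=(k, 2))
h = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
W_eta = np.linalg.cholesky(np.exp(-h / 3.0) + 1e-10 * np.eye(k)) @ rng.standard_normal(k)
eta_s = 20.0 + W_eta              # f_eta(s; beta_eta) = 20 (constant trend) plus W_eta(s)
tau_s = np.full(k, 2.0)           # tau and xi held constant across sites (simplification)
xi_s = np.full(k, 0.1)
U = rng.uniform(size=(n, k))
Y = eta_s + tau_s / xi_s * ((-np.log(U)) ** (-xi_s) - 1.0)   # inverse-cdf draws from (E29)
ll = spatial_gev_loglik(Y, eta_s, tau_s, xi_s)
```

In a Bayesian hierarchical fit, this conditional log-likelihood would be combined with the Gaussian process priors of (E28) and explored by MCMC.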

Davison et al. (2012) [51] proposed the construction of Bayesian hierarchical models for spatial extremes.

The SpatialExtremes package [53] models spatial extremes through max-stable processes with the function fitmaxstab, which reports the parameter estimates with their standard errors.

To implement hierarchical Bayesian models, the function latent is used; it reports the posterior median of the location, scale and shape parameters with their credible intervals.

Another package for modeling spatial extremes is glmmfields [34], whose function glmmfields estimates the parameters under the Bayesian approach. This function also allows spatial extreme events to be modeled over time; such models are known as spatio-temporal models.

7. Conclusions

The main characteristic of spatial data is that observations close in space tend to be correlated, and in spatial modeling this correlation is used to understand the behavior of the phenomenon under study in a region of interest.

Omitting the spatial dependence of the data can bias the results and, consequently, lead to incorrect inference. Conversely, adequately describing the spatial pattern of an event can provide sufficient elements to formulate hypotheses about its cause. As we have seen, the spatial variability of georeferenced data can be studied with the spatial models developed in geostatistics. The usefulness of these models has been demonstrated in applications related to the identification of social structures, disease patterns and occupational patterns, as well as in the identification of populations (or subgroups) at greater or lesser risk of an event. In statistics, correctly processed information supports sound decision making; in this sense, this chapter aims to introduce the reader to the use of spatial models in geostatistics.

If the response of interest is the number of cases (counts) of sick people in a given region, or the number of new cases of a disease in a given period of time (incidence), Poisson GLSMs can be useful to describe the spread of the disease in the population of interest, predict new cases and identify the variables that influence the occurrence of the disease. When the response variable is binary or a proportion, such as mortality or infection rates, binomial GLSMs can be helpful. These models have been used, for example, to study the prevalence of dengue and to identify the variables associated with it.

Conflict of interest

The authors declare no conflict of interest.

Abbreviations

AFT	Accelerated failure time
EVT	Extreme value theory
GEV	Generalized extreme value
GLGM	Generalized linear geostatistical model
GLM	Generalized linear model
GLMM	Generalized linear mixed model
GLSM	Generalized linear spatial model
iid	Independent and identically distributed
NM	Multivariate normal
PH	Proportional hazards
PO	Proportional odds

References

1. Diggle PJ, Ribeiro PJ. Model-Based Geostatistics. New York: Springer; 2007. p. 228
2. Gaetan C, Guyon X. Spatial Statistics and Modeling. New York, NY: Springer Science+Business Media, LLC; 2010
3. Rezaeian M, Dunn G, St. Leger S, Appleby L. Geographical epidemiology, spatial analysis and geographical information systems: A multidisciplinary glossary. Journal of Epidemiology and Community Health. 2007;61(2):98-102. DOI: 10.1136/jech.2005.043117
4. Chowell G, Rothenberg R. Spatial infectious disease epidemiology: On the cusp. BMC Medicine. 2018;16(1):1-5. DOI: 10.1186/s12916-018-1184-6
5. Mohebbi M, Wolfe R, Jolley D. A Poisson regression approach for modelling spatial autocorrelation between geographically referenced observations. BMC Medical Research Methodology. 2011;11(1):1-11. DOI: 10.1186/1471-2288-11-133
6. Kauhl B, Schweikart J, Krafft T, Keste A, Moskwyn M. Do the risk factors for type 2 diabetes mellitus vary by location? A spatial analysis of health insurance claims in Northeastern Germany using kernel density estimation and geographically weighted regression. International Journal of Health Geographics. 2016;15(1):1-12. DOI: 10.1186/s12942-016-0068-2
7. Ben-Ahmed K, Aoun K, Jeddi F, Ghrab J, El-Aroui MA, Bouratbine A. Visceral leishmaniasis in Tunisia: Spatial distribution and association with climatic factors. The American Journal of Tropical Medicine and Hygiene. 2009;81(1):40
8. Lipner EM, Knox D, French J, Rudman J, Strong M, Crooks JL. A geospatial epidemiologic analysis of nontuberculous mycobacterial infection: An ecological study in Colorado. Annals of the American Thoracic Society. 2017;14(10):1523-1532
9. Hira FS, Asad A, Farrah Z, Basit RS, Mehreen F, Muhammad K. Patterns of occurrence of dengue and chikungunya, and spatial distribution of mosquito vector Aedes albopictus in Swabi district, Pakistan. Tropical Medicine & International Health. 2018;23(9):1002-1013. DOI: 10.1111/tmi.13125
10. Slater H, Michael E. Mapping, Bayesian geostatistical analysis and spatial prediction of lymphatic filariasis prevalence in Africa. PLoS One. 2013;8(8):28-32. DOI: 10.1371/journal.pone.0071574
11. Mayfield HJ, Lowry JH, Watson CH, Kama M, Nilles EJ, Lau CL. Use of geographically weighted logistic regression to quantify spatial variation in the environmental and sociodemographic drivers of leptospirosis in Fiji: A modelling study. The Lancet Planetary Health. 2018;2(5):223-232. DOI: 10.1016/S2542-5196(18)30066-4
12. Zhou YB, Wang QX, Liang S, Gong YH, Yang MX, Chen Y, et al. Geographical variations in risk factors associated with HIV infection among drug users in a prefecture in Southwest China. Infectious Diseases of Poverty. 2015;4(1):1-10. DOI: 10.1186/s40249-015-0073-x
13. Zhou H, Hanson T, Zhang J. spBayesSurv: Fitting Bayesian spatial survival models using R. Journal of Statistical Software. 2020;92(9):1-33. DOI: 10.18637/jss.v092.i09
14. Banerjee S, Wall MM, Carlin BP. Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota. Biostatistics. 2003;4(1):123-142. DOI: 10.1093/biostatistics/4.1.123
15. Mahanta KK, Hazarika J, Barman MP, Rahman T. An application of spatial frailty models to recovery times of COVID-19 patients in India under Bayesian approach. Journal of Scientific Research. 2021;65(03):150-155. DOI: 10.37398/JSR.2021.650318
16. Aswi A, Cramb S, Duncan E, Hu W, White G, Mengersen K. Bayesian spatial survival models for hospitalisation of dengue: A case study of Wahidin hospital in Makassar, Indonesia. International Journal of Environmental Research and Public Health. 2020;17(3):1-12. DOI: 10.3390/ijerph17030878
17. Martins R, Silva GL, Andreozzi V. Bayesian joint modeling of longitudinal and spatial survival AIDS data. Statistics in Medicine. 2016;35(19):3368-3384. DOI: 10.1002/sim.6937
18. Zhou H, Hanson T, Jara A, Zhang J. Modeling county level breast cancer survival data using a covariate-adjusted frailty proportional hazards model. The Annals of Applied Statistics. 2015;9(1):43-68. DOI: 10.1214/14-AOAS793
19. Chiu Y, Chebana F, Abdous B, Bélanger D, Gosselin P. Mortality and morbidity peaks modeling: An extreme value theory approach. Statistical Methods in Medical Research. 2018;27(5):1498-1512. DOI: 10.1177/0962280216662494
20. Thomas M, Lemaitre M, Wilson ML, Viboud C, Yordanov Y, Wackernagel H, et al. Applications of extreme value theory in public health. PLoS One. 2016;11(7):3-9. DOI: 10.1371/journal.pone.0159312
21. De Zea BP, Mendes Z. Extreme value theory in medical sciences: Modeling total high cholesterol levels. Journal of Statistical Theory and Practice. 2012;6(3):468-491. DOI: 10.1080/15598608.2012.695673
22. Gelfand AE, Schliep EM. Spatial statistics and Gaussian processes: A beautiful marriage. Spatial Statistics. 2016;18:86-104. DOI: 10.1016/j.spasta.2016.03.006
23. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2021. URL https://www.R-project.org/
24. Ross SM. Stochastic Processes. 2nd ed. New York: Wiley; 1996
25. Grigoriu M. Stochastic processes. In: Stochastic Calculus. Boston, MA: Birkhäuser; 2002. pp. 103-204
26. Diggle PJ, Ribeiro PJ, Christensen OF. An introduction to model-based geostatistics. In: Møller J, editor. Spatial Statistics and Computational Methods. New York: Springer-Verlag; 2003. pp. 43-86
27. Cressie N, Wikle CK. Statistics for Spatio-Temporal Data. John Wiley & Sons; 2015
28. Ribatet M. A User’s Guide to the SpatialExtremes Package. Lausanne, Switzerland: EPFL; 2009
29. Chilès JP, Delfiner P. Geostatistics: Modeling Spatial Uncertainty. Wiley Series in Probability and Statistics, Vol. 497. New York: Wiley; 2009
30. Banerjee S, Carlin BP, Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. 2nd ed. United States of America: Chapman and Hall/CRC; 2004. p. 472
31. Li J, Heap A. A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors. Ecol Inform. 2011;6(3–4):228-241. DOI: 10.1016/j.ecoinf.2010.12.003
32. Ribeiro PJ, Diggle PJ. geoR: A package for geostatistical analysis. R News. 2001;1:15-18
33. Goovaerts P. Geostatistics for Natural Resources Evaluation. New York: Oxford University Press; 1997
34. Anderson SC, Ward EJ. Black swans in space: Modeling spatiotemporal processes with extremes. Ecology. 2019;100(1):1-23. DOI: 10.1002/ecy.2403
35. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science. 1992;7(4):457-511. DOI: 10.1214/ss/1177011136
36. Nelder J, Wedderburn R. Generalized linear models. Journal of the Royal Statistical Society Series A (General). 1972;135(3):370-384. DOI: 10.2307/2344614
37. McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. London: Chapman and Hall; 1989
38. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88(421):9-25
39. Diggle PJ, Tawn JA, Moyeed RA. Model-based geostatistics. Journal of the Royal Statistical Society: Series C (Applied Statistics). 1998;47(3):299-350
40. Jing L, De Oliveira V. geoCount: An R package for the analysis of geostatistical count data. Journal of Statistical Software. 2015;63:1-33. DOI: 10.18637/jss.v063.i11
41. Christensen OF, Waagepetersen R. Bayesian prediction of spatial count data using generalized linear mixed models. Biometrics. 2002;58(2):280-286. DOI: 10.1111/j.0006-341x.2002.00280.x
42. Christensen OF, Ribeiro PJ Jr. geoRglm: A package for generalised linear spatial models. R News. 2002;2(2):26-28
43. Ribeiro PJ Jr, Christensen OF, Diggle PJ. geoR and geoRglm: Software for model-based geostatistics. In: Hornik K, Leisch F, Zeileis A, editors. 3rd International Workshop on Distributed Statistical Computing (DSC 2003); 20-22 March 2003; Vienna, Austria. p. 2
44. Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological). 1972;34(2):187-202. DOI: 10.1111/j.2517-6161.1972.tb00899.x
45. Buckley J, James I. Linear regression with censored data. Biometrika. 1979;66(3):429-436. DOI: 10.2307/2335161
46. Wei LJ. The accelerated failure time model: A useful alternative to the Cox regression model in survival analysis. Statistics in Medicine. 1992;11(14–15):1871-1879. DOI: 10.1002/sim.4780111409
47. Zhang J, Lawson AB. Bayesian parametric accelerated failure time spatial model and its application to prostate cancer. Journal of Applied Statistics. 2011;38(3):591-603. DOI: 10.1080/02664760903521476
48. Zhou H, Hanson T, Zhang J. spBayesSurv: Fitting Bayesian Spatial Survival Models Using R. 2017. arXiv preprint arXiv:1705.04584
49. Taylor BM, Rowlingson BS. spatsurv: An R package for Bayesian inference with spatial survival models. Journal of Statistical Software. 2017;77(4):1-32. DOI: 10.18637/jss.v077.i04
50. Coles S. An Introduction to Statistical Modeling of Extreme Values. London: Springer; 2001. p. 208
51. Davison AC, Padoan SA, Ribatet M. Statistical modeling of spatial extremes. Statistical Science. 2012;27(2):161-186. DOI: 10.1214/11-STS376
52. Casson E, Coles S. Spatial regression models for extremes. Extremes. 1999;1(4):449-468. DOI: 10.1023/A:1009931222386
53. Ribatet M. SpatialExtremes: An R Package for Modelling Spatial Extremes. R Package Version 2.1-0. 2020. Available from: https://cran.r-project.org/web/packages/SpatialExtremes/

[1] 1. Diggle PJ, Ribeiro PJ. Model-Based Geostatistics. New York: Springer; 2007. p. 228

[2] 2. Gaetan C, Guyon X. Spatial Statistics and Modeling. New York, NY: Springer Science+Business Media, LLC; 2010

[3] 3. Rezaeian M, Dunn G, St. Leger S, Appleby L. Geographical epidemiology, spatial analysis and geographical information systems: A multidisciplinary glossary. Journal of Epidemiology and Community Health. 2007;61(2):98-102. DOI: 10.1136/jech.2005.043117

[4] 4. Chowell G, Rothenberg R. Spatial infectious disease epidemiology: On the cusp. BMC Medicine. 2018;16(1):1-5. DOI: 10.1186/s12916-018-1184-6

[5] 5. Mohebbi M, Wolfe R, Jolley D. A poisson regression approach for modelling spatial autocorrelation between geographically referenced observations. BMC Medical Research Methodology. 2011;11(1):1-11. DOI: 10.1186/1471-2288-11-133

[6] 6. Kauhl B, Schweikart J, Krafft T, Keste A, Moskwyn M. Do the risk factors for type 2 diabetes mellitus vary by location? A spatial analysis of health insurance claims in Northeastern Germany using kernel density estimation and geographically weighted regression. International Journal of Health Geographics. 2016;15(1):1-12. DOI: 10.1186/s12942-016-0068-2

[7] 7. Ben-Ahmed K, Aoun K, Jeddi F, Ghrab J, El-Aroui MA, Bouratbine A. Visceral leishmaniasis in Tunisia: Spatial distribution and association with climatic factors. The American Journal of Tropical Medicine and Hygiene. 2009;81(1):40

[8] 8. Lipner EM, Knox D, French J, Rudman J, Strong M, Crooks JL. A geospatial epidemiologic analysis of nontuberculous mycobacterial infection: An ecological study in Colorado. Annals of the American Thoracic Society. 2017;14(10):1523-1532

[9] 9. Hira FS, Asad A, Farrah Z, Basit RS, Mehreen F, Muhammad K. Patterns of occurrence of dengue and chikungunya, and spatial distribution of mosquito vector Aedes albopictus in Swabi district, Pakistan. Trop Med Int Heal. 2018;23(9):1002-1013. DOI: 10.1111/tmi.13125

[10] 10. Slater H, Michael E. Mapping, Bayesian geostatistical analysis and spatial prediction of lymphatic filariasis prevalence in Africa. PLoS One. 2013;8(8):28-32. DOI: 10.1371/journal.pone.0071574

[11] 11. Mayfield HJ, Lowry JH, Watson CH, Kama M, Nilles EJ, Lau CL. Use of geographically weighted logistic regression to quantify spatial variation in the environmental and sociodemographic drivers of leptospirosis in Fiji: A modelling study. Lancet Planet Heal. 2018;2(5):223-232. DOI: 10.1016/S2542-5196(18)30066-4

[12] 12. Zhou YB, Wang QX, Liang S, Gong YH, Yang MX, Chen Y, et al. Geographical variations in risk factors associated with HIV infection among drug users in a prefecture in Southwest China. Infectious Diseases of Poverty. 2015;4(1):1-10. DOI: 10.1186/s40249-015-0073-x

[13] 13. Zhou H, Hanson T, Zhang J. SpBayesSurv: Fitting bayesian spatial survival models using R. Journal of Statistical Software. 2020;92(9):1-33. DOI: 10.18637/jss.v092.i09

[14] 14. Banerjee S, Wall MM, Carlin BP. Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota. Biostatistics. 2003;4(1):123-142. DOI: 10.1093/biostatistics/4.1.123

[15] 15. Mahanta KK, Hazarika J, Barman MP, Rahman T. An application of spatial frailty models to recovery times of COVID-19 patients in India under Bayesian approach. Journal of Scientific Research. 2021;65(03):150-155. DOI: 10.37398/JSR.2021.650318

[16] 16. Aswi A, Cramb S, Duncan E, Hu W, White G, Mengersen K. Bayesian spatial survival models for hospitalisation of dengue: A case study of Wahidin hospital in Makassar, Indonesia. International Journal of Environmental Research and Public Health. 2020;17(3):1-12. DOI: 10.3390/ijerph17030878

[17] 17. Martins R, Silva GL, Andreozzi V. Bayesian joint modeling of longitudinal and spatial survival AIDS data. Statistics in Medicine. 2016;35(19):3368-3384. DOI: 10.1002/sim.6937

[18] 18. Zhou H, Hanson T, Jara A, Zhang J. Modeling county level breast cancer survival data using a covariate-adjusted frailty proportional hazards model. The Annals of Applied Statistics. 2015;9(1):43-68. DOI: 10.1214/14-AOAS793

[19] 19. Chiu Y, Chebana F, Abdous B, Bélanger D, Gosselin P. Mortality and morbidity peaks modeling: An extreme value theory approach. Statistical Methods in Medical Research. 2018;27(5):1498-1512. DOI: 10.1177/0962280216662494

[20] 20. Thomas M, Lemaitre M, Wilson ML, Viboud C, Yordanov Y, Wackernagel H, et al. Applications of extreme value theory in public health. PLoS One. 2016;11(7):3-9. DOI: 10.1371/journal.pone.0159312

[21] 21. De Zea BP, Mendes Z. Extreme value theory in medical sciences: Modeling total high cholesterol levels. J Stat Theory Pract. 2012;6(3):468-491. DOI: 10.1080/15598608.2012.695673

[22] 22. Gelfand AE, Schliep EM. Spatial statistics and gaussian processes: A beautiful marriage. Spatial Statistics. 2016;18:86-104. DOI: 10.1016/j.spasta.2016.03.006

[23] 23. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2021. URL https://www.R-project.org/

[24] 24. Ross SM et al. Stochastic Processes. Vol. 2. New York: Wiley; 1996

[25] 25. Grigoriu M. Stochastic processes. In: Calculus S, editor. Birkhäuser. Boston: MA; 2002. pp. 103-204

[26] 26. Diggle PJ, Ribeiro PJ, Christensen OF. An introduction to model-based geostatistics. In: Møller J, editor. Spatial Statistics and Computational Methods. New York: Springer Verlag; 2003. pp. 43-86

[27] 27. Cressie N, Wikle CK. Statistics for Spatio-Temporal Data. John Wiley & Sons; 2015

[28] 28. Ribatet M. A user’s Guide to the SpatialExtremes Package. Lausanne, Switzerland: EPFL; 2009

[29] 29. Chilès JP, Delfiner P. Geostatistics: Modeling Spatial Uncertainty. New York: Wiley Series In Probability and Statistics; Vol. 497; 2009

[30] 30. Banerjee S, Carlin BP, Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. 2nd ed. United States of America: Chapman and Hall/CRC; 2004. p. 472

[31] 31. Li J, Heap A. A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors. Ecol Inform. 2011;6(3–4):228-241. DOI: 10.1016/j.ecoinf.2010.12.003

[32] 32. Ribeiro PJ, Diggle PJ. geoR: A package for geostatistical analysis. R-NEWS. 2001;1:15-18

[33] 33. Goovaerts P. Geostatistics for Natural Resources Evaluation. New York: Oxford University Press; 1997

[34] 34. Anderson SC, Ward EJ. Black swans in space: Modeling spatiotemporal processes with extremes. Ecology. 2019;100(1):1-23. DOI: 10.1002/ecy.2403

[35] 35. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science. 1992;7(4):457-511. DOI: 10.1214/ss/1177011136

[36] 36. Nelder J, Wedderburn R. Generalized linear models. Journal of the Royal Statistical Society Series A (General). 1972;135(3):370-384. DOI: 10.2307/2344614

[37] 37. McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. London: Chapman and Hall; 1989

[38] 38. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88(421):9-25

[39] 39. Diggle PJ, Tawn JA, Moyeed RA. Model-based geostatistics. Journal of the Royal Statistical Society: Series C (Applied Statistics). 1998;47(3):299-350

[40] 40. Jing L, De Oliveira V. Geocount: An R package for the analysis of geostatistical count data. Journal of Statistical Software. 2015;63:1-33. DOI: 10.18637/jss.v063.i11

[41] 41. Christensen OF, Waagepetersen R. Bayesian prediction of spatial count data using generalized linear mixed models. Biometrics. 2002;58(2):280-286. DOI: 10.1111/j.0006-341x.2002.00280.x

[42] 42. Christensen OF, Ribeiro PJ Jr. geoRglm-a package for generalised linear spatial models. R News. 2002;2(2):26-28

[43] 43. Ribeiro PJ Jr, Christensen OF, Diggle PJ. geoR and geoRglm: Software for model-based geostatistics. Hornik K, Leisch F, Zeileis A, editors. Vienna: 3rd International Workshop on Distributed Statistical Computing (DSC 2003): 20-22 March 2003; p. 2

[44] 44. Cox DR. Regression models and life-tables. J R Stat Soc [B]. 1972;34(2):187-202. DOI: 10.1111/j.2517-6161.1972.tb00899.x

[45] 45. Buckley J, James I. Linear regression with censored data. Biometrika. 1979;66(3):429-436. DOI: 10.2307/2335161

[46] 46. Wei L. The accelerated failure time model: A useful alternative to the cox regression model in survival analysis. Statistics in Medicine. 1992;11(14–15):1871-1879. DOI: 10.1002/sim.4780111409

[47] 47. Zhang J, Lawson AB. Bayesian parametric accelerated failure time spatial model and its application to prostate cancer. Journal of Applied Statistics. 2011;38(3):591-603. DOI: 10.1080/02664760903521476

[48] 48. Zhou H, Hanson T, Zhang J. spBayesSurv: Fitting Bayesian Spatial Survival Models Using R. 2017. arXiv preprint arXiv:1705.04584

[49] 49. Taylor BM, Rowlingson BS. Spatsurv: An R package for bayesian inference with spatial survival models. Journal of Statistical Software. 2017;77(4):1-32. DOI: 10.18637/jss.v077.i04

[50] 50. Coles S, Bawa J, Trenner L, Dorazio P. An Introduction to Statistical Modeling of Extreme Values. Vol. 208. London: Springer; 2001. p. 208

[51] 51. Davison AC, Padoan SA, Ribatet M. Statistical modeling of spatial extremes. Statistical Science. 2012;27(2):161-186. DOI: 10.1214/11-STS376

[52] 52. Casson E, Coles S. Spatial regression models for extremes. Extremes. 1999;1(4):449-468. DOI: 10.1023/A:1009931222386

[53] 53. Ribatet M. SpatialExtremes: An R Package for Modelling Spatial Extremes. R Package Version 2.1-0. 2020. Available from: https://cran.r-project.org/web/packages/SpatialExtremes/

Spatial Modeling in Epidemiology

Recent Advances in Medical Statistics