Open access peer-reviewed chapter

# Advances in Spatio-Temporal Modeling and Prediction for Environmental Risk Assessment

By S. De Iaco, S. Maggio,M. Palma and D. Posa

Submitted: November 20th 2011Reviewed: July 2nd 2012Published: August 22nd 2012

DOI: 10.5772/51227

## 1. Introduction

Meteorological readings, hydrological parameters and many measures of air, soil and water pollution are often collected for a certain span, regularly in time, and at different survey stations of a monitoring network. Then, these observations can be viewed as realizations of a random function with a spatio-temporal variability. In this context, the arrangement of valid models for spatio-temporal prediction and environmental risk assessment is strongly required. Spatio-temporal models might be used for different goals: optimization of sampling design network, prediction at unsampled spatial locations or unsampled time points and computation of maps of predicted values, assessing the uncertainty of predicted values starting from the experimental measurements, trend detection in space and time, particularly important to cope with risks coming from concentrations of hazardous pollutants. Hence, more and more attention is given to spatio-temporal analysis in order to sort out these issues.

Spatio-temporal geostatistical techniques provide useful tools to analyze, interpret and control the complex evolution of various variables observed by environmental monitoring networks. However, in the literature there are no specialized monographs which contain a thorough presentation of multivariate methodologies available in Geostatistics, especially in a spatio-temporal context. Several authors have developed different multivariate models for analyzing the spatial and spatio-temporal behavior of environmental variables, as it is clarified in the following brief review.

In multivariate spatial analysis, direct and cross correlations for the variables under study are quantified by estimating and modeling the matrix variogram. The difficulty in modeling this matrix function, especially the off diagonal entries of the same matrix, has been first faced by using the linear coregionalization model (LCM), proposed by [Matheron, 1982].

For matrix covariance functions, [Gneiting et al., 2010] constructed a parametric family of symmetric covariance models for stationary and isotropic multivariate Gaussian spatial random fields, where both the diagonal and off diagonal entries are of the Matérn type. In the bivariate case, they provided necessary and sufficient conditions for the positive definiteness of the second-order structure, whereas for the other multivariate cases they suggested a parsimonious model which imposes restrictions on the scale and cross-covariance smoothness parameters. In the bivariate case, where the smoothness parameter is the same for both covariance functions, the Gneiting model is a simplified LCM. The Gneiting cross-covariance model also assumes that the scale parameter is the same for all the covariance functions and the cross-covariance functions. Both the LCM and Gneiting constructions for cross-covariances result in symmetric models; however, no distributional assumptions are required for using a LCM, which can easily incorporate components with compact support and multiple ranges and an unbounded variogram component.

Although models for multivariate spatial data have been extensively explored [Wackernagel, 2003, Gelfand et al., 2004, Ver hoef Barry, 1998], models for multivariate spatio-temporal data have received relatively less attention. In the literature, it is common to use classical techniques for multivariate spatial and temporal analysis [Wackernagel, 2003, De Iaco et al., 2001b]. Recently, canonical correlation analysis was combined with space-time geostatistical tools for detecting possible interactions between two groups of variables, associated with pollutants and atmospheric conditions [De Iaco, 2011]. In the dynamic modeling framework, there are some results in studying the spatio-temporal variability of several correlated variables: [Gelfand et al., 2005], for example, extended univariate spatio-temporal dynamic models to multivariate dynamic spatial models. Moreover, [Li et al., 2008] proposed a methodology to evaluate the appropriateness of several common assumptions, such as symmetry, separability and linear model of coregionalization, on multivariate covariance functions in the spatio-temporal context, while [Choi et al., 2009] proposed a spatio-temporal LCM where the multivariate spatio-temporal process was expressed as a linear combination of independent Gaussian processes in space-time with mean zero and a separable spatio-temporal covariance. [Apanasovich Genton, 2010] considered some solutions to the symmetry problem; moreover, they proposed a class of cross-covariance functions for multivariate random fields based on the work of [Gneiting, 2002]. The maximum likelihood estimation of heterotopic spatio-temporal models with spatial LCM components and temporal dynamics was developed by [Fassò Finazzi(2011)]. A GSLib [Deutsch Journel, 1998] routine for cokriging was properly modified in [De Iaco et al., 2010] to incorporate the spatio-temporal LCM, previously developed using the generalized product-sum variogram model [De Iaco et al., 2003]. Recently, in [De Iaco et al., 2012a] an automatic procedure for fitting the spatio-temporal LCM using the product-sum variogram model has been presented and some computational aspects, analytically described by a main flow-chart, have been discussed. In [De Iaco et al., 2012 b] simultaneous diagonalization of the sample matrix variograms has been used to isolate the basic components of a spatio-temporal LCM and it has been illustrated how nearly simultaneous diagonalization of the cross-variogram matrices simplifies modeling of the matrix variogram.

In the following, after an introduction of the theoretical framework of the multivariate spatio-temporal random function and its features (Section 2), a review of recent techniques for building admissible models is proposed (Section 3). Successively, the spatio-temporal LCM, its assumptions and appropriate statistical tests are presented (Section 4) and techniques for prediction and risk assessment maps are introduced (Section 5). Some critical aspects regarding sampling, modeling and computational problems are discussed (Section 6). Finally, a case study concerning particle pollution (PM10) and two atmospheric variables (Temperature and Wind Speed) in the South of Apulian region (Italy), has been presented (Section 7). Before using the spatio-temporal LCM to describe the spatio-temporal multivariate correlation structure among the variables under study, its adequacy with respect to the data has been analyzed; in particular, the assumption of symmetry of the cross-covariance function, has been properly tested [Li et al., 2008]. By using a recent fitting procedure [De Iaco et al., 2012 b], based on the simultaneous diagonalization of several symmetric real-valued matrix variograms, the basic structures of the spatio-temporal LCM which describes the spatio-temporal correlation among the variables, have been easily detected. Predictions of the primary variable (PM10) are obtained by using a modified GSLib program, called COK2ST [De Iaco et al., 2010]. Then, risk maps showing the probability that the particle pollution exceeds the national law limit have been associated to predition maps and the estimation of the probability distributions for two sites of interest have been produced.

## 2. Multivariate spatio-temporal random function

Let Z(u)=[Z1(u),,Zp(u)]T,be a vector of pspatio-temporal random functions (STRF) defined on the domainD×Td+1, with(d3), then
{Z(u),  u=(s,t)D×Td+1},E1

represents a multivariate spatio-temporal random function ( MSTRF), where s=(s1,,sd)are the coordinates of the spatial domain Ddand tthe coordinate of the temporal domainT.

Afterwards, the MSTRF will be denoted with Zand its components withZi. The pSTRF Zi, i=1,,p,are the components of Zand they are associated to the spatio-temporal variables under study; these components are called coregionalized variables [Goovaerts, 1997].

The observations zi(uα),i=1,,p,  α=1,,Ni,of the pvariablesZi, at the pointsuαD×T, are considered as a finite realization of a MSTRF Z.

### 2.1. Moments of a MSTRF

Given a MSTRF Z, with pcomponents, we define, if they exist and they are finite:

• the expected value, or first-order moment of each componentZi,

E[Zi(u)]=mi(u),    uD×T,  i=1,,p;E2
• the second-order moments,

1. the variance of each componentZi,

Var[Zi(u)]=E[Zi(u)mi(u)]2,    uD×T,  i=1,,p;E3

2. the cross-covariance for each pair of STRF (Zi, Zj),ij,

Cov[Zi(u),Zj(u')]=Cij(u,u')=E[(Zi(u)mi(u))(Zj(u')mj(u'))],E4
u,u'D×T,  i,j=1,,p,  ij;E5

3. the cross-variogram for each pair of STRF (Zi, Zj),ij,

2 γij(u,u')=Cov[(Zi(u)Zi(u')),(Zj(u)Zj(u'))],E6
u,u'D×T,  i,j=1,,p,  ij.E7

Note that fori=j, we obtain:

• the covariance of the STRFZi, called direct covariance, or simply covariance,

Cii(u,u')=E[(Zi(u)mi(u))(Zi(u')mi(u'))],E8

with

u,u'D×T;E9
• the direct variogram of the STRFZi,

2 γii(u,u')=Var[(Zi(u)Zi(u')],    u,u'D×T.E10

These moments describe the basic features of a MSTRF, such as the spatio-temporal correlation for each variable and the cross-correlation among the variables.

In multivariate Geostatistics, admissibility conditions concern both the cross-covariances and the cross-variograms, as described in the following.

Let Zbe a MSTRF, with componentsZi,i=1,,p, and let {u1,,uN}a set of Npoints of a spatio-temporal domainD×T; the direct and cross-covariances of the MSTRF must satisfy the following inequality:
i=1p j=1p α=1N β=1NλαiλβjCij(uαuβ)0,E11

for any choice of the Npoints uαand for any choice of the weightsλαi. Using the matrix notation, the (p×p)matrices C(uαuβ)=[Cij(uαuβ)]of the direct and cross-covariances of the STRF Zi(uα)and Zj(uβ)will be admissible if they satisfy the following condition:

α=1Nβ=1NλαTC(uαuβ)λβ0,E12

where λα=[λα1,,λαp]Tis a (p×1)vector of weightsλαi.

As in the univariate case, the (p×p)matrices Γ(uαuβ)=[γij(uαuβ)]of the direct and cross-variograms of the STRF Zi(uα)and Zj(uβ)will be admissible if, for any choice of the N pointsuα, they satisfy the following condition

α=1Nβ=1NλαTΓ(uαuβ)λβ0,E13

under the constraint: 3mm α=1Nλα=0.

### 2.3. Stationarity hypotheses

Stationarity hypotheses allow to make inference on the MSTRF. In particular, second-order stationarity and intrinsic hypotheses concern the first and second-order moments of the MSTRF.

#### 2.3.1. Second-order stationarity

A MSTRF Z, with p components, is second-order stationary if:

• for any STRF Zi, i,,p,

E[Zi(u)]=mi,        uD×T, i=1,,p;E14
• for any pair of STRF Ziand Zj, i,j=1,,p,the cross-covariance Cijdepends only on the spatio-temporal separation vector h=(hs,ht)between the points u and u + h:

Cij(h)=E[(Zi(u+h)mi)  (Zj(u)mj)]=E[Zi(u+h)  Zj(u)]mi mj,E15

whereu,u+hD×T, i,j=1,,p. Fori=j, the direct covariance function of the STRF Ziis obtained.

There exist several physical phenomena for which neither variance, nor the covariance exist, however it is possible to assume the existence of the variogram.

#### 2.3.2. Intrinsic hypotheses

A MSTRF Z, with pcomponents, satisfies the intrinsic hypotheses if:

• for any STRF Zi, i=1,,p,

E[Zi(u+h)Zi(u)]=0  ,        u,u+hD×T,  i=1,,p;E16
• for any pair of STRF Ziand Zj, i,j=1,,p,the cross-variogram exists and it depends only on the spatio-temporal separation vector h:

2 γij(h)=Cov[(Zi(u+h)Zi(u)),(Zj(u+h)Zj(u))],E17

where

u,u+hD×T,  i,j=1,,p.E18

Second-order stationarity implies the existence of the intrinsic hypotheses, however the converse is not true. Intrinsic hypotheses imply that the cross-variogram can be expressed as the expected value of the product of the increments:

γij(h)=12 E{[Zi(u+h)Zi(u)][Zj(u+h)Zj(u)]},E19
u,u+hD×T,  i,j=1,,p.Fori=j, the direct variogram of the STRF Ziis obtained.

#### 2.3.3. Properties of the cross-covariance for second-order stationary MSTRF

Given a second-order stationary MSTRF, the cross-covariance satisfies the properties listed below.

The cross-covariance is not invariant with respect to the exchange of the variables:

Cij(h)Cji(h),        ij,E20

as well as it is not invariant with respect to the sign of the vectorh:

Cij(h)Cij(h),        ij.E21

However, the cross-covariance is invariant with respect to the joint exchange of the variables and the sign of the vector h:

Cij(h)=Cji(h).E22
(13)

#### 2.3.4. Properties of the cross-variogram for intrinsic MSTRF

Afterwards, the main properties of the cross-variogram for intrinsic MSTRF are given.

1. The cross-variogram vanishes at the origin, that is:

γij(0)=0.E23
1. The cross-variogram is invariant with respect to the exchange of the variables:

γij(h)=γji(h).E24
1. The cross-variogram is invariant with respect to the sign of the vector h:

γij(h)=γij(h).E25

From (15) and (16) follows that the cross-variogram is completely symmetric, as it will be pointed out in the next sections.

#### 2.3.5. Separability for a MSTRF

The cross-covariance Cijfor a second-order stationary MSTRF Z is separable if:

Cij(h)=ρ(h) aij,        h=(hs,ht)D×T,  i,j=1,,p,E26

where aijare the elements of a (p×p)positive definite matrix and ρ()is a correlation function. In this case, it results:

Cij(h)/Cij(h')=ρ(h)/ρ(h'),    h,h'D×T,  i,j=1,,p,E27

hence the changes of the cross-covariance functions, with respect to the changes of the vector h, do not depend on the pair of the STRFZi,Zj.

The cross-covariance Cijfor a second-order stationary MSTRF Z is fully separable if:

Cij(hs,ht)=ρS(hs) ρT(ht)aij,    (hs,ht)D×T,  i,j=1,,p,E28

where aijare the elements of a (p×p)positive definite matrix, ρS()is a spatial correlation function and ρT()is a temporal correlation function. In the literature, many statistical tests for separability have been proposed and are based on parametric models [Brown et al., 2000, Guo, 1998, Shitan Brockwell, 1995], likelihood ratio tests and subsampling [Mitchell et al., 2005] or spectral methods [Fuentes, 2006, Scaccia Martin, 2005].

#### 2.3.6. Symmetry for a MSTRF

The cross-covariance Cijof a second-order stationary MSTRF Z, with pcomponents, is symmetric if:

Cij(h)=Cij(h),    hD×T,  i,j=1,,p,E29

or, equivalently, if:

Cij(h)=Cji(h),    hD×T,  i,j=1,,p.E30

The cross-covariance Cijof a second-order stationary MSTRF Z, with pcomponents, is fully symmetric if:

Cij(hs,ht)=Cij(hs,ht),    (hs,ht)D×T,  i,j=1,,p,E31

or, equivalently,

Cij(hs,ht)=Cij(hs,ht),    (hs,ht)D×T,  i,j=1,,p.E32

Atmospheric, environmental and geophysical processes are often under the influence of prevailing air or water flows, resulting in a lack of full symmetry [de Luna Genton, 2005, Gneiting, 2002, Stein, 2005].

Fig. 1 summarizes the relationships between separability, symmetry, stationarity and the LCM in the general class of the cross-covariance functions of a MSTRF Z. If a cross-covariance is separable, then it is symmetric, however, in general, the converse is not true. Moreover, the hypothesis of full separability is a special case of full symmetry.

Several tests to check symmetry and separability of cross-covariance functions can be found in the literature [Scaccia Martin, 2005, Lu Zimmerman, 2005a, Lu Zimmerman, 2005b, Li et al., 2008].

## 3. Techniques for building admissible models

In the following, a brief review of the most utilized techniques to construct admissible cross-covariance models is presented.

1. For the intrinsic correlation model, the matrices Care described by separable cross-covariancesCij, i,j=1,,p[Mardia Goodall, 1993], that is:

Cij(uα,uβ)=ρ(uα,uβ)aij,E33

where the coefficients aijare the elements of a (p×p)positive definite matrix, and ρ(,)is a correlation function. However, this model is not flexible enough to handle complex relationships between processes, because the cross-covariance function between components measured at each location always has the same shape regardless of the relative displacement of the locations. As it will be discussed in the next section, the LCM is a straightforward extension of the intrinsic correlation model.

1. In the kernel convolution method [Ver hoef Barry, 1998] the cross-covariance functions is represented as follows:

Cij(uα,uβ)=d+1d+1ki(uαu)kj(uβu')ρ(uu')dudu',E34

where the kiare square integrable kernel functions and ρis a valid stationary correlation function. This approach assumes that all the variablesZi, i=1,,p, are generated by the same underlying process, which is very restrictive. Moreover, this model and its parameters lack interpretability and, except for some special cases, it requires Monte Carlo integration.

1. In the covariance convolution for stationary processes [Gaspari Cohn, 1999, Majumdar Gelfand, 2007] the cross-covariance functions is represented as follows:

Cij(h)=d+1Ci(hh')Cj(h')dh',E35

where Ciare second-order stationary covariances. The motivation and interpretation of the resulting cross-dependency structure is rather unclear. Although some closed-form expressions exist, this method usually requires Monte Carlo integration.

1. Recently, an approach based on latent dimensions and existing covariance models for univariate random fields, has been proposed; the idea is to develop flexible, interpretable and computationally feasible classes of cross-covariance functions in closed form [Apanasovich Genton, 2010].

## 4. Linear coregionalization model

LCM is based on the hypothesis that each direct or cross-variogram (covariogram) can be represented as a linear combination of some basic models and each direct or cross-variogram (covariogram) must be built using the same basic models [Journel Huijbregts, 1981].

LCM is utilized in several applications because of its flexibility, moreover it encouraged the development of algorithms able to estimate quickly the parameters of the selected model, assuring the admissibility conditions, even in presence of several variables [Goovaerts, 1997, Goulard, 1989, Goulard Voltz, 1992].

Let Z be a second-order stationary MSTRF with p components, Zi,  i=1,,p, the direct and cross-covariances for the spatio-temporal LCM are defined as follows:

Cij(h)=Cov[Zi(u),Zj(u+h)]=l=1Lbijl  cl(h),          i,j=1,,p,E36

where clare covariances, called basic structures, and the non-negative coefficients bijlsatisfy the following property:

bijl=bjil,    i,j=1,,p;E37

hence, in the LCM, it is assumed that:

Cij(h)=Cij(h),E38
Cij(h)=Cji(h),E39

with

i,j=1,,p,  ij.E40

The matrix C for the second-order stationary Z is built as follows:

C(h)=l=1LBl cl(h).E41

Analogously, it is also possible to introduce the LCM for a MSTRF which satisfies the intrinsic hypotheses. In such a case, the direct and cross-variograms are built as follows:

γij(h)=l=1Lbijl  gl(h)  ,        i,j=1,,p,E42

where each basic structure glis a variogram and the L matrices Blof the coefficientsbijl, corresponding to the sill values of the basic modelsgl, are positive definite.

Then, for a MSTRF which satisfies the intrinsic hypotheses the matrix Γof the direct and cross-variograms is:

Γ(h)=l=1LBl gl(h).E43

The necessary and sufficient conditions because the model defined in (17) and (20) are admissible are:

1. cl(gl)
2. the matricesBl=[bijl], called coregionalization matrices, must be positive definite.

The necessary, but not sufficient conditions, for the coefficients bijlare the following:

a)
biil0        i=1,,p,  l=1,,L;E44
b)
|bijl|biil  bjjl        i,j=1,,p,  l=1,,L.E45

The basic structures gl(h)=gl(hs,ht)of the spatio-temporal LCM (20) can be modelled by using several spatio-temporal variogram models known in the literature, such as the metric model [Dimitrakopoulos Luo, 1994], Cressie-Huang models [Cressie Huang, 1999], Gneiting models [Gneiting, 2002] and many others [De Iaco et al., 2002, Kolovos et al., 2004, Ma, 2002, Ma, 2005, Porcu, 2008, Stein, 2005].

As discussed in [De Iaco et al., 2003], each basic spatio-temporal structure, gl(hs,ht), l=1,,L, can be modelled as a generalized product-sum variogram [De Iaco et al., 2001a], that is

gl(hs,ht)=γl(hs,0)+γl(0,ht)kl  γl(hs,0)  γl(0,ht),        l=1,,L,E46

where

γl(hs,0)and γl(0,ht)are, respectively, the marginal spatial and temporal variograms at lth scale of variability; kl, defined as follows
kl=sill[γl(hs,0)]+sill[γl(0,ht)]sill[gl(hs,ht)]sill[γl(hs,0)]  sill[γl(0,ht)],E47

is the parameter of generalized product-sum variogram model and it is such that

0<kl1max{sill[γl(hs,0)];  sill[γl(0,ht)]}.E48

The inequality (23) represents a necessary and sufficient condition in order that each basic structuregl(hs,ht), withl=1,,L, is admissible. Recently, it was shown that strict conditional negative definiteness of both marginals is a necessary as well as a sufficient condition for the generalized product-sum (21) to be strictly conditionally negative definite [De Iaco et al., 2011a, De Iaco et al., 2011 b].

Substituting (21) in (20), the spatio-temporal LCM can be defined through two marginals: one in space and one in time, i.e.:

Γ(hs,0)=l=1LBl  γl(hs,0),        Γ(0,ht)=l=1LBl  γl(0,ht).E49

Using the generalized product-sum variogram model it is possible:

1. to identify the different scales of variability and build the matrices Bl,  l=1,,L,by means of the direct and cross marginal variograms;

2. to describe the correlation structure of processes characterized by a different spatial and temporal variability.

### 4.1. Assumptions in the spatio-temporal LCM

Fitting a spatio-temporal LCM to the data requires the identification of the spatio-temporal basic variograms and the corresponding positive definite coregionalization matrices, however this is often a hard step to tackle. A recent approach [De Iaco et al., 2012b], based on the simultaneous diagonalization of a set of matrix variograms computed for several spatio-temporal lags, allows to determine the spatio-temporal LCM parameters in a very simple way.

In several environmental applications [Wackernagel, 2003], the cross-covariance function is not symmetric, as for example, in time series in presence of a delay effect, as well as in hydrology, for the cross-correlation between a variable and its derivative, such as water head and transmissivity [Thiebaux, 1990]. Hence, this assumption should be tested before fitting a spatio-temporal LCM.

A useful hint to verify the symmetry of the cross-covariance can be given by estimating all the pseudo cross-variograms [Myers, 1991] of the standardized variablesZ˜i,i=1,,p, i.e., γ˜ij(h)=0.5  Var [Z˜i(u)Z˜j(u+h)], i,j=1,,p, ij.If the differences between the estimated pseudo cross-variograms γ˜ij(h)and γ˜ji(h),i,j=1,,p,are zero or close to zero, then it could be assumed that the cross-covariances are symmetric.

The appropriateness of the assumption of symmetry of a spatio-temporal LCM can be tested by using the methodology proposed by [Li et al., 2008], based on the asymptotic joint normality of the sample spatio-temporal cross-covariances estimators. Given a set Λof user-chosen spatio-temporal lags and the cardinality cofΛ, let Gn={Cji(hs,ht):(hs,ht)Λ,i,j=1,,p}be a vector of cp2cross-covariances at spatio-temporal lags k=(hs,ht)inΛ. Moreover, let C^ji(hs,ht)be the estimator of Cji(hs,ht)based on the sample data in the spatio-temporal domainD×Tn, where Drepresents the spatial domain and Tn={1,,n}the temporal one, and define{C^ji(hs,ht):(hs,ht)Λ,i,j=1,,p}. Under the assumptions given in the above paper, |Tn|1/2(G^nG)dNcp2(0,Σ), where |Tn|Σconverges toCov(G^n,G^n). Then the tests for symmetry properties can be based on the following statistics

TS=|Tn|(AG^n)T(AΣAT)1(AG^n)dχa2,E50

where a is the row rank of the matrix A, which is such that AG=0under the null hypothesis.

Moreover, the choice of modeling the MSTRF Z by a spatio-temporal LCM is based on the prior assumption that the multivariate correlation structure of the variables under study is characterized by L (L2)scales of spatio-temporal variability. On the other hand, if the multivariate correlation of a set of variables does not present different scales of variability (L=1), then the cross-covariance functions are separable, i.e.,

Cij(h)=ρ(h) bij,        i,j=1,,p,E51

where bijare the entries of a (p×p)positive definite matrix B and ρ()is a spatio-temporal correlation function. Hence, as in the spatial context [Wackernagel, 2003], a spatio-temporal intrinsic coregionalization model can be considered.

Obviously, this last model is just a particular case (L=1) of the spatio-temporal LCM defined in (17) and it is much more restrictive than the linear model of coregionalization since it requires that all the variables have the same correlation function, with possible changes in the sill values. Note that, if a cross-covariance is separable, then it is symmetric.

Remarks

• In the spatio-temporal LCM, each component is represented as a linear combination of latent, independent univariate spatio-temporal processes. However, the smoothness of any component defaults to that of the roughest latent process, and thus the standard approach does not admit individually distinct smoothness properties, unless structural zeros are imposed on the latent process coefficients [Gneiting et al., 2010].

• In most applications, the wide use of the spatio-temporal LCM is justified by practical aspects concerning the admissibility condition for the matrix variograms (covariances). Indeed, it is enough to verify the positive definiteness of the coregionalization matrices, Bl, at all scales of variability.

• The spatio-temporal LCM allows unbounded variogram components to be used [Wackernagel, 2003].

## 5. Prediction and risk assessment in space-time

For prediction purposes, various cokriging algorithms can be found in the literature [Journel Huijbregts, 1981, Chilés Delfiner, 1999]. As a natural extension of spatial ordinary cokriging to the spatio-temporal context, the linear spatio-temporal predictor can be written as

Z^(u)=α=1N  Λα(u)Z(uα),E52

where u=(s,t)D×Tis a point in the spatio-temporal domain, uα=(s,t)αD×T,α=1,,N,are the data points in the same domain and Λα(u),  α=1,,N,are (p×p)matrices of weights whose elements λαij(u)are the weights assigned to the value of the jth variable, j=1,,p,at the αth data point, to predict the ith variable, i=1,,p, at the point uD×T.

The predicted spatio-temporal random vector Z^(u)at uD×T,is such that each component Z^i(u), i=1,,p,is obtained by using all information at the data pointsuα=(s,t)α  D×T, α=1,,N.

The matrices of weights Λα(u),  α=1,,N,are determined by ensuring the unbiased condition for the predictor Z^(u)and the efficiency condition, obtained by minimizing the error variance [Goovaerts, 1997].

The new.GSLib. routine COK2ST [De Iaco et al., 2010] produces multivariate predictions in space-time, for one or all the variables under study, using the spatio-temporal LCM (20) where the basic spatio-temporal variograms are modelled as generalized product-sum variograms. An application is also given in [De Iaco et al., 2005].

Similarly, for environmental risk assessment, the formalism of multivariate spatio-temporal indicator random function (MSTIRF) and corresponding predictor, have to be introduced. Let

I(u, z )=[I1(u,z1),,Ip(u,zp)]T,E53

be a vector of p spatio-temporal indicator random functions (STIRF) defined on the domainD×Td+1, with(d3), as follows

Ii(u, zi )={1if  Zi  is  not  greater  (or  not  less)  than  the  threshold  zi ,0otherwise E54

wherez =[z1,,zp]T. Then

{I(u,z ),  u=(s,t)D×Td+1},E55

represents a MSTIRF. In other words, for each coregionalized variable Zi,with i=1,,p,a STIRF Iican be appropriately defined. Then the linear spatio-temporal predictor (27) can be easily written in terms of the indicator random variablesIi, i=1,,p. If the spatio-temporal correlation structure of a MSTIRFis modelled by using the spatio-temporal LCM, based on the product-sum, the new GSLibroutine COK2ST [De Iaco et al., 2010] can be used to produce risk assessment maps, for one or all the variables under study. Ifp=1, the dependence of the indicator variable is characterized by the corresponding indicator variogram of I:2γ ST(h)= Var [I(s+hs,t+ht)I(s,t)], which depends solely on the lag vectorh=(hs,ht), (s,s+hs)D2and(t,t+ht)T2. After fitting a model forγ ST, which must be conditionally negative definite, ordinary kriging can be applied to generate the environmental risk assessment maps. In this case, the GSLibroutine K2ST [De Iaco and Posa, 2012] can be used for prediction purposes in space and time.

## 6. Some critical aspects

Multivariate geostatistical analysis for spatio-temporal data is rather complex because of several problems concerning:

a. sampling,

b. the choice of admissible direct and cross-correlation models,

c. the definition of automatic procedures for estimation and modeling.

Sampling problems

There exist several sampling techniques for multivariate spatio-temporal data, as specified herein. Let

Ui={uαD×T,α=1,,Ni},        i=1,,p,E56

be the sets of sampled points in the spatio-temporal domain for the p variables under study. It is possible to distinguish the following situations:

1. total heterotopy, where the sets of the sampled points are pairwise disjoint, that is

i,j=1,,p,  UiUj=,  ij;E57
1. partial heterotopy, where the sets of the sampled points are not pairwise disjoint, that is

ij  |  UiUj,    i,j=1,,p;E58
1. isotopy, where the sets of the sampled points coincide, that is

i,j=1,,p,    UiUj.E59

A special case of partial heterotopy is the so-called undersampling: in such a case, the points where a variable, called primary or principal, has been sampled, constitute a subset of the points where the remaining variables, called auxiliary or secondary, have been observed. The secondary variables provide additional information useful to improve the prediction of the primary variable.

Modeling problems

In Geostatistics, the main modeling problems concern the choice of an admissible parametric model to be fitted to the empirical correlation function (covariance or variogram). In particular, in multivariate analysis for spatio-temporal data it is important to identify an admissible model able to describe the correlation among several variables which describe the spatio-temporal process. In this context, it is suitable to underline that

• it is not enough to select an admissible direct variogram to model a cross-variogram;

• the direct variograms are positive functions, on the other hand cross-variograms could be negative functions;

• only some necessary conditions of admissibility are known to model a cross-variogram, as the Cauchy-Schwartz inequality, however sufficient conditions cannot be easily applied [Wackernagel, 2003].

The use of multivariate correlation models well-known in the literature, such as the LCM [Wackernagel, 2003], the class of non-separable and asymmetric cross-covariances, proposed by [Apanasovich Genton, 2010], or the parametric family of cross-covariances, where each component is a Matérn process [Gneiting et al., 2010], requires the identification of several parameters, especially in presence of many variables.

Moreover, estimation and modeling the direct and cross-correlation functions could be compromised by the sampling plan.

Computational problems

For spatial multivariate data different algorithms for fitting the LCM have been implemented in software packages. [Goulard, 1989] and [Goulard Voltz, 1992] described an iterative procedure to fit a LCM using a weighted least-squares like technique: this requires first fitting the diagonal entries, i.e. the basic variogram structures must be determined first. In contrast, [Xie Myers, 1995] and [Xie et al., 1995] developed an alternative method for modeling the matrix-valued variogram by near simultaneously diagonalizing the sample variogram matrices, without assuming any model for the variogram matrix; [Kunsch et al., 1997] proposed estimating the range parameters of a LCM using a non-linear regression method to fit the range parameters; [Lark Papritz, 2003] used simulated annealing to minimize a weighted sum of squares of differences between the empirical and the modelled variograms; [Pelletier et al., 2004] modified the Goulard and Voltz algorithm to make it more general and usable for generalized least-squares or any other least-squares estimation procedure, such as ordinary least-squares; [Zhang, 2007] developed an algorithm for the maximum-likelihood estimation for the purely spatial LCM and proved that the EM algorithm gives an iterative procedure based on quasi-closed-form formulas, at least in the isotopic case. Significant contributions concerning estimation and computational aspects of a LCM can be found in [Emery, 2010]. Unfortunately, for multivariate spatio-temporal data there does not exist software packages which perform in an unified way a) the structural analysis, b) a convenient graphical representation of the covariance (variogram) models fitted to the empirical ones and c) predictions. One of the solutions could be to extend the above mentioned techniques to space-time. Indeed, some routines already implemented in the GSLib software [Deutsch Journel, 1998] or in the module gstat of R, can only be applied in a multivariate spatial context.

In the last years, a new GSLib routine, called COK2ST, can be used to make predictions in the domain under study, utilizing the spatio-temporal LCM, based on the generalized product-sum model [De Iaco et al., 2010]. This routine could be merged with the automatic procedure for fitting the spatio-temporal LCM using the product-sum variogram model, presented in [De Iaco et al., 2012 a], in order to provide a complete and helpful package for the analyst who needs to obtain predictions in a spatio-temporal multivariate context. This is certainly the first step for other developments and improvements in this field.

## 7. Case study

In the present case study, the environmental data set, with a multivariate spatio-temporal structure, involves PM10daily concentrations, Temperature and Wind Speed daily averages measured at some monitored stations located in the South of Apulia region (Italy), from the 1st to the 23rd of November 2009. In particular, there are 28 PM10survey stations, 60 and 54 atmospherical stations for monitoring Temperature and Wind Speed, respectively. As it is highlighted in Fig. 2(a), over the domain of interest, almost all the PM10monitoring stations are either traffic or industrial stations, depending on the area where they are located (close to heavy traffic area or to industrialized area). The remaining monitoring stations are called peripheral. In Fig. 2(b), box plots of PM10daily concentrations classified by typology of survey stations are illustrated.

Fig. 3 shows the temporal profiles of the observed values. It is evident that low (high) values of Temperature and Wind Speed are associated with high (low) values ofPM10.

Note that, during the period of interest, the PM10threshold value fixed by National Laws for the human health protection (i.e. 50 μg/m3which should not be exceeded more than 35 times per year) has been overcome the 13rd, the 14th, the 15th and the 23rd of November 2009. Indeed, as regards this last aspect, it is worth noting that the highest PM10daily averages have been registered at all kinds of monitored stations, even at stations located at peripheral areas, likely for the transport effects caused by wind. As shown in Fig. 2(b), although the maximum PM10values registered at stations close to heavy traffic area are greater than the threshold value, it is evident that these values are less than the ones measured at the other stations.

Spatio-temporal modeling and prediction techniques have been applied in order to assess PM10risk pollution over the area of interest for the period 24-29 November 2009. In particular, the following aspects have been considered:

1. estimating and modeling spatio-temporal correlation among the variables; in the fitting stage of a spatio-temporal LCM, the recent procedure [De Iaco et al., 2012b] based on the nearly simultaneous diagonalization of several sample matrix variograms, has been applied and the product-sum variogram model [De Iaco et al., 2001a] has been fitted to the basic components;

2. spatio-temporal cokriging based on the estimated model, in order to obtain prediction maps for PM10pollution levels during the period 24-29 November 2009 and indicator kriging [Journel, 1983] in order to construct risk maps related to the probability that predicted PM10concentrations exceed the threshold value (50μg/m3) fixed by National Laws;

3. generating and comparing, for two sites of interest (one close to an industrial area and the other one close to a heavy traffic area), the probability distributions that PM10daily concentrations exceed some risk levels during the period 24-29 November 2009.

### 7.1. Modeling spatio-temporal LCM

Modeling the spatio-temporal correlation among the variables under study by using the spatio-temporal LCM, requires first to check the adequacy of such model. In particular, the symmetry assumption has been checked by

a. exploring the differences between the pseudo cross-variograms of the standardized variables (standardized by subtracting the mean value and dividing this difference by the standard deviation),

b. using the methodology proposed by [Li et al., 2008].

As regards point a), the largest absolute difference has been equal to 0.135 and has been observed among the differences between the two pseudo cross-variograms concerning PM10and Wind Speed standardized data. On the other hand, for the point b), three pairs composed by six stations, with consecutive daily average measurements have been selected for the test. These pairs of monitoring stations have been picked to be approximately along the SENWdirection, since the prevalent wind direction over the area under study and during the analyzed period wasSENW. Moreover, the temporal lag has been selected in connection with the largest empirical cross-correlations for all variable combinations; hence ht=1has been considered for all testing lags. Three different variables and three pairs of stations generate 9 testing pairs in the symmetry test, and consequently the degree of freedom a for the symmetry test is equal to 9. The test statistic TS (25) has been equal to 0.34 with a corresponding p-value equal to 0.99. Hence, the results from both the procedures have highlighted that the spatio-temporal LCM is suitable for the data set under study.

By using the recent fitting procedure [De Iaco et al., 2012b] based on the nearly simultaneous diagonalization of several sample matrix variograms computed for a selection of spatio-temporal lags, the basic independent components and the scales of spatio-temporal variability have been simply identified. In particular, the spatio-temporal surfaces of the variables under study have been computed for 7 and 5 user-chosen spatial and temporal lags, respectively (Fig. 4).

Then, the 35 symmetric matrices of sample direct and cross-variograms have been nearly simultaneous diagonalization in order to detect the independent basic components. In this way, 3 scales of spatial and temporal variability have been identified: 10, 18 and 31.5 km in space, and 2.5, 3.5 and 6 days in time.

Thus, the following spatio-temporal LCM has been fitted to the observed data:

Γ(hs,ht)=B1 g1(hs,ht)+B2 g2(hs,ht)+B3 g3(hs,ht),E60

where the spatio-temporal variograms gl(hs,ht),l=1,2,3,are modelled as a generalized product-sum model, i.e.

gl(hs,ht)=γl(hs,0)+γl(0,ht)klγl(hs,0) γl(0,ht).E61

The spatial and temporal marginal basic variogram models, γl(hs,0) and γl(0,ht),respectively, and the coefficients kl,l=1,2,3,previously defined in (22), are shown below:

γ1(hs,0)=86  Exp (||hs||;10),        γ1(0,ht)=165  Exp (|ht|;2.5),        k1=0.0057,E62
γ2(hs,0)=0.95  Exp (||hs||;18),        γ2(0,ht)=3.7  Exp (|ht|;3.5),        k2=0.02418,E63
γ3(hs,0)=0.29  Gau (||hs||;31.5),        γ3(0,ht)=0.83  Exp (|ht|;6),        k3=1.1633,E64

where Exp (;a)and Gau (;a)denote the well known exponential and Gaussian variogram models, with practical range a [Deutsch Journel, 1998].

Finally, the matrices Bl,l=1,2,3,of the spatio-temporal LCM (31), computed by the procedure described in [De Iaco et al., 2012 a], are the following:

B1=[0.9820.0440.0240.0440.0180.0060.0240.0060.007],  B2=[43.4212.6321.5502.6320.3950.1181.5500.1180.105],  E65
B3=[71.42910.1194.16710.1191.7260.4144.1670.4140.357].  E66

Fig. 5 shows the spatio-temporal variograms and cross-variograms fitted to the surfaces ofPM10, Temperature and Wind Speed daily averages. Then, the spatio-temporal LCM (31) has been used to produce prediction and risk assessment maps for PM10daily concentrations, as discussed hereafter.

### 7.2. Prediction maps and risk assessment

In order to obtain the prediction maps for PM10pollution levels for the period 24-29 November 2009, spatio-temporal cokriging has been applied, using the routine COK2ST [De Iaco et al., 2010]. Risk assessment maps have been associated to the prediction maps. Spatial indicator kriging has been applied to assess the probability that predicted PM10daily concentrations exceed the PM10threshold value fixed by National Laws for the human health protection (i.e. 50μg/m3), during the period of interest. Fig. 6 and Fig. 7 show contour maps of the predicted PM10values and the corresponding risk maps, for the period 24-29 November 2009. The red points on the maps represent the monitoring stations.

It is important to highlight that the highest PM10values are predicted in the Eastern part of the domain of interest: this area corresponds to the boundary between Lecce and Brindisi districts which is strictly close to an industrial site, such as the thermoelectric power station Enel-Federico II, located in Cerano (Brindisi district). Moreover, in this area the probability that PM10daily concentrations exceed 50 μg/m3is high during the predicted week. It is worth noting that on Saturday and Sunday the predicted values of PM10daily concentrations show lower average levels than the ones estimated during the working days, when heavy traffic contributes to keep pollution concentrations high; consequently, the corresponding risk maps do not show hazardous PM10conditions.

### 7.3. Probability distributions for different sites

After producing predicted maps of PM10daily concentrations, it is also interesting to estimate the probability distribution that PM10daily concentrations exceed some risk levels at sites characterized by sources of pollution.

Two different sites, one close to an industrialized area, located at Brindisi district, and the other one close to a heavy traffic area, located at Lecce district, have been considered in order to generate and compare the probability distributions that PM10daily concentrations overcome several risk levels during the period 24-29 November 2009. Fig. 8 shows that, all over the industrialized area of interest, the probability that PM10daily concentrations exceed the threshold fixed by National Laws (50μg/m3) is very high (greater than 80%) during the period 24-27 November 2009 (working days); on the other hand, during the weekend (28-29 November 2009) such a probability decreases at 40-42%.

Note that at the site close to a heavy traffic area, the probability that PM10daily concentrations exceed the national law limit is very low during the analyzed period, except on the 25th of November; moreover, it drops off rapidly for values below the national threshold. This empirical evidence highlights that there is no critical PM10exceeding for the selected traffic site, especially during the last 3 days of the week. As it is shown, this is a very powerful tool since any action of environmental protection might be adopted in advance by taking into account the actual likelihood of dangerous PM10exceeding.

## 8. Conclusions

In this paper, some significant theoretical and practical aspects for multivariate geostatistical analysis have been discussed and some critical issues concerning sampling, modeling and computational aspects, which should be faced, have been pointed out. The proposed multivariate geostatistical techniques have been applied to a case study pertaining particle pollution (PM10) and two atmospheric variables (Temperature and Wind Speed) in the South of Apulian region.

Further analysis regarding the integration of land use and possible sources of pollution through an appropriate geographical information system could be helpful to fully understand the dynamics ofPM10, which is still considered one of the most hazardous pollutant for human health.

## Acknowledgement

The authors are grateful to the Editor and the reviewers, whose comments contribute to improve the present version of the paper. This research has been partially supported by the 5per1000 project (grant given by University of Salento in 2011).

## How to cite and reference

### Cite this chapter Copy to clipboard

S. De Iaco, S. Maggio,M. Palma and D. Posa (August 22nd 2012). Advances in Spatio-Temporal Modeling and Prediction for Environmental Risk Assessment, Air Pollution - A Comprehensive Perspective, Budi Haryanto, IntechOpen, DOI: 10.5772/51227. Available from:

### chapter statistics

3Crossref citations

### Related Content

Next chapter

#### Old and New Air Pollutants: An Evaluation on Thirty Years Experiences

By Margherita Ferrante, Maria Fiore, Gea Oliveri Conti, Caterina Ledda, Roberto Fallico and Salvatore Sciacca

First chapter

#### Introduction to Infrared Spectroscopy

By Theophile Theophanides

We are IntechOpen, the world's leading publisher of Open Access books. Built by scientists, for scientists. Our readership spans scientists, professors, researchers, librarians, and students, as well as business professionals. We share our knowledge and peer-reveiwed research papers with libraries, scientific and engineering societies, and also work with corporate R&D departments and government entities.

View all Books