Commonly used models for semivariogram fitting in soil science
1. Introduction
Soil varies considerably from location to location (Nielsen et al., 1973). Knowledge on soil spatial variability is important in ecological modelling (Burrough, 1983; Corwin et al., 2006), environmental prediction (Trangmar et al., 1985), precision agriculture (Goderya, 1998), soil quality assessment (Heuvelink and Pebesma, 1999; McBratney et al., 2000), and natural resources management. Adequate understanding of the variability in soil properties as a function of space and time is necessary for developing logical, empirical and physical models of soil and landscape processes (Burrough, 1993; Foussereau et al., 1993; Wilding et al., 1994). Geostatistics, a widely used approach, has been used to identify the spatial structure in the variability of soil attributes (Vieira, 2000; Carvalho et al., 2002; Vieira et al., 2002). Semivariance function characterizes the spatial continuity between points. When the semivariance is plotted against the lag distance or separation distance between points, the plot is called semivariogram (Fig. 1; McBratney and Webster, 1986; Isaaks and Srivastava, 1989). The structure of the semivariogram is explained by three properties; the nugget, the sill and the range (Fig. 1). These spatial structures of semivariogram help in identifying autocorrelation and replicating samples, revealing dominant pattern in data series, identifying major ongoing processes (Si et al., 2007), designing experiments (Fagroud and van Meirvenne, 2002) and monitoring networks (Prakash and Singh, 2000), selecting proper data analysis method and interpreting data (Lambert et al., 2004), and assessing simulation and uncertainty analysis in a better way (Papritz and Dubois, 1999). The semivariogram structures also help to quantify spatial dependence between observations. Modelling of observed semivariance values helps in predicting the spatial distribution of attribute values (Goovaerts, 1998). The spatial distribution of attribute values is very important in separating random noise in semivariance, and interpolation and mapping analysis such as kriging (Deutsch and Journel, 1998; Nielsen and Wendorth, 2003).
The choice of theoretical models and its fitting procedure is very important to get a better prediction of unsampled locations (McBratney and Webster, 1986). Spherical model (Burgess and Webster, 1980; Vieira et al., 1981; Van Kuilenburg et al., 1982; Vauclin et al., 1983; Trangmar, 1985), exponential model (David, 1977), Gaussian model (McBratney and Webster, 1986) and linear plateau model (Burgess and Webster, 1980; Hajrasuliha et al., 1980; Vauclin et al., 1983) are important among the most commonly used semivariogram models in the field of soil science. The maximum likelihood method (Cressie, 1991) or least square regression (Vieira et al., 1981; Yost et al., 1982; Trangmar et al., 1985) including the weighted least square methods (Cressie, 1985) optimize the parameter value by minimizing the deviations of model prediction from the experimental semivariances. Small sum of deviations between the model and the experimental values indicate superior performance of the models. Small and comparable sum of the deviations indicates comparable performance of the models. Therefore, selection of model is prerequisite for better prediction (Burnham and Anderson, 2002). In reality, there can be several models to choose from. Selection of good models requires balancing of goodness of fit and complexity, which is generally, determined using a likelihood ratio approach leading to a chisquare test. The Akaike Information Criterion (AIC) values can be calculated from maximum likelihood function (Akaike, 1973), which is used to evaluate the performance of the models (Webster and McBratney, 1989). The smallest AIC value indicates the best model, but the values of parameters such as nugget, sill and range can be different from model to model, even though they have the same physical meaning (Trangmar, 1985). Different value of a parameter for different models clearly indicates the uncertainty associated with the parameter, which makes the interpretation difficult. Therefore, interpretation of a particular spatial process from the parameter values of a particular model will be biased. In addition the parameter value from individual models may have large uncertainty. This uncertainty is inherent to the model selection and is well beyond just the issue of determining best model(s). Model averaging is a technique designed to help in accounting the uncertainty associated with the models and their parameters.
The model averaging is not a new concept in statistics. People started combining and averaging models since 1970s for different purposes (Hoeting et al., 1999). During averaging models are assigned with weights based on their performance. The AIC values are used to calculate the weights for the models (Burnham and Anderson, 2002). Before the recent development of computational power, the averaging procedure ignored the uncertainty associated with models. Recently, the use of model averaging is increased to reduce the uncertainty associated with the model selection. Information on the use of model averaging procedure in soil science is scarce (Webster and McBratney, 1989) and there is no information of reducing uncertainty associated with the semivariogram model parameters from averaging of commonly used semivariogram model parameters. Therefore, the objective of this paper is to reduce the uncertainty associated with semivariogram model parameters through averaging. The weighted average of the parameter values of commonly used models can be a better way of describing the spatial processes.


Spherical Model  
Exponential Model  
Gaussian Model  
Linear Plateau Model 
2. Theory
Geostatistics can explain the spatial variability and the patterns of a variable in field from its autocorrelation. It describes the relationship between measurements at different locations (or times) separated by certain distance (or time). For example, a soil property
The variance (
where,
The experimental semivariogram can be fitted to a mathematical model. Not all the mathematical functions that seem to fit the observed values can be considered as semivariogram model. One important criterion for the semivariogram models is to be “positivedefinite” to ensure the nonnegative covariance values restricting the models to be permissible for fitting (Isaaks and Srivastava, 1989; Goovaerts, 1997; Deutsch and Journel, 1998). There are a number of permissible semivariogram models including the most commonly used spherical, exponential, Gaussian, and linear plateau (Table 1).
Among the fitting procedures, the weighted nonlinear least square method is the most robust and reliable method (Cressie, 1985). This procedure minimizes the residual sum of squared errors (RSS) between experimental semivariance data and the models by optimizing the model parameters: nugget, sill and range values. The RSS can be calculated using Eq. 3.
where,
where,
Or it can be estimated as in Eq. 6.
where,
where,
where,
where,
To explain the uncertainty associated with the parameter, we need to calculate the variance associated with each optimized parameter. If a model has
The
The
An additional derivative will give,
It is conventional to remove the factors of 2 by defining
For a linear equation, the second derivative term can be dismissed because it is zero or small enough when compared to the term involved in first order derivative. So the
So for a three parameter (
The approximation of the covariance matrix can be done by inversing the matrix
This matrix component indicates the variance associated with each parameter for different models. At the same time we need to calculate the variance associated with the averaged parameter. The parameter variance for each model and the weight assigned to the model are used to calculate the variance of the averaged parameter (Eq. 18) (Burnham and Anderson, 2002)
where,



Minimum  52.50  3.55 (1.27) 
Maximum  78.75  166.40 (5.11) 
Mean  64.23  23.58 (2.87) 
Median  64.41  17.20 (2.84) 
Mode  58.75  16.40 (2.80) 
Skewness  0.28  3.02 (0.42) 
Kurtosis  0.73  11.88 (0.11) 
Standard Deviation  6.06  22.24 (0.72) 
Variance  36.76  494.48 (0.52) 
3. Materials and methods
Demonstration of this average parameter estimation procedure was done using two soil parameters, which were chosen from two different experimental datasets. One soil property, sand content, was collected from a regularly spaced (3 m) 128 point transect (Zeleke and Si, 2005). The sampling site was located at Smeaton, Saskatchewan, Canada (53^{°}40′N latitude and 104^{°}58′W longitude). The hydrometer method was used to determine the particle sizes (Gee and Bauder, 1986). Another soil property, copper (Cu) content, was selected from a dataset consisting of 359 topsoil samples collected by the Swiss Federal Institute of Technology in the Swiss Jura (Atteia et al., 1994). The data points were selected according to a regular configuration of a mesh of 250 m × 250 m with several clusters from an area of approximately 1450 ha. The cluster samples were collected following nested sampling design with 100, 40, 15, and 6 m sampling interval. Copper content was measured using a direct current plasma spectroscopy (ARL, Spectran V) with other 5 heavy metals (Atteia et al., 1994).
4. Results and discussion
The exploratory information about the dataset is given in Table 2. The skewness of the sand 0.28, which indicates a near normal statistical distribution of the dataset. The copper content is highly skewed (skewness = 3.02) and exhibits a log normal statistical distribution. Therefore, we decided to normalize the copper content using natural logarithm transformation. The exploratory information of logarithm transformed values for copper content is presented in parentheses in Table 2.
Semivariance for sand content (Fig. 2) and logarithm transformed copper content (Fig. 3) are calculated and plotted as a function of lag interval. For the semivariance calculation of regularly spaced sand samples, the lag interval is 6 m, which is twice the minimum sampling interval. The maximum lag distance used for this calculation is 144 m and the minimum number of pairs used for the semivariogram calculation is 24. The semivariance of the copper content is calculated using minimum and maximum lag distance of 0.1 km and 2 km respectively, which leads to at least 20 pairs of data points. The experimental semivariograms are fitted with four most commonly used semivariogram models (Table 1) following weighted least square estimation. The fitted semivariogram models and the experimental semivariograms are shown in Fig. 2 and Fig. 3 for sand and copper, respectively. The optimized parameters (nugget, sill and range) values for each of the models are presented in Table 3. The nugget value optimized by different models for the sand content varies from 14.67 % (exponential model) to 18.13 % (Gaussian model). The value of sill varies from 36.40 % (linear plateau model) to 39.27 % (exponential model). The optimized range value that indicates the distance over which the processes are spatially dependent, for the sand content varies widely from 47.10 m (Gaussian model) to over 100 m (Spherical model). The variance (values in the paresis in Table 3) associated with each parameters are approximated from the calculation of Hessian Matrix (Press et al., 1992). The variance associated with the parameter estimate is very high as the semivariance is calculated from the semivariance cloud, which spreads over a range of values. The nugget, sill, and range have their own physical significance and different models should result in the same set of parameter values. However, due to model uncertainty, four models have different set of optimized parameter values for the sand content. The range of optimized parameter values (Table 3) from different models clearly indicate how uncertain our models and parameters are.







%  %  m  
Sand  Spherical  16.08 (257.45) 
36.71 (231.66) 
100.94 (29610.00) 
0.438  0.021 
Exponential  14.67 (423.95) 
39.27 (1081.00) 
49.15 (32550.00) 
0.374  0.021  
Gaussian  18.13 (198.24) 
36.57 (214.91) 
47.10 (5233.00) 
0.495  0.040  
Linear Plateau  16.57 (217.15) 
36.40 (181.53)  74.37 (8379.00) 
0.455  0.026  




Copper  Spherical  0.101 (0.074) 
0.553 (0.037) 
0.480 (0.767) 
0.183  0.067 
Exponential  0.027 (0.177) 
0.548 (0.039) 
0.132 (0.133) 
0.049  0.061  
Gaussian  0.122 (0.062) 
0.544 (0.034) 
0.171 (0.097) 
0.224  0.061  
Linear Plateau  0.089 (0.080) 
0.543 (0.033) 
0.271 (0.235) 
0.164  0.063 
Similarly there are a wide range of optimized parameter values from different models for copper content. The optimized nugget values vary widely in different models (0.027 ln(mg^{2}/kg^{2}) for exponential model to 0.122 ln(mg^{2}/kg^{2}) for Gaussian model) (Table 3). Wide range of sill and range values clearly indicates the uncertainty associated with the parameters. Different parameter values for different models make the interpretation of the parameters and the semivariogram structures difficult and uncertain. The ratio of the nugget semivariance (V_{N}) to the sill semivariance (V_{S}) gives a measure of the strength or degree of spatial structure (Cambardella et al., 1994). Therefore, it is sometime difficult to explain the semivariogram structure of a particular property with different nugget and sill values for different models.
In this situation the goodness of fit for each model is calculated. The residual sum of squared error (RSS) between the experimental data and the model data is presented in Table 3. The comparable RSS values indicate that the four models are comparable. From the RSS value, the maximum likelihood is calculated for each model. Akaike Information Criterion (AIC) for different models is calculated from the maximum likelihood of the models and is presented in Table 4. The AIC values for the spherical (167.269) and exponential models (167.187) fitted for sand semivariogram are very close indicating a close performance of the models. The performance of the linear plateau and Gaussian models are also acceptable. The AIC values for all four models are very similar for copper content (vary from 111.739 to 113.739). The similar AIC values for the models indicate that all four models are almost equivalent. Though the performance of the models is very comparable, the optimized parameter values from different models are not similar. In this situation, the selection of parameter value from a particular model can be associated with high uncertainty.
The model averaging is conducted to reduce parameter uncertainty. Weights are assigned to the models based on their performance and importance. The model with lowest AIC value indicates best model. Based on the smallest AIC, ΔAIC is calculated from the difference between AIC value of a particular model and the smallest AIC value out of four models. The ΔAIC is used to measure the likelihood of each model. The value of the likelihood of models helps in assigning weights for different models. The spherical and the exponential models for the sand semivariogram have the highest weight, which are almost equal. This indicates an equivalent performance of these two models. The Gaussian and linear plateau models have very small weights, indicating less importance of the models than that of the other two. Whereas for copper, the weights assigned to different models are close enough to explain the performance as equivalent. Based on the weights assigned to each model, the average value of the parameter and the variance associated with each parameter is estimated. Sand has an average nugget, sill and range value of 15.42 %, 37.92 % and 75.53 m respectively. The variance associated with each averaged parameters are presented in the paresis in Table 4. The higher weight of the spherical and exponential model indicates that two models have large contribution to the average value of parameters. Similarly, the variance of the averaged parameters will have large contribution from spherical and exponential models. Whereas for copper, the contribution to the average value by each models is comparable. The average value of nugget, sill and range for copper are 0.081 ln(mg^{2}/kg^{2}), 0.546 ln(mg^{2}/kg^{2}), and 0.219 km respectively. The variance of each averaged parameter is also calculated and presented in Table 4 (in paresis). The averaged variance reduced the uncertainty associated with the parameters by taking the average.










Sand  Spherical  167.269  0.4946  15.42 %  37.92 %  75.53 m 
Exponential  167.187  0.4748  (335.79)  (635.06)  (31007.42)  
Gaussian  151.279  0.0001  
Linear Plateau  161.687  0.0303  
Copper  Spherical  111.739  0.1230  0.081 ln(mg^{2}/kg^{2})  0.546 ln(mg^{2}/kg^{2})  0.219 km 
Exponential  113.703  0.3285  
Gaussian  113.658  0.3212  (0.106)  (0.036)  (0.235)  
Linear Plateau  112.964  0.2270 
When different models optimize parameters with a wide range of values, the weighted average of the estimated parameter values provide more concise information and better understanding about the parameters and thus the spatial structure. The variance of the optimised parameters clearly indicates the reduced uncertainty of the parameters. This weighted average value of the parameters provides a better prediction about the underlying processes by reducing the uncertainty associated with the parameters and the bias in their selection. Better understanding of the parameters provides guidance for sampling and insights on experimental design.
5. Conclusion
The performance of the commonly used semivariogram models, used for fitting the experimental semivariogram, is comparable. Different models optimize a particular parameter differently, which indicates the uncertainly associated with the parameter. The uncertain parameter value makes the prediction of spatial processes difficult and uncertain. A model averaging procedure is used to obtain the parameter value and to reduce model parameter uncertainty.
Two soil properties (sand content and copper content) are taken as examples for demonstrating the average parameter estimation procedure. The sand content is measured at regular sampling interval along a transect, whereas the copper content is measured at variable sampling intervals in a twodimensional field. The semivariogram is calculated for both properties and fitted with four most commonly used mathematical models (spherical, exponential, Gaussian and linear plateau). Weighted least square estimation is used for fitting these models to the experimental semivariogram. The goodness of fit for each model is calculated from their residual sum of squares. The parameter for each models are optimized during the fitting procedure. The likelihood of each model is calculated based on the Akaike Information Criterion (AIC) for each model. Different weights were assigned to each model based on their performance and importance from the AIC values. These weights are used for obtaining the weighted average of the optimized parameters. The weighted average of the estimated parameters reduced the uncertainty associated with the parameters and the bias in their selection. The average parameter values and reduced uncertainty provide more concise information about the spatial structure and consequently provide better guidance for sampling, experimental design, and interpolation and mapping.
Acknowledgements
The project was funded by a CSIRO Post Doctoral Fellowship and the University of Saskatchewan.
References
 1.
Akaike H (1973) Information theory as an extension of the maximum likelihood principle. In: Second International Symposium on Information Theory. Petrov, B.N., Csaki, F., Eds, Akademiai Kiado, Budapest, 267281 p.  2.
Atteia O, Dubois JP, Webster R (1994) Geostatistical analysis of soil contamination in the Swiss Jura. Environ. poll. 86, 315327.  3.
Burgess TM, Webster R (1980) Optimal interpolation and isarithmic mapping of soil properties. I. The semivariogram and punctual kriging. J. soil sci. 31, 315331.  4.
Burnham KP, Anderson DR (2002) Model selection and multimodal inference: A Practical InformationTheoretic approach, Springer.  5.
Burrough PA (1983) Multiscale sources of spatial variation in soil. I. The application of fractal concepts of nested levels of soil variogram. J. soil. sci. 34: 577597.  6.
Cambardella CA, Moorman TB, Novak JM, Parkin TB, Karlen DL, Turco RF, Konopka AE (1994) Field scale variability of soil properties in central Iowa soils. Soil sci. soc.am. j. 58, 15011511.  7.
Carvalho JRP, Silveira PM, Vieira SR (2002) Geostatistics in determining the spatial variability of soil chemical characteristics under different preparations. Pesq. agropec. bras. 37, 11511159.  8.
Corwin DL, Hopmans J, de Rooij GH (2006) From Field to LandscapeScale Vadose Zone Processes: Scale Issues, Modeling, and monitoring. Vadose zone j. 5, 129139.  9.
Cressie N (1985) Fitting variogram models by weighted least squares. J. int. assoc. math. geol. 17, 563586.  10.
Cressie NAC (1991) Statistics for spatial data. John Wiley & Sons, NY, 900 p.  11.
David M (1977) Geostatistical Ore reserve estimation. Elsevier, Amsterdam.  12.
Deutsch CV, Journel AG (1998) GSLIB Geostatistical software library and user’s guide. 2nd eds. Oxford Uni. Press, NY.  13.
Fagroud M, Van Meirvenne M (2002) Accounting for soil spatial correlation in the design of experimental trails. Soil sci. soc.am. j. 66, 11341142.  14.
Foussereau X, Hornsby AG, Brown RB (1993) Accounting for variability within map units when linking a pesticide fate model to soil survey. Geoderma 60, 257276.  15.
Gee GW, Bauder JW (1986) Particle size analyses. In: A. Klute (Eds.) Method of soil analyses. Part 1: Physical and Mineralogical Methods. ASA, Madison, Wisconsin, USA.  16.
Goderya FS (1998) Field scale variations in soil properties for spatially variable control: a review. J. soil contam. 7, 243264.  17.
Goovaerts P (1997) Geostatistics for natural resources evaluation.Oxford Uni. Press., NY.  18.
Goovaerts P (1998) Geostatistical tools for characterizing the spatial variability of microbiological and physiochemical soil properties. Biol. fertil.soils 27, 315334.  19.
Hajrasuliha SW, Baniabassi N, Metthey J, Neilsen DR (1980) Spatial variability of soil sampling for salinity studies in SouthWest Iran. Irrig. sci. 1, 197208.  20.
Heuvelink GBM, Pebesma EJ (1999) Spatial aggregation and soil process modelling. Geoderma 89, 4765.  21.
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Statist. sci. 14, 382417.  22.
Isaaks EH, Srivastava RM (1989) An introduction to applied geostatistics, Oxford University Press, Toronto, Canada.  23.
Jian X, Olea RA, Yu Y (1996) Semivariogram modelling by weighted least squares. Comp. Geosci. 22, 387397.  24.
Johnson JB, Omland KS (2004) Model selection in ecology and evolution. Trends ecol. evol. 19, 101108.  25.
Lambert DM, LowenbergDebeor J, Bongiovanni R (2004) A comparison of four spatial regression models for yield monitor data: a case study from Argentina. Precis.agr. 5, 579600.  26.
Matheron G (1962) Traité de Géostatistique Appliqué, Tome 1. Memoires du Bureau de RecherchesGéologiquesetMiniéres, Paris.  27.
McBratney AB, Bishop TFA, Teliatnikov IS (2000) Two soil profile reconstruction techniques. Geoderma 97, 209–221.  28.
McBratney AB, Webster R (1986) Choosing functions for semivariograms of soil properties and fitting them to sampling estimates. J. soil sci. 37, 617639.  29.
Nielsen DR, Biggar JW, Erh KT (1973) Spatial variation of field measured soil water properties. Hilgardia 42, 214259  30.
Nielsen DR, Wendroth O (2003) Spatial and temporal statistics: sampling field soil and their vegetation. Catena Verlag, Reiskirchen, Germany.  31.
Pannatier Y (1996) VARIOWIN: Software for spatial data analysis in 2D. SpringerVerlag, NY.  32.
Papritz A, Dubois JP (1999) Mapping heavy metals in soil by (non) linear kriging: an empirical validation. In: GómezHerńandez, J. et al. (Eds.) geoENV IIGeostatistics for environmental applications. Kluwer Academic Publ. Dordrecht, The Netherlands. 429440 p.  33.
Prakash MR, Singh VS (2000) Network for ground water monitoringa case study. Environ. geol. 39, 628632.  34.
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipies in C. The art of scientific computing.Cambridge University press, NY.  35.
Si BC, Kachanoski RG, Reynolds WD (2007) Analysis of soil variability. In:E.G. Gregorich (Eds.) Soil sampling and methods of analysis. 11631191 p.  36.
Trangmar BB, Yost RS, Uehara G (1985) Application of geostatistics to spatial studies of soil properties.Adv. agro. 38, 4594  37.
Van Kuilenburg J, De Gruijter JJ, Marsman BA, Bouma J (1982) Accuracy of spatial interpolation between point data on soil moisture supply capacity, compared with estimates from mapping units. Geoderma 27, 311325.  38.
Vauclin M, Vieira SR, Vachaud G, Neilsen DR (1983) The use of cokriging with limited field soil observations. Soil sci. soc.am. j. 47, 175184.  39.
Vieira SR (2000) Geostatistics in studies of spatial variability of soil. In: Novais, R.F. et al. (Eds.) Topics in soil science. Viçosa, Brazilian soc. soil sci. Vol 1, 154 p.  40.
Vieira SR, Millete J, Topp GC, Reynolds WD (2002) Handbook for geostatistical analysis of variability in soil and climate data. In: Alvarez, V. et al. (Eds.) Topics in soil science. Viçosa, Brazilian soc. soil sci. V.2.145 p.  41.
Webster R, McBratney AB (1989) On the Akaike Information Criterion for choosing models for variograms of soil properties. J. soil sci. 40, 493496.  42.
Wilding LP, Bouma J, Goss D (1994) Impact of spatial variability on modeling. In: Bryant, R., Arnold, R.W., (Eds.) Quantitative modelling of soil forming processes. SSSA Special Publication #39. Soil Sci. Soc. Am. Inc. Madison, WI. 6175 p.  43.
Yost RS, Uehara G, Fox RL (1982) Geostatistical analysis of soil chemical properties of large land areas. I. Semivariograms. Soil sci. soc.am. j. 46, 10281032.  44.
Zawadzaki J, Fabijanczyk P (2007) Use of variogramsfro field magnetory analysis in upper Silesia industrial area. Stud. geophys. geod. 51, 535550.  45.
Zeleke TB, Si BC (2005) Scaling relationships between saturated hydraulic conductivity and soil physical properties. Soil sci. soc. am. j. 69, 16911702.