Effect of Sampling Density on Estimation of Regional Soil Organic Carbon Stock for Rural Soils in Taiwan

Accurately quantifying soil organic carbon (SOC) stocks in soils is considered necessary and important for studying the soil quality and productivity, modeling the global carbon cycle, and assessing the global climate change. The objectives of this chapter are (1) to evaluate the effects of sampling density and interpolation methods on spatial distribution of SOC density (SOCD) and (2) to estimate the SOC stocks in 0–30, 0–50, and 0–100 cm layer of Tainan rural soils (2192 km 2 ), Taiwan. Ordinary kriging (OK), empirical Bayesian kriging (EBK), and inverse distance weighting (IDW) methods and four sampling densities ( n = 7388, 1168, 370, or 77) were used for spatial interpolation. The results indicated that different sampling densities had significant effects on predicting the spatial patterns of SOCD, but no significant difference was found among three interpolation methods. Spatial pattern of SOCD obtained from the highest sampling density appeared to be the most detailed distribution, and the prediction accuracy showed a reducing trend with decreasing sampling density. At least 1 sample per 2 km × 2 km area was suggested. The estimates of SOC stocks in different layers of Tainan soils ranged from 8.03 to 8.08 million tons in 0–30 cm, 11.92 to 12.04 million tons in 0–50 cm, and 20.38 to 20.65 million tons in 0–100 cm.


Introduction
Soil organic carbon (SOC) is one of the largest carbon reservoirs of the earth's surface and plays an important role in the global carbon cycle [1,2]. As the Kyoto Protocol was adopted in the annual Conference of Parties (COPs) of the UNFCCC in 1997, soil organic carbon and its potential to become a managed sink for atmospheric CO 2 have received much attention. Accurately quantifying soil organic carbon (SOC) stocks in soils is considered necessary for studying the soil quality, modeling the global carbon cycle, and assessing the global climate change. In recent years, many countries and local government have attempted to assess the C stock in their regions, including the soil organic carbon density (SOCD) and storage at global level [3][4][5], especially in some European countries, the United States of America, Indonesia [6], South Korea, New Zealand [7], and Australia [8].
In Taiwan, accurate estimation of SOC stocks based on detailed soil investigation is still absent at the national scale or regional scale. There have been several soil survey projects on agricultural soils for various purposes by Taiwan Agricultural Research Institute (TARI), Council of Agriculture, Taiwan. By calculating the SOC content of soil pedons and the distribution area of different soil orders, Chen and Hseu [9] first attempted to estimate the SOC stocks in rural lands of Taiwan. They indicated that 81 Tg (million tons) and 162 Tg of SOC were stored in the 0-30 and 0-100 cm of agricultural soils within an area of 1.68 million ha. Chen et al. [10] reported that SOC stocks in the 0-20 cm soil layer of cultivated lands (~0.85 million ha) were 21.7 and 27.5 Tg, which were calculated based on two legacy database obtained from detail soil surveys conducted by TARI in 1960s and 1980s, respectively. Taking the cultivated lands into account, the estimates of SOC stocks in the upper 20 cm soils of both studies mentioned earlier were similar (27.5 and 27.3 Tg), indicating that legacy soil survey data are our best resource to monitor the dynamics of soil C [6].
SOC stocks have strong spatial heterogeneity and dependence. Geostatistics have proven to be a useful tool in predicting the spatial distribution of soil properties that are very spatially dependent. Several spatial interpolation methods have been used to explore the spatial distribution characteristics of SOC. For example, ordinary kriging (OK) interpolation estimation, which provides the best linear unbiased prediction at unsampled locations, has been widely used to describe the structure of spatial dependence and quantify SOC stocks in relatively large areas [11]. At present, there are dozens of spatial interpolation methods described in the literature; however, many factors such as sample size and the nature of the data are possible to affect the estimation of a spatial interpolator, and until now, there are no consistent findings regarding what is the best interpolation method. Many studies had focused on comparing different estimation methods to reduce uncertainty of regional SOC prediction. However, studies assessing the effect of sampling density on spatial variability of SOC estimation were relatively few [12], and issues of sampling density and interpolation method are both important to our understanding of SOC variability [13].
Chien et al. [14] and Liu et al. [15] have compared the performance of some spatial interpolation methods at regional scale in Taiwan; however, estimating the SOC stocks of the whole city by different interpolation methods has never been previously studied in Taiwan. The objectives of this chapter are (1) to estimate the soil organic carbon density (SOCD) and SOC stocks in 0-30, 0-50, and 0-100 cm soils and its spatial distribution at four sampling densities at regional scale, (2) to evaluate the effects of sampling density on estimation of SOCD and SOC stocks, and (3) to compare the difference of SOC stocks among three geostatistical techniques. The estimation will be an important reference for predicting the SOC stock in the humid subtropical region.

Basic environmental and soil conditions of Tainan city
Tainan city is located in the southwest of Taiwan with a total area of 2192 km 2 . The mean air temperature is 28.7°C in summer and 18.4°C in winter. The mean annual rainfall is 1698 mm. Except for the raining season beginning from May to September, especially the monthly rainfall is less than the evaporation and transpiration during the summer (June to August). The soil temperature regime of the study area is hyperthermic (>22°C), and soil moisture regime of most area is ustic (drying in summer from June to August). About one-third of area is occupied by hill land (30-50 m asl) in the eastern part of Tainan city, and the other two-thirds of area is calcareous alluvial plain. About 57 and 34% of the soils of Tainan city are Entisols and Inceptisols based on USDA soil classification, respectively. The most soils are sandy loam to silt loam soil texture, neutral to basic reaction, and well-drained soils. Both geographical features and soil conditions favor the growth of most vegetables, fruits, and rice production; thus, Tainan city is an important agricultural production area in Taiwan in the last five decades. Soils in the coastal alluvial area are saline soils and are used for fish farming.

Soil database of soil pedons
Dataset for estimating the SOC stock in agricultural soils of Tainan was obtained from a detailed soil survey project, which was conducted from 1992 to 2010 by TARI. Soil pedons were sampled by auger along a 250 m × 250 m cell-sized grid in the field, meaning that every 6.25 ha of the arable land has a representative soil pedon. The upper 150 cm soils were collected by dividing into six depth intervals, and soil organic matter (SOM), pH, CEC, P, K, Ca, and Mg extracted by Mehlich-III extractant were analyzed for each soil sample by TARI. Here, we converted the content of SOM to SOC by dividing a Van Bemmelen factor of 1.724 on the assumption that SOM contains 58% of organic C averagely. From these data, SOC stocks were computed for 0-30, 0-50, and 0-100 cm soil layers. The soil organic carbon density (SOCD, kg m −2 ) for a certain soil depth (h) was calculated as follows: is the soil organic carbon content of a certain layer, Bd i is the bulk density (g cm −3 ), and d i is the depth (cm). As Bd determination was not included in TARI's soil dataset, we adopted the following pedotransfer function for estimating the bulk density [16]  After removing the outliers and missing data, the extracted database contains the information of 7388 soil pedons.

Soil sampling design
The initial soil sampling scheme was based on a regular grid with cell sizes of 250 m × 250 m across the whole cultivated land of Tainan city. In this study, all samples were used in four subsequent estimations of SOC stocks based on regular grids of 250 m × 250 m (n = 7388), 1 km × 1 km (n = 1168), 2 km × 2 km (n = 370), and 5 km × 5 km (n = 77), respectively. One point (soil pedon) was selected near each center of the four sampling grids, and the SOCD of selected point was taken for the observed SOCD of its corresponding grid. The patterns of four scales of sampling density are shown in Figure 1. Seventy percent of the points were randomly selected as test data for spatial interpolation, and the rest (30%) were used for validation. The grid numbers, total sample numbers, sample number for spatial interpolation, and sample number for validation under different sampling densities are listed in Table 1

Comparison of three spatial interpolation methods (IDW, OK, and EBK)
All interpolation methods have been developed based on the theory that points closer to each other have more correlations and similarities than those farther. In this study, the spatial interpolation was conducted using three different interpolation methods, which are available in the ArcGIS 10.1, to compare their estimation of SOC stocks of Tainan city soils under different sampling densities: (1) the inverse distance weighting (IDW), (2) ordinary kriging (OK), and (3) empirical Bayesian kriging (EBK). The former two methods (IDW and OK) are commonly used to spatially interpolate soil properties, while the third one (EBK) is a new probabilistic data interpolation method that is included in ArcGIS 10.1 Geostatistical Analyst.

Inverse distance weighting (IDW)
Inverse distance weighting (IDW) method is assumed that the rate of correlations and similarities between neighbors is proportional to the distance between them that can be defined as a distance reverse function of every point from neighboring points. The interpolating function is listed as follows: where Z(x) is the predicted value at an interpolated point, Z i is the amount at a known point, n is the total number of known points used in interpolation, d i is the distance between point i and the prediction point, and w i is the weight assigned to point i. Higher weighting values are assigned to those points, which are closer to the interpolated points. As the distance increases, the weight decreases, and u is the weighting power that imposes the amount of weight decreases with respect to the increase in distance [17,18].

Ordinary kriging (OK)
Ordinary kriging (OK) is the most common type of kriging in practice. Kriging is a linear estimator that the estimate of the unknown value is a linear combination of the known data values [18]. The aim of kriging is to estimate the value of a random function, z, at one or more unsampled points or over larger blocks, from more or less sparse sample data on a given support, say z( This can be shown as follows: where w j is the weight assigned to the known value of z(x j ), and z*(x 0 ) is the estimated value.
To ensure that the estimate is unbiased, weights are made to sum to 1 [17,18].

Empirical Bayesian kriging (EBK)
Empirical Bayesian kriging (EBK) is a geostatistical interpolation method that automates the most difficult aspects of building a valid kriging model. Other kriging methods in Geostatistical Analyst are required to manually adjust parameters to receive accurate results, but EBK automatically calculates these parameters through a process of subsetting and simulation, which is implemented by estimating a lot of semivariogram models instead of a single semivariogram. The prediction in unknown locations in common kriging methods is done through calculation of semivariogram with respect to the known data locations, resulting in the underestimation of the standard error of the prediction due to overlooking the uncertainty of semivariogram. On the contrary, EBK uses an intrinsic random function as the kriging model despite the other kriging methods. The other main difference of EBK with that of the other kriging model is that EBK does not assume a tendency toward an overall mean; thus, there is the same chance for large deviations to get larger or smaller [17].
The following steps are followed in EBK. (1) Using the available data, a semivariogram model is estimated. (2) Given this semivariogram, a new value is simulated at each of the input data location.
(3) With respect to the simulated data, a new semivariogram model is estimated accordingly. The calculation of a weight for the latest semivariogram according to Bayes' rule is the next step in this field. The semivariogram estimated in Step 1 is used to simulate a new set of values at the input location during the repetition of Steps 2 and 3. A new semivariogram model and its weight are produced given the simulated data. During this step, the predictions and their respective standard errors are produced at the unsampled locations. This step finally creates a spectrum of semivariograms [17].

Calculation of the SOC stocks
After spatial interpolation, a SOCD surface was created to cover the entire area of Tainan city soils. This surface was exported as a raster layer with the defined resolution (250 m × 250 m, 1 km × 1 km, 2 km × 2 km, and 5 km × 5 km), in which every grid square was assigned both a SOCD value and an area value. The next step was to accumulate as follows: where SOC stock h is the total amount of soil organic carbon stock at depth h in Tainan soils, n is the total grid number of the raster, i is the ith grid square, SOCD ih is the soil organic carbon density for the ith grid square calculated to depth h, and Area grid is the area of each grid square, set by the defined resolution. The performance of IDW, OK, and EBK in mapping the spatial distribution of SOCD was evaluated by using samples from the validation set ( Table 1).

Evaluation of the accuracy of three interpolation methods
Mean error (ME), mean absolute error (MAE), mean relative error (MRE), and root mean square error (RMSE) were calculated as follows: where S oi is the estimated SOCD at location i, S vi is the observed SOCD at location i, and n is the total number of sample observations. The MAE and RMSE provide a measure of interpolation precision with lower values indicating more precise methods, while the ME and MRE measure the bias. Smaller ME, MRE, and RMSE values indicate less error. The coefficient of determination R 2 of linear regression line between the predicted and the measured values was also used as a measure of performance for each method.

Accuracy of different interpolation methods
The ME, MAE, MRE, RMSE, and R 2 values of cross-validation obtained from EBK, OK, and IDW methods are listed in Table 2. The results showed the trend that ME, MAE, MRE, and RMSE increased while R 2 decreased with reducing sampling density for a certain depth as it was expected. At the highest sampling density (1 sample per 6.25 ha), IDW method performed best with the lowest MAE, MRE, and RMSE values in 0-30 cm layer, while EBK method performed best in 0-50 cm (R 2 = 0.663) and 0-100 cm (R 2 = 0.740) layers. At the density of 1 sample per 1 km 2 , the best performance was obtained by IDW method in 0-30 cm layer and by OK method in the upper 50 and 100 cm soils. At the sampling scale of 1 sample per 4 km 2 , EBK and IDW methods performed best in 0-30 and 0-50 cm layers, respectively. In 0-100 cm layer, ME, MAE, and MRE obtained from EBK were the smallest, and R 2 obtained from OK was the highest. At the scale of 1 sample per 25 km 2 , the prediction accuracy was low based on the R 2 value. The validation result revealed that sampling density should be more than 1 sample per 4 km 2 at least in the study area.
Considering the SOC stored at different depths, the best performance for estimating SOCD in 0-30 cm layer was obtained by IDW method at the scale of 1 sample per 6.25 ha and per 1 km 2 . In 0-50 and 0-100 cm layers, EBK and OK methods performed best at the highest sampling scale and the scale of 1 sample per 1 km 2 , respectively. EBK method was hypothesized as the best interpolation method, but we found that OK and IDW interpolation methods performed nearly as well as EBK in this study, and all three interpolation methods performed approximately well. Additionally, when the sampling scales were 1 sample per 6.25 ha and per 1 km 2 , the R 2 value increased with soil depth; in other words, the prediction accuracy of three interpolation methods was relatively poor for estimating the SOCD in 0-30 cm layer. It indicated that soil organic carbon is affected by other related factors, and the regulating processes are complicated and vary spatially, especially in the upper soil [19].
The effect of sampling density on prediction accuracies in our study was consistent with other researches. Zhang et al. [13] conducted a research of similar sampling schemes with ours (from 0.5 km × 0.5 km to 2 km × 2 km), and they found prediction accuracies of SOC content obtained from OK and LUK (kriging combined with land use information) increased with decreased grid size. Sun et al. [12] also reported that sampling density significantly affected the estimation of regional SOC concentration, but trends do not increase regularly with the sampling density, primarily due to the complicated factors on the spatial variation in SOC. In contrast, Chien et al. [14] evaluated the sampling scale (approximately 1 sample per 7.7-20 ha) in a 10-km 2 area and indicated that sufficient spatial information about the soil properties could still be retained even when the original sampling densities were reduced to nearly half. The best sampling design depends on the reasonable costs and acceptable extent of estimation error, for example, Sun et al. [12] found that increasing 18% of prediction accuracy had to increase the sampling density for almost 15 times. In our case, at a depth of 100 cm layer, the increases in prediction accuracy (RMSE) were 16-37% as soil samples became six times, whereas the increases in accuracy were 28-46% as soil samples increased 20 times. Therefore, sampling density should be evaluated more comprehensively in the future work.

Spatial distribution of SOCD from different interpolation methods and sampling designs
At a depth of 0-30 cm, the spatial pattern of SOCD that generated from OK method at different sampling densities has a similar distribution, but the spatial heterogeneity and resolution of the patterns varied among different sampling densities (Figure 2). The spatial pattern obtained from 7388 samples (grid size = 250 m × 250 m) appeared the most detailed SOCD spatial distribution. In general, high SOCD (>3.8 kg m −2 ) was found from the north to the northwest region and in the east by south part, and lower SOCD (<2.6 kg m −2 ) was found in the middle and southeast by south part of Tainan (Figures 2a, 3a, and 4a). The spatial variation and local differences became less evident with decreasing the sampling density, especially at the scale of 1 sample per 25 km 2 (5 km × 5 km grid size) (Figure 2d). A similar trend appeared in the spatial patterns of SOCD that generated from EBK and IDW methods among different sampling scales (Figures 3d and 4d). As a whole, the spatial heterogeneity and resolution of the distribution patterns varied between IDW and two kriging methods. IDW is an exact interpolator that predicts a value which is identical to the measured value at a sample location [18]. Therefore, the local maxima and local minima are reserved in estimating the spatial distribution of SOCD. There were some minor differences in the spatial patterns between OK and EBK methods at the same sampling scale. The distribution area of the highest (>5.0 kg m −2 ) and lowest (<2.6 kg m −2 ) SOCD estimated by EBK method was smaller than those estimated by OK method, indicating that the EBK method has a higher degree of smoothing effect when sampling grid size was larger than 1 km × 1 km. At a depth of 0-50 ( Figure 5) and 0-100 cm (Figure 6), the effects of interpolation method and sampling density on the SOCD distribution pattern were similar with those in 0-30 cm layer, so we only present those obtained from EBK methods here. In general, the effect of sampling density on the result of regional SOCD estimation is very obvious. The SOCD interpolation contours, which were compiled from three methods, described SOCD spatial variability with more accuracy and detail as the sampling density increases.    As the sampling density reduced with larger grid sizes, much more points turned to be dark purple (underestimated) and dark green (overestimated), indicating an increasing difference between the measured and the predicted values of SOCD. Most of the dark green points were added to the central region, while the dark purple points were added to the western region. The spatial distribution pattern of differences between the measured and the predicted values of SOCD in 0-30 cm by EBK and IDW was shown in Figures 8 and 9. At the highest sampling density (1 sample per 6.25 ha), difference plot that generated by EBK method (Figure 8a) had more yellow points than those generated by OK method (Figure 7a), indicating that EBK has a smaller interpolation error than OK at this sampling scale. The distribution of differences generated by IDW was almost appeared by yellow points at the highest sampling density (Figure 9a), meaning that the differences between the observed and the predicted values of SOCD were less than 0.5 kg m −2 . It also indicated that the IDW method had the smallest interpolation error among three methods. Generally, the spatial pattern of estimation error obtained from three interpolation methods had similar distribution pattern between the sampling scale of 1 sample per 1 km 2 and per 4 km 2 in 0-30 cm soil layer of Tainan. The patterns of interpolation error in 0-50 and 0-100 cm layers were similar with those in 0-30 cm layer, so we omitted them in the text. With the decrease in sampling density, the differences between Effect of Sampling Density on Estimation of Regional Soil Organic Carbon Stock for Rural Soils in Taiwan http://dx.doi.org/10.5772/64210 the measured and the predicted values of SOCD became larger, and high uncertainty was distributed around the local maxima and minima sites [18].

Estimation of SOC stocks
The SOC stocks in soil layers of 0-30, 0-50, and 0-100 cm were listed in Table 3. At the highest sampling density (1 sample per 6.25 ha), the estimates of SOC stocks in 0-30 cm soil layer of Tainan were similar among three interpolation methods, which ranged from 8.03 to 8.08 million tons, while SOC stocks in 0-50 cm layer ranged from 11.92 to 12.04 million tons and in 0-100 cm layer ranged from 20.38 to 20.65 million tons. The SOC stocks at a depth of 0-30 cm increased with decreasing sampling density but decreased at a depth of 0-100 cm. On the basis of estimates at the highest sampling density, the effect of sampling scale on SOC stocks generally had less than 4% of differences under the same soil layer and interpolation method.
Although the effect of sampling scale on the result of regional SOCD estimation is obvious, there was no significant effect on the estimation of total SOC stocks in Tainan in this study.
According to our estimation, around 40% of the total SOC stock in the upper 100 cm was held in 0-30 cm layer and 58% in 0-50 cm layer of agricultural soils. The ratios were slightly lower than those estimated by previous studies, which reported that 46-66% (with an average of 50%) of the total organic carbon in the upper 100 cm was stored in 0-30 m layer and 65-81% (with an average of 70%) in 0-50 cm layer of cultivated soils in Taiwan [9,20]. For forest lands of Tainan, 40-49% of the total SOC stock in the upper 100 cm was held in 0-30 cm layer and 61-65% in 0-50 cm layer in this study. This is in accordance with the estimates of Tsai et al. [21], Effect of Sampling Density on Estimation of Regional Soil Organic Carbon Stock for Rural Soils in Taiwan http://dx.doi.org/10.5772/64210 which reported that 41-84% (with an average of 59%) of the total organic carbon in the upper 100 cm was stored in 0-30 m layer and 67-98% (with an average of 78%) in 0-50 cm layer of forest soils in Taiwan.

Land use effect on SOC stocks and SOCD
Soil organic carbon (SOC) stocks in different soil layers under different land uses were listed in Table 4 As agriculture is the major land use in Tainan, the SOCDs of different cropping soils in the agricultural lands were further estimated by EBK interpolation method and listed in Table 5.
In Tainan, lands for rice cropping, upland, orchard, and fallow uses were 18.2, 41.8, 37.0, and 3.0% of the total agricultural lands, respectively. The mean SOCD of different cropping soils decreased in the following order in all soil layers: rice cropping land > upland > abandoned or fallow land > orchard.
Tainan has a humid subtropical climate, which is favorable to the degradation of soil organic matter. Main parent materials of Tainan soil are calcareous sandstone, shale, and mudstone [22]. Thus, majority of the cultivated lands is calcareous alluvial soil with neutral to basic soil reaction as well as higher buffering capacity to resist changes in pH caused by chemical fertilizer. Therefore, SOC storage of agricultural soil in Tainan is not greatly affected by management or cropping system, except for rice cropping. Long-term rice cultivation has been reported to increase the SOC storage in surface soils of Taiwan [10]. In our study, however, the mean SOCD of rice cropping land was slightly, but not significantly, higher than other cropping system (

Uncertainty (improvement of the estimation of SOC stock)
The number of soil samples, the distance between sampling locations, and the choice of interpolation are factors that affect the prediction of spatial distribution for soil properties [18,23]. Generally, the larger the number of soil samples, the more accurate the kriging maps of soil properties [18,24]. The original database (7388 soil samples) that we used in this study was obtained from a detailed soil survey in Tainan; thus, the sample number for spatial interpolation at the highest sampling density should be large enough to provide valuable information when comparing with other researches. The distance between sampling locations is another factor that influences the spatial patterns of SOCD. Despite large sample number, the sample locations are not evenly distributed over the whole area (Figure 1), and it probably results in a higher uncertainty of estimation in the region with sparsely or no located observations. OK is one of the most commonly used spatial interpolation methods that only consider the spatial autocorrelation and heterogeneity of SOC but overlooks the influence of environmental variables (and so as EBK). However, SOC status is influenced by many soil characteristics and environmental factors; those overlooked factors may also contribute to the interpolation error in this study, especially in the surface soils. In addition, land use is very intensive in Taiwan. The smallest sampling grid in our study (250 m × 250 m) may still be divided by different land uses and managements, which is possibly to result in high spatial variation in SOC. In the future, better techniques or models should be developed for a better understanding of the spatial distribution of SOCD and relationships between environment variables and SOCD, which are important to predict SOC stocks.

Conclusion
In this study, OK, EBK, and IDW methods and four scales of sampling density (1 sample per 6.25 ha, 1 km 2 , 4 km 2 , and 25 km 2 ) were used for spatial interpolation of SOCD in Tainan. The results indicated that sampling density has significant effect on the prediction for spatial patterns of SOCD. The spatial pattern obtained from the highest sampling density appeared the most detailed SOCD spatial distribution, and all indices of prediction accuracy showed a reducing trend with decreasing sampling density for a certain depth. We suggested that sampling density should be more than 1 sample per 4 km 2 at least in this study area.
All three interpolation methods performed on SOCD and SOC stocks approximately well; however, OK and EBK methods had a smoothing effect, while IDW method reserved the local maxima and local minima in estimating the spatial distribution of SOCD. Although the sampling density had a significant effect on spatial prediction of SOCD, the estimates of SOC stocks in Tainan were not significantly influenced by the sampling density and interpolation methods. The estimates of SOC stocks in 0-30 cm soil layer of Tainan ranged from 8.03 to 8.08 million tons, while SOC stocks in 0-50 cm ranged from 11.92 to 12.04 million tons and in 0-100 cm ranged from 20.38 to 20.65 million tons. In terms of agricultural land uses, the mean SOCD was slightly influenced by rice cropping system with little increase in SOCD.