Descriptive statistics of data measured in the forest inventory dataset
Several issues related to forest fires, forest disturbances (García-Martín et al., 2008), forest productivity (Chirici et al., 2007; Palmer et al., 2009), forest changes over time (Hu & Wang, 2008), and the role of forests in the global carbon balance (Hese et al., 2005) are currently the focus of numerous studies. All these subjects require knowledge of aboveground biomass (AGB) stocks and/or their dynamics. Besides the availability of biomass, information about the growth of forests is of increasing importance. The variable that describes the total biomass growth in a specific ecosystem is called Net Primary Production (NPP). Annual NPP represents the net amount of carbon captured by plants through photosynthesis each year (Melillo et al., 1993; Cao & Woodward, 1998). In practice, NPP can be defined and measured in terms of either biomass or CO2 exchange (Field et al., 1995). Waring et al. (1998) define NPP as the sum of the periodic increment of live biomass (B) and dead biomass (losses, e.g. broken branches, fallen leaves) [NPP = B + losses]. NPP is an important ecological variable because of its relevance for accurate ecosystem management and for monitoring the impact of human activity on ecosystem vegetation at a range of spatial scales: local, regional and global (Melillo et al., 1993). It is one of the most complete and complex variables, since it reflects the growth of the entire ecosystem rather than only some of its components. NPP provides a complete view of the ecosystem, including information not only from the arboreal stratum but also from the shrubs and all the litter produced by each stratum. Thus, the significance of NPP reflects not only the complexity of its measurement or estimation, but also its integrative ecological perspective on ecosystems.
Mapping AGB stocks or NPP with the utmost accuracy and with expeditious methodologies is therefore a challenge. The need for continuous maps, in which the phenomenon under study can be analysed individually or used as an auxiliary variable in a specific model, requires spatial predictions that are as accurate as possible. Over the years, different spatial prediction methods have been explored with diverse data types (Isaaks & Srivastava, 1989; Goovaerts, 1997; Labrecque et al., 2006; Sales et al., 2007; Meng et al., 2009). Some approaches are simple to apply, whereas others are complex with regard to their implementation or the selection of the variables to be used.
AGB has been estimated by a range of methods, from field measurements to remote sensing-based methods, as well as GIS-based modelling approaches with auxiliary data (Lu, 2006). Traditionally, to predict the spatial distribution of AGB throughout the territory, the variables calculated from the forest inventory dataset were assigned to the forest polygons, stratified by species and mapped by aerial photo interpretation. Although field measurements are the most accurate method for collecting biomass data, the precision of the resulting biomass map depends on the detail of the land cover classification and on the sampling intensity. In fact, forest inventory data at regional or national scale are often not spatially exhaustive enough to generate continuous AGB estimates, which limits the use of this approach over large areas. An additional limitation is the long interval between these estimations, generally made in cycles of 10 or more years, which may not be compatible with the need to analyse and monitor ecosystem dynamics.
Remote sensing-based methods have been the most widely used approach to map AGB. The utility of the spectral information recorded by remote sensing for monitoring vegetation or gathering ecophysiological information over large areas has been well recognized since satellite data became accessible for land cover dynamics studies. Different imagery data have been employed: coarse spatial-resolution data such as SPOT-VEGETATION (Chirici et al., 2007; Jarlan et al., 2008), NOAA AVHRR (Häme et al., 1997; Atkinson et al., 2000) and MODIS (Zheng et al., 2007; Muukkonen & Heiskanen, 2007); medium spatial-resolution data such as ASTER (Muukkonen & Heiskanen, 2007) and Landsat TM/ETM+ (Tomppo et al., 2002; Rahman et al., 2005; Meng et al., 2009); high spatial-resolution data such as IRS P6 LISS-IV (Madugundu et al., 2008); and radar data (Hyde et al., 2007; Liao et al., 2009).
AGB can be estimated by means of Direct Radiometric Relationships (DRR), which consist of establishing regression relationships, for example by ordinary least squares (OLS), between the satellite spectral data (e.g. individual spectral bands, band ratios, vegetation indices and other possible transformations) as independent variables and the parameter measured at each corresponding inventory sample plot in each forest cover stratum. AGB can be predicted directly by multiple regression analysis between spectral response and biomass amount (Labrecque et al., 2006; Muukkonen & Heiskanen, 2007), by nonparametric approaches such as k-nearest neighbour (KNN) (Tomppo, 1991; Meng et al., 2007), or by artificial neural networks (ANN) (Liao et al., 2009). It can also be predicted indirectly from characteristics such as crown diameter or leaf area index (LAI); in this case, these variables are first derived from the imagery data and subsequently used in regression analysis to estimate AGB.
Spatial prediction models (algorithms) have been used for spatially predicting vegetation attributes. In general, these interpolation techniques are classified into deterministic and statistical (probabilistic) models (Isaaks & Srivastava, 1989; Goovaerts, 1997; Hengl, 2009). Given that in the Earth sciences there is usually insufficient knowledge of how properties vary in space, a deterministic model may not be appropriate. Therefore, to make predictions at locations for which observations do not exist, and to account for the inherent uncertainty of those predictions, the use of probabilistic models is necessary (Lloyd, 2007).
Spatial statistics and geostatistics were developed to describe and analyse the variation in both natural and man-made phenomena on, above or below the land surface (Cressie, 1993). Largely developed by Matheron (1963) to evaluate recoverable reserves for the mining industry, geostatistical models have been systematically applied in a wide range of fields (Cressie, 1993; Goovaerts, 1997). Today, geostatistics and the theory of regionalized variables (Matheron, 1971) are used to explore and describe the spatial variation that occurs in most natural resource variables. Introduced to remote sensing by Woodcock et al. (1988) and by Curran (1988), geostatistical models have been used to design optimum sampling schemes for image data and ground data; to increase the accuracy with which remotely sensed data can be used to classify land cover; and to estimate continuous variables. Geostatistical models, such as kriging (plain geostatistics), environmental correlation (e.g. regression-based), Bayesian-based models (e.g. Bayesian Maximum Entropy) and hybrid models (e.g. regression-kriging), are described in numerous textbooks (e.g. Isaaks & Srivastava, 1989; Cressie, 1993; Goovaerts, 1997; Deutsch & Journel, 1998; Webster & Oliver, 2007; Hengl, 2009; Sen, 2009).
Although regression-kriging (RK) has been implemented in several fields, such as soil science, few studies have explored this approach to spatially predict AGB with remotely sensed data as auxiliary predictor. Hence, this research uses RK and remote sensing data to analyse whether spatial AGB predictions can be improved.
This research presents two case studies that explore remote sensing and geostatistical techniques for mapping AGB and NPP. The first aims to compare three approaches to estimate
2. Case study I – Aboveground biomass prediction by means of remotely sensed imagery, field inventory data and geostatistical modeling
2.1. Study area
This study was carried out in continental Portugal, which extends between latitudes 36º 57’ 23” and 42º 09’ 15” N and longitudes 09º 30’ 40” and 06º 10’ 45” W (Figure 1). This area includes two distinct bioclimatic regions: a Mediterranean bioclimate everywhere except a small area in the north, which has a temperate bioclimate. With four distinct seasons, average annual temperatures range from about 7° C in the highlands of the interior north and centre to about 18° C on the south coast. Average annual precipitation is more than 3000 mm in the north and less than 600 mm in the south.
Owing to 20 years of severe summer wildfires and intense migration of people from rural areas to seaside cities or county capitals, the forest landscape changed from large tree stands interspersed with agricultural land to a fragmented landscape. Only a small share of the land has soils suitable for agriculture, and most of the area is occupied by forest. Forest activity is a direct source of income for a vast forest products industry, which employs a significant part of the population.
2.2. Methods and data
2.2.1. GIS and field data
In a first stage, a GIS project (ArcGIS 9.x) was created in order to identify
2.2.2. Biomass estimation from the forest inventory dataset
To calculate the biomass exclusively from the forest inventory, the biomass values measured in each field plot were spatially assigned to the polygons of the pine stands land cover map. Where multiple plots fell within the same polygon, averages weighted by the area that each plot occupied in that polygon were calculated.
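The area-weighted averaging described above can be sketched as follows. This is a minimal illustration only: it assumes that each plot record carries the identifier of the polygon it falls in, its biomass, and the area it represents within that polygon. The function name, the tuple layout and the numbers are hypothetical and are not taken from the inventory dataset.

```python
from collections import defaultdict

def polygon_biomass(plots):
    """Area-weighted mean biomass (ton/ha) per polygon.

    plots: iterable of (polygon_id, biomass_ton_ha, area_ha) tuples, where
    area_ha is the area of the polygon that the plot represents (assumed input).
    """
    weighted_sum = defaultdict(float)
    total_area = defaultdict(float)
    for polygon_id, biomass, area in plots:
        weighted_sum[polygon_id] += biomass * area
        total_area[polygon_id] += area
    return {pid: weighted_sum[pid] / total_area[pid] for pid in weighted_sum}

# Hypothetical example: two plots share polygon "A", one plot lies in "B".
example_plots = [("A", 40.0, 3.0), ("A", 80.0, 1.0), ("B", 55.0, 2.0)]
result = polygon_biomass(example_plots)
```

With a single plot in a polygon, the weighted average reduces to the plot value itself, which is the simple assignment case described in the text.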
2.2.3. Remote sensing imagery
In this research we used the Global MODIS vegetation indices dataset (tiles h17v04 and h17v05) from the Moderate Resolution Imaging Spectroradiometer (MODIS) from 29 August 2006 (MOD13Q1.A2006241.h17v04.005.2008105184154.hdf and MOD13Q1.A2006241.h17v05.005.2008105154543.hdf), freely available from the US Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center. The Global MOD13Q1 data include the MODIS Normalized Difference Vegetation Index (NDVI) and a new Enhanced Vegetation Index (EVI), provided every 16 days at 250-meter spatial resolution as a gridded level-3 product in the Sinusoidal projection.
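The granule names quoted above encode the acquisition date and the tile, which is a convenient check when several files are handled. The sketch below assumes the usual dot-separated MODIS naming pattern (product, 'A' plus year and day of year, tile, collection, production timestamp, extension); the function name is hypothetical.

```python
from datetime import datetime

def parse_mod13q1_name(filename):
    """Split a MOD13Q1 granule name into its dot-separated fields."""
    product, acq, tile, collection, production, ext = filename.split(".")
    # The acquisition field is 'A' + year + day of year, e.g. A2006241.
    acquired = datetime.strptime(acq[1:], "%Y%j").date()
    return {
        "product": product,
        "acquired": acquired,
        "tile": tile,
        "collection": collection,
    }

info = parse_mod13q1_name("MOD13Q1.A2006241.h17v04.005.2008105184154.hdf")
```

Day 241 of 2006 corresponds to 29 August 2006, which agrees with the date given in the text.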
The MODIS data were reprojected to the same Portuguese coordinate system (Hayford-Gauss, Datum of Lisbon with false origin) used in the GIS project.
2.2.4. Direct Radiometric Relationships (DRR)
Using GIS tools, the field inventory dataset was updated with information extracted from the MODIS images. The extracted spectral information (NDVI and EVI) was then used as independent variables for developing regression models. Linear, logarithmic, exponential, power and second-order polynomial functions were tested. The best model was then applied to the imagery data to produce the predicted aboveground biomass map. In some pixels where vegetation index values were very low, the regression equations predicted negative biomass values; these pixels were removed, because negative biomass is not physically possible.
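The model comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run in this research: the five functional forms are fitted with NumPy (the exponential and power forms through a log transformation), the coefficient of determination is computed on the original scale, and negative predictions are masked. The data are synthetic and all names are hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_candidates(x, y):
    """Fit five functional forms; return {name: (predict_fn, R2)}. Needs x > 0 and y > 0."""
    models = {}
    b, a = np.polyfit(x, y, 1)                      # y = a + b*x
    models["linear"] = lambda v, a=a, b=b: a + b * v
    b, a = np.polyfit(np.log(x), y, 1)              # y = a + b*ln(x)
    models["logarithmic"] = lambda v, a=a, b=b: a + b * np.log(v)
    b, ln_a = np.polyfit(x, np.log(y), 1)           # y = a*exp(b*x)
    models["exponential"] = lambda v, a=np.exp(ln_a), b=b: a * np.exp(b * v)
    b, ln_a = np.polyfit(np.log(x), np.log(y), 1)   # y = a*x**b
    models["power"] = lambda v, a=np.exp(ln_a), b=b: a * v ** b
    c2, c1, c0 = np.polyfit(x, y, 2)                # y = c0 + c1*x + c2*x**2
    models["polynomial2"] = lambda v, c0=c0, c1=c1, c2=c2: c0 + c1 * v + c2 * v ** 2
    return {name: (fn, r_squared(y, fn(x))) for name, fn in models.items()}

# Synthetic plot data (hypothetical): vegetation index vs. biomass in ton/ha.
x = np.linspace(0.1, 0.6, 12)
y = -10.0 + 150.0 * x
fits = fit_candidates(x, y)
best_name = max(fits, key=lambda name: fits[name][1])
best_fn = fits[best_name][0]

# Apply the best model to a small hypothetical index image and mask negatives.
evi_image = np.array([[0.02, 0.30], [0.50, 0.05]])
agb = best_fn(evi_image)
agb = np.where(agb < 0, np.nan, agb)
```

Selecting by the coefficient of determination alone favours the more flexible forms, so in practice the residuals and an independent validation set would also be inspected.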
2.2.5. Geostatistical modeling
Regression-kriging (RK) (Odeh et al., 1994, 1995) is a hybrid method that fits a simple or multiple linear regression model (or a variant of the generalized linear model or regression trees) between the target variable and ancillary variables, calculates the residuals of the regression, and combines them with kriging. Different variants of this process, with similar procedures, can be found in the literature (Ahmed & De Marsily, 1987; Knotters et al., 1995; Goovaerts, 1999; Hengl et al., 2004, 2007), which can cause confusion about the computational process.
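In RK, the predictions are combined from two parts: one is the estimate obtained by regressing the primary variable on the auxiliary variables, and the other is the residual estimated by kriging. Following the usual formulation of RK (e.g. Hengl et al., 2007), the prediction at a new location $s_0$ can be written as:

$$\hat{z}(s_0) = \hat{m}(s_0) + \hat{e}(s_0) = \sum_{k=0}^{p} \hat{\beta}_k \, q_k(s_0) + \sum_{i=1}^{n} \lambda_i \, e(s_i)$$

where $\hat{\beta}_k$ are the estimated drift model coefficients ($\hat{\beta}_0$ is the estimated intercept, with $q_0(s_0) = 1$), estimated from the sample by some fitting method, e.g. ordinary least squares (OLS) or, optimally, generalized least squares (GLS), to take the spatial correlation between individual observations into account (Cressie, 1993); $q_k(s_0)$ are the values of the $p$ auxiliary variables at $s_0$; $\lambda_i$ are the kriging weights determined by the spatial dependence structure of the residuals; and $e(s_i)$ are the regression residuals at the $n$ sample locations $s_i$.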
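RK was performed using the GSTAT package in IDRISI software (Eastman, 2006), both to automatically fit the variograms of the residuals and to produce the final predictions (Pebesma, 2001, 2004). The first stage of geostatistical modeling consists of computing the experimental variogram, or semivariogram, using the classical formula (Eq. 3):

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(s_i) - z(s_i + h) \right]^2 \qquad (3)$$

where $\hat{\gamma}(h)$ is the semivariance for distance $h$, $N(h)$ is the number of pairs of observations separated by $h$, and $z(s_i)$ and $z(s_i + h)$ are the values of the variable at locations $s_i$ and $s_i + h$.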
The semivariogram gives a measure of the spatial correlation of the attribute under analysis. It is a discrete function of variogram values at all considered lags (e.g. Curran, 1988; Isaaks & Srivastava, 1989). Typically, the semivariance values increase near the origin of the variogram and level off at larger distances (the sill of the variogram). The semivariance value at distances close to zero is called the nugget effect. The distance at which the semivariance levels off is the range of the variogram, and represents the separation distance beyond which two samples can be considered spatially independent.
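The classical estimator referred to as Eq. 3 can be sketched in a few lines. This is a minimal omnidirectional version, assuming equal-width distance bins and planar coordinates; it is not the GSTAT implementation, and the function name and the small transect used as an example are hypothetical.

```python
import numpy as np

def experimental_semivariogram(coords, values, lag_width, n_lags):
    """Classical estimator: half the mean squared difference of all pairs in each lag bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)            # every pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    lags, gammas, counts = [], [], []
    for k in range(n_lags):
        in_bin = (dist > k * lag_width) & (dist <= (k + 1) * lag_width)
        n_pairs = int(in_bin.sum())
        if n_pairs == 0:
            continue                                    # skip empty lag bins
        lags.append(dist[in_bin].mean())
        gammas.append(sqdiff[in_bin].sum() / (2.0 * n_pairs))
        counts.append(n_pairs)
    return np.array(lags), np.array(gammas), np.array(counts)

# Hypothetical transect of four samples spaced one distance unit apart.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [1.0, 3.0, 2.0, 5.0]
lags, gammas, counts = experimental_semivariogram(pts, vals, lag_width=1.0, n_lags=3)
```

The pair counts returned for each lag are worth inspecting, since lags supported by few pairs give unstable semivariance values.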
For fitting the experimental variograms we tested the exponential, Gaussian and spherical models, using iteratively reweighted least squares estimation (WLS; Cressie, 1993). Finally, RK was carried out according to the methodology described at http://spatial-analyst.net. The EVI image was used as the predictor (auxiliary map) in RK. GSTAT produces the prediction map and the variance map; the latter is an estimate of the uncertainty of the prediction model, i.e. the precision of the prediction.
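The two-part RK prediction described in this section can be sketched as follows. This is a minimal illustration in Python with NumPy, not the GSTAT/IDRISI workflow used in this research: it assumes a single auxiliary variable, an OLS trend, ordinary kriging of the residuals, and an exponential semivariogram whose parameters are supplied by the user. The function names, the sample values and the convention that the model reaches about 95% of its sill at the range are assumptions made for the example.

```python
import numpy as np

def exp_variogram(h, nugget, psill, rng):
    """Exponential semivariogram; gamma(0) = 0 and the nugget applies for h > 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + psill * (1.0 - np.exp(-3.0 * h / rng)), 0.0)

def regression_kriging(coords, z, q, coords0, q0, nugget, psill, rng):
    """Predict z at coords0 from one covariate q: OLS trend plus kriged residuals."""
    coords, z, q = (np.asarray(a, dtype=float) for a in (coords, z, q))
    coords0 = np.atleast_2d(np.asarray(coords0, dtype=float))
    q0 = np.atleast_1d(np.asarray(q0, dtype=float))
    n = len(z)
    # 1) Trend: OLS fit of z on [1, q].
    X = np.column_stack([np.ones(n), q])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    # 2) Ordinary kriging system for the residuals (last row/column: unbiasedness).
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, nugget, psill, rng)
    A[n, n] = 0.0
    d0 = np.linalg.norm(coords[:, None, :] - coords0[None, :, :], axis=2)
    B = np.ones((n + 1, coords0.shape[0]))
    B[:n, :] = exp_variogram(d0, nugget, psill, rng)
    weights = np.linalg.solve(A, B)[:n, :]   # one column of weights per target
    # 3) Prediction: trend at the target plus the kriged residual.
    return beta[0] + beta[1] * q0 + weights.T @ resid

# Hypothetical samples: coordinates, biomass z and a vegetation index q.
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]])
z_obs = np.array([10.0, 14.0, 9.0, 20.0, 12.0])
q_obs = np.array([0.20, 0.30, 0.10, 0.50, 0.25])
pred_demo = regression_kriging(xy, z_obs, q_obs, [0.5, 0.5], 0.3,
                               nugget=0.0, psill=1.0, rng=2.0)
```

Because the residuals are kriged with an exact interpolator, a prediction made at a sampled location returns the observed value, which is a simple sanity check for an implementation of this kind.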
2.2.6. Validation of the predicted maps
The predicted AGB maps were validated and compared by examining the discrepancies between the known data and the predicted data. Before estimation, the dataset was divided randomly into two sets: the prediction set (276 plots) and the validation set (52 plots). According to Webster & Oliver (1992), a variogram estimated from 225 observations is usually reliable. The prediction approaches were evaluated by comparing the basic statistics of the predicted AGB maps (e.g., mean and standard deviation), and the differences between the known and the predicted data were examined using the mean error, or bias (ME), the mean absolute error (MAE), the standard deviation (SD) and the root mean squared error (RMSE), which measures the accuracy of predictions, as described in Eqs. 4 to 7.
where: N is the number of values in the dataset,
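The validation statistics named above can be computed as in the following sketch. It uses the common textbook definitions and should be read as an assumption, since the exact form of Eqs. 4 to 7 is not reproduced here: the error is taken as predicted minus observed, SD is taken as the sample standard deviation of the errors, and a relative RMSE (in percent of the observed mean) is added because the RMSE values in this chapter are reported as percentages. The names and numbers are hypothetical.

```python
import numpy as np

def validation_stats(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - observed               # sign convention is an assumption
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "ME": err.mean(),                    # mean error (bias)
        "MAE": np.abs(err).mean(),           # mean absolute error
        "SD": err.std(ddof=1),               # sample standard deviation of errors
        "RMSE": rmse,                        # root mean squared error
        "RMSE%": 100.0 * rmse / observed.mean(),
    }

# Hypothetical validation plots (ton/ha).
stats = validation_stats([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
```

An ME close to zero with a large MAE, as in this example, indicates errors that cancel out on average but are individually large.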
2.3. Results and discussion
2.3.1. Pinus pinaster stands characteristics
The descriptive statistics of pine stands data are presented in Table 1, where:
The pine stands are highly heterogeneous with ages ranging from 8 to 110 years old and the biomass per hectare ranging from 0.9 to 136.1 ton ha-1. The values of Biomass present a normal distribution with mean
Table 1. Descriptive statistics of the pine stands data. Units of the columns: trees ha-1, year, m, cm, m, m2 ha-1, m3 ha-1, ton ha-1.
2.3.2. Aboveground biomass estimation from the inventory dataset
The estimates based on the inventory dataset were obtained by assigning the 328 field plot biomass values (weighted by each polygon area) to all the polygons of the pine cover class. After the global calculation, the dataset used for training (276 plots) was used to make a first validation of this approach. Hence, a regression was established between the biomass values measured in the field plots and the forest inventory polygon data. Figure 3 presents the positive relationship between the measured and the predicted data, with a coefficient of determination (
2.3.3. Aboveground biomass estimation from DRR
After correlation analyses between AGB and the vegetation indices, several regression models were developed using stand-wise forest inventory data and the MODIS vegetation indices (NDVI and EVI) as predictors.
The best correlation was obtained with EVI as the independent variable (Eq. 8):
The AGB was then estimated for the entire study area. The low correlation achieved is explained, in part, by the heterogeneity of the pine stands and by the strong effect of mixed pixels (Burcsu et al., 2001) in coarse-resolution MODIS data (250 m).
As can be seen in Figure 4, the reflectance recorded in the pixels along the polygon boundaries is not pure: these pixels record the reflectance of both the pine stands and the neighbouring land cover classes.
2.3.4. Aboveground biomass estimation from geostatistical methods
To spatially estimate AGB by the geostatistical approach, the first step consisted of modeling and analysing the experimental semivariograms (Eq. 3). The directional semivariograms of the residuals showed anisotropy at 38.6º, so exponential, Gaussian and spherical models were fitted in this direction. Based on experimentation, the exponential variogram model provided the best fit to the calculated pine stands biomass data (nugget of 703.75 and partial sill of 390.17, reaching its limiting value at a range of 43.9 km) (Figure 5). The data showed low spatial autocorrelation. The high nugget effect visible in the figure, which under ideal circumstances should be zero, suggests that there is a significant amount of measurement error in the data, possibly due to short-scale variation.
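The weight of the nugget in the fitted model can be made explicit with a short calculation. The sketch below uses the nugget, partial sill and range reported above; the exponential form with the practical-range convention (about 95% of the partial sill reached at the range) is an assumption, since the parameterisation used by the software is not stated here, and the names are hypothetical.

```python
import math

NUGGET = 703.75        # nugget reported in the text
PARTIAL_SILL = 390.17  # partial sill reported in the text
RANGE_KM = 43.9        # range reported in the text, in km

def exponential_model(h_km, nugget=NUGGET, psill=PARTIAL_SILL, rng=RANGE_KM):
    """Exponential semivariogram model (assumed practical-range convention)."""
    if h_km <= 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-3.0 * h_km / rng))

total_sill = NUGGET + PARTIAL_SILL     # nugget + partial sill
nugget_ratio = NUGGET / total_sill     # share of the sill that is unstructured
```

With these values the nugget accounts for about 64% of the total sill, which is consistent with the low spatial autocorrelation noted above.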
2.3.5. Validation and comparison of the aboveground biomass estimation approaches
The validation of the AGB estimation approaches was made by comparing the calculated basic statistics (Table 2) in the 52 validation random samples. Training and validation sets were compared, by means of a Student's
As expected, the Inventory Polygons method produced the best statistics. The mean error (ME), which should ideally be zero if the prediction is unbiased, shows a bias in all three approaches; it is lowest for the Inventory Polygons method and highest for the DRR method.
The analysis of the root mean squared errors (RMSE) shows that the Inventory Polygons present the lowest discrepancies in the estimations (RMSE=33.53%), and that RK achieves lower errors (RMSE=51.95%) than the DRR approach (RMSE=61.62%). Nevertheless, the errors of the two prediction approaches are very high, which can be explained by the low correlation found between AGB and the vegetation indices, as explained above. This limitation can be overcome by using remote sensing data with higher spatial resolution. Moreover, the work area should be divided into smaller areas, to minimize the heterogeneity observed in very large landscapes.
To determine the significance of the differences between interpolation methods, an analysis of variance (ANOVA) was performed (Table 3). The results show that, at an alpha level of 0.05, there are no significant differences between the biomass values predicted by the different methods.
A quantitative comparison of the complete AGB maps estimated by the three approaches was additionally made. The estimates (ton ha−1) are shown in Table 4. To better preserve the land cover areas, the maps were resampled to a resolution of 50 x 50 m and then clipped with the pine land cover mask.
The three AGB maps yield very similar average values (ton ha-1), and the maximum and minimum values of total biomass (tonnes) estimated by the different methods differ by less than 1.6%.
Although the discrepancy between the total biomass values estimated by the three maps is low, the correlation coefficients of the regressions carried out between the three maps show low to moderate correlation between
Based on the statistics calculated for the validation dataset and on the global biomass estimations for the entire area, we can consider that the regression-kriging geostatistical prediction approach, with remotely sensed imagery as auxiliary variable, increases the accuracy of the estimates when compared with estimates based merely on the Direct Radiometric Relationships (DRR). Furthermore, the accuracy of these estimations could increase with imagery data of higher spatial resolution and with a more homogeneous work region.
The biomass maps derived by the three methods (Inventory Polygons, Direct Radiometric Relationships and Regression-Kriging) for the whole study area are presented in Figure 7.
3. Case study II – Biomass growth (NPP) of Pinus pinaster and Eucalyptus globulus stands in the north of Portugal. Estimations by means of LANDSAT ETM+ images
3.1. Study area
This research took place within an area in the northern part of Portugal where
Both species are ecologically well adapted, despite
3.2. Methods and data
3.2.1. Methodology used in geometric and radiometric corrections
The available LANDSAT-7 ETM+ image was acquired on 15 September 2001 at 10:02:13 (UTC). The image was geometrically and radiometrically corrected using MiraMon ("WorldWatcher"). This software, developed by the remote sensing team of the Autonomous University of Barcelona (UAB), allows raster and vector maps to be displayed, queried and edited. It supports the geometric correction of raster maps (e.g., IMG and JPG: satellite images, aerial photos, scanned maps) and vector maps (e.g., VEC, PNT, ARC, POL and NOD) based on the coordinates of ground control points.
In the present research, the ground control points were collected from Portuguese topographic maps at 1:25,000 scale, using the original ETM+ scene. Twenty-five control points were collected (Toutin, 2004) for the image correction, and eleven control points were used for its validation. A first-degree polynomial was chosen for the geometric correction, with nearest neighbour resampling.
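A first-degree polynomial correction amounts to a least-squares fit of two linear equations to the ground control points, which is then checked on independent points. The sketch below illustrates this with NumPy; it is not the MiraMon implementation, the function names are hypothetical, and the control points are synthetic values generated from a known transformation.

```python
import numpy as np

def fit_first_degree(src_xy, dst_xy):
    """Least-squares fit of X = a0 + a1*x + a2*y and Y = b0 + b1*x + b2*y."""
    src_xy = np.asarray(src_xy, dtype=float)
    dst_xy = np.asarray(dst_xy, dtype=float)
    design = np.column_stack([np.ones(len(src_xy)), src_xy])   # columns: 1, x, y
    coeffs, *_ = np.linalg.lstsq(design, dst_xy, rcond=None)   # shape (3, 2)
    return coeffs

def apply_first_degree(coeffs, xy):
    xy = np.asarray(xy, dtype=float)
    return np.column_stack([np.ones(len(xy)), xy]) @ coeffs

def rmse(coeffs, src_xy, dst_xy):
    """Root mean squared positional error on a set of control points."""
    resid = apply_first_degree(coeffs, src_xy) - np.asarray(dst_xy, dtype=float)
    return float(np.sqrt(np.mean(np.sum(resid ** 2, axis=1))))

# Synthetic control points: image (column, row) and map (X, Y) coordinates.
image_xy = np.array([[10.0, 10.0], [200.0, 15.0], [20.0, 180.0],
                     [190.0, 170.0], [100.0, 90.0]])
map_xy = np.column_stack([500000.0 + 30.0 * image_xy[:, 0],
                          4600000.0 - 30.0 * image_xy[:, 1]])
coeffs = fit_first_degree(image_xy[:4], map_xy[:4])     # points used for fitting
check_rmse = rmse(coeffs, image_xy[4:], map_xy[4:])     # independent check point
```

Keeping the validation points out of the fit, as done in this research with eleven separate points, gives an error estimate that is not biased by the adjustment itself.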
Two Digital Elevation Models (DEMs) were constructed for each study area (
3.2.2. Methodology used to calculate vegetation indices
Within the study area, 31 sampling plots for the
In Table 5, G represents the reflectance in the green wavelength; R is the reflectance in the red wavelength; NIR is the reflectance in the near infrared wavelength; and MIR1 and MIR2 are the reflectances in the two middle infrared bands of the LANDSAT ETM+ image.
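As an example of how an index is derived from these bands, the NDVI can be computed from the R and NIR reflectances with its standard definition, (NIR − R) / (NIR + R). The sketch below is illustrative only; the function name and the reflectance values are hypothetical, and pixels where both bands are zero are returned as missing.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index; NaN where NIR + R equals zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectances for three pixels.
index = ndvi([0.5, 0.4, 0.0], [0.1, 0.4, 0.0])
```

The same pattern applies to the other indices built from band ratios or normalized differences.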
3.2.3. Model adjustment and selection
The available data (31 sampling plots for the
|6||RVI1||Tucker (1979); Xia (1994); Baret |
3.2.4. Comparison of the NPP images
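NPP images obtained from different methodologies were compared by the Kappa statistic ($K$):

$$K = \frac{P_o - P_e}{1 - P_e}$$

where $P_o$ represents the overall proportion of area correctly classified and $P_e$ is the expected overall accuracy if there were chance agreement between reference and mapped data; both are computed over the $k$ land-cover categories ($k$ = number of land-cover categories).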
According to Green (1997), when there is complete agreement between two maps K = 1, whereas with a kappa value of zero the two maps are said to be unrelated.
Moss (2004) considers that when Kappa is less than 0.20 the strength of agreement between both images is poor; between 0.21 and 0.40 fair; between 0.41 and 0.60 moderate; between 0.61 and 0.80 good; and higher than 0.81 very good. However, according to Green (1997), a kappa lower than 0.40 indicates a low degree of agreement; between 0.40 and 0.75 a fair to good degree of agreement; and higher than 0.75 a high degree of agreement.
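The Kappa statistic and the two interpretation scales quoted above can be sketched as follows. The computation uses the standard definition of Cohen's kappa from a cross-tabulation matrix (observed agreement versus agreement expected by chance); the function names are hypothetical, and the handling of values that fall exactly on a class limit is an assumption.

```python
import numpy as np

def cohen_kappa(confusion):
    """Kappa from a square cross-tabulation (rows: map A classes, columns: map B classes)."""
    m = np.asarray(confusion, dtype=float)
    total = m.sum()
    p_o = np.trace(m) / total                                   # observed agreement
    p_e = np.sum(m.sum(axis=0) * m.sum(axis=1)) / total ** 2    # chance agreement
    return (p_o - p_e) / (1.0 - p_e)

def agreement_moss(kappa):
    """Strength of agreement on the scale attributed to Moss (2004) in the text."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"

def agreement_green(kappa):
    """Degree of agreement on the scale attributed to Green (1997) in the text."""
    if kappa < 0.40:
        return "low"
    if kappa <= 0.75:
        return "fair to good"
    return "high"

kappa_perfect = cohen_kappa([[10, 0], [0, 10]])   # identical maps
kappa_chance = cohen_kappa([[5, 5], [5, 5]])      # agreement no better than chance
```

A value such as 0.37 is labelled "fair" on the first scale and "low" on the second, which shows that the verbal interpretation depends on the scale adopted.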
3.3. Results and discussion
3.3.1. Identification of the best prediction variables
To identify whether it was possible to estimate NPP directly or indirectly from the remote sensing data, the vegetation index best correlated with NPP was identified from the general correlation matrix and analysed. The most relevant results are summarised in Table 6.
As presented in Table 6,
The NDVI and TVI2 are the best correlated indices for the
The best correlated vegetation indices were selected as independent variables for adjusting regression models to estimate NPP.
3.3.2. Models for the NPP
The best mathematical models to estimate the NPP for the
The observed standard errors of the estimates are lowest in the model that uses the blue, green and red reflectances as independent variables, followed by the model that uses the NDVI. However, the model with NDVI as independent variable shows a lower ME. Additionally, this model has wider applicability, since the reflectances of the individual bands vary greatly along the year and therefore from image to image.
Based on the field measurements and on the NPP estimated by the model using only the NDVI as independent variable (R2=0.493), two images were created for the entire study area (Figures 9a and 9b).
After classification into four classes (1: NPP < 5 ton ha-1 year-1; 2: 5 ≤ NPP < 10 ton ha-1 year-1; 3: 10 ≤ NPP < 15 ton ha-1 year-1; and 4: NPP > 15 ton ha-1 year-1), the cross tabulation was carried out and the error matrix analysed.
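The classification and cross tabulation described above can be sketched with NumPy. The class limits are those given in the text; assigning a value of exactly 15 ton ha-1 year-1 to class 4 is an assumption, since the stated limits leave that value undefined. The function names and the small arrays are hypothetical.

```python
import numpy as np

NPP_BREAKS = [5.0, 10.0, 15.0]   # class limits (ton/ha/year) taken from the text

def classify_npp(npp):
    """Return classes 1..4; a value of exactly 15 goes to class 4 (assumption)."""
    return np.digitize(np.asarray(npp, dtype=float), NPP_BREAKS) + 1

def cross_tabulate(map_a, map_b, n_classes=4):
    """Count pixels for every pair (class in map A, class in map B)."""
    a = classify_npp(map_a).ravel()
    b = classify_npp(map_b).ravel()
    table = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(table, (a - 1, b - 1), 1)
    return table

# Hypothetical NPP values for four pixels in two images.
classes = classify_npp([2, 5, 9.9, 10, 14, 15, 20])
table = cross_tabulate([2.0, 7.0, 12.0, 18.0], [3.0, 12.0, 12.0, 16.0])
```

The diagonal of the resulting table counts the pixels assigned to the same class in both images, which is the input needed for the Kappa statistic.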
Kappa statistic showed a slight agreement around 37%. However, for a first approach these results are a good indicator for further studies. From the analyses of the
A significant result to estimate
3.3.3. Models for the NPP
The best mathematical models to estimate the NPP for the
As in the
In this research, AGB and NPP estimates were carried out by means of forest inventory data, remote sensing imagery and geostatistical modeling. The general conclusions are:
In case study I, three aboveground biomass (AGB) mapping approaches were compared: Inventory Polygons, Direct Radiometric Relationships (DRR) and regression-kriging (RK). Pure pine stands were mapped and AGB estimates were obtained using data collected in the National Forest Inventory dataset. The Inventory Polygons method was used since the field plots of the forest inventory dataset fall within all the polygons of the forest cover map. At the same time, this approach was used to compare and validate the DRR and RK methods.
The results showed that DRR and RK, using vegetation indices derived from MODIS remotely sensed data, can be used for biomass mapping purposes. However, it should be pointed out that, in the present research, the coarse resolution of the MODIS data (250 m), combined with the small polygons of the pine land cover class, did not allow the pure spectral response of this vegetation type to be extracted. Hence, the correlation between AGB and NDVI as independent variable is not as high as desired.
This limitation can be overcome by using images with higher spatial resolution. Moreover, these methodologies can be applied with greater accuracy in areas where land cover polygons are large enough to minimize edge effects as much as possible.
The analysis of the statistical parameters of the validation dataset, such as the mean error (ME), the mean absolute error (MAE), the standard deviation (SD) and the root mean squared error (RMSE), shows that RK, which combines geostatistical modeling techniques with remote sensing data as auxiliary variable, improves the predictions when compared to DRR. Furthermore, RK has the advantage of generating estimates of both the spatial distribution of AGB and its uncertainty for the study area. The uncertainty maps allow the reliability of the estimates to be evaluated by identifying the sites with the largest uncertainties, which can be useful for selecting different estimation methods for those areas.
In the case study II, some simplified methodologies were proposed to estimate NPP. For the
Although the direct estimation of NPP from remote sensing data did not provide very promising results, it was possible to establish indirect relationships between some vegetation indices calculated from Landsat ETM+ imagery and the litter NPP, the shrub NPP and the basal area of the studied forest stands.
These simplifications can be extremely important when time and economic resources are limited. They become even more relevant because NPP is very difficult to obtain, requiring time-consuming and demanding fieldwork.
The loss in accuracy is certainly compensated by the reduction in fieldwork. The balance between the two should be assessed in each particular case, considering the general context of each situation (e.g., time and funds available, human resources available, objectives of the research).
The authors would like to express their acknowledgement to the Portuguese Science and Technology Foundation (FCT), programmes SFRH/PROTEC/49626/2009 and FCT FCOMP-01-0124-FEDER-007010 (PTDC/AGR-CFL/68186/2006).