Descriptive statistics per composition classes (where N is the number of trees per hectare, G the basal area per hectare, CC the crown cover calculated with satellite data, PHC crown horizontal projection, max the maximum, min the minimum, SD the standard variation and CV the coefficient of variation).

## Abstract

Assessment and monitoring of forest structure is frequently done with absolute density measures with field forest inventory data and expansion methods. The development of basal area and the number of trees estimation functions with data derived from very high spatial resolution satellite images enable their short-term and cost-effective evaluation, allowing also the estimation for the area not requiring extrapolation methods. The functions of basal area and the number of trees per hectare are based on crown cover obtained with very high spatial resolution satellite images for two evergreen oaks and umbrella pine. The three tree species are especially important in the agroforestry systems of the Mediterranean region. The linear functions fitted for pure stands of the three species and mixed stands of cork and holm oak and of cork oak and umbrella pine showed a better performance for basal area than for the number of trees per hectare. The inclusion of dummy variables for species composition improved the accuracy of the functions.

### Keywords

- quickBird image
- multi-resolution segmentation
- crown horizontal projection
- absolute density measures
- regression

## 1. Introduction

Holm oak (*Quercus rotundifolia*), cork oak (*Quercus suber*), and umbrella pine (*Pinus pinea*) are three of the most representative species in agroforestry systems in the Mediterranean basin [1, 2]. Their stands have usually low density, open heterogeneous canopies and as main production bark for cork oak and fruit for all [3, 4, 5]. They also have other productions, such as agricultural crops, pasture and extensive grazing. Due to the climate variability of the Mediterranean region, these multiple use systems enable risk dispersion and the sustainability in time [1, 2, 6].

Absolute density measures, especially the number of trees and basal area, are referred in silviculture textbooks as a proxy for competition and growing space at tree level and thus for growing conditions at stand level [7, 8, 9, 10, 11, 12, 13, 14, 15]. These measures are reported as a single value per stand, as the sum of the tree values, reported for a standard area, usually the hectare [7, 13, 16, 17, 18, 19], and have the advantage of enabling comparison between different stands [7, 8, 9, 10, 11, 12, 13, 14, 15].

The conventional approach to calculate the number of trees and basal area is using field plots of forest inventories, which have a statistical sampling design and an expansion factor when working on an area basis [20, 21, 22, 23, 24, 25]. Though a high accuracy is attained at field inventory plots, they have disadvantages related to the area sampled and the extrapolation methods needed for the evaluation on an area basis.

Remote sensing-derived data have been used to estimate both the number of trees [26, 27, 28, 29, 30, 31, 32, 33] and basal area [26, 29, 30, 32, 34, 35, 36, 37, 38, 39, 40]. The advantages are the estimation for small and large areas [25, 41, 42, 43]; enabling time series analysis in short time cycles [25, 42]; data collection at different scales, depending on the imagery spatial resolution [42, 43]; and enabling the evaluation of the area not requiring extrapolation methods [44, 45, 46, 47, 48, 49]. The disadvantages are related to the imagery spatial resolution; accuracy decreases with the increase of the pixel size [50, 51, 52].

## 2. Materials and methods

### 2.1. Satellite images and inventory data

Two satellite images were acquired as four-band products (Blue, Green, Red and Near-Infrared), with the bands pan-sharpened with the UNB algorithm [53]. The image from QuickBird satellite (hereafter Mora) was collected in August 2006, has an area of 80 km^{2} (central coordinate 8°4′53.98”W and 38°51′16.12”N) and a spatial resolution of 0.7 m. The image from WorldView-2 satellite (hereafter Alcácer) was collected in June 2011, has an area of 35 km^{2} (central coordinate 8°40′28.20”W and 38°27′45.71”N) and has a spatial resolution of 0.5 m. The forest area is composed of holm oak and cork oak in the former and cork oak and umbrella pine in the latter.

The images were orthorectified, georeferenced and atmospherically corrected. Ortho-rectification was carried out with the rational polynomial coefficient model (RCP) using the image metadata and ASTER GDEM digital elevation model (resolution 30 m) in Envi 4.8 [54]. For georeferencing the images, field data points obtained with global navigation satellite system (GNSS) were used, which resulted in a root mean square error (RMSE) of 0.49 m for Mora and 0.30 m for Alcácer. Atmospheric correction was done using the image digital number for top of atmosphere (ToA) reflectance to convert it to soil level reflectance using dark-object subtraction method [55].

From each image a vegetation mask per specie by a set of sequential steps was derived, using object-oriented analysis [56, 57]. Contrast split segmentation based on the normalized difference vegetation index image (NDVI, [58] in eCognition, version 8.0.1 [59]) was used to isolate tree crowns, resulting in a vegetation mask. In the second step multi-resolution segmentation [60] enabled the separation of the vegetation mask in different objects, using an algorithm with scale, shape and compactness as splitting parameters. The third step used the nearest neighbor algorithm to classify the objects per tree specie, in Weka 3.8.0 software [61], based on the descriptive statistics of the bands and on vegetation indices (NDVI, EVI, SAVI, and SR).

### 2.2. Inventory data

Each image was divided into a square grid, a function of their resolution, 65 × 65 pixels (45.5 × 45.5 m, 2070.25 m^{2}) for Mora and 90 × 90 pixels (45 × 45 m, 2025 m^{2}) for Alcácer. The grid area is similar to other studies with very high spatial resolution satellite images (e.g., [62]). Composition and crown cover (defined as the percent ratio between crown horizontal projection and grid area) were derived from the vegetation mask per specie using ArcGIS version 10 [63]. The grids were classified as monospecies if only one specie was present and multispecies otherwise. To design forest inventory, crown cover was grouped in three strata (10–30, 30–50 and >50%) and a random stratified sampling by proportional allocation was applied.

The field surveys took place in 2011 in Mora and 2013–2014 in Alcácer, measuring 17, 24 and 32 monospecies plots of holm oak (QR), cork oak (QS) and umbrella pine (PP), respectively, and 23 and 49 multispecies plots of holm oak and cork oak (QRQS) and cork oak and umbrella pine (QSPP), respectively. The total number of plots is 145. The main stand was defined as all individuals with diameter at breast height equal or larger than 6 cm for holm and cork oaks and 5 cm for umbrella pine. In each plot, for the main stand, diameter at breast height, total height and four crown radii (directions in north, south, east and west) were measured [16], as well as recorded tree location by GNSS. The number of trees per hectare (N) was defined as the sum of all individuals of the main stand per plot, scaled to a reference area, the hectare. Basal area per hectare (G) was defined as the sum of the sectional area of all the trees of the main stand in the plot, scaled to a reference area, the hectare [7, 16].

### 2.3. Statistical analysis

In the development of the functions for the number of trees and basal area per hectare, four types of functions were considered (Figure 1): (i) per composition classes (QR, QS, PP, QRQS and QSPP); (ii) not considering composition; (iii) considering crown cover per specie; and (iv) considering composition as dummy variables. Dummy variables were defined as a dichotomic variable where 1 refers to a specific composition (QR, QS, PP, QRQS or QSPP) and 0 to the other compositions. For (i) and (ii) the functions were fitted using ordinary least square linear regression through the origin (Eq. (1), where D is N or G, CC the crown cover and α the regression coefficient) and for (iii) and (iv) multiple linear regression with stepwise method and AIC as selection criteria (Eq. (2), where α and β are regression coefficients, d the dummy variables, i = QR, QS and PP and j = QR, QS, PP, QRQS and QSPP). For multiple regression, multicolinearity among explanatory variables was analyzed with the variance inflation factor (VIF), where no serious multicoliniarity is expected for VIF ≤ 10 [64, 65]. As suggested by several authors (e.g., [66, 67]) the sum of squares of the residuals (SQR), the coefficient of determination (R^{2}) and the adjusted coefficient of determination (R^{2}_{aj}) were used to evaluate the statistical properties of the models.

Whenever an independent data set is not available for model validation, re-sampling or jack-knifing methods are recommended. One of these methods uses PRESS residuals, calculated as the difference between the observed and the estimated value of a variable (in this study N or G), as a cross-validation statistic [68, 69, 70], which is considered also to improve reliability in the accuracy assessment [43]. PRESS residuals are attained by fitting the model iteratively k times with n-1 observations (where n is the total number of observations) guaranteeing their independence [68, 69, 70]. The validation of the models was tested for bias and precision—the former with the mean of the PRESS residuals (Eq. (3), where PRESSrm is the mean of the PRESS residuals, y the observed value, ŷ the estimated value and i the number of the observation) and the latter with the mean of the absolute PRESS residuals values (PRESSram, Eq. (4)). To the former was added the analysis of the 5th and 95th percentile of the PRESS residuals [70]. Heteroscedasticity associated with the error term was evaluated graphically with the plots of the studentized residuals and the estimated values and the normality of the studentized residuals with normal quantile-quantile plots and Shapiro Wilk normality test, for a probability level of 0.001 [68, 71]. To enable comparison with other studies (e.g., [36]), the relative root mean square error (RMSEr, Eq. (5)) was also computed. The statistical analysis was done using R statistical software [72].

## 3. Results and discussion

### 3.1. Satellite image analysis

The vegetation mask per species accuracy resulting from the multi-resolution and object-oriented classification, illustrated in Figure 2, depends on several factors, namely the forest tree density, the limits between forest trees and other land uses, image date, forest tree species reflectance and crown shape. Stands with low density and isolated forest trees, such as holm oak, cork oak and umbrella pine in agroforestry systems, enable the distinction and separation of the tree crowns with accuracy in very high spatial resolution satellite images (e.g., [62]), minimizing the confounding effects of shadow pixels which occur in tree crowns at closer spacings [73]. Conversely, the diffused boundaries between forest tree crowns and other land uses can decrease classification accuracy [74]. Thus image dates with the highest contrast between different land uses, especially between herbaceous vegetation and forest trees, such as the summer (June–August) in the Mediterranean region, produce the smallest confounding influence [75, 76]. The other two variables that influence accuracy of the vegetation mask per species are the reflectance similarity between species and crown shapes, the more similar the lower the accuracy. The forest areas of Mora and Alcácer are composed predominantly of isolated trees, though some clusters occur, and the images were acquired in the dry summer period (June and August), thus with high contrast between the forest trees and the other land uses, especially the forest understory that was composed of soil and dry herbaceous vegetation. Regarding reflectance and crown shape, holm oak and cork oak have larger similarities with wide ranges of reflectance and an irregular crown shape when compared to umbrella pine that has a narrower reflectance range and nearly circular crowns. Nonetheless, the spectral signatures of the tree species are different enough to enable a good classification [44, 46, 47, 48, 49]. The accuracy evaluated with Kappa statistic [77, 78] encountered true positives for 90% of the holm oak and 87% of the cork oak for Mora and 98% of the umbrella pine and 74% of the cork oak for Alcácer.

### 3.2. Number of trees and basal area functions

The variability for N is larger than for G and for QS, QRQS and QSPP than for QR and PP, denoted by the larger coefficient of variation (Table 1 and Figure 3). The number of trees and their spatial distribution in the stand determine their growing space and the tree crown area is also a function of the species epinastic control [12] and spacing, with crown closure originating crown shyness [14], thus unbalancing the relation between crown cover and basal area and crown cover and number of trees. Holm oak, cork oak and umbrella pine have, especially in adult trees, weak epinastic control and are shade intolerant [5, 79], thus justifying the wide variability observed in the plots. However, positive correlations are observed for CC and G for all plots (0.72), stronger for QR (0.86) and QSPP (0.85) and weaker for QS and PP (0.66 each) and QRQS (0.62). These results are expected due to the almost direct relation between crown and stem diameters [80, 81]. Conversely, the correlation between CC and N is weaker when compared to the former. Positive correlations are found for all plots (0.47), stronger for QS (0.72), QR (0.57) and QSPP (0.54) and weaker for QRQS (0.36) and no correlation for PP (−0.04). The number of trees per hectare is frequently associated with the individual stems and the stage of development and the stand structure [7, 8, 9, 10, 11, 12, 13, 14, 15], giving rise to a higher variability in the number of trees (Figure 4).

Min | Max | Mean | SD | CV (%) | Min | Max | Mean | SD | CV(%) | |
---|---|---|---|---|---|---|---|---|---|---|

All | QR | |||||||||

N | 5 | 140 | 67 | 28.0 | 42.4 | 29 | 116 | 72 | 27 | 37.1 |

G (m^{2} ha^{−1}) | 2.6 | 18.2 | 8.9 | 3.3 | 37.5 | 4.0 | 11.1 | 6.7 | 2.1 | 32.0 |

CC (%) | 9.7 | 72.6 | 37.1 | 15.1 | 40.6 | 13.7 | 67.6 | 37.6 | 15.9 | 42.3 |

PHC (m^{2}) | 195.8 | 1470.0 | 756.1 | 306.5 | 40.5 | 284.2 | 1399.0 | 777.7 | 329.2 | 42.3 |

QS | PP | |||||||||

N | 5 | 135 | 60 | 33 | 54.2 | 5 | 119 | 72 | 23 | 32.4 |

G (m^{2} ha^{−1}) | 5.2 | 15.4 | 9.1 | 2.8 | 30.7 | 3.8 | 15.9 | 9.5 | 2.6 | 27.5 |

CC (%) | 13.4 | 70.5 | 35.3 | 14.0 | 39.7 | 13.0 | 66.7 | 40.5 | 17.1 | 42.3 |

PHC (m^{2}) | 271.3 | 1460.2 | 723.9 | 293.0 | 40.5 | 263.5 | 1350.3 | 819.1 | 346.7 | 42.3 |

QRQS | QSPP | |||||||||

N | 19 | 140 | 62 | 29 | 47.7 | 15 | 123 | 69 | 30 | 43.1 |

G (m^{2} ha^{−1}) | 2.6 | 14.7 | 6.7 | 2.7 | 40.3 | 2.9 | 18.2 | 10.3 | 3.7 | 36.0 |

CC (%) | 14.3 | 58.8 | 29.4 | 11.4 | 38.6 | 9.7 | 72.6 | 39.8 | 14.5 | 36.5 |

PHC (m^{2}) | 296.9 | 1216.7 | 607.8 | 233.6 | 38.4 | 195.8 | 1470.0 | 804.9 | 293.9 | 36.5 |

In general, for the functions fitted for all composition classes (Table 2), there is an improvement in the statistical properties of the models from those not considering composition (N3 and G3) to those that included it as crown cover per specie (N2 and G2), with the best properties attained by those with composition as dummy variables (N1 and G1). For the number of trees, a reduction of −27.4% for SQR and an increase of 3.2% for R^{2}_{aj} from N3 to N1, a reduction of −18.0% for SQR and an increase of 2.1% for R^{2}_{aj} from N2 to N1 and a reduction of −8.0% for SQR and an increase of 1.0% for R^{2}_{aj} from N3 to N2 was observed. For basal area a reduction of −60.0% for SQR and an increase of 2.7% for R^{2}_{aj} from G3 to G1, a reduction of −53.5% for SQR and an increase of 2.5% for R^{2}_{aj} from G2 to G1 and a reduction of −4.3% for SQR and an increase of 0.2% for R^{2}_{aj} from G2 to G3 was observed. This is reflected in the better function performance of N1 or G1 when compared with N2 and N3 or G2 and G3. Bias is the lowest for N1 and G1 while precision is similar for all functions. The better statistical properties of N1 and G1 when compared with N3 and G3 could be, at least partially, explained by the inclusion of the composition variable in the functions as N and G have different behavior per composition classes (cf. Table 2). Comparing N1 and N2 or G1 and G2, the better performance of N1 and G1 can be explained by the relations between crown cover and the number of trees or basal area, for the different composition classes and probably by the influence of the clusters of mixed trees. When comparing N1 or G1, with the functions per composition class (Ni or Gi, i = 4,5,6,7,8), better model statistical properties are attained for N4 and N5 and G8 and G4, which correspond to the composition classes with smaller variability between CC and N and G (cf. Table 2), respectively. Nonetheless, in general, bias is larger for the number of trees than for the basal area. Similar results were attained for above-ground biomass functions with crown horizontal projection as an explanatory variable [44, 45, 46, 49]. The regression coefficients are presented in Eqs. (6)–(9) and Table 3.

Model | Independent variables | SQR | R^{2} | R^{2}_{aj} | PRESSrm | PRESSram | Percentile | RMSEr | |
---|---|---|---|---|---|---|---|---|---|

5th | 95th | ||||||||

N | |||||||||

N1 | CC, dQR, dQS, dPP, dQRQS, dQSPP | 89,148 | 0.884 | 0.879 | 0.002 | 0.828 | −1.362 | 1.920 | 37.0 |

N2 | CCQR, CCQS, CCPP | 105,215 | 0.863 | 0.860 | 0.145 | 0.794 | −1.178 | 1.966 | 40.1 |

N3 | CC | 113,595 | 0.852 | 0.851 | 0.172 | 0.802 | −1.105 | 2.062 | 41.7 |

N4 | CCQR | 9870 | 0.900 | 0.893 | 0.196 | 0.912 | −1.445 | 1.404 | 33.7 |

N5 | CCQS | 11,176 | 0.899 | 0.895 | 0.014 | 0.862 | −1.146 | 1.873 | 35.9 |

N6 | CCPP | 37,293 | 0.795 | 0.788 | 0.300 | 0.776 | −0.667 | 2.113 | 47.6 |

N7 | CCQRQS | 19,842 | 0.836 | 0.830 | 0.148 | 0.748 | −1.303 | 1.873 | 44.6 |

N8 | CCQSPP | 31,382 | 0.878 | 0.875 | 0.111 | 0.833 | −1.227 | 1.969 | 38.0 |

G | |||||||||

G1 | CC, dQR, dQS, dPP, dQRQS, dQSPP | 585 | 0.955 | 0.953 | 0.001 | 0.794 | −1.528 | 1.763 | 22.7 |

G2 | CCQR, CCQS, CCPP | 898 | 0.931 | 0.929 | 0.171 | 0.806 | −1.690 | 1.729 | 28.1 |

G3 | CC | 936 | 0.928 | 0.927 | 0.162 | 0.810 | −1.727 | 1.661 | 28.7 |

G4 | CCQR | 39 | 0.953 | 0.950 | 0.237 | 0.899 | −1.301 | 1.602 | 22.8 |

G5 | CCQS | 142 | 0.935 | 0.932 | 0.202 | 0.883 | −1.882 | 1.356 | 26.6 |

G6 | CCPP | 267 | 0.913 | 0.910 | 0.277 | 0.873 | −1.304 | 1.627 | 30.5 |

G7 | CCQRQS | 136 | 0.898 | 0.894 | 0.136 | 0.805 | −1.168 | 2.104 | 34.3 |

G8 | CCQSPP | 162 | 0.971 | 0.970 | 0.098 | 0.776 | −1.432 | 1.654 | 18.1 |

Model | α | Model | α |
---|---|---|---|

N3 | 1.68195 | G3 | 0.22776 |

N4 | 1.7749 | G4 | 0.16797 |

N5 | 1.7041 | G5 | 0.2436 |

N6 | 1.5324 | G6 | 0.21361 |

N7 | 1.9839 | G7 | 0.21583 |

N8 | 1.65806 | G8 | 0.255849 |

The coefficient of determination of the functions to estimate the number of trees per hectare with explanatory variables derived from satellite data is rather variable from 0.38 to 0.88 [26, 29, 30, 32]; an even wider variation is found for the functions to estimate the basal area per hectare, from 0.07 to 0.92 [26, 29, 30, 32, 35, 40]. In general, the functions for N and G estimation developed in this study have higher R^{2} than those of the aforementioned studies. RMSEr decreases from N3 and G3 to N1 and G1 (Table 2) and is smaller than that attained value for the basal area as studied by reference Hyvönen et al. [36] (46–53%) and Günlü et al. [40] (29–34%).

For N1, N2, G1 and G2, no serious multicoliniarity is expected as VIF for all explanatory variables is smaller than 7.7. The studentized residuals of N1, N2, N3, G1, G2 and G3 do not show systematic variations; the normal probability graphs approach the straight line (Figures 5 and 6), thought with deviations in the extremes, which is corroborated by the residuals not meeting normality (all, p < 0.001). Similar results were attained for N3–N8 and G4–G8.

## 4. Conclusions

The vegetation mask per species from very high-resolution satellite images enables deriving accurately crown cover. In spite of the similarities between holm oak, cork oak and umbrella pine, the methods of multi-segmentation and object-oriented classification allow their distinction with accuracy.

The functions with the best performances for the number of trees per hectare and basal area per hectare for the three species were attained with the explanatory variable crown cover and composition classes as dummy variables.

The agroforestry systems in general, and those of holm oak, cork oak and umbrella pine in particular, have high spatial variability which makes these functions well suited to evaluate and monitor their stands in space and time.

## Acknowledgments

The authors would like to thank all the forest producers for permission to undertake forest inventory; Paulo Mesquita for image processing; and Carla Coelho, David Gomes and Pedro Antunes for their fieldwork. The work was financed by Programa Operativo de Cooperação Transfronteiriço Espanha–Portugal (POCTEP), project Altercexa–Medidas de Adaptación y Mitigación del Cambio Climático a Través del Impulso de las Energías Alternativas en Centro, Alentejo y Extremadura (Refª 0317_Altercexa_I_4_E and 0406_ALTERCEXA_II_4_E), TrustEE–innovative market based Trust for Energy Efficiency investments in industry (Project ID: H2020–696140) and National Funds through FCT–Foundation for Science and Technology under the Project UID/AGR/00115/2013.

## Conflict of interest

Both authors declare that there is no conflict of interest.