Optical and active sensor derived variables used in forest AGB estimation.
Comprehensive measurements of global forest aboveground biomass (AGB) are crucial for promoting the sustainable management of forests, mitigating climate change, and preserving the multiple ecosystem services that forests provide. Optical and radar sensors are available at different spatial, spectral, and temporal scales. Integrating multi-source sensor data with field measurements, using appropriate algorithms to identify the relationship between remote sensing predictors and reference measurements, is important for improving forest AGB estimation. This chapter presents the different types of predictor variables derived from multi-source sensors, such as original spectral bands, transformed images, vegetation indices, and textural features, together with the regression algorithms (parametric and non-parametric) that contribute to a more robust, practical, and cost-effective approach to forest AGB estimation at different levels.
- aboveground biomass
- regression algorithms
- machine learning algorithms
- remote sensing
- satellite-derived predictor variables
The aboveground biomass (AGB) of forests is the amount of organic matter in living and dead plant material, for example leaves, branches, and stems, expressed as dry weight per unit area. AGB is also one of the major reservoirs of carbon in forest ecosystems and a key indicator of forest health. Thus, measuring and monitoring AGB changes is crucial to understanding the role of AGB in the global carbon cycle, to reducing carbon dioxide concentrations, and to mitigating climate change. AGB is recognized as an Essential Climate Variable by the United Nations Framework Convention on Climate Change (UNFCCC) and as an essential biophysical parameter of forest ecosystems for estimating carbon and water cycling and energy fluxes between the land surface and the atmosphere, processes that are relevant against the background of climate change. In addition, AGB resources underlie the development of a bio-based economy as part of the European Union Forest-Based Vision targets sector 2030. The key role of forest AGB creates the need for accurate and precise AGB estimates to assess changes in forest structure and to understand the roles of forests in the terrestrial carbon flux and global climate change.
Spatial and temporal quantification and monitoring of AGB are important to assess the impacts of climate and land use changes on the global carbon cycle and to understand their effects on ecosystem resilience. Fast, accurate, and timely estimation and monitoring of AGB at local, regional, and global scales will significantly reduce the uncertainty in carbon stock assessments and provide valuable information to better support sustainable forest management strategies [9, 10]. The most frequently used methods to estimate AGB are indirect: they are based on mathematical relations between biomass (the dependent variable) and one or a few easy-to-measure tree variables (the independent variables). Traditionally, indirect methods use forest inventories and allometric functions at tree level to evaluate biomass at plot level, and an extrapolation method to assess the whole area.
In the last decade, biomass mapping and estimation for global terrestrial ecosystems using remote sensing (RS) have increased. Satellite-derived predictor variables have been used to estimate AGB and assess its spatio-temporal variability. The development and implementation of RS-based AGB estimation and monitoring frameworks may provide low-cost and accurate operational geospatial tools for rapidly and effectively detecting, mapping, and assessing relevant changes in AGB in any study area. The same type of geospatial tool might also support the execution, monitoring, and impact evaluation of nature conservation policies and strategies by providing a systematic and accurate forest AGB monitoring system. RS provides variables that are indirectly related to AGB: the spectral response (original or transformed sensor bands) to vegetation cover, density, shade, and texture is correlated with AGB.
In response to the urgent need for much improved mapping of global forest biomass, and the lack of current space systems capable of meeting this need, the European Space Agency (ESA) developed the BIOMASS mission, an Earth observing satellite planned to launch in October 2022. The BIOMASS mission aims to provide the scientific community with accurate maps of the world’s forest biomass, including height, state, disturbance patterns, and how they are changing [17, 18]. Combining BIOMASS mission information with economic and environmental knowledge will enable better spatial planning of forest AGB.
The objective of this chapter is to present the satellite images, satellite-derived predictor variables, and algorithms used to estimate forest AGB with RS data. To this end, Section 2 presents the types of satellite sensors used in RS. Section 3 describes the satellite-derived predictor variables. Section 4 describes the most used algorithms to predict forest AGB. Section 5 discusses the most frequently used satellite images, algorithms, and variable importance for AGB estimation with RS. Finally, Section 6 presents the main conclusions.
2. Remote sensing satellite sensors
Currently, there is a wide variety of RS data available for forest AGB estimation, such as optical (passive) and radar (active) sensor data, at different spatial, spectral, and temporal resolutions, from local to global scale, with or without costs. The selection of the proper satellite data depends on the scope of the research and on the study area, considering the area size, type of forest, and available budget to obtain accurate forest AGB estimations. Optical and radar RS follow similar approaches for forest AGB analysis, modeling, mapping, and monitoring. Optical RS imagery gives spectral information on the horizontal forest structure, while radar RS imagery gives information on the vertical forest structure because it can penetrate the forest canopy to a certain depth.
Optical data are easier to analyze than radar RS data because, after calibrating and correcting the images, the data can be directly processed, extracted, and interpreted much like a photograph. As passive sensors, optical RS sensors need an external light source (e.g., sunlight), and the quality of the images depends on the weather and daylight. Optical RS sensors record the reflectance of Earth objects in the visible (Blue, Green, and Red) and infrared regions (near infrared, NIR, and short wave infrared, SWIR) of the electromagnetic spectrum. The visible and NIR (VNIR) and SWIR regions are the wavelengths most sensitive to vegetation characteristics.
Optical RS data are widely used to estimate AGB in different types of forest, such as temperate forests, Mediterranean forests, sub-tropical forests, and boreal forests. Sensors of low (100–1000 m) and medium (10–100 m) spatial resolution have been used in studies of AGB estimation at global, regional, and local scale, such as MODIS, Landsat ETM+, SPOT, and more recently Sentinel-2. Very high spatial resolution sensors (<1 m), such as IKONOS, Quickbird II, and WorldView-2, have advantages over low and medium spatial resolution sensors, despite their cost. These sensors provide a finer spatial scale with greater thematic resolution and accuracy, which can be used to complement AGB measurements from the forest inventory [31, 32]. AGB can be monitored and evaluated over time with optical RS because of the continuous data record and wide spatial coverage. However, despite the widespread use of optical RS data in the estimation of forest AGB, these data are limited by their poor capacity to penetrate forest canopies and clouds, and they suffer from data saturation at high AGB density.
In contrast, interpretation of active RS sensor data, that is, Synthetic Aperture Radar (SAR) and Light Detection and Ranging (LiDAR) imagery, is not always straightforward, because the signal responds to surface characteristics such as structure and moisture and is acquired with a non-intuitive, side-looking geometry. Despite that, radar provides much more information than just an image to be visually analyzed, because each pixel is characterized by two data values: a Magnitude value (an image analogous to one collected by an optical sensor) and a Phase value (which cannot be visually interpreted). As an active sensor, radar has the advantage of providing its own source of illumination, which enables it to operate 24 hours a day.
SAR sensors have been used to estimate forest AGB to complement the spectral reflectance characteristics of vegetation in the optical regions and are very useful in regions often covered by clouds. These sensors are sensitive to the water content of vegetation and register information independently of the weather conditions through the interaction of the radar waves with tree scattering elements. The techniques most used in biomass studies using SAR data are regression based on backscattering amplitudes and interferometry based on backscattering amplitudes and phases. The important factors affecting the backscattering coefficient are wavelength band (e.g., K, X, C, L, P), polarization (e.g., HH, VV, HV, VH), incidence angle, land cover, and terrain properties (e.g., roughness and dielectric constant).
The short wavelength X and C bands, which interact only with the surface layer of the tree canopy, are more suitable for AGB estimation in areas with low biomass. The long wavelength L and P bands are more suitable for AGB estimation in dense forest with relatively high biomass density because they interact more with forest elements such as branches, stems, and the soil under the canopy, and because they allow canopy height to be retrieved by polarimetric interferometry.
The polarization can be linear (co-polarized) or crossed. In linear polarization, the signal is transmitted and received in the same orientation, horizontal (H) or vertical (V), giving HH and VV, respectively. In cross-polarization, the transmitted and received orientations differ, i.e., HV and VH. Cross-polarized VH backscatter has been considered to have the largest dynamic range and the highest correlation with biomass. In the case of linear polarization, ground conditions can affect the biomass-backscatter relationship. HH backscatter comes mainly from stem-ground scattering, while VV backscatter results from both volume and ground scattering.
SAR sensors have been used in several forest AGB estimation studies, such as in Miombo Savanna Woodlands, deciduous broadleaved and mixed broadleaf-conifer forests [30, 42], tropical peat swamp forests, tropical savannas and woodlands, temperate deciduous forest, rainforest, coniferous temperate forest, and mixed and deciduous boreal forest. Currently, there is a considerable number of ongoing SAR missions from various agencies, namely Sentinel-1A,B, ALOS-2, SAOCOM-1a,b, Cosmo-SkyMed SG, and PAZ. For historical analysis, data can be used from sensors such as ERS-1,2, JERS, Envisat, Radarsat-1,2, ALOS, TerraSAR-X, and TanDEM-X. However, despite the positive correlation with forest structure parameters, SAR data can present saturation problems. Saturation occurs over different types of temperate, boreal, and tropical forests and depends on the wavelength (for the L band, it is reported at around 100–150 t/ha and at 250 t/ha), polarization, characteristics of the vegetation stand structure, and ground conditions. Furthermore, estimating AGB with SAR data alone is difficult, since these data provide information on the roughness of land cover surfaces and do not distinguish vegetation types.
LiDAR is an active RS sensor composed of a laser, a scanner, and a specialized GPS receiver. It is operated from satellites for Space-borne Laser Scanning (SLS), from aircraft for Airborne Laser Scanning (ALS), which is the most used in forestry, and at ground level for Terrestrial Laser Scanning (TLS). LiDAR uses pulsed laser light to measure ranges (variable distances) to the Earth’s surface. Differences in return times and laser wavelengths serve to calculate the distance traveled, which is then converted to elevation. These measurements generate a cloud of points that allows the 3D representation of the vegetation, based on the X, Y, and Z location of each return, and the pulses can penetrate the forest canopy. Airborne LiDAR has the advantage of covering large areas with minimum occlusion of the terrain by vegetation. Also, it does not saturate even at very high biomass levels (>1000 Mg/ha).
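The conversion from return time to distance and elevation can be illustrated with a minimal sketch. It assumes a nadir-pointing pulse, a known sensor altitude, and propagation at the vacuum speed of light; the altitude and return times are hypothetical values chosen for illustration, and the function names are not taken from any LiDAR library.

```python
# Speed of light in vacuum (m/s); atmospheric refraction is ignored in this sketch.
C = 299_792_458.0


def pulse_range(return_time_s: float) -> float:
    """One-way sensor-to-target distance from the two-way travel time of a pulse."""
    return C * return_time_s / 2.0


def target_elevation(sensor_altitude_m: float, return_time_s: float) -> float:
    """Elevation of the reflecting surface, assuming a nadir-pointing pulse."""
    return sensor_altitude_m - pulse_range(return_time_s)


# Hypothetical flight at 1000 m altitude: the first return comes from the
# canopy top and the last return from the ground.
first_return_s = 6.50e-6
last_return_s = 6.67e-6

canopy_top = target_elevation(1000.0, first_return_s)
ground = target_elevation(1000.0, last_return_s)
canopy_height = canopy_top - ground
print(f"canopy height: {canopy_height:.2f} m")
```

The difference between the first and last return elevations gives a canopy height, which is the kind of structural measurement later used as a LiDAR metric.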
LiDAR provides accurate measurements of vegetation structure, such as height, crown size, basal area, stem volume, and vertical profile. These measurements make it possible to characterize the canopy or crown cover in three dimensions and consequently to estimate forest AGB in any study area. The LiDAR metrics can be extracted on the basis of either individual trees or areas. The extraction of these metrics depends on the laser return signal (discrete-return vs. waveform), scanning pattern (scanning or profiling), and footprint size (small vs. large). LiDAR data have been used to estimate forest AGB in combination with other sensor data (optical and/or radar) [15, 58]. Moreover, LiDAR data are a good complement to forest inventory data because they capture spatial variability and can be acquired quickly and over large or difficult-to-access areas. However, the limited availability and the cost of these data prevent their extensive application.
Integration of multi-source RS data, that is, optical, SAR, and/or LiDAR data, is important to improve AGB estimation because it brings together more information about forest structure than a single sensor can provide. The integration of multiple data sources for more accurate forest AGB estimations has been explored by several authors [44, 59, 60]. In this way, the advantages of each sensor are exploited while the limitations of the individual sensors are offset. Nevertheless, RS data should be complemented with AGB field measurements, as training and validation data, in order to improve the accuracy of the AGB estimation model.
3. Satellite-derived predictor variables
In studies of forest AGB estimation, it is important to integrate different types of independent variables derived from passive and active sensors, such as original spectral bands, transformed images, vegetation indices, and textural variables (Table 1), to achieve accurate predictive models .
|Sensor type|Category|Predictor variables|References|
|---|---|---|---|
|Optical|Spectral features|Spectral bands, vegetation indices, and transformed images|(e.g. )|
|Optical|Spatial features|Textural images|(e.g. )|
|Active|Radar|Backscattering coefficients, textural images, interferometry SAR, and Polarimetric SAR interferometry|(e.g. [38, 64, 65, 66])|
|Active|LiDAR|LiDAR metrics based on statistical measures of point clouds or estimated products (e.g. canopy height or individual trees)|(e.g. [67, 68])|
Spectral bands (e.g., VNIR and SWIR) reflect vegetation structure, texture, and shadow, which are related to leaf cellular structure and plant pigments and are correlated with AGB. VNIR wavelengths are more sensitive to pigments and overall canopy health, while SWIR wavelengths capture many biochemical properties and leaf mass per area, and discriminate the moisture content of soil and vegetation. Chlorophyll absorbs in the Red region, while mesophyll cells reflect strongly in the NIR region. The Green band is related to the greenness of vegetation, representing the absorption intensity of Blue and Red energy by plant leaves and the reflection intensity of Green energy. Hyperspectral sensors, in turn, provide many narrow contiguous spectral bands and can accurately discriminate the wavelength position and shape of absorption features.
Transformed images have been used to reduce the dimensionality of the data, which is required for optimal performance, by producing new variables from optical multispectral data. Image transformation techniques include Principal Component Analysis (PCA), Minimum Noise Fraction transform (MNF), and Tasseled Cap Transform (TCT). PCA is the most used transformation to enhance the change information from stacked multisensor data; it captures maximum variance by generating a new, reduced set of bands in which the information is concentrated and which have little correlation with each other.
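As an illustration of how PCA concentrates the information of correlated bands into a few uncorrelated components, the sketch below applies an eigen-decomposition of the band covariance matrix to a synthetic image stack. The function name `pca_bands`, the array layout (bands, rows, cols), and the synthetic data are assumptions made for this example only.

```python
import numpy as np


def pca_bands(stack: np.ndarray, n_components: int):
    """PCA of an image stack shaped (bands, rows, cols).

    Returns the component images and the fraction of variance each explains.
    """
    bands, rows, cols = stack.shape
    X = stack.reshape(bands, -1).T.astype(float)   # pixels x bands
    X -= X.mean(axis=0)                            # centre each band
    cov = np.cov(X, rowvar=False)                  # band-to-band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = X @ eigvecs[:, :n_components]         # project pixels onto components
    explained = eigvals[:n_components] / eigvals.sum()
    return scores.T.reshape(n_components, rows, cols), explained


# Synthetic example: six highly correlated bands sharing one spatial pattern.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 50))
stack = np.stack([base + 0.1 * rng.normal(size=base.shape) for _ in range(6)])

pcs, explained = pca_bands(stack, n_components=2)
print("explained variance fractions:", np.round(explained, 3))
```

With strongly correlated input bands, almost all variance ends up in the first component, which is why a small number of components can replace the original bands as predictors.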
MNF transform is a linear transformation of the reflectance data of hyperspectral and high spectral resolution images used to determine the inherent dimensionality of image data, to segregate noise in the data, and to reduce the computational requirements for subsequent processing [72, 73]. MNF is composed of two PCA rotations that separate the noise from the data. The first rotation is the so-called noise whitening process, in which the principal components of the noise covariance matrix are used to decorrelate and rescale the noise in the data. In the transformed data, the noise has unit variance and no band-to-band correlations. The second rotation is a PCA of the noise-whitened data, that is, of the data rescaled by the noise standard deviation. The inherent dimensionality of the data is determined by examining the final eigenvalues and the associated images. In this way, MNF transformation may reduce the dimensionality of hyperspectral image data, eliminate band-to-band correlations, and order components in terms of image quality.
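The two rotations described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not a reference one: the noise covariance is estimated from differences between horizontally adjacent pixels (a common shift-difference heuristic that the text does not prescribe), and the function name `mnf` and the synthetic data are hypothetical.

```python
import numpy as np


def mnf(stack: np.ndarray):
    """Minimum Noise Fraction sketch for an image stack shaped (bands, rows, cols)."""
    stack = stack.astype(float)
    bands, rows, cols = stack.shape
    X = stack.reshape(bands, -1).T
    X = X - X.mean(axis=0)

    # Assumption: noise is estimated from differences between horizontally
    # adjacent pixels; differencing two noise samples doubles the variance.
    diff = (stack[:, :, 1:] - stack[:, :, :-1]).reshape(bands, -1).T
    noise_cov = np.cov(diff, rowvar=False) / 2.0

    # Rotation 1: noise whitening (decorrelate and rescale the noise).
    n_vals, n_vecs = np.linalg.eigh(noise_cov)
    whitener = n_vecs / np.sqrt(n_vals)        # divides each eigenvector column
    Xw = X @ whitener

    # Rotation 2: ordinary PCA of the noise-whitened data.
    w_vals, w_vecs = np.linalg.eigh(np.cov(Xw, rowvar=False))
    order = np.argsort(w_vals)[::-1]
    components = Xw @ w_vecs[:, order]
    return components.T.reshape(bands, rows, cols), w_vals[order], whitener, noise_cov


# Synthetic example: one spatially smooth signal seen through four bands
# with increasing noise levels.
rng = np.random.default_rng(1)
signal = np.cumsum(rng.normal(size=(40, 40)), axis=1)
noise_sd = np.array([0.2, 0.5, 1.0, 2.0])
stack = np.stack([signal + sd * rng.normal(size=signal.shape) for sd in noise_sd])

components, eigenvalues, whitener, noise_cov = mnf(stack)
print("MNF eigenvalues:", np.round(eigenvalues, 2))
```

After whitening, the estimated noise has unit variance in every direction, so components with eigenvalues close to 1 are dominated by noise and can be discarded.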
TCT converts the original bands of an image into a new set of bands through an orthogonal transformation with defined interpretations, useful for vegetation mapping and directly associated with important physical parameters of the vegetation. TCT uses a concept similar to PCA in which linear combinations of the original image bands are formed. Each tasseled-cap band is produced as the sum of image band 1 times a constant plus image band 2 times a constant, and so on. The coefficients used to create the tasseled-cap bands are derived statistically from images and empirical observations and are specific to each imaging sensor. TCT aims to compress the spectral data into a few bands associated with physical scene characteristics with minimal information loss. The original bands are decorrelated by transforming them orthogonally into a new set of axes associated with physical features. The resulting spectral features consist of three axes defined as brightness, greenness, and wetness. Brightness, the first feature of TCT, is a weighted sum of all the bands and is related to bare or partially covered soil, natural and man-made features, and variations in topography. Greenness, the second feature of TCT, is a measure of the contrast between the NIR band and the visible bands and is related to the amount of vegetation in the image. Wetness, the third feature of TCT, is orthogonal to the first two components and is related to canopy and soil moisture. TCT was developed in 1976 for the Landsat multi-spectral scanner (MSS) to understand the growth patterns of plants, soil moisture, and other hydrological features in the spectral space formed by combinations of different bands, and it has since been adapted to current sensors.
A vegetation index is a spectral transformation of at least two bands that enhances the contribution of vegetation properties in an image. Most vegetation indices are calculated as ratios between two or more bands to contrast the high absorption by leaf pigments (chlorophylls, carotenoids, and xanthophylls) in the visible spectral region (400–700 nm), the high reflectance by leaves in the NIR region (700–1300 nm), and the moderate water absorption in the SWIR (1300–2100 nm). Many of these indices are used in the estimation of AGB (Table 2).
|Vegetation index|Abbreviation|Notes|References|
|---|---|---|---|
|Normalized difference vegetation index|NDVI|||
|Enhanced vegetation index|EVI|C1 = 6; C2 = 7.5; L = 1; G = 2.5||
|Modified simple ratio|MSR|||
|Specific leaf area vegetation index|SLAVI|||
|Soil-adjusted vegetation index|SAVI|||
|Triangular vegetation index|TVI|||
|Corrected transformed vegetation index|CTVI|||
|Transformed triangular vegetation index|TTVI|||
|Ratio vegetation index|RVI|||
|Normalized ratio vegetation index|NRVI|||
|Infrared percentage vegetation index|IPVI|||
|Optimized soil-adjusted vegetation index|OSAVI|Y = 0.16||
|Normalized difference index using bands 4 & 5 of Sentinel-2|NDI45|||
|Inverted red-edge chlorophyll index (Sentinel-2)|IRECI|||
|Transformed normalized difference vegetation index|TNDVI|||
|Sentinel-2 red-edge position|S2REP|||
|Green normalized difference vegetation index|GNDVI|||
|Green ratio vegetation index|GRVI||[101, 102]|
|Normalized difference water index|NDWI||[83, 103]|
|Moisture stress index|MSI|||
The two most common vegetation indices used in forest AGB estimation are NDVI and SR. NDVI is the ratio of the difference to the sum of the NIR and Red bands, exploiting the fact that chlorophyll absorbs Red whereas the mesophyll leaf structure scatters NIR. SR is the ratio between NIR and Red and is intended to capture the contrast between the Red and NIR bands for vegetated pixels. Both vegetation indices have shown a good relationship with AGB estimated from satellite image data in several types of forests [63, 82]. Further, canopy moisture content can be quantified through vegetation water indices, namely NDWI, which is based on the NIR and SWIR bands.
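The band arithmetic behind these indices is simple enough to show directly. The sketch below computes NDVI and SR as defined in the text, plus NDWI in its NIR and SWIR normalized-difference form (an assumption, since the text names only the bands involved). The reflectance values and the helper name `safe_ratio` are hypothetical.

```python
import numpy as np


def safe_ratio(num, den):
    """Element-wise num / den, returning NaN where the denominator is zero."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    out = np.full(np.broadcast(num, den).shape, np.nan)
    np.divide(num, den, out=out, where=den != 0)
    return out


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    return safe_ratio(nir - red, nir + red)


def simple_ratio(nir, red):
    """Simple ratio: NIR / Red."""
    return safe_ratio(nir, red)


def ndwi(nir, swir):
    """Normalized difference water index, assumed here as (NIR - SWIR) / (NIR + SWIR)."""
    nir, swir = np.asarray(nir, dtype=float), np.asarray(swir, dtype=float)
    return safe_ratio(nir - swir, nir + swir)


# Hypothetical surface reflectances: dense canopy, sparse canopy, bare soil.
red = np.array([0.04, 0.10, 0.25])
nir = np.array([0.50, 0.30, 0.30])
swir = np.array([0.15, 0.25, 0.35])

print("NDVI:", np.round(ndvi(nir, red), 3))
print("SR:  ", np.round(simple_ratio(nir, red), 3))
print("NDWI:", np.round(ndwi(nir, swir), 3))
```

NDVI and SR carry the same information, since SR = (1 + NDVI) / (1 - NDVI), but NDVI is bounded between -1 and 1 whereas SR is not.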
Texture is a feature used to identify objects or regions of interest in an image, based on mathematical (spatial) pattern analysis. Texture is characterized by the local spatial organization of spatially varying spectral values that is repeated in a region of larger spatial scale. The variations in image values that constitute texture are generally due to an underlying physical variation in the landscape that changes reflectivity or emissivity. Textural analysis techniques can be used to provide quantitative metrics that are highly sensitive to the underlying processes of change. However, as texture has many different dimensions, no single texture representation method is suitable for all kinds of texture. There are several methods of extracting textures from RS images; texture measurements based on the gray level co-occurrence matrix (GLCM) are among the most used in forest AGB estimation [64, 108]. Extracting appropriate descriptions of texture involves selecting the moving window size, which is a key parameter in GLCM texture analysis [106, 109]. Theoretically, variation and contrast should increase with increasing size and displacement of the window until the size of the textured objects is reached. Overall, small windows produce noisier estimates of the texture descriptor but maintain a high spatial resolution, while larger windows amplify the estimation errors near boundaries between objects. Because of the variety of objects in an image, a single fixed window should not be used when estimating texture parameters. Texture parameters should be estimated with direction-invariant measures, which are the averages of the texture measures over four directions (0°, 45°, 90°, and 135°).
Of the 14 statistical texture measures suggested by Haralick, 8 are considered the most relevant for the analysis of RS images: angular second moment, contrast, variance, homogeneity, correlation, entropy, mean, and dissimilarity. The information carried by each of these texture measures depends on the type of image being analyzed, in terms of the spectral domain, the spatial resolution, and the characteristics of the detected objects (size, shape, and spatial distribution). In addition, in forests with a complex structure, textural images have stronger relationships with biomass than the original spectral bands.
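To make the GLCM procedure concrete, the sketch below builds a symmetric, normalized co-occurrence matrix for one window and one direction, derives a few of the measures listed above, and averages them over the four directions. It is a plain NumPy illustration with a displacement of one pixel; the function names and the test windows are assumptions for this example, and the measure definitions follow the usual Haralick-style formulas.

```python
import numpy as np

# Pixel offsets (row, col) for the four directions 0°, 45°, 90° and 135°,
# using a displacement of one pixel.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(window: np.ndarray, offset, levels: int) -> np.ndarray:
    """Symmetric, normalized gray level co-occurrence matrix for one offset."""
    dr, dc = offset
    rows, cols = window.shape
    P = np.zeros((levels, levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r, c], window[r2, c2]
                P[i, j] += 1
                P[j, i] += 1          # count each pair in both directions
    return P / P.sum()


def texture_measures(P: np.ndarray) -> dict:
    """A subset of the texture measures computed from a normalized GLCM."""
    i, j = np.indices(P.shape)
    nonzero = P[P > 0]
    return {
        "asm": float(np.sum(P ** 2)),
        "contrast": float(np.sum(P * (i - j) ** 2)),
        "dissimilarity": float(np.sum(P * np.abs(i - j))),
        "homogeneity": float(np.sum(P / (1.0 + (i - j) ** 2))),
        "entropy": float(-np.sum(nonzero * np.log(nonzero))),
    }


def invariant_texture(window: np.ndarray, levels: int) -> dict:
    """Direction-invariant measures: the mean over the four directions."""
    per_direction = [texture_measures(glcm(window, off, levels))
                     for off in OFFSETS.values()]
    return {name: float(np.mean([m[name] for m in per_direction]))
            for name in per_direction[0]}


# Two 4 x 4 windows quantized to two gray levels.
flat = np.zeros((4, 4), dtype=int)               # no texture at all
checker = np.indices((4, 4)).sum(axis=0) % 2     # checkerboard pattern

print("flat:   ", invariant_texture(flat, levels=2))
print("checker:", invariant_texture(checker, levels=2))
```

In practice the window slides over the whole image and the input is first quantized to a limited number of gray levels; both choices, like the window size, affect the resulting texture images.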
4. Algorithms to predict forest AGB
Forest AGB is usually estimated from RS data via an indirect relationship between the spectral response (original or transformed sensor bands) and AGB calculated from field measurements, which allows the field data to be extrapolated to larger scales [111, 112]. Different prediction methods can be applied to estimate forest AGB [52, 113]. The most used methods for forest AGB estimation are linear and multiple regression models. However, in recent years, machine learning methods such as random forest (RF), support vector machines (SVM), and artificial neural networks (ANN) have become more prevalent.
Linear and multiple regression models are parametric algorithms that assume a linear relationship between the dependent variable (i.e., AGB) and the independent variables (derived from RS). Simple linear regression establishes a relationship between a dependent variable and one independent variable. If the dependent variable is related to two or more independent variables, the regression is called multiple regression. Multiple regressions can be linear or nonlinear. This type of regression also makes it possible to determine the relative contribution of each independent variable to the total variance explained by the model. Although parametric algorithms are an easy way to calculate the relationship between RS-derived variables and forest AGB, this relationship is not a simple global linear one because it is affected by many factors (e.g., forest age, tree species, and tree height). Thus, a stepwise regression model might be applied to identify the RS-derived variables that are strongly related to forest AGB.
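A multiple linear regression of AGB on RS-derived predictors can be sketched with ordinary least squares. The example below uses synthetic data with two hypothetical predictors (for instance, a vegetation index and a texture measure); the coefficients used to generate the data are invented for illustration and carry no empirical meaning.

```python
import numpy as np


def fit_linear(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares; returns [intercept, slope_1, ..., slope_k]."""
    A = np.column_stack([np.ones(len(X)), X])      # add the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predicted AGB for each row of predictors."""
    return coef[0] + np.asarray(X) @ coef[1:]


# Synthetic field plots: AGB (Mg/ha) generated from two predictors plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(60, 2))
agb = 20 + 150 * X[:, 0] + 40 * X[:, 1] + rng.normal(0, 5, size=60)

coef = fit_linear(X, agb)
fitted = predict_linear(coef, X)
print("intercept and slopes:", np.round(coef, 1))
```

The fitted slopes indicate how much each predictor contributes to the estimate, but only under the assumption that the relationship with AGB really is linear over the whole study area.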
Among the existing non-parametric algorithms, only those most commonly applied to predict forest AGB using RS data are described here, that is, RF, SVM, and ANN [10, 52]. Non-parametric algorithms are more flexible than parametric algorithms and can model more complex, non-linear relationships with AGB. These machine learning methods are considered a more reliable technique to estimate AGB because they do not have predefined model structures; the data determine the structure of the model.
RF is a machine learning classification and regression technique that builds a large number of decorrelated decision trees at training time and combines their outputs, by majority vote for classification and by averaging for regression. In addition, regression tree-based methods have a higher potential to identify non-linear relationships between dependent and independent variables. This advantage is significant for RS-based studies, where the data have shown a weak linear relation with AGB and the variables might be collinear. RF also has the advantage of handling multisource data in large study areas.
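The following sketch fits a random forest regressor to synthetic data with a saturating, non-linear relation between one predictor and AGB. It assumes scikit-learn is available and uses its `RandomForestRegressor`; the predictors and the data-generating function are hypothetical and only serve to show that the forest prediction is the average of the individual tree predictions and that variable importances can be read from the fitted model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic plots: four hypothetical predictors, of which the first has a
# saturating (non-linear) relation with AGB and the last two are irrelevant.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 4))
agb = 300 * (1 - np.exp(-3 * X[:, 0])) + 30 * X[:, 1] + rng.normal(0, 10, 200)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, agb)

# For regression, the forest output is the mean of the per-tree predictions.
tree_predictions = np.stack([tree.predict(X) for tree in rf.estimators_])
forest_prediction = rf.predict(X)

# Impurity-based importance of each predictor (sums to 1).
importance = rf.feature_importances_
print("variable importance:", np.round(importance, 3))
```

The importance ranking is what allows RS studies to identify which bands, indices, or textures contribute most to the AGB estimate.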
SVM is a machine learning algorithm used for classification and regression analysis. From a set of training examples labeled by category, this algorithm builds a model that assigns new examples to one category or another. SVM constructs a linear separation rule between examples in a higher-dimensional space induced by a mapping function applied to the training samples. This algorithm is able to use small training samples to produce relatively accurate estimates of forest biophysical parameters from remote sensing data.
ANN is an important non-parametric model for forest parameter estimation that simulates associative memory in the manner of an animal brain. This algorithm learns by processing examples that have one or more inputs (independent variables from different data sources, such as RS and ancillary data) and known results, establishing probability-weighted associations that contribute to the “learning.” These associations are stored in the data structure of the network. After receiving several examples, the network is able to predict the results from inputs using the previously established associations. Thus, the greater the number of examples, the greater the accuracy of the ANN’s predictions will generally be. However, the relationship between the dependent and independent variables is not easily interpreted.
Accurate predictive models of forest AGB are of great importance for forest management and climate mitigation. In general, there are three widely used methods for validating forest AGB estimates. According to Lu et al., the first method consists in selecting a set of sample plots through random, systematic, or stratified random sampling. The sample plots are randomly divided into two subsets. One subset is used to train the model (e.g., 75% of the data), while the other is used to validate the model (e.g., 25% of the data). In this case, both subsets come from the same sample of plots, which may lead to an overestimation of accuracy. The second method is cross-validation, where a set of sample plots is selected using one of the sampling methods mentioned above. Here, one sample plot is removed while the remaining plots are used to develop the forest AGB estimation model, and this is repeated for each plot. This method has a similar limitation to the first; however, it gives a more reliable precision assessment. The third method involves the use of an independent set of sample plots collected through a sampling design. However, despite being theoretically reliable, this method is more expensive.
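The first two validation methods can be sketched on synthetic plot data with a simple linear model. The 75/25 split follows the example proportions in the text; the predictor, the AGB values, and the function names are hypothetical, and a straight-line fit stands in for whatever AGB model is actually being validated.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares straight line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def holdout_split(n, train_fraction, rng):
    """Random indices for a training subset and a validation subset."""
    idx = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return idx[:n_train], idx[n_train:]


def loocv_predictions(x, y):
    """Leave-one-out cross-validation: predict each plot from all the others."""
    preds = np.empty(len(x))
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        slope, intercept = fit_line(x[keep], y[keep])
        preds[i] = slope * x[i] + intercept
    return preds


# Synthetic sample plots: one predictor (e.g. a vegetation index) and AGB (Mg/ha).
rng = np.random.default_rng(7)
x = rng.uniform(0.2, 0.9, size=40)
agb = 250 * x + rng.normal(0, 15, size=40)

# Method 1: hold-out validation with 75% training and 25% validation plots.
train, test = holdout_split(len(x), 0.75, rng)
slope, intercept = fit_line(x[train], agb[train])
holdout_rmse = np.sqrt(np.mean((agb[test] - (slope * x[test] + intercept)) ** 2))

# Method 2: leave-one-out cross-validation over all plots.
loocv_rmse = np.sqrt(np.mean((agb - loocv_predictions(x, agb)) ** 2))

print(f"hold-out RMSE: {holdout_rmse:.1f} Mg/ha, LOOCV RMSE: {loocv_rmse:.1f} Mg/ha")
```

The hold-out result depends on which plots happen to fall in the validation subset, whereas leave-one-out uses every plot once for validation, which is one reason it tends to give a more stable assessment on small samples.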
Accuracy statistics are often expressed as the coefficient of determination (R²), a measure of how well the model predictions explain the target variance of the validation set, and the root mean square error (RMSE), a frequently used measure that indicates the absolute fit of the model to the data (i.e., how close the observed data points are to the model’s predicted values). RMSE is a good measure of how accurately the model predicts the response, and it is the most important criterion for fit if the main purpose of the model is prediction. In general, a high R² and a low RMSE show good agreement between the model developed and the sample plot data. Thus, obtaining an accurate predictive model of forest AGB is important to provide valuable information to better support sustainable forest management strategies to mitigate climate change and preserve the multiple ecosystem services provided by forests.
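Both statistics are straightforward to compute from observed and predicted plot values. The sketch below uses the common definitions, RMSE as the square root of the mean squared residual and R² as one minus the ratio of the residual to the total sum of squares; the four AGB values are made up for the example.

```python
import numpy as np


def rmse(observed, predicted) -> float:
    """Root mean square error, in the units of the observations."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def r_squared(observed, predicted) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical validation plots (AGB in Mg/ha).
observed = np.array([100.0, 150.0, 200.0, 250.0])
predicted = np.array([110.0, 140.0, 210.0, 240.0])

print("RMSE:", rmse(observed, predicted))        # 10.0
print("R2:  ", r_squared(observed, predicted))   # 0.968
```

RMSE is often also reported relative to the mean observed AGB so that models from sites with different biomass levels can be compared.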
5. Discussion of forest AGB estimation using RS data
Over the past decades, satellite data have improved from sparse, coarse resolution to medium and fine spatial resolutions, allowing better accuracy in estimating forest AGB at local, national, and global scale. Recently, with the launch of the Sentinel satellite family, more accurate predictive models of forest AGB may be produced thanks to data with better spatial resolution (bands with 10, 20, and 60 m) and spectral resolution and a 5-day revisit time, in comparison with other freely available satellite data such as Landsat or MODIS. For instance, Landsat images with a spatial resolution of 30 m contain many mixed pixels, and a pixel can contain different tree species and vegetation ages. In addition, a large amount of good-quality field measurements, obtained from forest inventory plots [123, 124] and/or from LiDAR data, should be used as training and calibration data to obtain an accurate model of forest AGB.
More recently, studies of forest AGB estimation have combined optical and radar data. Integrating different remotely sensed data sources has been shown to increase the accuracy of predictive models of forest AGB. In this way, incorporating forest structural parameters from SAR data overcomes the problems of mixed pixels and data saturation in optical data [30, 126]. For instance, Townsend observed that model performance for estimating biophysical properties of forests improved when the capabilities of Landsat TM and SAR data were combined. Likewise, Forkuor et al., when mapping forest AGB, found better predictive accuracy when combining optical and SAR sensors (Sentinel-1 and 2) than when using them individually. However, several authors report that optical sensors produce better forest AGB estimates than SAR when used individually [8, 44, 128], despite the lack of sensitivity of optical data to AGB beyond canopy closure and grass interference in savannas and forests [129, 130].
In recent years, predictive models for estimating forest AGB have increasingly applied machine learning algorithms based on decision trees instead of traditional parametric regression models. Machine learning algorithms (e.g., ANN) have an advantage over regression algorithms because they are versatile and flexible. This advantage was observed by Ou et al., who compared two parametric models (a linear regression model and a linear regression with combined variables) with two non-parametric models (RF and ANN): the non-parametric models yielded significantly greater estimation accuracies of forest AGB, that is, a higher coefficient of determination (R²) and a lower root mean square error (RMSE). Other authors corroborate this finding by showing that non-parametric models have a greater capacity to capture the heterogeneity of forest AGB than parametric models [47, 64, 128, 133].
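The two accuracy measures used in such comparisons can be sketched as follows. This is a minimal illustration only: the plot values and the two sets of predictions (`model_a`, `model_b`) are hypothetical numbers chosen for the example and are not taken from Ou et al. or from any other study cited here.

```python
import math

def rmse(observed, predicted):
    """Root mean square error, in the same units as AGB (e.g., Mg/ha)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical field-plot AGB (Mg/ha) and predictions from two hypothetical models.
observed = [50, 100, 150, 200]
model_a = [60, 90, 160, 190]   # follows the observed gradient closely
model_b = [80, 80, 170, 170]   # compresses the observed range

for name, pred in (("model_a", model_a), ("model_b", model_b)):
    print(name, "R2 =", round(r_squared(observed, pred), 3),
          "RMSE =", round(rmse(observed, pred), 2))
```

A model is preferred when it has the higher R² and the lower RMSE on the same validation plots, which is the criterion the comparison above relies on.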
Among the variety of machine learning techniques, the RF algorithm has proved to be one of the best methods for classification and regression, providing high accuracy in estimating forest AGB, high computation speed, robustness, and the capacity to identify the important variables using either optical or SAR data [30, 59, 128, 134, 135, 136, 137]. RF has also been shown to be suitable for analyzing larger data sets, while other non-parametric algorithms, such as support vector regression (SVR), are more suitable for small data sets [30, 47] and for AGB estimation in grasslands and shrubs. However, regardless of the algorithm applied (e.g., linear regression, RF, or ANN), the choice of independent variables seems to be more important for obtaining accurate forest AGB estimates.
Predictive models are able to explore and rank the importance of the variables used in forest AGB estimation. Textural features from optical data (spectral data) and SAR (backscattering data), spectral vegetation indices, and, more recently, biophysical variables derived from Sentinel-2 (e.g., LAI, leaf area index; FVC, fractional vegetation cover; and FAPAR, fraction of absorbed photosynthetically active radiation) have been considered the most important variables for forest AGB estimation [30, 47, 141, 142, 143, 144, 145]. Spectral bands produce predictive models with lower accuracy than vegetation indices, transformed images, and textural features. Nevertheless, Forkuor et al. showed that SWIR bands are important in predictive models of AGB estimation in semi-arid regions. In addition, the integration of variables (e.g., multispectral bands, transformed images, vegetation indices, and textural features) from optical and SAR sensors provides more accurate predictive models of forest AGB estimation [10, 52] than simple backscatter (SAR) and spectral (optical) bands [30, 47, 141, 142].
Vegetation indices are still important variables for estimating forest AGB, as reported by several authors [59, 107, 134, 139, 140]. In recent years, thanks to the Multi-Spectral Instrument aboard the Sentinel-2 satellite, two relatively new vegetation indices, NDI45 and IRECI, have emerged (Table 2). Both indices take advantage of the Sentinel-2 red-edge bands (band 5 = 705 nm; band 6 = 740 nm; band 7 = 783 nm) to reduce the effects of the saturation problem at high AGB density. NDI45 is similar to NDVI, but the original NIR band of 800 nm is replaced by the new red-edge band (band 5), while the red band (band 4, 665 nm) is kept. IRECI, on the other hand, uses the three available red-edge bands of Sentinel-2 and puts little emphasis on the red band to avoid the saturation problem.
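The band arithmetic behind these indices can be sketched as below. The formulations are the ones commonly given for Sentinel-2, NDI45 = (B5 - B4) / (B5 + B4) and IRECI = (B7 - B4) / (B5 / B6), and should be checked against Table 2; the reflectance values are hypothetical numbers for a single vegetated pixel, used only to show the calculation.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def ndi45(b5, b4):
    """NDI45: NDVI-like index with the NIR band replaced by red-edge band 5."""
    return (b5 - b4) / (b5 + b4)

def ireci(b7, b6, b5, b4):
    """IRECI: inverted red-edge chlorophyll index using bands 4, 5, 6 and 7."""
    return (b7 - b4) / (b5 / b6)

# Hypothetical surface reflectances (unitless, 0 to 1) for one vegetated pixel.
pixel = {"b4": 0.04, "b5": 0.10, "b6": 0.30, "b7": 0.40, "nir": 0.45}

print("NDVI :", round(ndvi(pixel["nir"], pixel["b4"]), 3))
print("NDI45:", round(ndi45(pixel["b5"], pixel["b4"]), 3))
print("IRECI:", round(ireci(pixel["b7"], pixel["b6"], pixel["b5"], pixel["b4"]), 3))
```

In practice the same expressions are applied per pixel to whole band rasters, with a guard against zero denominators over water or no-data pixels.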
Transformed images, such as those obtained by principal component analysis (PCA), are also important variables for addressing the saturation problem of optical sensors at low to intermediate biomass levels (between 60 and 150 Mg/ha). These images can also be used as input for textural images of optical and SAR sensors to prevent the saturation problem at high AGB density. For instance, textural variables have been shown to be more suitable than spectral bands for predicting forest AGB because of their ability to simplify complex cover structures, such as uneven-aged forests and differing canopy structures [147, 148]. Also, textural bands from optical sensor images (e.g., Sentinel-2, SPOT-6, and AVNIR) contributed to more accurate predictive models of forest AGB estimation than the original spectral bands [47, 132, 147, 149].
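As an illustration of how a textural variable is derived, the sketch below computes two grey-level co-occurrence matrix (GLCM) measures, contrast and homogeneity, for a small image window. It assumes GLCM-based texture, which is one common choice and not necessarily the method of every study cited above; the two 3 x 3 windows are hypothetical quantized grey levels.

```python
from collections import Counter

def glcm(window, dx=1, dy=0):
    """Symmetric, normalized co-occurrence probabilities for one pixel offset."""
    rows, cols = len(window), len(window[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count both directions (symmetric GLCM)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def contrast(p):
    """High when neighbouring pixels differ strongly in grey level."""
    return sum(prob * (i - j) ** 2 for (i, j), prob in p.items())

def homogeneity(p):
    """High when neighbouring pixels have similar grey levels."""
    return sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())

# Hypothetical quantized windows: a uniform canopy and a heterogeneous one.
smooth = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
rough = [[0, 3, 0], [3, 0, 3], [0, 3, 0]]

for name, window in (("smooth", smooth), ("rough", rough)):
    p = glcm(window)
    print(name, "contrast =", contrast(p), "homogeneity =", round(homogeneity(p), 3))
```

Sliding such a window across a spectral band, a PCA component, or a SAR backscatter image yields a texture image that can be used as a predictor alongside the original bands.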
In addition, the greater capacity of SAR-derived variables to interact with forest elements, such as branches, stems, and the soil under the canopy [39, 65, 150], highlights their advantage over biophysical parameters (e.g., LAI, FVC, and FAPAR) for estimating forest AGB. Hence, SAR long wavelengths (P-band), which are capable of providing accurate forest AGB estimates, will be harnessed in the BIOMASS mission to provide unprecedented information on the distribution of the world's forest AGB and its changes [17, 18]. This mission will help to build a sustainable global system for monitoring and quantifying biomass over time, supporting countries in managing forest resources and mitigating the impacts of climate change and land use change.
From the analysis of several forest AGB estimation studies, the integration of optical and radar data improves the information extraction process by taking advantage of the strengths of the different image data. In this way, mixed-pixel problems and data saturation are reduced. Further, the Sentinel satellite family has proved to be a promising source of free satellite data for building accurate forest AGB estimation models, including in regions with little or no AGB information.
Non-parametric models, such as RF, SVM, and ANN, have been replacing parametric regression models because of their greater ability to capture the heterogeneity of forest AGB. Among the variety of machine learning techniques, the RF algorithm is one of the most widely used and is able to achieve better accuracy in forest AGB estimation using either optical or SAR data.
The integration of different RS-derived variables, that is, spectral bands, transformed images, vegetation indices, and textural features, showed good correlation with forest AGB. VNIR bands are the most important for calculating most vegetation indices. When using Sentinel-2 data, the available red-edge bands were shown to reduce the effects of the saturation problem at high AGB density.
PCA is a key transformation for addressing the saturation problem of optical sensors at high AGB density, and its output can be used as input data for textural features of optical and SAR sensors, which also helps to prevent the saturation problem of both sensors. Textural features, from both optical and SAR sensors, are among the most suitable variables for forest AGB estimation due to their stronger relationships with AGB. SAR long-wavelength bands (L and P) were shown to be very promising in studies of relatively high biomass density.
This work is funded by the National Funds through FCT - Foundation for Science and Technology under the Project UIDB/05183/2020 and by Programa Operativo de Cooperação Transfronteiriço Espanha-Portugal (POCTEP), and Programa INTERREG V A Espanha – Project IDERCEXA – Investigación, Desarrollo y Energías Renovables para nuevos modelos empresariales en Centro, Extremadura y Alentejo, 0330_IDERCEXA_4_E.