Open access peer-reviewed chapter

The Potential of Sentinel-2 Satellite Images for Land-Cover/Land-Use and Forest Biomass Estimation: A Review

By Crismeire Isbaex and Ana Margarida Coelho

Submitted: November 28th 2019
Reviewed: July 11th 2020
Published: February 10th 2021

DOI: 10.5772/intechopen.93363

Abstract

Mapping land-cover/land-use (LCLU) and estimating forest biomass using satellite images is a challenge given the diversity of sensors available and the heterogeneity of forests. The Copernicus program, served by the Sentinel satellite family, and the Google Earth Engine (GEE) platform, both offering free and open services, present a good approach for mapping vegetation and estimating forest biomass on a global, regional, or local scale, periodically and repeatedly. Sentinel-2 (S2) systematically acquires optical imagery and provides global monitoring data at high spatial resolution (10–60 m). Given the novelty of information on the use of S2 data, this chapter presents a review of LCLU maps and forest above-ground biomass (AGB) estimates, in addition to exploring the efficiency of using the GEE platform. The Sentinel data have great potential for studies on LCLU classification and forest biomass estimates. The GEE platform is a promising tool for executing complex workflows of satellite data processing.

Keywords

  • GEE
  • forest classifiers
  • accuracy
  • mapping

1. Introduction

In recent decades, remote sensing techniques have been applied in several studies of monitoring and classification of agricultural, forest, environmental, and socioeconomic resources [1, 2, 3, 4, 5]. The data acquired by a set of sensors can provide information on growth, vigor, dynamics, and diversity of vegetation cover [6, 7, 8]. In LCLU classification and forest biomass estimation studies, the proper selection of the sensor is crucial, given the variation of spatial, radiometric, spectral, and temporal resolutions available [9]. For these studies, information is still lacking on the advantages and disadvantages of using Sentinel-2 images and a free processing platform across different landscapes, classification methods, and biomass estimation models.

This chapter is organized as follows: Section 2 presents satellite image classification and the potential of the Sentinel-2 satellites; Section 3 describes forest LCLU maps and the assessment of accuracy; Section 4 presents the GEE platform and its advantages and disadvantages; Section 5 describes the estimation of biomass via remote sensing; and Section 6 concludes with a platform performance overview and an outlook for the future.

2. The potential of image classification and the Sentinel-2 satellite

The recognition of different LCLU classes by remote sensing is key to assessing socioeconomic and environmental changes at local, regional, and global scales. Accurate and reliable LCLU mapping, represented by a thematic map, can be obtained through satellite image classification [10]. In studies with a focus on forest resources, forest classification provides useful information in various decision-making processes for forest planning and management [11]. On a map, this classification becomes essential for the implementation of monitoring studies of natural and/or artificial forests, offering support in the assessment of forest protection, ecology, and quantification of carbon and biomass at different scales. In this way, data from several sensors that operate in the optical range of the electromagnetic spectrum have enabled important advances in the methods of mapping and monitoring different vegetation covers.

Since the launch of the first terrestrial resource satellite in the 1950s, the analysis of vegetation data via remote sensing has improved with advances in technology. Currently, images obtained, for example, by Landsat, SPOT, MODIS, AVHRR, ASTER, CBERS, QuickBird, IKONOS, WorldView, RapidEye, Radar, LiDAR, ALOS PALSAR, and Sentinel can produce thematic maps. In the vegetation classification process, the selection of satellite images depends on factors such as the objective of the study, availability of images, cost, level of diversity in the types of crops, and extent of the study area [12]. In general, radar data are used to model the vertical structure of a forest, and data from multispectral optical sensors are the most used in the literature to model the horizontal structure of vegetation [13]. Multispectral sensors are capable of capturing vegetation characteristics, such as species composition, canopy cover, growth stage, and health of some forest stands [13]. Thus, different spatial, spectral, and temporal parameters that allow the extraction of robust, consistent, and comparable long-term data series must be taken into account for a better cost-benefit ratio [14, 15].

Some of the sensors used for monitoring vegetation are listed in Table 1. More information about the costs of satellite imagery can be found at http://www.landinfo.com. The sensors with medium and high spatial resolution differ in terms of the number of bands, temporal resolution, scale, and costs. Depending on the classification objective, the scale is a factor to consider because, when choosing a sensor with a high spatial resolution (<5 m), the cost and complexity of the classification can increase [16]. In addition, with a larger data set showing spectral variability within the same class, the training time can increase the computational cost [17]. The spatial resolution is therefore an important factor, because it must be adequate to the size of the object to be identified [16]. With S2 images, fragmented landscape elements were observed to decrease the classification accuracy at the 10 m spatial resolution, due to increased bias resulting from the composition of pixels at the fragment edges [11].

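| Sensor group | Satellite/sensor | No. of bands | Spatial resolution (m) | Temporal resolution (days) | Scale of application (a) | Data distribution policy (cost) | Data availability |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Medium spatial resolution sensor | SPOT | 4–5 | 2.5–20 | 26 | L-R | Yes | 1986 |
| | ASTER | 14 | 15–90 | 16 | L-G | No | 1999 |
| | IRS-P6-LISS III | 4 | 23.5 | 24 | L-G | No | 2003 |
| | CBERS-4 | 4 | 5–20 | 52–26 | L-R | No | 2014 |
| | Landsat-8 | 11 | 15–100 | 16 | L-G | No | 2013 |
| | Sentinel-2 | 13 | 10–60 | 5 | L-R | No | 2015 |
| High spatial resolution sensor | IKONOS | 5 | 1–4 | 1.5–3 | L-R | Yes | 1999 |
| | QuickBird | 5 | 0.61–2.24 | 2.7 | L | Yes | 2001 |
| | WorldView | 4–17 | 0.31–2.40 | 1–4 | L | Yes | 2007 |
| | GeoEye | 4 | 0.46–1.65 | 2.6 | L | Yes | 2008 |
| | RapidEye | 5 | 5 | 1–5.5 | L-R | Yes | 2012 |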

Table 1.

Main characteristics of the remote sensors used for LCLU classification: spectral (number of bands), spatial and temporal resolution, data cost, the scale of application, and data availability.

(a) L-G: local to global; L-R: local to regional; L: local.


In multispectral images, vegetation can be analyzed using reflectance information from different wavelength bands [visible (VIS), near-infrared (NIR), and short-wave infrared (SWIR)]. Reflectance varies across the spectrum according to the biophysical variables of the vegetation [18]. Among the wavelengths, the emissivity (equivalent to the absorption capacity in the thermal wave range) most used in the analysis of vegetation lies in the near- and mid-infrared regions [19]. Various combinations of bands have been studied over decades to derive the biophysical variables of vegetation, resulting in a range of vegetation indices (VIs). In LCLU classification, VIs are used to extract quantitative information from the contrast between intrinsic characteristics of the spectral reflectance of vegetation [20]. VIs are useful to characterize vegetation vigor, pigments, sugar and carbohydrate content, high plant temperature levels, and abiotic/biotic stress levels, among others [21]. Thus, VIs have become commonly applied because they allow computationally simple analysis of vegetation on global, regional, or local scales [20].

Forest classification is a complex task, and relying on individual bands alone can result in low-accuracy models. The model input data can consist of spectral bands combined with auxiliary bands (vegetation indices), the outputs of image transformation algorithms, such as principal component analysis (PCA), and image textures [13]. Combining them helps to distinguish complex vegetation arrangements, improving thematic area estimates [13]. Vegetation indices are mathematical functions that combine two or more spectral bands [22]. Their use is widely discussed in the literature because they improve the classification accuracy. VIs can reduce atmospheric interference and better distinguish vegetation characteristics such as plant vigor, water content, sensitivity to lignin, and forest above-ground biomass (alive or dead) [23, 24]. The normalized difference vegetation index (NDVI) has been used for many years in classification studies, due to its high correlation with photosynthetically active vegetation and its ability to discriminate between vegetation, nonvegetation, wet, and dry areas [5, 21, 23, 24, 25].
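
As an illustration of a vegetation index as a function of two spectral bands, the following minimal sketch computes NDVI with its standard definition, (NIR - Red) / (NIR + Red). The reflectance values and cover names are hypothetical and are not taken from the studies reviewed here; for Sentinel-2, the NIR and red inputs would typically be bands B8 and B4.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    nir and red are surface reflectance values (e.g., Sentinel-2 B8 and B4).
    Returns 0.0 when both bands are zero to avoid a division by zero.
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


# Hypothetical reflectance values, chosen only to illustrate the contrast
# between vegetated and non-vegetated surfaces.
pixels = {
    "dense forest": {"nir": 0.45, "red": 0.05},
    "bare soil": {"nir": 0.30, "red": 0.25},
    "water": {"nir": 0.02, "red": 0.04},
}

ndvi_by_cover = {name: ndvi(p["nir"], p["red"]) for name, p in pixels.items()}

for name, value in ndvi_by_cover.items():
    print(f"{name}: NDVI = {value:.2f}")
```

With these illustrative inputs, the forest pixel has the highest index, bare soil is close to zero, and water is negative, which is the kind of contrast that makes NDVI useful as an auxiliary band.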

Texture data are also important in classification studies, as they describe groups of pixels with similar intensity properties, that is, the distribution and spatial arrangement of repeated tones. Texture has been used to quantify the variability of pixels in a neighborhood [26]. It captures information such as crown-internal shadows and crown size and shape at coarser scales [18]. Several studies point out that integrating texture measures increases the classification accuracy by improving the separability between classes and reducing the confusion between spectrally similar classes [10, 27]. The incorporation of texture metrics can improve classification accuracy by as much as 10 to 15% [18]. Textural information can be derived from various methods that use multiple functions of image bands in different window sizes [21]. The methods can be categorized into four main groups, based on structure, statistics, models, and transformations. Examples are local statistics (which describe the moments of a neighborhood of individual pixels in a region of the image) and measures of the gray level co-occurrence matrix (GLCM) (a set of statistics that characterizes the distance and angular relationships between pixels) [28]. In LCLU classification, second-order GLCM texture measures have been the most used textural features for characterizing the relative frequencies between the brightness values of two pixels connected by a spatial relationship [29]. Moreover, combining vegetation indices with image texture measurements may contribute more to improvements in classification accuracy than using them separately [18].
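
To make the GLCM idea concrete, the sketch below counts how often pairs of gray levels co-occur at a fixed pixel offset and derives two common second-order measures, contrast and homogeneity. It is a simplified, pure-Python illustration under stated assumptions: the two 4 x 4 images are hypothetical, only one offset (one pixel to the right) is used, and the matrix is not made symmetric, whereas dedicated image-processing libraries usually offer several offsets and normalizations.

```python
def glcm(image, dx=1, dy=0, levels=None):
    """Normalized gray level co-occurrence matrix for one pixel offset.

    image is a list of rows of integer gray levels; (dx, dy) is the offset
    between the reference pixel and its neighbor.
    """
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image too small for the chosen offset")
    return [[n / total for n in row] for row in counts]


def contrast(p):
    """Weights co-occurrences by the squared gray-level difference."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


def homogeneity(p):
    """High when co-occurring pixels have similar gray levels."""
    size = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(size) for j in range(size))


# Hypothetical windows: a smooth surface and a strongly alternating one.
smooth = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
rough = [[0, 3, 0, 3], [3, 0, 3, 0], [0, 3, 0, 3], [3, 0, 3, 0]]

for name, window in (("smooth", smooth), ("rough", rough)):
    p = glcm(window)
    print(f"{name}: contrast = {contrast(p):.2f}, homogeneity = {homogeneity(p):.2f}")
```

The rough window yields a much higher contrast and lower homogeneity than the smooth one, which is the type of signal a classifier can use to separate classes that look alike spectrally.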

There are different classification approaches: a class can be assigned pixel by pixel [30], per object [30], or per sub-pixel [31]. In pixel-by-pixel classification, only the spectral information of each pixel is used, separately, to represent a certain class [30]. In object-based classification, the image is first segmented into groups of pixels with similar properties, and the algorithm examines the texture and the spectral response of each object as the basic unit, instead of individual pixels [19]. The sub-pixel mode consists of identifying, within a pixel, the proportion of each type of LCLU, so that mixed pixels are not simply assigned to a dominant class with sharp boundaries [31].
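
One simple way to picture the sub-pixel approach is a two-endmember linear mixture, in which the reflectance of a mixed pixel is modeled as a weighted sum of a "pure vegetation" and a "pure soil" spectrum. The sketch below is an assumption made for illustration, not the method of the cited studies: the endmember spectra are hypothetical, and the fraction is the least-squares solution under the constraint that the two fractions sum to one.

```python
def vegetation_fraction(pixel, veg, soil):
    """Least-squares vegetation fraction for a two-endmember linear mixture.

    Model: pixel = f * veg + (1 - f) * soil, band by band.
    The result is clipped to the physically meaningful range [0, 1].
    """
    numerator = sum((p - s) * (v - s) for p, v, s in zip(pixel, veg, soil))
    denominator = sum((v - s) ** 2 for v, s in zip(veg, soil))
    if denominator == 0:
        raise ValueError("endmembers must differ in at least one band")
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical endmember reflectances in (red, NIR, SWIR).
VEGETATION = [0.05, 0.45, 0.20]
SOIL = [0.25, 0.30, 0.35]

# Synthetic mixed pixel with 30% tree cover over bare soil.
mixed = [0.3 * v + 0.7 * s for v, s in zip(VEGETATION, SOIL)]

fraction = vegetation_fraction(mixed, VEGETATION, SOIL)
print(f"estimated vegetation fraction: {fraction:.2f}")
```

A pixel-by-pixel classifier would assign this pixel entirely to one class, whereas the sub-pixel estimate retains the proportion of each cover type.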

In LCLU mapping, the right choice of classification method is essential to obtain a thematic map that represents the image with good accuracy [9]. Traditionally, classification methods are either supervised or unsupervised [32]. In supervised classification in particular, parametric (e.g., linear and multiple) and nonparametric [e.g., K-nearest neighbor (K-NN), artificial neural network (ANN), random forest (RF), decision trees (DT), and support vector machines (SVM)] models can be applied [13, 33]. In parametric models, the results are restricted, for example, to unimodal data, because a relationship between the dependent and independent variables is assumed with an explicit model structure [13]. Nonparametric supervised classification, however, has been the most used method in classification studies based on remote sensing data because it is a more robust approach [33]. Supervised classification based on machine learning achieves better overall accuracy, due to its ability to deal with heterogeneous images, such as forest scenes [34, 35, 36]. Based on the training samples defined by the user, the algorithm searches for pixels belonging to each class [30]. The classifiers learn from the data, are efficient at handling nonlinearity and noise in the samples, and can use less computation time [34].
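
The supervised, nonparametric workflow described above can be sketched with K-NN, the simplest of the listed classifiers: user-defined training samples carry a class label, and each new pixel receives the majority label of its nearest training samples in feature space. The features (NDVI and SWIR reflectance), the sample values, and the class names below are hypothetical and serve only to show the mechanics.

```python
import math
from collections import Counter


def knn_classify(sample, training, k=3):
    """Assign the majority class among the k nearest training samples."""
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training samples: ((NDVI, SWIR reflectance), class label).
TRAINING = [
    ((0.80, 0.10), "forest"),
    ((0.75, 0.12), "forest"),
    ((0.85, 0.08), "forest"),
    ((0.45, 0.22), "agriculture"),
    ((0.50, 0.25), "agriculture"),
    ((0.40, 0.20), "agriculture"),
    ((0.10, 0.35), "bare soil"),
    ((0.08, 0.40), "bare soil"),
    ((0.12, 0.38), "bare soil"),
]

# Unlabeled pixels to classify.
for pixel in [(0.78, 0.11), (0.47, 0.23), (0.09, 0.37)]:
    print(pixel, "->", knn_classify(pixel, TRAINING))
```

No distribution is assumed for the classes, which is what makes the approach nonparametric; the price is that the result depends directly on how representative the training samples are.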

In particular, Sentinel-2A (S2), launched in 2015, results from the Earth observation mission developed by the European Space Agency (ESA) as part of the Copernicus program and has been designed to support Global Monitoring for Environment and Security (GMES) [37]. The Sentinel-2 wide-swath, high-resolution Multi-Spectral Instrument (MSI) aims to obtain information for monitoring and managing the land surface and provides continuity with the SPOT and Landsat missions [37]. With its wide coverage (swath width of 290 km) and minimum 5-day global revisit time (with twin satellites in orbit), the sensor is an extremely useful monitoring product for studies such as LCLU changes and environmental impacts [23]. The main Level-2A output is an orthoimage with corrected bottom-of-atmosphere (BOA) reflectance [38]. The MSI optical sensor has 13 spectral bands at three spatial resolutions: four bands at 10 m, covering the visible and near-infrared (NIR); six bands at 20 m, including four red-edge bands and two in the SWIR region; and three bands at 60 m (Figure 1) [39, 40].
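
The band layout described above can be written as a small lookup table, which is convenient when selecting inputs for a classification. In this sketch, the band identifiers and resolutions follow the standard Sentinel-2 MSI naming, while the short descriptive labels are informal; B8A is listed as narrow NIR, although it is often counted with the red-edge bands, as in the text above.

```python
from collections import defaultdict

# Band identifier -> (informal description, spatial resolution in meters).
S2_BANDS = {
    "B1": ("coastal aerosol", 60),
    "B2": ("blue", 10),
    "B3": ("green", 10),
    "B4": ("red", 10),
    "B5": ("red edge 1", 20),
    "B6": ("red edge 2", 20),
    "B7": ("red edge 3", 20),
    "B8": ("NIR", 10),
    "B8A": ("narrow NIR", 20),
    "B9": ("water vapour", 60),
    "B10": ("SWIR cirrus", 60),
    "B11": ("SWIR 1", 20),
    "B12": ("SWIR 2", 20),
}


def bands_by_resolution(bands):
    """Group band identifiers by their spatial resolution."""
    groups = defaultdict(list)
    for name, (_, resolution_m) in bands.items():
        groups[resolution_m].append(name)
    return dict(groups)


groups = bands_by_resolution(S2_BANDS)
for resolution_m in sorted(groups):
    print(f"{resolution_m} m: {', '.join(groups[resolution_m])}")
```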

Figure 1.

Multispectral bands of the Sentinel-2 in 10, 20, and 60 m spatial resolution and their wavelengths.

The multispectral S2 data have several advantages over other available monitoring satellites. In particular, Landsat and Sentinel-2 data are now frequently used in monitoring and management studies, to meet the demand for easily accessible data at the scale of the attributes of interest [16]. S2 has red-edge spectral bands covering a longer wavelength range, which are essential in vegetation analysis [41]. In addition, the spatial resolution of this satellite is comparable to that of the commercial SPOT or RapidEye systems and surpasses the 250 m spatial resolution of MODIS [41]. Sentinel-2A satellite imagery can be acquired for free at the Copernicus Open Access Hub (https://scihub.copernicus.eu/). For image processing, ESA developed the Sentinel Application Platform (SNAP) software. It is a free tool for analyzing satellite information, specialized for the Sentinel series, with a performance comparable to that of other software (e.g., QGIS or ENVI) [42] and a discussion forum for consultations [43]. For these reasons, it seems justified to explore the potential of Sentinel-2 data for studies in forest monitoring [24] such as forest succession [2] and wildfire control [44], forest classification [11], forest management [45], and biomass estimation [46].

3. Maps of forest LCLU and accuracy assessment

In image classification, information on the spectral resolution of the sensors is also important to provide data on vegetation reflectance [25]. This is possible because each species has an intrinsic spectral response according to its chemical content and morphology, such as leaf structure, water content, pigments, carbohydrate and aromatic content, and proteins, among other factors [21]. Across the light spectrum, remote detection of vegetation is based on the ultraviolet, visible, near-infrared, and mid-infrared regions [47]. Species identification from remote sensing data requires attention, because there is spectral variability within and between healthy tree species, which can lead to incorrect classification and hinder statistical assessments [48]. The spectrum can change, for example, with differences in illumination, shadow effects, seasons, and growth periods of trees [48].

Generally, in LCLU classification, the forest classes are considered more heterogeneous than the water surface, exposed soil, urban, and agricultural classes. The complexity of a forest canopy comes from a surface full of lighting and shading fluctuations [49]. The variability between forest classes can directly change the spectral response recorded in the images, which is influenced by several parameters such as age, season, defoliation, density (e.g., number of trees and basal area), canopy cover, and understory [49, 50]. In remote sensing, several studies show that the gain in differentiation between species increases with the wavelength, showing a greater correlation with the SWIR region [11, 46, 51]. One example is the spectral differentiation between conifer needles and broad (flat) leaves. In general, the biggest difference between them is in transmittance and reflectance in the infrared region [51]. Compared with broad leaves, conifer needles have lower transmittance at all wavelengths (greater absorption) and low reflectance in the SWIR region [1116, 37, 42, 43]. This effect is due to the greater sensitivity to water absorption, which is deeper in conifers [51], and is explained by differences in the anatomical structure, biochemical composition, and thickness of the leaves, for example [11, 23, 39]. According to some studies, obtaining accurate thematic maps of Mediterranean ecosystems is a challenge, due to their complexity derived from the variability of tree density per unit area and similar spectral behavior [52, 53]. In the montado system, where holm oak and cork oak predominate in pure and mixed stands with high spatial variability in tree density, the effect of mixed pixels can decrease the accuracy of a classification depending on the spatial scale and resolution [54, 55, 56]. When using Landsat-8 images, the montado class can be classified as an agricultural class and vice versa, because some montado areas with 10 to 30% tree density can be masked by the high reflectance of the bare soil [57]. On the other hand, montado areas with a tree density above 50% can be confused with the olive grove classes due to spectral proximity [57]. When Sentinel-2 images were used for canopy estimation in the montado system, the red-edge bands, integrated in seven vegetation indices, proved essential and sensitive for evaluating vegetation properties during the summer, improving the tree crown cover estimates [56]. The classification of seven forest species in Germany also achieved improvements in accuracy by including the SWIR bands (B11 and B12), one red-edge band (B5), and two visible bands (B2 and B4), using Sentinel-2 images and the RF algorithm [49]. Thus, one of the ways to capture the spectral confusion that can interfere with the results of a thematic map is to evaluate the accuracy of the classification model.

In forest classification studies, accuracy assessment is necessary to validate a classified image because the estimates or predictions can contain errors and uncertainties [58]. The most common way to express map accuracy is as the percentage of the final product area that agrees with the reference data of an image. This statement is derived from one of the most used approaches to define the accuracy of classified data, expressed in the form of the confusion matrix and/or contingency matrix [36]. Through this analysis, it is possible to generate information on overall accuracy, user’s and producer’s accuracy, and the Kappa coefficient (k) [24, 59, 60]. The confusion matrix is a cross-tabulation that relates the LCLU classes in rows and columns, with reference or test data (usually represented by columns) being compared with classified or training data (usually represented by rows) [58]. In the matrix, the diagonal values indicate the number of pixels on which the two class sets agree [61].
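
The cross-tabulation can be sketched in a few lines. The example below follows the layout described above (classified data in rows, reference data in columns); the ten labeled pixels and the three class names are hypothetical.

```python
def confusion_matrix(classified, reference, classes):
    """Cross-tabulate classified labels (rows) against reference labels (columns)."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for predicted, truth in zip(classified, reference):
        matrix[index[predicted]][index[truth]] += 1
    return matrix


CLASSES = ["forest", "agriculture", "water"]

# Hypothetical validation pixels: reference label and label assigned by the map.
reference = ["forest"] * 4 + ["agriculture"] * 3 + ["water"] * 3
classified = [
    "forest", "forest", "forest", "agriculture",  # one forest pixel omitted
    "agriculture", "agriculture", "forest",       # one agriculture pixel omitted
    "water", "water", "water",
]

matrix = confusion_matrix(classified, reference, CLASSES)
for name, row in zip(CLASSES, matrix):
    print(f"{name:12s} {row}")

# Pixels on the diagonal are those where map and reference agree.
agreement = sum(matrix[i][i] for i in range(len(CLASSES)))
print("pixels in agreement:", agreement)
```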

The overall accuracy of a thematic map is defined as the number of correctly classified pixels divided by the total number of samples. However, with the advances in remote sensing technology and the use of computational classification in complex landscapes, the standardization of classification must be analyzed with caution. The overall accuracy is not always representative of the accuracy of the individual classes; that is, a map with high overall accuracy does not guarantee high accuracy for each class [62]. When assessing the accuracy of a classification, the overall accuracy assumes a minimum value and, in each class, a value with comparable precision [63]. Therefore, to obtain more detailed accuracy information, the individual class accuracy of LCLU can also be derived from the confusion matrix. Through matrix analysis, the producer’s and user’s accuracies point out the errors of omission and commission, respectively. The producer’s accuracy gives the proportion of reference pixels of the class of interest that were correctly classified. The reference pixels of that class that were assigned to other classes are called omission errors [58]. On the other hand, the user’s accuracy indicates the percentage of pixels in a classified image that really represent the class on the ground, certifying the classification and, in this case, measuring commission errors [58, 64, 65]. The Kappa coefficient is used to account for the possibility of chance agreement between predicted values and field data [66]. The K values range from 0 to 1; the closer to 1, the better the agreement between the ground truth and the classified image [66].
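
The measures above can be derived directly from a confusion matrix. The sketch below assumes classified data in rows and reference data in columns, uses a hypothetical three-class matrix, and computes Kappa with the usual formulation (observed agreement minus chance agreement, divided by one minus chance agreement). It does not handle classes with no pixels, which would cause a division by zero.

```python
def accuracy_metrics(matrix):
    """Overall, user's, and producer's accuracy plus Kappa from a confusion matrix.

    Rows are classified data and columns are reference data.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(n)]                        # classified
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # reference

    overall = sum(matrix[i][i] for i in range(n)) / total
    # User's accuracy: 1 - commission error of each class.
    users = [matrix[i][i] / row_totals[i] for i in range(n)]
    # Producer's accuracy: 1 - omission error of each class.
    producers = [matrix[j][j] / col_totals[j] for j in range(n)]

    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, users, producers, kappa


# Hypothetical confusion matrix for three classes (150 validation pixels).
MATRIX = [
    [45, 6, 1],
    [5, 38, 3],
    [0, 6, 46],
]

overall, users, producers, kappa = accuracy_metrics(MATRIX)
print(f"overall accuracy: {overall:.2f}")
print("user's accuracy:", [f"{u:.2f}" for u in users])
print("producer's accuracy:", [f"{p:.2f}" for p in producers])
print(f"Kappa: {kappa:.2f}")
```

In this illustrative matrix the overall accuracy is 0.86, yet the producer’s accuracy of the second class is only 0.76, which shows why a high overall value does not guarantee high accuracy for each class.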

In the analysis of thematic mapping, it is vital that accuracy is the best possible for each particular case, avoiding interpretation errors [67]. However, there is still no preestablished minimum limit in terms of accuracy, because the reliability of a map may vary depending on the study application [65, 68, 69]. Anderson et al. [70] mentioned that an accuracy of 85% in LCLU classification maps is frequently accepted. When using MODIS images with a spatial resolution of 1 km in classification studies, Thomlinson et al. [28] defined a minimum value of 85% for overall accuracy and 70% for the classes, due to the difference in errors between spatial and thematic accuracy. The United Nations Framework Convention on Climate Change (UNFCCC) does not provide any limit to the accuracy of the data for the construction of forest reference levels [65]. Despite these attempts, defining such a value as a goal may be inappropriate and may not represent reality, given the challenges of distinguishing heterogeneous classes at different spatial resolutions and scales [69]. Furthermore, a classification study may be more interested in analyzing the accuracy of a particular class and disregarding others [71].

In fact, there are many studies with different classifiers that obtain results with good accuracy [71]. Given the complexity of image classification, there is a variety of classification methods using classification algorithms, with different approaches to achieve mapping accuracy [36]. With an emphasis on supervised classification, machine learning algorithms exploit image data by finding hidden relationships between various input variables (parameters) and output variables (classification results) [72]. Because the process is automated, training samples are essential to identify the different spectral areas that represent a class in the image. It is from previous samples or experience that machine learning algorithms gain the ability to generalize; that is, once trained, they are able to produce solutions for an unknown data set [72]. The combination of different classification techniques has been investigated with support vector machines (SVM) [73], random forest (RF) [74], and classification and regression tree (CART) [75]. Although several methods exist to evaluate the performance of the algorithms, evaluating the classification accuracy is the most common one [9]. This assessment is necessary because the classifier’s performance is affected not only by the classifier’s own limitations but also by the image quality, sample size, and computational resources [9, 76]. Table 2 shows the overall accuracy obtained by different algorithms in some recent forest classification studies using Sentinel-2 images. These results aim to provide information on the current conditions of vegetation cover. In addition, they show that the relevance of accuracy depends on the objective of the study, without being based on a standard accuracy [69]. It was also observed that classifiers have an important role in supervised classification. With improved classification methods, such as convolutional neural networks, the overall accuracy is generally higher than with the other methods, which presented medium or fluctuating precision [36].

| Algorithms | Overall accuracy (%) | Example of use | Reference |
| --- | --- | --- | --- |
| Artificial neural network (ANN) | 65.7 | Mapping of the forest vertical structure in Gong-ju, Korea. | [67] |
| | 85.0 (a) | Discriminating urban forest types in Xuzhou, East China. | [60] |
| Convolutional neural network (CNN) | 90.4 | Forest vegetation types in Jilin Province, China. | [77] |
| | 97.7 | Forest classification in Semarang, Central Java, Indonesia. | [78] |
| Random forest (RF) | 90.9 / 93.2 (b) | Forest classification in ecosystems in Germany and South Africa. | [79] |
| | 88.9 | Mapping of 11 forest classes in the Belgian Ardenne ecoregion, Ardenne, Belgium. | [11] |
| Support vector machine (SVM) | 80.0 | Mapping of the invasive species American bramble (Rubus cuneifolius) in KwaZulu-Natal province of South Africa. | [80] |
| RF / SVM | 84.2 / 81.8 | Mapping of crops, including high and low density forest classes, in the foothills of the Himalaya. | [81] |
| RF / K-NN / SVM | 94.44 / 95.29 / 94.13 | Classification of six types of land use, including forest cover, in the Red River Delta in Vietnam. | [82] |
| | 80.0 / 74.3 / 80.3 | Classification of land use, including forest cover, in the Dak Nong province, Vietnam. | [69] |

Table 2.

The overall accuracy of LCLU and forest classification with Sentinel data.

(a) ANN + vegetation abundance (VA).

(b) Sentinel 1 and Sentinel 2.


Studies on forest species classification with Sentinel-2 images are still recent [23, 33, 83, 84, 85, 86]. The classification of forest tree species in southwestern France with S2 images obtained an overall accuracy above 90% for plantations such as aspen (Populus tremula) and red oak (Quercus rubra). However, the species that presented more spectral confusion, with accuracies of 81 and 74%, were black pine (Pinus nigra) and Douglas fir (Pseudotsuga menziesii), respectively [87]. In tests with S2 images and RF for forest type mapping in a Mediterranean area of Italy, using four vegetation indices (NDVI, SRI, RENDVI, and ARI1) in three phenological periods (winter, spring, and summer), Puletti et al. [88] reported that the forest categories (pure coniferous forests, broadleaf forests, and mixed forests) had an overall accuracy of 86.2% and a Kappa coefficient of 0.86. The user’s and producer’s accuracies were above 83% for all the classes. Duan et al. [89] obtained overall, producer’s, and user’s accuracies of 92.3, 92.3, and 92.2%, respectively, when mapping the distribution of urban forests in China using eight bands and three vegetation indices (NDVI, NDWI, and NDBI) with S2 images, the random forest algorithm, and the GEE platform. Thus, the spatial distribution of species and their number of trees per hectare can be understood through classification, which becomes essential in studies with different types of landscapes. This information is very important when making decisions about forest resources, because each specific study supports the formulation of local and global public policies, as well as forest planning and management, up to biomass estimates.

4. Google Earth Engine (GEE) platform: advantages and disadvantages in the LCLU classification

GEE is a cloud computing platform designed by Google, built on the launch of the public data catalog of Landsat series images in 2008 [89]. The platform allows users to manage, analyze, and store large volumes of historical geospatial data on a planetary scale, which can be applied in several scientific studies [89]. In GEE, it is possible to access the data catalogs of the Landsat, MODIS, and Sentinel satellites [90]. In addition, the platform offers social, demographic, climatic, and digital elevation model data and allows remote sensing data to be combined with algorithms and field data, using JavaScript or Python code [90]. Another cloud resource available in GEE is the Fusion Table, which offers support for tabular data, keyword-based mechanisms, and text data, among others [91].

Although the GEE platform is a free access tool, LCLU mapping over large areas and at high spatial resolution can be challenging for several reasons. The inclusion of large volumes of data from a complex and heterogeneous landscape requires a large computational load and long processing time [92]. However, freely accessible data can be used in studies with limited funding. For example, the use of Sentinel-2 data is more recommended than Landsat images, because it provides a slightly higher spatial and spectral resolution [5]. The classification methods can also be applied on the platform according to the image type, segmentation method, classification algorithm, training sample sets, input resources, target classes, and accuracy assessment [36]. All classification procedures can be developed on the cloud computing platform, without downloading remote sensing data or processing them on desktops or in other software, which simplifies information extraction [86]. Therefore, the GEE platform allows the use of high-performance tools for processing large data sets. Thus, GEE is an important resource for big data management and scientific development, because it lowers barriers within the global scientific community and offers the same opportunities to share and replicate geospatial analyses [89].

With millions of servers worldwide, the platform allows the combination of different algorithms and data with free availability for noncommercial use [93]. In the image processing phase, the GEE platform can synchronize all S2 data, can easily clear cloudy pixels [86], and perform a continuous workflow of complex remote sensing [94]. According to the objective of the study, it is possible to choose the specific period and create image mosaics, with the best cloudless pixel for a specific region, solve terrain effects problems, and identify any changes in LCLU in the world through classified images [95]. Also, it can use several vegetation indices at the same time when image classification is performed [96].
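
A workflow of this kind might look roughly as follows with the Earth Engine Python client. This is a sketch under stated assumptions rather than the procedure of any cited study: the dataset ID `COPERNICUS/S2_SR_HARMONIZED`, the use of the `QA60` band (bits 10 and 11) for cloud masking, the 20% scene cloud threshold, the `landcover` property name, and the choice of 100 trees are illustrative, and the two helper functions are hypothetical names. Running it requires the `earthengine-api` package and an authenticated, initialized session.

```python
CLOUD_BIT = 1 << 10   # QA60 bit 10: opaque clouds
CIRRUS_BIT = 1 << 11  # QA60 bit 11: cirrus clouds


def mask_s2_clouds(image):
    """Mask cloudy pixels using QA60 and rescale reflectance to the 0-1 range."""
    qa = image.select("QA60")
    clear = qa.bitwiseAnd(CLOUD_BIT).eq(0).And(qa.bitwiseAnd(CIRRUS_BIT).eq(0))
    return image.updateMask(clear).divide(10000)


def build_ndvi_composite(region, start, end, max_cloud_pct=20):
    """Median composite of cloud-masked S2 scenes with an added NDVI band."""
    import ee  # imported here so the module loads without Earth Engine installed

    collection = (
        ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
        .filterBounds(region)
        .filterDate(start, end)
        .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", max_cloud_pct))
        .map(mask_s2_clouds)
    )
    composite = collection.median()
    ndvi = composite.normalizedDifference(["B8", "B4"]).rename("NDVI")
    return composite.addBands(ndvi)


def classify_random_forest(image, training_points, bands,
                           class_property="landcover", n_trees=100):
    """Train a random forest on labeled points and classify the image."""
    import ee

    stack = image.select(bands)
    samples = stack.sampleRegions(
        collection=training_points, properties=[class_property], scale=10
    )
    classifier = ee.Classifier.smileRandomForest(n_trees).train(
        features=samples, classProperty=class_property, inputProperties=bands
    )
    return stack.classify(classifier)
```

After `ee.Initialize()`, the composite could be built for a region and period of interest and then passed, together with a feature collection of labeled training points, to `classify_random_forest` with a band list such as `["B2", "B4", "B5", "B11", "B12", "NDVI"]`. All computation stays on the server; only the final map or statistics need to be exported.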

Although it is easy to access, users are generally not familiar with the client-server programming model. The GEE libraries offer a more familiar programming environment, but the user must have some basic knowledge of the programming language [95]. This requires much effort from the end user [94]. Fortunately, there are learning platforms and discussion forums on GEE on the Internet, which help to solve most doubts and programming errors. However, difficulties have been encountered in studies of LCLU classification. Zurqani et al. [96] found it difficult to validate the classification model built in GEE, as few high-resolution aerial images were available worldwide to serve as a reference. Difficulties in image preprocessing were also reported, due to problems in acquiring the parameters for atmospheric correction [89] and the limited availability of other algorithms, which prevented improvements in the classification accuracy [97].

GEE imposes a memory limit on the size of matrices, which constrains the training of classification algorithms with a large number of training samples and the evaluation of input bands [98]. The limited processing capacity can cause errors in complex computational analyses, such as those covering large spatial areas [95]. In addition, GEE processing can be affected by widely differing preprocessing criteria, which makes analysis and comparison with other sensors difficult and makes GEE unsuitable for some types of processing [98, 99].

Cloud-based programming is becoming increasingly common for large-scale computing and multidisciplinary studies. The availability of high-resolution satellite images in GEE can therefore help to detect global changes in LCLU, support environmental monitoring, and assist the quantitative and qualitative identification of forest cover [24]. In particular, Sentinel-1 (radar) and Sentinel-2 data have great potential for future studies of classification and above-ground biomass estimation through GEE. To date, few studies have tested the capabilities of the cloud platform with Sentinel data. This gap opens an opportunity for research and development in forest cover monitoring [100]. In future missions, Sentinel data will be essential for studies that validate forest AGB estimates [100]. Free access to both resources can generate valuable information on forest cover in low-, middle-, and high-income countries.

5. Biomass estimation

Above-ground biomass (AGB) of forests plays a key role in the global carbon cycle, in climate regulation, and as a bioenergy reservoir [101]. In a global context, forest AGB, or phytomass, is generally quantified as the mass of stems, branches, and leaves. Estimates of carbon mass or energy potential per unit area are also common [102]. In global discussions on climate change, forest AGB estimation has been an agenda item in public and private decision-making and is integrated into sustainable development projects from local to global scales [91].

In forest biomass estimation, data can be acquired using direct (destructive) methods, based on harvesting the whole tree, or indirect (nondestructive) methods, based on field inventory in synergy with remote sensing data [103]. Direct methods are considered the most accurate because the dry weight of each part of the tree is determined [101]. However, they are time-consuming, costly, and laborious, and are unsuitable for large geographic areas [13, 42]. In addition, destructive methods disturb the fauna and flora and alter the microclimate and habitat [104]. One alternative that reduces these impacts is remote sensing technology [105]. Besides being nondestructive, this approach relies on forest inventory data, which allows models to be fitted in synergy with small-, medium-, and large-scale satellite images [91, 106].

Thematic mapping based on forest classification provides fundamental input for area and biomass estimation [107]. Through the accuracy assessment of a thematic map, the user can judge the overall reliability of the map data and the accuracy of individual LCLU classes [107]. In biomass estimation by remote sensing, the type of sensor, the spatial and temporal resolutions, scale, field data, errors, and uncertainty all complicate the statistical evaluation of the final map product. Several studies report biomass estimation errors ranging from 5 to 30% [13]. The scale of forest planning and management decisions can also influence the accuracy of forest biomass analysis. According to Lu et al. [13], accuracy in forest research should exceed 90% at a regional scale and 80% at a national or global scale.

Regardless of the estimation method used, a thorough evaluation of the reference data and map data is fundamental [108]. The agreement shown by the error matrix may not reflect the true correspondence between the map product and reality, which affects the biomass estimates [109]. Validation of the classification is therefore essential because, depending on the classification methodology applied to a forest ecosystem, errors and uncertainties may affect the classifier’s performance [36]. When representing the heterogeneity of a forest, a poor choice of spatial resolution can add redundant data and increase the noise of statistical models without adding important information about the stands [110]. Even when a forest AGB map achieves high accuracy through the integration of multisource remote sensing data, some limitations and uncertainties can affect the results relating LCLU and remote sensing data [42]. One limitation lies in ground measurements taken with GPS, which can contain geolocation errors that are difficult to eliminate during image analysis [42]. In addition, the time gap between field data and remote sensing data can introduce uncertainty into the prediction model [42, 111]. In this context, gains in the accuracy of biomass maps come from advances in technology. The diversity of image processing software, together with greater processing capacity and computer storage, has allowed the synergistic use of cloud platforms and machine learning to deal with big data problems.
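The accuracy measures derived from an error matrix can be illustrated with a short sketch. The two-class matrix below is hypothetical, and the sketch assumes that rows hold the map (classified) classes, columns hold the reference classes, and every class has at least one sample; studies that use the transposed layout swap the roles of user's and producer's accuracy.

```python
def accuracy_measures(matrix):
    """Overall, user's, and producer's accuracy from a square error matrix.

    Assumed layout: matrix[i][j] counts samples mapped as class i
    whose reference class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total
    # User's accuracy: correct samples divided by the row (map) total.
    users = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # Producer's accuracy: correct samples divided by the column (reference) total.
    producers = [
        matrix[j][j] / sum(matrix[i][j] for i in range(n)) for j in range(n)
    ]
    return overall, users, producers


# Hypothetical error matrix for two classes: forest, non-forest.
error_matrix = [
    [45, 5],
    [10, 40],
]

overall, users, producers = accuracy_measures(error_matrix)
print(f"Overall accuracy: {overall:.2f}")
print("User's accuracy:", [round(u, 2) for u in users])
print("Producer's accuracy:", [round(p, 2) for p in producers])
```

A high overall accuracy can hide a weak class, which is why the per-class measures are reported alongside it.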

A review of studies for the Mediterranean region [106] found that estimates based on passive sensors (optical data) are less accurate (R2 ≈ 0.70) than those based on active sensors (R2 ≥ 0.80). One contributing factor may be the dimensionality of the images. Optical sensors provide mainly a two-dimensional (2D) view, so their estimates rely on the top canopy layer, whereas active sensors, such as synthetic aperture radar (SAR) and LiDAR, reach the third dimension of the forest by sampling the tree and understory layers, and therefore correlate better with biomass [106]. Several studies found that spatial resolution plays a key role in the accuracy of biomass estimates. Landsat's spatial resolution tends to yield low accuracy [112], SPOT [113], GeoEye [114], and QuickBird [55] can achieve medium to high accuracy, and WorldView [115] and LiDAR [116] can achieve high accuracy. Overall, the more accurate the model, the closer its predictions are to the observed values [117]. Accurate models with optical data can be obtained at high spatial resolution (<10 m), where the pixel size approximates the size of the study object [118]. With Sentinel-2 data in particular, medium accuracy has been reported in local and regional scale studies [119, 120]. The production of AGB maps with Sentinel-2 data therefore has great potential to support forest management and monitoring decisions on a regional scale.

In quantifying the biomass stock with Sentinel data, the combined use of data from several sensors, forest inventory data, and algorithms was fundamental for gains in the accuracy of the estimates. A study by Castillo et al. [43] showed improvements in the accuracy of biomass models with Sentinel-1 (S1) and S2, which were comparable to the accuracy obtained with images from current commercial sensors. In biomass estimation, precision can also differ depending on the relative importance of spectral bands and textures among the model inputs [121]. When Landsat-8 images were analyzed together with S2 images, reflectance in the SWIR region proved important for improving the estimation of forest parameters [122]. In a study using the GEE platform, the variables that contributed most to the forest AGB estimates were those derived from the SWIR and red-edge bands [119]. In addition, Hu et al. [123] reported that the red-edge band (B5) of S2 was the best predictor, with good performance of the random forest (RF) algorithm on the GEE platform. In particular, the RF algorithm showed good accuracy in AGB estimates in subtropical deciduous forests, as it was robust to the nonlinearity of the data [124]. Because the RF algorithm is widely reported in biomass estimation studies, Table 3 summarizes the results of some recent studies that used Sentinel (S1 and S2) data to estimate forest AGB in different scenarios. The machine learning algorithm was used under different climatic conditions and with time series, with considerable accuracy.

Satellite data | Location | Forest settings | Model performance: R2 | Model performance: RMSE (Mg ha−1) | Reference
--- | --- | --- | --- | --- | ---
S2 | Evros prefecture, Rhodopes mountain range, Greece | Mediterranean forest | 0.63 | 63.11a | [122]
S2 | Parsa National Park, Nepal | Central-southern part of Nepal, subtropical climate | 0.81 | 25.32 | [124]
S2 | Parque Nacional Yok Don, Vietnam | Tropical monsoon climate | 0.81 | 36.67 | [119]
S2 | Parsa National Park, Nepal | Central-southern part of Nepal, subtropical climate | 0.99 | 4.51 | [125]
S2 | Hunan Province, southern China | Subtropical monsoon climate | 0.58 | 65.03a | [123]
S1 and S2 | Island province of Palawan, Philippines | Southern coast of Honda Bay within the administrative jurisdiction of Puerto Princesa City, tropical climate | 0.75 | 33.81 | [43]
S1 and S2 | Ecoregion of Changbai Mountains mixed forests and eastern mountainous region of Jilin Province in northeast China | Monsoon-influenced humid continental climate | 0.97 | 33.29 | [76]

Table 3.

Studies on forest biomass estimation using the RF algorithm in different ecological settings.

a: m3 ha−1 (growing stock volume).
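The two performance measures reported in Table 3 can be illustrated with a minimal sketch. The plot values below are hypothetical, and the sketch assumes that R2 is the coefficient of determination computed from observed and predicted values; some studies instead report the squared correlation coefficient, which can give a different figure.

```python
from math import sqrt


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical field-measured and model-predicted AGB (Mg ha-1) for four plots.
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print(f"R2   = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.2f} Mg ha-1")
```

Because RMSE carries the units of the response variable, values for AGB (Mg ha−1) and for growing stock volume (m3 ha−1) are not directly comparable.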


Although higher-resolution images and radar data have become increasingly available, storing large amounts of data with a complex structure is one of the major challenges of research with geospatial data [126]. With advances in technology, cloud computing has helped researchers to solve big data problems while reducing the costs of accessing software and maintaining hardware [93]. However, cloud platforms such as GEE have so far been little explored for forest biomass estimation. For forest AGB, cloud computing holds promise in solving problems related to big data [126]. The platform’s advantages relate to the storage and intensive analysis of data based on complex point structures, such as LiDAR data [91]. The GEE platform is being used to manage raster data in parallel with other software, such as R packages [125], and with cloud computing applications such as Fusion Tables and Google Cloud Platform [91]. In the near future, it is hoped that data such as high-precision LiDAR can be integrated with collections of optical images to map global biomass [91].

In remote sensing, biomass estimation may take new directions thanks to ESA’s Earth Explorer mission Biomass. The mission, scheduled for this decade, aims to provide global maps of the biomass and carbon stored in the world’s forests [126]. As a novelty, Biomass will carry the first P-band synthetic aperture radar capable of precisely mapping biomass [126]. It will also include an experimental tomographic mode to provide 3D views of forests [126]. We therefore hope that future studies will be able to combine Sentinel-2 data with Biomass data to produce thematic maps of LCLU changes and biomass stock with greater accuracy.

6. Conclusions and outlook

The development of LCLU maps for biomass estimation with Sentinel images is still recent, but promising. In LCLU classification studies, the overall accuracy and the producer’s and user’s accuracies should be as high as possible to better support biomass prediction models. To refine the classification map, accuracy can be improved by combining radar and optical data from Sentinel-1 and Sentinel-2, respectively, and by incorporating models and algorithms, vegetation indices, textures, biophysical variables, and forest inventory data, among others. However, the accuracy of biomass prediction models obtained by remote sensing still depends on the precision of the field-based measurements.

The GEE platform allows free data to be combined with remote sensing data and machine learning to build maps and to assess land use and occupation on a large scale. Including field-collected samples through Google Fusion Tables, together with multitemporal data such as high-resolution images, can improve the classification results.

Given the limitations of the GEE platform, we hope that future updates will address problems such as memory limits and will include other classification algorithms. Such improvements would make the platform more accessible and attract new users to assess LCLU changes and to monitor vegetation cover. In future work, the LCLU classification approach will be plot-based, using a pixel-based method in GEE together with field validation, which should facilitate the detection and understanding of the dynamics of forest areas in terms of volumetric and gravimetric biomass production over time.

To our knowledge, studies that address the accuracy of a forest classification combined with biomass estimates, using Sentinel images on cloud platforms, have not yet been reported in the literature. Available studies address forest classification and biomass estimation separately. With Sentinel imagery, these two themes generally rely on field data and other thematic maps to develop the models. This gap presents an opportunity to develop pioneering research on forest classification and biomass estimation with Sentinel images on cloud platforms.

Acknowledgments

The work was supported by Programa Operativo de Cooperação Transfronteiriço Espanha-Portugal (POCTEP); project CILIFO – Centro Ibérico para la Investigación y Lucha contra Incendios Forestales and by FCT, Portugal, Fundação para a Ciência e Tecnologia, through IDMEC, under LAETA, project UIDB/05183/2020.

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License, which permits use, distribution and reproduction for non-commercial purposes, provided the original is properly cited.

How to cite and reference

Crismeire Isbaex and Ana Margarida Coelho (February 10th 2021). The Potential of Sentinel-2 Satellite Images for Land-Cover/Land-Use and Forest Biomass Estimation: A Review, Forest Biomass - From Trees to Energy, Ana Cristina Gonçalves, Adélia Sousa and Isabel Malico, IntechOpen, DOI: 10.5772/intechopen.93363. Available from:
