Type of sugarcane planting in each crop season and percentage of pixels in each cluster obtained by k-means with DTW.
Remote sensing images are more accessible nowadays, and suitable technologies exist to receive, distribute, manipulate and process long satellite image time series that can be used to improve traditional methods for harvest monitoring and forecasting. The potential of multi-temporal satellite images to support agricultural monitoring research has grown with technological development, especially in the analysis of the large volumes of data available for knowledge discovery. In Brazil, sugarcane is cultivated on extensive fields and is the main agricultural crop used to produce ethanol. The main objective of this chapter is to monitor the sugarcane crop by clustering analysis of multi-temporal satellite images with low spatial resolution. A large database of such images and specific software were used to perform the image pre-processing phase, extract time series, apply the clustering method and visualize the data at several steps of the analysis process. According to the analysis, our methodology makes it possible to identify land areas with similar development patterns, also considering different growing seasons for the crops, covering monthly and annual periods. The results confirm that satellite images of low spatial resolution can be used satisfactorily for agricultural crop monitoring at regional scale.
- time series
Improving agricultural monitoring, forecasting and planning is a current challenge, and a strategic one for a country with continental dimensions and great diversity of land uses. In this setting, time series of digital images acquired by low-spatial-resolution satellites (such as AVHRR/NOAA and MODIS/Terra) are especially important for monitoring the expansion and production of agricultural crops (such as sugarcane) in tropical regions (such as southeastern Brazil), where heavy cloud cover during the growing season makes the operational use of remote sensing data difficult.
The AVHRR/NOAA is a meteorological remote sensor that has also been widely used as a source of spectral information for environmental and agricultural purposes. Since sugarcane is cultivated on large, extensive fields, medium- and low-spatial-resolution satellites such as AVHRR/NOAA can be used to monitor this crop properly. Sugarcane production has expanded in recent years in southeastern Brazil, making this agricultural product strategic for the region's economy and environment, since it is the main renewable source of energy used to replace fossil fuels and reduce the emissions of greenhouse gases that cause global warming.
Remote sensing images have been efficient in evaluating important characteristics of sugarcane cultivation, providing relevant results for the debate on sustainable ethanol production from sugarcane. The accuracy of thematic mapping of sugarcane through satellite images has been assessed, and a methodology that contributes to automating sugarcane mapping over large areas with time series of remotely sensed imagery has been developed.
In addition, researchers have conducted studies to assess the social and economic impacts of sugarcane cultivation, as well as to predict its yield. An alternative masking technique for satellite image time series, called yield-correlation masking, can be used to develop and implement regional crop yield forecasting models, eliminating the need for a land cover map.
In fact, this agricultural commodity has growing economic importance, especially due to the increasing demand for ethanol (one of its derivatives) used as a renewable energy source to replace fossil fuels. Although there is a consensus about the benefits of a temperature increase for sugarcane production, its expansion to the warmest regions can be negatively affected if the water deficit becomes more severe in those areas under climate change scenarios. Thus, researchers have dedicated themselves to more detailed studies of the expansion and productivity of sugarcane fields, seeking innovative and optimized methods to understand the impact of global warming on the production of this crop.
Although satellite images are more accessible and available nowadays, many users still have difficulty dealing with them because of different and more sophisticated demands, as well as the fast-growing quantity and complexity of this kind of data. In this context, knowledge discovery technologies are an important alternative for exploring and finding relevant information in this huge volume of data. Some initiatives involving data and image mining have been carried out using different techniques, with reasonable results [9–13].
In this context, we focus on computational methods that allow analysis at regional scale with the purpose of improving agricultural crop monitoring and increasing the sustainable use of the soil, taking into account that climate change is under way. To that end, we present a clustering-based approach to analyze and visualize time series extracted from multi-temporal NDVI images. The main objective of this chapter is to monitor the sugarcane crop by clustering analysis of multi-temporal satellite images of low spatial resolution.
2. Material and methods
2.1. Study area
The study area is São Paulo, an important state in the southeastern macro-region of Brazil (54°00′ to 43°30′W and 25°30′ to 19°30′S), which is responsible for 60% of the national production and 25% of the global production of sugarcane (Figure 1).
2.2. Proposed approach
The knowledge discovery process comprises three main steps: (1) data preparation of satellite image time series, (2) extraction of the NDVI profiles, and (3) clustering analysis. Figure 2 presents a flowchart of the proposed process to assess multi-temporal satellite images.
2.2.1. Satellite image time series (SITS)
The database of multi-temporal NDVI/NOAA/AVHRR images used in this chapter is available at the Centre for Meteorological and Climatic Research Applied to Agriculture (Cepagri) at the University of Campinas (Unicamp), Brazil. It holds AVHRR/NOAA images recorded since April 1995, amounting to approximately 6 terabytes of data. The analysis used AVHRR/NOAA-16 and AVHRR/NOAA-17 images gathered from April 2001 to March 2010.
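For readers unfamiliar with the index, NDVI is the normalized difference between near-infrared and red reflectance, which for AVHRR are usually taken from channel 2 and channel 1, respectively. The following is a minimal sketch only; the function name `ndvi` and the choice to return `None` when both reflectances are zero are illustrative assumptions, not part of the processing chain described in this chapter.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index for one pixel.

    red: reflectance in the red band (AVHRR channel 1)
    nir: reflectance in the near-infrared band (AVHRR channel 2)
    Returns a value in [-1, 1], or None when both reflectances are zero
    (hypothetical convention for 'no data').
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Dense vegetation reflects strongly in the near infrared, so NDVI is high.
print(ndvi(0.1, 0.5))
# Bare soil has similar red and near-infrared reflectance, so NDVI is low.
print(ndvi(0.25, 0.3))
```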
The images must be preprocessed, since AVHRR/NOAA images often have geometric distortions caused by the Earth's curvature and rotation, attitude errors and imprecise satellite orbits. These distortions must be corrected, especially for land applications that require highly accurate geometric matching, with one-pixel accuracy (1.1 km) in the Equidistant Cylindrical Projection. To perform accurate geometric correction, the maximum cross-correlation (MCC) method is applied. The MCC method compares a target image with a base image (one for each season of the year) that is geometrically accurate and cloudless. The first step is image georeferencing, which is executed in batch mode by the NAVPRO system [16, 17] and accomplishes the necessary tasks, such as:
Conversion from a raw to an intermediary format
Identification of pixels classified as cloud
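The idea behind maximum cross-correlation matching can be illustrated with a small brute-force search: try every integer displacement within a window and keep the one that maximizes the normalized cross-correlation between the base and target images. This is a simplified sketch under our own assumptions (whole-image matching, integer shifts, images as nested lists); the names `ncc` and `best_shift` are hypothetical and this is not the NAVPRO implementation.

```python
import math


def ncc(base, target, dy, dx):
    """Normalized cross-correlation between base[y][x] and
    target[y + dy][x + dx], computed over the overlapping pixels."""
    rows, cols = len(base), len(base[0])
    a, b = [], []
    for y in range(rows):
        for x in range(cols):
            ty, tx = y + dy, x + dx
            if 0 <= ty < rows and 0 <= tx < cols:
                a.append(base[y][x])
                b.append(target[ty][tx])
    n = len(a)
    if n == 0:
        return 0.0
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a)
                    * sum((q - mean_b) ** 2 for q in b))
    return num / den if den > 0 else 0.0


def best_shift(base, target, max_shift=2):
    """Return the (dy, dx) displacement of the target relative to the base
    that maximizes the normalized cross-correlation."""
    candidates = [(dy, dx)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return max(candidates, key=lambda s: ncc(base, target, s[0], s[1]))
```

Once the displacement is known, the target image can be shifted back by that amount so that it matches the base image. Operational systems typically apply the search to many small windows rather than to the whole image.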
To attenuate the effect of the atmosphere on the images, maximum-value composites (MVC) of NDVI images were generated. Following the recommendations in the literature, it is important to mask out inappropriate pixels, such as cloud-contaminated pixels. The georeferencing module allows users to generate NDVI images for a specific region. As the volume of images is huge, the SatImagExplorer system was used. This interactive system allows the user to specify regions of interest (ROIs), taking a satellite image time series as input. SatImagExplorer propagates the indicated region to all images in the sequence, generating a time series for each ROI across all available images. This tool allows users to focus their analysis on strategic points of interest and facilitates the analysis of long data series. The time series extracted from multi-temporal images using SatImagExplorer are among the data to be mined by the clustering method.
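A maximum-value composite keeps, for each pixel, the highest NDVI observed over a compositing period, skipping observations flagged as cloud. The sketch below is only illustrative: the function name `max_value_composite`, the nested-list image layout and the use of `None` for pixels with no clear observation are assumptions of this example, not details of the software used in the chapter.

```python
def max_value_composite(images, cloud_masks):
    """Per-pixel maximum NDVI over a set of images of the same region.

    images: list of 2D lists with NDVI values
    cloud_masks: list of 2D lists of booleans, True where the pixel
                 was classified as cloud in the corresponding image
    Pixels that are cloudy in every image are left as None.
    """
    rows, cols = len(images[0]), len(images[0][0])
    composite = [[None] * cols for _ in range(rows)]
    for image, mask in zip(images, cloud_masks):
        for y in range(rows):
            for x in range(cols):
                if mask[y][x]:
                    continue  # skip cloud-contaminated observations
                value = image[y][x]
                if composite[y][x] is None or value > composite[y][x]:
                    composite[y][x] = value
    return composite


# Two daily images of a 1 x 4 region (made-up values).
day1 = [[0.2, 0.5, 0.05, 0.1]]
day2 = [[0.4, 0.3, 0.6, 0.2]]
mask1 = [[False, False, True, True]]
mask2 = [[False, False, False, True]]
print(max_value_composite([day1, day2], [mask1, mask2]))
# → [[0.4, 0.5, 0.6, None]]
```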
2.2.2. Clustering analysis
The clustering task is defined as a process of grouping similar objects according to a given criterion. In this step, NDVI time series are analyzed by the clustering method implemented in the SatImagExplorer system. We used the partition-based method named k-means.
k-Means divides the n objects of the input dataset into k partitions. Initially, the algorithm randomly selects k objects as initial centroids and assigns each remaining object to the partition represented by the most similar (closest) centroid. At the end of each iteration, the centroids are recalculated as the average values of the objects in each cluster, and these new centroids define how the n objects are assigned to clusters in the next iteration. The k-means algorithm converges when there are no more changes in the clusters. Although k-means is simple and computationally efficient (O(nk) distance computations per iteration), it relies on average values and is therefore more sensitive to errors when noise and outliers appear in the time series.
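The iteration described above can be sketched in a few lines. This is a minimal illustration, not the SatImagExplorer implementation: the function names, the `seed` and `max_iter` parameters and the choice to keep a centroid unchanged when its cluster becomes empty are assumptions of this example. The distance function is passed as a parameter so that Euclidean distance can be replaced by another measure.

```python
import random


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length series."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def kmeans(series, k, distance=euclidean, max_iter=100, seed=0):
    """Cluster equal-length time series into k groups.

    Returns (labels, centroids), where labels[i] is the cluster index
    assigned to series[i].
    """
    rng = random.Random(seed)
    # Randomly pick k input series as the initial centroids.
    centroids = [list(s) for s in rng.sample(series, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each series joins its closest centroid.
        new_labels = [min(range(k), key=lambda c: distance(s, centroids[c]))
                      for s in series]
        if new_labels == labels:
            break  # converged: no series changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(series, labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


# Made-up NDVI profiles: two low-biomass and two high-biomass pixels.
profiles = [[0.2, 0.2, 0.3], [0.25, 0.2, 0.3],
            [0.7, 0.8, 0.8], [0.75, 0.8, 0.7]]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

Averaging point by point in the update step is natural for Euclidean distance. When another distance is plugged in, the same averaging is only an approximation of a cluster representative.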
The k-means method uses a distance function to perform similarity search operations, that is, to find the series most similar to the time series being analyzed. A distance function or metric can be defined as a similarity measure between two data elements, which in this case are two time series. The most widely used distance functions are those of the Minkowski family (or Lp norms). The Euclidean distance corresponds to L2 and is commonly used to calculate the distance between multidimensional arrays and vectors. Dynamic time warping (DTW) is an effective distance function for comparing time series. Its main objective is to keep close those time series that have similar behavior but are delayed or distorted along the time axis. The technique handles warping well because the comparisons between corresponding points are not rigid: a point in one series may be matched to one or more neighboring points in the other. DTW also copes with two of the main issues raised by high-temporal-resolution satellite image time series, namely, irregular sampling in the temporal dimension and the need to compare pairs of time series with different numbers of samples.
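The behavior described above can be seen in the classic dynamic-programming formulation of DTW. The sketch below is a textbook version without a warping-window constraint, using the absolute difference as the cost between two points; the function name `dtw_distance` is our own, and the chapter does not state which DTW variant was used.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two univariate series,
    which may have different lengths."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a[i-1]
                                 cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m]


# The same growth curve, delayed by one time step (made-up values).
early = [0, 1, 2, 1, 0, 0]
late = [0, 0, 1, 2, 1, 0]
print(dtw_distance(early, late))  # 0.0: the delay is absorbed by warping
# Series with different numbers of samples can be compared directly.
print(dtw_distance([0, 1, 2], [0, 1, 1, 2]))  # 0.0
```

A point-by-point Euclidean comparison of `early` and `late` would give a nonzero distance, although the two profiles differ only by a delay.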
We will show next the three clustering analyses performed:
First: k-Means used with Euclidean distance, when we considered only monthly NDVI values. These values of sugarcane fields were extracted using geographical coordinates (latitude and longitude) provided by the Canasat/INPE Project (
Second: k-Means used with the DTW distance function, for which we generated series of NDVI values corresponding to one or more sugarcane crop series. The clustering used five clusters for each crop season (2001–2010) for annual crop monitoring according to the type of planting in each crop season, for example, sugarcane ratoon, sugarcane expansion, sugarcane renewed, sugarcane under renewing and not defined [13, 24].
Third: k-Means used with the DTW distance function on a three-dimensional (multivariate) time series database, extracted from 324 monthly images of NDVI, albedo and surface temperature. Since DTW calculates the distance between pairs of data points using the Euclidean distance, the DTW method can be applied to multivariate time series. The whole dataset had 220,238 data series, each observation being a triplet of NDVI, albedo and surface temperature values of the study area in a given month, with 108 values per time series.
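For the multivariate case, the only change to DTW is the cost between two observations, which becomes the Euclidean distance between two triplets. The sketch below assumes each series is a list of (NDVI, albedo, surface temperature) tuples; the function names are hypothetical, and whether the variables were rescaled before clustering is not stated in the text, so the made-up values here are simply taken to be on comparable scales.

```python
import math


def point_distance(p, q):
    """Euclidean distance between two observations (tuples of variables)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def dtw_multivariate(a, b):
    """DTW distance between two multivariate series given as lists of
    equal-length tuples, e.g. (NDVI, albedo, surface temperature)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = point_distance(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


# Made-up monthly triplets; the second series repeats its first month.
s1 = [(0.2, 0.10, 0.5), (0.6, 0.15, 0.4), (0.8, 0.20, 0.3)]
s2 = [(0.2, 0.10, 0.5), (0.2, 0.10, 0.5), (0.6, 0.15, 0.4), (0.8, 0.20, 0.3)]
print(dtw_multivariate(s1, s2))  # 0.0
```

If the variables are on very different scales (for instance, temperature in kelvin against NDVI in [-1, 1]), the largest one would dominate the point distance, so some normalization would normally be needed.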
3. Results and discussion
In this section, we present and discuss the results of the three analyses described above.
3.1. k-Means used with Euclidean distance
In this section, we show how the results of applying k-means clustering with the Euclidean distance function to monthly NDVI values extracted from the study area can assist the monitoring of sugarcane fields.
Months from December to May correspond to the period of maximum vegetative growth of sugarcane. In Figure 3J, L and B, pixels that appear in yellow and red colors correspond to the maximum NDVI values, being included in the clusters 3 and 4, respectively. On the other hand, months of August, September and October correspond to harvest season. In these months (Figure 3F), pixels in magenta and blue, with minimum NDVI values, correspond to clusters 0 and 1, respectively. Cluster 2 (green) corresponds to sugarcane intermediate stage of growth.
These clusters can be validated against the MVC NDVI images. The black squares over the satellite images on the left correspond to the main sugarcane planting areas. Analyzing the MVC NDVI images of the northeastern region of São Paulo, the evolution of the sugarcane vegetative growth cycle can be seen (Figure 3). Planting begins in August, represented in the images by pixels in shades of green and blue located in the northeastern region of the state. These colors represent low NDVI values (around 0.2), characterizing areas with exposed soil and sparse vegetation. A similar pattern also occurs in the months from September to November. From December, when sugarcane begins to grow and acquire more biomass, these regions appear in shades of yellow, orange and red. The months from January to May show shades of dark red, when sugarcane reaches the highest stage of growth with maximum NDVI values (between 0.7 and 0.8). The dark areas in the images represent pixels covered by clouds and water.
There is no predominance of one or two clusters in all producing regions if we consider all months of the crop season. As we can observe, both plant and ratoon sugarcane are grown throughout the state, and the five clusters appear in all months. There is a higher percentage of pixels in the clusters with higher NDVI during some months. However, in other months, the largest number of pixels is included in clusters with lower NDVI (Figure 3).
Figure 4 shows the temporal profile of the clusters, revealing the dynamics of crop planting and harvesting throughout the growing season. Analyzing this temporal profile, we can observe that in the months from December to May, the NDVI values are higher and clusters 2, 3 and 4 hold a larger percentage of pixels (from 20 to 40% of the pixels). For the months from August to November, the NDVI values are lower, with higher percentages for clusters 0 and 1 (around 30% of the pixels). Each month features sugarcane planting areas at a certain stage of growth, appearing in clusters 0 or 1 (harvested or bare soil) and in clusters 2, 3 and 4 (in growth or ready to be harvested) (Figure 3).
Although the k-means method is simple and widely used, its application to satellite image time series of low spatial resolution allows the regional study of the crop, even though the analysis is made more difficult by the possibility of spectral mixing in pixels.
3.2. k-Means used with DTW distance
The results of the MVC NDVI image time series analysis for the period 2001–2010 in the state of São Paulo are presented below. Maps and temporal profiles correspond to the clusters (k-means with DTW distance function) obtained from pixel NDVI values from year to year. In general, clusters that were identified as sugarcane may be (i) related to the type of planting carried out each year, for example, identifying areas of sugarcane ratoon (the sugarcane available for harvest after one or more cuts), sugarcane expansion (the sugarcane planted in new areas that will be harvested for the first time), sugarcane renewed (the year-and-a-half sugarcane plant that underwent renovation during the previous crop year and will be available for harvest in the current crop year), sugarcane under renewing (the sugarcane area that is not harvested due to renovation and is not available for that specific crop year) and not defined area, and (ii) related to the quantity produced. The clusters determined by the clustering analysis do not remain constant from year to year, as sugarcane planting is dynamic along the time series.
Thus, applying the k-means clustering analysis, we can verify the sugarcane planting type in the years analyzed. Cluster 4 (red) indicates the maximum NDVI values in the month, corresponding to areas with higher biomass. Cluster 0 (magenta) shows the lowest NDVI values, corresponding to bare soil. The k-means method showed more homogeneous temporal profiles (Figure 5). Low peaks in the NDVI profiles during the months of December and January (Figure 5) match NDVI values related to clouds, because this period of the year is the rainy season in the state.
Analyzing every year, we found that each cluster corresponds to a different type of sugarcane planting (Table 1). For example, in crop seasons 2001–2002, 2003–2004, 2006–2007 and 2008–2009, cluster 2 (green; Figure 6A, C, F and H) corresponds to sugarcane ratoon, and this cluster (29–47% of the pixels) is correlated (between R = 0.74 and R = 0.87) with the crop production (Figure 7). In crop seasons 2002–2003 and 2009–2010 (Figure 6B and I), sugarcane ratoon corresponds to cluster 1 (blue), with correlations of R = 0.84 and R = 0.73 with the production and 36 and 33% of the sugarcane pixels, respectively (Figure 7). In crop season 2004–2005 (Figure 6D), sugarcane ratoon corresponds to cluster 3 (yellow), with correlation index R = 0.81 and 32% of the sugarcane pixels (Figure 7). In most crop seasons, sugarcane ratoon is strongly correlated with the sugarcane production. Only in crop seasons 2005–2006 and 2007–2008 (Figure 6E and G) is the sugarcane expansion correlated with crop production.
| Cluster | 2001–2002 | 2002–2003 | 2003–2004 | 2004–2005 | 2005–2006 | 2006–2007 | 2007–2008 | 2008–2009 | 2009–2010 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Expansion 9% | Under renewing 18% | Expansion 7% | Expansion 4% | Renovated 3% | Not defined 14% | Under renewing 12% | Renewed 11% | Renewed 21% |
| 1 | Renewed 17% | Ratoon 36% | Under renewing 27% | Not defined 17% | Ratoon 21% | Expansion 20% | Not defined 7% | Under renewing 11% | Ratoon 33% |
| 2 | Ratoon 29% | Expansion 13% | Ratoon 41% | Under renewing 20% | Under renewing 18% | Ratoon 29% | Renewed 21% | Ratoon 47% | Expansion 18% |
| 3 | Not defined 19% | Renewed 15% | Not defined 13% | Ratoon 32% | Expansion 35% | Under renewing 21% | Expansion 28% | Expansion 22% | Under renewing 19% |
| 4 | Under renewing 24% | Not defined 15% | Renewed 9% | Renewed 24% | Not defined 20% | Renewed 14% | Ratoon 29% | Not defined 7% | Not defined 6% |
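The pixel percentages reported for each cluster are simple label counts. As an illustration only, and assuming a hypothetical list `labels` holding the cluster index assigned to each sugarcane pixel in one crop season, they could be computed as follows (the values are made up and are not the chapter's data):

```python
from collections import Counter


def cluster_percentages(labels, k):
    """Percentage of pixels assigned to each of the k clusters."""
    counts = Counter(labels)
    total = len(labels)
    return [100.0 * counts.get(c, 0) / total for c in range(k)]


# Ten made-up pixel labels for one crop season.
labels = [0, 1, 1, 2, 2, 2, 2, 3, 4, 4]
print(cluster_percentages(labels, k=5))
# → [10.0, 20.0, 40.0, 10.0, 20.0]
```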
3.3. k-Means used with DTW distance function on a three-dimensional (multivariate) time series database
The dataset, with more than 220,000 series from the state of São Paulo, was grouped into five clusters (0–4) by the k-means method with the DTW distance function. Each cluster was formed according to the characteristics of NDVI, surface temperature and albedo extracted from AVHRR/NOAA images in the period 2001–2010. The identified areas were cluster 0 (magenta), which corresponds to water; cluster 1 (blue), which corresponds to urban areas and areas where the soil is exposed or has low vegetation and pasture; cluster 2 (green), which represents areas of agricultural crops; cluster 3 (yellow), which corresponds to sugarcane; and cluster 4 (red), which represents forest areas (Figure 8A and B).
NDVI was useful for separating vegetation areas from other targets. For example, forests present high NDVI values during the whole season (they have a high concentration of vegetation and biomass), and these areas are shown by the red representative time series in the profile visualization (Figure 9A). On the other hand, the albedo variable was useful for separating water areas from other targets, but was not enough to distinguish areas with different levels of vegetation cover (Figure 9B). The water represented by cluster 0 was well clustered, since its NDVI values and especially its albedo values were different from those of the other clusters, as shown in the temporal profiles of NDVI (Figure 9A) and albedo (Figure 9B). The albedo and NDVI values of water are low (less than 0.1), since there is little or no vegetation present.
Clustering results for agricultural crops and grassland were less accurate, probably because different crops present similar NDVI values in some phenological phases of the vegetative cycle, but they are useful for separating agricultural from nonagricultural areas, such as water, urban areas and forest. The clustering of these areas was defined mainly by surface temperature, which is higher for targets with lower canopy, such as urban areas and exposed soil, and lower for woodland (Figure 9A and C). For example, the forest areas represented by cluster 4 in Figure 8A and B have high NDVI values (Figure 9A) and lower surface temperature values (Figure 9C), as they are heavily shaded areas with dense vegetation cover.
However, sugarcane fields were well clustered over the crop seasons because sugarcane has a distinctive behavior (a long seasonal cycle) compared with other crops. In Figure 8A and B, it is possible to observe the dynamics of this crop, represented by cluster 3 (yellow), throughout the decade: in crop year 2001–2002 the acreage was low, with higher production, and planted in the northeast of the state, whereas by the end of crop year 2009–2010 there was a significant increase in the planted area toward the west of the state. This technique of clustering a three-dimensional (multivariate) time series database was efficient for temporal analysis of land use, indicating that this methodology can be used to identify and analyze the dynamics of land use and cover.
This chapter presented a new approach to boost agricultural monitoring, including the expansion of crops to different regions, through time series mining techniques. We used clustering analysis associated with the Euclidean and the DTW distance functions. We demonstrated that it is possible to take advantage of off-the-shelf computational methods to support agricultural monitoring, as well as to determine the expansion of sugarcane fields automatically, which is a valuable contribution of this work.
Moreover, we showed the potential of time series of satellite images with low spatial resolution for agricultural monitoring, although spectral mixtures can occur. The main advantages of this approach are the high temporal resolution, low cost and global coverage of the remote sensing system used (AVHRR/NOAA). The analysis of the performance of a simple clustering technique on time series of satellite images provides a further step in research on the use of renewable energy sources, such as sugarcane ethanol. The impact of such an approach becomes even stronger as the need grows for research on new ways to reduce greenhouse gas emissions, mainly in the wake of the recent extreme events in different parts of the planet.
The authors thank FAPESP/AlcScens and CNPq for funding and Cepagri/Unicamp for the database of remote sensing imagery.