Open access peer-reviewed chapter

Artificial Intelligence Data Science Methodology for Earth Observation

Written By

Corneliu Octavian Dumitru, Gottfried Schwarz, Fabien Castel, Jose Lorenzo and Mihai Datcu

Submitted: 25 March 2019 Reviewed: 16 May 2019 Published: 05 September 2019

DOI: 10.5772/intechopen.86886



This chapter describes a Copernicus Access Platform Intermediate Layers Small-Scale Demonstrator, which is a general platform for the handling, analysis, and interpretation of Earth observation satellite images, mainly exploiting big data of the European Copernicus Programme by artificial intelligence (AI) methods. From 2020, the platform will be applied at a regional and national level to various use cases such as urban expansion, forest health, and natural disasters. Its workflows allow the selection of satellite images from data archives, the extraction of useful information from the metadata, the generation of descriptors for each individual image, the ingestion of image and descriptor data into a common database, the assignment of semantic content labels to image patches, and the search for and retrieval of image patches with similar content. The two main components, namely, data mining and data fusion, are detailed and validated. The most important contributions of this chapter are the integration of these two components with a Copernicus platform on top of the European DIAS system, for the purpose of large-scale Earth observation image annotation, and the measurement of the clustering and classification performances of various Copernicus Sentinel and third-party mission data. The average classification accuracy ranges from 80 to 95% depending on the type of images.


  • Earth observation
  • machine learning
  • data mining
  • Copernicus Programme
  • TerraSAR-X

1. Introduction

A typical shortcoming of current image analysis tools is the lack of content understanding. This becomes apparent with current developments in Earth observation and data analysis [1]. In this chapter, we therefore concentrate on artificial intelligence (AI) applications and our solution strategies as our main objectives in the field of remote sensing, i.e., the acquisition and semantic interpretation of instrument data from remote platforms such as aircraft or satellites observing, for instance, atmospheric phenomena on Earth for weather prediction, or icebergs drifting in Arctic waters and endangering maritime transport. In particular, we will describe the exploitation of imaging data acquired by Earth-observing satellites and their sensors.

These satellites may either orbit the Earth (mostly on low polar Earth orbits) or be operated from stationary or slowly moving points high above our planet (on so-called geostationary or geosynchronous orbits). Typical examples are Earth-observing and meteorological satellites. All these instruments have been designed with dedicated goals that, as a rule, can only be fulfilled by systematic and interactive data processing and data interpretation on the ground. The processing and data analysis chains are then the main candidates where one can and shall apply modern data science approaches (e.g., machine learning and artificial intelligence) in order to fully exploit the information content of the sensor data.

In general, we have quite a number of different sensors installed on satellites. These include passive instruments observing the backscattered solar illumination or thermal emissions from the Earth—or active imaging instruments (transmitting and receiving light pulses or radio signals toward and from the target area being observed). For the ease of understanding, we will limit ourselves to optical sensors operating in the visible and infrared spectral ranges and to radar sensors applying synthetic-aperture radar (SAR) concepts [2, 3]. These instruments provide large-scale images with a typical spatial resolution of 1–40 m per pixel. The images can be acquired from spacecraft orbits that cover the Earth completely with well-defined repeat cycles.

After being transmitted to the ground, the image data will have to undergo systematic processing steps. Typically, the processing schemes follow a stepwise approach where for all steps the image data are accompanied by the necessary descriptor data (metadata). The processing chains start with what we call level-0 data consisting of reordered and annotated detector data; level-1 data provide calibrated sensor data, while level-2 data contain data in commonly known physical units preferably on regular spatial or map grids. Then level-3 data are higher-level products such as thematic maps or time series results (obtained by merging or concatenation of several individual images) or similar operations. Finally, users can apply additional interactive processing steps on their own or exploit available software/platform concepts [4].

This principle of ordered value-adding requires well-established techniques for data management, batch processing and databases, local and distributed (cloud) processing, understanding of the information flow, experience with learning principles, knowledge extraction from image and library data, and discovery of image semantics. At present, typical data sources with easy access are publicly available scientific image data provided by the European Copernicus mission with its Sentinel satellites [5, 6] as well as high-resolution remote sensing images [7, 8]. The European Sentinel satellites comprise among others a constellation of SAR imagers (i.e., Sentinel-1A/Sentinel-1B providing typically large radar images, with a ground sampling distance of 20 meters and selectable horizontal and vertical polarizations), and a constellation of optical imagers (i.e., Sentinel-2A/Sentinel-2B delivering typically large multispectral images with 13 different bands and a ground resolution—depending on the bands—of 10–60 m). This space segment of the Copernicus mission is complemented by systematic level-1 and level-2 image data processing on the ground and by support environments that serve as comfortable platforms for further data handling and interpretation covering all aspects of applied data science. These approaches then pave the way for deeper semantic data analysis and understanding as typically required in Earth observation for crop yield predictions, atmospheric research, etc.

The design of Earth observation (EO) missions as constellations of several satellites brings important advantages. However, this is not the case for some of the most popular EO missions. Figure 1 shows typical TerraSAR-X and Copernicus Sentinel overpasses from different orbits and their target areas.

Figure 1.

Satellite overpasses of Sentinel-1A/Sentinel-1B, Sentinel-2A/Sentinel-2B, and TerraSAR-X (on 23 August 2018, starting at 14:02 UT) [12].

TerraSAR-X flies on a polar Sun-synchronous circular dawn-dusk orbit. This satellite shares its orbit plane with its twin satellite TanDEM-X (keeping a 97.44° orbital phasing difference) and a repeat cycle of 11 days with 167 orbits per cycle. Due to its flexibility, TerraSAR-X can cover any point on Earth within a maximum of 4.5 days and 90% of the Earth’s surface within 2 days [9].

The Sentinel-1 satellites fly on a near-polar, Sun-synchronous orbit, too. The two satellites of the constellation (Sentinel-1A and Sentinel-1B) share the same orbit plane with a 180° orbital phasing difference and a repeat cycle of 6 days with 175 orbits per cycle. Sentinel-1 can cover the equator in 3 days, the Arctic in less than 1 day, and Europe, Canada, and shipping routes in 1–3 days [10].

Like the Sentinel-1 constellation, the two satellites of the Sentinel-2 constellation (Sentinel-2A and Sentinel-2B) share the same orbit with a separation of 180°. The repeat cycle is 5 days with 143 orbits per cycle. Sentinel-2 can cover the equator in 5 days under cloud-free conditions and mid-latitudes in 2–3 days [11].

When selecting data for fusion, we have to constrain ourselves to data acquired as close as possible in time.
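
This time constraint can be illustrated with a minimal Python sketch that pairs each SAR acquisition with the optical acquisition closest in time. The function name `pair_closest`, the 3-day tolerance, and the acquisition dates are assumptions made for illustration; they are not taken from the platform.

```python
from datetime import datetime, timedelta


def pair_closest(sar_times, optical_times, max_gap=timedelta(days=3)):
    """Pair each SAR acquisition with the optical acquisition closest in time.

    Pairs whose time difference exceeds max_gap are dropped.
    max_gap is a hypothetical tolerance, not a value from the chapter.
    """
    if not optical_times:
        return []
    pairs = []
    for sar in sar_times:
        # Optical acquisition with the smallest absolute time difference.
        best = min(optical_times, key=lambda opt: abs(opt - sar))
        gap = abs(best - sar)
        if gap <= max_gap:
            pairs.append((sar, best, gap))
    return pairs


# Hypothetical acquisition dates, for illustration only.
sar_times = [datetime(2018, 8, 1), datetime(2018, 8, 7), datetime(2018, 8, 13)]
optical_times = [datetime(2018, 8, 3), datetime(2018, 8, 8), datetime(2018, 8, 18)]

pairs = pair_closest(sar_times, optical_times)
for sar, opt, gap in pairs:
    print(sar.date(), opt.date(), gap.days)
```

In this example the third SAR acquisition has no optical partner within the tolerance and is therefore excluded from fusion. In practice, cloud cover would further reduce the set of usable optical acquisitions.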

These data handling approaches are typical for recent advances in big data scenarios in distributed systems on the web (e.g., with high data volumes and throughput rates, conventional and innovative data processing steps, additional necessary tools and environments, and greater user expectations). In our case, this affects the tasks of image processing (e.g., data fusion), image understanding, and comparisons with physical models. This can also be seen when we look at the evolution of satellite data analysis. While early concepts started with data being transferred to algorithms, current systems often transfer data to archives, and future systems may support more and more distributed systems.

A typical example is the full functionality offered by machine learning tools, while the basic ideas of future data science aspects for Earth observation as seen by the European Space Agency can be found in [13]. In our case, we are interested in applying more theoretical data science, machine learning, and artificial intelligence (for instance, deep learning, powerful classification maps, and prediction results) together with interactive visualization on various information levels. These ideas will be dealt with below for three remote sensing scenarios as detailed in [14]:

  • Urban monitoring (urban growth and sprawl, urban classification, and semantic indicators)

  • Quantitative interpretation of forested areas

  • Disaster monitoring (earthquakes, inundations, mud slides, etc.)

Here, traceable products yielding quantitative data about physical phenomena, change maps, and change predictions are among our primary goals. Of course, we have to consider the implementation effort as well as the attainable accuracy of our products. For each scenario dealt with below, the reader should try to understand what additional value machine learning, artificial intelligence, and the comprehensive use of data science concepts bring about.

The basic terms of machine learning, artificial intelligence, and data science shall be understood in the following sense:

  • We use the term “machine learning” mainly when we talk about learning target category parameters derived from selected images and applying these parameters to other examples. Currently, we see much progress by “deep” techniques (e.g., deep learning [15, 16]). An important point is the selection of reliable reference data for traceable validation and verification of the methods.

  • “Artificial intelligence” describes how machine learning results are exploited for further use. Typically this includes recognizing and being aware of typical situations, making decisions based on the recognized high-level parameters, and predicting future developments. To this end, one can profit from external databases complementing machine learning results.

  • “Data science” covers the entire field of comprehensive data management and tools, machine learning, and artificial intelligence. This includes topics like distributed processing, monitoring of workflows, visualization techniques, and performance monitoring. Even seemingly trivial tasks (e.g., accessing and handling of data) may belong to data science. However, remote sensing still is in urgent need of efficient tools to familiarize the user community with remote sensing opportunities.

When we look at remote sensing in more detail, we currently see many efforts to transform sensor data to physical quantities that can be exploited for quantitative analysis or modeling. If we accomplish this, we can combine measured data with physical models and find quantitative parameters for predictions.

In the following, we describe how we applied these concepts in a research project funded by the European Union [17]. The project’s main objective is to create added value from Copernicus data by providing modeling and analytics tools, given that data collection, processing, storage, and access are provided by the Copernicus Data and Information Access Services (DIAS) [18], and to create a data science workflow in which sub-images (image chips) are annotated, administered, and validated based on their assigned semantic labels [19].

The chapter is organized in seven main sections. Section 2 explains the CANDELA platform used for prototyping EO applications, while Section 3 describes the characteristics of the data set. Section 4 presents typical examples which a user can obtain when using the platform from Section 2 and the data set from Section 3. Section 5 illustrates the perspectives in EO data science workflows and Section 6 summarizes our conclusions, while Section 7 contains the future work. The chapter ends with acknowledgments and a list of references.


2. The CANDELA platform

CANDELA’s main objective is the creation of additional value from Copernicus data through the provisioning of modeling and analytics tools, provided that the tasks of data collection, processing, storage, and access will be carried out by the Copernicus Data and Information Access Services (DIAS) [18]. The corresponding flowchart is presented in Figure 2 and in [17]. In the end, after the integration of all components, CANDELA will be deployed on top of DIAS.

Figure 2.

CANDELA platform [17].

The CANDELA platform [17] allows prototyping of EO applications by applying efficient data retrieval, data mining augmented with machine learning techniques, as well as interoperability in order to fully benefit from the available assets and to add more value to the satellite data. It also helps to interactively detect objects or structures and to classify land cover categories.

The implementation of the platform is putting in place a set of powerful tools in artificial intelligence environments (e.g., with machine learning and deep learning). These tools have as their objectives:

  • To process large volumes of EO data and to perform data analytics

  • To extract the information content from the EO data based on data mining

  • To fuse data from various EO sensors in order to increase and to complement the information extracted from the individual sensors

  • To apply deep learning to detect changes in EO data

  • To semantically search and index our EO image catalog

From this list of objectives, we focus on two of them, namely, data mining and data fusion (see Figure 3). Our goal is to simplify data access and to analyze large volumes of EO data without specific knowledge about the processing of EO data and to fuse the outputs for content exploration.

Figure 3.

Block diagram of the CANDELA platform modules [17].

For the development of the data mining component, we started from [20], and we improved the cascaded active learning system of [21] for typical Copernicus Earth observation images. Its implementation, test, and validation aim at automated knowledge extraction and image content interpretation. The results are presented in Section 4.1.

Regarding data fusion, a new sub-component had to be developed within data mining. This new sub-component fuses multispectral and SAR images. There are two types of fusion: one is performed at the feature level and the other at the semantic level. The results are shown in Section 4.2 for semantic-level fusion.


3. Data set description for CANDELA

Our main data sets extracted from different instruments are Earth’s surface images of the European Copernicus Programme (e.g., Sentinel-1 and Sentinel-2). Sentinel-1 is a twin satellite synthetic-aperture radar configuration, while Sentinel-2 is also a twin satellite configuration, each carrying a multispectral imager [22, 23].

There are three reasons why we selected Sentinel-1 and Sentinel-2 images. Firstly, we can recognize different target area details in overlapping radar and optical images that complement each other and are acquired in rapid succession. Secondly, individually selectable Sentinel-1 and Sentinel-2 images can be rectified and co-aligned by publicly available toolbox routines offered by ESA, allowing a straightforward image comparison or image fusion. Thirdly, the data of all Sentinel instruments are openly available to the EO community. Many publications (e.g., at dedicated conferences [1, 24, 25, 26]) already describe newly discovered Earth’s surface characteristics derived from the individual instruments.

Furthermore, the long-term operations of the Sentinel satellites allow the interpretation of image time series or even the combination of time series data with external supplementary data via additional data mining and data fusion tools [1, 25, 26].

Besides these data sets, we include other third-party EO mission data sets as specified by CANDELA users (e.g., TerraSAR-X and WorldView).

3.1 Sentinel-1 data

The Sentinel-1 mission comprises a constellation of two satellites (launched on April 3, 2014, and on April 25, 2016), operating in C-band for synthetic-aperture radar imaging. SAR has the advantage of operating at wavelengths not impeded by cloud cover or a lack of solar illumination and can acquire data over a selected area during day- or nighttime with almost no weather restrictions. The repeat period of each satellite is 12 days; that means every 6 days there is an acquisition by one of the two satellites.
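
The revisit arithmetic above can be written as a short Python sketch. It assumes identical satellites that are evenly phased in the same orbit plane, so it gives only the nominal interval and ignores swath overlap at higher latitudes; the function name is hypothetical.

```python
def constellation_revisit_days(repeat_cycle_days, n_satellites):
    """Nominal revisit interval for identical, evenly phased satellites in one orbit plane."""
    if n_satellites < 1:
        raise ValueError("n_satellites must be at least 1")
    return repeat_cycle_days / n_satellites


# Sentinel-1A and Sentinel-1B: 12-day repeat period per satellite.
sentinel1_revisit = constellation_revisit_days(12, 2)
print(sentinel1_revisit)
```

The same relation applies to any two-satellite constellation with a 180° phasing difference.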

The Sentinel-1 characteristics are presented in detail in [22]. From the multitude of parameters/configurations that exist for Sentinel-1, we have selected as examples the following configurations based on data availability, the CANDELA use cases, and our previous experiments: level-1 Ground Range Detected (GRD) products with high resolution (HR) taken routinely in Interferometric Wide (IW) swath mode. These products/data are produced (prior to geo-coding) with a pixel spacing of 10 × 10 m and correspond to about five looks and a resolution (range × azimuth) of 20 × 22 m. They have a nearly uniform signal-to-noise ratio (SNR) and also a stable distributed target ambiguity ratio (DTAR). For these products, the data are provided in dual polarization, VV and VH for land and HH and HV for polar target areas.

3.2 Sentinel-2 data

The Sentinel-2 mission (like Sentinel-1) comprises a constellation of two satellites (launched on June 23, 2015, and on March 7, 2017) that collect multispectral data and are affected by weather conditions (e.g., cloud cover). The repeat period of each satellite is 10 days; that means every 5 days there is an acquisition by one of the two satellites, thus providing a high revisit frequency.

Each Sentinel-2 satellite carries a multispectral instrument with 13 spectral channels (in the visible/near-infrared and shortwave infrared spectral range) and a swath width of 290 km. The Sentinel-2 characteristics are presented in detail in [23], including those of the level-1 data. The level-1C products are radiometrically and geometrically corrected images with orthorectification and spatial registration on a global reference system with sub-pixel accuracy. Since the product size is very large, each image is divided into several quadrants in UTM WGS84 projection. The average size of a quadrant is 10,980 × 10,980 pixels (rows × columns). For visualization, the RGB bands (B04, B03, and B02) were used to generate a quick-look quadrant image. For feature extraction, the user can choose different band combinations.

3.3 Third-party mission data

From the available third-party mission data sets, we selected for demonstration four pairs of multi-sensor images of TerraSAR-X and WorldView-2 [27].

TerraSAR-X is a German radar satellite launched in June 2007, followed by its TanDEM-X twin in 2010. Both operate in X-band and are side-looking SAR instruments that offer a wide selection of operating modes and product generation options [7]. TerraSAR-X has a revisit cycle of 11 days on the Earth’s equator. We selected high-resolution spotlight mode images because they provide the highest-resolution data of the target areas. As for the product generation options, we took enhanced ellipsoid corrected (EEC) and radiometrically enhanced (RE) data. Finally, we took horizontally polarized (HH) or vertically polarized (VV) images, as this option is most frequently used. The images have a pixel spacing of 1.25 m and a resolution of 2.9 m with WGS-84 map projection. The average size of the images is 8000 rows × 9600 columns.

In contrast, WorldView-2 provides a single panchromatic band and eight multispectral bands. It was launched in October 2009 to become a DigitalGlobe satellite. The revisit period of the satellite is about 3 days on the Earth’s equator [28]. The resolution for the panchromatic band is 0.46 m and for multispectral bands is 1.87 m. The map projection of WorldView-2 is, again, WGS-84, and the size of these images (on average) for panchromatic images is 47,000 × 37,000 pixels (rows × columns) and for multispectral images is 11,000 × 9000 pixels (rows × columns).


4. Typical CANDELA examples

4.1 Data mining by machine learning

In EO data mining, a number of researchers have already developed technologies for semantic image understanding [29, 30]. The available web engines are focused on the everyday needs of a broad category of users [31]. A very popular satellite image data mining system is Tomnod from DigitalGlobe or Google Earth, which is targeting general user topics. Especially for EO, there are systems such as LandEX [32], which is a land cover management system, while GeoIRIS [33] is a system that allows the user to refine a given query by iteratively specifying a set of relevant and a set of nonrelevant images. A similar system is IKONA [34], which uses relevance feedback in order to analyze the content of very high-resolution EO images. Further, the knowledge-driven information mining (KIM) system [35] is an example of an active learning system providing semantic interpretation of image content. The KIM concept evolved into the TELEIOS prototype [36], complementing the scope of searching EO images with additional geo-information and in situ data. Finally, a cascaded active learning prototype [21] has been integrated into an operational EO system [20] to interpret the archives of TerraSAR-X images [37].

CANDELA is improving this cascaded active learning system by searching for dedicated algorithms for typical Earth observation images. Its implementation, test, and validation aim at automated knowledge extraction and image content interpretation. The targeted performance characteristics are verified for several typical use cases and tell us more about the potential of dedicated algorithms with respect to general machine learning.

Figures 4–9 depict typical classification maps for TerraSAR-X and Sentinel-1 images together with their respective accuracy (e.g., precision/recall) for the cities of Venice, Italy, and Munich, Germany. Another example is the Dutch part of the Wadden Sea in the Netherlands. The results of the classification map and their accuracy are given in Figures 10 and 11.

Figure 4.

TerraSAR-X image of Venice, Italy: (left) a quick-look view of the image and (right) the corresponding classification map generated by CANDELA.

Figure 5.

Sentinel-1 image of Venice, Italy (after selecting the area that is covered by TerraSAR-X from the full Sentinel-1 image): (bottom-left) a quick-look view of the image and (bottom-right) the classification map generated by CANDELA.

Figure 6.

Classification accuracy (precision/recall) by comparison between TerraSAR-X (top-left) and Sentinel-1 (bottom-right) for the Venice image.

Figure 7.

TerraSAR-X image of Munich, Germany: (left) a quick-look view of the image and (right) the classification map generated by CANDELA.

Figure 8.

Sentinel-1 image of Munich, Germany (after selecting the area that is also covered by TerraSAR-X): (bottom-left) a quick-look view of the image and (bottom-right) the classification map generated by CANDELA.

Figure 9.

Classification accuracy (precision/recall) by comparison between TerraSAR-X (top-right) and Sentinel-1 (bottom-left) for the Munich image.

Figure 10.

Sentinel-2 quadrant image of an area of the Dutch Wadden Sea: (left) a quick-look view of the image and (right) the classification map generated by CANDELA.

Figure 11.

Classification accuracy (precision/recall) for the Sentinel-2 quadrant image covering an area of the Wadden Sea.

4.2 Data fusion by machine learning

Currently, what exists in the field of data fusion is a collection of routines/algorithms that can be linked and embedded for various applications. A very well-known open-source toolbox is Orfeo [38] which provides a large number of state-of-the-art algorithms to process SAR and multispectral images for different applications. Another one is Google Earth [31] that includes a large image database and an expandable number of algorithms that can be used for image processing.

In our case, we need to recognize different target area details in overlapping SAR and multispectral images. To this end, we selected four cities from all over the world: Bucharest in Romania, Munich in Germany, Venice in Italy, and Washington in the USA. The selection criteria were the availability of images of each city from both satellites and the variety of categories that can be found. A difficulty arises when trying to co-align these images, for example, images provided by TerraSAR-X and WorldView-2, because the original data have different pixel spacing. To solve this problem, we resampled the panchromatic WorldView-2 image in order to co-align it with the TerraSAR-X image [27].
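
The resampling step can be illustrated with a minimal Python sketch that brings a grid to a coarser pixel spacing. Nearest-neighbour sampling is used here only as a simple stand-in, since the resampling method of [27] is not described in this chapter. The source spacing of 0.5 m for the panchromatic image is an assumption; the target spacing of 1.25 m is the TerraSAR-X pixel spacing given in Section 3.3.

```python
def resample_nearest(image, src_spacing, dst_spacing):
    """Resample a 2-D grid (list of rows) to a new pixel spacing.

    Nearest-neighbour sampling at the output pixel centres; both spacings
    must be given in the same unit (e.g., metres).
    """
    scale = dst_spacing / src_spacing  # source pixels per output pixel
    src_rows, src_cols = len(image), len(image[0])
    dst_rows = max(1, round(src_rows / scale))
    dst_cols = max(1, round(src_cols / scale))
    out = []
    for r in range(dst_rows):
        src_r = min(src_rows - 1, int((r + 0.5) * scale))
        row = [
            image[src_r][min(src_cols - 1, int((c + 0.5) * scale))]
            for c in range(dst_cols)
        ]
        out.append(row)
    return out


# Synthetic 10 x 10 grid standing in for a panchromatic image chip.
grid = [[r * 10 + c for c in range(10)] for r in range(10)]

# Assumed 0.5 m source spacing, resampled to the 1.25 m TerraSAR-X spacing.
resampled = resample_nearest(grid, src_spacing=0.5, dst_spacing=1.25)
print(len(resampled), len(resampled[0]))
```

When reducing the resolution of real images, an averaging or anti-aliasing filter would normally be preferred over nearest-neighbour sampling, and both images would also have to share the same map grid origin.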

In the case of Sentinel-1 and Sentinel-2, the images can be rectified and co-aligned by publicly available toolbox routines [39]; this allowed us a straightforward image comparison.

While we are accustomed to image fusion as a radiometric combination of multispectral images, a comparably mature level of semantic fusion of SAR images has not been reached yet. In order to remedy the situation, we propose a semantic fusion concept for SAR images, where we combine the semantic image content of two data sets with different characteristics. By exploiting the specific imaging details and the retrievable semantic categories of the two image types, we obtained semantically fused image classification maps that allow us to differentiate between different categories.

Figures 12–14 present the classification maps for each sensor and the fused ones together with their accuracy (e.g., precision/recall) for the city of Venice, while Figures 15–17 apply to the city of Munich.

Figure 12.

A multi-sensor data set: multispectral image (top-left side), panchromatic image (top-right side), and TerraSAR-X image (bottom-center) for the city of Venice, Italy.

Figure 13.

Classification maps generated using the CANDELA platform for the city of Venice: multispectral image (top-left side), panchromatic image (top-right side), TerraSAR-X image (bottom-left side), and fusion of all three images (bottom-right side).

Figure 14.

Classification accuracy (precision/recall) for a selected image taken over the area of Venice using multispectral, panchromatic, and SAR images and also the fused image.

Figure 15.

A multi-sensor data set: multispectral image (top-left side), panchromatic image (top-right side), and TerraSAR-X image (bottom-center) for the city of Munich, Germany.

Figure 16.

Classification maps generated using the CANDELA platform for the city of Munich: multispectral image (top-left side), panchromatic image (top-right side), TerraSAR-X image (bottom-left side), and fusion of all three images (bottom-right side).

Figure 17.

Classification accuracy (precision/recall) for a selected image over the area of Munich using multispectral, panchromatic, and SAR images and also the fused image.

For a quantitative assessment, we compared the semantic annotation results with the given reference data set and computed precision/recall for each category and sensor. Analyzing the figures separately, we observed that the average of precision/recall obtained for fused sensor images is higher than the precision/recall of individual sensor images. Unfortunately, there are also cases in which for corresponding image patches tiled from different sensor images, the WorldView-2 annotations have a different semantic classification when compared to the TerraSAR-X results or when a category is missing for one sensor. In our case, in the Venice image, the category “buoys” is only detected in the TerraSAR-X image, and not in the WorldView-2 image. This has a noticeable impact on the performance of the category “boats.” Another example is the category “clouds” that appears in the case of the Munich image that is detected in the WorldView-2 image, but not in the TerraSAR-X image.
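
The per-category assessment can be sketched in Python as follows. The labels are invented for illustration and are not results from the chapter; in particular, the assumption that undetected "buoys" patches end up labeled as "boats" is hypothetical.

```python
from collections import Counter


def precision_recall(reference, predicted):
    """Per-category (precision, recall) from two equally long label sequences.

    A value of None means the ratio is undefined (no predictions or no
    reference samples for that category).
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, pred in zip(reference, predicted):
        if ref == pred:
            tp[ref] += 1
        else:
            fp[pred] += 1  # predicted category was wrong
            fn[ref] += 1   # reference category was missed
    scores = {}
    for cat in sorted(set(reference) | set(predicted)):
        p_den = tp[cat] + fp[cat]
        r_den = tp[cat] + fn[cat]
        scores[cat] = (
            tp[cat] / p_den if p_den else None,
            tp[cat] / r_den if r_den else None,
        )
    return scores


# Hypothetical patch labels: the sensor never reports "buoys".
reference = ["boats", "boats", "buoys", "buoys", "water", "water"]
predicted = ["boats", "boats", "boats", "boats", "water", "water"]

scores = precision_recall(reference, predicted)
for category, (precision, recall) in scores.items():
    print(category, precision, recall)
```

In this toy case the missing category lowers the precision of "boats" to 0.5 although its recall stays at 1.0, which mirrors the kind of cross-category effect described above.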


5. Data science workflows

Recently, a new paradigm for Earth observation, namely, Data Knowledge Discovery, was introduced [17]. This paradigm defines the entire chain “data-information-knowledge-value” and deals with a meaningful EO content extraction, i.e., the semantic and knowledge aspects.

We developed user-invariant and EO domain-specific compensatory methods for the individual user- and domain-subjective biases. The derived models generate a sharable knowledge body as a means to enable the communication between fragmented knowledge learned from metadata, image data, and other data in synergy with the domain expertise of EO users. Today’s EO paradigms and technologies are largely domain-oriented and have to support the communication outlined above.

Artificial intelligence and big data in Earth observation [13] have forced the development of new technologies, starting from management platforms [4] and now reaching information platforms.

Examples of the first category are ESA’s Thematic Exploitation Platforms (TEPs) [4], which are designed for the coastal, forestry, geohazards, hydrology, polar, urban, and food security application domains and integrate standard processing chains that require little user interaction. The Copernicus system (currently still under development) and its Data and Information Access Services (DIAS) component [18] are a major achievement but still represent a “classic” management paradigm.

Currently, “classic” existing systems/platforms are usually batch-oriented (e.g., TEPs, DIAS), but with EOLib [20, 40] and the new CANDELA platform [17], this paradigm was “moved” to interactive systems (e.g., supporting active learning).

There are three perspectives to describe this type of interactive systems:

  • The first one is based on signal-information logic (Figures 18 and 19).

    The objective is the knowledge extraction from the sensor signal of the physically meaningful parameters or Earth’s surface cover categories.

    The process is divided in two steps:

    • The first step is an automated batch process to manage the satellite image product files, i.e., to extract the image data and to select the relevant metadata, to perform a spatial breakdown of the image into patches, to estimate for each image patch the particular signatures or primitive descriptors, and to further structure the extracted information in a database.

    • In a second step following interactive machine learning paradigms, the extracted information is transformed into semantic entities attached to each image location. The process is a combination of querying, browsing, and active learning. Using positive examples, i.e., training samples for the categories of interest and complemented by negative examples to enhance the accuracies of each class, a user can define the image semantics adapted to a particular application.

  • The second perspective is based on the value-adding logic (Figure 20).

    Based on these procedures, value-adding is an iterative process.

    The satellite data are generally multi-mission data, e.g., multispectral and SAR data, that are restructured in a common database, which becomes the data source. The data preparation component generates Analysis-Ready Data (ARD) by applying the minimum set of mandatory processing and organizational steps that enable direct analysis, thus minimizing user interaction at the data level.

Figure 18.

The signal-information logic scheme: chain data-information-knowledge.

Figure 19.

The signal-information logic scheme: chain data-information-knowledge-semantic value.

Figure 20.

The value-adding logic scheme.

Among the data preparation steps is the generation of radiometrically and geometrically calibrated data cubes. Browsing the data sets is a first step of visual inspection in which the user becomes acquainted with the observed structures and their signatures. Further, data mining is an automated process to discover the main data particularities and categories but also to detect artifacts or outliers in the data sets, which are beyond the capabilities of human observation due to the large data volumes and the nonvisual nature of the satellite images. The discovered and selected data sets are further analyzed in detail by extracting the particular characteristics of the observed scenes or objects. The results of the analysis contribute to updating existing models or building new models for the observations. Visualization of the model parameters or extracted information is a verification step to cope with large complex data volumes. Specific evaluation paradigms are needed to build trust in the obtained results so that they can be used to make predictions. The process is iterative, and when new data are acquired, they will be analyzed further.
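The data mining step of this value-adding chain (discovering categories and flagging outliers) can be sketched as follows. This is an illustrative assumption rather than the method used on the platform: plain k-means stands in for category discovery, and a median-plus-MAD rule on the distance to the assigned centroid stands in for artifact and outlier detection. All function names and the `n_mads` parameter are hypothetical.

```python
import numpy as np


def nearest_centroid(features, centroids):
    """Index of the closest centroid for every feature vector."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(features, initial_centroids, n_iter=20):
    """Plain k-means; returns the cluster index of each sample and the centroids."""
    features = np.asarray(features, dtype=float)
    centroids = np.array(initial_centroids, dtype=float)  # copy, caller's data untouched
    for _ in range(n_iter):
        assignment = nearest_centroid(features, centroids)
        for j in range(len(centroids)):
            members = features[assignment == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return nearest_centroid(features, centroids), centroids


def flag_outliers(features, assignment, centroids, n_mads=3.0):
    """Flag samples unusually far from their own centroid.

    The threshold is the median distance plus n_mads times the median
    absolute deviation (MAD) of all distances.
    """
    features = np.asarray(features, dtype=float)
    dist = np.linalg.norm(features - centroids[assignment], axis=1)
    median = np.median(dist)
    mad = np.median(np.abs(dist - median))
    return dist > median + n_mads * mad


# Synthetic descriptors (assumption): two compact categories and one artifact.
rng = np.random.default_rng(0)
category_a = rng.normal(0.0, 0.05, size=(20, 2))
category_b = rng.normal(10.0, 0.05, size=(20, 2))
artifact = np.array([[0.0, 30.0]])
features = np.vstack([category_a, category_b, artifact])

assignment, centroids = kmeans(features, initial_centroids=features[[0, 20]])
outliers = flag_outliers(features, assignment, centroids)
```

Flagged samples would then be candidates for visual inspection in the browsing and visualization steps, closing the iterative loop described above.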

  • The third perspective is the implementation architecture logic (Figure 21).

    The implementation of these paradigms requires a concept of integrating artificial intelligence with software (SW) system architectures enabling interactive multiuser operations in real time relative to the user reaction times. End users will be able to work on shared user scenarios, results of their analyses, or information extraction procedures.

    The central component is a data index (DI) which is a very specific database model for very fast, real-time management, processing, and distribution of large structured and unstructured distributed multi-temporal data sets. The data can be efficiently uploaded on demand, coping with large volumes of data from various heterogeneous sources.

    Data preparation needs to support various tasks for ARD generation. A workflow orchestration engine relays the data and offers various processing steps:

    • A deep neural network (DNN) module for physically meaningful feature learning

    • Spatiotemporal analysis, e.g., spatiotemporal pattern analysis and extraction for understanding evolution classes, fusing information from various sources to identify not just objects but, in particular, spatiotemporal patterns and context

    • Data mining to explore heterogeneous multi-temporal data sets.
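How a data index and a workflow orchestration engine might interact can be sketched in a few lines. Everything here is a hypothetical illustration: `DataIndex`, `run_workflow`, and the three placeholder processors are invented names, the index is a simple in-memory dictionary rather than a distributed database, and the processors are trivial stand-ins for the DNN, spatiotemporal analysis, and data mining modules.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class DataIndex:
    """Hypothetical in-memory stand-in for the data index (DI)."""
    records: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def put(self, key: str, **attributes: Any) -> None:
        """Create or update the record stored under key."""
        self.records.setdefault(key, {}).update(attributes)

    def query(self, **conditions: Any) -> List[str]:
        """Return the keys of all records matching every condition."""
        return [
            key for key, rec in self.records.items()
            if all(rec.get(name) == value for name, value in conditions.items())
        ]


Processor = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_workflow(index: DataIndex, keys: List[str], steps: List[Processor]) -> None:
    """Relay each record through the steps and index every step's output."""
    for key in keys:
        for step in steps:
            index.put(key, **step(index.records[key]))


def learn_features(record):
    """Placeholder for the DNN module: the mean of the time series."""
    values = record["values"]
    return {"feature": sum(values) / len(values)}


def analyse_change(record):
    """Placeholder spatiotemporal analysis: last value minus first value."""
    return {"trend": record["values"][-1] - record["values"][0]}


def mine_category(record, threshold=1.0):
    """Placeholder data mining: a coarse evolution class from the trend."""
    return {"category": "changing" if abs(record["trend"]) > threshold else "stable"}


index = DataIndex()
index.put("patch-1", sensor="SAR", values=[1.0, 1.1, 0.9])
index.put("patch-2", sensor="multispectral", values=[1.0, 2.0, 4.0])
run_workflow(index, ["patch-1", "patch-2"],
             [learn_features, analyse_change, mine_category])
changing = index.query(category="changing")
```

Because each step writes its output back into the index, later steps and any user-facing module can query the accumulated results without knowing which processor produced them.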

Figure 21.

The logic implementation architecture scheme.

The extracted information and data content are again indexed in the DI and provided (via web services) to one of the four human-machine interface (HMI) modules (i.e., visual browsing, visual analytics, active learning, and event analysis) supporting advanced big data visualization and active learning paradigms. Once a researcher is satisfied with the results, they can be shared with a restricted group or publicly via the collaborative layer. These architectures are generically based on federated approaches, making it possible to deploy various components where they fit best, using cloud technologies and web services for communication.


6. Conclusions

The advantages and benefits of the proposed approach are:

  • We perform clustering that takes into account the physical parameters behind the sensors, in contrast to the classical classification approaches proposed in AI.

  • With very few examples, we are able to classify the images with high accuracy.

  • We are able to process multi-sensor data.

  • We are able to create a semantic scheme adapted to different EO sensors (SAR or multispectral) with high resolution (e.g., TerraSAR-X or WorldView) or medium resolution (e.g., Sentinel-1 or Sentinel-2).


7. Future work

Over the next few years, we expect a wide variety of new satellite image data that can be easily downloaded, handled, and analyzed by individual users. We also think that a number of new geophysical databases and browse tools will become available so that each user has easy access to numerous additional satellite data sources together with auxiliary geophysical data from common libraries and data management tools supporting in-depth image data analyses and their interpretation. Innovative application fields (such as autonomous driving based on machine learning and artificial intelligence) will bring still more data handling tools and new data archives that become available via the Internet. In addition, we expect that these new tools will be supplemented by management and support environments, for instance, for system testing and performance monitoring. Within the next 5 years, this should result in newly established environments for image data understanding.



Acknowledgments

Part of this work was supported by CANDELA (Copernicus Access Platform Intermediate Layers Small-Scale Demonstrator), an H2020 research and innovation project under grant agreement no. 776193.

Another part of the work was supported by EOLib (Earth Observation Image Librarian), an ESA technological project.

The TerraSAR-X image data used in this study were provided by the TerraSAR-X Science Service System (Proposal MTH 1118), while the WorldView-2 image data were provided by European Space Imaging (EUSI).


References

  1. Living Planet Symposium. 2019. Available from: [Accessed: April 2019]
  2. Lavender S, Lavender A. Practical Handbook of Remote Sensing. Boca Raton: CRC Press; 2015
  3. Reeves RG, Anson A, Landen D, editors. Manual of Remote Sensing. Falls Church, Virginia: American Society of Photogrammetry; 1975
  4. ESA TEPs. 2019. Available from: [Accessed: March 2019]
  5. Berger M, Moreno J, Johannessen JA, Hanssen F, Levelt PF, Hannssen RF. ESA’s sentinel missions in support of earth system science. Remote Sensing of Environment. 2012;120:84-90
  6. ESA Sentinels. 2019. Available from: [Accessed: April 2019]
  7. TerraSAR-X. Basic Products Specification Document, Issue: 1.6, TX-GSDD-3302. 2009. Available from: [Accessed: April 2019]
  8. WorldView. 2019. Available from: [Accessed: April 2019]
  9. ESA Portal. 2019. Available from: [Accessed: May 2019]
  10. ESA Sentinel-1 Portal, Geographical Coverage. 2019. Available from: [Accessed: May 2019]
  11. ESA Sentinel-2 Portal, Geographical Coverage. 2019. Available from: [Accessed: May 2019]
  12. N2YO. Search Satellite Database. 2018. Available from: [Accessed: August 2018]
  13. AI4EO Agenda. 2019. Available from: [Accessed: April 2019]
  14. Datcu M, Dumitru CO, Schwarz G, Castel F, Lorenzo J. Data Science Workflows for the CANDELA Project. 2019. Available from: [Accessed: April 2019]
  15. Deep Learning. 2019. Available from: [Accessed: April 2019]
  16. TensorFlow. 2019. Available from: [Accessed: March 2019]
  17. CANDELA Project. 2019. Available from: [Accessed: April 2019]
  18. DIAS Platform. 2019. Available from: [Accessed: March 2019]
  19. Dumitru C, Schwarz G, Datcu M. Land cover semantic annotation derived from high-resolution SAR images. The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2018;11(5):1571-1592
  20. Earth Observation image Librarian (EOLib). 2019. Available from: [Accessed: April 2019]
  21. Blanchart P, Ferecatu M, Cui S, Datcu M. Pattern retrieval in large image databases using multiscale coarse-to-fine cascaded active learning. The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2014;7(4):1127-1141
  22. ESA Sentinel-1. 2019. Available from: [Accessed: April 2019]
  23. ESA Sentinel-2. 2019. Available from: [Accessed: April 2019]
  24. Living Planet Symposium. 2017. Available from: [Accessed: December 2017]
  25. Big Data from Space Conference. 2019. Available from: [Accessed: April 2019]
  26. IGARSS. 2019. Available from: [Accessed: April 2019]
  27. Dumitru CO, Cui S, Datcu M. A study of multi-sensor satellite image indexing. In: Proceedings of the JURSE 2015. 2019. Available from: [Accessed: April 2019]
  28. WorldView-2. 2019. Available from: [Accessed: April 2019]
  29. Smeulders A, Worring M, Santini S, Gupta A, Jain R. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2000;22:1349-1380
  30. Torralba A, Russell B, Murphy K, Freeman W. LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision. 2008;77(1-3):157-173
  31. Google. 2019. Available from: [Accessed: April 2019]
  32. Stepinski T, Netzel P, Jasiewicz J. LandEx-A GeoWeb tool for query and retrieval of spatial patterns in land cover datasets. The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2014;7(1):257-266
  33. Shyu CR, Klaric M, Scott G, Barb A, Davis C, Palaniappan K. GeoIRIS: Geospatial information retrieval and indexing system-content mining, semantics modelling, and complex queries. IEEE Transactions on Geoscience and Remote Sensing. 2007;45(4):839-852
  34. Boujemaa N. IKONA: Interactive Specific and Generic Image Retrieval. MMCBIR; Glasgow, UK. 2001
  35. Datcu M, Daschiel H, Pelizzari A, Quartulli M, Galoppo A, Colapicchioni A, et al. Information mining in remote sensing image archives: System concepts. IEEE Transactions on Geoscience and Remote Sensing. 2003;41(12):2923-2936
  36. TELEIOS Project. 2019. Available from: [Accessed: April 2019]
  37. Dumitru C, Schwarz G, Datcu M. SAR image land cover datasets for classification benchmarking of temporal changes. The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2018;11(5):1571-1592
  38. Orfeo Toolbox an Open Source Collection of Remote Sensing Tools. 2019. Available from: [Accessed: February 2019]
  39. ESA Sentinel Toolboxes. 2019. Available from: [Accessed: February 2019]
  40. Espinoza-Molina D, Manilici V, Cui S, Reck CH, Hofmann M, Dumitru CO, et al. Data mining and knowledge discovery for the TerraSAR-X payload ground segment. In: Proceedings of the PV 2015, Darmstadt, Germany. 2015
  41. Knowledge-based Information Mining (KIM). 2019. Available from: [Accessed: April 2019]
