Optimization of an Earth Observation Data Processing and Distribution System

Conventional Earth Observation Payload Data Ground Segments (PDGS) continuously receive variable requests for data processing and distribution. However, their architecture was conceived to run on the premises of satellite operators and therefore has intrinsic limitations when offering variable services. In this chapter, we introduce cloud computing technology as an alternative for offering variable services. For that purpose, a cloud infrastructure based on OpenNebula and the PDGS used in the Deimos-2 mission were adapted, with the objective of optimizing the system using the ENTICE open source middleware. Preliminary results with a realistic satellite recording scenario are presented.


Introduction
Traditionally, Earth Observation systems have been operated by governments and public organizations; the primary investors being US, China, Russia, Japan and Europe mainly because of worldwide common objectives such as climate change, sustainable development and objectives at national level.
However, from 2015 to 2016, the paradigm of Earth Observation from space has been changing with the globalization of the market, the evolution of information and communication technologies and the high investment of private entities in the field.
This boost of commercial interest in Earth Observation can be explained by the parallel evolution of three main pillars, as stated by Denis et al. in [1]:
1. Increased performance of commercial satellites with defence needs in the range of very high resolution products, i.e. resolutions between 0.25 and 1 m.
2. The development of hybrid procurement schemes between private and public customers.
3. Appearance of the New Space scheme started in Silicon Valley, which attracted the interest of investors and contributed to the creation and entrance of new actors in the space sector.
To these, we would add the dedicated budget of new countries, such as Kazakhstan, Venezuela and Vietnam, in EO; the increased budget for new EO programmes in India, China and South Korea [2]; and the fast evolution of information and communication technologies, which facilitated the creation of new applications requiring the availability of large amounts of information in the shortest possible time. This contributed to the evolution of the space sector in two ways: (a) the evolution of the sensors to provide higher performance at a lower cost and (b) the launch of more satellites to cover the demand for information. The latter explains the increase in satellite launches in recent years and the interest of satellite operators in operating satellite constellations in order to reduce the revisit time and offer more coverage of the land surface [3]. The amount of generated data is used, for instance, to accumulate spatial and temporal records of the world itself and of the events and changes that occur in it, in a diverse range of applications: security, maritime, agriculture, energy and emergency, among others [4].
However, the infrastructures used to manage EO data are still based on traditional EO systems, which (because of their previous field of application) make use of traditional on-site infrastructures or data centers. Their architecture was designed to be monolithic, located in a single infrastructure. Now, the process of recording data from Earth observations generates massive amounts of spatiotemporal geospatial information that has to be intensively processed for a variable and increasing demand. This is a handicap for traditional data centers, since they are not designed to manage variable amounts of data: they were designed and sized to operate a certain data volume and are therefore limited in terms of flexibility and scalability [5]. The storage of increasing amounts of data is also a challenge, since the recordings are maintained by their owners over time [6].
Traditional Earth Observation Payload Data Ground Segments (PDGS) present the following limitations in covering the demands of the current EO market:
i. Traditional infrastructures are not flexible or easily scalable to operate.
ii. There is a risk of oversizing/undersizing the infrastructure to offer services when highly variable demand exists.
iii. They make the cost of acquiring recent images of the Earth very high.
iv. Customers cannot access the information they need directly or quickly, because it has to be processed and distributed ad hoc.
However, the use of cloud computing technology can eliminate the previous drawbacks and improve EO services because it is elastic and scalable, works on demand through virtualization of resources, offers virtually unlimited storage and computation capability, is connected worldwide and is based on a pay per use model [7,8].
Nevertheless, current cloud computing technology still presents some limitations:
i. The virtual machine images (VMIs) are not optimized and are highly oversized, which impacts the costs of using the infrastructure and the dynamic provisioning of resources.
ii. The deployment of virtual machines (VMs) in cloud is not in real time. The deployment normally takes between 10 and 20 minutes, which directly affects the flexibility and dynamic scalability of the system.
iii. Although the pay per use model should intrinsically reduce costs, since customers only pay for what they use, the costs of using cloud computing are still high.
iv. A few major worldwide players in the offer of cloud services, such as Amazon, Google, Microsoft and IBM, make it difficult to migrate a system from one cloud infrastructure to another, resulting in vendor lock-in. This limits the democratization of these services and creates an entrance barrier for new cloud providers.
Within the ENTICE H2020 project (project no. 644179), we intend to demonstrate that processing the data recorded from Earth observations in a cloud environment with the middleware ENTICE optimizes the efficiency and overcomes the critical barriers of cloud computing and data processing needs. Among other advantages, ENTICE provides independence from a specific infrastructure provider and facilitates the distribution of VMs in distributed infrastructures.
In this work, we present the implementation of the Earth Observation Data (EOD) pilot, which mainly consists of the implementation in cloud of the Ground Segment for Earth Observation (gs4EO) suite, commercialized by Deimos [9], which is currently operational in the Deimos-2 satellite mission [10].
For this purpose, we simulate a real scenario with the Deimos-2 satellite running in a federated cloud infrastructure, in which we obtain real performance metrics and present real system requirements for normal operations with the satellite. Through this experimentation, we demonstrate the EOD concept as a solution for the new EO market paradigm.

ENTICE environment
In order to facilitate the implementation in cloud, the EOD pilot makes use of the ENTICE middleware [11], which provides autoscaling and flexibility for the ingestion of satellite imagery, its processing and its distribution to end users with variable demands. Kecskemeti et al. [12] introduced the ENTICE approach to solve these problems. The ENTICE environment consists of a ubiquitous repository-based technology, which provides optimised virtual machine (VM) image creation, assembly, migration and storage for federated clouds. The webpage of ENTICE can be found in [13].
ENTICE facilitates the implementation of cloud applications by simplifying the creation of lightweight virtual machine images (VMIs) by means of functional descriptors. These functional descriptors define the VMIs at a high, functional level and contribute to defining the system Service Level Agreement (SLA), which facilitates the optimization of the VMIs in terms of performance, costs, size and the quality of service (QoS) needed. Then, the VMIs are automatically decomposed and distributed to meet the application runtime requirements. In addition, ENTICE facilitates elastic autoscaling. The benefits of using ENTICE are the following:
• Reduction of up to 80% in storage.
• Reduction in the costs of deployment.
• Elimination of cloud infrastructure vendor lock-in.
In the EOD pilot, ENTICE is used as middleware between the federated infrastructure described in Section 3.1 and the gs4EO application software.

EOD pilot description
The Earth Observation Data Processing and Distribution Pilot (EOD) consists of the implementation of Elecnor Deimos' geo-data processing, storage and distribution platform for the Deimos-2 satellite using cloud technologies. The main functionalities of the system are the following:
• Acquisition of raw data: When the imagery data are downloaded from the satellite to the ground station, the system is notified and the ingestion component automatically ingests the raw data into the cloud for processing.
• Processing of data: Once the data are ingested, they are processed in the product processors. There are several processing levels to provide different products.
• Archiving and cataloguing geo-images: The different products obtained from the processing of raw data are archived and catalogued in order to provide these images or high added value services to end users.
• Offering user services: This is the front-end of the system. It allows end users to select the product that they want to visualize or download.

EOD architecture
The main objectives of the EOD pilot are to process real data of the Deimos-2 satellite in a realistic scenario of normal operation and to validate the processing chain module as part of the cloud infrastructure. Ramos and Becedas [14] proposed an original architecture of the gs4EO suite to be implemented in cloud. Based on that work, the architecture for the EOD pilot has been redesigned and implemented, see Figure 1.
The architecture is composed of the following components:
• monitor4EO: It is a ground station monitor, which ingests the available raw data from the ground stations into the cloud system. It contains an Orchestrator, which manages the tasks of the different modules.
• process4EO server: It is the Orchestrator, the component that manages the tasks to be done by all the modules of the architecture computed in the cloud infrastructure. The Orchestrator has the following functions:
○ To identify which outputs shall be generated by the processors.
○ To generate the Job Orders. They contain all the necessary information that the processors need. These eXtensible Markup Language (XML) files include the interfaces and addresses of the folders in which the input information for the processors is located and the folders to which the outputs of the processors have to be sent. They also include the format in which the processors generate their output.
○ To find data in the ground stations (polling) to be ingested into a shared storage unit in the cloud for its distribution to the processing chain.
○ To control the processing chain by communicating with the product processors.
○ To manage the archive and catalogue.
• process4EO node: It consists of different software modules, which are in charge of processing the raw data and the products of previous levels to produce image products. Figure 2 depicts the pipeline of the image processing process. The four most important operations are the following:
○ Calibration: (L0 and L0R processing levels) to convert the pixel elements from instrument digital counts into radiance units.
○ Geometric correction: (L1A processing level) to eliminate distortions due to misalignments of the sensors in the focal plane geometry.
○ Geolocation: (L1BR processing level) to compute the geodetic coordinates of the input pixels.
○ Orthorectification: (L1C processing level) to produce orthophotos with vertical projection, free of distortions.
• archive4EO: In this module, the processed images are stored and catalogued for their distribution. It offers a Catalogue Service for the Web (CSW) interface.
• user4EO: It is a web service through which the end users can access the products.
• Shared storage: It is a storage module shared by all the modules of the architecture, in which all their inputs and outputs are stored.
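As an illustration of the Job Orders described above, the following is a minimal sketch of how an orchestrator could generate one XML Job Order per processing level, with each level reading from the folder written by the previous one. The element names (JobOrder, ProcessingLevel, InputFolder, OutputFolder, OutputFormat), the folder layout and the default output format are assumptions made for this example; they are not the actual gs4EO Job Order schema.

```python
import xml.etree.ElementTree as ET

# Processing levels in the order listed in the text.
LEVELS = ["L0", "L0R", "L1A", "L1BR", "L1C"]


def build_job_order(level, input_dir, output_dir, output_format):
    """Return a Job Order for one processing level as an XML string.

    All element names are hypothetical placeholders for illustration.
    """
    root = ET.Element("JobOrder")
    ET.SubElement(root, "ProcessingLevel").text = level
    ET.SubElement(root, "InputFolder").text = input_dir
    ET.SubElement(root, "OutputFolder").text = output_dir
    ET.SubElement(root, "OutputFormat").text = output_format
    return ET.tostring(root, encoding="unicode")


def build_chain(shared_root, output_format="GeoTIFF"):
    """Build one Job Order per level; each level reads the previous output.

    The folder layout under shared_root and the default format are assumed.
    """
    orders = []
    previous = f"{shared_root}/raw"
    for level in LEVELS:
        output_dir = f"{shared_root}/{level}"
        orders.append(build_job_order(level, previous, output_dir, output_format))
        previous = output_dir
    return orders


if __name__ == "__main__":
    for order in build_chain("/shared"):
        print(order)
```

Chaining the folders through the shared storage mirrors the role that the text assigns to that module: every processor finds its inputs and leaves its outputs in a location that the Orchestrator specifies in the Job Order.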

Testing infrastructure
The testing infrastructure used in the experiment is formed by hardware deployed in three different locations and managed in a federated manner: DMU infrastructure (in Deimos UK in United Kingdom), DMS infrastructure (in Deimos Space in Spain) and DME infrastructure (in Deimos Engenharia in Portugal). The hardware resources deployed in every location are described in Table 1. The ENTICE middleware was installed in the DMU infrastructure, which is acting as master. It also contains an object store with interface to Amazon Simple Storage Service (Amazon S3) for cloud bursting. DMS and DME infrastructures are slaves of DMU infrastructure and contain object stores also with interfaces to Amazon S3. A block diagram describing the interrelations of the testing infrastructure is depicted in Figure 3.
The virtualization of the infrastructure was done with OpenNebula. Kernel-based Virtual Machine (KVM) was used as hypervisor. The creation of the virtual machines was done with Packer, whereas the automatic deployment of the virtual machines was done with Ansible. Figure 4 shows a diagram describing the logical process of automatic generation of the virtual machines that constitute the EOD software. The image building process takes advantage of the following files:
• Packer template: It is a JSON file that provides all the information to create the virtual machine in Packer. It contains the format, the instructions and the parameters on how to build a VMI using KVM. The provisioners define the scripts or recipes in Ansible for configuring the machine and installing the applications.
• Ansible playbook: These files are "recipes" to install the EOD software in the virtual machines. A playbook is a YAML file with the commands expressed in a simplified language, describing a configuration or a process. It contains the information to configure the system, install the EOD software and add the functionalities needed to work in the cloud environment (contextualization).
The Python script receives the configuration file and launches the Packer command after configuring some parameters in the Kickstart file. The Packer command takes the template and runs all the builds within it in order to generate a set of artefacts and build the image in KVM. Once the image is built, Packer launches all the provisioners (Ansible) contained in the template. Ansible carries out several steps: it configures all the repositories, installs all the dependencies and software packages of the EOD modules, configures the EOD software and installs a context package to deploy the VMI in OpenNebula.
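The build process described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script used in the pilot: the image name, playbook path and variable names are hypothetical, and the template omits the settings a real build needs (ISO source and checksum, boot command that references the Kickstart file, SSH credentials). It uses Packer's JSON template layout with a QEMU builder accelerated by KVM and an Ansible provisioner, and the `packer build -var key=value template` command line.

```python
import json
import subprocess


def make_template(image_name, playbook):
    """Return a minimal Packer template: one QEMU/KVM builder, one Ansible provisioner.

    ISO source, checksum, boot command and SSH settings are deliberately
    omitted; a real build requires them.
    """
    return {
        "builders": [{
            "type": "qemu",
            "accelerator": "kvm",
            "format": "qcow2",
            "vm_name": image_name,
        }],
        "provisioners": [{
            "type": "ansible",
            "playbook_file": playbook,
        }],
    }


def packer_command(template_path, variables=None):
    """Build the argument list for `packer build`, one -var per user variable."""
    cmd = ["packer", "build"]
    for key, value in sorted((variables or {}).items()):
        cmd += ["-var", f"{key}={value}"]
    cmd.append(template_path)
    return cmd


def build_image(template, template_path, variables=None):
    """Write the template to disk and run Packer (requires Packer to be installed)."""
    with open(template_path, "w") as handle:
        json.dump(template, handle, indent=2)
    return subprocess.run(packer_command(template_path, variables), check=True)


if __name__ == "__main__":
    # Hypothetical names, shown without launching a build.
    print(json.dumps(make_template("process4eo-node", "eod.yml"), indent=2))
    print(packer_command("eod.json", {"disk_size": "300G"}))
```

Keeping the command construction separate from its execution makes the script easy to inspect and test without Packer or KVM being available.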
The recording of the experiment data was done with Jmeter™ [15] and Nagios® [16]. Jmeter™ is installed in the Node and Nagios® in a virtual machine inside the federated cloud. It is used for the monitoring of the cloud resources and status and to extract the experimental data.

Experiment description
The aim of this experiment is to demonstrate the feasibility of implementing the EOD system in cloud and how its behavior improves after the optimization done by ENTICE over the process4EO node.
The experiment is that of a realistic recording with Deimos-2 satellite in which a real acquisition is ingested into the EOD pilot. Then, the processing of the raw data is carried out with the EOD pilot before and after the optimization process. The results are compared to evaluate the functionality of the optimized system with regard to the nonoptimized system and validate the implementation of the gs4EO modules in cloud.
VMI size, VMI creation time, VMI delivery time and VMI deployment time are the evaluated metrics selected to compare the performance of the system before and after the optimization process.
The following are the evaluated metrics to demonstrate that the functionality of the system remains the same after the optimization: processing time, imagery products size, CPU use per process and memory use per process.
The raw data used in the experiment have a size of 3 MB, four multispectral bands (R, G, B and NIR) and one panchromatic band. The recorded area of the land surface is a rectangle of 8.86 × 16.59 km².
The raw data are managed and processed to automatically obtain the following products:
• L0: raw data decoded.
The virtual resources used in the experiment were the following: a virtual machine with 300 GB, 10 GB of RAM and four 32-bit CPUs, a shared storage with 99 GB and an additional storage volume with 50 GB. These resources were used for both experiments (EOD before and after optimization) in order to facilitate comparison.

Experiment results
First, the virtual machine images of the EOD pilot were created, delivered and deployed in the cloud. Then, the virtual machine of the process4EO node was optimized and its VMI was again created, delivered and deployed. The time spent in every step is given in Table 2.
In these results, one can see the increase in the performance of the system before the runtime, i.e. up to the deployment of the system: this is a reduction of 30% in VMI size, a reduction of 37.3% in the VMI creation time, a reduction of 34.53% in the VMI delivery time and a reduction of 54.05% in the deployment time.
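To put these percentages in perspective, a reduction of r% means that the optimized value is (100 − r)% of the original, which corresponds to an improvement factor of 1 / (1 − r/100). The sketch below applies this to the four reductions reported above; the function and variable names are chosen for illustration only, and no values from Table 2 are assumed.

```python
# Reductions reported in the text, in percent.
REDUCTIONS_PCT = {
    "VMI size": 30.0,
    "VMI creation time": 37.3,
    "VMI delivery time": 34.53,
    "VMI deployment time": 54.05,
}


def remaining_fraction(reduction_pct):
    """Fraction of the original value that remains after the reduction."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 - reduction_pct / 100.0


def improvement_factor(reduction_pct):
    """How many times smaller or faster the optimized value is."""
    return 1.0 / remaining_fraction(reduction_pct)


if __name__ == "__main__":
    for metric, pct in REDUCTIONS_PCT.items():
        print(f"{metric}: {remaining_fraction(pct):.2%} of original, "
              f"factor {improvement_factor(pct):.2f}")
```

For example, the 54.05% reduction in deployment time means that the optimized deployment takes about 46% of the original time, i.e. it is roughly 2.2 times faster.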
Next, the raw data recorded with the satellite were ingested in both the original EOD pilot and the optimized EOD pilot. The response of both the optimized and the nonoptimized systems was measured at runtime. The processing time of the satellite imagery in the original EOD pilot and in the EOD pilot with the optimized processing chain is shown in Figures 5 and 6, respectively. It can be noticed that the processing time of the different levels is similar in both experiments, as is the time to process the raw data up to the orthorectification level. The sizes of the imagery products are given in Table 3. Notice that the size of the different products remains the same in both experiments. These results demonstrate that the functionality of the system is intact after the optimization process, while the optimization provides benefits in storage, creation, delivery and deployment of the system.
Table 3. Imagery product sizes obtained with both the nonoptimized and the optimized EOD system.
Furthermore, the CPU and memory used in both experiments are similar for all the processing stages: in Figure 7, the CPU used in the processing of the satellite imagery with the nonoptimized system is shown; in Figure 8, the CPU used in the optimized system is depicted. Besides, the memory used by the optimized system was lower: the memory use per process in the nonoptimized system can be seen in Figure 9, while the memory used in the optimized system can be seen in Figure 10.
These results obtained with the EOD pilot can be related to the new paradigms of the Earth Observation market stated in [1]. Table 4 describes how a PDGS approach similar to the EOD pilot could cover the main requirements of the new EO market.
Figure 10. Memory use per process in the optimized EOD system.
Table 4. New paradigm requirements vs. EOD pilot approach.
New paradigm requirement | EOD pilot approach
Costs optimization | Cost reduction by means of reduced storage of optimized VMIs, reduced creation time, reduced delivery time and reduced deployment time
Multi sensors ground processing systems | Ground stations, ground control centers and data processing centers would take advantage of a rapid, agile, resilient and secure interconnected computer system in cloud
Vertical integration | Global distributed infrastructure connecting all the stakeholders in an operational environment
Scalability | Elastically autoscale applications on cloud resources based on their fluctuating load with optimized VM interoperability across cloud infrastructures and without provider lock-in

Conclusions and future work
In this work, the successful implementation of the EOD pilot in an experimental cloud infrastructure with the ENTICE middleware was demonstrated. The pilot was tested and promising results were obtained. These results indicate that real scenarios of satellite imagery management and processing can be carried out in cloud with many advantages with respect to traditional infrastructures. Furthermore, an optimization of the EOD pilot was carried out, demonstrating a reduction of 30% in VMI size, 37.3% in the VMI creation time, 34.53% in the VMI delivery time and 54.05% in the deployment time, while keeping the functionality of the system intact. This indicates that a PDGS system implemented in cloud in a similar manner to the EOD pilot can fulfill the requirements of the new Earth Observation market paradigm. Specifically, these EOD pilot results demonstrate that the deployment of an optimized PDGS system in cloud can reduce the costs of storage and reduce the time to user by reducing the creation time, the delivery time and the deployment time of the system. Besides, ground stations can take advantage of a rapid, agile, resilient and secure interconnected system when they are cloud-based. In addition, the global operational environment provided by a cloud infrastructure facilitates both global acquisition and distribution of data, improving market efficiency. Finally, the system improves its scalability without vendor lock-in, covering the needs of recent on-demand markets.
In future research, different realistic scenarios with variable demand of services will be tested.
With these scenarios, we will evaluate the elastic behaviour in the ingestion of raw data into the system, the processing and the distribution of imagery products to users. Furthermore, a complete optimization of the system will be tested to evaluate the complete repository storage size reduction, which was not evaluated in this work. In addition, new metrics will be measured to validate the implementation of the system for its commercial implementation in the near future.