Open access peer-reviewed chapter

Spatial Cloud Computing Using Google Earth Engine and R Packages

Written By

Anwarelsadat Eltayeb Elmahal and Mohammed Mahmoud Ibrahim Musa

Submitted: 10 June 2023 Reviewed: 21 July 2023 Published: 16 October 2023

DOI: 10.5772/intechopen.1002686

From the Edited Volume

Geographic Information Systems - Data Science Approach

Rifaat Abdalla


Abstract

Google Earth Engine (GEE) is a spatial cloud computing platform that can be easily accessed through its native application programming interfaces (APIs), JavaScript and Python. In this chapter, we introduce R, a non-native GEE API, as a different way to interface with GEE and to utilize the extensive statistical and visualization capabilities of the R ecosystem through the rgee library. In addition to the fundamental concepts of spatial cloud computing, the reader is exposed to the practical side of satellite image processing techniques, including digital image processing and visualization.

Keywords

  • spatial cloud computing
  • Google Earth Engine
  • R package
  • RGEE library
  • satellite image processing
  • image classification
  • machine learning algorithms
  • accuracy assessment
  • NDVI

1. Introduction

Cloud computing, delivered through the internet or other specialized networks, has gained considerable attention and is increasingly utilized. It provides clients with applications, data, computing power, and information technology management as a service. Individuals use it for email, data storage, or photo sharing; enterprise businesses use it for a wide range of business operations and process management.

Geospatial data, particularly data obtained through remote sensing, is expanding at an exponential rate and increasingly relies on cloud computing services. Handling it professionally in the cloud requires both spatial and programming skills. Google Earth Engine (GEE) is a well-known example of a cloud-based geospatial platform; it uses Google's multi-petabyte data repository to let users analyze and visualize planetary-scale environmental data.

GEE can be easily accessed through its native application programming interfaces (APIs), JavaScript and Python. In this chapter, we introduce R, a non-native GEE API, as a different way to interface with GEE and to utilize the extensive statistical and visualization capabilities of R packages through the rgee library. In addition to the fundamental concepts of spatial cloud computing, the reader is exposed to the theoretical and practical sides of remote sensing principles and satellite image processing techniques, including digital image processing and visualization.

2. Principles and concepts

2.1 Remote sensing principles

We currently live in a world where the population is constantly growing and natural resources are being continuously depleted by improper land use and land cover management. This situation leads to catastrophic climate change challenges that may threaten humanity's very existence. Understanding the Earth's system and applying that knowledge to inform our decisions can help us overcome these challenges [1, 2]. Remote sensing is an approach for collecting information about the ocean, land, and atmosphere utilizing electromagnetic radiation, without directly contacting the object, surface, or phenomenon under investigation [3, 4]. The components of a satellite remote sensing system are (i) data collection via satellite platforms and sensors, (ii) data transmission to ground stations, (iii) data processing, and (iv) distribution to users after applying quality assurance standards (Figure 1).

Figure 1.

Remote sensing components simplified.

After the official introduction of GEE in 2010 [5], simple methods for handling earth observation (EO) data became available. Contrary to the conventional methods of EO data manipulation, the new methods do not require the user to download or retain massive datasets, deal with a variety of file formats, or own highly developed computer systems in order to analyze and visualize EO data.

2.2 Analysis and visualization of satellite data

Images are an efficient way to communicate ideas, and numerous experts say that a picture is worth a thousand words. Remotely sensed data are received in image format and may contain systematic or accidental errors as a result of atmospheric interaction. Before evaluating and interpreting remotely sensed data, the user must know how to pre-process it correctly; otherwise, they face the risk of depending on images that have already been processed, unless these were handled by a professional. Satellite image processing techniques can be grouped into three main categories [6, 7, 8], which are:

  1. Image restoration and rectification: This refers to the operations that correct images for both geometric and radiometric errors; it may include georeferencing and haze reduction in satellite images. Image restoration or rectification must be implemented to correct distorted or degraded image data: the unprocessed image data must first be processed to adjust for geometric distortions, calibrate the data radiometrically, and eliminate noise [9].

  2. Image enhancement: This means improving the image's quality so that it is easier to perceive. It might be done by increasing the contrast of the image, applying filters, or using other approaches.

  3. Information extraction and image classification: The purpose of image classification is to convert the image's spectral classes into information classes [7, 8], with the goal of assigning pixels with the same spectral values to, for instance, known land use and land cover classes. There are typically two main categories: (i) unsupervised and (ii) supervised. In unsupervised classification, the computer/machine groups the spectral classes into information classes without any major user input; to get satisfactory results, the technique is followed by laborious post-classification operations. It is preferable when working with expansive areas or when the user has limited background information regarding the image's classes. In supervised classification, by contrast, the user selects the training areas, and the computer/machine creates the classes based on those selections. In both categories, strict accuracy standards should be adhered to by utilizing machine learning (ML) algorithms. In this chapter, a real example of image enhancement followed by classification is provided using the rgee package.
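To make the unsupervised route concrete, here is a minimal base-R sketch that clusters synthetic two-band pixel values with k-means, a standard unsupervised algorithm; the reflectance values and class names are illustrative assumptions, not data from this chapter.

```r
# Unsupervised sketch: group synthetic pixel reflectances into two
# spectral clusters with k-means (values are made up for illustration).
set.seed(42)
water  <- cbind(red = rnorm(50, 0.05, 0.01), nir = rnorm(50, 0.03, 0.01))
veg    <- cbind(red = rnorm(50, 0.08, 0.01), nir = rnorm(50, 0.45, 0.05))
pixels <- rbind(water, veg)

km <- kmeans(pixels, centers = 2, nstart = 10)

# In post-classification, the analyst labels each spectral cluster
# as an information class (e.g. "water", "vegetation").
table(km$cluster)
```

This mirrors the workflow described above: the machine finds the spectral clusters, and the laborious part is assigning them meaningful information-class labels afterwards.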

2.3 Image classification and machine learning approaches

We employed a supervised image classification technique in the example at the end of this chapter, in which training areas were designated for various land use land cover (LULC) classes. The location chosen for the test is Wadi Soba, a well-known watershed in Khartoum State, Sudan. The heart of ML is the exploration and creation of mathematical models and algorithms that learn from data and extract insights from it. In ML, the data is split into two subsets: a training subset for building the model and a test subset for evaluating it. In this chapter, we chose the smileCart algorithm because it generates clear findings that are simple to comprehend, can be applied to both categorical and continuous datasets, is non-parametric, and makes no prior assumptions. The datasets were divided into two groups: 80% were used for training, and 20% were used to test and validate the algorithm's effectiveness.
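As a minimal sketch of the 80/20 split and CART workflow described above, the following uses rpart, R's CART implementation, in place of GEE's smileCart; the labelled pixel values are synthetic, invented purely for illustration.

```r
library(rpart)  # CART implementation shipped with R

# Synthetic labelled pixels: vegetation has high NIR reflectance.
set.seed(1)
n    <- 500
nir  <- runif(n)
red  <- runif(n, 0, 0.3)
lulc <- factor(ifelse(nir > 0.4, "vegetation", "bare"))
df   <- data.frame(nir, red, lulc)

# 80/20 split into training and test subsets.
idx   <- sample(n, size = 0.8 * n)
train <- df[idx, ]
test  <- df[-idx, ]

# Fit a classification tree and score it on the held-out 20%.
model    <- rpart(lulc ~ nir + red, data = train, method = "class")
pred     <- predict(model, test, type = "class")
accuracy <- mean(pred == test$lulc)
```

The same train/predict/score pattern carries over to GEE, where the classifier runs in the cloud instead of locally.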

2.4 Normalized Difference Vegetation Index (NDVI)

The normalized difference vegetation index (NDVI) is a biophysical measure. It measures the difference between red light, which plants absorb, and near-infrared light, which plants reflect, and is usually calculated by the empirical formula [7, 8]:

NDVI = (NIR - R) / (NIR + R)

In this chapter, an example is provided to calculate the NDVI and visualize it using the rgee platform. The area selected for the test is Kassala State, Sudan.
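The empirical formula can be verified with a few lines of base R; the reflectance values below are illustrative, not actual pixels from Kassala.

```r
# NDVI = (NIR - R) / (NIR + R), applied element-wise.
ndvi <- function(nir, red) (nir - red) / (nir + red)

ndvi(nir = 0.5,  red = 0.1)   # dense vegetation: high positive value
ndvi(nir = 0.12, red = 0.10)  # sparse/stressed vegetation: near zero
ndvi(nir = 0.03, red = 0.05)  # water: negative
```

The sign of the result is what makes the index useful: healthy vegetation reflects far more NIR than red, pushing NDVI towards +1, while water and bare surfaces stay near or below zero.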

3. Specificity of spatial data

3.1 Understanding the nature of spatial data

Geographical information science (GISc) is the term used to identify and characterize the area of science concerned with the analysis, visualization, and representation of geo-referenced data [2]. The terms spatial and geospatial data are used interchangeably to mean any data that contains information about a specific location, whether implicitly at a global level or explicitly at local levels, down to GPS coordinates.

Based on storage and representation approaches, spatial data can be categorized into two types: vector and raster data. Vector data represents spatial data as points, lines, and polygons, whereas raster data expresses it as pixels (Figure 2). The building block of the vector model is the point: a line is a combination of two points, and a polygon is a closed loop of more than two lines. The building block of the raster model is the pixel (picture element), the smallest component of a digital display. The spatial resolution refers to the size of the pixel representing the geographic phenomenon; an image with a 10 m resolution, like that of Sentinel-2, means that every pixel covers 10 m × 10 m on the ground. Both vector and raster data carry attribute values. Raster data is mostly found in remotely sensed products such as satellite images, aerial photos, and digital elevation models.

Figure 2.

Vector (crisp/right) and raster (pixelated/left) models. (Source: https://pygis.io/docs/a_intro.html).
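The raster model can be sketched with a plain base-R matrix: each cell stands for one pixel, and the stated resolution converts pixel counts into ground area. The 10 m pixel size echoes the Sentinel-2 example above; the cell values are arbitrary class codes invented for illustration.

```r
# A tiny 3 x 4 raster of land cover codes (1 = water, 2 = vegetation,
# 3 = bare soil); the values are arbitrary illustrations.
r <- matrix(c(1, 1, 2, 2,
              1, 2, 2, 3,
              3, 3, 3, 3), nrow = 3, byrow = TRUE)

pixel_size <- 10                          # metres, as in Sentinel-2
area_m2    <- length(r) * pixel_size^2    # 12 pixels x 100 m^2 each
veg_m2     <- sum(r == 2) * pixel_size^2  # ground area mapped as vegetation
```

This is the essence of raster analysis: once the pixel size is known, counting cells turns directly into measuring ground area.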

3.2 Structure of spatial data

Spatial vector objects are categorized as points, lines/polylines, and polygons, with the point as the building block. Once you read a new object, you can inspect it by viewing its first few lines with code such as the following:

# This code shows the first five rows of the attribute table, giving a good idea of its attributes.
head(new_object@data, n = 5)

# This shows the first three coordinate pairs of the first polygon.
head(new_object@polygons[[1]]@Polygons[[1]]@coords, 3)

4. Importance of R in data science and statistical analysis

4.1 R ecosystem

R is an open-source, interpreted, general-purpose programming language. It has expanded from a specialized statistical language to a broader approach for data science, reaching beyond the information technology and communication (ITC) industry [10]. R has emerged as a de facto data science programming language; its adaptability, power, maturity, and expressiveness have made it an indispensable tool for data scientists worldwide [11]. Popular applications of R include, but are not limited to, ML, predictive modeling, data visualization, statistical modeling, web scraping, dashboard construction, and interactive websites. Programmers can develop code efficiently via an integrated development environment (IDE) such as RStudio. The user has access to a wide variety of R packages provided by the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/mirrors.html.
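As a brief taste of the statistical workflow R is known for, the following lines summarize a variable and fit a simple linear model on the built-in mtcars dataset (chosen here only because it ships with every R installation):

```r
# Summary statistics for fuel economy (miles per gallon).
summary(mtcars$mpg)

# A simple linear model: fuel economy as a function of car weight.
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)   # intercept and slope; the slope is negative, as expected
```

Two lines of code take you from raw data to a fitted model, which is precisely the kind of expressiveness that makes R attractive for data science.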

Figure 3 shows some R essential packages used in data science.

Figure 3.

Essential R packages for data science.

Learning a strong programming language is crucial given the demand for data science, data engineering, and big data manipulation. Python, R, SQL, and JavaScript are among the most widely used programming languages for data science. R has not been as popular as Python in recent years, but it still attracts many data scientists, especially in academia, since it excels at data wrangling, visualization, and statistical computing.

4.2 R spatial ecosystem

Spatial data conceptualization has been a part of R from its inception. The development of spatial packages that can handle and manipulate both raster and vector data is a continuous and progressive process [12]. Numerous R packages have been released for spatial data entry and analysis (Figure 4). sp and sf, among others, are used to manage basic features. For manipulating spatial data, rgeos and rgdal are the most common; the R Geospatial Data Abstraction Library (rgdal) is an R binding to the Geospatial Data Abstraction Library (GDAL). The spacetime package was established for processing spatiotemporal data, and the raster package was introduced for handling raster data.

Figure 4.

Essential R packages for spatial data analysis.

4.3 Familiarity with spatial data in R

The user must have R installed, which can be downloaded from http://cran.r-project.org/; make sure to use the most stable version, not necessarily the latest one. It is preferable to use an R editor such as RStudio, or any alternative editor that works for you. Posit Cloud provides a cloud-based environment for R (https://login.rstudio.cloud).

Despite the fact that there are thousands of packages, for the purpose of spatial data manipulation the user may start with these: ggmap, rgdal, rgeos, tidyr, dplyr, maptools, and tmap. To understand the functionalities of each package, use the help command in R or RStudio. You can install all the packages at once by using the concatenate function c(), and you can load them all by using the lapply() function in R.

# Assign a variable, concatenate the package names, and install them.
x <- c('rgdal', 'rgeos', 'maptools', 'ggmap', 'tidyr', 'dplyr', 'tmap')
install.packages(x)

# Load all the needed packages at once.
lapply(x, library, character.only = TRUE)

4.4 Manipulating spatial data with R

One of the most important features of R is its ability to handle spatial data, for instance reading shapefiles. The rgdal library, as mentioned previously, is the R binding to the GDAL library and allows users to manipulate spatial data; you can read a shapefile using the readOGR() function, as in the example below:

# Load the rgdal library for manipulating spatial data.
library(rgdal)

new_object <- readOGR(dsn = 'data source name', layer = 'the layer')

The first line of code calls the rgdal library, while the second loads the shapefile and assigns it to a new object named new_object; the readOGR() function loads the file with both the data source name (dsn) and the layer name identified. You can try it using your own data. Make sure to set up the R working directory properly so the data is read without complications. The head() function is usually used to display the first rows; the default output is the first six rows, but in the earlier code only the first three coordinate pairs were shown, as specified by the number 3.
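The default behaviour of head() mentioned above is easy to verify on a built-in dataset (mtcars is used here only because it requires no external files):

```r
# head() returns the first six rows by default; n changes that.
nrow(head(mtcars))     # six rows (the default)
nrow(head(mtcars, 3))  # three rows
```

The same n argument works whether head() is applied to a data frame, a matrix of coordinates, or the attribute table of a spatial object.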

5. Cloud computing

Cloud computing has attracted users and enterprises because it simplifies ITC services: storage and operating systems do not need to be installed or maintained locally, since a third party takes care of that at low cost. Cloud computing refers to the on-demand, pay-as-you-go availability of computer system resources, particularly data storage and processing power, without direct active management by the user [13, 14]. In cloud computing, there are three service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) [14, 15]. IaaS is regarded as the foundation of the cloud service, PaaS is the platform for constructing applications, and SaaS is where providers manage every part of the product and clients are only concerned with accessing the software at their convenience.

5.1 Spatial cloud computing

Spatial cloud computing is a cloud computing paradigm based on geospatial sciences and spatiotemporal principles [16]. There are four service models for spatial cloud computing [17]: (1) Data as a Service (DaaS), (2) Model as a Service (MaaS), (3) Geoprocessing as a Service (GaaS), and (4) Workflow as a Service (WaaS). The focus of this chapter falls under PaaS, as GEE serves as a platform for developers to create and run scalable applications on the Google Cloud platform.

6. Google Earth Engine platform

GEE is a cloud-based platform for analyzing planetary-scale environmental data that utilizes Google's multi-petabyte data catalog [18]. Categorized as PaaS, it is an all-inclusive development and deployment environment for spatial cloud computing, used by scientists, researchers, and developers to identify changes, plot trends, and measure variability on planetary surfaces. The main components of GEE are: (i) datasets provided by different agencies, mainly NASA, the USGS, and the European Space Agency; (ii) computing power provided by Google Inc.; (iii) REST APIs/client libraries for making requests to Earth Engine; and (iv) the code editor, an IDE for developers using the JS API.

6.1 GEE data catalog

Earth Engine, as illustrated in Figure 5, is composed of an open data catalog, computational infrastructure, geospatial APIs, and an interactive application server [5]. Utilizing the petabytes of spatial data archives provided by US government agencies, the European Space Agency, and many others is now possible as a result of the introduction of GEE as a spatial cloud computing platform. Below are short descriptions of some of the datasets available in GEE.

Figure 5.

Earth engine components. (Source: earthengine.google).

6.2 Landsat data

The Landsat Program is the most extensive, ongoing space-based record of Earth's land in existence [19]. The Landsat satellite records offer broad spectral data extending from visible light to short-wave infrared, and cover a long time period (1972-present). The majority of the data has a spatial resolution of around 30 m, although this varies depending on the Landsat series and the bands chosen. Landsat data is ideal for analyzing changes in land use and land cover, as it offers both rich data and a large time span for tracking changes on the surface of the globe.

6.3 MODIS data

The Moderate Resolution Imaging Spectroradiometer (MODIS), aboard the Terra and Aqua satellites, views the entire Earth every 1-2 days [20]. The first MODIS instrument was launched in 1999; it acquires data in 36 spectral bands ranging from 0.4 to 14.4 μm, with spatial resolutions ranging from 250 to 1000 m. MODIS excels at observing significant biosphere changes ("MODIS Web," n.d.) and is well suited to meteorological and climate change applications.

6.4 Sentinel data

The first Sentinel satellite was launched in 2014. The Sentinel missions carry different sensors, such as radar and multi-spectral imaging instruments, for land, ocean, and atmospheric monitoring [21]. The sensors are used for depicting land surface changes, inventorying land resources, and detecting pollution on land, in water, and in the atmosphere.

7. Interacting with Google Earth Engine

The user needs to interact with GEE via specific APIs to download, process, analyze, and visualize Google Earth Engine catalog data. There are two native APIs that users usually employ to interact with GEE: the JavaScript (JS) and Python APIs. In the following sections, we will explore the rgee package, an Earth Engine client library for R [22], which provides R users with specific functions to call GEE through the Python API.

API stands for application programming interface; it is a contract that allows two or more computer programs to communicate with each other [23], or simply to integrate different software systems [24]. GEE can be accessed easily via two native APIs, the JavaScript and Python APIs. In this chapter, we dig deep into exploring the potential of the rgee package, which integrates R and GEE via an intermediate package called reticulate. You can think of the reticulate package as a bridge that connects R to GEE via the Python API.

JavaScript is one of the native GEE APIs; it enables GIS developers to interact with the GEE platform via the Code Editor (http://code.earthengine.google.com), the web-based IDE for Earth Engine JavaScript. To access the code editor, you must register for Earth Engine (https://earthengine.google.com/) and have at least intermediate familiarity and experience with JavaScript coding. There are many learning platforms where beginners can start learning JS programming; https://www.w3schools.com/js/default.asp and https://developers.google.com/earth-engine/guides/getstarted are good websites to start with.

Similarly, the Python API is the other native GEE API that enables GIS developers to interact with the GEE platform, via the Earth Engine (ee) package and the Google Colab platform (https://colab.research.google.com/notebooks/) or Jupyter Notebook (https://jupyter.org/).

To start communicating with GEE, you need to import the ee library, trigger the authentication workflow with ee.Authenticate(), and initialize the library with ee.Initialize(). There are many learning websites, but a good start is the one provided by Google: https://developers.google.com/earth-engine/tutorials/community/intro-to-python-api.

7.1 The rgee package: bridging Earth Engine and the Python API

The rgee package is a non-native R client library that employs the reticulate package, a toolkit for Python-R interoperability [24], as cited in [18]. The rgee package enables users to combine the advantages of the R spatial ecosystem and GEE in a single workflow [18]. Figure 6 illustrates how the R ecosystem is integrated with GEE via the reticulate package, making use of Google Cloud utilities and assets.

Figure 6.

The integration between the R ecosystem and GEE via the reticulate package.

8. Getting access to Google Earth Engine

To get access to GEE, fill in the sign-up form at https://signup.earthengine.google.com; you will receive an explanatory email on how to get started. Alternatively, you can try the code editor directly: on opening it you will receive a welcoming message (Figure 7), so sign up and complete the registration process if needed.

Figure 7.

A screenshot of the GEE welcoming message.

8.1 Trying the code editor

The GEE code editor is a web-based IDE. Figure 8 shows the components of the code editor; for more information and practice, visit: https://developers.google.com/earth-engine/guides/playground

Figure 8.

The components of the JavaScript code editor. (Source: https://developers.google.com/earth-engine).

The code editor has many features (Figure 8) that help the user take advantage of the GEE API. The user can examine, display, create, manage, and exchange data and scripts.

8.2 rgee installation and initialization

The rgee package's main purpose is to leverage the workflow between the R ecosystem and GEE, but it provides other functionalities [18] such as input/output design, interactive map display, time series extraction, and an asset management interface.

To install and run rgee properly, a Python environment (version 3.5 or higher) with the NumPy and Earth Engine API packages is required; you can download the Python version that suits your system from https://www.python.org/downloads/. For more details about the Earth Engine Python API, visit https://developers.google.com/earth-engine/guides/python_install. In addition to the Python environment, the user should have an active Google account (if you do not have one, create it at www.google.com) and access to the Earth Engine platform and datasets, via either free non-commercial or commercial access; for details visit https://developers.google.com/earth-engine/guides/access. To use the Earth Engine API in the local environment, users should also download and install the Google Cloud SDK from https://cloud.google.com/sdk/docs/install.

To bridge the Python API with R, the user must install and load the reticulate package, followed by the remotes package, and finally the rgee package, which can be downloaded from the GitHub repository (https://github.com/r-spatial/rgee). There are many ways to install and load the above-mentioned packages, but installation code is a straightforward way of doing so; different code snippets are provided below to install the rgee package from the specified GitHub repository. The first step before installing rgee is to make sure that the reticulate package and Python 3.5 or greater are installed. Install the most stable version, not necessarily the latest version, as stable versions are more mature.

It is recommended that you install Python and create your own working environment to avoid interoperability issues with other running applications. Below is the code for installing the reticulate package, checking that Python is installed, and testing the environment. Once the reticulate package and Python are installed, you can perform some simple operations using Python libraries such as NumPy and SciPy.

# Install the reticulate package if it is not installed yet.
if (!("reticulate" %in% installed.packages()[, 1])) {
    print("Installing package `reticulate`...")
    install.packages("reticulate")
} else {
    print("The reticulate package is already installed")
}

library(reticulate)

# Testing the Python environment.
Sys.which("python")

# Check if Python 3 is installed.
Sys.which("python3")

# Making sure popular Python libraries are functional with reticulate.
np <- reticulate::import("numpy", convert = FALSE)

# Do some array manipulation with NumPy.
a <- np$array(c(1:4))
print(a)

In the above code, the first few lines instruct the program to install the reticulate package if it is not installed yet. The package enables R to extend Earth Engine Python API classes and functions; you can use the library() function to load reticulate into your R ecosystem. The next lines load the reticulate package and test the installed Python environment. You can import as many Python packages as you need with the same code used to import NumPy, simply substituting the package name in the reticulate import function. Below is the code to import the sklearn package.

reticulate::import("sklearn")

# The same function can be used to import other libraries from Python.

The remotes package allows installation from remote repositories (see the code below). A confirmation message will be received; you may need to update some packages. Once installed and updated, cross-check the installed packages and make sure the reticulate, R6, and processx packages are all installed; if not, install them. Once everything is done and after executing the initialization function, R will ask you to enter a verification code (Figure 9). A window will then open in your web browser for you to add your GEE credentials in the notebook authenticator. Once credentials are provided, click on generate a token, and in the R environment add the generated token as the verification code.

Figure 9.

Initialization of GEE.

# Install the remotes package.
install.packages("remotes")

# Load the remotes package into your R ecosystem.
library("remotes")

# Install and load the rgee package.
remotes::install_github("r-spatial/rgee")
library(rgee)

# Install the rgee dependency packages.
ee_install()

# Make sure all dependencies are installed properly.
ee_check()

# Authorize and initialize the rgee API.
ee_Initialize()

The first lines of the code install the remotes package and load it into the system; this enables you to install the rgee package from GitHub. The ee_install() function from the rgee package installs the rgee dependencies, and the ee_check() function checks that all dependencies are installed. Once everything is installed and loaded properly, a message informs you about the Python installation and the available packages. The final line initializes the rgee package, which bridges the gap between the R ecosystem and the Python API, facilitating interaction with GEE using R's statistical and visualization power.

9. Case studies

In this section, we walk through three case studies. The first shows how to visualize Sentinel-2 datasets provided by GEE; it is straightforward code that allows the user to read, load, and visualize Sentinel-2 data.

The second case study shows how to apply the Normalized Difference Vegetation Index (NDVI) to Landsat data. The NDVI is a commonly used measurement to determine the density and health of plants from remotely sensed data. It is determined using spectrometric data from the red and near-infrared regions of the electromagnetic spectrum. We selected Kassala State in Sudan because it has a history of dry spells and is one of the drought-prone regions of Eastern Sudan.

The third case study implements a supervised classification of LULC for a large watershed in Sudan, Wadi Soba. The user should be aware that supervised learning needs an appropriate selection of the training areas used in the classification process; this may require expert knowledge or a thorough desk review to interpret the classes. We applied the smileCart ML algorithm as it provides easily interpretable and visualizable results. The Soba watershed study area was chosen because the area has witnessed urban expansion in recent decades that may negatively impact it in the future.

9.1 Case studies 1 and 2: Visualizing Sentinel-2 images and calculating NDVI

This case study showcases how to use rgee to load and visualize Sentinel-2 data (Figure 10). The ee$ImageCollection method is used to add an image collection from the GEE repository. ee$FeatureCollection is used to load vector data, which can be applied to clip the region of interest (ROI); in our case, the ROI is an administrative boundary covering part of Kassala State, Sudan. The following code illustrates the process of loading the Sentinel image for Kassala State using the rgee library and GEE; the user can substitute the data we used with data that suits his/her purpose.

Figure 10.

Adding a Sentinel-2 image using the rgee package for Kassala State, Eastern Sudan.

# 1 Load the rgee API.
library(rgee)

# 2 Initialize the rgee API.
ee_Initialize()

# 3 Load the study area; the Kassala admin unit will be used.
study_area <- ee$FeatureCollection("FAO/GAUL/2015/level2")$filter(ee$Filter$eq("ADM2_NAME", "Kassala"))

# 4 Load the Sentinel-2 image collection.
sentinel2_msi <- ee$ImageCollection("COPERNICUS/S2")$filterBounds(study_area)$mean()

# 5 Define visualization parameters in a list.
vizParams <- list(
  bands = c("B5", "B4", "B3"),
  min = 0, max = 3000
)

# 6 Set the study area as the center of the map.
Map$centerObject(sentinel2_msi$clip(study_area))

# 7 Add the Sentinel image to the map.
Map$addLayer(sentinel2_msi$clip(study_area), vizParams, "S2")

In the above code, the rgee package is loaded into the R ecosystem in the first step using the library() function, and it is initialized in the second step using the ee_Initialize() function. Step 3 uses the filter() function to select the Kassala area in Eastern Sudan as the ROI, while the ee$FeatureCollection() function loads level-2 administrative boundaries from the Earth Engine repository. In step 4, the ee$ImageCollection() class loads the Sentinel-2 collection; the filterBounds() function restricts it to the study area boundaries, while the mean() function computes the per-pixel mean of the available images. In step 5, a list defines the visualization parameters: the bands element selects certain bands from the image, and min and max specify the minimum and maximum pixel values for display. In step 6, the centerObject() function centers the scene on the image. Finally, in step 7, the addLayer() function adds the image to the scene.

In the second illustration, the NDVI calculation procedure is shown for a region in Kassala State, Sudan (Figure 11). The NDVI uses band math, combining the red and near-infrared (NIR) wavelengths of electromagnetic radiation as (NIR - Red) / (NIR + Red), to assess the health of the vegetation cover in an area. The code below shows how the NDVI can be calculated using GEE through the rgee interface.
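As a concrete illustration of the band math that GEE performs per pixel, the sketch below computes (NIR - Red) / (NIR + Red) for a few typical surface types. The reflectance values are invented for illustration, not taken from the Kassala scene.

```python
def ndvi(nir, red):
    # NDVI = (NIR - Red) / (NIR + Red); values near 1 indicate dense,
    # healthy vegetation, values near 0 bare soil, negative values water
    return (nir - red) / (nir + red)

# Hypothetical surface-reflectance pairs (NIR, Red)
samples = {
    "dense vegetation": (0.50, 0.08),
    "bare soil": (0.30, 0.25),
    "water": (0.05, 0.10),
}

for name, (nir, red) in samples.items():
    print(f"{name}: NDVI = {ndvi(nir, red):.2f}")
```

Healthy vegetation reflects strongly in the NIR and absorbs red light for photosynthesis, which is why the ratio separates vegetated from non-vegetated surfaces so well.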

Figure 11.

Adding the NDVI layer to the map, as of September 2022.

# 1 Load the rgee package
library(rgee)

# 2 Initialize the Earth Engine API
ee_Initialize()

# 3 Load a Landsat 8 image
lc_image <- ee$Image("LANDSAT/LC08/C02/T1/LC08_171049_20220910")

# 4 Set the visualization parameters
vizParams <- list(
  min = 0,
  max = 0.5,
  bands = c("B5", "B4", "B3")
)

# 5 Set the map center and zoom level
Map$setCenter(36.1899, 15.5010, 10)

# 6 Add the image to the map
Map$addLayer(lc_image, vizParams)

# 7 Compute NDVI as the normalized difference of NIR (B5) and red (B4)
ndvi <- lc_image$normalizedDifference(c("B5", "B4"))

# 8 Set the single-band visualization parameters
ndviViz <- list(
  min = 0.5, max = 1,
  palette = c("00FFFF", "0000FF")
)

# 9 Mask out pixels with NDVI below 0.4
ndviMasked <- ndvi$updateMask(ndvi$gte(0.4))

# 10 Create the region of interest: a 20 km buffer around a point
roi <- ee$Geometry$Point(c(36.1899, 15.5010))$buffer(20000)

# 11 Center the map on the region of interest
Map$centerObject(lc_image$clip(roi))

# 12 Clip the Landsat image to the region of interest and add it to the map
Map$addLayer(
  eeObject = lc_image$clip(roi),
  visParams = vizParams,
  name = "Landsat 8"
)

# 13 Clip the NDVI layer to the region of interest and add it to the map
Map$addLayer(
  eeObject = ndviMasked$clip(roi),
  visParams = ndviViz,
  name = "NDVI"
)

The above code loads the rgee package into the R ecosystem in step 1 and loads the Landsat 8 image in step 3. Steps 4 to 6 set the visualization parameters and display the image. In step 7, the normalizedDifference() function calculates the NDVI from band 5, which represents the near-infrared portion of the electromagnetic spectrum, and band 4, which represents the red wavelengths of visible light. Steps 8 and 9 set the NDVI visualization parameters and mask the result so that only values of 0.4 or more are displayed, while step 10 defines a buffered ROI. The remaining steps center and zoom the map on the ROI and add the clipped layers. The final portion of the code, below, repeats the addLayer() call to show how the masked NDVI layer is added to the GEE interface using the rgee package.
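The masking step can be mimicked outside GEE: updateMask() makes every pixel whose mask value is zero transparent, which is what a boolean threshold does to an array. A minimal numpy sketch of the same idea, with a hypothetical grid of NDVI values (NaN stands in for GEE's transparent pixels):

```python
import numpy as np

# Hypothetical 2x3 grid of NDVI values
ndvi = np.array([[0.10, 0.45, 0.80],
                 [0.35, 0.60, -0.20]])

# Analogue of ndvi$updateMask(ndvi$gte(0.4)): keep pixels >= 0.4,
# mark the rest as invalid (NaN here, transparent in GEE)
masked = np.where(ndvi >= 0.4, ndvi, np.nan)

print(masked)
```

Only the three pixels at or above the 0.4 threshold survive; the rest would simply not be drawn on the map.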

# 14 Clip the masked NDVI layer to the region of interest and add it to the map
Map$addLayer(
  eeObject = ndviMasked$clip(roi),
  visParams = ndviViz,
  name = "NDVI"
)


10. Supervised classification

The principle behind supervised classification is that a user chooses sample pixels from an image (training areas) that are representative of particular classes; these samples are then used to find comparable pixels with similar spectral signatures and assign them to the same classes. The training regions must be chosen carefully to ensure that the classification produces the desired results. Since the process of selecting training regions is widely described in the remote sensing literature, we do not dwell on it here; rather, the focus of this chapter is on how to run the algorithm using the rgee package.
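To make the principle tangible, the sketch below implements the simplest possible "spectral signature" classifier in plain Python: the mean signature of each training class is computed, and an unknown pixel is assigned to the class whose mean is nearest. The two-band reflectance values are invented for illustration; GEE's classifiers, such as CART, are far more sophisticated, but the idea of generalizing from training pixels to spectrally similar pixels is the same.

```python
# Hypothetical training pixels: (red, NIR) reflectance per class
training = {
    "water": [(0.06, 0.03), (0.05, 0.04)],
    "vegetation": [(0.07, 0.50), (0.09, 0.45)],
    "bare soil": [(0.25, 0.30), (0.28, 0.33)],
}

def centroid(pixels):
    # Mean spectral signature of a class
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(2))

signatures = {cls: centroid(px) for cls, px in training.items()}

def classify(pixel):
    # Assign the class whose mean signature is closest (Euclidean distance)
    def dist(sig):
        return sum((a - b) ** 2 for a, b in zip(pixel, sig)) ** 0.5
    return min(signatures, key=lambda cls: dist(signatures[cls]))

print(classify((0.08, 0.48)))  # a pixel resembling the vegetation signature
```

This "nearest centroid" rule is the conceptual ancestor of the minimum-distance classifier described in most remote sensing textbooks.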

In addition, the reader will learn how to use the smileCart algorithm, a classification and regression tree (CART) implementation, as a form of ML through the code provided below. The code has been implemented for the Wadi Soba watershed in Khartoum State, Sudan, but it can be applied by the reader to other areas. The first steps of adding images, visualizing them and clipping them to the region of interest are nearly identical to the previous codes. When it comes to ML, dividing the data into training and testing sets is key: the training set is used to fit the classifier, while the testing set is used to validate the results and estimate the overall classification accuracy. The output of the code is shown in Figure 12.
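The randomColumn() split used in the code can be imitated for any feature table: attach a uniform random number to each record, then threshold at 0.8 to obtain an 80/20 split. A minimal Python sketch of this idea, using hypothetical sampled records:

```python
import random

random.seed(42)  # fixed seed so the split is reproducible

# Hypothetical sampled pixels, each carrying a class label
records = [{"id": i, "Class": i % 5} for i in range(1000)]

# Analogue of randomColumn(): attach a uniform [0, 1) value to each record
for rec in records:
    rec["random"] = random.random()

# Analogue of the lessThan / greaterThanOrEquals filters
train_set = [r for r in records if r["random"] < 0.8]
test_set = [r for r in records if r["random"] >= 0.8]

print(len(train_set), len(test_set))  # roughly 800 and 200
```

Because the threshold is applied to an independent random value per record, the split is random but reproducible whenever the seed (in GEE, the randomColumn() seed argument) is fixed.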

Figure 12.

Showing the LULC classes for Wadi Soba.

# Load the rgee package
library(rgee)

# Initialize the Earth Engine API
ee_Initialize()

# Add the region of interest; you may need to store the datasets in your own EE assets
roi <- ee$FeatureCollection("projects/ee-anwareltayeb2/assets/WADISOBA_WS")

# Add the training classes; you may need to store the datasets in your own EE assets
Water <- ee$FeatureCollection("projects/ee-mohammedaau/assets/soba_water")
Agric <- ee$FeatureCollection("projects/ee-mohammedaau/assets/soba_agric")
Urban <- ee$FeatureCollection("projects/ee-mohammedaau/assets/soba_urban")
Barren <- ee$FeatureCollection("projects/ee-mohammedaau/assets/soba_burren")
Rangeland <- ee$FeatureCollection("projects/ee-mohammedaau/assets/soba_rangeland")

# Add the least cloudy Landsat 5 TOA image over the ROI from 1984
image <- ee$ImageCollection("LANDSAT/LT05/C02/T1_TOA")$filterDate("1984-01-01", "1984-12-31")$filter(ee$Filter$bounds(roi))$sort("CLOUD_COVER")$first()

# Display a false-color composite of the image
Soba84 <- image$select(c("B4", "B3", "B2"))
Soba84vis <- list(min = 0.0, max = 0.4, gamma = 1.2)
Map$addLayer(Soba84$clip(roi), Soba84vis, "SobaTM84False")
Map$centerObject(roi, 10)

# Class label property and predictor bands
label <- "Class"
bands <- c("B1", "B2", "B3", "B4", "B5", "B6", "B7")
input <- image$select(bands)

# Merge the training classes and sample the image at the training pixels
training <- Water$merge(Agric)$merge(Urban)$merge(Barren)$merge(Rangeland)
trainImage <- input$sampleRegions(training)

# Split the samples into 80% training and 20% testing sets
trainingData <- trainImage$randomColumn()
trainSet <- trainingData$filter(ee$Filter$lessThan("random", 0.8))
testSet <- trainingData$filter(ee$Filter$greaterThanOrEquals("random", 0.8))

# Train a CART classifier and classify the image
classifier <- ee$Classifier$smileCart()$train(trainSet, label, bands)
classified <- input$classify(classifier)

# Palette for the five land-cover classes
landcoverPalette <- c(
  "#2c7bbc", # Water
  "#4dac26", # Agric
  "#d7191c", # Urban
  "#ffffbc", # Barren
  "#b8e186"  # Rangeland
)

# Add the classified image (class values 0 to 4) to the map
Map$addLayer(classified$clip(roi), list(palette = landcoverPalette, min = 0, max = 4), "classification84")

11. Conclusion

GEE opened a new era in satellite image processing, analysis and visualization as a spatial cloud computing platform whose resources are freely available to users and researchers through its native APIs, JavaScript and Python. In this chapter, the rgee package was introduced as a different way to interface with GEE and to utilize the extensive statistical and visualization capabilities of R. The reader was introduced to the Normalized Difference Vegetation Index (NDVI), one of the most widely used biophysical indices for monitoring vegetation cover and plant health over time. In addition to the fundamentals of spatial cloud computing, the reader was also exposed to the practical side of installing, initializing and running GEE under R. An illustration of supervised classification utilizing the rgee package and ML methods was also provided.

