CGS-SIMMA landslide frequency distribution.
This landslide detection research applied remote sensing techniques. Morphometry to derive both DEM terrain parameters and land use variables. SAR interferometry (InSAR) for showing that InSAR coherence and InSAR displacement obtained with SRTM DEM 30 m resolution were strongly related to landslides. InSAR coherence values from 0.43 to 0.66 had a high association with landslides. PS-InSAR allowed to estimate terrain velocities in the satellite line-of-sight (LOS) in the range − 10 to 10 mm/year concerning extremely slow landslide displacement rates. SAR polarimetry (PolSAR) was used over L-band UAVSAR quad-pol data, obtaining the scattering mechanism of volume and surface retrodispersion more associated with landslides. The optical remote sensing with a multitemporal approach for change detection by multi-year Landsat (5, 7 and 8)-NDVI, showed that NDVI related to landslides had values between 0.42 and 0.72. All the information was combined into a multidimensional grid product and crossed with training data containing a Colombian Geologic Service (CGS) landslide inventory. A detection model was implemented using the Random Forest supervised method relating the training sample of landslides with multidimensional explanatory variables. A test sample with a proportion of 70:30 allowed to find the accuracy of detection of about 70.8% for slides type.
- landslide detection
- Random Forest
This research aimed to design a model for the detection of landslides at the regional scale using Earth Observation data and remote sensing techniques. The following techniques were used to achieve this goal.
Landslide inventory (LI), for example the CGS-SIMMA2 as the base for the evaluation of the multivariate data generated by new technologies. It provides information about the vector representation such as point objects, date, municipality, mass distribution, type, sub-type, material and origin contributing to landslide detection .
At regional scale, landslide detection and landslide distribution analysis allows to analyse the distribution and classification of landslides . These analysis use univariate and multivariate statistical methods, obtaining weights by the correlation between landslides occurrences and conditioning factors. Weight of Evidence (WofE) method is a bivariate approach where the landslides inventory is used to calculate weights of conditioning factors in order to delineate potential areas of landslides. Logistic regression (LR) is one of the statistical methods more used to evaluate the relationship between landslides and related factors .
Remote sensing techniques have been used for landslides detection . One of them is morphometry based on Digital Elevation Models (DEMs), for a quantitative analysis of the land surface topography. In this way, the geomorphometry analysis derives land surface parameters such as slope, aspect, curvature or basic local descriptors, regional parameters as catchment area, parameters connected with hydrology like topographic wetness index and so on . Terrain parameters can be related to landslides to build detection models . Change detection technique  based on the sudden disappearance of vegetation by NDVI difference computation can serve to landslide detection. Also, SAR-based techniques in rural areas can help to landslide detection .
Interferometric spaceborne radar to measure linear deformation rates has been implemented on several studies to landslides detection . PS-InSAR (persistent scatterers) allows measurements with millimetre accuracy of individual features. PS-InSAR is differential measurements with respect to a reference point that is assumed to be stable . PolSAR imagery allows to characterise objects on the ground based on that different structures and geometries show different backscatter values at different SAR polarisations. Quad polarimetric SAR data content information to landslide detection in forested areas under the assumption that the dominant mechanism is surface scattering with high homogeneity .
This way Earth Observation (EO) data encompasses sensors like SAR, optical images and GPS on board of platforms either satellite-based, aircraft-based or ground-based provide high spatial, temporal, and spectral resolution to geohazard studies . The above combined with machine learning (ML) techniques like Random Forest, allow the mapping, monitoring and modelling of landslides occurrences.
2. Landslide inventory and earth observation data
In this section we describe the study area, the landslide inventory, taken the from the Colombian Geologic Service (CGS), and the Earth Observation data.
2.1 Study area
The study area is located at the southwest of Colombia and covers the inter and central Andes Mountains in a rectangle within the following WGS84 system coordinates: 0206′53.50″ North, 7648′49.71″ West and 0234′27.81″ North, 7624′32.25″ West. In here, we found elevations between 848 and 4932 m.a.s.l. The area covers the inter-Andean valleys of Cauca river and the central mountain range of the Andes in the southwestern of Colombia. Former is to comprise Tertiary and Quaternary formations and above there are volcanic ash depositions. The Central Mountain Range of the Andes in Cauca state, have quaternary deposits at the summit and its western slope it is found diabase rock. Figure 1 shows the study area on Alos Pasar elevation data as background and the sheets of the topo-map provided by the National Geographical Institute ‘Agustin Codazzi’ (IGAC).
2.2 Landslide inventory
The CGS-SIMMA geo-service allowed to build the landslide inventory database for training the detection model. Landslide database contains an inventory of geomorphologic and catalogue type. The former contained more precise information in its thematic attributes. The overall distribution of SIMMA landslide inventory was 77.4% of slide type, 16.5% of fall type, and 6.1% of flow and creep type for a total of 230 registered events (Table 1). The spatial information contained only allowed mapping in a shapefile of point type.
2.3 Earth observation data
Table 2 summarise the EO data used in this research. EO data corresponds to DEMs, optical remote sensing and radar remote sensing. DEMs provided from SRTM DEM 30 m resolution and Palsar RTC elevation data at 12.5 m. Radar data had as source the spaceborne-based ESA-Copernicus Sentinel-1 satellites and the aerial-based UAVSAR platform. Optical data was provided by time series analysis of the multi-year Landsat 5, Landsat 7, and Landsat 8 NDVI surface reflectance images.
The main remote sensing techniques over Earth Observation data to extract features and conditioning factors related to landslides are listed in Table 3.
|Approach||Variable or method||Software||Software type|
|Morphometry||Land surface parameters||Demanal/SAGA/R/ArcSDM1||Open-source|
|Spaceborne-based InSAR||InSAR measurements (coherence and displacement)||SNAP toolbox/SARProZ2||Open-source and Commercial|
|Multi-InSAR||Deformation velocities of persistent scatterers||SARProZ3||Commercial|
|Aerial-based PolSAR||Surface, volume, and double-bounce scattering mechanism||PolSARpro_v5.04||Open-source|
|Optical remote sensing||NDVI vegetation indices||Google Earth Engine5||Open-source|
|Binary model||WofE analysis and Logistic regression||ArcSDM6||Open-source|
|Multidimensional data fusion||Random Forest supervised method||QGIS/ArcCatalog/R software7||Open-source and commercial|
Figure 2 shows the functional representation of the EO data indicating the entire flow processing needed to develop the detection model from multi-dimensional data, data training and test of landslides. Also, the scheme shown in the Figure 3 indicates the fusion of all geospatial information by the Random Forest method.
The CGS-SIMMA landslide inventory was split into training and test subset in a proportion of 70:30 in concordance with the study made by Huang and Zhao  in order to determine the accuracy of each remote sensing method applied and the detection model generated in this research.
The results of remote sensing techniques implemented in the study area are presented in this section.
4.1 Land surface parameters (LSP) related to landslides
The DEMs Palsar RTC elevation data, SRTM DEM 30 m and 90 resolution, and ASTER-GDEM were evaluated in relation to GPS control points and a reference Topo DEM obtained by interpolation of contours at a scale of 1:25 K. The best in terms of vertical root-mean-square-error (RMSE) was SRTM 30 m resolution, and its accuracy at a 95% confidence level (7.8 m) corresponded to a better scale than 1:25 K. ALOSP RTC elevation data was the second best DEM. For this reason we derived from the latter the land surface parameters (LSP) at 12.5 m resolution using algorithms implemented on SAGA software. Table 4 shows the results of a vertical accuracy assessment of the global DEMs used in this research.
|Metric||ALOSP||ASTER||SRTM1 (30 m)||SRTM3 (90 m)|
|SZ (without bias) (m)||4.427||5.331||3.937||4.595|
|LE95 (without bias) (m)||8.853||10.236||7.795||9.144|
|SZ=||4.62 + 80.5*tan(slp)||5.60 + 89.8*tan(slp)||3.54 + 109.9*tan(slp)||4.689 + 100*tan(slp)|
The land surface parameters: slope, aspect, curvature , topographic wetness index (TWI) , valley depth (Vdepth) , convergence index (CONVI) , flow path length (FPL), and insolation , were converted into independent components by using Principal Component Analysis (PCA) . These were used as independent variables into a landform detection model and a landslide regression model by WofE methods.
Table 5 shows the results of WofE analysis to relate morphometric and land use conditioning factors with landslide inventory. Only the variable with its class with the most studentised contrast (bigger than 2) C/s(C) is shown. Figure 4 shows the unit soils at a scale of 1:100 k, which covers plain, undulated and mountainous terrains.
|LSP||The range of study area||Class||C/s(C)|
|Slope||0- 79.4||6.6to 19.9||2.1|
|FPL||0–2598 m||0–371 m||2|
|Soil unit||9 classes||Humid hill lands (LQ) and very wet cold mountain (MK)||3.5|
4.2 InSAR measurements
InSAR measures are phase, coherence, and displacement and they are obtained by the cross-correlation between two or more SAR images to process the line-of-sight displacements. This research used C-band data provided by Sentinel/1 ESA’s Copernicus programme. In this investigation, the effect of a DEM on the InSAR processing was determined , concluding that DEM did not have an effect on InSAR coherence but if the InSAR phase is unwrapped with the DEM variable there are significance differences. The above is due to the inaccuracies of external DEM.
Two landslide regression models obtained either with InSAR coherence or InSAR displacement from a DEM variable showed that SRTM DEM 30 m resolution had the highest association with landslides inventory (Table 6). These models had an accuracy of 62% and 68% respectively.
|DEM||InSAR coherence||InSAR displacement|
|SRTM DEM 30 m||2.627||0.853||0.0014**||0.870||0.0003***||1.56e-9 ***|
|SRTM DEM 90 m||−2.674||0.850||0.816||−0.467||0.04*||0.22|
From the point of view of the InSAR coherence measurement, all DEMs are not statistically significant. However, Topo-map showed a weak relation to landslides. Topo-map and SRTM 30 m contributed to a better explanation of the linear regression model. InSAR displacement with a DEM variable indicated that SRTM 30 m had the lowest p-value suggesting a strong association of the elevation with the probability of having a landslide with a positive coefficient. ANOVA verificated that by adding the SRTM 30 m to the regression model significantly reduces the residual deviance.
WofE analysis showed that the maximum studentised contrast for InSAR coherence was in the range of 0.43 to 0.66.
Multi-InSAR processing by PS-InSAR method allowed to estimate the deformation rate in relation to a reference point target which is assumed to be stable. This approach overloads the substantial limitation of InSAR measurements: the spatial and temporal decorrelation and atmospheric distortions due to ionospheric electron density and tropospheric water vapour.
The C-band sensor on-board the Sentinel-1 satellite served as input data to implement PSInSAR processing to estimate the annual linear velocities and the time series of deformations. Table 7 indicates the geometrical characteristics of S1_A of the ESA’ Copernicus used for PS-InSAR in the study area. The perpendicular baseline in all cases was lower than 150 m, which is an adequate value for studies of terrain deformation. The results were a deformation map which consisted in a set of selected points (12 pts./km) with both the information of the estimated LOS velocity (in the range − 10 mm/year to 10 mm/year) and the accumulated displacement. Ordinary Kriging (OK) method allowed to predict the LOS velocity on landslide inventory (Figure 5). A terrain displacement between −4.5 mm/year and 4.8 mm/year in ascending mode, and between −2.7 mm/year and 7.7 mm/year in descending mode was predicted with the satellite LOS velocity, indicating the movement towards (positive values) and away (negative values) the sensor respect to the master radar scene. The prediction variance found was lower than 1.6 (mm/year)in ascending orbit, and lower than 5.63 (mm/year)in descending pass.
|SE||10–2014/09–2015 and 01–2016/09–2016||24||Asc/VV||7 to 144||24–384||34.2|
|NW||10–2014/09–2015 and 01–2016/09–2016||21||Asc/VV||7 to 144||24–384||34.2|
|NE||10–2014/05–2016||21||Des/VV||2 to 117||24–312||34.1|
4.3 PolSAR unsupervised classification
Dual Pol-Sentinel-1 analysis allowed to analyse the sigma nought scattering coefficient with VV and VH polarisation. Copolarization backscattering (−8.5 dB) was higher than cross-polarisation (−14.5 dB) (Figure 6).
Quad Pol-UAVSAR decomposition allowed to define the mechanism of scattering on landslides inventory using entropy/alpha within the Cloude Pottier method. The scattering mechanism dominant in the study area were volume scattering (vegetation) and surface scattering (Table 8). The results indicate that 50% of the landslide have a scattering mechanism of volume and 25.4% of surface type. The WofE method validated that the H-classification of volume and surface scattering were highly related to landslides.
|H-category||Landslide frequency||Relative frequency|
|4-Medium entropy and multiple scattering||12||9.5|
|5-Medium entropy and volume scattering||63||50|
|6-Medium entropy and surface scattering||32||25.4|
|7-Low entropy and multiple scattering||15||11.9|
|8-Low entropy and volume scattering||4||3.2|
4.4 NDVI time series analysis
Time series analysis of the multi-year Landsat NDVI was used as input data for the change detection analysis. In the period 2012 to 2017, the multi-year Landsat NDVI cloud-free yearly composites through Google Earth Engine did not show the statistically significant trends in vegetation. But WofE analysis found that the NDVI range with the highest association to the landslide inventory was between 0.40 and 0.72. Figure 7 shows that only the years 2012, 2014 and 2016 covers, without clouds, the landslide inventory distribution with median values of NDVI.
4.5 Model to the detection of landslides
All of the variables generated (25) in this research by remote sensing techniques were overlapped and cut into a common sub-zone and then combined into a multidimensional image. Here are found the classification variables. Then the effect of classification variables (derived from remote sensing techniques) over a target variable (landslide inventory) was measured by the algorithm of supervised pixel-based classification called Random Forest. Test data in a proportion of 30% of the entire data set allowed to obtain an independent validation. Table 9 show the Random Forest classification with an overall accuracy of 70.8%. The user’s accuracy refers to the correct classification of the type of movement in relation to the referenced one, and the producer’s accuracy refers to the commission or inclusion error. Due to the high frequency of rotational and translational slide, the method was successful, which did not happen with other less frequent types.
|Random Forest classification||Topic||Sw|
|Training model||Rotational slide||66.7%|
|User’s accuracy||Rotational slide||66.7%|
|Translational slide||71.4 $|
|Producer’s accuracy||Rotational slide||80%|
Figure 8 shows the results of the Random Forest classification for the landslides types: debris fall, flow, planar translational, rotational and translational. Rotational and translational slides had a producer’s accuracy of 80% and 91% respectively resulting in omissions errors of 20% and 9% for each one. User’s accuracy for the same type of landslides was of 67% and 71% indicating commission errors of 33% and 29%. The overall accuracy was 70.8%.
Figure 9 shows the Mean Decrease Accuracy implemented in Random Forest. The variables which contributed more to the study were PolSAR, the displacement InSAR, the NDVI and the morphometric variable slope.
This study confirmed that the slope angle is a key classification factor in landslide detection in a similar way reported by Donnarumffia et al. . So as land use is the most influencing factor to the occurrence of landslides .
An analysis of the effect of DEM on InSAR processing to estimate terrain deformations showed that DEM only had a significant impact on InSAR displacement but not on InSAR coherence, such as also is highlighted in Bayer et al. .
Dual polarimetric SAR analysis found that VV-polarisation radar backscatter produces stronger scattering than cross-polarisation (VH) on landslide inventory in the same way as is reported by Ningthoujam et al. .
C/band Sentinel-1 data allowed to measure very slow ground surface displacements with mm precision by PSInSAR method. However, as is indicated in Colesanti et al. , it is necessary to combine data from different sources, i.e. GNSS data, to avoid misinterpretations.
Time series analysis of Landsat NDVI composites with Google Earth Engine , allowed to compare measurements of inter-annual NDVI. However, this research only analysed the period 2012–2016. Thus, the lacking of long-term time series of optical satellites data did not detect trends in vegetation cover changes related to landslides. For this reason, inter-annual NDVI in the period 2012–2016 only was taken as conditioning factor to develop of detection model.
Random Forest (RF) algorithm was applied to classify landslides. Conditioning factors provided by remote sensing techniques were stored as grid cells at 30 m of spatial resolution. RF model for landslide classification needed data to train the model and validate its results. The total training dataset was split with a proportion of 70% of samples used to train models and 30% for validation.
Using the test dataset, we found that the overall classification accuracy of the model was 70.8%. This meant that over 70.8% of the test dataset was correctly identified as either a landslide event or non-landslide event in the same sense as is reported in Taalab et al. . The rank of variables importance, based on the relative contribution to the classification accuracy of the model, in order of importance, were: PolSAR, InSAR displacement, NDVI, backslope landform and InSAR coherence.
By using Remote Sensing techniques at the visible and microwave frequencies of EM waves this research did relate EO measurements with ground physical parameters such as scattering mechanisms, topography, land cover type and surface deformation patterns. All of the above in relationship with landslides inventory of the study area.
This research did implement unsupervised and supervised classification methods. The first to understand the pattern of LSI clustering and the second to classify the LSI with multidimensional variables derived from EO data and RS techniques.
All of the EO data collected and generated by RS techniques during this research was stored in appropriate containers of data.
This research used errors’ theory, ANOVA, TUKEY and cross-validation techniques to determine the internal and external precision of the method generated for landslides detection.
- Sistema de Información de Movimientos en Masa del Servicio Geológico de Colombia; in English: Mass Movement Information System from the Colombian Geologic Service