Data and data sources.
The aim of this study is to produce landslide susceptibility maps of the Şavşat district of Artvin Province using machine learning (ML) models and to compare the predictive performance of these models. Three tree-based ensemble learning models were used: random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost). The landslide inventory map consists of 85 landslide polygons, which comprise 32,777 landslide pixels at 30 m resolution. A randomly selected 70% of the landslide pixels was used to train the models and the remaining 30% was used to validate them. Eleven conditioning factors were used in the susceptibility analysis: altitude, aspect, curvature, distance to drainage network, distance to faults, distance to roads, land cover, lithology, slope, slope length, and topographic wetness index. The models were validated using success and prediction rate curves. The success rates for the GBM, RF, and XGBoost models were 91.6%, 98.4%, and 98.6%, respectively, whereas the prediction rates were 91.4%, 97.9%, and 98.1%, respectively. Therefore, it was concluded that the landslide susceptibility map produced with the XGBoost model can help decision makers reduce landslide-associated damage in the study area.
- landslide susceptibility mapping
- machine learning
Natural disasters cause displacement of people, injuries, loss of life, and damage to infrastructure and cultural heritage, which can directly give rise to extreme economic losses. According to the Emergency Events Database (EM-DAT), managed by the Centre for Research on the Epidemiology of Disasters (CRED), 11,755 people died worldwide in the 396 natural disasters that occurred in 2019; 94.9 million people were affected by these disasters, and the economic loss amounted to 103 billion dollars. By comparison, according to the report prepared by AON, a company that provides insurance and reinsurance brokerage and risk management consultancy services, the damage caused by natural disasters in 2020 is estimated at 268 billion dollars. In the same report, the total economic losses caused by natural disasters in the 2010–2019 period were calculated as 2.98 trillion dollars, which is 1.1 trillion dollars higher than in the 2000–2009 period.
Landslide is generally defined as the downward movement and displacement of the material forming a slope with the effect of gravity . Rabby and Li  stated in their study that landslides are a very common phenomenon and account for 9% of disasters in the world. Landslides, especially those caused by rainfall, are the most damaging natural disasters in mountainous and rugged regions, resulting in loss of life, damage to property, and economic loss . Landslide susceptibility maps are one of the important data needed to identify landslide-hazardous areas and to reduce losses due to landslides [7, 8]. Many different approaches and models have been implemented in the production of landslide susceptibility maps. Merghadi et al.  and Tang et al.  classified the modeling approaches into four categories: the heuristic, physically based, statistical, and machine learning (ML) models. Heuristic and physically based models (also known as deterministic models) have their own characteristics and disadvantages. Heuristic models are highly subjective and rely on experts’ opinions and experience on assigning weightage to landslide-conditioning factors [11, 12, 13, 14]. In this approach, differences in expert opinions or insufficient information about the study area may cause inconsistent results . Physically based or deterministic models use laws of mechanics to analyze slope stability. The advantages of these models are that they do not require long-term landslide inventory data and are more useful in areas where landslide inventories are missing . However, deterministic models are suitable for small areas where landslide types are simple and ground conditions are fairly uniform , but they require detailed geotechnical and hydrogeological data on these areas . To overcome the disadvantages of the above two approaches and to produce reliable landslide susceptibility maps, statistics-based models have been developed . 
Statistics-based models evaluate the correlation between past landslides and the conditioning factors that had an impact on their occurrence  and they need landslide inventory data for this .
In recent years, machine learning (ML) techniques such as support vector machine [18, 19], decision tree [20, 21], generalized linear model [22, 23], logistic model tree [13, 16], artificial neural networks [6, 24, 25], and Naïve Bayes [26, 27, 28] have been widely applied for landslide susceptibility mapping (LSM). Sahin  and Merghadi et al.  stated that tree-based ensemble algorithms provide better prediction performance for LSM compared to any single model. In addition, Sahin  stated that ensemble learning techniques, such as random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost), are efficient and robust for creating landslide susceptibility maps and that these algorithms would be preferred more frequently in the future for their robustness.
The most common natural disasters in Turkey are landslides and floods. Artvin is one of the provinces in Turkey that experiences the most frequent natural disasters. Landslides occur almost every year in the province of Artvin, especially due to meteorological conditions (extreme rainfall) and anthropogenic activities, such as agricultural activities, excessive irrigation, and road excavations. Şavşat is one of the districts of Artvin where landslides are most common. Şavşat, a Cittaslow city, stands out with its historical and natural beauties and has a high tourism potential. For this reason, it is very important to evaluate the landslide susceptibility to reduce the landslide-associated damages in the district. The aim of this study is to produce landslide susceptibility maps of Şavşat district of Artvin Province using RF, GBM, and XGBoost ML models and to evaluate the performances of the models. Eleven factors commonly used in LSM studies were used in the study. The produced landslide susceptibility maps were validated using the validation dataset.
2. Study area and data used
Şavşat, like other districts of Artvin, is a district with a rugged terrain. Şavşat, spreading on a 1272.27 km2 land, is located between 41°05′11″ and 41°30′56″ north latitudes and 42°04′30″ and 42°35′47″ east longitudes (Figure 1). In the study area, the altitude varies between 590 and 3005 m with the average altitude being 1789.14 m. The average slope of the study area is 21.17°, whereas the maximum slope is 72.53°. The slope is over 20° in ~55% of the study area.
According to the data from the Turkish Statistical Institute (TURKSTAT), the total population of Şavşat district in 2020 is 17,024. Of this population, 6,123 live in the town and 10,901 live in villages . There is a transitional climate between the Black Sea climate and the continental climate in the district. While semi-humid climatic conditions are observed in the low valley floors, cold humid climatic conditions are observed in the higher elevations. In addition, winters are very long in places with high altitudes. According to the data (November 2012–March 2021) from the General Directorate of Meteorology, sum of monthly average rainfall in the study area is 715.60 mm. The monthly average rainfall is minimum in February with 27.8 mm and maximum in May with 111.03 mm. In the study area, the monthly average temperature was maximum at 32.8°C in August and minimum at −7.4°C in December .
Şavşat is located in the eastern part of the Eastern Pontides and the southern part of Transcaucasia. In the study area, intrusive, volcanic, and volcano-sedimentary facies developed as a result of the magmatic activity that took place in the Dogger, Late Cretaceous, and Eocene. In the northern and northwestern parts of the region, units belonging to a single stratigraphic sequence crop out, ranging in age from the Liassic to the Early-Middle Eocene. In the southern part, units belonging to two separate stratigraphic sequences crop out. The sequence in the west of the southern section is characterized by units of Early-Middle Jurassic and Late Cretaceous age, and the sequence in the east of the southern section is characterized by units of Late Cretaceous and Middle Eocene age. Tertiary units cropping out in the eastern and southeastern parts of the region are considered common units. According to the earthquake zone map of Turkey, Şavşat district is located in the third-degree earthquake zone. However, the most common natural disaster in the district is landslide. The landslides occurring in the study area are mostly of complex type. Landslides are observed in larger areas with respect to Quaternary alluvium and slope debris.
2.1 Landslide inventory map
To reliably predict future landslides, reliable landslide inventory maps containing information about past landslides are needed . As stated by Parise , landslide inventory maps represent the spatial distribution of landslides and provide information about the location, typology, and activity status of landslides. In this study, the landslide inventory map produced by Artvin Provincial Directorate of Disaster and Emergency was used. The landslide inventory map contains 85 landslide polygons. The area of the smallest landslide polygon in the study area is 0.01 ha (99.34 m2), and the area of the largest landslide polygon is 325.97 ha. The average area of the landslide polygons is 34.75 ha. Landslides cover ~3% of the study area. The lengths of the landslides in the region vary between 13 and 3100 m and their widths vary between 10 and 2780 m. According to their activities, 28 of these landslides are active, 32 are stalled, and 25 are inactive landslides. According to Varnes  classification of mass movements, 6 of the landslides were classified as slide, 2 as lateral spread, 20 as flow, and the remaining 57 as complex.
2.2 Landslide-conditioning factors
Evaluation of landslide susceptibility in a region depends on determining the factors that are effective in the formation of landslides in that region and on collecting spatial data related to these factors. Yi et al.  stated that there is no widely accepted procedure for the selection of factors used in LSM. Yanar et al. , for their part, stated that the main limitation in determining the factors to be used to create landslide susceptibility maps is the availability of data. In this study, 11 factors were selected based on the availability of data, the geo-environmental conditions of the study area, and a literature survey: altitude, aspect, curvature, distance to drainage network, distance to faults, distance to roads, land cover (CORINE 2018), lithology, slope, slope length, and topographic wetness index (TWI). Spatial data on these factors were collected from different sources (Table 1). Landslide-conditioning factor maps were generated using ESRI ArcGIS 10.5 and SAGA GIS 7.9.0 software and were converted into raster format with 30 m spatial resolution.
|Original data||Factors||Data type||Scale||Data provider|
|Landslide inventory||Landslide locations||Polygon||1/25,000||Artvin Provincial Directorate of Disaster and Emergency|
|Geological map||Lithology||Polygon||1/100,000||General Directorate of Mineral Research and Exploration (GDMRE)|
|Geological map||Distance to fault lines||Polyline||1/100,000||General Directorate of Mineral Research and Exploration (GDMRE)|
|Topographical map||Altitude||GRID||1/25,000||General Directorate of Mapping|
|Topographical map||Distance to drainage network||GRID||1/25,000||General Directorate of Mapping|
|Road network||Distance to roads||Polyline||1/25,000||Basarsoft Information Technologies Inc.|
|CORINE 2018||Land cover||Polygon||1/100,000||European Union Copernicus Land Monitoring Service|
2.2.1 Altitude
Altitude is associated with various geomorphological and meteorological factors, such as weathering, weather conditions, wind effect, and precipitation, which are effective in the formation of landslides. For this reason, it has been used in almost all LSM studies. The digital elevation model (DEM) of the study area was created from the 10-m-interval contours on the topographic maps and converted to raster format with 30-m spatial resolution. The altitude map of the study area was generated from this DEM. The altitude in the study area varies between 590 and 3005 m. The DEM was reclassified into 10 classes at 240 m intervals (Figure 2a).
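The equal-interval reclassification described above can be illustrated with a short sketch. It is written in Python rather than in the GIS software used in the study, and it rests on two assumptions that the text does not state: class lower bounds are inclusive, and the top class absorbs the few values above 2990 m (10 classes of 240 m starting at 590 m end at 2990 m, whereas the maximum altitude is 3005 m). The function names are hypothetical.

```python
def equal_interval_class(value, minimum, width, n_classes):
    """Return the 1-based class index of `value` for equal-width classes.

    Assumptions (not stated in the text): lower bounds are inclusive and
    the last class absorbs any value above its nominal upper bound.
    """
    if value < minimum:
        raise ValueError("value lies below the first class boundary")
    index = int((value - minimum) // width) + 1
    return min(index, n_classes)


def class_bounds(index, minimum, width):
    """Nominal (lower, upper) bounds of a 1-based class index."""
    lower = minimum + (index - 1) * width
    return lower, lower + width


# Altitude settings reported in the text: 590 m minimum, 240 m intervals, 10 classes.
ALT_MIN, ALT_WIDTH, ALT_CLASSES = 590, 240, 10

for altitude in (590, 1070, 1789.14, 3005):
    k = equal_interval_class(altitude, ALT_MIN, ALT_WIDTH, ALT_CLASSES)
    print(altitude, "-> class", k, class_bounds(k, ALT_MIN, ALT_WIDTH))
```

With these settings the class boundaries fall at 590, 830, 1070, 1310, 1550, 1790, 2030 m and so on, which agrees with the altitude classes quoted in the results.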
2.2.2 Aspect
Aspect plays an important role in landslide formation because it affects factors such as exposure to sunlight, the intensity of solar radiation, wind, rainfall, and soil moisture [38, 39]. For this reason, aspect is widely used in LSM studies [6, 26, 36, 40]. The aspect map used in this study was produced from the DEM and divided into nine classes (flat, north, northeast, east, southeast, south, southwest, west, and northwest) (Figure 2b).
2.2.3 Curvature
Curvature, which is widely used in geomorphometric analysis, is one of the basic terrain parameters and reflects the shape of the land surface [23, 41]. In a curvature map, positive values indicate that the surface is convex, negative values indicate that the surface is concave, and zero indicates that the surface is flat. In this study, the curvature map was derived from the DEM using ArcGIS 10.5 software and divided into three subclasses: concave, flat, and convex (Figure 2c).
2.2.4 Distance to drainage network
The distance to the drainage networks is one of the important conditioning factors used in landslide susceptibility studies, since the pore water pressure that causes the formation of landslides increases in areas close to the drainage networks . Drainage networks in the study area were generated from DEM using functions in ArcHydro toolbox in ArcGIS 10.5 software. The distance to the drainage networks was calculated using the Euclidean distance tool in ArcGIS 10.5. The maximum distance to the drainage networks in the study area has been calculated as 1830.98 m. The distance to the drainage networks is reclassified into 10 subclasses with equal intervals of 180 m (Figure 2d).
2.2.5 Distance to faults
Areas close to faults are highly susceptible to landslides as the strength decreases due to tectonic fractures . Ba et al.  stated that landslides tend to occur around faults due to fractures in the rock mass. For this reason, the distance to the faults is taken into account in the landslide susceptibility analysis [14, 40, 44]. In this study, the distance to the faults was obtained using the Euclidean distance tool of ArcGIS 10.5 software. The maximum distance to the faults in the study area has been calculated as 13,016.61 m. The distance to the faults was classified into 10 subclasses with 1200 m intervals and used in the landslide susceptibility analysis (Figure 3a).
2.2.6 Distance to roads
Road construction, which is considered to be one of the most important anthropogenic factors, destabilizes the slopes, so the probability of landslides along a road increases . Roads built on slopes in areas with rough topography cause loss of toe support, change in topography, increase in tension behind the slope, and development of tension cracks [45, 46]. For this reason, distance to the road has been considered as one of the important conditioning factors in many studies [14, 17, 47]. The road network in the study area was supplied in digital format from Başarsoft Information Technologies Inc., which collects road data for the production of navigation maps in Turkey. Distance to roads was calculated using the Euclidean distance tool in ArcGIS 10.5 and reclassified into 10 subclasses at 450 m intervals (Figure 3b).
2.2.7 Land cover
Land cover maps, in general, represent what physical classes or materials (e.g., forest, pasture, field, lake, and wetland) the Earth’s surface is spatially covered with. Land use or land cover maps are usually used in LSM studies for taking into consideration the effects of anthropogenic activities on rugged slopes on landslide formation . In this study, CORINE 2018 land cover (CLC 2018) data provided by Copernicus Land Monitoring Service, one of the European Union’s Earth Observation Programme services, were used. According to this dataset, the study area includes 14 different land cover classes (Figure 3c).
2.2.8 Slope
The slope angle, one of the most important factors governing the stability of slopes, is closely related to the shear forces acting on the slopes. As the angle of inclination increases, the shear stress in the materials forming the slope generally increases. For this reason, slope angle has been used in all LSM studies, as is the case for the lithology parameter [18, 40, 49, 50, 51]. The slope in the study area varies between 0° and 72.53°. In this study, the slope was divided into 10 classes at 5° intervals, and a slope map of the study area was produced (Figure 3d).
2.2.9 Slope length
Slope length is one of the important topographic factors that affect the formation of landslides. Kavzoglu et al.  define the slope length as “the distance along a slope subject to uninterrupted over land flow.” Slope length affects hydrological processes and soil loss, especially in mountainous areas. This factor is closely related to the formation of landslides, because the potential for the materials forming the slopes to be carried downhill increases as the slope length increases. In this study, slope length was produced from the DEM using SAGA GIS software and reclassified into 10 classes using the natural breaks classification method (Figure 4a).
2.2.10 Topographic wetness index (TWI)
TWI is an index generally used to characterize the spatial distribution of soil moisture and is considered an important factor contributing to the occurrence of landslides. Yanar et al.  stated that TWI indicates the locations and size of the water-saturated regions. For this reason, TWI has been used in many landslide susceptibility studies [26, 54, 55]. The following equation is used to calculate TWI:
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \qquad (1)$$
In Eq. (1), $A_s$ is the specific basin area and $\beta$ is the slope in degrees. The TWI in the study area, which varies between 1.002 and 24.160, was produced using SAGA GIS software. TWI values were divided into 10 subclasses using the natural breaks classification method and used in the susceptibility analysis (Figure 4b).
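As a worked illustration of the TWI definition, the sketch below evaluates the commonly used form TWI = ln(A_s / tan β) for a single cell. It is a Python illustration, not the SAGA GIS routine used in the study. The function name and the small minimum-slope clamp are assumptions added here so that perfectly flat cells (tan 0° = 0) do not cause a division by zero; the slope is supplied in degrees and converted to radians before the tangent is taken.

```python
import math


def topographic_wetness_index(specific_area, slope_deg, min_slope_deg=0.1):
    """TWI = ln(A_s / tan(beta)) for one cell.

    specific_area : specific basin (catchment) area A_s, must be positive
    slope_deg     : slope angle beta in degrees
    min_slope_deg : hypothetical clamp for flat cells (assumption, not from the text)
    """
    if specific_area <= 0:
        raise ValueError("specific area must be positive")
    slope = max(slope_deg, min_slope_deg)
    return math.log(specific_area / math.tan(math.radians(slope)))


# Same contributing area, different slopes: gentler slopes give higher TWI.
for slope in (5, 15, 30, 45):
    print(slope, round(topographic_wetness_index(100.0, slope), 3))
```

For a fixed specific area the index falls as the slope steepens, which is why high TWI values mark gently sloping, water-accumulating parts of the terrain.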
2.2.11 Lithology
Kavzoglu et al.  stated that lithology is one of the main factors that have a direct impact on the formation of landslides, as lithological and structural variations lead to changes in the strength and permeability of rocks and soils. For this reason, lithology has been one of the most important conditioning factors used in all landslide susceptibility evaluation studies. In this study, a 1/100,000-scale digital geological map obtained from the General Directorate of Mineral Research and Exploration (GDMRE) was used to produce the lithological map of the study area. The geological map of the study area includes 16 lithological units (Figure 5).
3.1 Random forest
First proposed by Breiman , RF is an ensemble learning method that creates multiple decision trees from the training dataset and combines the results of the decision trees to improve the predictive ability of the model . According to Arabameri et al.  and Merghadi et al. , one of the most important advantages of RF is that it avoids the risk of overfitting, which is a common problem in other decision tree models. In the study conducted by Sahin , it is stated that requiring less hyperparameter tuning, compared to gradient boosting algorithms, was RF’s main advantage. To create a classification model in RF, two parameters must be defined:
3.2 Gradient boosting machine (GBM)
GBM  is a ML technique that combines multiple different models through boosting and regression trees to increase prediction precision . The main feature of GBM is that it combines multiple weak learners to improve their performances. GBM, an ensemble learning method, combines multiple decision trees to create a more powerful model that can be used for classification or regression. In GBM, unlike RF, each tree tries to correct the error of the previous tree . For this purpose, the residual errors calculated as a result of the prediction of the previous tree are minimized and the next tree is obtained, and these processes continue until the prediction results are stable or until the maximum number of trees is reached. In practice, the number of trees is chosen to be 100 or greater. There are four parameters that must be set by the user during the execution of the GBM, namely number of trees (
3.3 Extreme gradient boosting (XGBoost)
XGBoost, developed by Chen and Guestrin , is an efficient and effective implementation of the gradient boosting algorithm. For this purpose, it combines an approximate greedy algorithm with the Newton–Raphson method. XGBoost uses several classification and regression trees and integrates them using gradient boosting. It produces fast and accurate solutions with univocal regression trees, a weighted quantile sketch, and sparsity-aware split finding. It trains quickly because it is suited to parallel learning, and it increases the overall accuracy (performance) of the model by limiting overfitting during the training process. XGBoost uses two additional techniques, shrinkage and column (feature) subsampling, to avoid overfitting. Wang et al.  noted that the computational speed and accuracy of XGBoost are significantly improved compared to GBM. In this study, the XGBoost model is implemented in R 3.6.3 using the “
3.4 Preparation of training and validation dataset
“Landslide (or positive)” and “non-landslide (or negative)” samples are needed to train and validate the models used to create landslide susceptibility maps. A ratio of 70:30 has been commonly used in the literature to produce training and validation datasets [6, 8, 65, 66]; that is, 70% of the landslide inventory data is used for training the models and the remaining 30% is used for validation. In addition, Huang and Zhao  stressed that the numbers of positive and negative samples in the training and validation datasets should be equal, i.e., have a ratio of 1:1. For this reason, as many negative samples as positive samples were selected in the study area. In this study, the 85 landslide polygons on the inventory map were converted to raster format at 30 m × 30 m resolution, which yielded 32,777 landslide pixels. A value of “1” was assigned to these positive (landslide) pixels. Then, 32,777 non-landslide pixels were randomly selected in the study area in the R program and assigned a value of “0”. A randomly selected 70% of the landslide and non-landslide pixels (45,888 pixels in total) was used for training the models and the remaining 30% (19,666 pixels) was used for validation.
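The sampling scheme above can be sketched as follows. The study performed this step in R; this is a Python illustration that only reproduces the counts. The pixel identifiers, the size of the non-landslide candidate pool, the random seed, and the choice to split each class separately (a stratified split) are assumptions made here, not details given in the text.

```python
import random


def split_ids(ids, train_fraction, rng):
    """Shuffle a collection of pixel ids and split it into (train, validation)."""
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)            # fixed seed so the sketch is reproducible
N_LANDSLIDE = 32_777              # landslide pixels reported in the text
N_CANDIDATES = 1_000_000          # hypothetical pool of non-landslide pixels

landslide_ids = range(N_LANDSLIDE)
# 1:1 ratio: draw exactly as many negatives as there are positives.
non_landslide_ids = rng.sample(range(N_CANDIDATES), N_LANDSLIDE)

pos_train, pos_valid = split_ids(landslide_ids, 0.7, rng)
neg_train, neg_valid = split_ids(non_landslide_ids, 0.7, rng)

# Label 1 = landslide, 0 = non-landslide, as in the text.
train = [(i, 1) for i in pos_train] + [(i, 0) for i in neg_train]
valid = [(i, 1) for i in pos_valid] + [(i, 0) for i in neg_valid]

print(len(train), len(valid))
```

Splitting each class at 70% gives 22,944 training and 9,833 validation pixels per class, which matches the reported totals of 45,888 and 19,666.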
3.5 Multicollinearity analysis for landslide-conditioning factors
One of the important steps of LSM is to check for multicollinearity among the landslide-conditioning factors. Multicollinearity analysis is used to determine whether the conditioning factors selected for the susceptibility models are independent of one another, and thus to prevent the models from producing erroneous predictions [9, 68]. Commonly used indicators for multicollinearity analysis are tolerance (TOL) and variance inflation factor (VIF). A TOL value less than 0.1 or a VIF value greater than 10 indicates multicollinearity [8, 16, 44]. TOL and VIF values calculated using the training dataset for this study are shown in Table 2. The results show that there is no multicollinearity among the landslide-conditioning factors used in the study. Therefore, all selected factors were used to produce the landslide susceptibility map of the study area.
|Landslide conditioning factors||Tolerance (TOL)||Variance inflation factor (VIF)|
|Distance to drainage network||0.7916||1.2633|
|Distance to faults||0.7786||1.2844|
|Distance to roads||0.5552||1.8011|
|Topographic Wetness Index||0.4595||2.1761|
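The two indicators are linked: for factor j, TOL = 1 − R_j² (where R_j² comes from regressing that factor on all the others) and VIF = 1 / TOL. The sketch below, a Python illustration with hypothetical function names, applies this relation and the stated thresholds to the rows of Table 2 that are shown above, assuming the first value in each row is TOL and the second is VIF. Small differences in the last decimal are expected because the tabulated TOL values are rounded.

```python
# (TOL, VIF) pairs copied from the visible rows of Table 2.
TABLE_2 = {
    "Distance to drainage network": (0.7916, 1.2633),
    "Distance to faults": (0.7786, 1.2844),
    "Distance to roads": (0.5552, 1.8011),
    "Topographic Wetness Index": (0.4595, 2.1761),
}


def vif_from_tolerance(tol):
    """VIF is the reciprocal of tolerance."""
    if tol <= 0:
        raise ValueError("tolerance must be positive")
    return 1.0 / tol


def is_multicollinear(tol, vif, tol_limit=0.1, vif_limit=10.0):
    """Thresholds quoted in the text: TOL < 0.1 or VIF > 10."""
    return tol < tol_limit or vif > vif_limit


for factor, (tol, vif) in TABLE_2.items():
    recomputed = vif_from_tolerance(tol)
    print(f"{factor}: VIF reported {vif}, recomputed {recomputed:.4f}, "
          f"multicollinear: {is_multicollinear(tol, vif)}")
```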
4. Results and discussion
4.1 Landslide susceptibility mapping
In this study, RF, GBM, and XGBoost models were successfully applied and landslide susceptibility index (LSI) maps were produced via R 3.6.3 using the training data set for each model. Then, landslide susceptibility maps were obtained by reclassifying the LSI maps into five classes: very low, low, medium, high, and very high, using the natural breaks (Jenks) classification method in ArcGIS 10.5 software (Figure 6).
The spatial distributions (in percentages) of the susceptibility classes for each model are given in Figure 7. It has been determined that the study area is highly or very highly susceptible to landslides by 27.27%, 11.13%, and 16.89% according to the GBM, RF, and XGBoost models, respectively (Figure 7).
The significance degrees of the landslide-conditioning factors used in the study are presented in Figure 8. It has been observed in all models that the lithology is the most important parameter. After lithology, the most important or most effective parameters in the study area were determined to be altitude, distance to faults, slope, and land cover parameters. Slope length and curvature were the least significant parameters in all models (Figure 8). The findings related to the parameters found to be effective in terms of landslide are explained in the following sections.
When Table 3 is examined, ~76% of the landslides in the study area can be seen to have occurred at altitudes between 1070 and 2030 m. In respect of altitude, 1070–1310, 1310–1550, 1550–1790, and 1790–2030 m altitude classes were found to be susceptible to landslides (Table 3). The main reason why these altitude classes are susceptible to landslides is that more than 90% of the village settlements in the study area are located between these altitudes. Uncontrolled excavations and uncontrolled agricultural activities in villages are the most important factors that trigger landslides. In the study by Erener et al. , conducted in Şavşat district and covering a more limited (small) region compared to this study, the altitude class between 1500 and 2000 m was found to be susceptible to landslides.
|Factor||Subclasses||Pixels in domain||Pixels with landslide||Percentage of landslides (%)||Percentage of domain (%)||FR|
|Distance to faults (m)||0–1200||353573||10042||30.64||28.41||1.0786|
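The FR column can be checked against the other columns, assuming FR is the usual frequency ratio: the percentage of landslide pixels in a class divided by the percentage of the domain occupied by that class. The Python sketch below recomputes the values for the 0–1200 m distance-to-faults row shown above; the function name is hypothetical, and the total of 32,777 landslide pixels is taken from the data preparation section.

```python
def frequency_ratio(pct_landslides, pct_domain):
    """FR = share of landslides in a class / share of the domain in that class."""
    if pct_domain <= 0:
        raise ValueError("domain percentage must be positive")
    return pct_landslides / pct_domain


TOTAL_LANDSLIDE_PIXELS = 32_777

# Row "Distance to faults (m), 0-1200" from Table 3.
row = {
    "pixels_with_landslide": 10_042,
    "pct_landslides": 30.64,
    "pct_domain": 28.41,
    "fr_reported": 1.0786,
}

pct_check = 100 * row["pixels_with_landslide"] / TOTAL_LANDSLIDE_PIXELS
fr = frequency_ratio(row["pct_landslides"], row["pct_domain"])

print(round(pct_check, 2), round(fr, 4), row["fr_reported"])
```

An FR above 1 means the class holds more landslides than its share of the area would suggest. The recomputed ratio differs from the tabulated one only in the fourth decimal, which is consistent with the percentages in the table having been rounded.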
In terms of slope, the 0°–5°, 5°–10°, 10°–15°, and 15°–20° classes are more susceptible to landslides (Table 3); 82.31% of the landslides in the study area occurred in these slope classes. This result reflects the fact that complex mass movements (creeping and spreading) in the study area generally occur on gentle slopes (approximately 7°–12°).
When Table 3 is examined, it is seen that ~55% of the landslides in the study area occur on slopes with north, northeast, and northwest aspects. When the frequency ratios in Table 3 are examined, it is clearly seen that the slopes with these aspects have the highest frequency ratio value, and therefore, they are more susceptible to landslides. In the study conducted by Akıncı and Zeybek , in the Ardanuç district, which is adjacent to the Şavşat district and has similar topographical and geomorphological characteristics with the study area, the slopes with north, northwest, and northeast aspects were determined to be more susceptible to landslides.
Of the landslides in the study area, 74% occurred within 3600 m of the faults (Table 3). In general, landslide susceptibility tends to decrease with distance from the faults. Although the distance class most susceptible to landslides is 4800–6000 m, the 0–1200, 1200–2400, and 2400–3600 m classes are also susceptible (Table 3). Althuwaynee et al.  stated that the probability of landslide decreases as the distance to the faults increases. Similarly, in the LSM study conducted by Akinci et al.  in the area covering the Arhavi, Hopa, and Kemalpaşa districts of Artvin Province, the areas within 2000 m of the faults were determined to be more susceptible to landslides.
Considering the CORINE 2018 land cover data, it was determined that ~56% of the landslides in the study area occurred in agricultural areas (Table 3). Non-irrigated arable lands (CORINE land cover code 211), agricultural areas within natural vegetation (243), mixed agricultural areas (242), discontinuous urban structure (112), and bare rocks (332) were determined as landslide sensitive areas. The scattered settlements in the villages cause uncontrolled excavations, which in turn triggers landslides. In the landslide susceptibility study conducted by Erener et al.  in Şavşat district, it was reported that landslide activity increased in areas where the original vegetation was removed or changed. In the same study, it was determined that farming areas, irrigated or dry, were more susceptible to landslides. Researchers attributed this to the deforestation in agricultural areas.
4.2 Validation and comparison of landslide susceptibility models
Thi Ngo et al.  stated that, to produce a reliable landslide susceptibility map, it is important to identify landslide-prone areas with high accuracy and to use an appropriate metric for performance evaluation. The performance of the models used in the production of landslide susceptibility maps is mostly evaluated using the receiver-operating characteristic (ROC) curve [28, 38, 45, 60, 71, 72, 73]. Therefore, in this study, the area under the ROC curve (ROC-AUC) was used to evaluate and measure the performance of the ML models. The ROC curve is a graph showing the true positive rate (TPR or sensitivity) on the vertical axis and the false positive rate (FPR or 1-specificity) on the horizontal axis. The most important indicator derived from the ROC curve for evaluating the accuracy or performance of a susceptibility model is the AUC. AUC takes values between 0.5 and 1. An AUC value close to 1.0 indicates high model performance and a value close to 0.5 indicates low model performance. In addition, Chen et al.  and Wang et al.  stated that the AUC value can be grouped into five classes: poor (0.5–0.6), moderate (0.6–0.7), good (0.7–0.8), very good (0.8–0.9), and excellent (0.9–1.0).
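To make the AUC concrete, the sketch below computes it from labels and predicted susceptibility scores using the rank interpretation: the AUC equals the probability that a randomly chosen landslide pixel receives a higher score than a randomly chosen non-landslide pixel, with ties counted as one half. This is a Python illustration on made-up scores, not the procedure or data of the study. The function names are hypothetical, and treating each class's lower bound as inclusive in the five-class scale is an assumption, because the quoted ranges share their end points.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rate_auc(auc):
    """Five-class scale quoted in the text; lower bounds assumed inclusive."""
    if auc < 0.5:
        return "below scale"  # not covered by the quoted classification
    for upper, label in ((0.6, "poor"), (0.7, "moderate"),
                         (0.8, "good"), (0.9, "very good")):
        if auc < upper:
            return label
    return "excellent"


# Made-up example: 1 = landslide pixel, 0 = non-landslide pixel.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(round(auc, 3), rate_auc(auc))
```

The pairwise form is quadratic in the number of pixels, so for validation sets of the size used here a sort-based implementation from an established library would be the practical choice; the sketch is meant only to show what the number measures.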
In this study, success rate and prediction rate curves were created using the training and validation datasets, respectively. The success rate curve shows how well the models used to produce landslide susceptibility maps classify existing landslide areas. The AUC values of the success rate curves for the GBM, RF, and XGBoost models were calculated as 91.6%, 98.4%, and 98.6%, respectively (Figure 9a). Since the success rate curve is produced using the training dataset, it is not an appropriate indicator of the predictive capabilities of the models [21, 42]. The prediction rate curve should be used for that purpose, because it shows how well the models predict unknown or probable future landslides. The AUC values of the prediction rate curves for the GBM, RF, and XGBoost models were calculated as 91.4%, 97.9%, and 98.1%, respectively (Figure 9b). The AUC values of all three models are close to 1.0, which shows, according to the classification made by Chen et al.  and Wang et al. , that their prediction capacities are excellent.
In this study, RF, GBM, and XGBoost algorithms were used for landslide susceptibility mapping of Şavşat district of Artvin Province. The performances of these models were evaluated using success rate and prediction rate curves. According to the AUC values, the models used in the study showed excellent performance. However, the XGBoost model outperformed the other two models in landslide susceptibility mapping of the study area. Therefore, it was concluded that the susceptibility map produced by the XGBoost model can help decision makers and planners in reducing the risks caused by landslides in the region and in land use planning. In this study, 11 factors—altitude, aspect, curvature, distance to drainage network, distance to faults, distance to roads, land cover, lithology, slope, slope length, and TWI—were used based on the availability of the data, geo-environmental conditions of the study area, and literature survey. As a result of the study, it was concluded that the main factor governing the landslides in the study area in all three models is lithology. The artificial factors that trigger landslides across the province of Artvin, as in Şavşat district, are uncontrolled excavation works (usually road widening), uncontrolled explosive excavations, and uncontrolled agricultural land irrigation. In this respect, providing basic disaster awareness trainings to citizens residing in areas susceptible to landslides in the study area and trainings on the causes, effects, and consequences of landslides will be beneficial in terms of risk reduction. Similarly, taking into account landslide susceptibility maps in selecting dwelling zones in rural areas and in determining the routes through which infrastructure facilities such as drinking water, natural gas, electricity, and sewerage will pass, will be effective in reducing the risks associated with landslides in the study area.
Conflict of interest
The authors declare no conflict of interest.