Soil-Landscape Modelling – Reference Soil Group Probability Prediction in Southern Ecuador

Our dependence on soil, and our curiosity about it, is leading to the investigation of changes within soil processes. Furthermore, the diversity and dynamics of soil are enabling new discoveries and insights, which help us to understand the variations in soil processes. Consequently, this permits us to take the necessary measures for soil protection, thus promoting soil health. This book aims to provide an up-to-date account of the current state of knowledge in recent practices and assessments in soil science. Moreover, it presents a comprehensive evaluation of the effect of residue/waste application on soil properties and, further, on the mechanism of plant adaptation and plant growth. Interesting examples of simulation using various models dealing with carbon sequestration, ecosystem respiration, and soil landscape, etc. are demonstrated. The book also includes chapters on the analysis of areal data and geostatistics using different assessment methods. More recent developments in analytical techniques used to obtain answers to the various physical mechanisms, chemical, and biological processes in soil are also present.


Research area
The research area is situated between the provincial capitals Loja and Zamora (Figure 1) in the southern Ecuadorian Andes from 1670 to 3160 m a.s.l. It extends in UTM-Zone 17M from west to east between 710500 and 716000, and from north to south between 9561500 and 9557000 ( Figure 1). The San Francisco River divides the area into two parts: The north-west facing slopes south of the river are covered by montane rain forest and subpáramo vegetation above the tree line. Within this area, Homeier et al. (2002)  forest types according to their altitude and position on the ridge or in the valley. The southeastern facing slopes north of the river are mainly covered by pastures and succession vegetation after fire clearance when sites were left unused. For soil model development, only sites under natural vegetation were considered. As part of the Chiguinda unit, the research area is lithologically covered by metasiltstones, siltstones and quartzites which are intermixed with layers of phyllite and clay schists (Litherland et al., 1994). Furthermore, it is influenced by the regular occurrence of landslides. Average total annual rainfall increases from 2050 mm at an altitude of 1960 m a.s.l. to approximately 4400 mm at 3100 m a.s.l. (Rollenbeck, 2006). Average air temperature decreases with increasing altitude from 19.4 to 9.4 °C (Fries et al., 2009).

Classification trees
Classification trees (CTs), a method first described by Breimann et al. (1984), were used to relate the RSGs to terrain parameters. It was conducted with the rpart library of the R-Project for Statistical Computing (Therneau & Atkinson, 2003). In CTs subdivision is based on a categorical response variable, i. e. RSG. The final subsets, also called end nodes, should be as pure as possible. This is done by trying to assign them to only one category in the response variable, e.g. to Histosol. The Gini criterion (Equation 1) is applied as a measure of purity (Breiman et al., 1984). It serves as a decision criterion, to determine which terrain parameter best separates the dataset continuously into always two subsets to create the purest end nodes.
The Gini-Index (Equation 1) reaches its maximum in a particular node t if all categories k within this node are equally represented. On the other hand, when the probability P i is equal to zero for all but one category within any node, the Gini-Index reaches its minimal value. The categorical value accounting for the majority within each end node is then assigned to the corresponding parameter values, indicating the typical position within the landscape (e.g. Liess et al., 2009). However, another option is to assign the percentage of each categorical value within an end node as occurrence probability to the corresponding landscape position. This is of course only justified if the applied sampling scheme guarantees sampling of all landscape positions to an equal extent. The CT is pruned to avoid overfitting and obviate random variation. To assess model performance, the cross validation error (CV) is calculated. The dataset is subdivided into 10 subsets, and the process is repeated 10 times with 9 parts for model training and the 10 th part as the evaluation dataset. Eventually, among all trees considered for the final model, the tree with the lowest cross validated error rate is chosen. CV and model pseudo R² are calculated. Pseudo stability indices are constructed to satisfy the different interpretations, e.g. explained variance or square of correlation. They are similar to R² in that they also range between 0 and 1 and a higher value represents a better adaptation to the data.

Dataset and GIS methodology to gain terrain data
Topographic data for the research area is available on a continuous landscape level. The DEM used to obtain terrain parameters for the establishment of a prediction model of RSG occurrence has 2 m cell size (Liess et al., 2009). For model application, this accuracy was reduced to 10 m to decrease calculation time. The used terrain parameters include altitude a.s.l., aspect, slope angle, terrain curvature, upslope contributing catchment area and overland flow distance to the channel network (OFD). Slope angle, aspect and curvature were computed with a 2 nd degree polynomial fit from Zevenbergen and Thorne (Zevenbergen & Thorne, 1987;Cimmery, 2007). The contributing area was calculated with two methods; (1) based on the Kinematic Routing Algorithm (KRA CA) (Lea, 1992) and (2) based on the Braunschweiger Digital Relief Model (BS CA) (Bauer et al., 1985). In addition to the OFD, the horizontal (HOFD) and vertical (VOFD) overland flow distances were also calculated. The channel network itself was assessed applying the Strahler stream order ≥ 5 as initiation threshold (Strahler, 1957). Terrain curvature was computed using directly adjacent cells. Finally, the terrain parameters were calculated and the RSGs were predicted for each individual raster grid cell. The free and open source GIS software, SAGA, was used (Böhner et al., 2006). The research area was sampled at 367 sites, including 311 auger points and 56 soil profiles. Soil sampling covered 24 sampling classes produced by an overlay of four altitudinal, three slope angle and two aspect classes to guarantee representative area coverage. Transects for auger sampling ( Figure 1) were laid according to the catena concept (Milne, 1935) from hilltop to valley bottom. For more detailed information on the applied sampling design, see Liess et al. (2009). Two methods were used to assign terrain parameters to the soil dataset. On the one hand, the nearest neighbour (n. n.) value was allocated to each soil profile or auger point. On the other hand, a buffer representing the radius of GPS accuracy was placed around the sampled location, and the calculated mean value of the corresponding area was assigned. This assignment was completed for each of the described parameters apart from the slope angle and aspect. These were directly measured in the field. The slope angle and aspect which were computed from the DEM were solely used for model application.

Probability calculation
The probability of each RSG was predicted via a CT which grouped the soil sampling points regarding the existence or absence of that RSG. Thus, the percentage of sampling points assigned to the corresponding RSG in each end node of the tree was used to predict the probability of that RSG. Thereby, the diagnostic properties necessary for assigning the particular RSG were used, whereas the necessary absence of other properties was neglected. This was done in particular to establish a good prediction scheme for Stagnosols. It was decided that the occurrence and thickness of a sufficient stagnic colour pattern and/ or albic horizon is more important than the limitation in organic layer thickness. As a consequence, soils with a 40 cm organic layer displaying also a thick stagnic horizon were classified as Histosols and Stagnosols. Any other proceeding would have made the development of a Stagnosol prediction scheme incomplete and complex.
To sum the individual probabilities and standardize them by relating each RSG to the total probability sum, is one option. This option neglects WRB (FAO, IUSS Working Group WRB, 2007) hierarchy, because all RSGs are competing on an equal level and no soil process is given dominance over another. As a consequence, the probabilities refer to the probability of the diagnostic property necessary for RSG assignation. Later we will refer to these as WRB independent probabilities. Figure 2 shows the probability calculation scheme based on WRB (FAO, IUSS Working Group WRB, 2007) hierarchy.

www.intechopen.com
Soil-Landscape Modelling -Reference Soil Group Probability Prediction in Southern Ecuador 245 Fig. 2. Hierarchical calculation scheme for the maximum possible probability of each RSG according to WRB hierarchy. P X is the actual probability of the respective RSG: H Histosol, L Leptosol, S Stagnosol, U Umbrisol, C Cambisol, R Regosol. P X(max) is the maximum possible probability of the RSG It is used to calculate the maximal possible probability for each RSG from the probability predicted by the CTs. Maximal Leptosol probability is left after subtracting Histosol probability from 1. Maximal Stagnosol probability is left after also subtracting the actual Leptosol probability and so on. Equation 2 shows the calculation of the actual probability, P X , according to the CT probability, P X(tree) , and the maximal possible probability, P X(max) . Figure 3 presents the CT models to predict Histosol, Leptosol and Stagnosol occurrence probability from nearest neighbour (n. n.) and mean terrain values. The RSG Histosol is assigned to soils with an organic layer ≥ 40 cm (FAO, IUSS Working Group WRB, 2007). Its probability within the research area was found to depend on two hydrological parameters ( (Figure 4b). This also accounts for the major difference between the two models ( Figure 4c). But since the corresponding end node in the tree model ( Figure 3d) is only supported by 13 sampled sites, this finding is not representative for the research area. Leptosols refer to soils limited to 25 cm depth by continuous rock (FAO, IUSS Working Group WRB, 2007). During soil sampling continuous rock was rarely attained, and refusal typically occured at the C horizon. This made the establishment of a model predicting soil depth to continuous rock impossible. Therefore, to calculate Leptosol occurrence probability expert knowledge was applied in addition to the CT methodology. From field work and data review it was known that Leptosols are found on steep slopes ≥ 50° and close to the creeks at approximately < 20 m HOFD. Other soils, which occurred at the same landscape positons with even higher probability, were excluded for model development. Afterwards, they were included again to calculate the probabilities of the tree end nodes. This explains the rather untypical appearance of the Leptosol CTs (Figures 3b and 3e). Usually, for any final subdivision into two end nodes, one of them would always display a probability > 0.5 and the other < 0.5. However, for the reason of adding more datasets after tree development this is not the case. This procedure was necessary in order to develop a reasonable model and account for true probabilities. Leptosol CTs established with n. n. and mean terrain values are very similar. In the already mentioned positions, Leptosol probability was assumed 0.30 -0.36 (Figure 3b and 3e). the map layout. The similarity between the two models for the mentioned sites is indicated by yellow colours in Figure 4c. The sites mapped in red colours refer to a 0.1 -0.3 higher probability as predicted by mean relief values. Comparison of the two tree models (Figures  3a and 3d) shows that differences are not higher than 0.13. The models differ only by a probability of 0.03 -0.13, neglecting the mentioned 13 sites. ) is increasing (hardly recognisable in the map). Model difference regarding probability predicted directly by the CTs and probability being calculated based on WRB hierarchy (Figures 5g and 5h) shows a similar picture. The difference between the WRB independent and dependent prediction by n. n. values (Figure 5g) is ≤ ± 0.1, but higher regarding the prediction difference by mean terrain values (Figure 5h). Stagnosols are "soils exhibiting hydromorphic features for some time during the year in some part within 50 cm of the mineral soil surface and show a stagnic colour pattern and/ or an albic horizon in half or more of the soil volume" (FAO, IUSS Working Group WRB, 2007). Planosols are classified by similar diagnostic properties, but in addition display an abrupt textural change, which could not be confirmed for the investigated soils. Stagnosol probability is predicted throughout the research area with at least 0.25 (Figures 3c and 3f). The probability in both models depends on slope angle and altitude. It is higher on slopes < 40°. Above 2146 m a.s.l. for the prediction by n. n. and above 2135 m a.s.l. by mean terrain values, the probability increases even further. While curvature is of no importance for Stagnosol probability prediction by n. n. relief values, mean terrain values assign an even higher probability for concave plan curvature with 0.64. Landscape positions < 2146 m a.s.l. for prediction by n. n. and < 2135 m a.s.l. by mean terrain values, and high slope angles account for the lowest probability of Stagnosols.

terrain values and h) mean terrain values
Model application to the research area is shown in Figure 6. Stagnosols reach higher probabilities by the mean terrain values model (Figure 6b) compared to the prediction from n. n. terrain values (Figure 6a). Figures 3c and 3f show that the difference between the probability prediction by n. n. and mean relief values (Figure 6c), + 0.1 -0.3, is not due to this higher Stagnosol probability on high altitudes as predicted by mean terrain values on concave sites. This difference accounts for only 0.05. However, it is due to the reduced probability assigned to convex sites ≥ 2135 m a.s.l. (0.24 difference). As a conclusion to this, the two models are quite similar, mainly differing by the dependence on curvature, which is not included in the model from n. n. relief values. Including WRB (FAO, IUSS Working Group WRB, 2007) hierarchy in the probability prediction, a site classified as Histosol or www.intechopen.com

Soil-Landscape
Leptosol cannot be classified as Stagnosol. Accordingly, Histosol probability reduces Stagnosol probability to a perceptible extent (Figures 6d and 6e). Figures 6g and 6h show that these differences account for 0.1 to 0.3 for most of the research area with the prediction by n. n. terrain values (Figure 6g) still yielding less differences in the lower altitudes compared to the prediction by mean terrain values (Figure 6h). Differences between the two models are extended while including WRB hierarchy (Figure 6f), compared to that being independent of WRB hierarchy (Figure 6c).

terrain values and h) mean terrain values
CTs for Umbrisols, Cambisols and Regosols cannot be provided. Umbrisol prediction was impossible, since the used dataset contains only 7 Umbrisols among 367 sampled sites and is not enough to gain a clear prediction scheme. Furthermore, not all but some of the determined Umbrisols are situated within the accumulation zone of former landslides so that an additional variable to predict their occurrence would be necessary. Cambisols and Regosols, on the other hand, are rather unspecific RSGs which makes their prediction difficult. Cambisols need a cambic horizon, but apart from that they are rather determined by the absence of diagnostic criteria that would classify the soil for another RSG. Regosols are even worse, since they do not have any characteristic on their own, but refer to all soils that do not classify as another RSG.

Model performance and uncertainty
Overall CT model performance is limited (Table 1). Terrain attributes can likely only explain RSG distribution to a limited extent within this mountainous tropical landscape. Unfortunately, no information is available about parent material distribution, but rapid bedrock changes were discovered during field work. The profound influence of landslides causes shifts in soil material and mixes it with rock material, leading to quite different soil properties. Although there has been a landslide inventory based on visible landslide scars on a time series of aerial photographs from 1962 to 1998 (Stoyan, 2000), most former landslides remain hidden under the regrown dense forest cover as was experienced during field work. (1) They are very dependent on the dataset used, i.e. some sample points more or less may lead to rather different models and (2) they predict abrupt values due to the grouping into end nodes. A continuous probability distribution of the RSGs in reality therefore is replaced by some probability classes according to Figure 3. What makes WRB RSG prediction in general problematic is the character of the WRB itself. Assignment of some RSGs requires exceeding an absolute (Histosols) and for others a relative (Stagnosols) thickness value of a diagnostic horizon. If a soil has an organic layer ≥ 40 cm, it is classified as Histosol independent of its mineral properties. If the organic layer is 1 cm less, these mineral properties abruptly become important. Relating the extent of the stagnic horizon to soil depth obviously is not characteristic enough to allow for a good model relating the Stagnosol occurrence pattern to terrain parameters. This is probably the reason why model accuracy is limited. As a consequence the low R² are not considered as a problem, but as a natural phenomenon in predicting complex entities such as RSGs. Furthermore, the calculated CT R² refers to a one value prediction. As was described earlier, a CT model usually assigns the category which forms the majority within each end node to the respective landscape position. It does not consider other categories assigned to that end node as classification possibility, but neglects them. Any soil map has a certain degree of uncertainty. Usually boundaries between soil units are drawn according to expert knowledge or GIS interpolations. However, the degree of uncertainty which is a logical phenomenon in any below ground investigation usually is not included within the soil map.

RSG
The new generation of digital soil maps provides a new development in this area. Accordingly, our digital soil maps include this model uncertainty through assigning RSG occurrence probabilities instead of unique values. Other authors mainly used fuzzy-logic to include this uncertainty, e.g. McBratney & De Gruiter (1992), Hannemann (2010).
Another aspect to be considered, is that generally soil maps are gained on a much larger scale. Lagacherie & Holmes (1997) use a spatial resolution of 50 m, Moran & Bui (2002) use 250 m. Therefore, the small scale, 10 m resolution, in our soil maps might be another reason for the low R². The soils within the research area change within a few meters radius as typical for tropical soils. Accordingly, the highest possible resolution was used. This way low scale soil variability is included within the models, which would be neglected while working on a larger scale. To conclude, the size of the applied dataset is not enough to represent the investigated soil-landscape at this high precision.

Comparison with earlier soil map
A RSG probability prediction is also possible from a single CT which predicts all RSGs at once. Liess et al. (2009) established such a CT for the research area (Figure 7), but did not predict probabilities from it. The percentage of the RSGs within each end node of this tree was interpreted as occurrence probability for the RSGs according to the related landscape position and compared it to the findings from the various CTs of this study. The difference between RSG probability by the tree model from Liess et al. (2009) and our predictions is displayed in Figure 8. The first column maps the RSG probabilities according to Liess et al. (2009), the second column presents the differences between the latter and our prediction from n. n. relief values (WRB dependent), and the third column shows the differences regarding the prediction from mean relief values. The model from Liess et al. (2009) (Figure 7) assigned a very high Histosol probability with 0.6 -0.8 to about half of the research area. For some sites the predicted probability was even higher. In our new model, Histosol probability was less, 0.2 -0.4 for most of the area ( Figure  4b), but continuous on all sites with at least 0.2 (Figure 3a and 3d). It was shown that Histosol probability is high within some landscape positions and for a VOFD from 54 -175 m this is supported by a high number of sampled sites. In contrast to this, the end nodes in the tree model from Liess et al. (2009) mostly contain only a very limited number of sampling sites, e.g. the end nodes that predict particularly high Histosol probabilities (≥ 0.8) only contain 12 -15 sampled sites. The end node with the most sites predicting Histosol probability with 0.78, refers to landscape positions in small catchments < 214 m HOFD, similar to our findings. The importance of the catchment size as first subdividing variable for model development was confirmed. For smaller catchment sizes, i.e. sites through which a smaller area discharges, Histosol occurrence is more likely. Leptosols were predicted with low probability on steep slopes and close to the creeks (< 20, < 19 m HOFD). The latter is confirmed by Liess et al. (2009) who predicted Leptosols < 21 m HOFD, but with a high probability of 0.71 (Figure 8d). 0.71 of 7 sampled sites that are contained within the respective end node no. 7 (Figure 7) are 5 sampled sites. To use only five Leptosol sites to predict such a high probability seems unreasonable. On steep slopes, especially in an area influenced by landslides, soils have less chance to develop. Hence, it is no surprise to find Leptosols in these landscape positions. Close to the creeks soil material is probably removed downslope within the channel system during times of high rainfall; through these sites a high amount of water discharges due to a high contributing catchment area. On many sites the organic layer directly overlies continuous rock. Stagnosols were predicted with a higher probability by n. n. relief values compared to the model from Liess et al. (2009) (Figure 8h). This is due to the fact that Stagnosols were predicted as all soils that display sufficient stagnic properties, but it was neglected that some of them carry a sufficiently thick organic layer to qualify as Histosols. Stagnic properties and  (Figure 8i) than prediction by n. n. terrain values. This is because Liess et al. (2009), who used a subset of our dataset, predicted the RSGs by mean relief values, too. Stagnosol probability increases above an altitude of 2146 m a.s.l. on slope angles < 40°. An increase in Stagnosol abundance with increasing altitude and decreasing slope angle was also described by Liess et al. (2009). Schrumpf et al. (2001) stated an increase in hydromorphic properties with increasing altitude and designated soils as Humaquepts (Soil Survey Staff, 2006). The increase with altitude can be attributed to the increasing rainfall (Rollenbeck, 2006). Lesser steep slope angles account for a slower discharge. We assume the RSG probability predicted by various CTs, to better represent soil reality within the research area, since the dataset does not consist of all RSGs to an equal extent so that some are preferred over others during the tree subdivision process. Furthermore, the multiple CTs rather predict probabilities of soil diagnostic properties, which can occur simultaneously at one site within the soil profile. Accordingly, the model from Liess et al. (2009) overestimated Histosol probability for most sites as can be seen by the mainly green colours in Figure 8b and c. However, at the same time it underestimated Stagnosols in most of the area as can be deduced from the prevailing red colours in Figures 8h and i. In a similar way, Leptosols are overestimated by the model from Liess et al. (2009).

Conclusions
Models adapted for n. n. compared to those adapted for mean terrain values showed only minor differences. We conclude that predicting all RSGs at once is not as good as predicting each RSG on its own by a CT. The dataset does not consist of all RSGs to an equal extent, so some RSGs are preferred over others during the tree subdivision process. Model performance might be improved by choosing a lower resolution to exclude small scale diversity, reducing model dependence on the dataset, applying a different statistical model or predicting soil properties instead of the complex RSG entities. However, further research is needed to prove these assumptions. Model uncertainty in the digital soil maps is represented by the occurrence probabilities of the RSGs. Probabilities of various RSGs at the same landscape position can be understood as competing RSGs. But the probabilities of the various RSGs can also be interpreted as a soil composed of the various RSGs, i.e. various diagnostic horizons or various soil processes running simultaneously or successively as has been part of soil genesis theory for a long time (Simonson, 1959;Schelling, 1970). Thereby, this provides a good means to acknowledge inter-relations between the RSGs. An even better chance to acknowledge this would be the prediction of the diagnostic properties necessary for WRB classification by themselves. In accordance with McBratney & De Gruiter (1992), who thought to improve the existing soil classification systems via fuzzy sets, we would like to contribute the above-mentioned ideas to the development of a continuous soil systematisation system.