Terrain Analysis for Locating Erosion Channels: Assessing LiDAR Data and Flow Direction Algorithm

Terrain analysis can be used to locate concentrated flow erosion (e.g., ephemeral gully ero‐ sion) across landscapes. For example, studies have found that ephemeral gullies were likely to occur when field specific thresholds were exceeded for the following terrain attributes: the product of upslope area, slope, and plan curvatures [1]; topographic wetness index, up‐ slope area, and slope [2]; and the topographic wetness index and the product of the upslope area and slope [3]. Another study used a cartographic classification and threshold procedure for erosion channel identification [4].


Introduction
Terrain analysis can be used to locate concentrated flow erosion (e.g., ephemeral gully erosion) across landscapes. For example, studies have found that ephemeral gullies were likely to occur when field specific thresholds were exceeded for the following terrain attributes: the product of upslope area, slope, and plan curvatures [1]; topographic wetness index, upslope area, and slope [2]; and the topographic wetness index and the product of the upslope area and slope [3]. Another study used a cartographic classification and threshold procedure for erosion channel identification [4].
An alternative approach utilized logistic regression and artificial neural network procedures to predict where erosion channels would appear in agricultural fields based on digital terrain attributes [5]. With leave-one-field-out validation, it was determined that the more simple logistic regression was more appropriate because it performed as well as the non-linear neural network procedure. In a follow up study, erosion channels predicted from terrain attributes derived from 10-m US Geological Survey (USGS) digital elevation models (DEMs) were compared to those derived from DEMs created with survey-grade real-time kinematic (RTK) Global Positioning System (GPS) data [6]. The USGS models identified most eroded features but the RTK analyses delineated them more clearly. The authors concluded that the USGS predictions were adequate for many agricultural applications because creating DEMs with RTK was relatively costly while USGS data was freely available on the Internet for most of the United States. A graphical representation illustrates how the logistic regression analysis [5,6] can be fit (step 1) and then applied (step 2) in Figure 1.
In another study, the model from reference [5] predicted where eroded waterways would occur 223 km away in a different physiograpic region of Kentucky where there were marked textural and parent material differences between the soils [7]. The locations of the Outer Bluegrass [6,7] and Western Coal Fields studies are shown in Figure 2.

RTK GPS, LiDAR, and USGS DEM Accuracy
The performance of terrain analysis applications depends on the source and quality of the elevation information. Very accurate elevation measurements can be obtained with surveygrade RTK GPS (e.g., horizontal rmse < 2.2 cm). Figure 1. Description of the basic two-step logistic regression analysis procedure used in references [5,6]. The first step involves fitting a model with field observations of soil erosion as a function of terrain attributes. The second step involves applying the model. In this example, the model was applied to the same field as it was fit. However, references [5,6] used a leave one-field out validation and fit the model to four fields and applied it in a fifth. See equation 3 in methods section for a mathematical definition of the Length-Slope terrain attribute. This usually involves setting up one RTK receiver as a base station, to broadcast via radio corrections to a mobile receiver, which calculates elevation on-the-fly. Elevation data can also be obtained with light detecting and ranging (LiDAR) systems mounted at the bottom of airplanes with a pulsing laser scanning rapidly from side to side. Detectors determine the time required for laser pulses to bounce back in order to calculate the distance between the aircraft and the ground surface. With simultaneous in-flight RTK GPS measurements, estimates of ground elevation are obtained at a high spatial intensity (e.g., several points per square meter). In one study [8], RMSEs for LIDAR were determined to be 33.3 cm in short grass. The USGS National Elevation Dataset (NED) includes Level-2 DEMs for most of the United States, which in many cases were reinterpolated from USGS topographic contours. These contour maps were originally created by the USGS using stereo orthophotogrammetry. Level-2 DEMs have an accuracy of one-half the contour width (e.g., the contour width was for the studies in [5,6] was 304 cm).
Of the three elevation datasources discussed, RTK GPS is one of the most accurate methods for creating ground surveys. Unfortunately, these systems are relatively expensive and obtaining ground measurements and data processing may be labor intensive. USGS DEMs are an attractive option because they are free; however, they do not identify erosion pathways as clearly as the RTK data [6]. Elevation data obtained with LiDAR is relatively inexpensive on a per area basis and are currently being obtained by various government agencies not necessarily associated with agriculture, but urban planning, transportation, and geophysics applications. LiDAR has a great potential to aid in the identification of concentrated flow pathways [9,10].

Flow Direction Algorithm
Terrain attributes (e.g., length-slope, topographic wetness index) require estimates of the upslope contributing area for each cell in the DEM. This necessitates the calculation of single or multiple flow direction for each cell in the DEM. The most common single direction flow model is the deterministic eight-neighbor (D8) procedure. Multiple direction flow models include fractional deterministic eight-neighbor (FD8), digital elevation model networks (DE-MON), and deterministic infinity (D∞). There are a number of software programs that can calculate terrain attributes, including ArcGIS, Grid-Based Terrain Analysis Programs for the Environmental Sciences (TAPES-G), and Terrain Analysis Using Digital Elevation Models (TauDEM). ArcGIS, TauDEM, and TAPES-G predict D8 single direction flow. The TAPES-G program can also estimate multidirectional flow with FD8 and DEMON while TauDEM program can estimate flow with D∞.
O'Callaghan and Mark [11] developed the D8 model that routs flow from each cell to one of its single eight neighbors in the cardinal or diagonal direction with the steepest grade. However, the D8 method does not accurately model divergent flow in ridge areas and it produces unrealistic parallel flow lines [12]. The FD8 multidirection method [13] allows flow to be routed to more than one of its eight neighbors in an amount proportional to the slope gradient between the center cell and the adjacent eight neighbors. Once the upslope contributing area for each cell exceeds a threshold (i.e., maximum cross grading area), the FD8 procedure switches to D8 flow, allowing the modeling of both divergent and convergent flow [12]. Lea [14] developed an aspect driven kinematic routing algorithm that was no longer restricted to the eight nearest neighbors (i.e., cardinal and diagonal directions). This algorithm inspired both the D∞ and DEMON methods. The DEMON stream tube method developed by Costa-Cabral and Burges [15] is computationally intensive and involves routing flow downstream along tubes that expand and contract in a way where the tubes do not necessarily coincide with the cell boundaries [12]. The D∞ algorithm first calculates flow direction from the infinite set of possible flow directions around each cell, not just in the 8 cardinal directions.
The terrain attributes in references [5][6][7] were calculated with TAPES-G utilizing the FD8 method. TAPES-G still exists but it is no longer being supported and updated by the developers. The Windows installation only works on legacy versions of ArcGIS. The preferred approach for terrain analysis today has become TauDEM with the D∞ flow direction algorithm.

Concentrated Flow Erosion, Grassed Waterways and The Environment
Agriculture contributes 80% of the excessive phosphorus (P) flowing to the Gulf of Mexico from the Mississippi River Basin. Among the states contributing to this environmental impact, Kentucky is the sixth largest contributor of nitrogen and phosphorus to the Mississippi River Basin [16]. Sediment is the leading cause of impairment of Kentucky's rivers and streams, impacting over 4,329 linear km with agriculture being the primary contributing source [17]. The problem of sedimentation, often underestimated, is enormous. It not only adversely impacts aquatic life [18] and impaires ecological and economic functioning of drainage systems, wetlands, streams, and rivers but also increases the risk of flooding and the need for water treatment. Properly constructed grassed waterways (NRCS Code 412), with appropriately sized vegetative filters on either side reduce considerable runoff, nutrient, and sediment delivery to surface waters. Specifically, they reduce ephemeral gully erosion, which is a substantial but often overlooked problem that accounts for about 40% of all erosion in agricultural fields [19]. In one study, the installation of grassed waterways led to reductions of 39 and 82% in runoff and sediment, respectively [20]. In another study, grassed waterways reduced dissolved reactive phosphorus losses by factors ranging between 4 to 7 and particulate phosphorus by factors ranging between 4 and 10 [21]. Because grassed waterways are so effective, the USDA Conservation Reserve Program (CRP) and Environmental Quality Incentives Program (EQIP) provide funding for this and other conservation practices.

Research Objective
The objective of this study was to compare how predictions of concentrated flow erosion performed with LiDAR, survey grade RTK GPS, and 10-m USGS DEMs, and using the D8, FD8, DEMON, and D∞ flow direction models.

Field observations
One of the co-owners of the Worth and Dee Ellis Farms was trained by the NRCS to identify eroded zones from concentrated water flow. The farmer delineated and mapped the eroded ephemeral gullies in the five study fields using a DGPS system. Subsequently, the farmer installed grassed waterways in these areas but did not reshape them as is typically the case when installing these conservation structures. This is an important distinction because reshaping these locations would have changed the terrain attributes, making the analyses presented in this chapter difficult or impossible to interpret.
Because these field observations were conducted by the farmer, we asked the NRCS district conservationist to validate the erosion delineations in the field. The conservationist visited the fields in 2008 and examined 42% of the eroded channels delineated by the farmer in the five study fields. He determined that all the channels would have been eligible for cost support for grassed waterways through the continuous USDA Conservation Reserve Program CRP program if they did not already have waterways installed.

Acquisition of Elevation Data, DEM Creation, and Terrain Analysis
Survey grade RTK GPS equipment was used to collect elevation data from Field D in 2000 [23], Fields E and B in 2004 [24], and in Fields A and C in 2007 [5]. Dual frequency Trimble AgGPS 214 (base station) and Trimble 5800 (rover) RTK receivers were used to create surveys in Fields A, B, C, and E. The survey for Field D was created with two single-frequency Trimble 4600 receivers and elevation was determined during post processing. The surveys were created at varying intensities with elevation measurements spaced approximately every 3 (Field D) and 4 m (Fields A, B, C, and E) along the direction of travel during the survey. There were 7.5 (Field D) and 12 m (Fields A, B, C, and E) between survey passes. According to the Trimble data specifications, vertical errors for all receivers were expected to be <2.2 cm because the baselines were <1200m [25,26]. The LiDAR survey was created (courtesy of Photo Science) in November of 2009 after harvest with soybean residue and stubble on the ground. The airplane height above ground was 1.219 km and speed was 185 km hr -1 . The estimated horizontal and vertical accuracies were 16 and 11 cm, respectively. The average elevation point density was 2.82 points m -2 . PhotoScience converted the raw Li-DAR data into a triangulated irregular network (TIN). Then they converted the TIN into a 1m raster file using natural neighbor interpolation. After the 1-m LiDAR DEM was delivered to the University of Kentucky from PhotoScience, the grid was sampled to a 4 by 4 meters. Elevation data on 9.1-m grids for these fields were obtained from Kentucky Division of Geographic Information (KDGI) (http://technology.ky.gov/gis/). The USGS DEMs had been previously re-interpolated by KDGI from 10-m to 9.1 meters so that they could distribute in the Kentucky State Plane coordinate system. The 9.1-m USGS raster was converted to a point file format. The raw RTK GPS and USGS points data were converted to 4 by 4-m grids with the ArcGIS (ESRI, Redlands, CA) TOPOTORASTER command (drainage enforcement option was not used for the reasons described in reference [5]). To smooth the data, 1-m contour maps were created from the 4 by 4-m RTK, LiDAR, and USGS DEMs with the ArcGIS spatial analyst extension. Then the TOPOTORASTER command was used to convert the contours back into new 4-m DEMs.
Next, TAPESG was used to remove sinks (depressions) and calculate terrain attributes for the RTK and USGS datasets using the FD8 and DEMON flow direction algorithms. TauDEM was used to remove pits and calculate terrain attributes for the RTK, LiDAR, and USGS datasets using the D8 and D∞ flow direction algorithms. Primary terrain attributes included slope (β), plan curvature, upslope contributing area, and specific catchment area. Slope was calculated using the finite distance algorithm in TAPES and the D∞ algorithm (i.e., slope could be in any direction, not just in the cardinal and diagonal directions) with TauDEM. For the FD8 method, the maximum cross-grading area [12], was hardcoded to a value of 50,000 m 2 by the authors of the TAPESG for Windows software. Each cell in the DEM had a flow width dependent on the direction of flow entering that cell from the eight neighbors. For the TauDEM D8 and D∞ procedures, flow width was assumed to be 1 grid increment in all directions. The flow width for FD8 and DEMON algorithms is described in detail in reference [12]. The specific catchment area adjusts the upslope contributing area to account for flow width and was calculated as follows: The length-slope terrain attribute (Eq. [3]) was developed to be an estimate of the lengthslope factor from the Revised Universal Soil Loss Equation (RUSLE), and index of potential erosion [27]. Unfortunately, this index does not estimate the true length-slope well on longer and steeper slopes [22] but still has adequate utility.

Statistical analyses
A new variable, "ErWater," was created that was assigned a value of 0 if the grid point observations were not from areas with evidence of concentrated flow and a value of 1 if there was evidence. The terrain attributes for each 4 x 4 grid from all five fields (n= 99,505) were exported from ArcGIS in a database format so they could be read by SAS. The sample module in SAS Enterprise Miner 4.3 were used to random subsample the dataset with an equal number of observations from each field and areas within the fields with and without concentrated flow erosion. So after subsampling, the final dataset included 1450 observations (145 from each of the eroded areas and non eroded areas in each of the five fields studied). SAS PROC LOGISTIC was used for logistic regression with the 1450 point subset used as the input dataset. The SCORE option was used to obtain full dataset predictions with all 99,505 records. The topographic wetness index, the estimated length-slope for the Universal Soil Loss Equation, and plan curvature were included as predictor variables. The dependent variable was the newly created ErWater variable. Proc FREQUENCY was used to create confusion tables for the score datasets and the results were reported in percentages. The confusion tables presented in this chapter can be interpreted according to the guide presented in Table 1. The scored logit data (i.e., 0's and 1's) were then exported from SAS into ArcGIS where the point data were converted to raster format for display in GIS.

Performance of LiDAR data with D∞ and D8 Flow Direction Models
Logistic regression models and statistical tests are given in Table 2. The topographic wetness index, length-slope, and plan curvature values were all highly significant in the models. The discretized output of the LiDAR D8 ( Figure 3) and D∞ (Figure 4) models indicated a high correspondence between the eroded waterway boundaries and the black shaded areas (i.e., areas with probability of concentrated flow erosion > 0.5 and ≤ 1.0). The average of the combined type 1 and type 2 error rates across the five fields was 8% for D8 and 9% for D∞ (Table  2). This data can be interpreted using Table 1 provided in the methods section.
The LiDAR predictions with D8 and D∞ were excellent (Figure 3 and 4). Interestingly, the D8 approach differentiated between the eroded and non eroded areas in the lower central portion of Field A (Figure 3). In this area, an eroded feature that appears to be an island can be observed by examining the polygon boundaries. The water flowing across this island area disappeared underground and reemerged in the waterway below. The terrain modeling predicted high upslope contributing area and therefore high topographic wetness and length-slope index indices below this island area; however, plan curvature values did not have large negative values (indicating concavity) because there was no erosion area as determined during field observations with the NRCS conservationist.

Impact of data cleaning
We considered what would have been the result if we had not contoured and rasterized the data. The average combined type-1 and type-2 error rate for D8 and D∞ was 8% for the smoothed data (Table 2) and 12% for the unsmoothed data. The logistic regression analyses did not differentiate as clearly eroded and non-eroded areas when the data were not smoothed. This is very apparent when comparing the unsmoothed analyses of Fields D and E ( Figure 5) with smoothed analyses (Figure 3).      One problem with LiDAR data is that it contains systematic artifacts resulting from differential plant stubble heights in the direction perpendicular to crop rows associated with the direction of movement of farm machinery. With terrain modeling, they can create artificial flow lines along pathways of farm equipment. These lines are apparent in the unsmoothed but not smoothed LiDAR slope data ( Figure 6). Unfortunately, the method of smoothing presented in this chapter introduced new artifacts along the contours because the procedure involved 1. calculating contours from DEMs and

rasterizing the contours.
This smoothing is very computationally resource intensive, and potentially problematic for professional conservation planners. It may be necessary for GIS experts to smooth the Li-DAR data during preprocessing and make the analyses available to planners. More computationally efficient smoothing techniques should be considered that minimize new artifacts.

Comparison of LiDAR with RTK and USGS
The type-1 and -2 average misclassification errors for TauDEM D8 and D∞ output for Li-DAR (8%) were similar in size as errors for the RTK (8%) dataset and lower than those for the USGS data (12%). The predictions were not very different for any of the datasets as shown in Figure 7. In many areas throughout the United States, LiDAR is being purchased for various applications (e.g., agriculture, soil mapping, transportation, land use planning).
In those situations, it would be advantageous to use the LiDAR data for identifying eroded waterways. In areas where LiDAR is not available, USGS 10-m grids may be adequate for conservation planning [6].

Impact of Flow Direction Algorithm
The FD8 and DEMON methods were not used with the LiDAR data because TAPES no longer works with the latest version of ArcGIS (i.e., version 10.0). The regression parameters and Wald Chi Square test for the RTK (Table 4) and USGS (Table 5) analyses could be compared with those for the LiDAR models ( Table 2). All of the parameters were statistically significant except for plan curvature in the USGS FD8 analyses (Table 5) which was consistent with [6]. The concentrated flow prediction models can be tested in other areas across the country with the same data source (RTK, LiDAR, and USGS) and flow direction (D8, D∞, FD8, and DEMON). The users should note that use of these may only be valid when similar smoothing techniques and DEM resolutions are used. In our case, all of our analyses were made with 4 by 4-m rasters. Grid scale is a very important factor to consider because at a small scale, there may be no relationship between land forms and curvature values; however, at a large increment, landscapes could have a profoundly large impact on an analysis similar to the one used in this study. The average Type 1 and Type 2 error rates for the RTK dataset were ranked in the follow order (from lowest to highest): FD8 (6%), DEMON (7%), D8 (9%), and D∞(11%). The differences were smaller for the USGS dataset but procedures were ranked in a similar order: FD8 (10%), DEMON (10%), D∞(11%), D8 (12%). The map analysis for the RTK data ( Figure 8) demonstrates that differences between methods were small with the FD8 model showing least noise in the southern part of the field.

Conclusion and Recommendations
The findings of this study indicate that LiDAR data can be used to clearly identify eroded features in agricultural landscapes with a level of accuracy that is similar to RTK GPS and better than USGS DEMs. It is critical that LiDAR data are smoothed prior to modeling ero-sion channels. Smoothing removes artifacts resulting from differences in plant residue heights perpendicular to the direction of travel by farm machinery. These differences can actually cause the terrain analysis flow models to incorrectly rout water along the direction of travel. While smoothing produced better results in this chapter, the method of smoothing with ArcGIS TopoToRaster was not efficient and cannot be used over large areas. More work is needed to determine computationally efficient smoothing algorithms for terrain analysis that minimize artifacts.
Conservation planners and GIS analysts should be able to accurately identify erosion features with the D8, D∞, FD8, and DEMON flow direction algorithm. This is important because previous work was based on TAPES G which is no longer being supported. Further TauDEM can utilize very large blocks of memory to cover extensive land areas, and can also operate on high performance computers. It is important to note that the choice flow algorithm will change the model parameters so it is important that they use the correct model. We recommend the TauDEM software program which uses the D8 and D∞ procedures because it works on 64 bit machines, allows the use of multiple core processer, and works with DEMs up to 4 GB in size.
All analyses performed in this study were based on with 4-m DEMs. Efforts are necessary to better understand the impact of the scale of terrain models on the quality of erosion model predictions. It may also be possible to expand the inference space of these models by including erosion parameters in the analyses obtained from soil surveys.