Dataset labels and the corresponding bands.
In some applications such as construction planning and land surveying, an accurate digital terrain model (DTM) is essential. However, in urban and sub-urban areas, the terrain may be covered by trees and man-made structures. Although digital surface model (DSM) obtained by radar or LiDAR can provide a general idea of the terrain, the presence of trees, buildings, etc. conceals the actual terrain elevation. Normally, the process of extracting DTM involves a land cover classification followed by a trimming step that removes the elevation due to trees and buildings. In this chapter, we assume the land cover types have been classified and we focus on the use of image inpainting algorithms for DTM generation. That is, for buildings and trees, we remove those pixels from the DSM and then apply inpainting techniques to reconstruct the terrain pixels in those areas. A dataset with DSM and hyperspectral data near the U. Houston area was used in our study. The DTM from United States Geological Survey (USGS) was used as the ground truth. Objective evaluation results indicate that some inpainting methods perform better than others.
- digital terrain model (DTM)
- digital surface model (DSM)
- image inpainting
- vegetation extraction
- land classification
There are several ways to obtain DTM. The oldest method is to do this manually by measuring the terrain elevations of some selected points of a given area. The process is time-consuming, tedious, and prone to human errors. In recent years, people have started to use LiDAR to generate DTM. The obtained DTM is in general satisfactory even though the point density may not be very dense as compared to optical stereo imaging approach . Radar has been used as well. It is well known that LiDAR and radar equipment are expensive. Due to availability of low-cost drones, stereo imaging has been gaining popularity. Near infrared (NIR) together with color imagers have been used in recent years to generate DSM. However, due to the presence of vegetation and buildings, some additional processing steps are needed in order to obtain DTM from DSM.
In recent years, hyperspectral images [2, 3, 4] are gaining popularity in various applications, including anomaly detection [5, 6, 7], target classification [8, 9], search and rescue operations , and many others. Due to the availability of hundreds of contiguous spectral bands, accuracies of anomaly detection and target classification have been improved quite significantly. Hyperspectral images can also be used for accurate land cover classification [11, 12, 13, 14, 15, 16]. Many methods have been developed in the past [17, 18, 19] for target detection in hyperspectral images. It will be ideal that hyperspectral images are available for land cover classification so that more accurate DTM can be obtained. However, equipment cost, requirement on data storage, and computational burden are limiting the widespread usage of hyperspectral imagers.
In contrast, low-cost color and NIR images are relatively inexpensive, have low computational cost, and low data storage. If one is given only color (RGB) and near infrared (NIR) images, however, it will be difficult to obtain accurate land cover classification for the following reasons. First, the accuracy of using only RGB and NIR bands for land cover classification is low as compared to that of using hyperspectral images. This point will be clear later in Section 3. Improving land cover classification using only color and NIR images will be a good contribution to the community. In recent years, there are some new developments along this direction. In particular, people have developed methods to synthesize spectral bands from color and NIR images. One technique is known as Extended Morphological Attribute Profile (EMAP) . Several notable applications have appeared in the literature [16, 17]. Second, even after the pixels related to trees and man-made structures are identified and removed from the DSM, we still need to face an important practical issue. How can one recover the missing terrain pixels in the DSM to build a DTM? Conventional approaches use simple interpolation such as bilinear or bicubic interpolations . However, the accuracy of DTM may be compromised. In recent years, there have been new developments in interpolation methods, termed as image inpainting methods. Those recent methods can be categorized into several groups. The first group is similar to bicubic interpolation methods. Some representative methods include bicubic, Laplacian , and inpaint-nans . The second group uses nonlocal sparse representation for inpainting. Well-known methods include Local Matrix Completion Sparse (LMCS) , field of expert (FOE) , and Transformic . The last group is the deep learning-based methods. One representative method is known as generative inpainting (GenIn) .
In this chapter, we propose a low-cost and accurate approach to DTM generation. Suppose we are given a DSM and only the color and NIR images. Our approach consists of four steps. First, we perform land cover classification using only color and NIR images. Various methods can be applied in this step. The key innovation is to apply synthetic spectral bands to enhance the land cover performance. It was demonstrated that the land cover performance using synthetic bands can yield performance very close to that of the hyperspectral image. Second, since there may be more than 10 types of land covers, we observed that it is more accurate to consolidate some of the land cover types into only five groups. Third, the trees and man-made structures are then removed from the DSM. Fourth, various conventional and deep learning inpainting methods are applied to generate the DTM. Comparisons show that GenIn has consistent performance in DTM construction.
This chapter is organized as follows. In Section 2, we will briefly review the methods and data. Section 3 will discuss the land cover classification results and how we consolidate 15 land cover types into only five groups. Section 4 focuses on the various DTM reconstruction results. Finally, some concluding remarks will be given in Section 5.
2. Methods and data
2.1 Land cover classification methods
In this research, we have used the following nine methods for land cover classification. We will not go into the details of each method. Instead, we briefly list the names and provide some references for their sources.
We categorize the methods into three groups. In the first group are simple and efficient methods, including Matched Subspace Detection (MSD) , Adaptive Subspace Detection (ASD) , and Reed-Xiaoli Detection (RXD) . These methods have been used in hyperspectral image processing in the past. In the second group are kernel versions of the first group and they are: Kernel MSD (KMSD) , Kernel ASD (KASD) , and Kernel RXD (KRXD) . The kernel-based algorithms are computationally expensive and may not be suitable for real-time applications. The third group contains Sparse Representation (SR)  algorithm, Joint Sparse Representation (JSR)  algorithm, and Support Vector Machine (SVM) [28, 29] algorithm. In the past, we have used the above three methods in group 3 for soil detection using multispectral images .
2.2 Inpainting methods
We have applied seven methods in this project. They are briefly summarized below:
Bicubic: in a recent paper by researchers at Cyprus, a bicubic interpolation method was used in .
Inpaint_nans: we denote this as “inpaint” in our later experiments. This method was developed by D’Errico . This is a very simple method that only uses the neighboring pixels to estimate the missing pixels, which will be referred as NaNs (not a number).
FOE: the Field of Experts method (FOE) was developed by Roth . This method uses pre-trained models that are used to filter out noise and obstructions in images.
Laplacian: this method  fills in each missing pixel using the Laplacian interpolation formula by finding the mean of the surrounding known values.
Local Matrix Completion Sparse (LMCS) : in LMCS, which was developed by us, a search is performed for each missing pixel to find a pixel with the most similar neighbors. After the search, the missing pixel is replaced with the found pixel. This method performs very well with images containing repeating patterns.
Transformic: the Transformic method was developed by Mansfield . It is similar to the LMCS in that it searches the whole image for a patch that is similar to the neighbors of the missing pixel.
Generative Inpainting (GenIn) : a new inpainting method, Generative Inpainting (GenIn), which is a deep learning-based method , was considered in our research. It was developed at the University of Illinois and aims to outperform typical deep learning methods that use convolutional neural network (CNN) models. GenIn builds on CNN and Generative Adversarial Networks (GANs) in an effort to encourage cohesion between created and existing pixels.
In this section, we briefly introduce EMAP, which has been shown to yield good classification performance when one only has a few spectral bands available. Given an input grayscale image and a sequence of threshold levels , the attribute profile (AP) of is obtained by applying a sequence of thinning and thickening attribute transformations to every pixel in as follows:
where and are the thickening and thinning operators at threshold respectively. The EMAP of is then acquired by stacking two or more APs using any feature reduction technique on multispectral/hyperspectral images, such as purely geometric attributes (e.g., area, length of the perimeter, image moments, shape factors), or textural attributes (e.g., range, standard deviation, entropy).
More technical details about EMAP can be found in [20, 30, 31, 32]. In this work, the “area (a)” and “length of the diagonal of the bounding box (d)” attributes of EMAP  were used. The lambda parameters for the area attribute of EMAP, which is a sequence of thresholds used by the morphological attribute filters, were set to 10 and 15, respectively. The lambda parameters for the length attribute of EMAP were set to 50, 100, and 500. With this parameter setting, EMAP creates 11 synthetic bands for a given single band image. One of the bands comes from the original image.
2.4 IEEE dataset
From the IEEE GRSS Data Fusion package , we obtained the ground truth classification maps, the hyperspectral image of the University of Houston area, and the LiDAR data of the same area. The instrument used to collect the dataset is simply a hyperspectral and LiDAR sensor. The hyperspectral image contains 144 bands ranging in wavelength from 380 to 1050 nm with spatial resolution of 0.25 m. The LiDAR sensor has the same spatial resolution of 0.25 m.
As shown in Table 1, there are a number of datasets used for analysis. The first group is the RGB (band # 60, 30, 22 in the hyperspectral data) and the NIR band (band #103). It should be noted that the above selection of bands is not the same as band selection in the literature . In band selection, the objective is to select the most informative bands out of the available hyperspectral bands. In our case, we are restricted to only having a few bands. We call this group Dataset-4 (DS-4). The second group is the four band group put through EMAP augmentation to produce 44 bands as each band produces 10 other bands in addition to the original band [denoted as Dataset-44 (DS-44)]. The third group is the full hyperspectral image of 144 bands [denoted as Dataset-144 (DS-144)].
|Dataset label||Bands present in the corresponding dataset|
|Dataset-4 (DS-4)||RGB and the NIR bands (respectively bands # 60, # 30, # 22, and # 103 in the hyperspectral data)|
|Dataset-44 (DS-44)||RGB and the NIR bands. Forty bands obtained by EMAP augmentation applied to RGB and the NIR bands|
|Dataset-144 (DS-144)||Hyperspectral dataset|
3. Consolidation of the number of land cover classes
Before studying the performance of inpainting techniques on the IEEE GRSS Data Fusion dataset, in order to create a consensus about the best classification method, the number of classes was reduced from 15 to 5. As shown in Table 1, the first three grass classes (Healthy, Stressed, and Synthetic) were consolidated into simply grass; tree, soil, and water maintained their individual classifications; and then all other classes were grouped into one class as man-made structures. This was done simply because some of the man-made classes—road, highway, railway, and both parking lots—were consistently misclassified and often as the other classes in this group. The same is true for the grass classes. By consolidating the classes, the classification method selection process was made easier. The averages listed in Table 2 are a summation of all non-kernel methods averages. This is shown to illustrate how low performing some types are even among the high-performing methods.
|New class #||Class type||Class #||Avg. accuracy (5)|
|Parking lot 1||12||36.94|
|Parking lot 2||13||36.69|
Table 3, extracted from a recent work , corresponds to the accuracy for the full 15 class models while Table 4 is for consolidated 5 classes. Comparing the two tables, it can be seen that the new class combination results in much improved results in all cases. Each method has an overall improvement of at least 13% and most methods saw an improvement of over 20%. It is clear from Table 4 that JSR clearly stands out as the best performing method. JSR goes from being the best performing method in one band case to every band case as well as overall average when using the new class arrangement. Yet, every band case of JSR returns over 90% accuracy, when previously the smaller number band cases returned results near 50%.
|OA||DS-4 (%)||DS-44 (%)||DS-144 (%)||Avg. (%)|
|OA||DS-4 (%)||DS-44 (%)||DS-144 (%)||Avg. (%)|
It should be noted that the results related to the 44-band (DS-44) case are observed to perform better than the 4-band (DS-4) and 144-band (DS-144) cases. It may be easier to understand why DS-44 is better than DS-4. A simple explanation is that the DS-44 data contain some synthetic spectral information, which enriches the spectral content. The explanation for why DS-144 case is worse than DS-44 case is because there are a lot of redundancies in the various bands in the DS-144 data. The data redundancies appear to cause some conflicts in the classifiers. Other researchers have observed similar behaviors  before and sometimes they call this the curse of dimension.
4. DTM extraction by removing man-made structures and trees
4.1 Ground truth DTM
The ground truth being used is the 1/9 arc second-resolution Digital Elevation map produced by USGS. Additional maps used for comparison in this investigation are the Cloth Simulation Filter (CSF) method  and the 1 arc second-resolution USGS DE map. However, CSF and USGS can only be used for general comparison as their inputs are not dependent on the different numbers of bands. CSF simply uses the LiDAR image while USGS is an already completed product. The three DTMs are shown in Figure 1. It can be seen that the USGS 1/9 arc second map is more accurate.
4.2 Individual inpainting results
The different methods used to compare digital terrain models (DTMs) through inpainting were “inpaint_nans,” “LMCS,” “Laplacian,” “Transformic,” and “CSF.” However, CSF must be considered separate from others as it is not dependent on the same image bands that the other inpainting techniques are dependent on. In this study, the best resolution (1/9 arc second) USGS satellite radar imagery was used as the ground truth.
With a consistently well-performing method available for the composition of DTMs, which is JSR, we now look at the performance of inpainting methods judging against a general ground truth of the USGS Digital Elevation maps. Our goal is to remove Class 2 (trees) and Class 5 (manmade structures) from the DSM. The missing pixels will be interpolated by using inpainting techniques. The names of the methods tested against this ground truth were: inpaint_nans, LMCS, Laplace, Transformic, and FOE. There is also the added variation of downsizing the image four times versus maintaining the full-size image to demonstrate affected accuracy because the downsized results save considerable time.
After JSR classifier is applied to the EMAP images (DS-44) and the man-made objects areas are identified by JSR, these identified man-made and trees areas are removed from the LiDAR image (DSM). Inpainting techniques are then applied to those missing pixel areas in the LiDAR image. The filled-in LiDAR image with inpainting methods corresponds to the estimated DTM. Figure 2 contains the DTMs, generated from the four times downsized DS-44 EMAP images for each method (excluding CSF). The purpose of downsizing by four times was because of computational issues. It took many hours to finish the inpainting for some of the methods. The images from Figure 2 can be compared to the ground truth and fully produced products of CSF and the lower resolution USGS. The LMCS results have issues near the boundary of the image because LMCS cannot handle missing pixels near the image boundary.
Figure 1 displays the full-size ground truth maps. Figure 3 contains the estimated DTMs using full-size DS-44 EMAP images. The inpainting maps in Figure 2 and Figure 3 can also be compared against Figure 1.
Clearly the lower resolution USGS image is not a great product to use for the digital terrain map. However, it is useful to show a low-resolution picture of what the Houston area could look like without classification and inpainting.
To find an objective statistical proof of accuracy of the different inpainting methods, there are five different metrics that can be used. By taking the difference between the DTM of a given inpainting method and the ground truth map (USGS)—then calculating mean, standard deviation, root mean squared, and the min and max of each instance—we can find a general standard of accuracy for each method. The visual observation from LMCS shows that performance is poor on the edges of each map, as is expected given that it does not calculate any inpainting on the edges. The same can also be said to a lesser extent of inpaint_nans. To help alleviate that inaccuracy, a cropped comparison of downsized and full-size versions is conducted for all methods, which gets rid of these problematic areas on the edges.
The performance metrics for the DS-44 case can be seen in Table 5. In the DS-44 case, we observe that two techniques, Laplacian and Transformic, performed better than the rest. While Transformic’s mean value is the smallest, the other four metrics have better values in Laplacian. For comparison purposes, the performance metrics for the CSF method and USGS lower resolution elevation map are also included in Table 6. Overall, CSF performs pretty well for the mean; however, because of the non-removed bridge, all other metrics are relatively poor performing. The 1 arc second resolution USGS image performs poorly in all accuracy categories. It can be also noticed from Table 6 that these values for CSF and USGS are worse than the best performing individual cases that are shown in Table 5.
|CSF||USGS 1 arc second|
4.3 Fusion of different inpainting results
In an effort to improve the inpainting performance metrics, three different fusion methods are utilized. The pixel level fusion methods were used in . For the first fusion method, alpha trimmed mean filter (ATMF), the worst and best performing methods for a given accuracy measurement are removed and then the three in-between results are averaged before re-taking the accuracy measurements to see how the results were improved. The second fusion method, weighted method, weighs each method based on a specific accuracy measurement and averages those results. The final fusion method, F3, simply averages the three best performing methods for each accuracy measurement.
In order to perform these operations, it was necessary to rank each of the methods based on the three main accuracy measurements: mean, standard deviation (STD, also denoted in other tables as sigma), and root mean squared (RMS). This was done for exclusively DS-44 results. It includes both the full-sized results and the four times down-sampled results. Table 7 shows the performance rankings for various combinations of band number with respect to three performance metrics: mean, sigma, and root mean squared (RMS). From the results in Table 7, it is clear that overall the downsized results return much more accurate values than the full-sized values in most cases.
|1||4 × T||0.08||1||4 × LP||0.58||1||4 × LP||0.67|
|2||4 × LMCS||0.24||2||4 × FOE||0.64||2||4 × T||0.67|
|3||4 × LP||0.34||3||Full T||0.65||3||4 × FOE||0.74|
|4||4 × FOE||0.38||4||4 × nans||0.74||4||4 × nans||0.84|
|5||4 × nans||0.39||5||4 × LMCS||0.82||5||4 × LMCS||0.85|
Table 8 shows the performance metrics for the DS-44 case when three different fusion methods were applied to the best five individual inpainting methods’ results where the ranking was conducted with respect to three performance metrics separately (mean, STD, and RMS). F3 produces the lowest mean value and relatively lower sigma and RMS values in comparison to others.
Table 9 shows the performance metrics for the special case (which we name as combo) when three different fusion methods are applied to the best five individual inpainting methods’ results from both 44-bands inpainting results where the ranking is conducted with respect to three performance metrics separately (mean, STD, and RMS). In this case, F3 method produces lower mean, sigma, and RMS values with respect to ranking according to the mean performance metric.
Table 10 shows a summary of the best performing individual cases (no fusion) and the best performing cases with fusion for DS-44 case with fusion. From Table 10, it can be noticed that the F3 (with respect to ranking according to lowest mean) improves the RMS value slightly when compared with the RMS values of the best performing individual inpainting method results (Laplacian and Transformic in DS-44). However, when all performance metrics are considered as a whole, we cannot clearly state F3 performs the best in all performance accuracy metrics but improves a few of the parameters.
|No fusion||With fusion|
|Metric||Best DS-44 (Laplacian)||Best DS-44 (Transformic)||Best DS-44 (F3-mean)|
4.4 Comparison with deep learning inpainting
Using the pre-trained model provided by the GenIn package  for an image the size of about 350 by 1900 pixels (that covers the University of Houston campus and surrounding area), the computation time observed is roughly 2 minutes.
The accuracy of the GenIn model as compared to the other inpainting techniques is competitive and, cases, if not overall, is a more accurate result. In Table 11, statistics from GenIn together with the statistics from other inpainting techniques for the IEEE dataset are provided. In regards to mean, root mean square (RMS), and the maximum difference (max), GenIn outperforms all other techniques. For the sigma metric, it is the second-best performing method. The minimum difference accuracy measure is the only underperforming value coming in as the fourth best performing statistic. However, it is still the second-best value available, closely trailing the other three techniques.
|4 × crop||Inpaint-nans||LMCS||Laplacian||Transformic||FOE||GenIn|
It is also helpful to visualize GenIn’s digital terrain map estimation as compared to the ground truth. In Figure 4, an image of the U. Houston area can be observed after GenIn is applied on that area’s LiDAR data. Figure 5 corresponds to the USGS 1/9 arc second Digital Elevation map that is used as the ground truth for the area.
The GenIn-generated results are found to be a very close reproduction of the ground truth. In some instances, it is observed that it provides more realistic results than the ground truth. As an example, in the horizontal right and vertical center of the plot in Figure 5, there is a deep dark spot that is observed, which is to be denoted as a low spot. However, this is caused because of a highway bridge that runs over a railway and could be considered a miscalculated section of the Digital Elevation map. The GenIn-generated map produces no such deep dark spot and instead smoothly removes the bridge and because it does this, it then slightly suffers in the resultant accuracy statistics.
In this research, we investigated the feasibility of using only color and NIR images for accurate DTM extraction. We assume the DSM is also available. Our approach involves several steps. The first step is to use color and NIR images for land cover classification. After some extensive experiments, it was observed that using only four bands cannot achieve accurate land cover classification. A morphological filtering approach was applied to generate synthetic spectral bands. Using nine land cover classification algorithms, it was observed that the use of synthetic bands significantly improved the land cover classification accuracy for the well-known IEEE dataset. The second step is to consolidate the many land cover types into only five groups. This was observed to further improve the accuracy. The third step is to apply nine inpainting algorithms to recover DTM from DSM. It was observed that the deep learning algorithm yielded more consistent performance.
Here, we also briefly mention a few future research directions. One direction is to focus on DSM generation using color images. The second direction is to obtain ortho-rectified images for the color and NIR images. A third direction is to build a software prototype that integrates DSM generation tool, ortho-rectification tool, land cover classification tool, and DTM reconstruction tool.
This research was supported by DOE under contract # DE-SC0019936.