Road Feature Extraction from High Resolution Aerial Images Upon Rural Regions Based on Multi-Resolution Image Analysis and Gabor Filters


Introduction
Accurate, detailed and up-to-date road information is of special importance in geo-spatial databases as it is used in a variety of applications such as vehicle navigation, traffic management and advanced driver assistance systems (ADAS). The commercial road maps utilized for road navigation or the geographical information system (GIS) today are based on linear road centrelines represented in vector format with poly-lines (i.e., series of nodes and shape points, connected by segments), which present a serious lack of accuracy, contents, and completeness for their applicability at the sub-road level. For instance, the accuracy level of the present standard maps is around 5 to 20 meters. The roads/streets in the digital maps are represented as line segments rendered using different colours and widths. However, the widths of line segments do not necessarily represent the actual road widths accurately. Another problem with the existing road maps is that few precise sub-road details, such as lane markings and stop lines, are included, whereas such sub-road information is crucial for applications such as lane departure warning or lane-based vehicle navigation. Furthermore, the vast majority of road maps are modelled in 2D space, which means that some complex road scenes, such as overpasses and multi-level road systems, cannot be effectively represented. In addition, the lack of elevation information makes it infeasible to carry out applications such as driving simulation and 3D vehicle navigation.
Traditional methods for acquiring road information include i) ground surveying and ii) delineating roads from remotely sensed imagery (Zhang & Couloigner, 2004). Ground surveying can be carried out by using devices such as total stations and GPS receivers. Both devices are point-based, which renders this method labour-intensive and time-consuming, and therefore more suitable for detailed road surveying of small areas than for large-scale road mapping. Road information can be delineated from remote sensing images in three ways: i) manual delineation, ii) semi-automated extraction, and iii) fully automated detection. Manual extraction of roads from remotely sensed imagery is a simple stretching operation. However, the operation is impractically time-consuming when the scenes are very complex. In addition, not only are such complex maps required for large geographic areas, but frequent updating is also needed. In the semi-automatic road extraction method, approximations or seed points are given manually, and an automatic algorithm then uses these approximations as input to extract the road. Approximations can be a starting point, an ending point, intermediate points, road directions, road widths, and prior knowledge from a GIS database (Zhang, 2003). Fully automatic road feature extraction is pursued by automating the selection of the necessary initial information.
With the advancement of innovative sensors and platforms, road network spatial information can be acquired from aerial and satellite imagery, synthetic aperture radar (SAR) imagery, airborne light detection and ranging (LiDAR) data, and image sequences taken from ground-based mobile mapping systems (MMS) with different spatial and spectral resolutions (Quackenbush, 2004). Aerial images and LiDAR point clouds are promising data sources for generating road maps and updating available maps to support various activities and missions of government agencies and consumers (Mokhtarzade & Zoej, 2007). However, it is often the case that large amounts of high resolution aerial images and dense LiDAR data are collected and piled up, yet remain unprocessed or unused while new data sets are continuously being gathered. This phenomenon arises because the development of automatic techniques for processing aerial imagery and LiDAR data lags far behind that of the hardware sensor technologies. Object extraction for full exploitation of these data sources is very challenging. Automatic road information extraction is even more challenging in urban areas because of their much more complex circumstances.
Research on road feature extraction from aerial and satellite images can be traced back to the 1970s (Bajcsy & Tavakoli, 1976). Over three decades, a large number of automatic and semi-automatic algorithms have been attempted. Although many different approaches have been developed for the semi-automatic or automatic extraction of road information, none of these can solve all the problems without human interactions. This is because of the wide variations of roads (urban, rural, precipitous) and the complexities of their environment (occlusions caused by cars, trees, buildings, shadows etc.) (Poullis & You, 2010). It is worth noting that the existing road feature generation algorithms are all task-based and data-based. For instance, road surfaces have a quite different appearance from pavement markings; thus, approaches that are suitable for road surface extraction usually cannot be applied in the detection of pavement markings without modification. Due to the inherent difference in the data style, methods utilized for road extraction in aerial images may not be appropriate for LiDAR data sets. Therefore, in this work, an effective road information extraction system, which deals with road features in rural and urban regions respectively, is proposed based on very high resolution (VHR) aerial images.
The remainder of this work is structured as follows. Section 2 provides a review of the relevant work published over the past 20 years. Road feature extraction for rural and urban areas from high spatial resolution remotely sensed imagery is discussed separately in this section. In Section 3, an effective road network extraction method is presented. The homogeneity histogram thresholding algorithm is utilized to detect the road surface from VHR aerial images, and the detected road features are then thinned and vectorized to reconstruct the digital road map. A novel road surface and lane marking extraction approach is presented in Section 4, which detects the road surface from VHR aerial images based on the support vector machine (SVM) classification method; the lane markings are then generated using a 2D anisotropic Gaussian filter together with Otsu's thresholding algorithm. Concluding remarks and future work recommendations are given in Section 5.

Review of the related work
The review conducted by Mena (2003) cites more than 250 road extraction studies, and classifies different road extraction approaches based on three principal factors: i) the preset objective, ii) the extraction technique applied, and iii) the type of sensors utilized. Although the developed approaches exhibit a variety of methodologies and techniques, different categorizations for road extraction work can still be sought in order to better match the available data and methods to their ultimate purpose. In this review, we consider two major state-of-the-art data sources, aerial imagery and airborne LiDAR data, and categorize the existing road extraction methods into two classes: i) road detection in rural or non-urban regions, and ii) urban area road extraction. As the aerial imagery and LiDAR data are usually collected in the same flight missions, the extraction of road information from LiDAR data only is uncommon. This review is by no means exhaustive; instead, it focuses mainly on commonly used road extraction techniques.
Subsection 2.1 examines the work on rural area road extraction, and the review of road detection in urban regions is presented in Subsection 2.2. In addition, a brief summary of the road pavement marking extraction algorithms is provided in Subsection 2.3. Last but not least, the qualitative and quantitative evaluation of results is reviewed in Subsection 2.4.

Rural road extraction techniques
Roads in rural or non-urban areas have characteristics such as constant widths, continuous curvature changes, and homogeneous local orientation distributions, which can moderate the complexity of their extraction. Basically, rural road extraction approaches, either semi-automatic or automatic, can be classified into those based on i) artificial intelligence, ii) multi-resolution analysis, iii) snakes, iv) classification, and v) template matching.
An automatic road verification approach based on digital aerial images as well as GIS data is developed in (Wiedemann & Mayer, 1996) as a part of the update procedure for GIS data. The candidates for roadsides, which are obtained by searching the surroundings of GIS road-axes in the image based on profiles, are tested, and a measure of confidence is also calculated. However, user interaction is still required, as the results of the method are far from perfect. Roads that do not exist in the GIS data will not be detected.
In (Doucette et al., 2001), a fully automated road extraction strategy based on Kohonen's self-organizing map (SOM) is proposed to detect road information in high-resolution multi-spectral aerial imagery. The core algorithms implemented include i) anti-parallel edge centerline extractor, ii) fuzzy organization of elongated regions, and iii) self-organizing road finder. A covariance-based principal component analysis (PCA) is performed to determine the intrinsic dimensions of the image bands, and to classify the image using a maximum likelihood classifier with manually selected training samples. The extraction results over several different areas and sensors show that the highest extraction quality and correctness rates are from anti-parallel edge analysis of spectral band and class layers, respectively. Rellier et al. (2002) propose a model to locally register cartographic road networks on SPOT satellite image based on Markov random fields (MRF) so as to correct the errors and improve map accuracy. The method first translates the road network into a graph where the nodes are characteristic points of the roads. Then local registration is performed by defining a model in a Bayesian framework. One interesting point of the model is that the registration is done locally, which is very useful when the map exhibits local errors. The biggest problem with the model is still the computational time, which remains too long due to the frequency of computations of the path between nodes.
To extract roads from aerial images, Amo et al. (2006) employ the region competition algorithm, a mixed approach which combines region growing techniques with active contour models. Region growing makes the first step faster and region competition delivers more accurate results. However, this method is appropriate for roads in agricultural fields only, where roads are quite homogeneous and their homogeneity is sufficiently different from that of their surroundings. Mayer et al. (1998) utilize the ribbon snake for the extraction of salient roads from aerial images based on the extracted lines at a coarse scale and the variation of road width at a fine scale. Non-salient roads are extracted by connecting two adjacent ends of salient roads with a road hypothesis, which is then verified based on homogeneity and the constancy of width. Finally, a closed snake is initialized inside the central area of the junction and expanded until it delineates the junction borders. Mayer's method can overcome some problems such as extraction of shadowed and occluded roads, but it cannot deal with the complex road scenario in urban areas. Laptev et al. (2000) use ribbon snakes to remove irrelevant structures extracted by a preliminary line detection algorithm at a coarse resolution. The method initializes a ribbon snake for each line detected and sets the width property to zero. The snake positions are optimized at a coarse scale to get a rough approximation of the road position. A second optimization process is used at a finer scale, where the precision of the road position is increased and the width property is expanded up to the structure boundary. Finally, road width thresholding is applied in order to discard any irrelevant structures.
An earlier work on road detection based on image segmentation is conducted by Wang and Newkirk (1988), where a system is developed for automated highway network extraction from Landsat Thematic Mapper (TM) imagery supported by knowledge analysis and an expert system. Three steps are involved in the system: i) binary image production, ii) tracing and feature extraction, and iii) highway identification. K-means clustering is employed to classify the image into two categories: road and non-road features. Analysis and processing are then performed on the linear patterns which are generated by labeling the binary image using a tracing algorithm. The proposed method is fairly simple and fully automatic, but the experiments are limited to the extraction of highways in rural areas. Amini et al. (2002) utilize a segmentation method called the split and merge algorithm to automatically extract roadsides from large-scale image maps. The proposed method consists of two stages: i) straight line extraction, and ii) road skeleton extraction. The authors first generate a simpler image by grey scale morphological algorithms. Then the split and merge algorithm is applied to the simplified image, which is converted to a binary image. After that, the binary image map objects are labeled using connected component analysis (CCA), and the skeletons of roads are extracted in the classified image by morphological operations. The roadsides are finally extracted by combining the skeleton of roads and the generated straight line segments. Steger et al. (1995) propose a multi-resolution road extraction approach, where a different extraction method is utilized for each scale level. One method is applied on a fine scale with 25 cm GSD, while the other is applied at a lower resolution, which is reduced by a factor of eight. The larger scale method extracts roads based on a structural model matching technique, while the smaller scale method detects lines based on the image intensity level. Finally, the outputs are combined by selecting roads that are extracted at both levels.
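The two-class K-means step mentioned above for Wang and Newkirk (1988) can be illustrated with a minimal sketch on scalar grey levels. This is not the authors' implementation: the function name two_class_kmeans, the deterministic initialization at the minimum and maximum grey level, and the sample pixel values are assumptions made purely for illustration, and which cluster corresponds to road depends on the imagery.

```python
def two_class_kmeans(values, max_iter=100):
    """Cluster scalar grey levels into two groups; returns (centres, labels)."""
    # Assumed initialization: start the two centres at the extreme grey levels.
    lo, hi = float(min(values)), float(max(values))
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each pixel joins the nearer centre.
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        # Update step: each centre moves to the mean of its members.
        new_lo = sum(group0) / len(group0) if group0 else lo
        new_hi = sum(group1) / len(group1) if group1 else hi
        if new_lo == lo and new_hi == hi:
            break  # centres are stable, so the clustering has converged
        lo, hi = new_lo, new_hi
    return (lo, hi), labels


# Hypothetical grey levels: dark background mixed with a few bright pixels.
pixels = [12, 15, 14, 200, 210, 13, 205, 198]
centres, binary = two_class_kmeans(pixels)  # label 1 marks the brighter cluster
```

The resulting label list plays the role of the binary image on which tracing and highway identification would then operate.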
An approach based on particle filtering is proposed in (Ye et al., 2006) to automatically extract roads from high resolution imagery. The road edges are extracted by the Canny detector, then the edge point distribution and the similarity of grey value are integrated into the particle filter to deal with complex scenes. To handle road appearance changes, the tracking algorithm is allowed to update the road model during temporally stable image observations. Baumgartner et al. (1999) extract roads from multi-resolution images based on the work of Heipke (1995). In this paper, they emphasize the concept of "road model" comprising explicit knowledge about geometry, radiometry, topology, and context. They firstly segment the aerial image into global contexts (forest, rural and urban) to guide the extraction process in the various regions. In the coarse image, the line features are extracted using Steger's algorithm (1998). In the fine image, parallel edges are extracted and grouped into rectangles, which are then connected into the road segments. Finally, roads are generated through grouping road segments and closing gaps between them. Dal-Poz et al. (2005) present an automatic method for road seed extraction from medium and high resolution images of rural scenes. The road-sides candidates are firstly detected by the Canny edge detector; the road objects are then built based on a set of rules constructed from a prior road knowledge. The rules used to identify and build road objects consist of anti-parallelism, parallelism and proximity, homogeneity, contrast, superposition, and fragmentation. Due to incompatibility with any road objects, road crossings cannot be extracted.

Road extraction in urban areas
Roads in urban areas have some unique characteristics absent in rural areas. There are often many shadows and occluded regions on road surfaces in urban areas due to the obstruction of tall buildings, vehicles, and trees. Furthermore, the contrast between roads and surrounding objects deteriorates significantly, since roads, side-walks, building roofs, and parking lots are usually constructed using similar materials, such as concrete and asphalt. Therefore, road extraction in urban areas cannot simply copy or enhance the methods and procedures which have been effective in rural road extraction, such as the algorithms discussed above. Instead, it is necessary to develop an automatic system that can extract road information accurately as well as deal with the effects of background objects like cars, trees, or buildings. The key techniques used to reconstruct the urban road model include road tracking, segmentation and classification, mathematical morphology, and model based road extraction, which are described in detail in the following paragraphs.
Shukla et al. (2002) apply a path-following method to extract roads from high-resolution satellite imagery by initializing two points to indicate the road direction. Scale space and edge-detection techniques are used as pre-processing for segmentation and estimation of road width. The cost minimization technique is used to determine the road direction and generate the next seeds. This method performs better than the work of (Kim et al., 2002) because it can generate seeds in different directions at intersections. Its limitation is that the algorithm may not work on roads on which shadows are cast. Zhao et al. (2002) propose a semi-automatic method that matches a rectangular road template with both a road mask and road seeds to extract roads from IKONOS imagery. The road mask consists of the road pixels generated from maximum likelihood classification, and the road seeds can be generated by tracing the long edges of the road mask. The problem is that not all of the extracted road mask pixels are road area, and not all of the extracted long edges are road edges; this results in misclassification. Kim et al. (2004) initialize one seed point on the centerline of the road to determine the position of the reference template. The orientation of the road centerline, which is calculated with Burns' algorithm, guides the optimal target window. A least square template matching approach, which puts emphasis on the central part of the road, is utilized to determine the location of the next road template. The limitations of this algorithm are i) that it cannot work with shadows, which may terminate the tracking process, ii) that the operator must select the initial seeds on road central lines, and iii) that one seed can be used to extract only one direction, leading to too many seeds when the scene is large and complex. Hu et al. (2004) present a semi-automatic road extraction method based on a piecewise parabolic model with zero-order continuity, which is constructed from seed points placed by a human operator. Road extraction becomes a problem of estimating the unknown parameters for each piece of the parabola, which can be solved by least square template matching based on the deformable template and the constraint of the geometric model. In densely populated areas, where roads have sharp turns and orthogonal intersections, many seed points need to be located, which degrades the efficiency. Shi and Zhu (2002) propose an approach to extract road networks in urban areas from high-resolution satellite images. The basic procedures include binary image production by interactive threshold selection, and a line segment match for road network processing. Binary image production is not automatic and the threshold parameter may change with the variation of image input, so the approach lacks automation and robustness, and further improvement is required. Grey-scale mathematical morphology is tested as one of the potential solutions in the proposed approach. Haverkamp (2002) extracts road centerlines in urban areas from road segments and intersections based on size, eccentricity, length of the object and spatial relationships between neighboring intersections. A vegetation mask is derived from multi-spectral IKONOS imagery, and these objects are generated by grouping pixels with similar road directional information, based on texture analysis in panchromatic IKONOS imagery. This method requires the predetermination of road width, which is tuned to detect roads with a specific level of contrast and a low along-road variance.
Two novel methods are developed in (Wang, 2004) to extract roads from high-resolution satellite images. One is a semi-automated road extraction method based on profile matching optimized by an auto-tuning Kalman filter, and the other is based on edge-aided multi-spectral classification. Experimental results on several test images show that both methods can accurately extract road networks from IKONOS and QuickBird satellite images, and can significantly reduce the misclassification caused by small driveways, house roofs connected with the road networks, and extensive paved grounds.
Based on the fact that structural information obtained using mathematical morphological operators can provide complementary information to improve discrimination of different urban features that have a spectral overlap, Jin and Davis (2004) present applications of mathematical morphology for urban feature extraction from high-resolution satellite imagery. To efficiently extract the road networks, directional morphological filtering is exploited to mask out those structures shorter than the distance of a typical city block. A directional top-hat operation is employed to mask out bright structures shorter than a city block. Similarly, dark structures shorter than a city block can be masked out by thresholding the directional bottom-hat images. Zhu et al. (2005) extract the road network from 1-meter spatial resolution IKONOS satellite images based on mathematical morphology and a line segment match method. The authors first generate the binary road image by adopting morphological leveling. Secondly, the coarse road network is detected using the proposed "Line Segment Match Method", which determines straight parallel line segments corresponding to roads. The holes are finally filled by using a mathematical morphological operation. The proposed algorithm is based on the assumption that roads have a darker tone than the surrounding features, which may cause problems when this assumption does not hold. Valero et al. (2010) propose a method for extracting roads in very high resolution (VHR) remotely sensed images, based on the assumption that roads are linear connected paths. Two advanced directional morphological operators, path opening and path closing, are utilized to extract structural pixel information; these remain flexible enough to fit rectilinear and slightly curved road segments, due to their independence from the choice of a structural element shape. Morphological profiles are used to analyze object size and shape features so as to determine candidate roads in each level, since the morphological profiles of pixels on the roads are similar. Finally, a classical post-processing is employed to link the disconnected road segments using higher level representations (Tupin et al., 1998).
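The top-hat idea described above for Jin and Davis (2004) can be illustrated in one dimension, that is, along a single direction of an image. The sketch below is only a simplified illustration under stated assumptions: a flat line segment of odd length stands in for the city-block length, windows are truncated at the borders, and the names erode, dilate, top_hat and the sample profile are hypothetical. The top-hat is the signal minus its opening, so it responds only to bright structures shorter than the segment.

```python
def erode(signal, length):
    """Grey-scale erosion with a flat line segment of odd `length`."""
    half, n = length // 2, len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def dilate(signal, length):
    """Grey-scale dilation with a flat line segment of odd `length`."""
    half, n = length // 2, len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def top_hat(signal, length):
    """White top-hat: the signal minus its opening (erosion then dilation)."""
    opened = dilate(erode(signal, length), length)
    return [s - o for s, o in zip(signal, opened)]


# Hypothetical intensity profile along one direction: a short bright blob
# (2 samples) followed by a long bright run (7 samples).
profile = [0, 0, 9, 9, 0, 0, 9, 9, 9, 9, 9, 9, 9, 0, 0]
response = top_hat(profile, 5)
# Samples flagged here belong to bright structures shorter than the segment
# and would be the ones masked out.
short_bright = [i for i, v in enumerate(response) if v > 0]
```

Applying the same operation to the negated profile gives the bottom-hat counterpart for short dark structures.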
A Gibbs point process framework, which is able to simulate and detect thin networks from remotely sensed images, is constructed in (Stoica et al., 2004) to form a line-network for the road segments connection. The estimate for the network is found by minimizing an energy function. In order to avoid local minima, a simulated annealing algorithm based on a Monte Carlo Dynamics is utilized for finite point processes.
Based on Gaussian scale-space theory, a Gaussian comparison function is developed for extracting the linear road features from urban aerial remote sensing images (Peng & Jin, 2007). The curvilinear structures of the roads are verified, grouped and extracted, based on locally oriented energy in continuous scale-space combining the geometric and radiometric features. Compared with other edge-based line detection algorithms, the system can significantly reduce computational complexity in the line tracking, and can effectively suppress the zero drift caused by Gaussian smoothing. The proposed curvilinear feature detection method is shown to be superior to the Canny operator and the Kovesi detector, in that it can detect not only urban highways but also the non-salient rural roads. Peng et al. (2008) update digital road maps in dense urban areas by extracting the main road network from VHR QuickBird panchromatic images. A multi-scale statistical data model, which integrates the segmentation results from both coarse and fine resolution, is employed to overcome the difficulties caused by the complexity of information contained in VHR images. Furthermore, an outdated GIS digital map is utilized to provide specific prior knowledge of the road network. The experiments indicate that the combination of generic and specific prior knowledge is essential when working at full resolution.

Lane marking extraction techniques
The popular method for road pavement marking reconstruction is through a vehicle-based mobile mapping system (MMS), where the road lane markings can be detected and reconstructed in the field using laser scanners or close range photogrammetric imagery. Due to the difference in devices used and types of features fused, approaches developed for lane feature extraction have been quite distinct from one another. For instance, lane markings are extracted based on structures (Lai & Yung, 2000), image classification (Jeong & Nedevschi, 2005), and frequency analysis (Kreucher & Lakshmanan, 1999). An exhaustive review of road marking reconstruction approaches using MMS can be seen in (Soheilian, 2008). Although accurate lane features can be obtained through MMS, it is costly and time-consuming to produce lane data over large areas.
Lane information reconstruction through feature extraction from remotely sensed images has been a long-standing research topic within the photogrammetry and remote sensing community. However, due to the limited ground resolution of the images, the majority of existing approaches concentrate on the detection of road centerlines rather than sub-road details. Research efforts have been focused in a number of institutions, resulting in various approaches to the problem, including multi-scale approaches (Baumgartner et al., 1999), knowledge-based extraction (Trinder & Wang, 1998) and context cues (Hinz & Baumgartner, 2000).
Only a few approaches involve the detection of lane marking in road extraction. Steger et al. (1997) extract the collinear road markings as bright objects with the algorithm given in (Steger, 1996) in large scale photographs when the roadsides exhibit no visible edges. Only the graph search strategy is adapted to extract road markings automatically, and a best-first search from a few salient road markings is also utilized. The strategy adds the road marking to the best connection evaluation only, which would add a global evaluation step following each marking, and try to add a new road marking if the directions of the road markings are not extracted perfectly.
In a more recent work, Kim et al. (2006) build a system to extract pavement information in complex urban areas relying on a set of simple image processing algorithms. The pavement information includes lane and symbol markings that guide direction, and the geometric properties of the pavement markings and their spatial relationships are analyzed. Moreover, road construction manuals and a series of cutting-edge algorithms, including template matching, are involved in the analysis. The accuracy evaluation, carried out by comparing the data with manually plotted ground truth data, validates that road information can be extracted efficiently to an extent in a complex urban region. Tournaire et al. (2009) propose a specific approach for dashed line and zebra crossing reconstruction. This approach relies on external knowledge introduced in the detection and reconstruction process, and is based on primitives extracted in the images. The core of the approach relies on defining geometric, radiometric and relational models for dashed line objects. The model also deals with the interactions between the different objects making up a line, which means that the algorithm introduces external knowledge taken from specifications. To sample the energy function, the authors use Green's algorithm, coupled with simulated annealing, to find its minimum.

Result evaluations
Internal diagnosis and external evaluation for the extracted road models are two important aspects of assessment of the relevant automatic road extraction system (Wiedemann et al., 1998). However, relatively little work has been carried out in this area.
In (Heipke et al., 1997) and (Wiedemann et al., 1998), an external evaluation approach of automatic road extraction algorithms is developed by comparison of these to manually plotted linear road axes used as reference data. The quality measures proposed for the automatically extracted road data comprise completeness, correctness, quality, redundancy, planimetric RMS differences, and gap statistics, and are all aimed at exhaustive evaluation as well as assessing geometrical accuracy. The proposed evaluation method is tested by comparing evaluations of three different automatic road extraction approaches, and demonstrating its applicability.
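As a rough illustration of the first three measures, the sketch below computes completeness, correctness and quality from road lengths, using the length-based definitions commonly attributed to Heipke et al. (1997). The buffer-based matching that decides which parts of the extraction and the reference correspond is omitted; the matched lengths are assumed to be given. The function name and the sample lengths are hypothetical.

```python
def road_quality_measures(matched_extraction, extraction_total,
                          matched_reference, reference_total):
    """Return (completeness, correctness, quality) from road lengths.

    All arguments are lengths in the same unit (for example kilometres):
      matched_extraction  extracted roads that lie within the reference buffer
      extraction_total    all extracted roads
      matched_reference   reference roads that lie within the extraction buffer
      reference_total     all reference roads
    """
    if extraction_total <= 0 or reference_total <= 0:
        raise ValueError("total lengths must be positive")
    # Share of the reference network that was found.
    completeness = matched_reference / reference_total
    # Share of the extraction that is actually road.
    correctness = matched_extraction / extraction_total
    # Combined measure: penalizes both false extractions and missed roads.
    unmatched_reference = reference_total - matched_reference
    quality = matched_extraction / (extraction_total + unmatched_reference)
    return completeness, correctness, quality


# Hypothetical lengths in kilometres.
completeness, correctness, quality = road_quality_measures(
    matched_extraction=8.0, extraction_total=10.0,
    matched_reference=9.0, reference_total=12.0)
```

All three values lie between 0 and 1, with 1 being the optimum, and quality can never exceed either completeness or correctness when matching is consistent.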
An in-depth usability evaluation of a semi-automated road extraction system is presented in (Wilson et al., 2004), highlighting both strengths and areas for improvement. The evaluation is principally conducted on the timing and statistical analysis as well as on factors that affect the extraction speed. Peteri et al. (2004) present a method to guide the determination of a reference based on statistical measures from several image interpretations. A tolerance zone representative of the variations in interpretation is defined that allows both the determination of the uncertainty of the reference object and the possibility of defining criteria for a quantitative evaluation. A few criteria defined by Musso and Vuchic (1988), including the size, form, and topology indices of the road network, are employed to carry out evaluation of the planimetric accuracy and the spatial characterization of a road network.
To qualitatively evaluate the performance of the semi-automatic road extraction algorithms, four criteria (correctness, completeness, efficiency, and accuracy) are utilized in (Zhou et al., 2006) and further in (Zhou et al., 2007). Completeness and correctness are the priority criteria in cartography, while the efficiency measurement principally takes the savings of human input into consideration. Tracking accuracy is assessed as the root mean square error between the road tracker and the human input.
To sum up, the typical result evaluation approach for road extraction has been carried out by comparing the generated roads with manually plotted reference data. Correctness and completeness are the two most frequently used criteria, while other measurements are dependent on specific road extraction algorithms and objectives.

Road extraction in rural regions
In this section, we develop a new approach for automatic road network extraction, in which both spatial and spectral information from aerial photographs or pan-sharpened QuickBird images is systematically considered and fully used. The proposed approach consists of three main steps: (i) the image is classified based on homogeneity histogram segmentation to roughly identify the road network profiles; (ii) morphological opening and closing are employed to fill tiny holes and filter out small road branches; and (iii) the extracted road surface is thinned by a thinning approach, pruned by a proposed method and finally simplified with the Douglas-Peucker algorithm. As a popular technique for image segmentation, histogram-based thresholding only takes the occurrence of the gray levels into account, without any local information. Segmentation based on image homogeneity, by contrast, involves both the occurrence of the gray levels and the neighbouring homogeneity value among pixels; it is therefore employed in this study to obtain a more homogeneous segmentation result. A Gaussian smoothing algorithm is then applied to the obtained homogeneity histogram, which eases the threshold-finding procedure for segmentation. After image segmentation, morphological opening and closing are used to remove small holes and noise from the road surface as well as narrow pathways connected to the main road. A thinning method is then applied to extract the skeleton of the road network. Finally, the generated road network is vectorized, and then pruned and simplified by a proposed pruning method and the Douglas-Peucker algorithm, respectively. Fig. 1 illustrates the flowchart of the developed approach. The procedure comprises two individual processes, namely image segmentation and road network extraction, which are elaborated in the following sections.

Image segmentation
The road network is detected using homogeneity histogram segmentation, which comprises two basic operations: contrast stretching, and homogeneity histogram construction and smoothing.

Contrast stretching
Colour images can be represented in the linear RGB colour space or in a non-linear transformation of RGB, e.g. HSI (hue, saturation and intensity). It is, in general, easier to discriminate highlights and shadows in a colour image by using the HSI colour space than the RGB colour space, but the hue is rather unstable at low saturation, which makes the segmentation unreliable. Although the three basic RGB components are highly correlated, the RGB colour space is used in this paper due to its efficiency in distinguishing small variations in colour.
All of the RGB channels, especially the blue channel, in an original aerial photo (Fig. 2 (a)) have a relative contrast deficiency, which imposes challenges on the segmentation process. Therefore, contrast stretching is applied to each channel individually, taking the 5% and 95% points of the histogram as the lower and upper bounds over which the image is normalized. It is clear that the contrast-stretched images (shown in Fig. 2 (b), (c) and (d)) have significantly higher contrast than the original RGB channels.
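The per-channel stretch described above can be sketched as follows. This is a minimal illustration, assuming an 8-bit H x W x 3 NumPy array and a linear mapping of the 5% to 95% percentile range onto [0, 255]; the function names `stretch_channel` and `stretch_rgb` are hypothetical and the exact normalization used in the experiments is not stated in the text.

```python
import numpy as np

def stretch_channel(channel, low_pct=5.0, high_pct=95.0):
    """Map the [low_pct, high_pct] percentile range of one channel onto [0, 255]."""
    channel = channel.astype(np.float64)
    lo, hi = np.percentile(channel, [low_pct, high_pct])
    if hi <= lo:
        # Flat channel: there is no range to stretch (assumed fallback).
        return np.zeros(channel.shape, dtype=np.uint8)
    scaled = (channel - lo) / (hi - lo)
    # Values outside the percentile bounds saturate at 0 or 255.
    return (np.clip(scaled, 0.0, 1.0) * 255.0).round().astype(np.uint8)

def stretch_rgb(image):
    """Apply the stretch independently to each channel of an H x W x 3 image."""
    return np.dstack([stretch_channel(image[..., c]) for c in range(image.shape[-1])])
```

Stretching each channel separately, rather than with shared bounds, lets a low-contrast channel such as blue use the full output range.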

Homogeneity histogram construction
The general concept of the homogeneity histogram follows Cheng (2000). The homogeneity histogram takes into account not only the gray level but also the spatial information of pixels with respect to each other. Therefore, homogeneity histogram thresholding tends to be more effective in finding homogeneous regions than conventional histogram thresholding approaches.
The homogeneity vector of a pixel with its eight neighbours is calculated by the Z-function, allowing the homogeneity histogram to be defined by normalization of the homogeneity vector. The normalized homogeneity histograms for the Red, Green and Blue channels are shown in Fig. 3.
It is still difficult to detect the modes of the normalized homogeneity histograms when they are corrupted by noise. Therefore, once the homogeneity histograms for the R, G and B channels are established, a Gaussian filter is first applied to smooth them, instead of finding the thresholds directly with the complex peak finding algorithm proposed by Cheng (2000). In Gaussian filtering process, the spread parameter σ, which determines the  Lin et al. (1996). Each peak in the homogeneity histogram represents a unique region. Accordingly, the valleys in the homogeneity histogram can be used as the thresholds for segmentation, and they can be easily found in the smoothed homogeneity histogram (see Fig. 4).
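The smoothing and valley search described above can be sketched as follows. This is an illustration only: the input is any 1D histogram (the construction of the homogeneity histogram itself is not reproduced), the value of `sigma` is an assumed placeholder, and the helper names are hypothetical.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Normalized 1D Gaussian kernel truncated at about three standard deviations."""
    radius = max(1, int(round(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def smooth_histogram(hist, sigma=3.0):
    """Smooth a histogram with a Gaussian filter, keeping its length unchanged."""
    kernel = gaussian_kernel1d(sigma)
    radius = len(kernel) // 2
    # Repeat the edge values so the ends of the histogram are not pulled to zero.
    padded = np.pad(np.asarray(hist, dtype=np.float64), radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def find_valleys(smoothed):
    """Return indices of strict local minima, used here as candidate thresholds."""
    s = np.asarray(smoothed)
    is_min = (s[1:-1] < s[:-2]) & (s[1:-1] < s[2:])
    return np.flatnonzero(is_min) + 1
```

A larger `sigma` suppresses more spurious valleys at the cost of merging nearby modes, so it trades threshold stability against the number of regions that can be separated.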
Each colour channel is segmented using the above obtained thresholds separately, and then all three segmented channel images are fused to yield the final result of segmentation (see e.g., Fig. 5). It is observed from Fig. 5 (d) that almost all the road networks are correctly extracted, but there are still many small family driveways connected to road networks and many house roofs are misclassified into the road network. These make it impossible to obtain an accurate road network without further processing.

Road network extraction
Up to this point we have obtained the segmented result for road objects (see e.g. Fig. 6(a)), but the probability of misclassification is still relatively high, and the main road network contains many small holes and is connected to narrow pathways. These holes and pathways must be removed to correctly extract the road skeleton. In this section, a novel road network extraction approach is developed to accurately extract road networks from a segmented road image. This extraction process includes two main steps: morphological operation, and thinning and vectorization.

Morphological operation
Mathematical morphology is a structure-based mathematical set theory that uses set operations such as union, intersection and complementation, and it is favoured for high-resolution image processing (Mohammadzadeh et al., 2006). Connected component analysis is first used to group pixels into components based on pixel connectivity, and components whose surface area is smaller than a given threshold are removed. The filtered image is shown in Fig. 6 (b); it can be clearly seen that all the misclassified objects unconnected to the main road network have been removed. Morphological closing is then applied to remove small holes and noise from the road surface, while an opening operation is used to eliminate small pathways, with a structuring element that is smaller than the main road's width but larger than the widths of the pathways, resulting in the extracted road network shown in Fig. 6 (c).
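The sequence above (small-component removal, then closing, then opening) can be sketched with NumPy alone. This is a minimal sketch under stated assumptions: 8-connectivity, a square structuring element of odd side `size`, and pixels outside the image treated as foreground during erosion. The text does not specify these choices, and all function names are hypothetical.

```python
from collections import deque
import numpy as np

def remove_small_components(mask, min_area):
    """Keep only 8-connected foreground components with at least min_area pixels."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    seen = np.zeros_like(mask)
    out = np.zeros_like(mask)
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, pixels = deque([(r0, c0)]), []
        while queue:  # breadth-first flood fill of one component
            r, c = queue.popleft()
            pixels.append((r, c))
            for rr in range(max(r - 1, 0), min(r + 2, h)):
                for cc in range(max(c - 1, 0), min(c + 2, w)):
                    if mask[rr, cc] and not seen[rr, cc]:
                        seen[rr, cc] = True
                        queue.append((rr, cc))
        if len(pixels) >= min_area:
            idx = np.array(pixels)
            out[idx[:, 0], idx[:, 1]] = True
    return out

def dilate(mask, size):
    """Binary dilation with a size x size square structuring element."""
    r = size // 2
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dr in range(size):
        for dc in range(size):
            out |= padded[dr:dr + h, dc:dc + w]
    return out

def erode(mask, size):
    """Binary erosion, computed as the complement of dilating the background."""
    return ~dilate(~mask, size)

def clean_road_mask(mask, min_area, size):
    """Remove small components, then close (fill holes) and open (drop thin paths)."""
    mask = remove_small_components(mask, min_area)
    closed = erode(dilate(mask, size), size)
    return dilate(erode(closed, size), size)
```

In practice `size` would be chosen between the pathway width and the main road width in pixels, as the paragraph above requires.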

Thinning and vectorization
After the morphological operation, we further employ the thinning algorithm proposed by Wang and Zhang (1989) to extract the road skeleton, in which the real road is replaced by its centreline, represented with a width of one pixel. To remove short dangling branches of the centrelines caused by driveways, a novel pruning algorithm is performed. The pruning algorithm includes three steps:
Step 1 Find all the intersection points
1. Scan the image (top to bottom, left to right); if the current pixel P has more than three foreground neighbours, namely,
Step 2 Line tracking
1. If there is no intersection point in the image, go to 3.
2. Track lines from the intersection point.
(a) Starting from the intersection point P found in Step 1, initialize n (the number of P's feature points) arrays to store the lines starting from P.
(b) Set the current tracking pixel to background after storing its position in the array, and continue using the condition in Step 1 to find the next pixel on the current tracking line until reaching an endpoint or another intersection point.
3. Track lines from the endpoint.
(a) Scan the image (top to bottom, left to right).
(b) Find an endpoint, start line tracking from it and set the pixels on the line to background (an endpoint has one feature point under the condition in Step 1).
(c) Continue scanning until the end of the image.
Step 3
Finally, the Douglas-Peucker simplification algorithm, which not only decreases the number of data points but also keeps the simplified shape as close as possible to the original one, is applied to the pruned line network. The whole procedure of vectorization and simplification is shown in Fig. 8. The vectorization process consists of two steps, intersection point searching and line tracking, followed by the pruning of small lines and simplification. The final result is shown in Fig. 9. It can be seen that this approach works well, in that all the small road branches are removed.
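The final simplification step can be illustrated with a short recursive implementation of the Douglas-Peucker algorithm. This is a generic sketch, not the authors' code: points are (x, y) tuples, `tolerance` is an assumed parameter in the same units, and the distance is measured to the chord segment rather than to the infinite line, which is a common variant.

```python
import math

def _point_segment_distance(p, a, b):
    """Distance from point p to the segment joining a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping points farther than tolerance from the chord."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    dists = [_point_segment_distance(p, a, b) for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__)
    if dists[idx] <= tolerance:
        return [a, b]  # every interior point is close enough to the chord
    split = idx + 1  # index of the farthest point in the original list
    left = douglas_peucker(points[:split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right  # drop the duplicated split point

line = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 3), (5, 3.02), (6, 3)]
simplified = douglas_peucker(line, tolerance=0.1)
# simplified == [(0, 0), (3, 0), (4, 3), (6, 3)]
```

The tolerance controls how much a tracked centreline may deviate from its simplified version, so it would normally be tied to the pixel size of the imagery.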

Experimental results and evaluation
In order to demonstrate the performance of the proposed procedures outlined in this paper, two additional experiments have been carried out on QuickBird satellite images, and their extraction accuracies are also evaluated. The final road network extracted using the proposed method is shown in Figure 10. Almost all the main roads are correctly extracted. However, the developed method still experiences difficulties in extracting roads from images where the contrast between the road surface and its surroundings is indistinct, or where shadows exist. This remains an important research topic to be resolved.
Based on the method developed by Wiedemann (1996) for evaluating automatic road extraction systems, we use three indexes to assess the quality of the generated road network. The completeness is defined as the percentage of the correctly extracted data over the reference data, and the correctness represents the percentage of correctly extracted road data over all extracted data. The quality is a more general measure of the final result, combining the completeness and correctness.
The optimum values for the above three indexes are all equal to one. Comparing the automatically achieved results from the proposed process with the manual ones, the following quantified indicators have been calculated and are presented in Table 1. The results demonstrate that the proposed method achieved a high level of accuracy.

Table 1
Variables        Completeness   Correctness   Quality
Figure 9         98.5%          96.2%         94.7%
Figure 10 (a)    98.8%          99.3%         98.1%
Figure 10 (
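The three indexes can be computed as in the sketch below. It assumes the commonly used formulation in terms of matched extraction (TP), unmatched extraction (FP) and unmatched reference (FN), measured for example as road lengths; the function and argument names are hypothetical, and the buffer-based matching that produces these quantities is not shown.

```python
def road_extraction_indexes(tp, fp, fn):
    """Return (completeness, correctness, quality) from matched/unmatched amounts.

    tp: extracted data matched to the reference
    fp: extracted data with no match in the reference
    fn: reference data with no match in the extraction
    """
    completeness = tp / (tp + fn)
    correctness = tp / (tp + fp)
    quality = tp / (tp + fp + fn)
    return completeness, correctness, quality

def quality_from_rates(completeness, correctness):
    """Quality expressed through the other two indexes (same TP/FP/FN model)."""
    return 1.0 / (1.0 / completeness + 1.0 / correctness - 1.0)
```

Because quality penalizes both missed and false extractions, it can never exceed the smaller of completeness and correctness.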

Summary
In this section, we have presented a new approach for road extraction from large scale remote sensing images. The tests have demonstrated that considerable success can be achieved by adopting the overall flowchart presented in this paper, particularly when the contrast between road surface and background is distinct, and there is a significant proportion of road surface in the image. Importantly, a novel algorithm is developed to vectorize and prune the extracted road network. The experimental results for road extraction from aerial photo and QuickBird satellite images demonstrate that the proposed approach could extract most of the main roads despite the fact that some roads are missing or are slightly distorted.

Road detection in urban areas
Accurate and detailed road models are of great importance in many applications, such as traffic monitoring and advanced driver assistance systems. However, the majority of road feature extraction approaches have focused only on the detection of the road centreline rather than the lane details. Only a few approaches have involved the detection of lane markings in the road extraction. For instance, Steger et al. (1997), Hinz and Baumgartner (2003), and Zhang (2004) extracted the road markings in their attempts to obtain clues as to the presence of the road surface. Consequently, important requirements (Tournaire & Paparoditis, 2009) such as robustness, quality and completeness are met less consistently than lane-level applications require. In more recent works, Kim et al. (2006) and Tournaire et al. (2009) presented systems for pavement information extraction from remote sensing images with high spatial resolution.
In this section, the support vector machine (SVM) and Gabor filters are introduced into a framework for precise road model reconstruction from aerial imagery. Experiments on a data set of aerial images acquired in Brisbane, Queensland are used to evaluate the effectiveness of the proposed strategy.

Methodology
A supervised SVM image classification technique is employed to segment the road surface from other ground details, and the road pavement markings are detected on the generated road surface with Gabor filters.
An SVM is basically a linear learning machine based on the principle of optimal separation of classes (Vapnik, 1998). The goal is to find a linear separating hyperplane that separates the classes of interest, provided the data is linearly separable. The hyperplane is a plane in a multidimensional space and is also called a decision surface, an optimal separating hyperplane or a maximal margin hyperplane.
Consider a set of $l$ labelled training patterns $(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_l, y_l)$, where $x_i$ denotes the $i$-th training sample and $y_i \in \{1, -1\}$ denotes the class label. If the data are not linearly separable in the input space, a non-linear transformation function $\Phi(\cdot)$ is used to project $x_i$ from the input space to a higher dimensional feature space. An optimal separating hyperplane is constructed in the feature space by maximizing the margin between the closest points $\Phi(x_i)$ of the two classes. The inner product between two projections is defined by a kernel function $K(x, y) = \Phi(x) \cdot \Phi(y)$. The commonly used kernels include polynomial, Gaussian RBF, and sigmoid kernels. Further details about kernels can be found in (Vapnik, 1998).
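The decision function of the SVM is defined as
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right),$$
where $b$ is the bias and the coefficients $\alpha_i$ are obtained by maximizing
$$W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
subject to $\sum_{i=1}^{l} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$, where $C$ denotes a positive value determining the constraint violation during the training process.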
Owing to its non-parametric nature, sparsity, and intrinsic feature reduction, the SVM is superior to conventional classifiers, such as the maximum likelihood classifier, for image classification in very high resolution (VHR) remotely sensed data, since the distribution function estimated by the latter usually assumes a normal distribution, which may not represent the actual distribution of the data (Huang & Zhang, 2008).
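How a trained kernel SVM classifies a new sample can be sketched as follows. This illustration assumes the standard kernel expansion over support vectors with a Gaussian RBF kernel; the support vectors, coefficients, bias and `gamma` below are made-up values for demonstration, not results of any training run, and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=np.float64) - np.asarray(y, dtype=np.float64)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def svm_decision(x, support_vectors, alphas, labels, bias, gamma=0.5):
    """Classify x as +1 or -1 from the kernel expansion over support vectors."""
    score = sum(
        a * y * rbf_kernel(sv, x, gamma)
        for sv, a, y in zip(support_vectors, alphas, labels)
    )
    return 1 if score + bias >= 0 else -1

# Toy model: one support vector per class (illustrative values only).
support_vectors = [(0.0, 0.0), (4.0, 4.0)]
alphas = [1.0, 1.0]   # satisfy sum(alpha_i * y_i) = 0 with the labels below
labels = [1, -1]
road_like = svm_decision((0.5, 0.5), support_vectors, alphas, labels, bias=0.0)
other = svm_decision((3.5, 4.0), support_vectors, alphas, labels, bias=0.0)
```

Only samples with non-zero coefficients contribute to the sum, which is the sparsity property mentioned above.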

Gabor filters
2D Gabor filters, extended from 1D Gabor filters by Daugman (1985), have been successfully applied to a variety of image processing and pattern recognition problems, such as texture analysis and image segmentation. 2D Gabor filters can be used to extract road lane markings thanks to the following properties: (i) they are tuneable to specific orientations, (ii) their orientation bandwidth is adjustable, and (iii) they are robust to noise. Furthermore, they have optimal joint localization in both the spatial and frequency domains. Therefore, Gabor filters can be considered as orientation and scale tuneable edge and line (bar) detectors (Manjunath & Ma, 1998), which makes them a superior tool for detecting geometrically restricted linear features, such as road pavement markings.

Gabor functions
The general functionality of the 2D Gabor filter family can be represented as a Gaussian function modulated by a complex sinusoidal signal. Specifically, the 2D Gabor filter can be defined in both the spatial domain $g(x, y)$ and the frequency domain $G(u, v)$. The 2D Gabor function in the spatial domain can be formulated as (Cai & Liu, 2000): indicates the peak of the Gaussian envelope; $\sigma_x$, $\sigma_y$ are the two axis scaling parameters of the Gaussian envelope; $(u_0, v_0)$ represents the spatial frequencies of the sinusoid carrier in Cartesian coordinates, which can also be expressed in polar coordinates as $(f, \phi)$, where $f = \sqrt{u_0^2 + v_0^2}$ and $\phi = \arctan(v_0 / u_0)$; and the subscript $r$ stands for a rotation operation as follows:
$$x_r = x \cos\theta + y \sin\theta, \qquad y_r = -x \sin\theta + y \cos\theta,$$
where $\theta$ is the rotation angle of the Gaussian envelope.
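The structure described above, a rotated Gaussian envelope multiplied by a complex sinusoid, can be sketched as a sampled kernel. This is one common form and not necessarily the exact expression of Cai & Liu (2000): the envelope is assumed to be centred at the origin and left unnormalized, and the carrier is assumed to run along the rotated x axis (carrier direction equal to the envelope rotation). The function name and `half_size` are hypothetical.

```python
import numpy as np

def gabor_kernel(f, theta, sigma_x, sigma_y, half_size):
    """Complex 2D Gabor kernel sampled on a (2*half_size+1)^2 pixel grid."""
    y, x = np.mgrid[-half_size:half_size + 1, -half_size:half_size + 1].astype(np.float64)
    # Rotate coordinates by theta, as in the x_r, y_r definitions.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope with axis scales sigma_x and sigma_y (assumed unnormalized).
    envelope = np.exp(-0.5 * ((x_r / sigma_x) ** 2 + (y_r / sigma_y) ** 2))
    # Complex sinusoid carrier of frequency f along the rotated x axis (assumption).
    carrier = np.exp(2j * np.pi * f * x_r)
    return envelope * carrier
```

The real part of the kernel responds to bar-like features and the imaginary part to edges, so the magnitude of the filtered image is often used when both should be detected.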

Determination of Gabor filter parameters
Road markings, which appear as linear features with certain widths and orientations within local areas, can be considered as rectangular pulse lines. The correct determination of the Gabor filter parameters is the central issue in the extraction of lane pavement markings. In order to effectively and accurately extract road lane markings with different sizes and thicknesses from aerial images using Gabor filters, we propose an efficient method to determine the Gabor filter parameters.
Determination of θ
θ stands for the orientation of the span-limited sinusoidal grating. The orientation θ (θ ∈ [0, π)) of the Gaussian envelope is set perpendicular to the direction ϕ (ϕ ∈ [0, π)) of the road surface by:
$$\theta = (\phi + \pi/2) \,\%\, \pi,$$
where % is the modulo operator.
Determination of f
f is the frequency of the sinusoid, which determines the 2D spectral centroid positions of the Gabor filter. This parameter is derived with respect to the width of the road lane markings. In order to produce a single peak for the given lane line as well as discard other ground objects, such as white vehicles, the frequency f of the Gabor filter must satisfy the following conditions: where $W_m$ is the width of the road marking in pixels, and $W'$ is the width of other white features. The details of the proof can be found in (Liu et al., 2003).
In our experiments, we set $f = 1/W_m$, which produces only a single peak in the output of the filter on road markings regardless of the values of $\sigma_x$ and $\sigma_y$.
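The two parameter choices above can be sketched numerically. The sketch assumes that "perpendicular" means adding π/2 to the road direction and wrapping into [0, π), and it uses the setting f = 1/W_m quoted in the text; the function names and the example inputs (a road direction of 130 degrees and a marking width of 6 pixels) are illustrative assumptions.

```python
import math

def gabor_orientation(road_direction_rad):
    """Orientation perpendicular to the road direction, wrapped into [0, pi)."""
    return (road_direction_rad + math.pi / 2.0) % math.pi

def gabor_frequency(marking_width_px):
    """Carrier frequency in cycles per pixel for a marking of the given width."""
    return 1.0 / marking_width_px

theta = gabor_orientation(math.radians(130.0))   # about 40 degrees, in radians
f = gabor_frequency(6)                           # about 0.17 cycles per pixel
```

Wrapping with the modulo keeps θ in the same half-open interval as the road direction, so opposite directions of the same road map to the same filter.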
Determination of $\sigma_x$ and $\sigma_y$
The parameters $\sigma_x$ and $\sigma_y$ determine the spread of the Gabor filter in the x and y directions respectively. According to (Liu et al., 2003), $\sigma_x$ and $\sigma_y$ have the following parameter constraint: where k is a constant. As the road lane markings have strict orientation and enough distance between adjacent lanes, we set k = 1 to simplify the calculation.
The relationship between the orientation bandwidth △θ and the frequency f within the frequency domain is illustrated in figure 1, which can be given by: where △θ is the orientation bandwidth. It gives: Applying the 3 dB frequency bandwidth in the V direction when φ = 90° to equation (2), we have It gives σ x = ln 2 2π d tan (△θ/2) According to the orientation bandwidths of cat cortical simple cells (Liu et al., 2003), the mean angle covers a range from 26° to 39°. After examining the line extraction results over the above range, we find it appropriate to set △θ = 30°. Then $\sigma_x$ and $\sigma_y$ can be further obtained by:

Experiments and discussion
The objective of the experiment is to determine the performance of the proposed road feature extraction approach quantitatively over the study area. A dataset of aerial images located in South Brisbane, Queensland has been selected as the study area. The selected aerial images consist of three bands, Red, Green and Blue, with a Ground Sampling Distance (GSD) of 7 cm. Fig. 11 shows one of the test images. Several training samples were used to train the support vector machine, and the resulting model was used to classify the whole image into two classes: road and non-road. For the implementation of the SVM, the software package LIBSVM by Chang and Lin (2003) was adopted. The Gaussian RBF was used as the kernel function, and the constraint violation C was set to 10. After the image classification, connected component analysis was used to remove small noise regions misclassified into the road class.
At this point, the road surface has been obtained using SVM classification. A Gabor filter was then used to extract the lane marking features while suppressing the influence of other ground objects. To reduce the computational complexity, Principal Component Analysis (PCA) was applied to the colour image and only the first component was chosen for Gabor filtering. The parameters of the Gabor filters are determined as outlined in the previous section. For instance, the orientation of the lane markings shown in Fig. 11 is approximately 130 degrees. The average width of the road markings is 6 pixels, thus the frequency f is set to 0.17, while the axis scaling parameters $\sigma_x$ and $\sigma_y$ of the Gaussian function are set to 3.4. The filtered image is illustrated in figure 4, which was then masked by the road surface acquired in the previous step.
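The reduction of the colour image to its first principal component can be sketched as follows. This is a generic PCA via eigen-decomposition of the band covariance matrix, assuming an H x W x 3 array; the function name and the sign convention are assumptions, and the text does not state which PCA implementation was used.

```python
import numpy as np

def first_principal_component(image):
    """Project an H x W x B image onto its first principal component.

    Returns the H x W score image and the fraction of variance it explains.
    """
    h, w, bands = image.shape
    pixels = image.reshape(-1, bands).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    axis = eigvecs[:, -1]                    # direction of largest variance
    if axis.sum() < 0:
        axis = -axis                         # assumed convention: brighter -> larger score
    scores = (centered @ axis).reshape(h, w)
    return scores, eigvals[-1] / eigvals.sum()
```

Because the RGB bands of aerial imagery are highly correlated, the first component typically carries most of the variance, which is why a single band can be enough for the subsequent Gabor filtering.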
Finally, the Gabor filtered image was segmented by Otsu's thresholding algorithm, and directional morphological opening and closing algorithms were used to remove misclassified features. Some white linear features such as house roof ridges may be misclassified as lane markings, so we further used the road surface extracted in the previous step as a mask to remove these kinds of objects. The lane segments may also be corrupted by many factors: occlusion, e.g. trees above the road surfaces; worn-out painting of lane lines; and dirty markings on the road surfaces. We eliminated the influence of vehicles on the road marking extraction by using the following two indicators: (i) elongation, the ratio of the major axis to the minor axis of the polygon, and (ii) the lengths of the major and minor axes. The elongation of a vehicle is smaller than that of the road lane markings, and the lengths of the major and minor axes of a vehicle are within certain ranges. In this experiment, the major axis length of a vehicle is set to be within 2 to 10 m, while the minor axis is set to be between 1.5 m and 3 m. The extracted pavement markings are superimposed on the road surfaces, as given in Fig. 13.
The quantitative evaluation of the experimental results is achieved by comparing the automated (derived) results against a manually compiled, high quality reference model. Following the concept of the error matrix, the evaluation metrics for the accuracy assessment of road surface detection can be defined at the pixel level as follows: In the above equation, TP (true positive) is the number of road surface pixels correctly identified, FN (false negative) is the number of road surface pixels identified as other objects, and FP (false positive) is the number of non-road pixels identified as road surfaces.
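The Otsu thresholding step applied to the Gabor filtered image can be sketched as below. This is a generic histogram-based implementation that maximizes the between-class variance; the number of bins and the choice to return the upper edge of the selected bin are assumptions, and the function name is hypothetical.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the threshold that maximizes the between-class variance.

    Pixels strictly greater than the returned value form the foreground class.
    """
    values = np.asarray(image, dtype=np.float64).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weights = hist.astype(np.float64)
    w0 = np.cumsum(weights)               # pixel count at or below each bin
    w1 = weights.sum() - w0               # pixel count above each bin
    m0 = np.cumsum(weights * centers)     # cumulative intensity mass
    valid = (w0 > 0) & (w1 > 0)           # both classes must be non-empty
    between = np.zeros(bins)
    mu0 = m0[valid] / w0[valid]
    mu1 = (m0[-1] - m0[valid]) / w1[valid]
    between[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    k = int(np.argmax(between))
    return edges[k + 1]                   # upper edge of the last background bin
```

Otsu's method needs no user-set threshold, which suits filtered images whose response range changes with the Gabor parameters.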
The accuracy of the extracted pavement markings is evaluated by comparing them with manually plotted road markings used as reference data, as presented in (Wiedemann et al., 1998), with both data sets given in vector representation. The buffer width is predefined to be the average width of the road markings, and we set it to 15 cm in our experiment. Then the accuracy measures are given as:
Across all four test sites, nearly 90% of the road surfaces are correctly detected, and the corresponding false alarm rate is about 10%. The completeness of road pavement marking extraction exceeds 87%, except for test site IV, which is seriously affected by shadows. Shadows on the road surfaces reduce the intensity contrast between pavement markings and the road surface background, which makes it difficult to enhance the road markings using the Gabor filter. The average false alarm rate of the four test sites is about 10%.

Summary
In this section, an automatic approach for extracting road surfaces and pavement markings from aerial images with high spatial resolution is proposed. The developed method, which is based on SVM image classification and Gabor filtering, can generate accurate lane-level digital road maps automatically. The experimental results using the aerial image dataset with a ground resolution of 7 cm have demonstrated that the proposed method works satisfactorily. Further work will concentrate on the processing of strongly curved road surfaces and large images, which may be achieved by using knowledge-based image analysis and image partition techniques.

Conclusions
In conclusion, we have presented an integrated approach for road feature extraction from both rural and urban areas. Road surface and lane markings have been extracted from very high resolution (VHR) aerial images in rural areas based on homogeneity histogram thresholding and Gabor filters. The homogeneity histogram image segmentation method takes into account not only the colour information but also the spatial relation among pixels to explore the features of an image. We further proposed a road network vectorization and pruning algorithm, which can effectively eliminate short track segments. In the urban area, the road surface is first classified by the SVM image segmentation method, and a Gabor filter is then employed to enhance the road lane markings whilst constraining the effects of other ground features. The experimental results from several VHR satellite images in rural areas have indicated that over 95% of road networks have been correctly extracted. The omission of road features is a result of occlusions, poor contrast with the surroundings, and partial shadows over the road. This has preliminarily demonstrated that the presented strategy for road feature extraction in rural areas is promising. Experiments with three typical test sites in urban areas have resulted in over 90% of the road surfaces being correctly extracted, with the misclassification rate below 10%. The correctness rate for lane marking extraction is approximately 95%, and only about 10% of the other ground objects are misclassified as lane markings.

Future work
Although the proposed approach has generated satisfactory results on the test datasets, problems still exist: for example, lane markings obstructed by vehicles may not be effectively detected. Therefore, future work will focus on the improvement of detection accuracy and precise model reconstruction. For instance, an automatic vehicle detection approach may be introduced to efficiently detect and remove vehicles from the road surface. GPS real-time kinematic positioning solutions from a probe vehicle could be appropriate for the recovery of lane markings in areas where there are large obstructions: for example, a large number of skyscrapers or trees would greatly deteriorate the extraction result in urban or forest areas. We also consider using a linear feature linking technique to connect broken road features.