Smart-Road: Road Damage Estimation Using a Mobile Device

Mexico is located on five tectonic plates whose movements generate seismic activity. Depending on their intensity, these movements affect the telecommunications infrastructure. Earthquakes tend to cause landslides, subsidence, and structural damage to houses, buildings, and roads. On roads, the damage appears as cracks in the pavement, which are classified according to their size, shape, and depth. Current road inspection methods rely mainly on human perception and are limited to a superficial inspection of the terrain, which makes the process ineffective for the timely detection of damage. This work presents a road analysis method that uses a drone to acquire images. A mobile device processes the images and recognizes the damage, determining the damage type on the road. Artificial intelligence techniques are used to classify the damage into linear cracks or zig-zag cracks.


Introduction
A country endowed with good road infrastructure can generate the basic elements of competitiveness, provide opportunities for better economic development, and at the same time promote its social and cultural development [1]. Access roads can be affected by several factors, for example: age, use, excessive weight, material quality, location, and natural disasters. Of particular note is the road damage caused by earthquakes: the movement of the tectonic plates can open fissures at the surface. This damage ranges from small cracks to wide ruptures or separations in the road, and such incidents occur mainly in the seismic areas of the country.

Seismicity in Mexico
The Mexican Republic is located in one of the most seismically active regions in the world, within the area known as the Circumpacific Belt (or Pacific Ring of Fire), where the greatest seismic and volcanic activity on the planet is concentrated [2] (Figure 1).
The Global Seismic Hazard Assessment Program (GSHAP) was a project sponsored by the United Nations (UN) that assembled the first worldwide map of earthquake zones [3]. In Mexico, an earthquake hotspot runs through the Sierra Madre Occidental, from south of Puerto Vallarta along the Pacific coast to the border with Guatemala [3]. Figure 2 shows the seismic regionalization of the Mexican Republic: zone A has the lowest risk, followed by zones B, C, and D. These last three zones are of greatest concern to the scientific community and to the inhabitants of these areas, due to the structural damage that occurs on the roads and in their localities.
Information Extraction and Object Tracking in Digital Video

Road network in Mexico
In Mexico, as in other countries, the road network is the most widely used transport infrastructure. The national network has 378,923 km of avenues, streets, highways, and rural roads that connect practically all the populations of the country. Figure 3 shows the main roads that interconnect the Mexican Republic [4].

Damage to land access roads
Seismic activity is recurrent in certain areas and damages road infrastructure. This damage includes fissures, cracks in the asphalt, landslides, separation of road sections, subsidence, and other damage to the access roads that interconnect the country. The road must be inspected, and the damage detected must be reported and repaired. Figure 4 shows some examples of damage caused by seismicity on different roads in Mexican territory [5]. In Mexico, most road inspections after an earthquake are carried out in person, which can create further problems at critical points. These points require semi-autonomous surveillance systems that use mobile technology to detect damage to access roads. Therefore, we propose a methodology that uses image processing techniques and neural networks to identify road damage, classifying two types of cracks: linear and zig-zag. This chapter is divided into six sections. Section 2 reviews related work, Section 3 explains the methods and materials, Section 4 presents the tests and results, Section 5 the discussion, and finally Section 6 the conclusion.

Related work
Research in artificial intelligence has produced several techniques and practices for automating the detection of road defects. Some related works are described below. In [6], an automatic system for identifying cracks in roads with a camera is developed. It scans the roads by zone and inspects the condition of cracks and fissures. The authors propose the following stages: i) smooth, adjust, and binarize the image using a threshold value, ii) apply morphological operations such as dilation and erosion, iii) eliminate false cracks in the image with smoothing filters, iv) clean the image and connect the crack segments, and finally v) estimate the shape of the crack using geometric characteristics and shape descriptors. In [7], the authors address descriptors based on texture classifiers. These techniques detect color and texture changes in an image and thus identify edges by extracting a set of features generated from histograms. For each frame of an analyzed pavement video, the method extracts the features and creates a binary version of the frame to classify each region.
In [8], the authors apply morphological operations to the images and segment them using the k-nearest neighbors (k-NN) method. The proposed algorithm highlights the texture information of the image, and the results are classified using the standard deviation to define regions delimited by gray intensity. These techniques detect patches on roads in images processed on a smartphone. Figure 5 shows the results presented by the authors.
In [9], a system for identifying cracks in buildings from an unmanned aerial vehicle (UAV) equipped with a camera is presented. The UAV flies around the building to acquire images, which are transmitted via Wi-Fi to a computer for processing. For segmentation, the images are converted from the Red, Green, Blue (RGB) color space to grayscale. The threshold is calculated with statistical methods (mean and standard deviation) to classify pixels as black or white and identify the cracks in the building.
Maeda et al. [9] developed a system for identifying cracks in the pavement from images captured by a smartphone mounted in a holder on the dashboard of a car. Their application analyzes the smartphone images with deep neural networks to identify road cracks; they use Region-based Convolutional Neural Networks (R-CNN), You Only Look Once (YOLO), and Single Shot MultiBox Detector (SSD) to extract features from the region of interest (the cracks). In 2019, Zhang et al. [10] proposed an intelligent monitoring system to evaluate pavement damage. Their methodology uses Harris corner detection to obtain a set of points from images acquired by a UAV and performs the processing in the cloud to identify cracks in the pavement.

Methods and materials
Figure 6 shows the general architecture of the proposed methodology for the identification and classification of linear and zig-zag cracks. The methodology is composed of several stages: image acquisition, pre-processing, descriptors, classification, and result. Each of these stages is detailed below.

Image acquisition
In this step, the image is taken with the camera of the PARROT BEBOP 2 FPV drone, which has the following characteristics: a 14-megapixel camera with a wide-angle lens, a digital image stabilization system, live video on a smartphone or tablet with a 180° viewing angle, RAW, JPEG, and DNG photo formats, and an image resolution of 3800 × 3188 pixels. To automate the route of the drone, a function that traces the flight path is used. Figure 7 shows the programmed route map.

Dimension reduction
In this section, the image is scaled down using the Haar discrete wavelet transform (DWT-H). Figure 8 shows three levels of decomposition.
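A minimal sketch of the dimension-reduction step, assuming a grayscale image stored as a list of rows whose dimensions are even at every level. It keeps only the Haar approximation (LL) sub-band, which is the part used to shrink the image. The function name haar_approximation is hypothetical, and normalizing each 2 × 2 block by 1/4 (its average) instead of the orthonormal factor 1/2 is an assumption made so that intensities stay in the original range.

```python
def haar_approximation(img, levels=1):
    """Return the LL (approximation) sub-band after `levels` 2-D Haar steps.

    `img` is a list of rows of grayscale values. Each level averages
    non-overlapping 2 x 2 blocks, halving both dimensions. The 1/4
    normalization (block average) is an assumption; the orthonormal
    Haar transform scales the LL coefficients by 1/2 instead.
    """
    out = [list(map(float, row)) for row in img]
    for _ in range(levels):
        h, w = len(out), len(out[0])
        if h % 2 or w % 2:
            raise ValueError("image dimensions must be even at every level")
        out = [
            [
                (out[r][c] + out[r][c + 1] + out[r + 1][c] + out[r + 1][c + 1]) / 4
                for c in range(0, w, 2)
            ]
            for r in range(0, h, 2)
        ]
    return out


image = [[1, 3, 5, 7],
         [1, 3, 5, 7],
         [2, 2, 4, 4],
         [2, 2, 4, 4]]
print(haar_approximation(image, levels=1))  # [[2.0, 6.0], [2.0, 4.0]]
print(haar_approximation(image, levels=2))  # [[3.5]]
```

Each level halves both sides, so a 2048 × 2048 image becomes 1024 × 1024 after one level and 128 × 128 after four levels.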

Edge enhancement
To enhance the edges of the image obtained in Section 3.2.1, a Laplacian filter is proposed. The Laplacian of an image highlights regions of rapid intensity change and is an example of a second-order (second-derivative) enhancement method. It is particularly good at finding the fine details of an image: any feature with a sharp discontinuity is enhanced by a Laplacian operator [11]. The Laplacian is a well-known linear differential operator approximating the second derivative, given by Eq. (1):

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \qquad (1)$$

where f denotes the image. In practice, a 3 × 3 kernel is convolved with the image (Figure 9).
Figure 10 shows the Laplacian filtering process. Starting from the image obtained by the DWT-H (Figure 10a), the convolution is performed with the proposed 3 × 3 kernel (Figure 10b). Finally, the sub-image of the crack is obtained with its edges highlighted, as seen in Figure 10d.
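The convolution of Figure 9 can be sketched as follows. The coefficients of the proposed kernel appear only in Figure 10b, so this sketch assumes the common 4-neighbor Laplacian kernel; it computes only interior ("valid") pixels and forms the enhanced image as g = f − ∇²f, the usual sign for a kernel with a negative center. The names LAPLACIAN_4, convolve3x3, and laplacian_enhance are hypothetical.

```python
# Common 4-neighbour Laplacian kernel (an assumption: the chapter's
# proposed 3 x 3 kernel is shown only in Figure 10b).
LAPLACIAN_4 = [[0, 1, 0],
               [1, -4, 1],
               [0, 1, 0]]


def convolve3x3(img, kernel):
    """'Valid' 2-D convolution of a list-of-rows image with a 3 x 3 kernel.

    The output has shape (h - 2) x (w - 2); border pixels are skipped.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            acc = 0
            for j in range(3):
                for i in range(3):
                    # Flip the kernel so this is a true convolution.
                    acc += kernel[2 - j][2 - i] * img[y + j - 1][x + i - 1]
            row.append(acc)
        out.append(row)
    return out


def laplacian_enhance(img, kernel=LAPLACIAN_4):
    """Edge enhancement g = f - lap(f) on interior pixels (negative-centre kernel)."""
    lap = convolve3x3(img, kernel)
    return [[img[y + 1][x + 1] - lap[y][x] for x in range(len(lap[0]))]
            for y in range(len(lap))]


step = [[0, 0, 10, 10] for _ in range(3)]  # vertical intensity step
print(convolve3x3(step, LAPLACIAN_4))  # [[10, -10]]
print(laplacian_enhance(step))         # [[-10, 20]]
```

The sign change of the Laplacian response across the step (a zero crossing) marks the edge, and subtracting it from the image exaggerates the contrast on both sides, which is what makes thin crack contours stand out.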

Feature extraction
One of the main objectives of this work is to implement the methodology on a mobile device that processes the images offline and delivers the result on site. It is therefore essential to extract only the key points that carry information about the salient features of the image, so that their classification with the LeNet convolutional neural network is efficient. To this end, the features are extracted with the scale-invariant feature transform (SIFT) and by rearranging, through statistical moments, the pixels output by the Laplacian filter. The feature extraction is described below.

Statistical central moments
Central moments, also referred to as moments about the mean, are calculated as in [12], Eq. (2):

$$\mu_m = \sum_{n=0}^{L-1} (X_n - y)^m \, t(X_n) \qquad (2)$$

where m is the order of the moment, L is the number of possible intensity values, X_n is the discrete variable that represents the intensity level in the image, y is the mean of the values, and t(X_n) is the estimated probability of occurrence of X_n, Eq. (3).
The mean is the first-order (raw) moment, while the variance, skewness, and kurtosis are based on the second, third, and fourth central moments. The mean measures the average intensity value of the pixel distribution. The variance (μ2) measures how widely the pixel intensities spread around the mean, Eq. (4):

$$\mu_2 = \sum_{n=0}^{L-1} (X_n - y)^2 \, t(X_n) \qquad (4)$$

To determine the dispersion of the values located as key points by SIFT, the second central moment is used to group the pixels of the image processed by the Laplacian filter. The texture smoothness R is defined by Eq. (5), where μ2 is the variance and x is an intensity level. The following condition is then established by Eq. (6).
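A small sketch of Eqs. (2) to (4) computed from an image histogram, plus a smoothness measure. Eq. (5) is not printed in this text, so the sketch assumes the standard textbook form R = 1 − 1/(1 + σ²), with the variance normalized by (L − 1)² so that R lies in [0, 1); the condition of Eq. (6) is not reproduced. All function names are hypothetical.

```python
def intensity_probabilities(pixels, levels=256):
    """t(X_n): estimated probability of each intensity level X_n = 0..L-1."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    total = len(pixels)
    return [c / total for c in counts]


def mean_intensity(probs):
    """y: mean intensity, the sum of X_n * t(X_n)."""
    return sum(x * p for x, p in enumerate(probs))


def central_moment(probs, m):
    """mu_m = sum over n of (X_n - y)^m * t(X_n), as in Eq. (2)."""
    y = mean_intensity(probs)
    return sum((x - y) ** m * p for x, p in enumerate(probs))


def smoothness(probs):
    """Assumed form of Eq. (5): R = 1 - 1 / (1 + normalized variance)."""
    levels = len(probs)
    var = central_moment(probs, 2) / (levels - 1) ** 2
    return 1 - 1 / (1 + var)


t = intensity_probabilities([0, 0, 2, 2], levels=4)
print(mean_intensity(t), central_moment(t, 2), central_moment(t, 3))  # 1.0 1.0 0.0
print(round(smoothness(t), 6))  # 0.1
```

R is 0 for a region of constant intensity and grows toward 1 as the intensities spread out, so it separates smooth pavement from the high-variance response the Laplacian filter produces along a crack.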

SIFT
According to the SIFT methodology [13], the first step is scale detection. For the crack contour, this step is very useful for identifying the crack, since the apparent size of the crack varies with the shooting distance. The formal description of this step is given below.

Scale detection
The scale space L(x, y, σ) of an image is obtained by convolving the input image I_Fissure with a Gaussian filter G(x, y, σ) at different scales of σ (σ = 0.5 [13]), Eq. (7):

$$L(x, y, \sigma) = G(x, y, \sigma) * I_{Fissure}(x, y), \qquad G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)} \qquad (7)$$

where G(x, y, σ) is the Gaussian filter function, applied in both dimensions (x, y) of the I_Fissure image plane.
To obtain versions of I_Fissure at different scales, σ is multiplied by different values of a constant k (k > 1) to obtain the contiguous scales, and each scale is subtracted from the original scale, giving the differences from the original image I_Fissure, Eq. (8):

$$D(x, y, \sigma) = \left(G(x, y, k\sigma) - G(x, y, \sigma)\right) * I_{Fissure}(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \qquad (8)$$

The search for extrema in scale space produces multiple candidates; the low-contrast candidates are discarded because they are not stable under changes in lighting and noise. The locations of the points of interest within the image are given by Eq. (9) [13]. The key points obtained from Eq. (9) are then assigned an orientation, as explained below.
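A single-octave sketch of Eqs. (7) and (8) and of the extremum search, assuming intensities scaled to [0, 1]. The value σ = 0.5 follows the text; k = √2, three difference-of-Gaussian (DoG) layers, clamp-to-edge borders, and the 0.03 contrast threshold (the value used in Lowe's SIFT) are assumptions, and the sub-pixel location refinement of Eq. (9) is omitted. Function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights covering about +/- 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur(img, sigma):
    """L(x, y, sigma) of Eq. (7): separable Gaussian blur, clamp-to-edge borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    # Horizontal pass, then vertical pass (the 2-D Gaussian is separable).
    tmp = [[sum(k[i + r] * img[y][min(max(x + i, 0), w - 1)] for i in range(-r, r + 1))
            for x in range(w)] for y in range(h)]
    return [[sum(k[i + r] * tmp[min(max(y + i, 0), h - 1)][x] for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def dog_stack(img, sigma=0.5, k=math.sqrt(2), layers=3):
    """D_s = L(k^(s+1) sigma) - L(k^s sigma), Eq. (8), for s = 0..layers-1."""
    blurred = [gaussian_blur(img, sigma * k ** s) for s in range(layers + 1)]
    return [[[b1[y][x] - b0[y][x] for x in range(len(img[0]))]
             for y in range(len(img))]
            for b0, b1 in zip(blurred, blurred[1:])]


def dog_extrema(dogs, contrast=0.03):
    """(layer, y, x) of points larger or smaller than all 26 scale-space
    neighbours, skipping low-contrast candidates with |D| < contrast."""
    found = []
    h, w = len(dogs[0]), len(dogs[0][0])
    for s in range(1, len(dogs) - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dogs[s][y][x]
                if abs(v) < contrast:
                    continue
                neighbours = [dogs[s + ds][y + dy][x + dx]
                              for ds in (-1, 0, 1)
                              for dy in (-1, 0, 1)
                              for dx in (-1, 0, 1)
                              if (ds, dy, dx) != (0, 0, 0)]
                if v > max(neighbours) or v < min(neighbours):
                    found.append((s, y, x))
    return found


flat = [[0.5] * 9 for _ in range(9)]
print(dog_extrema(dog_stack(flat)))  # []
```

A uniform image yields a flat DoG stack, so every candidate fails the contrast test; only intensity structures such as crack edges produce extrema.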

Orientation mapping
This step assigns a consistent orientation to each key point based on local properties of the image obtained in the previous steps. The key point descriptor can then be represented relative to this orientation, achieving invariance to rotation, which is important because the image can be taken at different shooting angles. The orientation of the points is found as follows [13]: the scale of each point of interest selected in Eq. (4) is used to compute its orientation. Finally, the descriptors of the key points obtained in the previous steps identify the points of interest, as shown in Figure 11.
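The orientation assignment can be sketched with the gradient magnitude and orientation computed by pixel differences on the smoothed image L and accumulated in a 36-bin histogram (10° per bin) whose peak gives the key point orientation, as in Lowe's SIFT. This is a simplified sketch under stated assumptions: the window radius is arbitrary, and the Gaussian weighting of the window, secondary peaks, and peak interpolation of full SIFT are omitted. The function name is hypothetical.

```python
import math


def dominant_orientation(L, y, x, radius=3, bins=36):
    """Peak of a magnitude-weighted gradient-orientation histogram around (y, x).

    L is a smoothed image (list of rows). Returns the centre of the peak bin
    in degrees, or None if the window has no gradient.
    """
    hist = [0.0] * bins
    h, w = len(L), len(L[0])
    bin_width = 360 / bins
    for yy in range(max(1, y - radius), min(h - 1, y + radius + 1)):
        for xx in range(max(1, x - radius), min(w - 1, x + radius + 1)):
            dx = L[yy][xx + 1] - L[yy][xx - 1]   # L(x+1, y) - L(x-1, y)
            dy = L[yy + 1][xx] - L[yy - 1][xx]   # L(x, y+1) - L(x, y-1)
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue
            theta = math.degrees(math.atan2(dy, dx)) % 360
            hist[int(theta // bin_width) % bins] += magnitude
    if max(hist) == 0:
        return None
    peak = max(range(bins), key=hist.__getitem__)
    return (peak + 0.5) * bin_width


ramp_x = [[x for x in range(7)] for _ in range(7)]        # brighter to the right
ramp_diag = [[x + y for x in range(7)] for y in range(7)]  # brighter down-right
print(dominant_orientation(ramp_x, 3, 3))     # 5.0
print(dominant_orientation(ramp_diag, 3, 3))  # 45.0
```

Rotating the descriptor so that this angle becomes its reference direction is what makes the same crack comparable across images taken at different shooting angles.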

Convolutional neural network LeNet
Based on the extracted features, the neural network is trained to identify the cracks that appear in the image. The network used in this research is the LeNet convolutional neural network, which has five layers of neurons, an input of 1024 × 1024 × 3 values, and an output of two possible classes [14]. LeNet is a compact network suited to mobile devices, which makes detection and processing on the mobile device more efficient. The network architecture is shown in Figure 12 [14]. For training the network, approximately 500 images were collected in various areas of Tecámac, State of Mexico.

Figure 12. LeNet neural network architecture [14].

In this investigation, cracks of different intensities are detected, identified, and classified into the following categories [15]:
Erratic or zig-zag cracks (ZZC): cracks in the pavement with erratic longitudinal patterns, caused by extreme changes in temperature, a defective base, and seismic movements.
Significant cracks (LC): cracks with a length greater than 30 centimeters.
Very significant cracks (VSC): cracks in the pavement with a length greater than 60 centimeters, which pose a risk due to their size. These cracks are the most visible.
Non-significant cracks (NSC): fine cracks in the pavement with a length of less than 30 centimeters.
Figures 13 and 14 show images of the classifications delimited for identification. They define the two classes to be detected: zig-zag crack (ZZC) and linear crack (LC), respectively.

Tests and results
The tests were divided into phases in order to measure the time of each one and thus detect which phase consumes the most time. The processing was performed on a Motorola X4 mobile device with a 2.2 GHz processor and 3 GB of RAM. To select the optimal distance for taking the images, tests were carried out between 10 and 30 meters above ground level. At each distance, we ensured that the images were clear and that the crack was visible. Figure 15 shows the relationship between height and image visibility for the 500 sample images. From Figure 15, at a height of 10 meters the drone has a visibility radius of 26 meters; similarly, heights of 15, 20, 25, and 30 meters correspond to visibility radii of 40, 53, 67, and 80 meters. For our case, we consider a height between 15 and 20 meters.
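The height and visibility pairs reported for Figure 15 are close to proportional. The sketch below, an illustrative calculation rather than part of the methodology, fits radius = c × height by least squares through the origin, which gives c = 5995/2250 ≈ 2.66, and uses it to interpolate the radius at intermediate heights. The variable names are hypothetical.

```python
heights = [10, 15, 20, 25, 30]   # flight height (m), reported for Figure 15
radii = [26, 40, 53, 67, 80]     # visibility radius (m), reported for Figure 15

# Least-squares slope of the line radius = c * height through the origin.
c = sum(h * r for h, r in zip(heights, radii)) / sum(h * h for h in heights)


def visibility_radius(height_m):
    """Interpolated visibility radius (m) for a flight height inside 10-30 m."""
    return c * height_m


print(round(c, 3))                        # 2.664
print(round(visibility_radius(17.5), 1))  # 46.6
```

Within the tested range the fitted line stays within about 0.6 m of every reported radius, so the chosen band of 15 to 20 meters corresponds to a visibility radius of roughly 40 to 53 meters.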

Phase 1. Distance estimation
To validate the distances shown in Figure 15 and their visibility range, four consecutive objects were placed on the crack in the road. In Figure 16, only three of them can be observed, enclosed in circles. The objects placed on the crack measure 10 × 10 cm and were used to estimate the field of view of the drone camera. Based on these tests, a height of 15 meters is proposed for clear detection of the objects by the drone, together with the drone's stability in the air currents present during the tests.

Phase 2. Estimation of the pre-processing stage
Table 1 shows the average times calculated over the acquired samples; as the DWT-H decomposition level increases, the average processing time increases. Table 1 also gives the processing times of the feature extraction and classification stages for each DWT-H decomposition level. The initial image is 2048 × 2048 pixels, and decomposition level 4 gives a processing time of 14,445 ms for the two proposed stages.

Phase 3. Descriptors
Table 2 shows the average results obtained for the 500 images in the feature extraction stage. From Table 1, it is concluded that the optimal decomposition for this estimation is the fourth wavelet decomposition level.

Phase 4. Classification of images
The tests to validate the proposed methodology were carried out with 150 images acquired at a height of 15 meters. Four cases were considered in the test scenarios: two correct classifications and two misclassifications. The correct classifications are true positives (TP) and true negatives (TN), and the misclassifications are false positives (FP) and false negatives (FN). From these counts, different performance measures can be obtained [13]: specificity (Sp) is the ability to detect non-crack pixels; sensitivity (Se) reflects the ability of the algorithm to detect the edge of the crack; and accuracy (Acc) is the number of correctly classified pixels (the sum of true positives and true negatives) divided by the total number of pixels that constitute the image of the cracks [13], that is, the probability that a pixel belonging to the crack image is correctly identified. Table 3 shows the results obtained from the 150 test images acquired in flight: 140 TP, 1 FN, 5 FP, and 4 TN.
Table 4 shows the Acc, Sp, and Se obtained for the 150 acquired test images. An Acc of 99.29% was obtained, which indicates the percentage of cracks that were detected and classified correctly. In addition, Sp reached 96.55%, reflecting the correct identification of non-crack results, and Se reached 80%.
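The sketch below computes metrics from the Table 3 counts. With the textbook definitions Acc = (TP + TN)/(TP + TN + FP + FN), Se = TP/(TP + FN), and Sp = TN/(TN + FP), these counts give Acc = 96.00%, Se ≈ 99.29%, and Sp ≈ 44.44%. The Table 4 values of 99.29%, 96.55%, and 80% coincide numerically with TP/(TP + FN), TP/(TP + FP), and TN/(TN + FN) computed from the same counts, so the sketch prints both sets for comparison. The function names are hypothetical.

```python
def standard_metrics(tp, fn, fp, tn):
    """Textbook accuracy, sensitivity (recall), and specificity."""
    return {
        "Acc": (tp + tn) / (tp + fn + fp + tn),
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
    }


def table4_ratios(tp, fn, fp, tn):
    """Ratios whose values coincide with the Table 4 percentages."""
    return {
        "TP/(TP+FN)": tp / (tp + fn),
        "TP/(TP+FP)": tp / (tp + fp),
        "TN/(TN+FN)": tn / (tn + fn),
    }


counts = dict(tp=140, fn=1, fp=5, tn=4)  # Table 3
for name, value in {**standard_metrics(**counts), **table4_ratios(**counts)}.items():
    print(f"{name}: {value:.2%}")
# Acc: 96.00%
# Se: 99.29%
# Sp: 44.44%
# TP/(TP+FN): 99.29%
# TP/(TP+FP): 96.55%
# TN/(TN+FN): 80.00%
```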
In Figure 17, some images obtained through the proposed methodology and the result obtained from the classification are shown.
Finally, the mobile application that serves as the development and user interface is shown in Figure 18.

Discussion
Based on the road-monitoring tests carried out with the Parrot drone, we observed that a height between 15 and 20 meters gives satisfactory results. Within the proposal, the size reduction stage sped up feature extraction, as did the reduction of the key points obtained by the statistical descriptors and SIFT through Eq. (6). These stages are fundamental because all crack detection and identification processing is carried out on a mid-range mobile device. The preprocessing stage streamlined the LeNet neural network stage without affecting the precision obtained (99%), even though the data fed into the network were limited.

Conclusion
In conclusion, the objectives sought for the identification of cracks in roads, streets, highways, and avenues were achieved. The methodology has the characteristics needed to run the proposed processes on a mobile device, and it was demonstrated on the Android platform, currently one of the most widely used mobile platforms worldwide. The preprocessing results show a clear trend in the time required to adapt an image and perform the crack identification process, which does not exceed 14.79 ms thanks to the use of DWT-H instead of other image size reduction processes of greater computational complexity. In addition, the results show that the proposed operations are 99% accurate in finding cracks. It was also found that the times of certain stages can be improved by changing some processes, such as image scaling, which reduces the time by up to 200 milliseconds, among other possible improvements.

Figure 1. Circumpacific Belt Zone. Source: National Institute of Statistics and Geography.

Figure 2. Areas with the highest propensity to earthquakes in Mexico [3].

Figure 5. Detection of the mobile damage system: a) hole in the pavement, b) longitudinal crack, c) transverse crack, and d) horizontal crack [8].

Figure 6. Proposed architecture for the identification and classification of cracks.

Figure 7. Simulation of a programmed route at 15 m.

Figure 9. Convolution of the 3 × 3 kernel at a point (x, y) in the image.

Figure 10. The result obtained by the Laplacian filter.

Figure 11. Results obtained from the proposed methodology: a) original image, b) image obtained with the DWT-H, c) image with the Laplacian filter, d) image obtained with the descriptors, and e) final image.

Figure 16. Range of visibility of objects: a) and b) no objects present; c) at 15 meters; d) at 30 meters.

Figure 17. Results obtained by the proposed methodology: a), c), and e) original images; b) and d) processed images (ZZC); f) processed image (LC).

Figure 18. Graphical user interface for interaction with the proposed methodology: a) LC detected and b) ZZC detected.

Table 1. Pre-processing stage processing time results.

Table 2. Descriptor stage processing time results.

Table 4. Results obtained from the acquired images.