Self-driven vehicles slowly but surely are making the transition from a distant future technology to current luxury by slowly becoming a part of our everyday life. Due to their self-driving ability, they are making our travels efficient. However, they are still a work in progress as they require many software and hardware-based improvements. To address the software part of this issue, an image processing-based solution has been proposed in this study. The algorithm estimates the real-time positions and predicts the possible interaction of the objects, such as other moving vehicles, present in the vicinity of the driven autonomous vehicle in determined environmental conditions. Cameras and related peripheral present on autonomous vehicles are used to obtain data related to the real-time situation for predicting and preventing possible hazardous events, such as accidents, using these data.
- autonomous drive
- game theory
- vehicle control
- image processing
The continuous advancements in technology are constantly making our life more and more efficient. To this end, autonomous vehicles are one of the important pillars according to FIA  and the reports from IPSOS also show that the trends will be in the favor of the adaption of autonomous vehicles once the standardization issues are addressed .
Autonomous vehicles use sensors, such as radar, lidar, and camera, to implement computer vision and odometry techniques for sensing the surrounding environment and make the travel without any human intervention possible . When we look into the history of autonomous vehicles, we see that first autonomous projects appear in the 1980s . The first prototype came to life with navlab and ALV projects that ran by Carnegie Mellon University . These prototypes were followed by project Eureka Prometheus by Mercedes-Benz and Bundeswehr partnership in 1987 [4, 5]. After these, many companies have manufactured autonomous vehicles and some of those vehicles have also found a place in active traffic in some countries. It is believed that in the near future, autonomous vehicles will cause an unprecedented change in economic, social, and environmental areas of life .
However, still there remains work to be done before the roads are filled with safe self-driving cars. Researches have been done to make the autonomous vehicles safer and environment friendly so that they can be as close to perfect as possible. A real-time lane detection system for autonomous vehicles is proposed by Abdulhakam A. M. Assidiq et al. utilizing the images captured by the onboard camera installed on a moving vehicle to perform real-time lane detection the lane via image processing using edge detection and Hough Transform . Yen-Lin et al. presented a night-time vehicle detection method for two vehicles, incoming and outgoing, using their head and tail lights . They used image segmentation and spatial clustering processes to perform rule-based vehicle detection and distance estimation by using the head and tail lights of the vehicle. Image processing techniques have also been applied to detect the current state of traffic light by Guo et al. They used Histogram of Oriented Gradient (HOG) features to implement Support Vector Machine (SVM) for the real-time detection of the current state of traffic lights for self-driving vehicles . Phanindra et al. presented a method for detecting obstacles and lane using the data collected by a LIDAR sensor and a fisheye camera . A vector fuzzy connectedness-based algorithm has been proposed by Lingling Fang et al. to detect the boundary of the lane using the captured images by the camera .
Based on these results, in this study, an image processing software has been proposed that can autonomously extract data related to factors of the vehicle environment. The predictions regarding the self-driving vehicle’s environment are carried out on the basis of the proposed three models which are built using the video footage taken from an onboard camera placed inside the vehicle. The first model is an SVM that can detect vehicle positions using the inputted image. The second model fits a linear-parabolic function for the right and left lane boundaries of the vehicle. The third model approximates a bird-view version of the inputted image and carries out a more realistic approximation of steer angle on the bird-view image, rather than original.
2. Methods: research and application steps
2.1 Detection and tracking of lane boundaries
The model developed in this study is designed to work in a highway environment. Model performance is invariant to the color of the road and is directly proportionate to the amount of contrast between the road and lane boundary color, quality, and continuity of lane boundaries. However, the model assumes that the vehicle which is guided is in the center of the lane that the vehicle is present according to the horizontal plane and conducts its operations on this assumption. The rest of this subsection covers the following topics: the detection of the lane boundaries, the approximation of edge distribution function, weighted Hough transform which is used to determine the location of lane boundaries in images and lastly description of both linear and parabolic functions that we approximated for every single lane boundary. The images used in this study are taken by a vehicle-centered camera , and all studies are continued iteratively as a development and test loop on these images. The size of the image array (720, 1280) and different environmental aspects that it contains are shown below (Figure 1).
2.2 Initial detection of lane boundaries
In the first image of the application, we aimed to calculate the slopes of right and left lane boundaries in the image according to the vehicle-centered camera view. Firstly, image is converted to grayscale form and with the help of Sobel filter vertical and horizontal gradient images are obtained. These gradient images are then used to form a magnitude matrix and orientation matrix which contains slope calculation of every pixel in the image. With these matrices, an edge distribution function is formed, and local maxima points of edge distribution function gave us the center-lane boundary slopes on the image. Hough transform is applied with known edge slopes, and global maxima points for every edge are found with weighted voting. The locations of lane boundaries are found after slope and distance from origin calculation.
2.3 Edge distribution function (EDF)
To decrease the computational complexity, images are converted to grayscale. Function (30) is used for this conversion. The grayscale image is named as . With vertical and horizontal differentials of this function, we obtain function .
and represent vertical and horizontal differentials, respectively. and are used to calculate the magnitude matrix which contains a magnitude of every pixel in the image. Eq. (2) is used for that.
After this and matrices are used to calculate the orientation matrix which contains orientation of every pixel in the image. In the calculation of the orientation matrix, a lookup table is used to decrease arctangent values.
A voting routine in which every pixel votes weighted as their magnitude took place. After voting, a histogram of 90 bins with (−90, 90) minimum, maximum values are created.
For extracting local maxima points in the histogram, we first plotted the histogram as a piecewise function. Then smoothed the plot with a one-dimensional Gaussian filter (Figure 2). Filter width of 7 seemed enough to eliminate false positives after some number of tests we did. However, we also tried kernel density estimation to smooth the histogram. But it caused some meaningful local maxima points to shift from its original value or completely disappear, so we removed kernel density estimation from the workflow.
When lane and road boundaries are considered as only linear objects in the image, we can assume that there will be two local maxima that are visibly voted higher than others. The situation on roads with multiple lanes can slightly differ from this and more local maxima points can appear in the edge distribution function. Figure 2 above is generated from a multiple lane road. Below we discussed the local maxima points from this function (Figure 3).
When we consider that degrees shown above represent the degree value of horizontal axis lane boundaries, we can see that −30.8356546 is the right boundary of the center lane, −8.77437326 is the right boundary of the right lane, 14.28969359 is left road boundary, and 33.84401114 is left boundary of the center lane.
Generally, the center orientation of the center lane boundaries will be numerically closer to the vertical axis and have approximately the same absolute values. To extract center lane boundary orientations and to eliminate false positive local maxima points which can be caused by organic variations in the road, we applied thresholding. and are minimum and maximum degree values from local maxima points, respectively. We applied thresholding to these values with Eq. (1). 15, 20, 25 are tested as threshold values and we saw that 20 shows the best performance.
The and couples that do not meet the requirements of this equation are removed from the local maxima list and we continue to apply the equation to the rest of the values. When there are no local maxima points left in the list, the orientation couple that closest to the vertical axis are selected as center lane boundary orientations. But it should be noted that in most cases, local maxima points of edge distribution function contain true positives only and this process is just a formality.
At this point, despite we know the orientations of center lane boundaries, we still do not know the exact locations of these boundaries. To find exact locations, we first threshold the magnitude image according to Eq. (5).
To find the exact locations, we first get indexes of the pixels that have the same orientation as each center lane boundaries, then we zero out all the pixels except these indexes and pixels within a minor threshold value. By the end of this process, we have a magnitude matrix for each center lane boundary (Figure 4). As a result, we found two-lane boundary magnitude matrices and we will proceed to find exact locations of boundaries from these matrices.
2.4 Hough transform
Applying Hough transform on a two-dimensional matrix gives us a two-dimensional C (ρ, θ) accumulator matrix just as our input (Figure 5).
Slope and distance from image origin of found edges are obtained from the analysis of the accumulator matrix. But in this study, we already found the slopes of lane boundaries with techniques that we described in earlier subsections. Thus, we do not need to do our analysis on a two-dimensional matrix. As a result, our accumulator matrix will be independent of θ dimension and voting will take place only for potential ρ values which are only one dimension. Applying Gaussian filter on these accumulator arrays will give us global maxima points. Gaussian filter windows size as 30 is enough in this part to eliminate multiple maxima points and to extract the true ρ value we are searching for which is the global maxima. After extracting global maxima points from each accumulator array, we now have the exact locations of center lane boundaries in the image. After we have exact locations, we will use these locations to specify a region of interest to be used in the next frame and edge search that we carried out will only take place in this region of interest in the next frame which will improve efficiency (Figure 6). Extension of 20 pixels to left and right is chosen as the width of the region of interest. We assume that lane boundary locations found in the last frame will not change drastically in the next frame of image array, this assumption is the main reason we specify a region of interest. Apart from that we also assumed that just as its location, orientation of lane boundary will not change drastically from the last frame processed to the next frame of the image array. So, we applied a threshold to the orientation search as well. We only searched in values that differ slightly from the last orientation we found. After some tests, we decided 4° as the threshold value is optimal.
2.5 Lane tracking
After the initial detection of lane boundaries from the first frame, lane detection should continue in following frames. A linear model is chosen for the initial frame. But in the following frames, we already have prior information on lane boundaries which is the region of interest. So, we can work on a more complex model to predict, a linear-parabolic piecewise function is decided as a model. For this model, horizontal line that separates linear and parabolic parts of function is chosen as approximately 15 m in front of the vehicle and area below and above this boundary named as near and far sections, respectively. The model will act linear in the near section and will act parabolic in the far section. The mathematical representation of the model is shown below Eq. (6).
The model needs to meet two equations to have continuity and differentiable features Eq. (7).
If equations in (8) are solved within themselves for c and e parameters piecewise function below is what we have Eq. (9):
When we assume that both right and left center lane boundaries are found in the last frame, we can expect that both right and left boundaries will be in their corresponding region of interests in the next frame. Before we try to detect center lane boundaries in the next frame threshold application below should be applied within the region of interest to eliminate various noise sources that could appear in the region of interest Eq. (10).
We assume that usually, lane boundaries will have bigger values in magnitude matrices than any other source in the next frame and this assumption is the reason why we apply threshold within the region of interest. Hence, the majority of the pixels that above the threshold from Eq. (10) will belong to lane boundary we try to detect, and the majority of the pixels that below the threshold will be independent of lane boundary and can be considered as noise as a whole. As we remove these magnitudes from the magnitude matrix, we can carry out a more realistic approximation of the real lane boundary.
Let be the near section non-zero pixels from threshold applied lane boundary magnitude matrices and let be the far section non-zero pixels from threshold applied lane boundary magnitude matrices. Also, let be the corresponding magnitude of matrices elements.
This situation gives us three unknown equation below.
A solution is approximated with the normal equation method for the equation above. The below function is used to represent error and tried to minimize it.
Function E could be written as a matrix multiplication below.
Variables which are stated in Eq. (13) are specified below.
To solve the Eq. (13), we isolate variable b with matrix multiplication below.
2.6 Steer angle estimation
Steer angle estimation is prone to errors because as the distance from the camera increases the distortion of the road in the image also increases. For this reason, we first estimated the bird view version of the image with inverse perspective mapping (IPM)  (Figure 8).
We cropped the IPM image in a way that it only contains pixels from the center lane boundaries. Later we extract Canny edges from it (Figure 9).
After that, we apply Hough transform on the Canny image and get a two-dimensional accumulator matrix. Two highest local maxima points from the accumulator matrix give us the angle between the vertical plane and center lane boundaries. We averaged the two angle values and we found to estimate steer angle to keep the car on its lane.
2.7 Vehicle detection and tracking
We used a mix of GTI and KITTI datasets as the dataset. Dataset as a whole contains 8792 positive and 8968 negative and 17,760 examples in total. Every example is a (64, 64) size, RGB encoded, png image.
First, we extracted histogram of oriented gradients in order to be used in training. Histogram of oriented gradients basically puts a grid on image. A magnitude weighted voting takes place for each bin. Then, a histogram is created from voting results.
To shrink the histogram of oriented gradients features space, the principal component analysis is applied. Then the support vector machine is trained. After training, images size of (720, 1280) scanned for vehicle detection with a sliding window. Calculated vehicle positions are used to form a heat map and the thresholded heat map is used to find a singular location for each vehicle (Figure 10).
This paper presents an approach based on image processing using edge distribution function and Hough transform for lane detection and tracking, steer angle estimation and vehicle detection and tracking for the autonomous vehicles. It was found that the instant change in the image feed is one of the most challenging parts in lane detection, and tracking part of this study as the model was vulnerable to instant changes in image feed. In order to prevent that, a temporal filter was applied to the region of interest which allowed shifting and hence increased the model’s resistance to the aforementioned instant changes. According to the results of the tests, it was concluded that that the application of the temporal filter alone on the region of interest will not be ample; therefore, a filter was applied to the orientation of lane boundaries too, and the changes of greater than 2° in one frame. The model was found to be less dynamic, and an increase in its overall prediction accuracy was observed after those added aforementioned additions. One other issue that was discovered for this model is that it could be affected by the color changes. The model was found to be affected by the color changes on the road, shades of other vehicles, and trees on the side of the road. Errors caused by this situation usually hits the parabolic side of the estimation rather than linear.
Steer angle estimation model has a similar problem to the lane detection and tracking model. Considering the techniques that both the models share, it can be said that this was expected. This model is affected by color changes on road and shades also. If the same solution is applied, error can be lowered in color dynamic parts of the road. Another issue that was discovered was related to the parts of the road. The changes in the slope of the road cause an additional error. Because the model uses an inverse perspective mapping method that uses camera parameters as inputs, even if place occupied in Euclidean space by the camera does not change, the vanishing point which is one of the input parameters of IPM changes and this causes the model to make false predictions. To prevent this situation, image processing techniques or additional sensors can be used to estimate road slope. After predicting road slope, a model that can adapt to vanishing points shifts in the image can be developed and the error rate can be decreased drastically.