Stereo Matching Method and Height Estimation for Unmanned Helicopter

Vision plays a fundamental role for living beings by allowing them to interact with the environment effectively and efficiently. The ultimate goal of Machine Vision is to endow artificial systems with the capabilities needed to cope with situations that are not predetermined a priori. To this end, visual processing techniques must be continuously adapted and optimized, taking into account the computing constraints of the host architectures and the specifications of the tasks to be accomplished. Moreover, by exploiting the low-cost computational power of off-the-shelf computing devices, Machine Vision is no longer limited to industrial environments, where situations and tasks are simplified and very specific.


Introduction
The research and development of autonomous unmanned helicopters has been under way for more than a decade. Unmanned aerial vehicles (UAVs) are very useful for aerial photography, gas pollution detection, rescue, and military applications. UAVs could potentially replace human beings in performing a variety of tedious or arduous tasks. Because of these wide-ranging uses, the theory and applications of UAV systems have become popular contemporary research topics. There are many types of UAVs with different functions, but they can generally be divided into two major categories: fixed-wing and rotary-wing. Fixed-wing UAVs can carry out long-distance and high-altitude reconnaissance missions; however, flight control of fixed-wing UAVs is not easy in low-altitude conditions. Conversely, rotary-wing UAVs can hover at low altitude while conducting surveys, photography, or other investigations. Consequently, in some applications rotary-wing UAVs are more useful than fixed-wing UAVs. One common type of rotary-wing UAV is the AUH (autonomous unmanned helicopter). AUHs are characterized by 6-DOF flight dynamics, VTOL (vertical take-off and landing), and the ability to hover. These attributes make AUHs ideal for aerial photography or investigation in areas that limit maneuverability.
During the past few years, the development of unmanned helicopters has been an important research subject. Much research has focused on more intelligent designs of autonomous controllers for the basic flight modes of unmanned helicopters (Fang et al., 2008). The controller design of AUHs requires feedback signals from multiple sensors to sense the states of motion. The basic flight modes of unmanned helicopters are vertical take-off, hovering, and landing. Because the unmanned helicopter is a highly nonlinear system, many researchers focus on its dynamic control problems (e.g., Kadmiry & Driankov, 2004; C. Wang et al., 2009), in which appropriate sensors play a very important role. Moreover, the most important flight mode of an autonomous unmanned helicopter is the landing mode. For the landing problem, height information is usually provided by the global positioning system (GPS) and an inertial measurement unit (IMU). The autonomous unmanned helicopter is a 6-DOF system, with 3-axis rotation information provided by the IMU and 3-axis displacement information provided by GPS. Oh et al. (2006) proposed a tether-guided method for autonomous helicopter landing. Many studies have used vision systems for controlling helicopters and searching for landmarks (Lin, 2007; Mori, 2007; C.C. Wang et al., 2009). Saito et al. (2007) discussed camera-image-based relative pose and motion estimation for unmanned helicopters. Katzourakis et al. (2009) and Xu et al. (2006) discussed navigation and landing with stereo vision systems; Xu et al. used a stereo vision system to estimate the position of the body and showed that stereo vision works for position estimation.
For autonomous landing of an unmanned helicopter, height information is very important. However, the height error of GPS is generally about 5 to 8 meters, which is not accurate enough for autonomous landing. For example, the specified accuracy of the Garmin GPS 18-5Hz is within 15 meters (GPS 18 Technical Specifications, 2005), and after repeated measurements the average error of this GPS was found to be around 10 meters. To overcome the height measurement error of GPS, which typically ranges from 5 to 8 meters, a dedicated stereo vision system is designed to assist GPS, and its measurement range is set to at least 6 m.
Image systems are common guidance sensors. In AUH control problems, image systems are usually combined with an IMU and GPS in outdoor environments. Image systems have been used on vehicles for navigation, obstacle avoidance, and position estimation. Doehler & Korn (2003) proposed an algorithm that extracts the edges of the runway to compute the position of an airplane. Bagen et al. (2009) and Johnson et al. (2005) discussed image-guided methods that use two or more images to guide an RC unmanned helicopter toward a landmark. Multiple-camera measurement systems are undoubtedly an effective and mature approach. However, the payload capacity of a small unmanned helicopter must be considered, so the smaller the image system, the better. A dedicated stereo vision system is therefore developed to reduce the payload in our application.
In this chapter, we focus on estimating the height of the helicopter for the landing problem using a simple stereo vision system. The key problem of a stereo vision system is to find corresponding points in the left and right images. For this correspondence problem, two methods are proposed for searching corresponding points between the left and right images. The first method searches for corresponding points using epipolar geometry and the fundamental matrix. The epipolar geometry is the intrinsic projective geometry between two cameras (Zhang, 1996; Han & Park, 2000); it depends only on the cameras' internal parameters and relative position. The second method is the block matching algorithm (BMA) (Gyaourova et al., 2003; Liang & Kuo, 2008; Tao et al., 2008), which is used to search for corresponding points in low-resolution images. BMA is compared with the epipolar geometry constraint method through experimental results.
In addition, a dedicated stereo vision system is designed to assist GPS. The stereo vision system, composed of two webcams with a resolution of 0.3 megapixels, is shown in Figure 1. To simplify the system, we removed the covers of the webcams, so the whole system is very light and thin. The resolution of the cameras affects the accuracy of the height estimation. A variable-baseline method is introduced to increase the measuring range. Details are given in the following sections.

Design of stereo vision system
Depth measuring by triangulation
In general, projecting a 3D scene onto a 2D image loses the depth information. Stereo vision is very useful for measuring depth, and the most commonly used method is triangulation.
Consider a point P = (X, Y, Z) in 3D space captured by a stereo vision system and projected onto both the left and right images, as illustrated in Figure 2. In Figure 2, the projected coordinates of P on the left and right images are (x_l, y_l) and (x_r, y_r), respectively. With the left optical center at the origin and the right camera displaced by b along the X axis (parallel optical axes), the formation of the left image is:

$$x_l = \frac{fX}{Z}, \qquad y_l = \frac{fY}{Z} \tag{1}$$

and the formation of the right image is:

$$x_r = \frac{f(X - b)}{Z}, \qquad y_r = \frac{fY}{Z} \tag{2}$$

From (1) and (2), we have

$$Z = \frac{fb}{x_l - x_r} = \frac{fb}{\Delta x} \tag{3}$$

where f is the focal length, b is the length of the baseline, and Δx = x_l - x_r is the disparity. From (3), the accuracy of f, b, and Δx affects the depth measurement. In the next section, the cameras will be calibrated to obtain accurate camera parameters.
There are three major procedures in stereo vision system design. First, distinct feature points in the image need to be extracted quickly and accurately. Second, corresponding points between the two images are searched for. Finally, the depth is computed using (3).
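The last procedure, equation (3), can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; it assumes the focal length is expressed in pixels (so it shares units with the disparity) and the baseline in metres, and the function name and the numbers in the usage line are hypothetical.

```python
def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * b / (x_l - x_r), following equation (3).

    focal_px   -- focal length in pixels (same unit as the image x coordinates)
    baseline_m -- distance between the two optical centres in metres
    Returns the depth in metres.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        # Zero or negative disparity: point at infinity or a wrong match.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity


# Hypothetical numbers: f = 500 px, b = 10 cm, matched columns 340 and 320.
print(depth_from_disparity(340.0, 320.0, focal_px=500.0, baseline_m=0.10))  # prints 2.5
```

Because Z is inversely proportional to Δx, a one-pixel matching error matters much more for distant points (small disparity) than for near ones.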

Depth resolution of stereo vision system
The depth resolution is a very important factor in stereo vision system design (Cyganek & Siebert, 2009). The pixel resolution becomes coarser as the depth increases. The relations that determine the depth resolution are illustrated in Figure 3. From Figure 3, by the similarity of triangles In addition, (6) holds under the condition: For a single image, f, b, and p are all constants; thus, a single image provides no depth information. Furthermore, considering Figure 4, we have: where k is the horizontal view angle and P_h is the horizontal resolution of the camera.
Combining (6) with (10), we have: From (11), the relation between the baseline and the pixel resolution is shown in Figure 5. The pixel resolution and the baseline are clearly in a nonlinear relation; in fact, they are almost inversely proportional. The accuracy of the system therefore depends on choosing an appropriate baseline: in general, if a small pixel resolution is desired, a longer baseline should be chosen.
Fig. 5. The pixel resolution H for different baselines of the stereo vision system setup.
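The near inverse proportion between baseline and depth resolution can be checked numerically. The sketch below uses two textbook pinhole relations as stand-ins rather than the chapter's equations (6), (10), and (11): the focal length in pixels, f_px = (P_h / 2) / tan(k / 2), and, from Z = f·b/d, the depth change caused by a one-pixel drop in disparity, ΔZ = f·b/(d - 1) - f·b/d = Z² / (f·b - Z). The 60° view angle is a hypothetical value, not the webcams' measured parameter.

```python
import math


def focal_length_px(h_resolution_px: int, h_view_angle_deg: float) -> float:
    """Pinhole relation: half the image width subtends half the horizontal view angle."""
    return (h_resolution_px / 2) / math.tan(math.radians(h_view_angle_deg) / 2)


def depth_step(z_m: float, baseline_m: float, focal_px: float) -> float:
    """Depth change when the disparity at depth z drops by one pixel.

    With d = f*b/Z, the next resolvable depth is f*b/(d - 1), so
    delta_Z = Z**2 / (f*b - Z).  Requires a disparity larger than one pixel.
    """
    fb = focal_px * baseline_m
    if fb <= z_m:
        raise ValueError("disparity is at most one pixel; depth is not resolvable")
    return z_m * z_m / (fb - z_m)


f = focal_length_px(640, 60.0)  # 640-pixel-wide image, hypothetical 60-degree view angle
for b in (0.10, 0.15, 0.25):
    print(f"baseline {b:.2f} m -> depth step at 6 m: {depth_step(6.0, b, f):.3f} m")
```

Doubling the baseline roughly halves ΔZ when f·b is much larger than Z, which matches the qualitative trend described for Figure 5.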

Searching for corresponding points
A stereo vision system includes matching and 3D reconstruction processes. Disparity estimation is the most important part of stereo vision: the disparity is computed by a matching method, and the 3D scene can then be reconstructed from the disparity. The basic idea of disparity estimation is to use the pixel intensities of a point and its neighborhood in one image as a matching template and to search for the best-matching area in the other image (Alagoz, 2008; Wang & Yang, 2011). The similarity between the two images is measured by correlation functions. Depending on the matching unit, there are two major categories of matching methods, discussed below: area-based matching and feature-based matching.

Area-based matching method
Many area-based matching methods have been proposed. With area-based matching, a dense disparity field can be obtained without detecting image features. In general, these methods give good results on images of flat, richly textured scenes. Template matching and block matching are relatively prevalent among the various area-based matching methods. Hu (2008) proposed an adaptive template to increase matching accuracy. Another example was proposed by Siebert et al. (2000); this approach uses 1D area-based matching along the horizontal scanline, as illustrated in Figure 6. Bedekar and Haralick (1995) proposed a search method based on Bayesian triangulation. Moreover, Tico et al. (1999) found corresponding points of fingerprints with geometric invariant representations. Another case is area matching and depth map reconstruction with the Tsukuba stereo pair (Cyganek, 2005, 2006). In this case, the matching area is 3×3 pixels and the image size is 344×288 pixels (downloaded from http://vision.middlebury.edu/stereo/eval/). The disparity and depth maps are reconstructed, and the depth information of the 3D scene is obtained. The results are illustrated in Figure 7.
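The 1D area-based matching along a horizontal scanline (Figure 6) can be sketched as follows. This is an illustrative sketch, not Siebert et al.'s method: it assumes rectified images (corresponding points share a row), uses SAD over a (2·half + 1)-pixel window as the matching cost, and searches disparities d = 0..max_disp with x_r = x_l - d. The function name and parameter defaults are hypothetical.

```python
import random


def scanline_disparity(left_row, right_row, x, half=2, max_disp=16):
    """Return (disparity, cost) for pixel x of left_row by 1D window matching.

    The window left_row[x-half : x+half+1] is compared, using the sum of absolute
    differences, with windows centred at x - d in right_row for d = 0..max_disp.
    """
    if x - half < 0 or x + half >= len(left_row):
        raise ValueError("matching window falls outside the row")
    best_d, best_cost = None, float("inf")
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half < 0:
            break  # the candidate window would leave the right row
        cost = sum(abs(left_row[x + i] - right_row[xr + i])
                   for i in range(-half, half + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


# Synthetic scanline pair: the right row is the left row shifted by 5 pixels.
rng = random.Random(0)
left = [rng.randrange(256) for _ in range(40)]
right = [left[x + 5] if x + 5 < len(left) else rng.randrange(256) for x in range(40)]
print(scanline_disparity(left, right, x=20))
```

Repeating this for every pixel of every row yields the dense disparity field mentioned above, at the computational cost the next paragraph points out.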
However, area-based matching still has some restrictions. First, the matching template is built from pixel intensities, so the matching performance depends on brightness, contrast, and texture; if the brightness changes a lot or the texture is monotonous, the matching performance will be poor. Second, the matching results are poor when the image contains depth discontinuities or occlusion (masking). Last, the computational complexity is very high. Therefore, feature-based matching methods have been developed to overcome these shortcomings of area-based matching.
www.intechopen.com Machine Vision -Applications and Systems 30

Feature-based matching method
Feature-based matching finds corresponding points in the images using features of the scene. Features highlight spatial information more readily than the pixel intensities of an area, and feature-based matching is more robust to brightness changes than area-based matching. Feature-based matching consists of two steps: feature extraction and feature matching. The features are usually lines, corners, or planes in the image, and specific operators are used to extract them.
Many feature-based matching methods have been proposed for finding feature correspondences between the left and right images. Both the intensity and the orientation of the features can serve as matching templates. Therefore, feature-based matching obtains better results under depth discontinuities or occlusion. In addition, feature-based matching computes only on the features instead of all pixels, so its computational load is smaller than that of area-based matching. Olson (2002) proposed a statistics-based matching method that extracts a few eigenvectors as matching templates and uses maximum likelihood for template matching.
Phase-based image matching is also a very accurate matching method (Muquit et al., 2006). The images are transformed into the frequency domain by the 2D discrete Fourier transform (2D DFT), and the best matching vector is obtained by computing the phase correlation function. Figure 8 shows a simulation of phase-based image matching; the example image is "Cristo Redentor" in Brazil, and both images are 119×127 pixels. Like area-based matching, feature-based matching has two restrictions. First, a dense disparity field cannot be obtained, so it is not easy to reconstruct a complex 3D scene. Second, the matching performance is directly affected by the feature extraction results; if the features are too sparse, the matching results will be poor. Since both area-based and feature-based matching have restrictions, hybrid matching algorithms have been proposed in recent years. For example, Fuh et al. (1989) combined optical flow and block matching to increase matching performance. In this chapter, we combine feature points with the epipolar geometry constraint to reduce computation.
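Phase correlation as described above can be sketched with a naive 2D DFT in plain Python (only suitable for tiny arrays; real code would use an FFT library). This is a sketch of the standard technique, not the implementation of Muquit et al.: the cross-power spectrum conj(F1)·F2 is normalized to unit magnitude, inverse-transformed, and the location of its real-valued peak is returned. With this ordering the peak lands at the circular shift (dy, dx) of the second image relative to the first; swapping the images moves it to the wrapped position (-dy mod M, -dx mod N). That wrap-around is one plausible reading of the Figure 8 caption, where a peak at (111, 126) in a 119×127 image corresponds to a motion vector of (8, 1).

```python
import cmath
import random


def dft2(a, inverse=False):
    """Naive 2D DFT (O(M^2 N^2)); adequate only for tiny demonstration arrays."""
    m, n = len(a), len(a[0])
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(m)]
    for u in range(m):
        for v in range(n):
            s = 0j
            for y in range(m):
                for x in range(n):
                    s += a[y][x] * cmath.exp(sign * 2j * cmath.pi * (u * y / m + v * x / n))
            out[u][v] = s / (m * n) if inverse else s
    return out


def phase_correlation(f1, f2, eps=1e-12):
    """Return the (dy, dx) peak of the phase correlation surface of f1 and f2."""
    F1, F2 = dft2(f1), dft2(f2)
    m, n = len(f1), len(f1[0])
    # Normalized cross-power spectrum: keep only the phase difference.
    cross = [[F1[u][v].conjugate() * F2[u][v] for v in range(n)] for u in range(m)]
    spectrum = [[c / (abs(c) + eps) for c in row] for row in cross]
    surface = dft2(spectrum, inverse=True)
    _, dy, dx = max((surface[y][x].real, y, x) for y in range(m) for x in range(n))
    return dy, dx


rng = random.Random(1)
img = [[rng.random() for _ in range(8)] for _ in range(8)]
# Circularly shift the image by 2 rows and 3 columns.
shifted = [[img[(y - 2) % 8][(x - 3) % 8] for x in range(8)] for y in range(8)]
print(phase_correlation(img, shifted))  # peak at the shift
print(phase_correlation(shifted, img))  # peak at the wrapped shift
```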

Feature points detection
An image usually contains millions of pixels, so extracting significant feature points is an active research topic. Edge detection (Canny, 1986), Tabu search (Glover, 1989, 1990), neural networks (NN) (Takahashi et al., 2006), and the Hough transform (Duan et al., 2010) are useful methods for extracting particular features from an image. However, the point sets of lines or edges are still too large. In addition, most feature extraction algorithms, such as Tabu search, are not fast enough for real-time stereo vision systems. Consequently, the Harris corner detector (Nixon & Aguado, 2008) is adopted for detecting feature points.
The Harris corner detector computes a corner response for each pixel in the image from Gaussian-weighted products of the image gradients. The Gaussian weighting not only enhances significant corners but also suppresses undesirable ones, which reduces the probability of misjudgment. Although the Harris corner detector is a very useful tool, it requires a lot of computing time. Therefore, corner detection is applied only within the basic rectangle to reduce computation time. The following demo example shows the results of corner detection on a landmark image. The procedure is as follows.
Step 1. Detect the corners of the image in the basic rectangle.
Step 2. Detect the convex corners of the label 'H'.
Step 3. Label the convex corners and extract the four most outside corners of the label 'H'.
Step 4. Find the intersection of the diagonals, and designate it as the approximate center of the landmark image.
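Step 4 reduces to intersecting two lines. Below is a small sketch, assuming the four outermost corners from Step 3 are already ordered as top-left, top-right, bottom-right, bottom-left; this ordering convention and the function name are hypothetical.

```python
def diagonal_intersection(tl, tr, br, bl, tol=1e-12):
    """Intersection of the diagonals tl-br and tr-bl of a quadrilateral.

    Each corner is an (x, y) pair in image coordinates.  Uses the standard
    two-line intersection formula based on 2x2 determinants.
    """
    (x1, y1), (x2, y2) = tl, br  # first diagonal
    (x3, y3), (x4, y4) = tr, bl  # second diagonal
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(den) < tol:
        raise ValueError("diagonals are parallel; corners may be mislabelled")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / den
    py = (a * (y3 - y4) - (y1 - y2) * b) / den
    return px, py


# Corners of a slightly skewed 'H' outline (hypothetical pixel coordinates).
print(diagonal_intersection((0, 0), (8, 0), (10, 6), (2, 6)))  # (5.0, 3.0)
```

Under perspective the diagonal intersection is the image of the landmark's true center only approximately, which is why the text calls it the approximate center.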
Some examples demonstrating the Harris corner detector are shown in Figure 9, from which several advantages can be summarized. The Harris corner detector is very robust for corner detection. Moreover, it responds not only to the edges but also to the corners of the object. Restricting detection to the basic rectangle greatly improves the detection efficiency.
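The Harris response described above can be sketched in plain Python. The sketch computes central-difference gradients, accumulates the Gaussian-weighted structure tensor M over a (2·radius + 1)² window, and returns R = det(M) - κ·trace(M)². The values κ = 0.04, σ = 1, and radius = 2 are common defaults assumed here, not the chapter's settings, and the basic-rectangle restriction is omitted. Large positive R indicates a corner, negative R an edge, and R ≈ 0 a flat region.

```python
import math


def harris_response(img, k=0.04, sigma=1.0, radius=2):
    """Harris corner response R = det(M) - k * trace(M)**2 for every pixel.

    img is a list of rows of grey levels. Gradients on the 1-pixel border are 0,
    and pixels within `radius` of the border (window outside the image) stay 0.
    """
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    offsets = range(-radius, radius + 1)
    g = {(dy, dx): math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
         for dy in offsets for dx in offsets}
    r = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            sxx = syy = sxy = 0.0
            for (dy, dx), wgt in g.items():
                gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                sxx += wgt * gx * gx
                syy += wgt * gy * gy
                sxy += wgt * gx * gy
            r[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return r


# Synthetic 20x20 image: a bright square occupying rows/columns 6..13.
img = [[1.0 if 6 <= y <= 13 and 6 <= x <= 13 else 0.0 for x in range(20)] for y in range(20)]
R = harris_response(img)
print(R[6][6] > 0, R[6][10] < 0, R[10][10])  # corner, edge, flat interior: True True 0.0
```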

Epipolar geometry constraints
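The epipolar geometry is the intrinsic projective geometry between two cameras; it depends only on the cameras' internal parameters and relative position. The fundamental matrix F is the algebraic representation of the epipolar geometry. The epipolar geometry between two views is illustrated in Figure 10. In Figure 10, a point K in 3D space is projected onto the image planes Π and Π'. The point k in Π corresponds to an epipolar line l' on Π', which can be represented as:

$$l' = F\,\tilde{k}, \qquad \tilde{k} = (x,\ y,\ 1)^{T}$$

where F is the fundamental matrix, k = (x, y), and l' = (a, b, c)^T. Since the point k' = (x', y') lies on l', we have:

$$\tilde{k}'^{T} F\,\tilde{k} = 0, \qquad \tilde{k}' = (x',\ y',\ 1)^{T}$$

Based on Hartley's 8-point algorithm (Hartley, 1995), each correspondence gives one linear equation in the nine entries of F, and n correspondences can be stacked as

$$\begin{bmatrix} x'_1x_1 & x'_1y_1 & x'_1 & y'_1x_1 & y'_1y_1 & y'_1 & x_1 & y_1 & 1\\ \vdots & & & & & & & & \vdots\\ x'_nx_n & x'_ny_n & x'_n & y'_nx_n & y'_ny_n & y'_n & x_n & y_n & 1 \end{bmatrix}\mathbf{f} = \mathbf{0}, \qquad \mathbf{f} = (F_{11}, F_{12}, F_{13}, F_{21}, F_{22}, F_{23}, F_{31}, F_{32}, F_{33})^{T} \tag{16}$$

With 8 points, (16) can be solved up to scale.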
Under the epipolar geometry constraint, the search for corresponding points is reduced from the whole 2D image to a single line.
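The fundamental-matrix relations above can be exercised with a small sketch. It is a simplified, unnormalized variant of the 8-point algorithm, not Hartley's normalized version: it fixes F33 = 1 (assuming that entry is non-zero), solves the remaining 8×8 linear system from (16) by Gaussian elimination, and does not enforce rank 2. The synthetic two-camera setup (normalized image coordinates, a right camera rotated 0.1 rad about the Y axis and centred at (1, 0.2, 0)) and all function names are hypothetical. The epipolar line of a ninth point, not used in the estimate, confirms that its match lies on l' = F k̃, which is what reduces the correspondence search to one line.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (degenerate point configuration?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def eight_point(pts_left, pts_right):
    """Estimate F from 8 correspondences with F33 fixed to 1 (simplified variant)."""
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(pts_left, pts_right):
        # One row of (16); the F33 * 1 term moves to the right-hand side.
        rows.append([xp * x, xp * y, xp, yp * x, yp * y, yp, x, y])
        rhs.append(-1.0)
    f = solve_linear(rows, rhs) + [1.0]
    return [f[0:3], f[3:6], f[6:9]]


def epipolar_line(F, x, y):
    """Coefficients (a, b, c) of l' = F (x, y, 1)^T in the right image."""
    return tuple(F[i][0] * x + F[i][1] * y + F[i][2] for i in range(3))


def project_pair(X, Y, Z, theta=0.1, centre=(1.0, 0.2, 0.0)):
    """Normalized projections of (X, Y, Z) in a left camera [I|0] and a rotated right camera."""
    c, s = math.cos(theta), math.sin(theta)
    dx, dy, dz = X - centre[0], Y - centre[1], Z - centre[2]
    xr, yr, zr = c * dx + s * dz, dy, -s * dx + c * dz  # rotation about the Y axis
    return (X / Z, Y / Z), (xr / zr, yr / zr)


rng = random.Random(2)
pairs = [project_pair(rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform(4, 8)) for _ in range(9)]
F = eight_point([p[0] for p in pairs[:8]], [p[1] for p in pairs[:8]])
(x, y), (xp, yp) = pairs[8]  # a ninth point not used in the estimate
a, b, c = epipolar_line(F, x, y)
print(a * xp + b * yp + c)  # close to 0: the match lies on the epipolar line
```

With noisy real matches, more than eight points, coordinate normalization, and a least-squares or SVD solution would be needed.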

Block matching algorithm (BMA)
The key point of stereo vision is how to search for corresponding points quickly and effectively. The epipolar geometry constraint is a typical technique for finding corresponding points. However, movement of the stereo vision system blurs the images, and since the resolution of the webcams is quite low, searching for corresponding points becomes difficult. Here we apply the block matching algorithm (BMA) to search for the corresponding points.
BMA is a standard technique for motion estimation in video coding. It aims at detecting similar blocks between two images. The matching efficiency depends on the chosen block size and search region, and choosing an appropriate block size is not easy: bigger blocks are usually less sensitive to noise but require more computation time. Fast methods for searching corresponding points have been proposed in some representative studies (e.g., Tao et al., 2008). Tao's method matches the reference block in the left image with candidate blocks on the epipolar line in the right image. Here we need more accurate matching results to find the corresponding points, so full search (FS) is used to achieve better results.
As shown in Figure 11, the sum of absolute differences (SAD) is used to measure block similarity. The first (top-left) pixel of the chosen block in the left image is (x, y), and the block size is N×N. The search region in the right image is defined as a rectangle whose width is the image width and whose height is 2k. Since the left and right cameras of the stereo vision system are placed on the same horizontal line, k can be kept small to reduce computation time.
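A minimal full-search BMA with the SAD cost, following the description above: an N×N block whose top-left pixel is (x, y) in the left image is compared with every candidate block in a band of rows y - k .. y + k spanning the full width of the right image. The defaults n = 5 and k = 5 echo the 5×5 block and 640×10 search range used later in the experiments, but the function names are hypothetical, and images are assumed to be grey-level lists of rows indexed as image[y][x].

```python
import random


def block_sad(left, right, xl, yl, xr, yr, n):
    """Sum of absolute differences between two N x N blocks (top-left corners given)."""
    return sum(abs(left[yl + i][xl + j] - right[yr + i][xr + j])
               for i in range(n) for j in range(n))


def full_search_bma(left, right, x, y, n=5, k=5):
    """Exhaustively search the band of rows y-k..y+k of `right` for the left block at (x, y).

    Returns (min SAD, x, y) of the best-matching block in the right image.
    """
    h, w = len(right), len(right[0])
    best = None
    for yr in range(max(0, y - k), min(h - n, y + k) + 1):
        for xr in range(0, w - n + 1):
            cost = block_sad(left, right, x, y, xr, yr, n)
            if best is None or cost < best[0]:
                best = (cost, xr, yr)
    return best


# Synthetic pair: the right image is the left image shifted 4 pixels to the left.
rng = random.Random(3)
h, w, d = 20, 30, 4
left = [[rng.randrange(256) for _ in range(w)] for _ in range(h)]
right = [[left[y][x + d] if x + d < w else rng.randrange(256) for x in range(w)]
         for y in range(h)]
cost, xr, yr = full_search_bma(left, right, x=12, y=8)
print(cost, xr, yr, "disparity:", 12 - xr)
```

The cost grows with the band height, so a small k is what keeps full search affordable for a roughly rectified camera pair.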

Matching cost function
The matching cost function measures the degree of correctness of a match: the smaller the matching cost, the better the match (for NCC, which is a similarity measure, a larger value indicates a better match). Besides SAD, the sum of squared differences (SSD) and normalized cross-correlation (NCC) are frequently used matching cost functions. The SAD, SSD, and NCC functions are given in (17)-(19).
  ) ) 00 ( , , , ) R_Image L_image where m and n are the length and width of the block, (x, y) is the position of the block on right image and (r, s) denotes the motion vector.
Comparing these matching cost functions with norms in algebra, SAD is analogous to the ℓ1 norm of the difference and SSD to the squared ℓ2 norm, while NCC uses an inner-product-like operation. The computational complexity of SAD is clearly lower than that of the other functions, and SAD is the most frequently used matching cost function in applications because it is one of the most computationally efficient. The advantages of SAD have been noted in the literature (Humenberger, 2010; Point Grey Research Inc., 2000; Bradski, 2010). In our application, computation time is an important factor; therefore, SAD is used as the matching cost function for searching correspondences.
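The three cost functions (17)-(19) can be written directly for two equally sized blocks. This sketch assumes blocks are lists of rows of grey levels and uses the plain (non-zero-mean) NCC form; the function names are illustrative. The usage example shows the trade-off discussed above: doubling the intensities of a block leaves NCC at 1, while SAD and SSD, which are cheaper, grow with the brightness change.

```python
import math


def sad(a, b):
    """Sum of absolute differences (l1-like); lower is better."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def ssd(a, b):
    """Sum of squared differences (squared-l2-like); lower is better."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def ncc(a, b):
    """Normalized cross-correlation (inner-product-like); higher is better, max 1."""
    num = sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    den = math.sqrt(sum(p * p for r in a for p in r) * sum(q * q for r in b for q in r))
    return num / den if den else 0.0


block = [[1, 2], [3, 4]]
brighter = [[2, 4], [6, 8]]  # same pattern, doubled intensity
print(sad(block, brighter), ssd(block, brighter), ncc(block, brighter))  # 10 30 1.0
```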

Camera calibration
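Camera calibration builds the relationship between world coordinates and their corresponding image coordinates. Consider the pinhole camera model (David & Ponce, 2002) shown in Figure 12. The camera parameters can be divided into intrinsic and extrinsic parameters (Hartley & Zisserman, 2003). Based on the collinearity equation and the pinhole camera model (Luhmann et al., 2007), the transformation between an image point (u, v) and a reference point (X, Y, Z) can be represented as

$$u = u_0 + d_u\,\frac{r_{11}(X-X_0) + r_{12}(Y-Y_0) + r_{13}(Z-Z_0)}{r_{31}(X-X_0) + r_{32}(Y-Y_0) + r_{33}(Z-Z_0)}, \qquad v = v_0 + d_v\,\frac{r_{21}(X-X_0) + r_{22}(Y-Y_0) + r_{23}(Z-Z_0)}{r_{31}(X-X_0) + r_{32}(Y-Y_0) + r_{33}(Z-Z_0)} \tag{20}$$

where (u_0, v_0) is the principal point, d_u and d_v are the principal distances in pixel units, r_ij are the elements of the rotation matrix R, and (X_0, Y_0, Z_0) is the camera center. Rearranging (20), we have:

$$u = \frac{L_1X + L_2Y + L_3Z + L_4}{L_9X + L_{10}Y + L_{11}Z + 1}, \qquad v = \frac{L_5X + L_6Y + L_7Z + L_8}{L_9X + L_{10}Y + L_{11}Z + 1} \tag{21}$$

Equation (21) is equivalent to the linear form

$$\begin{aligned} L_1X + L_2Y + L_3Z + L_4 - uXL_9 - uYL_{10} - uZL_{11} &= u\\ L_5X + L_6Y + L_7Z + L_8 - vXL_9 - vYL_{10} - vZL_{11} &= v \end{aligned} \tag{22}$$

The coefficients L_1 ~ L_11 are called DLT parameters. Since each control point provides two equations in the 11 unknowns, at least 6 control points are required to solve (22).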
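The spatial orientation of the camera can be reconstructed from the DLT parameters. With D^2 = L_9^2 + L_10^2 + L_11^2, the principal point is

$$u_0 = \frac{L_1L_9 + L_2L_{10} + L_3L_{11}}{D^2}, \qquad v_0 = \frac{L_5L_9 + L_6L_{10} + L_7L_{11}}{D^2} \tag{23}$$

and the principal distances are

$$d_u = \sqrt{\frac{L_1^2 + L_2^2 + L_3^2}{D^2} - u_0^2}, \qquad d_v = \sqrt{\frac{L_5^2 + L_6^2 + L_7^2}{D^2} - v_0^2} \tag{24}$$

The elements of the rotation matrix R are (j = 1, 2, 3, with the sign of D chosen so that R is a proper rotation)

$$r_{1j} = \frac{L_j - u_0L_{8+j}}{d_u D}, \qquad r_{2j} = \frac{L_{4+j} - v_0L_{8+j}}{d_v D}, \qquad r_{3j} = \frac{L_{8+j}}{D} \tag{25}$$

The position of the camera center is given by

$$\begin{bmatrix} X_0\\ Y_0\\ Z_0 \end{bmatrix} = -\begin{bmatrix} L_1 & L_2 & L_3\\ L_5 & L_6 & L_7\\ L_9 & L_{10} & L_{11} \end{bmatrix}^{-1}\begin{bmatrix} L_4\\ L_8\\ 1 \end{bmatrix} \tag{26}$$

From (23)-(26), the camera matrix is obtained as

$$P = K\,[\,R \mid t\,]$$

where K, the four-parameter intrinsic matrix, is given by

$$K = \begin{bmatrix} d_u & 0 & u_0\\ 0 & d_v & v_0\\ 0 & 0 & 1 \end{bmatrix}$$

and t = -R(X_0, Y_0, Z_0)^T is the translation vector.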
There are many ways to solve for the camera matrix P; least squares, SVD, or the pseudoinverse can be used for an over-determined system. For example, Hartley (1997) used a specific form of Kruppa's equations, expressed explicitly in terms of the singular value decomposition (SVD) of the fundamental matrix, to calculate the focal length of the camera; Zhang (1999) proposed a camera calibration procedure using a model plane; and Heikkilä (2000) proposed a four-step camera calibration procedure to solve the projective relation between the model plane and the image plane. Here, the webcams are calibrated with Zhang's procedure to solve for the camera matrix P.
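To illustrate the remark that the DLT needs at least 6 control points, and how L1 ~ L11 decompose into intrinsic quantities, here is a sketch of a least-squares DLT. It is not the authors' procedure (they use Zhang's method). It assumes the DLT form u = (L1X + L2Y + L3Z + L4)/(L9X + L10Y + L11Z + 1), v = (L5X + L6Y + L7Z + L8)/(L9X + L10Y + L11Z + 1), zero skew, and non-coplanar control points. It normalizes the image coordinates and solves the normal equations with a small Gaussian-elimination helper; an SVD-based solver would be more robust in practice. The synthetic camera and all function names are hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-15:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dlt_project(L, X, Y, Z):
    """Image point (u, v) of world point (X, Y, Z) under the 11 DLT parameters."""
    den = L[8] * X + L[9] * Y + L[10] * Z + 1.0
    return ((L[0] * X + L[1] * Y + L[2] * Z + L[3]) / den,
            (L[4] * X + L[5] * Y + L[6] * Z + L[7]) / den)


def dlt_calibrate(world, image):
    """Least-squares DLT parameters L1..L11 from >= 6 non-coplanar control points."""
    if len(world) < 6:
        raise ValueError("the DLT needs at least 6 control points")
    n = len(image)
    # Centre and scale the image coordinates to keep the normal equations well conditioned.
    cu = sum(u for u, _ in image) / n
    cv = sum(v for _, v in image) / n
    s = math.sqrt(sum((u - cu) ** 2 + (v - cv) ** 2 for u, v in image) / n) or 1.0
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(world, image):
        un, vn = (u - cu) / s, (v - cv) / s
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -un * X, -un * Y, -un * Z])
        rhs.append(un)
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -vn * X, -vn * Y, -vn * Z])
        rhs.append(vn)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(11)] for i in range(11)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(11)]
    Ln = solve_linear(ata, atb)
    # Undo the normalization: u = s * un + cu, v = s * vn + cv.
    L = [s * Ln[i] + cu * Ln[8 + i] for i in range(3)] + [s * Ln[3] + cu]
    L += [s * Ln[4 + i] + cv * Ln[8 + i] for i in range(3)] + [s * Ln[7] + cv]
    return L + Ln[8:11]


def dlt_decompose(L):
    """Principal point, principal distances and camera centre from the DLT parameters."""
    d2 = L[8] ** 2 + L[9] ** 2 + L[10] ** 2
    u0 = (L[0] * L[8] + L[1] * L[9] + L[2] * L[10]) / d2
    v0 = (L[4] * L[8] + L[5] * L[9] + L[6] * L[10]) / d2
    du = math.sqrt((L[0] ** 2 + L[1] ** 2 + L[2] ** 2) / d2 - u0 ** 2)
    dv = math.sqrt((L[4] ** 2 + L[5] ** 2 + L[6] ** 2) / d2 - v0 ** 2)
    centre = solve_linear([[L[0], L[1], L[2]], [L[4], L[5], L[6]], [L[8], L[9], L[10]]],
                          [-L[3], -L[7], -1.0])
    return (u0, v0), (du, dv), centre


# Hypothetical camera: d_u = d_v = 500 px, principal point (320, 240), R = I, centre (0, 0, -5).
true_L = [100.0, 0.0, 64.0, 320.0, 0.0, 100.0, 48.0, 240.0, 0.0, 0.0, 0.2]
rng = random.Random(7)
world = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(10)]
image = [dlt_project(true_L, *p) for p in world]
L = dlt_calibrate(world, image)
print(dlt_decompose(L))
```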

Experimental results
In our application, the stereo system provides real-time height information for AUHs, so the method must be simple and fast. Local-search BMA and the epipolar geometry constraint are used to search for stereo corresponding points.
This section has three parts. In the first part, corresponding points are searched with the fundamental matrix, and the BMA search is verified. Next, simulations of height estimation for AUHs are presented. The third part compares our methods with some other methods.

Measurement results of epipolar geometry constraints and BMA
Using the webcam parameters in Table 1, the corresponding points can be obtained with BMA and with the epipolar geometry constraint. As mentioned above, the fundamental matrix can be computed from the left and right camera matrices. An example of two images under the epipolar geometry constraint is shown in Figure 13: the right images are the reference images, and the corresponding points lie on the epipolar lines in the left images. In the simulations, the measurement range is from 50 cm to 600 cm. Figure 15 shows that the estimation error is less than 10 cm at a distance of 225 cm with a 10 cm baseline, at 300 cm with a 15 cm baseline, and at 425 cm and 475 cm with a 25 cm baseline. Hence, the wider the baseline, the longer the measurable distance. For BMA, Figure 16 shows a 5×5-pixel template block in the left image, a 640×10-pixel search range, and a target distance of 700 cm. The estimation results of BMA for different distances are shown in Figure 17.
From Figures 15 and 17, we conclude that the measurable distance increases with the baseline. We also conclude that, for the same error tolerance, the measurable distance of BMA is longer than that of the epipolar geometry constraint method. However, as the measuring distance increases, the disparities found by BMA fall into almost the same range because of the low image resolution, so the measurement error increases quickly.

The simulations of height estimation for AUHs
In this section, several stereo pairs of aerial photographs are captured to demonstrate our methods. The estimation results of the simulations are illustrated in Figure 23, where the x-axis is the baseline length, the y-axis is the estimation error, and all quantities are in meters. The figure shows that the estimation error decreases as the baseline increases, and that for a fixed baseline the estimation error increases with height. When the height is 10.1 m and the baseline is 10 cm, the errors of both BMA and the epipolar geometry constraint exceed 2.5 m. We also conclude that, under the same conditions, the estimation errors of BMA are smaller than those of the epipolar geometry method.

Comparison of our methods with other methods
Humenberger et al. (2010) compared the processing speed of several real-time stereo vision systems. The methods proposed in this chapter run on a CPU, as do the methods of Point Grey Research Inc. (Point Grey Research Inc., 2000) and Bradski (Bradski, 2010).
Table 2 compares the processing speed of these two systems with that of our methods, using Fig. 22(a) as the test image and SAD as the matching cost function. The test image is 640×480 pixels, and the processing speed is given in frames per second (fps).

Conclusion
For general-purpose autonomous helicopter flight, GPS is very useful. However, the position information provided by GPS is not accurate enough for autonomous landing. The stereo vision system is therefore designed to assist GPS during autonomous landing. For small unmanned helicopters, an effective measuring range of 8 m is enough for landing control, and the stereo vision system is a very competent sensor for estimating the helicopter's height. In this chapter, we proposed a low-cost stereo vision system that is much cheaper than a GPS. The proposed system provides acceptably accurate height information for the landing control system of an unmanned helicopter within a certain range.
The simulation results show that different baselines produce different measurement results: the wider the baseline, the larger the range over which the system estimates height with an acceptable error. Compared with the height error of GPS, the system provides more accurate height information and is therefore more useful for autonomous helicopter landing. To increase the measurement range, one should use higher-resolution cameras and/or a longer baseline.
Three major directions remain for future study. First, it may be necessary to increase the number of cameras to expand the camera range; multi-view 3D reconstruction technology has made significant progress in recent years, and 3D topographic reconstruction from multi-view images appears to be feasible. Second, the BMA should be further improved; many search methods have been proposed to speed up or improve matching performance, and they could improve our approach. Finally, other matching methods (e.g., region matching) will be tried for better matching performance.

Fig. 1. The stereo vision system composed of two Logitech® webcams
Figure 8(c) is the phase correlation of Figures 8(a) and 8(b).From Figure 8(c), we can see that the peak is located at (111, 126), and hence the motion vector is (8, 1).

Fig. 9. The simulations of the image processing and feature extraction results. (a) The test image 1. (b) The binary image and the basic rectangle of test image 1. (c) The corner detection result of test image 1. (d) The test image 2. (e) The binary image and the basic rectangle of test image 2. (f) The corner detection result of test image 2. (g) The test image 3. (h) The binary image and the basic rectangle of test image 3. (i) The corner detection result of test image 3.

Fig. 11. Block template and the search region.

Fig. 12. Pinhole camera model and projective transformation between world and camera coordinate systems.

Figure 14 demonstrates the corresponding-point search results with a grid board. Almost all matching points between the two images in the specified area have been found. The estimation results for different distances are shown in Figure 15.
The second pose of target.

is used for camera calibration. The image size is 640×480 pixels, and the intrinsic parameters of the webcams and the camera matrix are shown in Table 1.
Table 1. Intrinsic parameters of the webcams.

Table 2. Computation time of different matching methods.