Some characteristics of the Axis 209 camera
1. Introduction
In recent years, two-dimensional laser range finders mounted on vehicles have become a fruitful solution for meeting safety and environment recognition requirements (Keicher & Seufert, 2000), (Stentz et al., 2002), (DARPA, 2007). They provide accurate real-time range measurements over large angular fields at a fixed height above the ground plane, and enable robots and vehicles to perform a variety of tasks more confidently by fusing images from visual cameras with range data (Baltzakis et al., 2003). Lasers have normally been used in industrial surveillance applications to detect unexpected objects and persons in indoor environments. In the last decade, laser range finders have been moving from indoor to outdoor rural and urban applications for 3D imaging (Yokota et al., 2004), vehicle guidance (Barawid et al., 2007), autonomous navigation (Garcia-Pérez et al., 2008), and object recognition and classification (Lee & Ehsani, 2008), (Edan & Kondo, 2009), (Katz et al., 2010). Unlike industrial applications, which deal with simple, repetitive and well-defined objects, camera-laser systems on board off-road vehicles require advanced real-time techniques and algorithms to deal with dynamic unexpected objects. Natural environments are complex and loosely structured, with great differences among consecutive scenes and scenarios. Vision systems still present severe drawbacks caused by lighting variability, which depends on unpredictable weather conditions. The fusion and classification of camera-laser object features is still a challenge within the paradigm of artificial perception and mobile robotics in outdoor environments, in the presence of dust, dirt, rain, and extreme temperature and humidity. Real-time, task-driven perception of relevant objects is a main issue for subsequent action decisions in safe unmanned navigation.
In comparison with industrial automation systems, the precision required in object location is usually low, as is the speed of most rural vehicles, which operate in bounded and loosely structured outdoor environments.
To this aim, the current work focuses on the development of algorithms and strategies for fusing 2D laser data and visual images, to accomplish real-time detection and classification of unexpected objects close to the vehicle and so guarantee safe navigation. The class information can then be integrated within the global navigation architecture, in control modules such as stop, obstacle avoidance, tracking or mapping.
Section 2 describes the commercial vehicle, the robot-tractor DEDALO, and the vision systems on board. Section 3 addresses some drawbacks in outdoor perception. Section 4 analyses the proposed method for fusing laser data and visual images, focused on reducing the visual image area to the region of interest (ROI) wherein objects are detected by the laser. Two segmentation methods, applied to the smaller area of the visual image (ROI) resulting from the fusion process, are described in Section 5. Section 6 presents the colour-based classification results of the largest segmented object in the region of interest. Some conclusions are outlined in Section 7, and acknowledgements and references are given in Section 8 and Section 9.
2. Experimental platform
Automation increases the productivity of outdoor operations by improving efficiency, reliability and precision, as well as by reducing operator accidents and health risks. Automation has moved human operators to the more comfortable position of supervisor and emergency solver. In addition, wireless communication plays a fundamental role in the development of ever more autonomous systems. To this end, a commercial tractor has been retrofitted to perform teleoperation and safe autonomous navigation.
2.1. Robot tractor DEDALO
The vehicle used as experimental platform is a hydraulic articulated commercial tractor (AGRIA S.A.), Figure 1. Safety, location and environment recognition sensors are installed on board, and mechanical adaptations have been undertaken to achieve automatic control of the clutch, brake and steering. A 2D laser range finder and a visual camera are installed at the front of the tractor to ease the detection and recognition of close objects, Figure 2. The classification of object features triggers control actions, such as a stop or a deviation under collision risk (Martin et al., 2009). According to the objective to be accomplished, the on-board sensors are classified as follows:
2D Laser range finder
Three navigation modes have been developed and tested, in addition to the manual driving:
Semi-autonomous navigation guided by natural objects in the scene
Autonomous navigation guided by GPS
2.2. Visual camera
A compact and rugged digital camera was placed at the front of the vehicle, Figure 2. This location favours the correspondence process between pixel rows in the visual image and objects detected by the laser. The camera's durable transparent cover provides excellent protection against dust, humidity and vibrations. Vision systems provide rich and meaningful information about the environment, where volume, shape and colour are the main cues, but they lack object depth information. The main characteristics of the digital camera are summarized in Table 1.
The image resolution has been set to 640x480 pixels, as higher resolutions imply greater computing time, incompatible with real time requirements.
|Image sensor||1/3" Progressive Scan RGB CMOS 1.3 Mpixel|
|Lens||3.6 mm, F1.8, fixed iris, horizontal angle of view: 74º|
|Light sensitivity||3-10,000 lux|
|Shutter time||1/15,000 s to 1/4 s|
|Camera angle adjustment||Pan +/- 10º, tilt 0-90º, rotation +/- 10º|
|Pan/Tilt/Zoom||Digital PTZ, preset positions, guard tour|
|Resolution||160x90, 640x480, 1280x1024|
2.3. Laser range finder
To obtain depth information from the environment, a 2D laser range finder (Sick LMS291) has been integrated in the vehicle, Figure 2. The main advantages of laser systems are a broad bandwidth and a small beam divergence and footprint. They also offer high immunity to atmospheric effects, in contrast to visual cameras. A laser range finder gives a sparse but accurate map of the environment in a 2D plane. It emits an infrared light beam of 905 nm and directly receives the signal reflected from the objects, in polar coordinates. The laser operation mode is based on the time-of-flight (TOF) measurement principle: a single laser pulse is sent out and reflected by an object surface. The elapsed time between emission and reception allows the calculation of the distance between laser and object. The laser pulses sweep a radial range in front of the laser unit by means of an integrated rotating mirror. The main laser characteristics are listed in Table 2.
|Angular resolution||0.25º/0.5º/1º (selectable)|
|Response time||53 ms/ 26 ms/ 13 ms|
|Measurement resolution||10 mm|
|Systems error (good visibility, T=23º)||Typ. +/- 35 mm, range 1.20 m|
The laser range finder, located 0.67 m above ground level, has been set to a maximum distance of 8 m with an angular resolution of 1º.
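The time-of-flight principle described above can be sketched in a few lines. This is a minimal illustration and not the sensor firmware: the function names are hypothetical, and the speed of light in vacuum is assumed to be a good enough approximation for air.

```python
# Speed of light in vacuum (m/s); assumed close enough to the value in air.
SPEED_OF_LIGHT = 299_792_458.0


def tof_distance(elapsed_s: float) -> float:
    """Distance (m) to the target from the round-trip time of one pulse."""
    # The pulse travels to the object and back, hence the division by two.
    return SPEED_OF_LIGHT * elapsed_s / 2.0


def round_trip_time(distance_m: float) -> float:
    """Round-trip time (s) of a pulse reflected at the given distance."""
    return 2.0 * distance_m / SPEED_OF_LIGHT


if __name__ == "__main__":
    # Round-trip time for the 8 m maximum distance configured on the vehicle.
    t = round_trip_time(8.0)
    print(f"round trip for 8 m: {t * 1e9:.1f} ns")
    print(f"recovered distance: {tof_distance(t):.3f} m")
```

At the 8 m limit the round trip lasts only a few tens of nanoseconds, which is why the timing is handled inside the sensor and only distances are delivered to the host.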
3. Major drawbacks in outdoor visual images
Outdoor visual image analysis has to face sudden changes in scene lighting, which depend on unpredictable weather conditions as well as on seasonal and daily variations. Large and unexpected illumination changes give rise to image inhomogeneities and require dynamic processing to achieve optimum object recognition. Shadows on sunny days make the process of extracting relevant objects from an outdoor scene even more difficult. The main difficulties in the image processing of natural scenes are:
High lighting variation requiring continuous settings
Large data volumes involved in the visual image
Real-time detection of dynamic objects
The current work deals with visual images displaying high lighting variations, which consequently require efficient techniques capable of self-adaptation. Three images of a semi-rural outdoor environment, acquired while the robot was navigating, are displayed in Figure 3. The region between the red lines in the laser data representation corresponds to the 74º view angle of the camera. Thus, the targets detected by the camera come into view in the laser data representation (laser image) within the 53º to 127º interval. The small green rectangle located in the centre of the x axis represents the robot-tractor. For each scene visualised in Figure 3, the left column displays the laser distance map in the 0-8 m interval. The right one displays the visual images acquired with the camera, with a field of view of 74º. The extreme lighting conditions of the first and third scenes point to the need for smart fusion methods that enhance object detection by combining features extracted from the information provided by heterogeneous sensors. The 2D laser representation of the first scene exhibits two obstacles, presumably trees, but it is nearly impossible to reach the same conclusion from the visual image, due to the image inhomogeneities caused by direct incidence of sunlight on the visual camera. In the third scene, the 2D laser representation of the environment outlines a thin mast that is impossible to extract from the visual image, which was acquired at nightfall. The mast appearing on the right of the visual image is detected at a distance close to the 8 meter detection limit. The thin obstacles located in the 0º-35º interval on the 2D laser distance-angle representation correspond to weeds rising on the left border of the asphalt road. In contrast, the second scene displays an operator in front of the vehicle, who is easily detected by the 2D laser range finder, but not so well by the camera because of the low illumination of the scene.
Thus, perception algorithms able to combine features extracted from heterogeneous sensors are essential for real-time interpretation of outdoor scenes.
4. Sensor fusion method
Laser range finders used in robotic systems provide accurate distance measurements of the objects in the environment. Visual cameras, on the other hand, provide dense information about a scene but lack depth information about the objects. The fusion method proposed here combines the accuracy of laser distance measurements with the complementary information obtained from a visual image, which allows for the extraction of features such as volume, shape, texture and colour. In recent years, works devoted to the accurate and automatic extrinsic calibration of a laser-camera rig (Zhang & Pless, 2004), (Li et al., 2007), (Kassir & Peynot, 2010) in static indoor environments highlight the increasing interest in this perception platform. Natural outdoor scenes are hard to interpret due to both illumination variability and quick variations from one scene to the next. The colour and texture of natural objects in a visual image change not only because of light incidence variability, but also because of occlusion and shape superposition effects. Moreover, the width and position of an object on a 2D laser distance map vary with its dynamic behaviour. The works closest to the current one are (Katz et al., 2007), devoted to the self-supervised classification of dynamic obstacles but operating in urban environments, and (Naroditsky et al., 2011), illustrating the automatic alignment of a camera-LIDAR rig, with no comments on real-time applications. The sensor fusion method proposed here aims at reducing the drawbacks affecting each of the sensor systems, enhancing the real-time characterization of objects in outdoor environments. The fusion of laser data and visual images, pursuing the classification of close objects, starts with the definition of the region of interest on the visual image, where objects detected by the laser in the 1 to 8 m interval are mapped.
This region, being a limited area of the global visual image, greatly reduces the computing time of the subsequent segmentation and classification processes, which require a real-time response. The analysis of the visual images and laser data has been carried out with MATLAB (Martin et al., 2009), on account of the ease and speed of development provided by its multiple toolboxes and libraries in the initial development stage of a vision application. The fusion method illustrated here enhances the real-time interpretation of a scene to generate the reactive motions needed to meet the safety requirements.
4.1. Region of interest in the visual image
Despite the extensive research conducted in vision systems and robotics applications, camera-laser rig perception in outdoor environments still remains a challenge. A rural environment has been simulated with the Google SketchUp platform to visualize the effect of vertical parallax on the intersection of the laser plane with the visual image as a function of the object distance. The parallax is caused by the fact that the camera is located 6 cm below the laser, on the same vertical axis. In the simulated world, composed of a vehicle, a warehouse, and several trees, two persons were located in front of the vehicle at distances of 1 and 8 meters, Figure 4.
The cut of the 2D laser plane on the visual image is highlighted in Figure 5. The laser plane cuts the operators, located at different distances, at distinct heights depending on the relative object-vehicle distance.
To calculate the limits of the laser plane displacement on the visual image due to the vertical parallax effect exhibited in Figure 5, a calibration test has been carried out in the real world. To this aim, a red card fixed to a stool, 0.67 m above the ground, has been used as a target. The stool was placed at three different positions, at 8, 4 and 1 meter distance from the vehicle. Laser angle-distance representations and visual camera images were then acquired at each target position, as displayed in the left and right columns of Figure 6, respectively. In all three cases the red card was correctly detected by the laser and the camera. The greater the distance, the smaller the card area in the visual image. The upward shift of the laser cut on the visual image (640 columns x 480 rows of pixels) is visualised and calculated. Thus, for object distances in the 1-8 meter interval, the target always lies within a limited area of the visual image (640x80 pixels), the parallax band, inscribed in the blue rectangle, Figure 7.
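As a rough cross-check of why the laser cut moves with distance, the vertical parallax can be estimated with an ideal pinhole camera model. The sketch below is not the calibration procedure used in this work: it assumes square pixels, an optical axis parallel to the laser plane, and a focal length derived from the 74º horizontal field of view at 640 pixels; the function names are hypothetical. Under these assumptions the 6 cm offset gives a shift of roughly 25 pixels at 1 m and 3 pixels at 8 m. The measured 80-row band remains the reference value, since it is obtained from the real rig.

```python
import math


def focal_length_px(image_width_px: int = 640, hfov_deg: float = 74.0) -> float:
    """Focal length in pixels of an ideal pinhole camera (assumed model)."""
    return (image_width_px / 2.0) / math.tan(math.radians(hfov_deg / 2.0))


def parallax_shift_px(distance_m: float, baseline_m: float = 0.06) -> float:
    """Vertical offset (pixels) between the laser plane and the image row of
    the optical axis, for an object at the given distance.

    baseline_m is the vertical camera-laser separation (6 cm in the text).
    """
    return focal_length_px() * baseline_m / distance_m


if __name__ == "__main__":
    for d in (1.0, 4.0, 8.0):
        print(f"distance {d:.0f} m -> shift {parallax_shift_px(d):.1f} px")
```

The shift is inversely proportional to the distance, which matches the qualitative behaviour observed in the calibration test: the closer the target, the further the laser cut moves away from its far-range position.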
Next, to experimentally determine the location of the parallax band (640x80 pixels) on the original visual image, the following test was carried out. This test determines the position on the visual image of a calibration object, a small yellow mast on top of a red sawhorse, as detected by the laser beam. The camera-laser rig is moved vertically as a block, from top to bottom, to locate the obstacle on the laser image. The visual image and laser data are synchronized so that they are acquired in unison. The test stops when the obstacle, appearing at a well-known position on the visual image, emerges in the corresponding laser angle-distance representation.
The low yellow mast used as calibration obstacle is displayed near the centre of Figure 8 (a).
The mast is detected by the laser plane at a distance of 5.1 m, Figure 8 (b), and appears on row 251 of the original visual image.
4.2. Fusion of laser and camera images features
The calculation of the location (row 251) and height (+/- 40 pixels) of the visual region where objects detected by the laser are mapped allows the processing of a smaller area of the visual image (ROI), which reduces the computing time. The horizontal correspondence between objects in the laser and visual images is straightforward, as the displacement between the sensors along this axis is very small. That is, the laser 53-127º range is matched to the -37º to +37º field of view of the camera. Therefore, a linear correspondence is established between the 53-127º range (74º width) in the laser angle-distance representation and the 640 pixel width of the visual image. Consequently, a region centred on image row 251 with a total height of 80 rows (pixels), about 17% of the global visual image (80 of 480 rows), is selected for the segmentation and classification processing. Sensor fusion is thus accomplished by determining the visual image masks (parts of the ROI) of the different objects detected by the laser in an outdoor scene. An example of the sensor fusion method is illustrated in Figure 9. The two black masks, Figure 9(c), correspond to objects detected by the laser at about 3 and 7 m distance, Figure 9(b). The centre of each mask is located on row 251 of a visual image frame, with a fixed vertical height of 80 rows. The horizontal length of each mask is proportional to the obstacle width (in degrees) as detected by the laser range finder.
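The linear angle-to-column correspondence and the fixed 80-row band can be sketched as follows. This is an illustrative sketch rather than the MATLAB code used in this work: the function names are hypothetical, and the direction of the mapping (whether 53º falls on the first or the last image column) is not stated in the text, so it is exposed as the assumed `flip` parameter.

```python
from typing import Tuple

IMAGE_WIDTH = 640          # pixel columns of the visual image
LASER_MIN_DEG = 53.0       # laser bearing at one edge of the camera view
LASER_MAX_DEG = 127.0      # laser bearing at the opposite edge
ROI_CENTRE_ROW = 251       # row where the laser plane cuts the image
ROI_HALF_HEIGHT = 40       # +/- 40 rows, i.e. an 80-row band


def angle_to_column(angle_deg: float, flip: bool = False) -> int:
    """Linearly map a laser bearing in [53, 127] degrees to a column in [0, 639].

    flip=True reverses the direction of the mapping (assumption: the scan
    direction relative to the image is unknown here).
    """
    if not LASER_MIN_DEG <= angle_deg <= LASER_MAX_DEG:
        raise ValueError("bearing outside the camera field of view")
    fraction = (angle_deg - LASER_MIN_DEG) / (LASER_MAX_DEG - LASER_MIN_DEG)
    if flip:
        fraction = 1.0 - fraction
    # Clamp so that the upper edge of the range maps to the last column.
    return min(IMAGE_WIDTH - 1, int(fraction * IMAGE_WIDTH))


def object_mask(start_deg: float, end_deg: float,
                flip: bool = False) -> Tuple[int, int, int, int]:
    """Return (top_row, bottom_row, left_col, right_col), all inclusive,
    for an object spanning the given laser bearings."""
    c1 = angle_to_column(start_deg, flip)
    c2 = angle_to_column(end_deg, flip)
    top = ROI_CENTRE_ROW - ROI_HALF_HEIGHT
    bottom = ROI_CENTRE_ROW + ROI_HALF_HEIGHT - 1
    return top, bottom, min(c1, c2), max(c1, c2)


if __name__ == "__main__":
    # An object seen by the laser between 80 and 100 degrees.
    print(object_mask(80.0, 100.0))
```

Each degree of laser bearing covers roughly 8.6 image columns (640/74), so even a one-degree-wide obstacle yields a mask several pixels wide.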
The second step of the sensor fusion process is concerned with the determination of the visual image regions associated with the two masks. The segmentation and classification processes are executed only in the selected visual region. Two scenes are displayed in Figure 10 (a) and (b) to illustrate the steps of the fusion process, in cases where the extraction of objects from the visual image is difficult because of the light incidence angle on the camera and the low lighting in the scenes. The first row displays the 2D laser angle-distance representation, the second row corresponds to the acquired visual images and the third one exhibits the result of the fusion process. The masks are located on the visual image regions corresponding to the objects detected by the 2D laser range finder, Figures 10 (a1) and (b1).
The next step, after the sensor fusion process, is the segmentation of the visual image regions (ROI) that intersect the visual masks, followed by the classification of the segmented images.
5. Image segmentation
Image segmentation is one of the most widely used steps in the process of reducing images to information. It is accomplished by dividing the image into regions that correspond to the different structures present in the scene. This process is often described, in analogy to the human visual processes, as a foreground/background separation, focused on the recognition of one or several structures of interest in the scene, disregarding the rest. That means focusing the attention on the objects defined as relevant for the task to be performed. One of the problems of segmentation algorithms is their difficulty in meeting the real-time requirement (Sun et al., 2008).
5.1. Region growing algorithm
The region growing algorithm, based on active contours, is a simple region-based image segmentation method (Chan et al., 2001). It aims to group pixels together into regions of similarity, starting from an initial set of pixels. It is also classified as a pixel-based image segmentation method because it involves the selection of initial seed points.
The method starts by defining an arbitrary pixel that acts as a seed and is compared with its neighbours to determine whether they are added to the region. The method works iteratively, growing the initial regions of the visual image mask, Figure 11(a). The seeds are selected from the regions of the former initialization image mask, which allows the active contours method to be applied only where obstacles were detected by the laser range finder. The algorithm stops when 120 iterations have been performed.
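The seed-and-compare idea can be illustrated with a plain seeded region growing routine. The sketch below is a simplified stand-in, not the active-contours variant used in this work: it operates on a grey-level image stored as nested lists, accepts a 4-connected neighbour when its intensity is within an assumed `tolerance` of the mean seed intensity, and caps the number of growth steps with `max_iterations` (set to 120 to mirror the stopping rule above). All names and the tolerance value are hypothetical.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def grow_region(image: List[List[float]], seeds: List[Pixel],
                tolerance: float = 10.0,
                max_iterations: int = 120) -> Set[Pixel]:
    """Grow a region from the seed pixels, one ring of neighbours per iteration."""
    rows, cols = len(image), len(image[0])
    region: Set[Pixel] = set(seeds)
    frontier: List[Pixel] = list(seeds)
    # Reference value: mean intensity of the seeds, kept fixed for simplicity.
    reference = sum(image[r][c] for r, c in seeds) / len(seeds)

    for _ in range(max_iterations):
        if not frontier:
            break  # nothing left to expand
        next_frontier: List[Pixel] = []
        for r, c in frontier:
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and (nr, nc) not in region:
                    if abs(image[nr][nc] - reference) <= tolerance:
                        region.add((nr, nc))
                        next_frontier.append((nr, nc))
        frontier = next_frontier
    return region


if __name__ == "__main__":
    demo = [
        [10, 10, 200, 200],
        [10, 12, 200, 200],
        [11, 10, 200, 50],
    ]
    # Seed in the dark area on the left; the bright columns are rejected.
    print(sorted(grow_region(demo, [(0, 0)])))
```

In the fusion setting, the seeds would be taken from inside the laser-derived masks, so growth only starts where the range finder reported an obstacle.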
The false colour palette displayed in Figure 12 allows a quick interpretation of the outdoor scene objects. The precise value of the vehicle-object distance is provided by the 2D laser, Figure 11(b).
As becomes evident, the segmentation process based on the region growing algorithm is imprecise, Figure 11(b). The shape of the person in the segmented image lacks exactness because of an inadequate contour calculated by the algorithm. To improve this process, the Mean-shift segmentation algorithm is selected.
5.2. Mean-shift algorithm: global image
The Mean-shift algorithm is a nonparametric clustering technique that requires neither prior knowledge of the number of clusters nor any constraint on the shape of the clusters. It is especially suited to outdoor environments (Comaniciu and Meer, 2002). This unsupervised clustering algorithm, together with colour-based segmentation, helps to handle the characterization of real and dynamic outdoor environments. The algorithm has been chosen over other methods that either require a priori information about the number and shape of groups, or do not use colour in the feature analysis. The Mean-shift algorithm has been successfully used in the tracking of objects in image sequences with complex backgrounds (She et al., 2004). The algorithm requires high computing time, even with standard (640x480 pixel) images. Another drawback of the segmentation of global images is related to both the appearance of shadows and the merging of regions. The Mean-shift algorithm applied to the segmentation of an outdoor global image with a shadow on the ground is illustrated in Figure 13 (a) and (b). In this first scene the shadow is segmented as an obstacle. In the second scene, the blue trousers of a person in the global image, Figure 13 (c), are merged with a blue region of the sky, Figure 13 (d).
For these reasons, the segmentation process is only performed on the visual image region detected with the fusion process (ROI) associated with the closest object.
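The core of the mean-shift idea, moving each sample towards the local mean of its neighbourhood until it settles on a mode, can be shown on one-dimensional data. The sketch below is a didactic simplification and not the EDISON implementation: it uses a flat kernel, a single assumed `bandwidth`, and scalar values instead of the joint spatial and colour features of an image. The function names are hypothetical. Note that the number of clusters is never supplied; it emerges from the data and the bandwidth.

```python
from typing import List


def mean_shift_1d(values: List[float], bandwidth: float,
                  max_iterations: int = 100, tol: float = 1e-6) -> List[float]:
    """Return, for every input value, the mode it converges to (flat kernel)."""
    modes: List[float] = []
    for v in values:
        x = v
        for _ in range(max_iterations):
            # Samples inside the window centred on the current estimate.
            window = [u for u in values if abs(u - x) <= bandwidth]
            shifted = sum(window) / len(window)
            if abs(shifted - x) < tol:
                break  # converged: the window mean no longer moves
            x = shifted
        modes.append(x)
    return modes


def label_modes(modes: List[float], bandwidth: float) -> List[int]:
    """Give the same label to modes that lie closer than half a bandwidth."""
    centres: List[float] = []
    labels: List[int] = []
    for m in modes:
        for i, c in enumerate(centres):
            if abs(m - c) < bandwidth / 2.0:
                labels.append(i)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels


if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0]
    modes = mean_shift_1d(data, bandwidth=1.0)
    print(label_modes(modes, bandwidth=1.0))
```

Every sample scans the whole data set at each iteration, which hints at why the method is costly on a full 640x480 image and much cheaper on a narrow region of interest.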
5.3. Mean-shift algorithm: region of interest
The segmentation with the Mean-shift algorithm of the selected region of the visual image (ROI), obtained by the sensor fusion method, is now illustrated. The algorithm configuration parameters are defined so as to obtain large segmented regions on the image. The MATLAB interface used (Bagon, 2010) is based on the EDISON program, which implements the Mean-shift algorithm. The first parameter is the “Minimum Region Area”, which is calculated as the extension, in pixels, of the detected obstacle times a constant, and therefore varies according to the obstacle extension detected by the laser range finder. This parameter is initially set to 10. The height of the region of interest (ROI) on the visual image has been empirically set to 80 pixels, a relevant value for the characterization of the objects of interest in this domain. The values of both the “Spatial Band Width” and “Range Band Width” parameters have been fixed to 2, so as to highlight the most relevant segmented regions, in this case large areas. Finally, the largest segmented region is classified through its colour feature by using expert knowledge. Performing the segmentation only on the region of interest (ROI) reduces the computing time of the Mean-shift algorithm to less than 1 second per scene. The classification results are better than those obtained from the segmentation of the global image. Thus, the Mean-shift method is appropriate for segmentation when applied to a part of the global image, selected by means of the sensor fusion algorithm proposed here. The results obtained from the application of both the sensor fusion method and the Mean-shift segmentation algorithm to the classification of the closest object are presented in the next section.
6. Real-time classification results
Three types of objects have been detected and classified in a specific semi-rural scenario consisting of trees, human operators and vehicles. The fusion and segmentation method proposed here is invariant to both brightness variations in the image and the position or orientation of the static or dynamic objects. The ultimate aim of the characterization of the objects is their classification through a set of descriptors: extension, location, distance and colour. These descriptors, conveniently integrated with domain knowledge, result in smart strategies for safe reactive piloting in either rural or rural-urban domains. The first result concerns the characterization of the object “tree”. Six scenes with large illumination variations, displaying trees in front of the robot as it navigates along a trajectory, are displayed in Figure 14. The first column exhibits the 2D laser angle-distance representation, reflecting the distance and angular extension of the detected objects; the second column exhibits the visual images; the third, the results of the fusion method; and the fourth, the segmentation of the region of interest (ROI), where clusters are represented in smooth colour (close to real colour). The largest pixel area corresponds to the closest object detected by the laser.
The first scene displays three clusters, of which the largest one is the most representative. Domain-dependent expert knowledge about colour is integrated to automatically classify the obstacle into one of three possible classes: tree (green or brown), human operator (blue) and vehicle (white). To this end, the pixels of the most representative cluster are matched with the corresponding pixels in the original global image, and the mean colour is calculated to accurately determine the real colour of the obstacle. In the current domain, expert knowledge states that objects displaying green or brown colours are “tree”.
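The colour rule just described (mean colour of the largest cluster, then a class chosen by colour) can be sketched as below. The class-to-colour pairs come from the text; everything else is an assumption made for illustration: the use of HSV, the hue, saturation and value thresholds, the "unknown" fallback and the function names are hypothetical and are not the expert rules actually used in this work.

```python
import colorsys
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]  # 8-bit red, green, blue


def mean_colour(pixels: Iterable[RGB]) -> Tuple[float, float, float]:
    """Average RGB colour of the pixels belonging to one cluster."""
    pixels = list(pixels)
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def classify_colour(rgb: Tuple[float, float, float]) -> str:
    """Map a mean colour to 'tree', 'operator', 'vehicle' or 'unknown'.

    All thresholds below are illustrative assumptions.
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue = h * 360.0
    if s < 0.15 and v > 0.8:
        return "vehicle"    # white: bright and nearly unsaturated
    if 200.0 <= hue <= 260.0:
        return "operator"   # blue
    if 70.0 <= hue <= 170.0:
        return "tree"       # green
    if 10.0 <= hue <= 45.0 and v < 0.7:
        return "tree"       # brown: dark orange hues
    return "unknown"


if __name__ == "__main__":
    cluster = [(38, 118, 52), (42, 122, 48)]  # two greenish pixels
    print(classify_colour(mean_colour(cluster)))
```

Taking the mean colour from the original image rather than from the smoothed segmentation output follows the text: the segmented clusters are shown in smooth colour, which is only close to the real colour.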
The first to third rows in Figure 14 show three cases with low illumination, and the fourth to sixth rows show three scenes with high illumination. The results of the sensor fusion and the segmentation of the region of interest (ROI) demonstrate that the proposed methods are robust to illumination changes. The first and second rows display an obstacle at about four and eight meters distance, respectively. The sensor fusion method locates the visual region to be segmented, and the segmentation algorithm gives rise to three and two clusters, respectively. The search for the real colour of the largest cluster in its original visual image confirms a green colour in both cases, and the objects are assigned to the class “tree”. The fourth row scene is difficult to segment due to shadows on the ground. The fusion and segmentation algorithms highlight three clusters, and the largest one is classified as “tree”. The fifth and sixth rows show scenes which are difficult to analyse because of the visual image inhomogeneities. Objects are detected at 4 and 5 meters in front of the vehicle, respectively, and are automatically classified as “tree”.
Figure 15 displays six rural scenes where a human operator is moving in front of the robot from right to left. As in the former scenes, the largest cluster (smooth colour) in each of the six images is matched with the same region in the original image to determine the real colour, which in this case corresponds to blue. Thus, the detected object is classified as “operator”. The first row shows a scene with low illumination where the laser image accurately detects the human operator at 4 m distance. However, automatic recognition based only on the visual image analysis is difficult due to the colour similarity between the human operator and the ground. The second row shows the human operator and, at the back, a white vehicle at a distance greater than 8 m, so that the vehicle is not detected by the laser. Only one cluster is obtained, which is classified as “operator”. The third and fourth scenes present shadows on the ground; nevertheless, the sensor fusion and segmentation methods classify the largest cluster as “operator”. In the fifth and sixth scenes, the operator is close to the robot, and the colour of his trousers as well as their vertical pattern are correctly classified.
Finally, six rural-urban scenes showing two vehicles parked on the right side of the robot-tractor trajectory are illustrated in Figure 16. The first and second columns present the 2D laser angle-distance representation and the visual images where the robot-tractor is close to the bus. The first to third rows correspond to images where the robot is approaching a first bus parked almost parallel to the asphalt road. The segmented image (fourth column) displays six clusters with vertical and horizontal patterns. The vertical patterns correspond to the doors and the horizontal pattern to the front. The second and third row images, Figure 16, correspond to the detection of the lateral sides of the first bus. The third and fourth columns present the images resulting from the sensor fusion and the segmented areas of the region of interest (ROI), respectively. In both cases the largest cluster corresponds to white colour in the original image, so it is assigned to the class “vehicle”. The fourth to sixth rows show images acquired while the robot-tractor is navigating close to a second bus. The fourth row images show the detection of the front corner of this bus. The vertical pattern emerging in the segmented image corresponds to the front door of the bus. In the fifth and sixth rows of Figure 16, the segmented images show the lateral side and the back of the second bus. The region of interest is larger for the first bus than for the second one, as it is detected closer to the robot. The proposed methodology has been verified by extensive experimental tests, with 98% success in the classification of the types of objects defined by their colour feature in this scenario.
7. Conclusions
The fusion method proposed in the current work is based on the combination of visual images and 2D laser range finder data. The method improves the recognition of outdoor objects in extreme illumination conditions by integrating object features from both representations.
The fusion method succeeds in the most extreme and variable weather conditions in dynamic outdoor environments, combining the rich and colourful representation provided by the visual images with the precise depth planar picture supplied by the 2D laser range finder.
The Mean-shift algorithm, used for the segmentation of a region of the image (ROI) resulting from the fusion of both laser and visual image features, together with the integration of expert knowledge on the specific domain, permits the classification of some common objects: vehicles, trees and human operators. Experiments show 98% classification accuracy of dynamic objects.
The real-time detection and classification of unexpected objects guarantees safety and permits the instantaneous generation of reflex actions to either deviate or stop the vehicle. The computing time of the whole perception and classification modules is about 1 second, but it could be greatly reduced by integrating real-time processor systems.
8. Acknowledgements
The present work has been supported through the projects CICYT-DPI-2006-14497 by the Science and Innovation Ministry, ROBOCITY2030 I y II: Service Robots-PRICIT-CAM-P-DPI-000176-0505, and SEGVAUTO: Vehicle Safety-PRICIT-CAM-S2009-DPI-1509 by Madrid State Government.