An Adaptive Resolution Method Using Discrete Wavelet Transform for Humanoid Robot Vision System


Introduction
The RoboCup (Kitano et al., 1995) is an international joint project to stimulate research in artificial intelligence, robotics, and related fields. According to the rules for the 2009 RoboCup, in the league for kid-sized robots (Avalable, 2009), the competitions were to take place on a rectangular field of 600 × 400 cm² containing two goals and two landmark poles, as shown in Fig. 1. A goal was placed in the middle of each goal line, with one of the goals colored yellow and the other colored blue. As shown in Fig. 2, each goal for the kid-sized robot field had a crossbar height of 90 cm, a goal wall height of 40 cm, a goal wall width of 150 cm, and a goal wall depth of 50 cm. The two landmark poles were placed on each side of the two intersection points between the touch line and the middle field line. Each landmark pole was a cylinder with a diameter of 20 cm. It consisted of three segments, each 20 cm in height, stacked on top of each other. The lowest and the highest segments had the same color as the goal on the pole's left side, as shown in Fig. 3. The ball was a standard-size orange tennis ball. The above objects are the most critical features of the field, and they are the key features to which the vision system must pay attention.
The functions of a humanoid robot vision system include image capturing, image analysis, and digital image processing using visual sensors. Digital image processing transforms the image into an analyzable digital pattern by digital signal processing. Image analysis techniques can then be used to describe and recognize the image content for robot vision. The robot vision system uses the environment information captured in front of the robot to recognize the image in a manner similar to the human vision system. An object recognition algorithm is thus proposed for the humanoid robot soccer competition.
Generally speaking, object recognition uses object features to extract the object from the picture frame, and thus shape (Chaumette, 1994) and (Jean & Wu, 2004), contour (Sun et al., 2003), (Kass et al., 1988), and (Canny, 1986), color (Herodotou et al., 1998) and (Ikeda, 2003), texture, and size are commonly used. It is important to extract the information in real time because the moving ball is one of the most critical objects in the contest field. A complex feature such as contour is not suited to real-time recognition in our application. The objects do not have an obvious texture, so texture is not suitable for the contest field either. However, because the object colors are distinctive in the contest field, we mainly use color information to determine the critical objects.
Although this approach is simple, its real-time efficiency is still low. Because a lot of information must be processed in every frame, Sugandi et al. (Sugandi et al., 2009) proposed a low resolution method to reduce the information. It shortens the processing time, but the low resolution results in a shorter recognizable distance and may increase the false recognition rate. To overcome these drawbacks, we propose a new approach, the adaptive resolution method (ARM), to reduce the computation complexity and increase the accuracy rate.
The rest of this study is organized as follows. Section 2 presents the related background, such as the general color based object recognition method, the low resolution method, and the encountered problems. Section 3 describes the proposed approach, ARM. The experimental results are shown in Section 4. Finally, the conclusions are outlined in Section 5.

Color based object recognition method
An efficient vision system plays an important role for humanoid robot soccer players. Many robot vision modules provide basic color information, from which an object can be extracted by selecting color thresholds. The flow chart of a traditional color recognition method is shown in Fig. 4. The RGB color model comes from the three additive primary colors: red, green, and blue. The main purpose of the RGB color model (Gonzalez & Woods, 2001) is the sensing, representation, and display of images in electronic systems, such as televisions and computers, and it is the basic image information format. The X, Y, and Z axes represent the red, green, and blue color components, respectively, and all colors can be described by different proportions of these components. Because the RGB color model is not explicit, it is easily influenced by illumination and leads people to select erroneous threshold values. The HSV (hue, saturation, and value) color model is a transformation of the RGB color space that attempts to describe perceptual color relationships more accurately than RGB. Because the HSV color model describes the color and brightness components separately, it is not easily influenced by illumination, and it is therefore extensively used in color recognition. The HSV transform function is shown in Eqs. (1)-(3). In (1), (2), and (3), the range of H (hue) is 0°~360°, the range of S (saturation) is 0~1, and the range of V (value) is 0~255. The RGB values are confined by (4), where "max" indicates the maximum value among the RGB color components and "min" indicates the minimum value among them. Hence, we can directly use H and S to describe a color range with high environmental tolerance. This yields the foreground object mask, M(x,y), by threshold value selection, as shown in (5).
In (5), T_H1 and T_H2 are the hue thresholds and T_S is the saturation threshold, all set manually. The foreground object mask is usually accompanied by noise, which can be removed by simple morphological methods such as dilation, erosion, opening, and closing. When several objects with the same color exist in the frame, they need to be separated by labeling. The labeling procedure is as follows (Gonzalez & Woods, 2001):
Step 1: Scan the threshold image M(x,y);
Step 2: Give the value Label color i to the connected component Q{n} of pixel (x,y);
Step 3: Give the same value to the connected component of Q{n};
Step 4: Repeat until no connected component can be found;
Step 5: Update i = i+1, then go to Step 1 and repeat Steps 2~4;
Step 6: Stop when the image has been completely scanned.
By using the above-mentioned procedure, the objects can be extracted. Although this method is simple, it is only suitable for low frame rate sequences. For a high resolution or noisy sequence, this approach may need very high computation complexity.
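The color thresholding and labeling steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the conversion below is the standard RGB-to-HSV formula (H in degrees, S in 0~1, V in 0~255), the inclusive threshold test is an assumed form of (5), and the function names `rgb_to_hsv`, `foreground_mask`, and `label_components` as well as the 4-connectivity are hypothetical choices made for illustration.

```python
from collections import deque

def rgb_to_hsv(r, g, b):
    """Standard conversion of 8-bit RGB to (H in [0, 360), S in [0, 1], V in [0, 255])."""
    mx, mn = max(r, g, b), min(r, g, b)
    delta = mx - mn
    s = 0.0 if mx == 0 else delta / mx
    if delta == 0:
        h = 0.0  # hue is undefined for grey pixels; 0 is only a convention
    elif mx == r:
        h = 60.0 * (((g - b) / delta) % 6)
    elif mx == g:
        h = 60.0 * ((b - r) / delta + 2)
    else:
        h = 60.0 * ((r - g) / delta + 4)
    return h, s, mx

def foreground_mask(image, t_h1, t_h2, t_s):
    """Assumed form of the mask: 1 where t_h1 <= H <= t_h2 and S >= t_s, else 0."""
    mask = []
    for row in image:
        out = []
        for r, g, b in row:
            h, s, _ = rgb_to_hsv(r, g, b)
            out.append(1 if t_h1 <= h <= t_h2 and s >= t_s else 0)
        mask.append(out)
    return mask

def label_components(mask):
    """Label 4-connected foreground regions by breadth-first flood fill."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] and not labels[y][x]:
                count += 1  # start a new label for an unvisited component
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

The cost of this sketch grows with the number of pixels, which is consistent with the remark that the plain color-based method becomes expensive for high resolution or noisy sequences.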

Low resolution method
To overcome the above-mentioned problems, several low resolution approaches were proposed (Sugandi et al., 2009), (Cheng & Chen, 2006). The flow chart of a general low resolution method is shown in Fig. 5. Several low resolution methods, such as applying the 2-D discrete wavelet transform (DWT) and using a 2×2 average filter (AF), were discussed. Cheng and Chen (Cheng & Chen, 2006) applied the 2-D DWT for detecting and tracking moving objects, and only the LL3-band image is used for detecting the motion of the moving object. It is suggested that the LL3-band is a good candidate for noise elimination (the user can choose a suitable decomposition level according to the requirement, and there is actually no need to do the reconstruction for these applications). Because noise is preserved in the high-frequency subbands, using the LL3-band image reduces the computing cost of post-processing. This method copes with noise or fake motion effectively; however, the conventional DWT scheme requires complicated calculation when an original image is decomposed into the LL-band image. Moreover, using an LL3-band image to deal with fake motion may cause incomplete moving object detection regions. Sugandi et al. (Sugandi et al., 2009) proposed a simple method that uses the low resolution concept to deal with fake motion such as the moving leaves of trees. The low resolution image is generated by replacing each pixel value of the original image with the average value of its four neighbor pixels and itself, as shown in Fig. 6. It also provides a flexible multi-resolution image like the DWT. Nevertheless, the low resolution images generated by the 2×2 AF method are more blurred than those generated by the DWT method. This may reduce the precision of post-processing (such as object detection, tracking, and object identification), because the post-processing depends on the correct location and accurate detection of the moving object. In order to detect and track the moving object more accurately, we propose a new approach, the adaptive resolution method (ARM), which is based on the 2-D integer symmetric mask-based discrete wavelet transform (SMDWT) (Hsia et al., 2009). It not only retains the flexibility of multi-resolution, but also avoids a high computing cost when finding the different subband images. In addition, it preserves more image quality in the low resolution image than the average filter approach (Sugandi et al., 2009).
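To make the difference between plain down-sampling and an averaging filter concrete, the following Python sketch halves a grayscale image in both ways. It assumes a non-overlapping 2×2 block mean, which may differ from the exact kernel used by Sugandi et al.; the function names `downsample` and `block_average` are hypothetical.

```python
def downsample(image):
    """Keep every second pixel in each direction (no filtering)."""
    return [row[::2] for row in image[::2]]

def block_average(image):
    """Replace each non-overlapping 2x2 block by its integer mean."""
    height = len(image) // 2 * 2
    width = len(image[0]) // 2 * 2
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]

# A single bright noise pixel survives down-sampling unchanged,
# while the block mean attenuates it to a quarter of its value.
noisy = [[255, 0], [0, 0]]
print(downsample(noisy), block_average(noisy))
```

Averaging suppresses isolated noise at the cost of blurring, which matches the trade-off described above for the AF approach.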

Symmetric Mask-Based Discrete Wavelet Transform (SMDWT)
In the 2-D DWT, the computation needs a large transpose memory and has a long critical path. On the other hand, SMDWT has many advanced features such as a short critical path, high speed operation, regular signal coding, and independent subband processing (Hsia et al., 2009). The coefficient derivation of the 2-D SMDWT is based on the 2-D 5/3 integer lifting-based DWT.
For computation speed and simplicity, four masks, 3×3, 5×3, 3×5, and 5×5, are used to perform the spatial filtering tasks. Moreover, the four-subband processing can be further optimized to speed up the computation and reduce the temporal memory of the DWT coefficients. The four matrix processors consist of four mask filters, and each filter is derived from the 2-D DWT with 5/3 integer lifting-based coefficients.
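For orientation, the 5/3 integer lifting scheme from which the masks are derived can be sketched in Python. This is the conventional separable row-then-column lifting (the reversible 5/3 transform), not the SMDWT mask itself, whose coefficients are given in Fig. 7; whole-sample symmetric boundary extension and the function names `lowpass_53` and `ll_band` are assumptions made for this sketch.

```python
def lowpass_53(signal):
    """Approximation coefficients of one 5/3 integer lifting step (even length)."""
    n = len(signal)
    if n < 2 or n % 2:
        raise ValueError("signal length must be even and at least 2")

    def x(i):
        # Whole-sample symmetric extension at both borders.
        if i < 0:
            i = -i
        if i >= n:
            i = 2 * (n - 1) - i
        return signal[i]

    half = n // 2
    # Predict step: detail (high-pass) coefficients.
    d = [x(2 * k + 1) - (x(2 * k) + x(2 * k + 2)) // 2 for k in range(half)]
    # Update step: approximation (low-pass) coefficients; d[-1] mirrors to d[0].
    return [x(2 * k) + (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(half)]

def ll_band(image):
    """LL band of one decomposition level: low-pass rows, then low-pass columns."""
    rows = [lowpass_53(row) for row in image]
    cols = [lowpass_53(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

flat = [[100] * 4 for _ in range(4)]
print(ll_band(flat))  # a constant image stays constant at half the size
```

Applying `ll_band` repeatedly gives the deeper levels (for example 320×240 to 160×120 to 80×60). A mask-based formulation computes the same kind of LL output in one pass over the image, without the intermediate row result.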
In the ARM approach, only the LL-band mask of SMDWT is selected (the moving object is low-frequency energy). Unlike the conventional DWT, which processes the row and column dimensions separately by low-pass filtering and down-sampling, the LL-band mask of SMDWT can be used to calculate the LL-band image directly. The matrix function of the LL-mask is shown in (6), and the coefficients of the LL-mask are shown in Fig. 7 (Hsia et al., 2009). SMDWT (using the LL-band mask only) reduces the image transfer computing cost and removes noise. Besides, this approach gives accurate object tracking under various types of occlusion.

Discrete Wavelet Transforms - A Compendium of New Approaches and Recent Applications

Adaptive Resolution Method (ARM)
ARM uses the information obtained from the image, namely the area of the ball, to choose the most suitable resolution. The operation flow chart is shown in Fig. 8. After the HSV color transformation, ARM chooses the most appropriate resolution for the current situation. A high resolution brings a longer recognizable distance but a slower running speed. On the other hand, a low resolution brings a shorter recognizable distance but a faster running speed. Once the area of the ball has been obtained from the previous image, it is converted into the "sel" signal by the adaptive selector to choose the appropriate resolution. The "sel" condition is shown in (7). In (7), A_thd1 and A_thd2 are the threshold values for the area of the ball. The relationship between the resolution and the distance of the ball is described in Table 1. According to Table 1, A_thd1 and A_thd2 are set to 54 and 413, respectively. The threshold selection is performed for each resolution of the working environment. The threshold value determines the recognizable distance. If the ball disappears from the frame, the frame is changed back to the original size to give a higher probability of finding the ball. Since the other critical objects in the field (such as the goals and landmarks) are larger than the ball, they can be recognized easily. Fig. 9 shows the results of different resolutions after the HSV transformation.
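The adaptive selector can be sketched in Python as follows. Because the exact form of (7) is not reproduced here, the mapping is an assumption inferred from the surrounding text: a large ball area (a near ball) selects the coarsest level, a small area or a missing ball selects the original resolution, and the boundary cases are handled arbitrarily. The names `select_level` and `resolution_for` are hypothetical.

```python
A_THD1 = 54   # area threshold from the text (pixels at the original resolution)
A_THD2 = 413  # area threshold from the text (pixels at the original resolution)

def select_level(ball_area):
    """Return the assumed scale level: 0 = 320x240, 1 = 160x120, 2 = 80x60.

    ball_area is the ball area found in the previous frame, or None if
    the ball was not found.
    """
    if ball_area is None or ball_area < A_THD1:
        return 0  # far or lost ball: use the full resolution
    if ball_area < A_THD2:
        return 1  # medium distance: one decomposition level
    return 2      # near ball: two decomposition levels

def resolution_for(level, base=(320, 240)):
    """Frame size after `level` dyadic decompositions of the base frame."""
    return base[0] >> level, base[1] >> level

for area in (None, 30, 200, 900):
    level = select_level(area)
    print(area, level, resolution_for(level))
```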

Sample object recognition method
The above-mentioned color segmentation method can quickly and easily extract the orange ball in the field, but it is not enough to recognize the goals and landmarks. The colors of the goals and landmarks are yellow and blue, and the extraction of goals and landmarks by color segmentation alone may not be correct, as shown in Fig. 10. Therefore, more features and information are needed to extract them. Since the contest field is not complicated, a simple recognition method can be used to reduce the computation complexity. The landmark is a cylinder with three colored segments. Consider the landmark whose upper and bottom layers are yellow and whose center layer is blue; it is defined as the YBY-landmark. The diagram is shown in Fig. 11. The color combination of the other landmark is the reverse, and it is defined as the BYB-landmark. The labels of the YBY-landmark can be calculated by (8). The BYB-landmark is handled in the same manner as the YBY-landmark. According to the above-mentioned labeling procedure, we label all of the yellow and blue components in the frame and assign numbers to those components. In (8), L_Yi denotes the pixels of the i-th yellow component (Y), y_min and y_max are the minimum and maximum values of object i in the y direction, and x_c and y_c are the center point of the object in the horizontal and vertical directions, respectively. The vertical bias value β is set to 15. The landmark is composed of two objects of the same color on a vertical line, with a center of a different color. If an object with this feature is found, the system treats it as the landmark and outputs the frame coordinate data.
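The landmark rule described above can be sketched in Python. Since (8) is not reproduced here, the test below is only an assumed reading of it: two components of the outer color stacked vertically, a component of the inner color between them, and horizontal centers that differ by at most β. The `Component` record and `find_landmark` function are hypothetical, and image y coordinates are assumed to grow downward.

```python
from dataclasses import dataclass

@dataclass
class Component:
    color: str    # "Y" for yellow, "B" for blue
    x_c: float    # horizontal center
    y_c: float    # vertical center
    y_min: int    # top edge
    y_max: int    # bottom edge

def find_landmark(components, outer, inner, beta=15):
    """Return (top, middle, bottom) for an outer-inner-outer stack, or None."""
    outers = [c for c in components if c.color == outer]
    inners = [c for c in components if c.color == inner]
    for top in outers:
        for bottom in outers:
            # The top segment must end above the start of the bottom segment.
            if top is bottom or top.y_max >= bottom.y_min:
                continue
            if abs(top.x_c - bottom.x_c) > beta:
                continue
            for mid in inners:
                if top.y_c < mid.y_c < bottom.y_c and abs(mid.x_c - top.x_c) <= beta:
                    return top, mid, bottom
    return None

parts = [
    Component("Y", 100, 20, 10, 30),
    Component("B", 103, 40, 31, 50),
    Component("Y", 98, 60, 51, 70),
]
print(find_landmark(parts, outer="Y", inner="B") is not None)  # YBY-landmark found
```

Swapping the arguments (`outer="B"`, `inner="Y"`) applies the same test to the BYB-landmark.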
The result of landmark recognition is shown in Fig. 12. Eq. (9) defines the label of the ball in terms of the pixels of the s-th orange component in a frame. Since the ball is very small in the picture frame, in order to avoid noise, the ball is treated as the largest orange object whose height-to-width shape ratio is approximately equal to 1. Here α1 and α2 are set to 0.8 and 1.2, respectively. The result of ball recognition is shown in Fig. 13. The goal recognition is defined in (10) in terms of the pixels of the m-th blue component in a picture frame. The blue goal is a blue object that is not part of the YBY-landmark or the BYB-landmark. Because the goal is the largest object in the field, the parameter γ is set to 50. The result of goal recognition is shown in Fig. 14. The yellow goal is handled in the same manner as the blue goal.
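The ball rule can be sketched in Python as follows. The sketch assumes one possible reading of (9): among the orange components whose height-to-width ratio lies between α1 and α2, the one with the largest area is taken as the ball. The dictionary keys and the function name `find_ball` are hypothetical.

```python
def find_ball(components, alpha1=0.8, alpha2=1.2):
    """Pick the largest roughly square orange component, or None.

    components: list of dicts with 'area', 'width', and 'height' in pixels.
    """
    round_enough = [
        c for c in components
        if c["width"] > 0 and alpha1 <= c["height"] / c["width"] <= alpha2
    ]
    return max(round_enough, key=lambda c: c["area"], default=None)

candidates = [
    {"area": 12, "width": 4, "height": 4},     # small noise blob
    {"area": 300, "width": 40, "height": 10},  # large but elongated region
    {"area": 95, "width": 11, "height": 10},   # plausible ball
]
print(find_ball(candidates))
```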

Coordinate transformation
Because the proposed approach, ARM, uses different resolutions for object recognition, the coordinates are transformed back into the original resolution according to the DWT level when the object information is output. The transform equation is defined in (11),
where O is the original image, LL_n is the LL-band image after transformation, and n is the transformation level.
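Since each decomposition level halves both image dimensions, a plausible reading of (11) is that a coordinate found in the level-n LL-band image is scaled by 2^n to return to the original frame. The following Python sketch illustrates this assumption; the function name `to_original` is hypothetical.

```python
def to_original(x, y, level):
    """Map a pixel coordinate in the level-n LL band back to the original frame."""
    scale = 2 ** level
    return x * scale, y * scale

# A ball center found at (20, 15) in the 80x60 image (level 2)
# corresponds to (80, 60) in the 320x240 frame.
print(to_original(20, 15, 2))
```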

Experimental results
In this work, the environment information is captured by a Logitech QuickCam Ultra Vision camera (using the monocular vision technique). The image resolution is 320×240, and the frame rate is 30 FPS (frames per second). The simulation computer has an Intel Core 2 Duo 2.1 GHz CPU, and the development tool is Borland C++ Builder 6.0. The graphical interface is shown in Fig. 15.
This work is dedicated to the RoboCup soccer humanoid league rules of the 2009 competition.
In order to prove the robustness of the proposed approach, many scenes of various situations are simulated to verify the high recognition accuracy rate and fast processing time. For the analysis of the recognition accuracy rate, a result is classified as a correct recognition if the critical object is labeled completely and named correctly, such as the objects Goal[B] and Ball shown in Fig. 16(a). On the other hand, there are two categories of false recognition, "false positive" and "false negative". "False positive" means that the system recognizes an irrelevant object as a critical object, such as the Goal[Y] shown in Fig. 16(b). "False negative" means that the system cannot label or name the critical object, such as the balls shown in Figs. 16(c) and 16(d).

Low resolution analysis
Several low resolution methods, namely down-sampling (DS), AF, and SMDWT, were implemented and simulated in this experiment, and their noise removing capabilities were analyzed. The flow chart of noise removing for the low resolution approaches is shown in Fig. 17. The experiment data are listed in Table 2. According to Table 2, the DS approach has the worst noise removing capability; the 2×2 AF approach also removes large noise blocks poorly, even though it makes the image smoother. On the other hand, the SMDWT approach (using the LL-mask only) has a better noise removing capability than the other methods, because it retains the low-frequency component and removes the high-frequency noise in the image. In order to improve the noise removing capability of the whole system, we added the opening operator (OP) of mathematical morphology after labeling in the flow chart of Fig. 17. The experiment data after adding the opening operator are shown in Table 3. Compared with the results of Table 2, the noise counts are reduced significantly after adding the opening operator, which avoids unnecessary computation. The SMDWT approach has the best performance, and the frame rate can be as high as 30 FPS. Therefore, this work adopts the SMDWT approach as the low resolution method.
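The effect of the opening operator on small noise blocks can be illustrated with a short Python sketch. It assumes a 3×3 square structuring element and treats pixels outside the frame as background; the structuring element actually used in this work is not stated, and the function names are hypothetical.

```python
def _morph(mask, keep):
    """Apply a 3x3 neighborhood test to every pixel of a binary mask."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                mask[ny][nx] if 0 <= ny < height and 0 <= nx < width else 0
                for ny in range(y - 1, y + 2)
                for nx in range(x - 1, x + 2)
            ]
            out[y][x] = 1 if keep(window) else 0
    return out

def erode(mask):
    return _morph(mask, all)   # keep a pixel only if its whole window is set

def dilate(mask):
    return _morph(mask, any)   # set a pixel if any pixel in its window is set

def opening(mask):
    """Erosion followed by dilation: removes blobs smaller than the 3x3 element."""
    return dilate(erode(mask))

mask = [[0] * 7 for _ in range(6)]
for y in range(1, 4):
    for x in range(1, 4):
        mask[y][x] = 1          # a 3x3 object
mask[4][5] = 1                  # an isolated noise pixel
for row in opening(mask):
    print(row)
```

In this example the 3×3 object is preserved while the isolated pixel disappears, so fewer spurious components reach the later recognition steps.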

Adaptive Resolution Method (ARM) analyses
In this experiment, we verify that ARM not only retains a high recognition accuracy rate but also raises the system processing efficiency. The hue threshold values of the orange, yellow, and blue colors are set as 35~45, 70~80, and 183~193, respectively. The saturation threshold values of the orange, yellow, and blue colors are all set as 70. To verify the ARM approach, the camera is set in the center of the contest field. The scene simulates the robot kicking the ball into the goal while the vision system tracks the ball. The results under resolutions of 320×240, 160×120, 80×60, and ARM are shown in Figs. 24-27, respectively. The experiment data of the accuracy rate and average FPS under different resolutions and ARM are shown in Table 4 and Fig. 28. According to Table 4, although the 320×240 resolution has a high accuracy rate, its processing speed is slow. The 80×60 resolution has the highest processing speed, but it has the lowest accuracy rate. At this resolution, a high accuracy rate is obtained only when the object is close to the camera. On the other hand, the proposed ARM approach not only has a high accuracy rate but also keeps a high processing speed. According to Fig. 28, ARM selects the most appropriate resolution when the ball is at different distances. ARM uses the 80×60 resolution when the scale level is equal to 2 and the 160×120 resolution when the scale level is equal to 1. When the scale level is equal to 0, ARM selects the original input frame size (320×240).

The critical objects recognition analysis
In this experiment, several scenes were simulated to verify the robustness of the feature recognition approaches proposed in this work.

Landmark recognition analysis
According to (8), the landmark is composed of two objects of the same color on a vertical line, and the bias value β is the key to deciding whether a block is a landmark or not. A small β causes missed recognition, whereas a large β may cause an irrelevant block to be recognized as a landmark; these two situations are shown in Fig. 29. The experiment data of landmark recognition are shown in Table 5. According to this table, a higher recognition accuracy rate is obtained when β is greater than 15. Generally speaking, the vibration of robot walking is not more intense than in the simulation, and therefore β is set to 15 in this work. A larger β would increase the chance of false recognition.

Goal recognition analysis
The goal is the largest critical object in the field, and hence the camera always captures an incomplete goal in the frame when the robot is walking in the field. Using the shape ratio as the feature to recognize the goal therefore easily causes false recognition. We overcome this drawback by using the method proposed in Section 3.2, and the experimental results are shown here. The camera is set in the center of the contest field. The scene simulates the robot raising its head to see the goal, turning right to see the YBY-landmark, and then turning left to see the BYB-landmark. The hue threshold values of the orange, yellow, and blue colors are set as 35~45, 70~80, and 183~193, respectively. The saturation threshold values of the orange, yellow, and blue colors are all set as 60. The results are shown in Fig. 34, and the experiment data are listed in Table 6. According to the results, the system recognizes the goal correctly even when the goal is occluded.

Ball recognition analysis
For ball recognition, the system takes the orange block with the maximum number of pixels as the ball in order to prevent the influence of noise. In this experiment, two balls are used in the scene. One ball is static in the field, and the other one moves into the frame and then moves away from the camera. The hue threshold values of the orange, yellow, and blue colors are set as 35~45, 70~80, and 183~193, respectively. The saturation threshold values of the orange, yellow, and blue colors are all set as 60. The result is shown in Fig. 35, and the experiment data are shown in Table 7. The static ball is labeled whenever it is the only ball in the field, and the result is shown in Fig. 35(a). Because the other ball has a bigger area when it moves into the frame, the system labels the moving ball and treats the static ball as noise, and the result is shown in Fig. 35(b). When the moving ball is far from the camera, the static ball is labeled again, and the result is shown in Fig. 35(c). Besides, the proposed feature recognition can also handle the situation in which the ball is partially occluded. We use a scene in which the ball is occluded by the landmark while it moves through the frame from left to right. The hue threshold values of the orange, yellow, and blue colors are set as 35~45, 70~80, and 175~185, respectively. The saturation threshold values of the orange, yellow, and blue colors are all set as 50. The results are shown in Fig. 36, and the experiment data are shown in Table 7.

Environmental tolerance analysis
The color deviation caused by luminance variation has the most influence on the result of the color-based recognition method proposed in this work. Before the robot soccer competition, we usually have one day to prepare for the contest, and therefore we can easily adjust the threshold values through the graphical interface according to the luminance of the field. The results under different luminance are shown in Fig. 37. The reference threshold values are shown in Table 8. The system can not only recognize the critical objects under different luminance, but it can also accommodate sudden changes of light. This experiment simulates the robot recognizing the BYB-landmark and the ball in the field while the light changes suddenly. The hue threshold values of the orange, yellow, and blue colors are set as 33~43, 67~77, and 175~185, respectively. The saturation threshold values of the orange, yellow, and blue colors are all set as 50. The results are shown in Fig. 38, and the experiment data are listed in Table 9. According to the results, the proposed method has good environmental tolerance.

Synthetic analyses
In this experiment, several scenes were simulated to compare the recognition accuracy rate and processing time between the 320×240 resolution and ARM. Scene 1: the ball approaches the camera slowly. Scene 2: the robot approaches the ball after shooting the ball toward the goal. Scene 3: the robot finds the ball and then tries to approach and kick it. Scene 4: the camera captures a blurred image when the head motor of the robot is rotating very fast. Scene 5: the robot localizes itself by seeing the landmarks. The hue threshold values of the orange, yellow, and blue colors are set as 35~45, 70~80, and 185~190, respectively. The saturation threshold values of the orange, yellow, and blue colors are all set as 50. The experiment data of these scenes are shown in Table 10, and the experimental results are shown in Figs. 39-43, respectively. According to the simulation results, the proposed method accommodates many kinds of scenes. It has an accuracy rate of more than 93% on average, and the average frame rate can reach 32 FPS. It not only maintains the high recognition accuracy rate of the high resolution frames, but also increases the average frame rate by about 11 FPS compared with the conventional high resolution approach. Furthermore, all of the experimental result videos mentioned in this section are appended in.

Figure 1. The field for the competitions.

Figure 4. The flow chart of the traditional color recognition method.

Step 1: Scan the threshold image M(x,y); Step 2: Give the value Label color i to the connected component Q{n} of pixel (x,y); Step 3: Give the same value to the connected component of Q{n};

Figure 5. The flow chart of a general low resolution method.

Figure 7. The subband mask coefficients of the LL-mask.

Figure 8. The flow chart of ARM.

Figure 9. The results after the HSV transformation under different video resolutions. (a) Maximum recognizable distance of the ball at 320×240; (b) maximum recognizable distance of the ball at 160×120; (c) maximum recognizable distance of the ball at 80×60.

Figure 12. The result of landmark recognition.
In the noise removing experiment, the input frame resolution is 320×240, and the resolution becomes 160×120 after the low resolution processing. The noise counts under the different low resolution methods were recorded. The simulated scene is obtained by turning the camera to the left to see the YBY-landmark and continuing to turn until the YBY-landmark disappears from the camera's view. In this situation, the background of the scene produces noise very easily. The hue threshold values of the orange, yellow, and blue colors are set as 35~45, 70~80, and 183~193, respectively, and the saturation threshold values of the orange, yellow, and blue colors are all set as 70. The experimental results under the different low resolution methods, DS, AF, and SMDWT, are shown in Figs. 18-20, respectively.

Figure 17. The flow chart of noise removing capability.

Figure 28. The relationship between frame number and frame rate under different resolutions and ARM.

Figure 29. The diagram of false recognition of landmark. (a) The case of small β. (b) The case of large β.

In the landmark recognition experiment, different values of β were set to test the effects. The scene simulates the camera capturing a slanted landmark while the robot is walking. The hue threshold values of the orange, yellow, and blue colors are set as 35~45, 65~75, and 175~185, respectively. The saturation threshold values of the orange, yellow, and blue colors are all set as 60. The results for β equal to 5, 10, 15, and 20 are shown in Figs. 30-33, respectively.

Table 1. The relationship between the resolution and the distance of the ball. *The area means the pixel number in the original resolution.

http://dx.doi.org/10.5772/55114

Table 2. The noise counts under different low resolution methods.

Table 3. The noise counts under different low resolution methods with opening operator.

Table 4. The experimental results of the accuracy rate and average FPS under different resolutions and ARM.

Table 5. The experimental results under different values of β.

Table 6. The experiment data of goal recognition.

Table 7. The experiment data of ball recognition.

Table 8. The threshold values used under different luminance.

Table 9. The experiment data of light influence.

Table 10. The experimental results of several kinds of scene simulation.