In this paper, we propose a method for removing noise from image sequences by spatio-temporal image processing. A spatio-temporal image is generated by merging the acquired image sequence (Fig. 1(a)), and cross-section images are then extracted from it (Fig. 1(b)). In these cross-section images, we can detect moving objects and estimate their motion by tracing the trajectories of their edges or lines.
In recent years, cameras have been widely used in outdoor surveillance systems, for example for traffic flow observation and trespasser detection. They are also one of the fundamental sensors for outdoor robots. However, the quality of the images taken by a camera depends on environmental conditions. Scenes captured outdoors are often difficult to see because of adherent noises on the surface of the lens-protecting glass of the camera.
For example, waterdrops or mud blobs attached to the protecting glass may obstruct the field of view on rainy days (Fig. 2). It would be desirable to remove adherent noises from images of such scenes for surveillance systems and outdoor robots.
Professional photographers use lens hoods or apply a special water-repellent oil to the lens to avoid this problem. Even then, waterdrops can still adhere to the lens. Cars are equipped with windscreen wipers to wipe rain from their windscreens. However, part of the scenery is hidden whenever a wiper crosses the field of view.
Therefore, this paper proposes a new method that removes such noises from images by using image processing techniques.
Many image interpolation and restoration techniques for damaged and occluded images have also been proposed in the image processing and computer vision communities (Kokaram et al., 1995, Masnou & Morel, 1998, Joyeux et al., 1999, Bertalmio et al., 2000, Bertalmio et al., 2001, Kang et al., 2002, Bertalmio et al., 2003, Matsushita et al., 2005, Shen et al., 2006, Wexler et al., 2007). However, some of them can only handle line-shaped scratches (Kokaram et al., 1995, Masnou & Morel, 1998, Joyeux et al., 1999), because they were designed for restoring old damaged films. The others require human operators to indicate the noise regions interactively rather than automatically (Bertalmio et al., 2000, Bertalmio et al., 2001, Kang et al., 2002, Bertalmio et al., 2003, Matsushita et al., 2005, Shen et al., 2006, Wexler et al., 2007). These methods are therefore not suitable for surveillance systems and outdoor robots.
On the other hand, there are automatic methods that can remove noises without the help of human operators (Hase et al., 1999, Garg & Nayar, 2004). Hase et al. proposed a real-time method for eliminating snowfall noise from moving pictures using special image processing hardware (Hase et al., 1999). Garg and Nayar proposed an efficient algorithm for detecting and removing rain from videos based on a physics-based motion blur model that explains the photometry of rain (Garg & Nayar, 2004). These techniques work well under the assumption that snow particles or raindrops are always falling. In other words, they can detect snow particles or raindrops because these move constantly.
However, adherent noises such as waterdrops on the surface of the lens-protecting glass may be stationary noises in the images. Therefore, it is difficult to apply these techniques to our problem because adherent noises that must be eliminated do not move in images.
A previous study (Yamashita et al., 2003) is based on the comparison of images taken with multiple cameras. However, it cannot be used for close scenes that have disparities between different viewpoints, because it relies on the difference between images.
Stereo camera systems are widely used as robot sensors, and they must observe both distant and close scenes. Therefore, we proposed a method that can remove waterdrops from stereo image pairs that contain objects both in a distant scene and at close range (Tanaka et al., 2006). This method uses the corresponding points between stereo image pairs, and therefore sometimes fails when the appearance of a waterdrop differs between the left and right images.
We have also proposed noise removal methods that use a single camera (Yamashita et al., 2004, Yamashita et al., 2005). These methods use a pan-tilt camera and eliminate adherent noises by comparing two images: a first image and a second image taken at a different camera angle (Fig. 3). However, adherent noises cannot be eliminated if a background object is blocked by a waterdrop in the first image and is also blocked by another waterdrop in the second image.
In this paper, we use not just two frames but the entire image sequence to remove adherent noises. We generate a spatio-temporal image by merging the acquired image sequence, and then detect and remove adherent noises (Yamashita et al., 2008, Yamashita et al., 2009).
The remainder of this paper is organized as follows. Section 2 outlines our method. Section 3 explains how the spatio-temporal image is constructed. Sections 4 and 5 describe the noise detection and noise removal methods, respectively. Section 6 presents experimental results and discusses the effectiveness of our method. Finally, Section 7 gives conclusions and future work.
2. Overview of noise detection and removal method
The positions of adherent noises on the protecting glass of the camera do not change in the images when the direction of the camera changes (Fig. 3). This is because adherent noises are attached to the surface of the protecting glass and move together with the camera. On the other hand, the positions of static background scenery and of moving objects change while the camera rotates.
We transform the image taken after the camera rotation into an image whose gaze direction (direction of the principal axis) is the same as that before the camera rotation. Accordingly, we obtain a new image that differs from the image before the camera rotation only in the positions of adherent noises and moving objects.
A spatio-temporal image is obtained by merging these transformed images. In the spatio-temporal image, the trajectories of adherent noises can be calculated, so the positions of noises in the image sequence can also be detected. Finally, we obtain a noise-free image sequence by estimating the textures in the adherent noise regions.
3. Spatio-temporal image
3.1. Image acquisition
An image sequence is acquired while a pan-tilt camera rotates.
First (frame 0), one image is acquired with the camera fixed. In the next step (frame 1), another image is taken after the camera rotates by a given angle (in rad) about the axis that is perpendicular to the ground and passes through the center of the lens. In the t-th step (frame t), the camera rotates again and the t-th image is taken. By repeating this procedure n times, we acquire a movie n/30 seconds long if we use a 30 fps camera.
Note that counterclockwise rotation is taken as the positive direction of the rotation angle (the direction shown in Fig. 3).
The direction and the angle of the camera rotation are estimated only from the image sequence. First, they are estimated from optical flow. Because the optical flow may contain errors, the rotation angle between two adjacent frames is then estimated in an exploratory (search-based) manner. Finally, the rotation angle between each frame and the base frame is estimated. Details of the estimation method are given in (Yamashita et al., 2009).
3.2. Distortion correction
The image distortion caused by lens aberration is rectified. The coordinate value without distortion is computed from the coordinate value with distortion (the observed coordinate value) and the parameter of the radial distortion (Weng et al., 1992) by Equations (1) and (2).
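As an illustration of this step, the following is a minimal sketch of a radial distortion correction. Equations (1) and (2) are not reproduced in this text, so the first-order radial model used here, the function name `undistort_points`, and the parameter name `kappa1` are assumptions rather than the exact formulation of the paper. Coordinates are assumed to be measured from the image center.

```python
import numpy as np

def undistort_points(u_obs, v_obs, kappa1):
    """Correct observed (distorted) coordinates with a first-order radial model.

    Assumption: coordinates are measured from the image center, which is
    also taken as the distortion center. `kappa1` is a hypothetical name
    for the radial distortion parameter.
    """
    u_obs = np.asarray(u_obs, dtype=float)
    v_obs = np.asarray(v_obs, dtype=float)
    r2 = u_obs ** 2 + v_obs ** 2      # squared distance from the center
    scale = 1.0 + kappa1 * r2         # assumed first-order radial term
    return u_obs * scale, v_obs * scale
```

With `kappa1 = 0` the mapping is the identity, and the correction grows with the squared distance from the image center.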
3.3. Projective transformation
In the next step, the acquired t-th image (the image taken after the camera rotation at frame t) is transformed by using the projective transformation. The coordinate value after the transformation is given by Equations (3) and (4) (Fig. 4) as a function of the coordinate value of the t-th image before the transformation, the rotation angle, and the image distance (the distance between the center of the lens and the image plane).
In this way, the t-th image taken after the camera rotation is transformed into an image whose gaze direction is the same as that before the camera rotation.
After the projective transformation, there are regions that have no texture near the borders of the images (black regions in Fig. 5(b)). The procedures described below are not applied to these regions.
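The geometry of this transformation can be sketched as follows, assuming a pure rotation about the vertical axis through the lens center, image coordinates measured from the principal point, and an image distance `f` in pixels. Under these assumptions a pixel ray (u, v, f) is rotated by the pan angle and re-projected onto the reference image plane. The function name `derotate_points` and the sign convention are illustrative choices, not taken from Equations (3) and (4).

```python
import numpy as np

def derotate_points(u, v, theta, f):
    """Map coordinates of a frame taken after a pan of `theta` rad onto the
    image plane of the reference frame (pure rotation about the vertical axis).

    Assumptions: (u, v) are measured from the principal point, `f` is the
    image distance in pixels, and the sign of `theta` is a convention chosen
    for this sketch. Valid while the denominator stays positive.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Depth of the rotated ray along the reference principal axis.
    denom = f * np.cos(theta) - u * np.sin(theta)
    u_ref = f * (u * np.cos(theta) + f * np.sin(theta)) / denom
    v_ref = f * v / denom
    return u_ref, v_ref
```

Because `v_ref` depends on both `u` and `theta`, a point that stays at a fixed pixel in every acquired frame, such as a waterdrop on the protecting glass, changes both coordinates after the transformation as the angle varies, whereas static scenery is mapped back to a fixed position.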
3.4. Cross-section of spatio-temporal image
The spatio-temporal image is obtained by arraying all the transformed images in chronological order (Fig. 5(a)). In Fig. 5(a), the horizontal axis expresses the horizontal image coordinate, the vertical axis expresses the vertical image coordinate, and the depth axis indicates time (frame number t).
Here, the cross-section spatio-temporal image is the slice of the spatio-temporal image taken at a fixed vertical coordinate, so its axes are the horizontal image coordinate and time.
In the cross-section spatio-temporal image, the trajectories of the static background scenery become vertical straight lines owing to the effect of the projective transformation. On the other hand, the trajectories of adherent noises become curves whose shapes can be calculated by Equations (3) and (4). Note that the trajectory of an adherent noise in Fig. 5(b) looks like a straight line; however, it is slightly curved.
In this way, there is a difference between trajectories of static objects and those of adherent noises. This difference helps to detect noises.
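The construction described in this subsection can be sketched with array slicing. This is a minimal illustration that assumes the transformed frames are already available as equally sized arrays; the function names are hypothetical.

```python
import numpy as np

def build_spatio_temporal(frames):
    """Stack T transformed frames of shape (H, W) into a (T, H, W) volume."""
    return np.stack([np.asarray(frame) for frame in frames], axis=0)

def cross_section(volume, row):
    """Slice the volume at a fixed image row.

    The result has shape (T, W): the horizontal image coordinate runs along
    the columns and time runs down the rows.
    """
    return volume[:, row, :]

# A perfectly static scene: every frame is identical, so each column of the
# cross-section is constant in time, i.e. a vertical straight line.
frames = [np.tile(np.arange(6), (4, 1)) for _ in range(5)]
section = cross_section(build_spatio_temporal(frames), row=2)
```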
4. Noise detection
4.1. Median image
Median values along the time axis are calculated in the cross-section spatio-temporal image. A median image is then generated by replacing the original pixel values with these median values (Fig. 6(a)).
Adherent noises are eliminated in the median image, because in the cross-section spatio-temporal image they occupy a small area compared with the background scenery.
If there is no moving object in the original image, a clear image sequence can be obtained from the median image by using the inverse transformation of Equations (3) and (4). However, if the original image contains moving objects, the textures of these objects are blurred by the median filtering. Therefore, the regions of adherent noises are detected explicitly, and image restoration is executed only for the noise regions, so that the image sequence remains clear around the moving objects.
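A minimal sketch of the median image, assuming the cross-section is stored as an array whose first axis is time (shape (T, W) for a monochrome image or (T, W, 3) for color). The function name is hypothetical.

```python
import numpy as np

def median_image(section):
    """Replace each pixel of a cross-section by the temporal median of its column.

    `section` has time on axis 0. Because the static background forms
    vertical lines, the median over time recovers the background value as
    long as a noise covers a column for fewer than half of the frames.
    """
    section = np.asarray(section)
    med = np.median(section, axis=0)               # one value per column
    return np.broadcast_to(med, section.shape).copy()
```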
4.2. Difference Image
The difference between the monochrome cross-section spatio-temporal image and the monochrome median image is calculated to obtain the difference image by Equation (5).
Pixel values of the difference image become large in regions where adherent noises exist, while they remain small in the background regions (Fig. 6(b)).
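A minimal sketch of the difference image, assuming that Equation (5) is a per-pixel absolute difference (the equation itself is not reproduced in this text). The function name is hypothetical.

```python
import numpy as np

def difference_image(section_gray, median_gray):
    """Per-pixel absolute difference between two monochrome images.

    Inputs are converted to float first, so that subtracting unsigned
    8-bit images cannot wrap around.
    """
    a = np.asarray(section_gray, dtype=float)
    b = np.asarray(median_gray, dtype=float)
    return np.abs(a - b)
```

The conversion to float matters in practice: subtracting two `uint8` arrays directly would wrap modulo 256 and produce large values in background regions.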
4.3. Noise region image
Regions where the pixel values of the difference image are larger than a certain threshold are defined as noise candidate regions. The judgment image is obtained by thresholding the difference image: a pixel is set to 1 where the difference exceeds the threshold and to 0 elsewhere.
The region where the judgment image equals 1 is defined as the noise candidate region (Fig. 6(c)). Note that an adherent noise does not stay on the same cross-section image as time increases, because its vertical coordinate changes owing to the projective transformation in Equation (4). Therefore, we take this change into account and construct the image so that the same adherent noise lies on the same cross-section image.
In the next step, regions of adherent noises are detected by using the judgment image. The trajectories of adherent noises are expressed by Equation (3). Therefore, each trajectory curve is tracked, and the number of pixels along it where the judgment image is equal to 1 is counted. If the total count exceeds a threshold value, the curve is regarded as a noise region. As mentioned above, this tracking procedure is executed in 3-D space. Thanks to this voting (counting) procedure, adherent noise regions can be detected precisely even when there are moving objects in the original image sequence.
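The thresholding and voting steps can be sketched as follows. This is a simplified illustration that assumes the trajectory of a candidate noise has already been computed and rounded to integer indices into a (T, H, W) volume. The function and parameter names (`thresh`, `min_votes`) are hypothetical.

```python
import numpy as np

def judgment_image(diff, thresh):
    """Binary candidate map: 1 where the difference exceeds `thresh`, else 0."""
    return (np.asarray(diff) > thresh).astype(np.uint8)

def count_votes(judgment, t_idx, v_idx, u_idx):
    """Count candidate pixels met along one trajectory in a (T, H, W) volume.

    Trajectory points that fall outside the volume are skipped.
    """
    t_idx, v_idx, u_idx = (np.asarray(a, dtype=int) for a in (t_idx, v_idx, u_idx))
    n_t, n_v, n_u = judgment.shape
    inside = ((t_idx >= 0) & (t_idx < n_t) &
              (v_idx >= 0) & (v_idx < n_v) &
              (u_idx >= 0) & (u_idx < n_u))
    return int(judgment[t_idx[inside], v_idx[inside], u_idx[inside]].sum())

def is_noise_trajectory(judgment, t_idx, v_idx, u_idx, min_votes):
    """A trajectory is accepted as noise if its vote count exceeds `min_votes`."""
    return count_votes(judgment, t_idx, v_idx, u_idx) > min_votes
```

Counting along the whole trajectory is what makes the decision tolerant of moving objects: a moving object crosses a noise trajectory for only a few frames and therefore collects few votes.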
After noise regions have been detected in all cross-section spatio-temporal images, the noise region image is generated from all of this information by the inverse projective transformation (Fig. 6(d)).
Ideally, the noise regions consist only of adherent noises. However, regions where no adherent noise exists are also extracted in this process because of other image noises. Therefore, morphological operations (i.e., erosion and dilation) are executed to eliminate small noises.
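Erosion followed by dilation (a morphological opening) can be sketched in plain NumPy as follows. The 3x3 square structuring element and the function names are assumptions; the text does not state which element or how many iterations were used.

```python
import numpy as np

def _neighbourhood_views(mask):
    """Yield the nine 3x3-neighbourhood shifts of a zero-padded boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            yield padded[dy:dy + h, dx:dx + w]

def erode(mask):
    """A pixel survives only if its whole 3x3 neighbourhood is set."""
    out = np.ones(np.shape(mask), dtype=bool)
    for view in _neighbourhood_views(mask):
        out &= view
    return out

def dilate(mask):
    """A pixel is set if any pixel in its 3x3 neighbourhood is set."""
    out = np.zeros(np.shape(mask), dtype=bool)
    for view in _neighbourhood_views(mask):
        out |= view
    return out

def remove_small_regions(mask):
    """Morphological opening: erosion removes specks, dilation restores size."""
    return dilate(erode(mask))
```

Isolated false detections vanish in the erosion step, while larger regions shrink and are then grown back to roughly their original extent by the dilation.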
5. Noise removal
Adherent noises are eliminated from the cross-section spatio-temporal image by using the image restoration technique (Bertalmio et al., 2003) for the noise regions detected in Section 4.
First, the original image is decomposed into a structure image and a texture image (Rudin et al., 1992). Figure 7 shows an example of the structure image and the texture image. Note that the contrast of the texture image (Fig. 7(c)) is adjusted for visibility.
After the decomposition, the image inpainting algorithm (Bertalmio et al., 2000) is applied to the noise regions of the structure image, and the texture synthesis algorithm (Efros & Leung, 1999) is applied to the noise regions of the texture image. This method (Bertalmio et al., 2003) overcomes the weak point of the original image inpainting technique (Bertalmio et al., 2000), namely its poor reproducibility for complicated textures. After that, a noise-free image is obtained by merging the two images.
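To give a feel for what filling a masked region involves, the following sketch fills the noise region of a smooth image by repeated four-neighbor averaging (discrete Laplace diffusion). This is a deliberately simple stand-in chosen for illustration. It is not the inpainting algorithm of Bertalmio et al., it performs no structure and texture decomposition, and it cannot reproduce texture. The function name and the iteration count are assumptions.

```python
import numpy as np

def diffuse_fill(image, mask, iterations=200):
    """Fill masked pixels by repeatedly averaging their four neighbours.

    Simple diffusion-based fill for smooth (structure-like) content only.
    Pixels outside `mask` are never modified. Assumes at least one pixel
    lies outside the mask.
    """
    img = np.asarray(image, dtype=float).copy()
    mask = np.asarray(mask, dtype=bool)
    img[mask] = img[~mask].mean()              # neutral initial guess
    for _ in range(iterations):
        padded = np.pad(img, 1, mode="edge")
        avg = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
               padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        img[mask] = avg[mask]                  # update unknown pixels only
    return img
```

A fill of this kind propagates only smooth intensity variation into the hole, which is why a separate texture synthesis step is needed for complicated textures.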
Finally, a clear image sequence without adherent noises is created with the inverse projective transformation.
6. Experiments
An image sequence was acquired on a rainy day in an outdoor environment.
Figure 8(a) shows an example of the original image when the rotation speed of the camera is constant, and Fig. 8(b) shows the result of the projective transformation. In this experiment, the frame rate was 30 fps, the image size was 360x240 pixels, and the length of the movie was 100 frames. We used a pan-tilt-zoom camera (Sony EVI-D100) whose image distance was calibrated as 261 pixels. Parameters for the noise detection were set as ,
Figure 9 shows the intermediate results of the noise detection. Figures 9(a) and (b) show the cross-section spatio-temporal image in color and monochrome formats, respectively (along the red scanline in Fig. 8(b)). There is a moving object (a person with a red umbrella) in this image sequence. Figures 9(c), (d), and (e) show the median image, the difference image, and the judgment image, respectively. Figure 10(a) shows the noise region image.
Figure 11 shows the noise removal result. Figures 11(a) and (b) show the structure image after applying the image inpainting algorithm and the texture image after applying the texture synthesis algorithm, respectively, while Fig. 11(c) shows the noise removal result of the cross-section spatio-temporal image.
Figure 12 shows the final results of noise removal for the image sequence. All waterdrops are eliminated and the moving object can be seen very clearly in all frames.
To verify the accuracy of the noise detection, Fig. 10(a) is compared with the ground truth generated manually by a human operator (Fig. 10(b)). Figure 10(c) shows the comparison results. In Fig. 10(c), red regions indicate correct detections, blue regions indicate undetected noises, and green regions indicate excess detections. In practice, the undetected noises are hardly noticeable in the final result (Fig. 12(b)). This is because the image interpolation works well in the noise removal step.
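The three classes of this comparison can be computed from two binary masks as sketched below. The function names are hypothetical, and the recall and precision ratios are standard summary measures added here for illustration. They are not values or measures reported in the text.

```python
import numpy as np

def compare_with_ground_truth(detected, truth):
    """Split pixels into correct, undetected, and excess detections."""
    detected = np.asarray(detected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    correct = detected & truth       # red regions in Fig. 10(c)
    missed = ~detected & truth       # blue regions: undetected noises
    excess = detected & ~truth       # green regions: excess detections
    return correct, missed, excess

def detection_rates(detected, truth):
    """Return (recall, precision); a ratio is None when its denominator is zero."""
    correct, missed, excess = compare_with_ground_truth(detected, truth)
    n_correct = int(correct.sum())
    n_truth = n_correct + int(missed.sum())
    n_detected = n_correct + int(excess.sum())
    recall = n_correct / n_truth if n_truth else None
    precision = n_correct / n_detected if n_detected else None
    return recall, precision
```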
Figure 13 shows a comparison of texture interpolation with an existing method. Figure 13(b) shows the result of the image inpainting technique (Bertalmio et al., 2000), and Fig. 13(c) shows the result of our method. The existing method performs poorly (Fig. 13(b)) because the texture of the noise region is estimated only from the adjacent region. In principle, it is difficult in several cases to estimate texture from a single image alone. In contrast, our method estimates texture robustly by using spatio-temporal image processing (Fig. 13(c)).
Figure 16(a) shows an example of the original spatio-temporal image when the speed and the direction of the camera rotation are not constant, and Fig. 16(b) shows the result of the projective transformation.
Figure 17 shows the intermediate results of the noise removal. Figure 17(a) shows the cross-section spatio-temporal image. There are moving objects (walking men) in this image sequence. Figures 17(b), (c), (d), and (e) show the median image, the difference image, the judgment image, and the noise region image, respectively. Figure 17(f) shows the noise removal result for the cross-section spatio-temporal image.
Figure 18 shows the final results of noise removal for the image sequence. All noises are eliminated and the moving object can be seen very clearly in all frames.
From these results, it is verified that our method can remove adherent noises on the protecting glass of the camera regardless of their positions, colors, sizes, existence of moving objects, and the speed and the direction of the camera rotation.
7. Conclusions
In this paper, we propose a method for removing noise from image sequences acquired with a pan-tilt camera. We construct a spatio-temporal image and extract the regions of adherent noises by examining how the slopes of trajectories in cross-section images differ between adherent noises and other objects. Regions of adherent noises are then interpolated from the spatio-temporal image data. Experimental results show the effectiveness of our method.
As future work, the quality of the final result should be improved with respect to the interpolation of noise regions in space. As to the camera motion, camera translation should be considered in addition to camera rotation (Haga et al., 1997). It is also important to compare the performance of our method with recent space-time video completion methods (e.g., Matsushita et al., 2005, Shen et al., 2006, Wexler et al., 2007).