In this paper, we propose a method for removing noise from image sequences by spatio-temporal image processing. A spatio-temporal image is generated by merging the acquired image sequence (Fig. 1(a)), and cross-section images are then extracted from it (Fig. 1(b)). In these cross-section images, we can detect moving objects and estimate their motion by tracing the trajectories of their edges or lines.
In recent years, cameras have been widely used in outdoor surveillance systems for tasks such as traffic flow observation and trespasser detection. They are also among the fundamental sensors for outdoor robots. However, the quality of the images taken by a camera depends on environmental conditions. Scenes captured outdoors are often difficult to see because of adherent noises on the surface of the lens-protecting glass of the camera.
For example, waterdrops or mud blobs attached to the protecting glass may obstruct the field of view on rainy days (Fig. 2). It is therefore desirable to remove adherent noises from images of such scenes for surveillance systems and outdoor robots.
Professional photographers use lens hoods or apply special water-repellent oil to the lens to avoid this problem. Even so, waterdrops can still adhere to the lens. Cars are equipped with windscreen wipers to wipe rain from their windscreens. However, part of the scenery is hidden whenever a wiper crosses the field of view.
Therefore, this paper proposes a new method that removes noise from images by using image processing techniques.
Many image interpolation and restoration techniques for damaged and occluded images have also been proposed in the image processing and computer vision communities (Kokaram et al., 1995, Masnou & Morel, 1998, Joyeux et al., 1999, Bertalmio et al., 2000, Bertalmio et al., 2001, Kang et al., 2002, Bertalmio et al., 2003, Matsushita et al., 2005, Shen et al., 2006, Wexler et al., 2007). However, some of them can only treat line-shaped scratches (Kokaram et al., 1995, Masnou & Morel, 1998, Joyeux et al., 1999), because they were designed for restoring old damaged films. Others require human operators to indicate the noise regions interactively rather than detecting them automatically (Bertalmio et al., 2000, Bertalmio et al., 2001, Kang et al., 2002, Bertalmio et al., 2003, Matsushita et al., 2005, Shen et al., 2006, Wexler et al., 2007). These methods are therefore not suitable for surveillance systems and outdoor robots.
On the other hand, there are automatic methods that can remove noises without the help of human operators (Hase et al., 1999, Garg & Nayar, 2004). Hase et al. proposed a real-time snowfall noise elimination method for moving pictures that uses special image processing hardware (Hase et al., 1999). Garg and Nayar proposed an efficient algorithm for detecting and removing rain from videos based on a physics-based motion blur model that explains the photometry of rain (Garg & Nayar, 2004). These techniques work well under the assumption that snow particles or raindrops are always falling. In other words, they can detect snow particles or raindrops because these move constantly.
However, adherent noises such as waterdrops on the surface of the lens-protecting glass may be stationary noises in the images. Therefore, it is difficult to apply these techniques to our problem because adherent noises that must be eliminated do not move in images.
A previous study (Yamashita et al., 2003) is based on the comparison of images taken with multiple cameras. However, it cannot be used for close scenes that have disparities between different viewpoints, because it relies on the difference between images.
Stereo camera systems are widely used as robot sensors, and they must of course observe both distant and close scenes. Therefore, we proposed a method that can remove waterdrops from stereo image pairs that contain objects in both distant and close-range scenes (Tanaka et al., 2006). This method uses the information of corresponding points between stereo image pairs, and therefore sometimes fails when the appearance of a waterdrop differs between the left and right images.
We have also proposed noise removal methods that use a single camera (Yamashita et al., 2004, Yamashita et al., 2005). These methods use a pan-tilt camera and eliminate adherent noises by comparing two images: a first image and a second image taken at a different camera angle (Fig. 3). However, adherent noises cannot be eliminated if a background object is blocked by a waterdrop in the first image and by another waterdrop in the second image.
In this paper, we use not just two frames but the entire image sequence to remove adherent noises. We generate a spatio-temporal image by merging the acquired image sequence, and then detect and remove adherent noises (Yamashita et al., 2008, Yamashita et al., 2009).
The remainder of this paper is organized as follows. Section 2 outlines our method. Section 3 explains how a spatio-temporal image is generated. Sections 4 and 5 describe the noise detection and noise removal methods, respectively. Section 6 presents experimental results and discusses the effectiveness of our method. Finally, Section 7 gives conclusions and future work.
2. Overview of noise detection and removal method
The positions of adherent noises on the protecting glass of the camera do not change in the images when the direction of the camera changes (Fig. 3). This is because adherent noises are attached to the surface of the protecting glass and move together with the camera. In contrast, the positions of static background scenery and of moving objects change in the images while the camera rotates.
We transform the image taken after the camera rotation into an image whose gaze direction (the direction of the principal axis) is the same as that before the rotation. Accordingly, we obtain a new image in which only the positions of adherent noises and moving objects differ from those in the image taken before the camera rotates.
A spatio-temporal image is obtained by merging these transformed images. In the spatio-temporal image, the trajectories of adherent noises can be calculated, so the positions of noises in the image sequence can be detected. Finally, we obtain a noise-free image sequence by estimating the textures in the adherent noise regions.
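As a rough illustration of the merging step, the following sketch stacks already transformed grayscale frames into a volume indexed by time, row, and column, and extracts one cross-section at a fixed image row. The function names, the nested-list image representation, and the toy pixel values are assumptions made for this example only.

```python
def build_spatio_temporal(frames):
    """Stack equally sized grayscale frames into a volume indexed [t][v][u]."""
    if not frames:
        raise ValueError("at least one frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same size")
    # Copy the rows so that later edits do not alter the input frames.
    return [[row[:] for row in frame] for frame in frames]


def cross_section(volume, v):
    """Return the u-t slice at image row v (one output row per frame)."""
    return [frame[v][:] for frame in volume]


# Toy sequence (hypothetical values): a static background row and one
# bright noise pixel whose column shifts by one per frame after alignment.
background = [10, 20, 30, 40, 50, 60]
frames = []
for t in range(4):
    row = background[:]
    row[t + 1] = 255
    frames.append([row])

section = cross_section(build_spatio_temporal(frames), 0)
for row in section:
    print(row)
```

In the printed slice, each background column keeps a constant value over time, while the bright pixel traces a slanted path, which is the kind of trajectory difference the method relies on.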
3. Spatio-temporal image
3.1. Image acquisition
An image sequence is acquired while a pan-tilt camera rotates.
At first (frame 0), one image is acquired where the camera is fixed. In the next step (frame 1), another image is taken after the camera rotates
Note that the rotation angle
The direction and angle of the camera rotation are estimated from the image sequence alone. They are first estimated from optical flow. However, the optical flow may contain errors. Therefore, the rotation angle between two adjacent frames is then estimated by an exploratory search. Finally, the rotation angle between each frame and the base frame is estimated. The details of the estimation method are given in (Yamashita et al., 2009).
3.2. Distortion correction
The distortion from the lens aberration of images is rectified. Let
3.3. Projective transformation
In the next step, the acquired
After the projective transformation, there are regions with no texture near the borders of the images (black regions in Fig. 5(b)). The procedures described below are not applied to these regions.
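For a camera that only rotates, a pixel can be mapped into the base frame with the standard pure-rotation homography x0 ~ K R K^-1 x. The sketch below is one possible form of such a transformation and is not taken from this paper: the pinhole model without skew, the pan and tilt axis conventions, and the focal length value are all assumptions. Only the 360 x 240 image size comes from the experiments.

```python
import math


def rotation_matrix(pan, tilt):
    """R = R_y(pan) * R_x(tilt), angles in radians (assumed convention)."""
    cp, sp = math.cos(pan), math.sin(pan)
    ct, st = math.cos(tilt), math.sin(tilt)
    r_y = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]
    r_x = [[1.0, 0.0, 0.0], [0.0, ct, -st], [0.0, st, ct]]
    return [[sum(r_y[i][k] * r_x[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def to_base_frame(u, v, rot, f, cx, cy):
    """Map pixel (u, v) of a rotated frame into the base frame.

    rot is assumed to take rotated-camera coordinates to base-camera
    coordinates; f is the focal length in pixels, (cx, cy) the image centre.
    """
    ray = [(u - cx) / f, (v - cy) / f, 1.0]  # K^-1 x
    x, y, z = (sum(rot[i][k] * ray[k] for k in range(3)) for i in range(3))
    if z <= 0.0:
        return None  # the ray does not hit the base image plane
    return (f * x / z + cx, f * y / z + cy)  # K (R ray), dehomogenised


# Hypothetical intrinsics: a 360 x 240 image with an assumed focal length.
f, cx, cy = 400.0, 180.0, 120.0
rot = rotation_matrix(0.05, 0.0)
print(to_base_frame(cx, cy, rot, f, cx, cy))
```

Pixels for which the function returns None, or whose mapped position falls outside the image, correspond to areas that receive no texture after the transformation.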
3.4. Cross-section of spatio-temporal image
In the cross-section spatio-temporal image
In this way, there is a difference between trajectories of static objects and those of adherent noises. This difference helps to detect noises.
4. Noise detection
4.1. Median image
Median values along time axis
Adherent noises are eliminated in
A clear image sequence can be obtained from
4.2. Difference image
A difference between the cross-section spatio-temporal monochrome image and the median monochrome image is calculated for obtaining the difference image
Pixel values in regions of
4.3. Noise region image
The regions where the pixel values of the difference images are larger than a certain threshold
The region of
In the next step, regions of adherent noises are detected by using
After detecting noise regions in all cross-section spatio-temporal image
Ideally, the noise regions consist only of adherent noises. However, regions where no adherent noise exists are also extracted in this process because of other image noises. Therefore, morphological operations (i.e., erosion and dilation) are applied to eliminate these small false regions.
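The main detection steps of this section can be sketched on a single monochrome cross-section as follows: a per-column median along the time axis, an absolute difference, a threshold, and an erosion followed by a dilation. This is a simplified illustration under assumptions, not the exact procedure: the 3 x 3 neighbourhood, the border handling, and all function names are choices made here, and the intermediate step that judges candidate regions is not reproduced.

```python
from statistics import median


def median_along_time(section):
    """Per-column median of a cross-section indexed [t][u]."""
    return [median(column) for column in zip(*section)]


def difference_image(section, med):
    """Absolute difference between each time row and the median row."""
    return [[abs(p - m) for p, m in zip(row, med)] for row in section]


def threshold(diff, thresh):
    """Binary mask: 1 where the difference exceeds the threshold."""
    return [[1 if d > thresh else 0 for d in row] for row in diff]


def _morph(mask, combine):
    """Apply min (erosion) or max (dilation) over a 3 x 3 neighbourhood.

    Neighbourhoods are clipped at the border, so pixels outside the
    mask are simply ignored.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighbours = [mask[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = combine(neighbours)
    return out


def erode(mask):
    return _morph(mask, min)


def dilate(mask):
    return _morph(mask, max)


def opening(mask):
    """Erosion followed by dilation; removes isolated small regions."""
    return dilate(erode(mask))
```

The median recovers the background only where a column is covered by noise in fewer than half of the frames, which is why the whole sequence is used rather than two frames.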
5. Noise removal
Adherent noises are eliminated from the cross-section spatio-temporal image
At first, an original image
After the decomposition, the image inpainting algorithm (Bertalmio et al., 2000) is applied for the noise regions of the structure image
Finally, a clear image sequence without adherent noises is created with the inverse projective transformation.
An image sequence was acquired on a rainy day in an outdoor environment.
Figure 8(a) shows an example of the original image when the rotation speed of the camera is constant, and Fig. 8(b) shows the result of the projective transformation. In this experiment, the frame rate was 30 fps, the image size was 360 x 240 pixels, and the length of the movie was 100 frames. We used a pan-tilt-zoom camera (Sony EVI-D100) whose image distance
Figure 9 shows the intermediate result of the noise detection. Figures 9(a) and (b) show the cross-section spatio-temporal image
Figure 11 shows the noise removal result. Figures 11(a) and (b) show the structure image after applying the image inpainting algorithm and the texture image after applying the texture synthesis algorithm, respectively, while Fig. 11(c) shows the noise removal result of the cross-section spatio-temporal image.
Figure 12 shows the final results of noise removal for the image sequence. All waterdrops are eliminated and the moving object can be seen very clearly in all frames.
To verify the accuracy of the noise detection, Fig. 10(a) is compared with the ground truth generated manually by a human operator (Fig. 10(b)). Figure 10(c) shows the comparison result. In Fig. 10(c), red regions indicate correct detections, blue regions indicate undetected noises, and green regions indicate over-detected regions. In practice, the undetected noises are hard to notice in the final result (Fig. 12(b)), because the image interpolation works well in the noise removal step.
Figure 13 compares the results of texture interpolation with those of an existing method. Figure 13(b) shows the result of the image inpainting technique (Bertalmio et al., 2000), and Fig. 13(c) shows the result of our method. The result of the existing method is not good (Fig. 13(b)), because the texture of the noise region is estimated only from the adjacent region. In principle, it is difficult in several cases to estimate texture from a single image alone. In contrast, our method can estimate texture robustly by using spatio-temporal image processing (Fig. 13(c)).
Figure 16(a) shows an example of the original spatio-temporal image when the speed and the direction of the camera rotation are not constant, and Fig. 16(b) shows the result of the projective transformation.
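The three categories used in this comparison can be counted per pixel from a detected mask and a ground-truth mask. The following is a minimal sketch, assuming both masks are equally sized binary images stored as nested lists. The function name and dictionary keys are illustrative only.

```python
def compare_masks(detected, truth):
    """Count correct, undetected, and over-detected pixels of two binary masks."""
    counts = {"correct": 0, "undetected": 0, "over_detected": 0}
    for det_row, gt_row in zip(detected, truth):
        for d, g in zip(det_row, gt_row):
            if d and g:
                counts["correct"] += 1        # noise found in both masks
            elif g:
                counts["undetected"] += 1     # noise missed by the detector
            elif d:
                counts["over_detected"] += 1  # detection without real noise
    return counts


detected = [[1, 1, 0],
            [0, 0, 1]]
truth = [[1, 0, 0],
         [1, 0, 1]]
print(compare_masks(detected, truth))
```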
Figure 17 shows the intermediate results of the noise removal. Figure 17(a) shows the cross-section spatio-temporal image
Figure 18 shows the final results of noise removal for the image sequence. All noises are eliminated and the moving object can be seen very clearly in all frames.
From these results, it is verified that our method can remove adherent noises on the protecting glass of the camera regardless of their positions, colors, sizes, existence of moving objects, and the speed and the direction of the camera rotation.
In this paper, we propose a method for removing noise from an image sequence acquired with a pan-tilt camera. We make a spatio-temporal image and extract the regions of adherent noises by examining the differences in trajectory slopes between adherent noises and other objects in cross-section images. Regions of adherent noises are interpolated from the spatio-temporal image data. Experimental results show the effectiveness of our method.
As future works, the quality of the final result will be improved for interpolating noise regions in
This research was partially supported by the Special Project for Earthquake Disaster Mitigation in Urban Areas in cooperation with the International Rescue System Institute (IRS) and the National Research Institute for Earth Science and Disaster Prevention (NIED).
Bertalmio, M., Bertozzi, A. L. & Sapiro, G. (2001). Navier-Stokes, Fluid Dynamics, and Image and Video Inpainting, (CVPR2001), Vol. 1, pp. 355-362.
Bertalmio, M., Sapiro, G., Caselles, V. & Ballester, C. (2000). Image Inpainting, pp. 417-424.
Bertalmio, M., Vese, L., Sapiro, G. & Osher, S. (2003). Simultaneous Structure and Texture Image Inpainting, Vol. 12, No. 8, pp. 882-889.
Efros, A. A. & Leung, T. K. (1999). Texture Synthesis by Non-parametric Sampling, (ICCV1999), Vol. 2, pp. 1033-1038.
Garg, K. & Nayar, S. K. (2004). Detection and Removal of Rain from Videos, (CVPR2004), Vol. 1, pp. 528-535.
Haga, T., Sumi, K., Hashimoto, M., Seki, A. & Kuroda, S. (1997). Monitoring System with Depth Based Object Emphasis Using Spatiotemporal Image Processing, (PRMU97-126), Vol. 97, No. 325, pp. 41-46.
Hase, H., Miyake, K. & Yoneda, M. (1999). Real-time Snowfall Noise Elimination, (ICIP1999), Vol. 2, pp. 406-409.
Joyeux, L., Buisson, O., Besserer, B. & Boukir, S. (1999). Detection and Removal of Line Scratches in Motion Picture Films, (CVPR1999), pp. 548-553.
Kang, S. H., Chan, T. F. & Soatto, S. (2002). Inpainting from Multiple Views, pp. 622-625.
Kokaram, A. C., Morris, R. D., Fitzgerald, W. J. & Rayner, P. J. W. (1995). Interpolation of Missing Data in Image Sequences, Vol. 4, No. 11, pp. 1509-1519.
Masnou, S. & Morel, J. M. (1998). Level Lines Based Disocclusion, (ICIP1998), pp. 259-263.
Matsushita, Y., Ofek, E., Tang, X. & Shum, H. Y. (2005). Full-frame Video Stabilization, (CVPR2005), Vol. 1, pp. 50-57.
Rudin, L. I., Osher, S. & Fatemi, E. (1992). Nonlinear Total Variation Based Noise Removal Algorithms, Vol. 60, pp. 259-268.
Shen, Y., Lu, F., Cao, X. & Foroosh, H. (2006). Video Completion for Perspective Camera Under Constrained Motion, (ICPR2006), Vol. 3, pp. 63-66.
Tanaka, Y., Yamashita, A., Kaneko, T. & Miura, K. T. (2006). Removal of Adherent Waterdrops from Images Acquired with a Stereo Camera System, Vol. 89-D, No. 7, pp. 2021-2027.
Weng, J., Cohen, P. & Herniou, M. (1992). Camera Calibration with Distortion Models and Accuracy Evaluation, Vol. 14, No. 10, pp. 965-980.
Wexler, Y., Shechtman, E. & Irani, M. (2007). Space-Time Completion of Video, Vol. 29, No. 3, pp. 463-476.
Yamashita, A., Fukuchi, I. & Kaneko, T. (2009). Noises Removal from Image Sequences Acquired with Moving Camera by Estimating Camera Motion from Spatio-Temporal Information, (IROS2009).
Yamashita, A., Harada, T., Kaneko, T. & Miura, K. T. (2005). Virtual Wiper-Removal of Adherent Noises from Images of Dynamic Scenes by Using a Pan-Tilt Camera-, Vol. 19, No. 3, pp. 295-310.
Yamashita, A., Harada, T., Kaneko, T. & Miura, K. T. (2008). Removal of Adherent Noises from Image Sequences by Spatio-Temporal Image Processing, (ICRA2008), pp. 2386-2391.
Yamashita, A., Kaneko, T. & Miura, K. T. (2004). A Virtual Wiper-Restoration of Deteriorated Images by Using a Pan-Tilt Camera-, (ICRA2004), pp. 4724-4729.
Yamashita, A., Kuramoto, M., Kaneko, T. & Miura, K. T. (2003). A Virtual Wiper-Restoration of Deteriorated Images by Using Multiple Cameras-, (IROS2003), pp. 3126-3131.