Removal of Adherent Noises from Image Sequences by Spatio-Temporal Image Processing

Written By

Atsushi Yamashita, Isao Fukuchi and Toru Kaneko

Published: 01 December 2009

DOI: 10.5772/7045

From the Edited Volume

Image Processing

Edited by Yung-Sheng Chen

1. Introduction

In this paper, we propose a method for removing noises from image sequences by spatio-temporal image processing. A spatio-temporal image is generated by merging the acquired image sequence (Fig. 1(a)), and cross-section images are then extracted from it (Fig. 1(b)). In these cross-section images, we can detect moving objects and estimate their motion by tracing the trajectories of their edges or lines.

In recent years, cameras have been widely used for surveillance systems in outdoor environments, such as traffic flow observation and trespasser detection. A camera is also one of the fundamental sensors for outdoor robots. However, the quality of images taken through cameras depends on environmental conditions. Scenes captured by cameras in outdoor environments are often difficult to see because of adherent noises on the surface of the lens-protecting glass of the camera.

For example, waterdrops or mud blobs attached to the protecting glass may obstruct the field of view on rainy days (Fig. 2). It is desirable to remove such adherent noises from images of these scenes for surveillance systems and outdoor robots.

Figure 1.

Spatio-temporal image.

Figure 2.

Example of adherent noise.

Professional photographers use lens hoods or apply special water-repellent oil to the lens to avoid this problem. Even so, waterdrops may still adhere to the lens. Cars are equipped with windscreen wipers to wipe rain from their windscreens, but part of the scene is hidden whenever a wiper crosses it.

Therefore, this paper proposes a new method for removing such noises from images by image processing techniques.

A lot of image interpolation or restoration techniques for damaged and occluded images have also been proposed in the image processing and computer vision communities (Kokaram et al., 1995, Masnou & Morel, 1998, Joyeux et al., 1999, Bertalmio et al., 2000, Bertalmio et al., 2001, Kang et al., 2002, Bertalmio et al., 2003, Matsushita et al., 2005, Shen et al., 2006, Wexler et al., 2007). However, some of them can only deal with line-shaped scratches (Kokaram et al., 1995, Masnou & Morel, 1998, Joyeux et al., 1999), because they are techniques for restoring old damaged films. Others require that human operators indicate the noise regions interactively, not automatically (Bertalmio et al., 2000, Bertalmio et al., 2001, Kang et al., 2002, Bertalmio et al., 2003, Matsushita et al., 2005, Shen et al., 2006, Wexler et al., 2007). These methods are therefore not suitable for surveillance systems and outdoor robots.

On the other hand, there are automatic methods that can remove noises without the help of human operators (Hase et al., 1999, Garg & Nayar, 2004). Hase et al. proposed a real-time snowfall noise elimination method for moving pictures using special image processing hardware (Hase et al., 1999). Garg and Nayar proposed an efficient algorithm for detecting and removing rain from videos based on a physics-based motion blur model that explains the photometry of rain (Garg & Nayar, 2004). These techniques work well under the assumption that snow particles or raindrops are always falling; in other words, they can detect snow particles or raindrops because they move constantly.

However, adherent noises such as waterdrops on the surface of the lens-protecting glass may be stationary in the images. It is therefore difficult to apply these techniques to our problem, because the adherent noises that must be eliminated do not move in the images.

To solve this static noise problem, we have proposed methods that can remove view-disturbing noises from images taken with multiple cameras (Yamashita et al., 2003, Tanaka et al., 2006).

The earlier study (Yamashita et al., 2003) is based on the comparison of images taken with multiple cameras. However, because it relies on the difference between images, it cannot handle close scenes that produce disparities between different viewpoints.

Figure 3.

Image acquisition by using camera rotation.

Stereo camera systems are widely used as robot sensors, and they must of course observe both distant and close scenes. Therefore, we have proposed a method that can remove waterdrops from stereo image pairs containing objects both in a distant scene and in a close-range scene (Tanaka et al., 2006). This method utilizes the information of corresponding points between stereo image pairs, and thereby sometimes fails when the appearance of waterdrops differs between the left and right images.

We have also proposed noise removal methods using a single camera (Yamashita et al., 2004, Yamashita et al., 2005). These methods use a pan-tilt camera and eliminate adherent noises based on the comparison of two images: a first image and a second image taken at a different camera angle (Fig. 3). However, adherent noises cannot be eliminated if a background object is blocked by a waterdrop in the first image and is also blocked by another waterdrop in the second image.

In this paper, we use not just two frames but the entire image sequence to remove adherent noises. We generate a spatio-temporal image by merging the acquired image sequence, and then detect and remove adherent noises (Yamashita et al., 2008, Yamashita et al., 2009).

The composition of this paper is as follows. In Section 2, we outline our method. In Section 3, the construction of the spatio-temporal image is explained. In Sections 4 and 5, the noise detection and noise removal methods are described, respectively. In Section 6, experimental results are shown and the effectiveness of our method is discussed. Finally, Section 7 describes conclusions and future work.

2. Overview of noise detection and removal method

The positions of adherent noises on the protecting glass of the camera do not change in the images when the direction of the camera changes (Fig. 3). This is because the noises are attached to the surface of the protecting glass and move together with the camera. On the other hand, the positions of the static background scenery and of moving objects change while the camera rotates.

We transform each image taken after a camera rotation into an image whose gaze direction (direction of the principal axis) is the same as that before the rotation. Accordingly, we obtain a new image in which only the positions of adherent noises and moving objects differ from the image before the camera rotates.

A spatio-temporal image is obtained by merging these transformed images. In the spatio-temporal image, the trajectories of adherent noises can be calculated, so the positions of the noises in the image sequence can also be detected from it. Finally, we obtain a noise-free image sequence by estimating the textures of the adherent noise regions.

3. Spatio-temporal image

3.1. Image acquisition

An image sequence is acquired while a pan-tilt camera rotates.

At first (frame 0), one image is acquired while the camera is fixed. In the next step (frame 1), another image is taken after the camera rotates $\theta_1$ rad about the axis that is perpendicular to the ground and passes through the center of the lens. In the $t$-th step (frame $t$), the camera rotates $\theta_t$ rad and the $t$-th image is taken. By repeating this procedure $n$ times, we can acquire an $n/30$-second movie with a 30 fps camera.

Note that the rotation angle $\theta_t$ is taken as positive for a counterclockwise rotation (the direction in Fig. 3).

The direction and the angle of the camera rotation are estimated only from the image sequence. At first, they are estimated from an optical flow. However, the optical flow may contain errors. Therefore, the rotation angle between two adjacent frames is refined in an exploratory way, and finally the rotation angle between each frame and the base frame is estimated. The details of the estimation method are explained in (Yamashita et al., 2009).
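As an illustration only, the following Python sketch shows how an initial per-frame pan angle could be obtained from dense optical flow, here with OpenCV's Farneback implementation; taking the median horizontal flow is our simplification, and the exploratory refinement of (Yamashita et al., 2009) is not reproduced.

```python
import numpy as np
import cv2

def estimate_pan_angle(prev_gray, cur_gray, f):
    """Rough initial estimate of the pan angle between adjacent frames.
    A pure pan of theta shifts pixels near the image center by roughly
    f * tan(theta), so theta ~ atan(du / f) for the dominant shift du."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    du = np.median(flow[..., 0])   # dominant horizontal displacement (pixels)
    return np.arctan2(du, f)       # rotation angle in radians
```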

3.2. Distortion correction

The distortion caused by the lens aberration is rectified. Let $(\tilde{u}, \tilde{v})$ be the coordinate value without distortion, $(u_0, v_0)$ be the coordinate value with distortion (the observed coordinate value), and $\kappa_1$ be the parameter of the radial distortion, respectively (Weng et al., 1992). The distortion of the image is corrected by Equations (1) and (2).

$u_0 = \tilde{u} + \kappa_1 \tilde{u} (\tilde{u}^2 + \tilde{v}^2)$  (1)
$v_0 = \tilde{v} + \kappa_1 \tilde{v} (\tilde{u}^2 + \tilde{v}^2)$  (2)
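A minimal Python sketch of the correction, assuming the principal point lies at the image center: since Equations (1) and (2) map undistorted coordinates to observed ones, each corrected pixel can be filled by sampling the observed image at the mapped position.

```python
import numpy as np
import cv2

def undistort_radial(img, kappa1):
    """Correct first-order radial distortion via Eqs. (1)-(2).
    For every undistorted pixel (u~, v~) we compute where it was
    observed in the distorted image and sample there with cv2.remap."""
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0                 # assumed principal point
    v, u = np.indices((h, w), dtype=np.float32)
    ut, vt = u - cx, v - cy                   # coordinates relative to center
    r2 = ut * ut + vt * vt
    map_u = (ut + kappa1 * ut * r2 + cx).astype(np.float32)  # Eq. (1)
    map_v = (vt + kappa1 * vt * r2 + cy).astype(np.float32)  # Eq. (2)
    return cv2.remap(img, map_u, map_v, interpolation=cv2.INTER_LINEAR)
```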

3.3. Projective transformation

In the next step, the acquired $t$-th image (the image after a $\theta_t$ rad camera rotation) is transformed by the projective transformation. The coordinate value after the transformation, $(u_t, v_t)$, is expressed as follows (Fig. 4):

$u_t = \dfrac{f (f \tan\theta_t + \tilde{u}_t)}{f - \tilde{u}_t \tan\theta_t}$  (3)
$v_t = \dfrac{f \sqrt{1 + \tan^2\theta_t}}{f - \tilde{u}_t \tan\theta_t}\, \tilde{v}_t$  (4)

where $(\tilde{u}_t, \tilde{v}_t)$ is the coordinate value of the $t$-th image before the transformation, and $f$ is the image distance (the distance between the center of the lens and the image plane), respectively.

The $t$-th image after the camera rotation is thereby transformed into the image whose gaze direction is the same as that before the camera rotation.
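For illustration, Equations (3) and (4) are exactly the homography $K R K^{-1}$ induced by a pure pan; a sketch using OpenCV could look as follows, where the sign convention for $\theta$ and the principal point at the image center are our assumptions.

```python
import numpy as np
import cv2

def warp_to_base_gaze(img, theta, f):
    """Warp the frame taken after a theta-rad pan back to the base gaze
    direction. The homography K R K^-1 reproduces Eqs. (3)-(4) for
    coordinates centered at the principal point."""
    h, w = img.shape[:2]
    K = np.array([[f, 0, w / 2.0],
                  [0, f, h / 2.0],
                  [0, 0, 1]], dtype=np.float64)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0, s],
                  [0, 1, 0],
                  [-s, 0, c]])                 # pan about the vertical axis
    H = K @ R @ np.linalg.inv(K)
    return cv2.warpPerspective(img, H, (w, h))
```

In the experiments of Section 6, the image distance $f$ is calibrated as 261 pixels.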

After the projective transformation, there are regions without texture at the fringes of the images (black regions in Fig. 5(b)). The procedures described below are not applied to these regions.

Figure 4.

Projective transformation.

Figure 5.

Spatio-temporal image.

3.4. Cross-section of spatio-temporal image

The spatio-temporal image $I(u, v, t)$ is obtained by arraying all the transformed images in chronological order (Fig. 5(a)). In Fig. 5(a), $u$ is the horizontal axis that expresses $u_t$, $v$ is the vertical axis that expresses $v_t$, and $t$ is the depth axis that indicates the time (frame number $t$).

We can clip a cross-section image of $I(u, v, t)$. For example, Fig. 5(b) shows the cross-section image of the spatio-temporal image in Fig. 5(a) along $v = v_1$.

Here, let $S(u, t)$ be the cross-section spatio-temporal image. In this case, $S(u, t) = I(u, v_1, t)$.
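In NumPy terms, the volume and its cross-sections might be built as in the sketch below; the array index order is our choice.

```python
import numpy as np

def spatio_temporal_volume(frames):
    """Stack the warped frames into I(u, v, t); each frame is an
    (H, W, 3) array, giving a (T, H, W, 3) volume."""
    return np.stack(frames, axis=0)

def cross_section(I, v1):
    """Clip the cross-section S(u, t) = I(u, v1, t) along scanline v1.
    Result is indexed S[u, t, channel], i.e. shape (W, T, 3)."""
    return I[:, v1, :, :].transpose(1, 0, 2)
```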

In the cross-section spatio-temporal image $S(u, t)$, the trajectories of the static background scenery become vertical straight lines owing to the effect of the projective transformation. On the other hand, the trajectories of adherent noises in $S(u, t)$ become curves whose shapes can be calculated by Equations (3) and (4). Note that the trajectory of an adherent noise in Fig. 5(b) looks like a straight line; however, it is slightly curved.

In this way, there is a difference between trajectories of static objects and those of adherent noises. This difference helps to detect noises.

4. Noise detection

4.1. Median image

Median values along the time axis $t$ are calculated in the cross-section spatio-temporal image $S(u, t)$. A median image $M(u, t)$ is then generated by replacing the original pixel values with these median values (Fig. 6(a)).

Adherent noises are eliminated in $M(u, t)$, because the noises in $S(u, t)$ occupy a small area compared with the background scenery.

A clear image sequence could be obtained from $M(u, t)$ by the inverse transformation of Equations (3) and (4) if there were no moving objects in the original images. However, if the original images contain moving objects, the textures of these objects blur owing to the median filtering. Therefore, the regions of adherent noises are detected explicitly, and image restoration is executed for the noise regions so that a clear image sequence is generated even around the moving objects.
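A sketch of the median image computation, assuming the cross-section is stored as S[u, t, channel] as above: because the background forms vertical lines in $S(u, t)$ while noises occupy only a few frames per column, the per-$u$ median over time recovers the background.

```python
import numpy as np

def median_image(S):
    """M(u, t): per-u median of S(u, t) over the time axis, broadcast
    back to the original shape (constant along t for each u)."""
    med = np.median(S, axis=1, keepdims=True)    # median over time
    return np.broadcast_to(med, S.shape).astype(S.dtype)
```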

Figure 6.

Noise detection.

4.2. Difference image

A difference between the cross-section spatio-temporal monochrome image and the median monochrome image is calculated to obtain the difference image $D(u, t)$ by Equation (5).

Pixel values of $D(u, t)$ become large in regions where adherent noises exist, while they remain small in the background regions (Fig. 6(b)).

$D(u, t) = |S(u, t) - M(u, t)|$  (5)

4.3. Noise region image

The regions where the pixel values of the difference image are larger than a certain threshold $T_b$ are defined as noise candidate regions. The judgment image $H(u, t)$ is obtained by

$H(u, t) = \begin{cases} 0, & D(u, t) < T_b \\ 1, & D(u, t) \ge T_b \end{cases}$  (6)

The regions where $H(u, t) = 1$ are defined as noise candidate regions (Fig. 6(c)). Note that an adherent noise does not stay on the same cross-section image as time $t$ increases, because the $v$-coordinate of the noise changes owing to the projective transformation in Equation (4). Therefore, we take this change into account and generate $H(u, t)$ in such a way that the same adherent noise stays on the same cross-section image.
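Equations (5) and (6) amount to a pixel-wise absolute difference followed by a threshold; a minimal sketch on monochrome cross-sections (function name is ours):

```python
import numpy as np

def detect_candidates(S_mono, M_mono, T_b=50):
    """Difference image D(u, t) (Eq. (5)) and binary judgment image
    H(u, t) (Eq. (6)) from monochrome cross-sections."""
    D = np.abs(S_mono.astype(np.int32) - M_mono.astype(np.int32))
    H = (D >= T_b).astype(np.uint8)
    return D, H
```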

In the next step, the regions of adherent noises are detected by using $H(u, t)$. The trajectories of adherent noises are expressed by Equation (3). Therefore, each candidate curve is tracked and the number of pixels where $H(u, t) = 1$ along it is counted. If the total count exceeds the threshold value $T_n$, the curve is regarded as a noise region. As mentioned above, this tracking procedure is executed in the 3-D $(u, v, t)$ space. Thanks to this voting (counting) procedure, adherent noise regions can be detected precisely even when there are moving objects in the original image sequence.
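The voting step could be sketched as below for a single cross-section, a 2-D simplification of the 3-D $(u, v, t)$ tracking described above; all function and variable names are ours.

```python
import numpy as np

def noise_trajectory(u_noise, thetas, f):
    """Eq. (3): u-position in the warped frame t of a noise fixed at
    u_noise (centered image coordinates) on the protecting glass."""
    tan_t = np.tan(thetas)
    return f * (f * tan_t + u_noise) / (f - u_noise * tan_t)

def vote_noise_curves(H, thetas, f, T_n=10):
    """Count H == 1 along each candidate trajectory; curves with more
    than T_n hits are regarded as adherent noise. H is indexed H[u, t]
    and thetas[t] is the estimated rotation angle of frame t."""
    n_u, n_t = H.shape
    t_idx = np.arange(n_t)
    is_noise = np.zeros(n_u, dtype=bool)
    for u0 in range(n_u):
        traj = noise_trajectory(u0 - n_u // 2, thetas, f) + n_u // 2
        cols = np.rint(traj).astype(int)
        ok = (cols >= 0) & (cols < n_u)          # stay inside the image
        if H[cols[ok], t_idx[ok]].sum() > T_n:
            is_noise[u0] = True
    return is_noise
```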

After detecting noise regions in all the cross-section spatio-temporal images $S(u, t)$, the noise region image $R(\tilde{u}, \tilde{v})$ is generated from all the $H(u, t)$ information by the inverse projective transformation (Fig. 6(d)).

Ideally, the noise regions consist only of adherent noises. In practice, however, regions without adherent noises are also extracted in this process because of other image noises. Therefore, morphological operations (i.e., erosion and dilation) are executed to eliminate such small false regions.
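With OpenCV, for instance, the erosion and dilation can be combined into one morphological opening; the kernel size here is our assumption.

```python
import numpy as np
import cv2

def clean_noise_regions(R):
    """Suppress small false regions in the noise region image R(u~, v~)
    by erosion followed by dilation (a morphological opening)."""
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(R, cv2.MORPH_OPEN, kernel)
```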

5. Noise removal

Adherent noises are eliminated from the cross-section spatio-temporal image $S(u, t)$ by applying the image restoration technique of (Bertalmio et al., 2003) to the noise regions detected in Section 4.

At first, an original image $S(u, t)$ is decomposed into a structure image $f(u, t)$ and a texture image $g(u, t)$ (Rudin et al., 1992). Figure 7 shows an example of the structure image and the texture image. Note that the contrast of the texture image (Fig. 7(c)) is adjusted for visibility.

After the decomposition, the image inpainting algorithm (Bertalmio et al., 2000) is applied to the noise regions of the structure image $f(u, t)$, and the texture synthesis algorithm (Efros & Leung, 1999) is applied to the noise regions of the texture image $g(u, t)$, respectively. This decomposition-based method (Bertalmio et al., 2003) overcomes the weak point of the original image inpainting technique (Bertalmio et al., 2000), namely its poor reproducibility of complicated textures. After that, the noise-free image is obtained by merging the two restored images.
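The following is only a hedged sketch of this step, not the authors' exact pipeline: it uses scikit-image's total-variation decomposition in place of (Rudin et al., 1992), OpenCV's Navier-Stokes inpainting (which follows Bertalmio et al., 2001) for the structure part, and, since OpenCV provides no Efros-Leung texture synthesis, Telea's inpainting as a stand-in for the texture part.

```python
import numpy as np
import cv2
from skimage.restoration import denoise_tv_chambolle

def remove_noise(S, mask):
    """Decompose S into structure + texture, inpaint each on the noise
    mask (uint8, nonzero at noise pixels), and merge the results."""
    img = S.astype(np.float32) / 255.0
    f = denoise_tv_chambolle(img, weight=0.1, channel_axis=-1)  # structure
    g = img - f                                                 # texture
    f8 = (f * 255).clip(0, 255).astype(np.uint8)
    g8 = ((g + 0.5) * 255).clip(0, 255).astype(np.uint8)        # shift to [0, 255]
    f_in = cv2.inpaint(f8, mask, 3, cv2.INPAINT_NS)
    g_in = cv2.inpaint(g8, mask, 3, cv2.INPAINT_TELEA)          # stand-in
    merged = f_in.astype(np.float32) + g_in.astype(np.float32) - 127.5
    return merged.clip(0, 255).astype(np.uint8)
```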

Finally, a clear image sequence without adherent noises is created with the inverse projective transformation.

Figure 7.

Image decomposition.

6. Experiment

Image sequences were acquired on a rainy day in an outdoor environment.

Figure 8(a) shows an example of an original image when the rotation speed of the camera is constant, and Fig. 8(b) shows the result of the projective transformation. In this experiment, the frame rate was 30 fps, the image size was 360 × 240 pixels, and the length of the movie was 100 frames, respectively. We used a pan-tilt-zoom camera (Sony EVI-D100) whose image distance $f$ was calibrated as 261 pixels. The parameters for the noise detection were set as $T_b = 50$ and $T_n = 10$.

Figure 9 shows intermediate results of the noise detection. Figures 9(a) and (b) show the cross-section spatio-temporal image $S(u, t)$ in color and monochrome formats, respectively (red scanline in Fig. 8(b), $v = 150$). There is a moving object (a human with a red umbrella) in this image sequence. Figures 9(c), (d) and (e) show the median image $M(u, t)$, the difference image $D(u, t)$, and the judgment image $H(u, t)$, respectively. Figure 10(a) shows the noise region image $R(\tilde{u}, \tilde{v})$.

Figure 11 shows the noise removal result. Figures 11(a) and (b) show the structure image after applying the image inpainting algorithm and the texture image after applying the texture synthesis algorithm, respectively, while Fig. 11(c) shows the noise removal result of the cross-section spatio-temporal image.

Figure 12 shows the final results of noise removal for the image sequence. All waterdrops are eliminated and the moving object can be seen very clearly in all frames.

Figure 8.

Acquired image.

Figure 9.

Results of noise detection.

Figure 10.

Noise region image $R(\tilde{u}, \tilde{v})$.

Figure 11.

Results of noise removal.

To verify the accuracy of the noise detection, Fig. 10(a) is compared with the ground truth generated manually by a human operator (Fig. 10(b)). Figure 10(c) shows the comparison result. In Fig. 10(c), red regions indicate correct detections, blue regions indicate undetected noises, and green regions indicate over-detected regions. In practice, the undetected noises are hardly noticeable in the final result (Fig. 12(b)), because the image interpolation works well in the noise removal step.
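The color-coded comparison of Fig. 10(c) can be reproduced from two binary masks as in the sketch below; the RGB channel order and function name are our assumptions.

```python
import numpy as np

def compare_with_ground_truth(detected, truth):
    """Color-coded comparison: red = correct detection, blue = undetected
    noise, green = over-detection. Inputs are boolean masks of equal shape."""
    h, w = detected.shape
    vis = np.zeros((h, w, 3), np.uint8)
    vis[detected & truth] = (255, 0, 0)      # correct (red)
    vis[~detected & truth] = (0, 0, 255)     # undetected (blue)
    vis[detected & ~truth] = (0, 255, 0)     # over-detected (green)
    return vis
```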

Figure 12.

Results of noise removal (waterdrop).

Figure 13 compares the texture interpolation with an existing method. Figure 13(b) shows the result of the image inpainting technique (Bertalmio et al., 2000), and Fig. 13(c) shows the result of our method. The result of the existing method is poor (Fig. 13(b)) because the texture of the noise region is estimated only from adjacent regions; in principle, it is difficult in such cases to estimate texture from only a single image. On the other hand, our method can estimate texture robustly by using spatio-temporal image processing (Fig. 13(c)).

Figure 14 shows results of mud blob removal, and Fig. 15 shows results of waterdrop and mud blob removal, respectively.

Figure 16(a) shows an example of the original spatio-temporal image when the speed and the direction of the camera rotation are not constant, and Fig. 16(b) shows the result of the projective transformation, respectively.

Figure 17 shows the intermediate results of the noise removal. Figure 17(a) shows the cross-section spatio-temporal image $S(u, t)$. There are moving objects (walking men) in this image sequence. Figures 17(b), (c), (d), and (e) show the median image $M(u, t)$, the difference image $D(u, t)$, the judgment image $H(u, t)$, and the noise region image $R(\tilde{u}, \tilde{v})$, respectively. Figure 17(f) shows the noise removal result of the cross-section spatio-temporal image.

Figure 18 shows the final results of noise removal for the image sequence. All noises are eliminated and the moving objects can be seen clearly in all frames.

From these results, it is verified that our method can remove adherent noises on the protecting glass of the camera regardless of their positions, colors, and sizes, the existence of moving objects, and the speed and direction of the camera rotation.

7. Conclusion

In this paper, we have proposed a method for removing noises from image sequences acquired with a pan-tilt camera. We generate a spatio-temporal image and extract the regions of adherent noises by examining the difference in trajectory slopes between adherent noises and other objects in the cross-section images. The regions of adherent noises are then interpolated from the spatio-temporal image data. Experimental results show the effectiveness of our method.

As future work, the quality of the final result should be improved by interpolating noise regions in the spatio-temporal $(u, v, t)$ space. As to the camera motion, camera translation should be considered in addition to camera rotation (Haga et al., 1997). It is also important to compare the performance of our method with recent space-time video completion methods (e.g., Matsushita et al., 2005, Shen et al., 2006, Wexler et al., 2007).

Figure 13.

Comparison of noise removal results.

Figure 14.

Results of noise removal (mud blob).

Figure 15.

Results of noise removal (waterdrop and mud blob).

Figure 16.

Image sequence when the rotation speed of the camera is not constant.

Figure 17.

Result of noise detection and removal (camera motion is not constant).

Figure 18.

Results of noise removal (camera motion is not constant).

Acknowledgments

This research was partially supported by the Special Project for Earthquake Disaster Mitigation in Urban Areas, in cooperation with the International Rescue System Institute (IRS) and the National Research Institute for Earth Science and Disaster Prevention (NIED).

References

  1. Bertalmio, M., Bertozzi, A. L. & Sapiro, G. (2001). Navier-Stokes, Fluid Dynamics, and Image and Video Inpainting, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR2001), Vol. 1, pp. 355-362.
  2. Bertalmio, M., Sapiro, G., Caselles, V. & Ballester, C. (2000). Image Inpainting, ACM Transactions on Computer Graphics (Proceedings of SIGGRAPH2000), pp. 417-424.
  3. Bertalmio, M., Vese, L., Sapiro, G. & Osher, S. (2003). Simultaneous Structure and Texture Image Inpainting, IEEE Transactions on Image Processing, Vol. 12, No. 8, pp. 882-889.
  4. Efros, A. A. & Leung, T. K. (1999). Texture Synthesis by Non-parametric Sampling, Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV1999), Vol. 2, pp. 1033-1038.
  5. Garg, K. & Nayar, S. K. (2004). Detection and Removal of Rain from Videos, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR2004), Vol. 1, pp. 528-535.
  6. Haga, T., Sumi, K., Hashimoto, M., Seki, A. & Kuroda, S. (1997). Monitoring System with Depth Based Object Emphasis Using Spatiotemporal Image Processing, Technical Report of IEICE (PRMU97-126), Vol. 97, No. 325, pp. 41-46.
  7. Hase, H., Miyake, K. & Yoneda, M. (1999). Real-time Snowfall Noise Elimination, Proceedings of the 1999 IEEE International Conference on Image Processing (ICIP1999), Vol. 2, pp. 406-409.
  8. Joyeux, L., Buisson, O., Besserer, B. & Boukir, S. (1999). Detection and Removal of Line Scratches in Motion Picture Films, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR1999), pp. 548-553.
  9. Kang, S. H., Chan, T. F. & Soatto, S. (2002). Inpainting from Multiple Views, Proceedings of the 1st International Symposium on 3D Data Processing Visualization and Transmission, pp. 622-625.
  10. Kokaram, A. C., Morris, R. D., Fitzgerald, W. J. & Rayner, P. J. W. (1995). Interpolation of Missing Data in Image Sequences, IEEE Transactions on Image Processing, Vol. 4, No. 11, pp. 1509-1519.
  11. Masnou, S. & Morel, J. M. (1998). Level Lines Based Disocclusion, Proceedings of the 5th IEEE International Conference on Image Processing (ICIP1998), pp. 259-263.
  12. Matsushita, Y., Ofek, E., Tang, X. & Shum, H. Y. (2005). Full-frame Video Stabilization, Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR2005), Vol. 1, pp. 50-57.
  13. Rudin, L. I., Osher, S. & Fatemi, E. (1992). Nonlinear Total Variation Based Noise Removal Algorithms, Physica D, Vol. 60, pp. 259-268.
  14. Shen, Y., Lu, F., Cao, X. & Foroosh, H. (2006). Video Completion for Perspective Camera Under Constrained Motion, Proceedings of the 18th International Conference on Pattern Recognition (ICPR2006), Vol. 3, pp. 63-66.
  15. Tanaka, Y., Yamashita, A., Kaneko, T. & Miura, K. T. (2006). Removal of Adherent Waterdrops from Images Acquired with a Stereo Camera System, IEICE Transactions on Information and Systems, Vol. E89-D, No. 7, pp. 2021-2027.
  16. Weng, J., Cohen, P. & Herniou, M. (1992). Camera Calibration with Distortion Models and Accuracy Evaluation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 10, pp. 965-980.
  17. Wexler, Y., Shechtman, E. & Irani, M. (2007). Space-Time Completion of Video, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 3, pp. 463-476.
  18. Yamashita, A., Fukuchi, I. & Kaneko, T. (2009). Noises Removal from Image Sequences Acquired with Moving Camera by Estimating Camera Motion from Spatio-Temporal Information, Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2009).
  19. Yamashita, A., Harada, T., Kaneko, T. & Miura, K. T. (2005). Virtual Wiper: Removal of Adherent Noises from Images of Dynamic Scenes by Using a Pan-Tilt Camera, Advanced Robotics, Vol. 19, No. 3, pp. 295-310.
  20. Yamashita, A., Harada, T., Kaneko, T. & Miura, K. T. (2008). Removal of Adherent Noises from Image Sequences by Spatio-Temporal Image Processing, Proceedings of the 2008 IEEE International Conference on Robotics and Automation (ICRA2008), pp. 2386-2391.
  21. Yamashita, A., Kaneko, T. & Miura, K. T. (2004). A Virtual Wiper: Restoration of Deteriorated Images by Using a Pan-Tilt Camera, Proceedings of the 2004 IEEE International Conference on Robotics and Automation (ICRA2004), pp. 4724-4729.
  22. Yamashita, A., Kuramoto, M., Kaneko, T. & Miura, K. T. (2003). A Virtual Wiper: Restoration of Deteriorated Images by Using Multiple Cameras, Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2003), pp. 3126-3131.
