Wavelet-based Moving Object Segmentation: From Scalar Wavelets to Dual-tree Complex Filter Banks

In this chapter we explain wavelet-based moving object detection and segmentation in video frames. Starting from the discrete wavelet transform (DWT), we present recent developments employing multi-wavelets (MW), and then turn to the dual-tree complex wavelet transform (DT-CWT). Working on video rather than on a single image requires more care and, at the same time, motivates novel approaches such as the so-called non-separable 3D oriented dual-tree complex wavelet transform. A comprehensive comparison shows the advantages and disadvantages of current wavelet-based techniques used in image/video segmentation.
The segmentation and tracking of moving objects in video are important tasks in many applications. They underpin video coding standards such as MPEG-4 Sikora (1997), which provides content-based functionalities through the concept of the video object plane, employing ideas such as content-based scalability as well as separate and flexible reconstruction and manipulation of contents Kim & Hwang (2002). Video surveillance systems designed for security applications need to track and, furthermore, distinguish intruding objects. All these applications require algorithms to detect, segment and track moving objects so that further high-level processing can be performed.
Video segmentation algorithms can be categorized into four general groups: segmentation based on motion information only, segmentation based on motion and spatial information, segmentation based on change detection, and segmentation based on edge detection Kim & Hwang (2002). Owing to the lack of spatial information, motion segmentation techniques, which are closely related to motion estimation, suffer from the occlusion and aperture problems. This limits the accuracy of the boundaries of segmented objects.
The algorithms in the second group were proposed to remedy some of the weak points of the first group. Spatial information is blended with motion information to make the algorithms more stable in extracting object boundaries. However, these techniques are not suitable for content-based applications, since the objects of interest are not necessarily characterized by similar intensity, color, or motion. The algorithms in the third group start with the gray-value difference image between two consecutive frames, and then apply a decision rule to the absolute difference in order to identify moving areas. If the moving objects are not sufficiently textured, only the occlusion areas are marked as changed and the interior of the objects remains unchanged. Moreover, objects that stop moving for a certain period of time will be lost. The algorithms in the last category treat a video sequence as a sequence of edge maps rather than gray-level images. This approach is less sensitive to illumination changes and, since binary information is used, requires fewer computations. Several relevant references are given in Baradarani et al. (2006).
Change detection on inter-frame differences is one of the most feasible solutions Kim & Hwang (2002), since it enables automatic detection of new appearances. Huang and Hsieh proposed a wavelet-based technique for moving object detection Huang & Hsieh (2003). Using the single change detection (SCD) method in the wavelet domain, they showed that their method gives better results than the method of Kim & Hwang (2002). SCD typically considers two consecutive frames in a video sequence. Huang and Hsieh refined their method in a more recent paper Huang et al. (2004) by introducing a double change detection (DCD) approach, which increases the number of detected edge points. Double change detection overcomes the double-edge problem inherent in change detection techniques.
The problem of moving object segmentation resembles a denoising problem, for which multi-wavelets Strela et al. (1999) and the DT-CWT Sendur & Selesnick (2002a), Sendur & Selesnick (2002b), Baradarani & Yu (2007) have been shown to offer good solutions. One can consider the moving part of a video frame as the desired image and the background as (almost) noise. Thus, denoising techniques can be employed to aid the successful extraction of moving object edges from noisy backgrounds in video frames. The work of Strela et al. (1999) has revealed that multi-wavelets, which introduce redundancy through repeated row pre-processing, offer a better solution to denoising problems than the scalar wavelet transform.
Sendur and Selesnick proposed a bivariate shrinkage function and introduced an important denoising algorithm based on the DT-CWT in Sendur & Selesnick (2002a), Sendur & Selesnick (2002b). Motivated by these facts, in Baradarani et al. (2006) we developed a multi-wavelet based method for moving object detection and segmentation. It is well known that an oversampled representation is a useful tool for feature extraction Strela et al. (1999), and the repeated row pre-processing technique in the multi-wavelet transform yields such an oversampled data representation. The method is based on determining the change detection mask in the multi-wavelet domain. In view of the success of these techniques and the celebrated achievements of the DT-CWT in signal/image processing, we proposed a DT-CWT based method for moving object detection and segmentation Baradarani & Wu (2008). That method is based on determining the change detection mask in the complex wavelet domain, and it is enriched by using DCD and bivariate shrinkage denoising.
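As a rough illustration of the bivariate shrinkage idea mentioned above, the following sketch applies the commonly cited form of the Sendur and Selesnick rule, in which a coefficient is shrunk according to the joint magnitude of itself and its parent at the next coarser scale. The function name `bivariate_shrink` and its arguments are hypothetical, and the noise and signal standard deviations `sigma_n` and `sigma` are assumed to be known here, whereas in practice they are estimated from the data.

```python
import numpy as np

def bivariate_shrink(y1, y2, sigma_n, sigma):
    """Shrink child coefficients y1 using their parent coefficients y2.

    Assumed rule: w1 = max(r - sqrt(3) * sigma_n**2 / sigma, 0) / r * y1,
    with r = sqrt(|y1|^2 + |y2|^2).
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    r = np.sqrt(y1 ** 2 + y2 ** 2)
    thresh = np.sqrt(3.0) * sigma_n ** 2 / sigma
    # Guard against division by zero where both coefficients vanish.
    gain = np.maximum(r - thresh, 0.0) / np.maximum(r, 1e-12)
    return gain * y1

# A small coefficient with a small parent is removed entirely, while a
# large coefficient with a large parent is only mildly attenuated.
shrunk = bivariate_shrink([0.1, 3.0], [0.1, 4.0], sigma_n=1.0, sigma=1.0)
```

The joint magnitude is what distinguishes this rule from univariate soft thresholding: a moderate coefficient survives if its parent is large.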

Structure of a Two-channel Wavelet Filter Bank
A typical two-channel filter bank with the decomposition (analysis) and reconstruction (synthesis) stages is shown in Fig. 1. Let $H_0(\omega) = \sum_n h_0[n] e^{-jn\omega}$ and $H_1(\omega) = \sum_n h_1[n] e^{-jn\omega}$ be the lowpass and highpass discrete-time filters, respectively, in the analysis side of the filter bank. The scaling and wavelet functions associated with the analysis side of the filter bank are defined by the two-scale iterative equations
$$\phi_h(t) = 2 \sum_n h_0[n]\, \phi_h(2t - n), \tag{1}$$
$$\psi_h(t) = 2 \sum_n h_1[n]\, \phi_h(2t - n), \tag{2}$$
where $n$ is an integer.
Fig. 1. Two-channel analysis/synthesis filter bank.
The lowpass and highpass filters $F_0(\omega)$ and $F_1(\omega)$ in the synthesis side define the scaling function $\phi_f$ and wavelet function $\psi_f$ similarly, in terms of the filter coefficients $f_0[n]$ and $f_1[n]$, respectively. A 'biorthogonal' filter bank constitutes a perfect reconstruction filter bank if and only if its filters satisfy the no-distortion condition
$$H_0(\omega)F_0(\omega) + H_1(\omega)F_1(\omega) = e^{-j n_d \omega}$$
for some integer $n_d \ge 0$, and the no-aliasing condition
$$H_0(\omega + \pi)F_0(\omega) + H_1(\omega + \pi)F_1(\omega) = 0.$$
The above no-aliasing condition is automatically satisfied if
$$H_1(\omega) = e^{-j\omega} F_0(\omega + \pi), \qquad F_1(\omega) = e^{j\omega} H_0(\omega + \pi).$$
Note that in an 'orthogonal' design, the conjugate quadrature filter (CQF), e.g., $G(z)$, can be obtained from the spectral factorization of a product filter $P(z) = G(z)G(z^{-1})$ that satisfies the halfband condition $P(z) + P(-z) = 1$ and has a nonnegative frequency response, $P(e^{j\omega}) \ge 0$ for all $\omega \in \mathbb{R}$. Thus, $H_0(z) = G(z)$, $H_1(z) = z^{-1}G(-z^{-1})$, $F_0(z) = G(z^{-1})$, and $F_1(z) = zG(-z)$, and an orthogonal wavelet with the specified number of vanishing moments can then be obtained from the CQF Daubechies (1992).
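The orthogonal filter assignment at the end of this section can be checked numerically. The sketch below is only an illustration under an assumed choice of CQF: it takes the Haar-type filter $G(z) = (1 + z^{-1})/2$, scaled so that the halfband condition $P(z) + P(-z) = 1$ holds, evaluates the four filters on the unit circle, and computes the usual distortion term $H_0(z)F_0(z) + H_1(z)F_1(z)$ and aliasing term $H_0(-z)F_0(z) + H_1(-z)F_1(z)$. The variable names are hypothetical.

```python
import numpy as np

def G(z):
    # Assumed example CQF: Haar lowpass filter scaled so that G(1) = 1.
    return 0.5 * (1.0 + 1.0 / z)

# Filter assignment quoted in the text for the orthogonal case.
def H0(z): return G(z)
def H1(z): return G(-1.0 / z) / z
def F0(z): return G(1.0 / z)
def F1(z): return z * G(-z)

omega = np.linspace(-np.pi, np.pi, 257)
z = np.exp(1j * omega)

# Halfband condition on the product filter P(z) = G(z) G(1/z).
halfband = G(z) * G(1.0 / z) + G(-z) * G(-1.0 / z)
# Distortion term: should be a pure delay (here a constant).
distortion = H0(z) * F0(z) + H1(z) * F1(z)
# Aliasing term: should vanish for perfect reconstruction.
aliasing = H0(-z) * F0(z) + H1(-z) * F1(z)
```

For this particular $G(z)$ the distortion term reduces to $P(z) + P(-z)$, so the halfband condition is exactly what guarantees a distortion-free reconstruction.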

The 9/7-10/8 Dual-tree Complex Filter Bank
In this section we introduce our recently designed dual-tree complex filter bank Yu & Baradarani (2008). Consider the two-channel dual-tree complex implementation of the DT-CWT. The primal filter bank $B$ in each level defines the real part of the wavelet transform. The dual filter bank $\tilde{B}$ defines the imaginary part. Recall that the scaling and wavelet functions associated with the analysis side of $B$ are defined by the two-scale equations as in (1) and (2), respectively. The scaling function $\phi_f$ and wavelet function $\psi_f$ in the synthesis side of $B$ are similarly defined via $f_0$ and $f_1$. The same is true for the scaling functions ($\tilde{\phi}_h$ and $\tilde{\phi}_f$) and wavelet functions ($\tilde{\psi}_h$ and $\tilde{\psi}_f$) of the dual filter bank $\tilde{B}$.
The dual-tree filter bank defines analytic complex wavelets $\psi_h + j\tilde{\psi}_h$ and $\psi_f + j\tilde{\psi}_f$ if the wavelet functions of the two filter banks form Hilbert transform pairs. Specifically, the analysis wavelet $\tilde{\psi}_h(t)$ of $\tilde{B}$ is the Hilbert transform of the analysis wavelet $\psi_h(t)$ of $B$, and the synthesis wavelet $\tilde{\psi}_f(t)$ of $\tilde{B}$ is the negative Hilbert transform of the synthesis wavelet $\psi_f(t)$ of $B$, that is,
$$\tilde{\Psi}_h(\omega) = -j\,\mathrm{sign}(\omega)\,\Psi_h(\omega), \qquad \tilde{\Psi}_f(\omega) = j\,\mathrm{sign}(\omega)\,\Psi_f(\omega),$$
where $\Psi_h(\omega)$, $\Psi_f(\omega)$, $\tilde{\Psi}_h(\omega)$, and $\tilde{\Psi}_f(\omega)$ are the Fourier transforms of the wavelet functions $\psi_h(t)$, $\psi_f(t)$, $\tilde{\psi}_h(t)$, and $\tilde{\psi}_f(t)$, respectively, and sign represents the signum function. This introduces limited redundancy and allows the transform to provide approximate shift-invariance and better directionality of the filters Kingsbury (2006), Selesnick et al. (2005), while preserving the usual properties of perfect reconstruction and computational efficiency with good frequency responses.
Theorem Yu & Ozkaramanli (2005): Consider filter banks $B$ and $\tilde{B}$. Suppose their wavelet filters are defined as above. Then the dual-tree complex wavelets $\psi_h + j\tilde{\psi}_h$ and $\psi_f + j\tilde{\psi}_f$ are analytic if and only if the scaling filters are related as
$$\tilde{H}_0(\omega) = e^{-j\omega/2} H_0(\omega), \qquad \tilde{F}_0(\omega) = e^{j\omega/2} F_0(\omega), \qquad |\omega| < \pi. \tag{6}$$
The theorem states that, in order to form the two Hilbert pairs, each scaling filter of the dual filter bank $\tilde{B}$ must be a half-sample delayed or a half-sample advanced version of the corresponding scaling filter of the primal filter bank $B$. For orthogonal filter banks, condition (6) becomes the well-known half-sample delay requirement Selesnick (2001). When the dual-tree complex wavelets are analytic, the two filter banks $B$ and $\tilde{B}$ share common properties, including vanishing moments of the wavelet functions, symmetry, and orthogonality or biorthogonality. In other words, properties of one filter bank can be inherited from the other via the Hilbert transform pair requirement. This suggests that we can design dual filter banks from a given primal filter bank. This is the approach we used in Yu & Baradarani (2008) to design the 9/7-10/8 complex filter bank.
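The link between a Hilbert transform pair and a one-sided spectrum can be illustrated numerically. The sketch below does not use the 9/7-10/8 filters. Instead it assumes a Gabor-like pulse as a stand-in for a real wavelet, builds its Hilbert transform by multiplying the DFT by $-j\,\mathrm{sign}(\omega)$, and measures how much energy of the resulting complex signal lies at negative frequencies. All names are hypothetical.

```python
import numpy as np

def hilbert_transform(x):
    """Discrete Hilbert transform via the frequency response -j*sign(omega)."""
    X = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x))
    return np.real(np.fft.ifft(-1j * np.sign(freqs) * X))

n = np.arange(256)
# Stand-in "wavelet": Gaussian envelope times a cosine (an assumption,
# not one of the wavelets designed in the chapter).
psi = np.exp(-0.5 * ((n - 128) / 12.0) ** 2) * np.cos(2 * np.pi * 0.125 * n)
psi_tilde = hilbert_transform(psi)

# Spectrum of the complex wavelet psi + j * psi_tilde.
spectrum = np.abs(np.fft.fft(psi + 1j * psi_tilde))
freqs = np.fft.fftfreq(len(n))
neg_energy = np.sum(spectrum[freqs < 0] ** 2)
pos_energy = np.sum(spectrum[freqs > 0] ** 2)
```

In this idealized setting the negative-frequency energy is at the level of rounding error. Finite-length filter banks can only approximate this behaviour, which is why the pairs are described as approximate.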
In this chapter, we employ a modified version of our recently designed complex filter bank in Yu & Baradarani (2008). The modification procedure is twofold. A recent and important SDP-based work by Dumitrescu (2008) has significantly improved the orthogonality error of our dual-tree wavelet filters. Following the procedure in Dumitrescu (2008), we first obtain an enhanced version of our earlier filter bank. We then align and tune the obtained filters so that they can be used in a dual-tree structure with a suitable time-shift to preserve orthogonality. [...] shows the [...] in the analysis side of the primal [...]. It should be noted that the magnitude spectra of the complex wavelets $\psi_h(t) + j\tilde{\psi}_h(t)$ and $\psi_f(t) + j\tilde{\psi}_f(t)$ are essentially one-sided Selesnick et al. (2005). This implies that the wavelet bases form (approximate) Hilbert transform pairs, as shown in the corresponding figure for both the analysis and synthesis sides of the modified 9/7-10/8 filter bank.

Scalar Wavelet-based Method
The application of the wavelet transform to the segmentation of moving objects in video frames is motivated by the fact that the background removal part of this problem can be cast as a denoising problem. The background in video sequences is far from constant: it is affected by camera noise and non-uniform illumination. The difference frame in the wavelet domain comprises the moving object edges and noise. The aim is to detect as many moving object edges as possible while suppressing the noise. The block diagram of wavelet-based moving object detection using the single change detection (SCD) approach, proposed by Huang & Hsieh (2003), is shown in Fig. 4. This approach uses two consecutive frames, namely $f_{n-1}$ and $f_n$. Fig. 5 presents the same technique employing double change detection (DCD) instead of SCD Huang et al. (2004). The concept of DCD is straightforward. To extract the moving objects without the double-edge problem, three successive frames, i.e., $f_{n-1}$, $f_n$ and $f_{n+1}$, are used. The moving objects in frame $n$ are obtained by detecting the common regions of the difference frames between $f_{n-1}$ and $f_n$, and between $f_n$ and $f_{n+1}$, i.e., the moving object in frame $n$ is determined by the intersection of the two frame differences (see Fig. 5). We focus on DCD, as it has been shown to outperform the SCD-based techniques Huang et al. (2004). The three spatial domain frames are transformed into the wavelet domain, resulting in three images $I_{n-1}$, $I_n$ and $I_{n+1}$ with their respective subbands. The two frame differences in the wavelet domain are obtained by
$$FD_n = |I_n - I_{n-1}|, \qquad FD_{n+1} = |I_{n+1} - I_n|.$$
The observed frame differences $FD_n$ and $FD_{n+1}$ are corrupted by camera noise and non-uniform illumination. This is expressed as
$$FD_n = FD_n^* + \eta_1, \qquad FD_{n+1} = FD_{n+1}^* + \eta_2,$$
where $FD_n^*$ and $FD_{n+1}^*$ are the desired frame differences and $\eta_1$ and $\eta_2$ represent the noise. It can be shown that the distribution of the noise in the observed difference frames is approximately Gaussian Donoho (1995).
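The double-edge problem and its removal by DCD can be seen in a toy example. The sketch below is a deliberately simplified illustration: it works directly in the spatial domain on noise-free synthetic frames (no wavelet transform, no thresholding), and it assumes a bright rectangle that moves further than its own width between frames. The helper names `make_frame` and `change_mask` are hypothetical.

```python
import numpy as np

def make_frame(left, size=32, width=4):
    """Synthetic frame: a bright rectangle whose left edge is at column `left`."""
    frame = np.zeros((size, size))
    frame[12:20, left:left + width] = 1.0
    return frame

def change_mask(a, b, thresh=0.5):
    """Binary change detection mask from the absolute frame difference."""
    return np.abs(a - b) > thresh

# Three consecutive frames f_{n-1}, f_n, f_{n+1}.
f_prev, f_cur, f_next = make_frame(4), make_frame(10), make_frame(16)

# SCD: one difference marks both the old and the new object position.
scd = change_mask(f_cur, f_prev)
# DCD: intersecting the two differences keeps only the position in frame n.
dcd = scd & change_mask(f_next, f_cur)

object_mask = f_cur > 0.5
```

Here `scd` contains twice as many pixels as the object, while `dcd` coincides with the object in frame $n$. With real, weakly textured objects that move by only a few pixels, mainly the boundaries survive, which is why the methods in this chapter work with edge maps.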
A univariate soft thresholding technique Donoho (1995), Strela et al. (1999) is used to estimate the desired frame differences $FD_n^*$ and $FD_{n+1}^*$. The Canny edge detector is then applied to $FD_n^*$ and $FD_{n+1}^*$, and the edge maps of the two difference frames are obtained by
$$DE_n^W = \Phi(FD_n^*), \qquad DE_{n+1}^W = \Phi(FD_{n+1}^*),$$
where $\Phi$ denotes the Canny edge detector and the superscript $W$ indicates that the edge maps are in the wavelet domain. The union operation is applied to obtain the edge maps $DE_n$ and $DE_{n+1}$ of the significant difference pixels in the spatial domain. Finally, the intersection operator is used to obtain the moving object edge map $ME_n$ of frame $n$:
$$ME_n = DE_n \cap DE_{n+1}.$$
This is the main idea behind all the wavelet-based moving object detection methods. It should be noted that the algorithms and results in Huang & Hsieh (2003) and Huang et al. (2004) [...]
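The thresholding, union, and intersection steps described above can be sketched as follows. This is a simplified illustration under several assumptions: the frame differences are given as lists of equally sized subband arrays, the Canny detector is replaced by a trivial stand-in that marks every coefficient surviving the threshold, and the union is taken at subband resolution rather than after mapping back to the spatial domain. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Univariate soft thresholding: shrink magnitudes by t, zero the rest."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def edge_map(fd_subbands, t):
    """Union over subbands of a stand-in edge detector (NOT Canny):
    a coefficient counts as an edge point if it survives thresholding."""
    maps = [soft_threshold(band, t) != 0 for band in fd_subbands]
    return np.logical_or.reduce(maps)

def moving_edges(fd_n, fd_n1, t):
    """Moving edge map of frame n: intersection of the two edge maps."""
    return edge_map(fd_n, t) & edge_map(fd_n1, t)

# Two tiny "subbands" per frame difference, with small values acting as noise.
fd_n = [np.array([[0.1, 2.0], [0.0, -0.2]]), np.array([[0.0, 0.1], [3.0, 0.1]])]
fd_n1 = [np.array([[0.2, 1.5], [0.1, 0.0]]), np.array([[2.5, 0.0], [0.1, 0.2]])]
me_n = moving_edges(fd_n, fd_n1, t=0.5)
```

Only the position that is significant in both frame differences remains in `me_n`. A real implementation would substitute an actual Canny detector for the stand-in.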

Multi-wavelets and Moving Object Detection
Although the structure of a multi-wavelet based method is almost the same as that of the scalar wavelet based algorithm described above, the main differences are as follows:
1) Two pairs of scaling functions and wavelets are used in each level (stage) of the analysis and synthesis sides of the filter bank, whereas a single scaling function and wavelet are employed in the scalar wavelet based structure. That is, in fact, the definition of multi-wavelets (see the corresponding figure for the subbands $L_1L_1$, $L_1L_2$, $H_1L_1$, $L_1H_2$, ...).
2) Multi-wavelets have shown high performance in denoising. We imposed a soft-thresholding step in the MW domain, as seen in the block diagram of the method, to improve the raw subband data before reconstruction.
3) Even if a one-level multi-wavelet structure is selected, sixteen subbands are generated. More frequency subbands with different directionality lead to more information in the raw dataset.
4) Multi-wavelets are well known for their edge-preserving property. The main performance gain of multi-wavelet based approaches is due to this fundamental property.
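The subband count in item 3 follows from simple enumeration. With two scaling functions and two wavelets there are four channels per dimension, and a separable two-dimensional decomposition pairs every row channel with every column channel. The sketch below only enumerates the labels. The ordering of the two factors in each label is an assumed convention, and the helper name `subband_labels` is hypothetical.

```python
from itertools import product

def subband_labels(r=2):
    """Labels of the one-level 2D subbands for multiplicity r.

    r = 1 corresponds to a scalar wavelet, r = 2 to the multi-wavelets
    discussed here (two scaling functions and two wavelets).
    """
    channels = [f"{kind}{i}" for kind in "LH" for i in range(1, r + 1)]
    return [first + second for first, second in product(channels, repeat=2)]

labels = subband_labels()
# e.g. 'L1L1', 'L1L2', 'L1H1', 'L1H2', 'L2L1', ...
```

For `r = 2` this gives sixteen labels, against the four subbands of a one-level scalar wavelet decomposition obtained with `r = 1`.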
The MW-based method, shown in Fig. 6, is applied to the Missa and Trevor sequences, which are typical 256×256 gray-level videos. To give a better understanding of the steps, [...] in Fig. [...] using one-level GHM_p3_rr multi-wavelets. Fig. [...] shows further steps, in which the moving edge points of a frame are extracted from three consecutive frames [...]. Recall that we always need three consecutive frames due to the use of DCD. We use "GHM" Geronimo et al. (1994), "GHM_p3" Ozkaramanli (2003) and "Alpert" Alpert (1993) multi-wavelets with repeated row (MW_rr) and approximation order (MW_ap) pre-processing to obtain the edge maps. The performance of these multi-wavelets employing the DCD technique is depicted in [...]. The method of Huang et al. (2004) gives better performance than previous studies in the literature Kim & Hwang (2002), Huang & Hsieh (2003). The MW_rr method presented in Baradarani et al. (2006) provides better results than the scalar wavelet methods of Huang & Hsieh (2003), Huang et al. (2004). Furthermore, MW_rr outperforms the MW_ap approach by approximately 34 percent on average. This confirms the usefulness of repeated row pre-processing Strela et al. (1999) in detecting moving object edges, owing to the similarity of this problem to that of denoising. Note that when multi-wavelets are used, two wavelets and two scaling functions are employed in the structure, i.e., there are 4, 16, 64, ... subbands at the first, second, ... stages of the respective filter bank structure (see the corresponding figure).

DT-CWT in Moving Object Detection
Motivated by our MW-based approach, we investigate the application of the DT-CWT to the detection of moving objects in video, employing the modified dual-tree filter bank. In the first application, we demonstrated the use of the 9/7-10/8 filter bank in image denoising. Our motivation here is twofold. Firstly, the DT-CWT has been shown to be a useful tool for image denoising: Sendur and Selesnick have shown that the DT-CWT is more effective than the DWT for denoising purposes Sendur & Selesnick (2002a). Secondly, we already showed that the 9/7-10/8 filter bank gives promising results in image denoising Baradarani & Yu (2007). Thus, the DT-CWT is expected to be a good candidate for moving object edge map detection algorithms. The block diagram of the proposed technique is shown in the corresponding figure. This approach, which is based on DCD, uses three consecutive frames, namely $f_{n-1}$, $f_n$, and $f_{n+1}$. The method is applied to the Missa and Trevor sequences, which are typical gray-level videos. Figure 12 shows the performance of six MW-based implementations along with the results obtained by the modified 9/7-10/8 DT-CWT filter bank for the Trevor video, all with DCD enrichment. Scalar wavelet based methods, presented in Huang et al. (2004) with SCD and DCD, are also shown for reference. As shown in Figure 12, the DT-CWT result lies between the results of MW_rr and MW_ap. Our earlier experiments show that the DT-CWT performs better for frame and single image segmentation, whereas multi-wavelets are preferred for video and moving object segmentation. In general, multi-wavelets with repeated row pre-processing perform better than multi-wavelets with approximation order pre-processing. The complex filter bank is also compared to MW_rr in terms of the number of edges that can be extracted. Not only is the number of detected edge points increased significantly when the dual-tree complex filter bank is used, but the accuracy of the detected moving object edges is also greatly improved. The term accuracy here is reflected in the sharpness of fine features, e.g., mouth, nose, eyes, and eyeglasses. This is related to the edge-preserving property of MW_rr, and the good directionality and shift-invariance of the DT-CWT approach. Although it is better than 9/7-10/8, it cannot outperform MW_rr.

Moving Object Segmentation
After moving object edge extraction, the moving objects formed by the edges can be segmented from the rest of the frame. Because the segmentation of moving object edges is not ideal, some edges are disconnected, which makes it impossible to extract the whole object. Therefore, post-processing with morphological operations is applied in order to generate connected edges representing a connected moving object. Binary closing Gonzales & Woods (2005) is used with the structuring element shown in Figure 9. The structuring element is a binary 5×5 matrix whose size can be changed with respect to the frame size. Connected components with a pixel count less than κ (a threshold) are regarded as noise and are not accepted as moving objects, whereas those with a pixel count greater than κ are segmented as moving objects. The value of κ is determined with respect to the image size. For example, it is not reasonable to choose a large threshold such as κ = 2000, since a component of that size in a 256×256 frame looks like an object rather than noise. Moving object segmentation results for the Missa and Trevor videos (256×256, gray level) are shown in Figs. 10 and 11. In these figures, (a), (b), and (c) show frames 25 (32), 26 (33), and 27 (34), from left to right, for the Missa sequence (the Trevor frame numbers are given in parentheses). In Fig. 10, (d), (h), and (l) represent $ME_n$ for frame 26; (e), (i), and (m) show $ME_n$ after binary closing; (f), (j), and (n) show the connected components; and (g), (k), and (o) present the segmented moving object. In this figure, the second row corresponds to GHM_p3_rr multi-wavelets, the third row shows the results obtained with the 9/7-10/8 DT-CWT, and the last row shows the results when the DWT is employed. Figs. 11 (d), (e), and (f) illustrate the segmented moving object in the Trevor video for frame 33 with GHM_p3_rr multi-wavelets, DT-CWT, and DWT with DCD (DWT_DCD), respectively.
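The post-processing described above can be sketched with NumPy alone. The code below is an illustrative stand-in rather than the implementation used for the figures: it assumes a 5×5 structuring element consisting entirely of ones (the actual shape is given in Figure 9), uses 8-connectivity for the components, and keeps components with more than `kappa` pixels. All function names are hypothetical.

```python
import numpy as np
from collections import deque

def dilate(mask, k=5):
    """Binary dilation with a k-by-k square of ones (k odd)."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask, k=5):
    # Erosion as dilation of the complement; pixels outside the frame are
    # treated as foreground so that closing does not remove border objects.
    return ~dilate(~mask, k)

def closing(mask, k=5):
    """Binary closing: dilation followed by erosion."""
    return erode(dilate(mask, k), k)

def keep_large_components(mask, kappa):
    """Keep the 8-connected components that have more than kappa pixels."""
    seen = np.zeros_like(mask)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        seen[sy, sx] = True
        queue, comp = deque([(sy, sx)]), [(sy, sx)]
        while queue:  # breadth-first flood fill of one component
            y, x = queue.popleft()
            for ny in range(max(y - 1, 0), min(y + 2, h)):
                for nx in range(max(x - 1, 0), min(x + 2, w)):
                    if mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
                        comp.append((ny, nx))
        if len(comp) > kappa:
            ys, xs = np.array(comp).T
            out[ys, xs] = True
    return out

# Broken edge (two segments with a 2-pixel gap) plus one isolated noise pixel.
edges = np.zeros((32, 32), dtype=bool)
edges[10, 5:13] = True
edges[10, 15:23] = True
edges[25, 25] = True

closed = closing(edges)
segmented = keep_large_components(closed, kappa=5)
```

In this example the closing bridges the gap, so the two segments form a single component of 18 pixels, while the isolated pixel falls below the threshold and is discarded.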

Conclusions and Future Work
In this chapter we briefly discussed the best-known current wavelet-based moving object detection and segmentation algorithms, and the main concepts behind them. By the term wavelet here, we refer to DWT, multi-wavelets, and DT-CWT. DWT is very good for compression, has the perfect reconstruction property, and has a low computational cost. However, DWT is shift-dependent, poor in directional selectivity, and sensitive to edges. Multi-wavelets are very good for denoising problems and preserve edges better, but they are computationally expensive. DT-CWT has the perfect reconstruction property, is (approximately) shift-invariant, has good directionality, and preserves edges better. However, the method is computationally expensive. The simulation results show that the performance improvement of the DT-CWT based technique is [...] on average over the scalar wavelets. Furthermore, the results indicate that DT-CWT outperforms multi-wavelets with approximation order pre-processing considerably and performs slightly worse than multi-wavelets with repeated row pre-processing. The former is probably due to the good directionality and shift-invariance of DT-CWT, while the latter is due to the better edge-preserving property of multi-wavelets with repeated row pre-processing. Another open topic, which can reduce the computational complexity, is the use of new 3D transformations such as the non-separable oriented 3D-DT-CWT, which deals with the whole video as a single box of 3D data. In view of the success of the 9/7 filter bank in the JPEG2000 standard, as well as the improved shift-invariance and better directionality of the dual-tree complex wavelet transform, it is reasonable to hope that dual-tree complex filter banks may also give better results in image (video) compression. It is important to point out that the main idea in this chapter is to introduce and discuss the current techniques in the literature for moving object detection only.
The segmentation approach used to illustrate the results cannot compete with advanced techniques designed purely for segmentation. For example, the area highlighted by an [...] in Fig. 11 (d)-(f) belongs to the background, which ideally should not be shown in a segmented image as a moving part. The main contribution of this chapter, and of our previous works, is to present the performance of the wavelet family in moving object 'detection'. We have used a basic segmentation to show that the more moving pixels are detected, the larger the area of neighboring moving object pixels that is shown.
Another important technical point is the selection of the intersection operator. It cannot be replaced by, for instance, a union operator, which only increases the number of detected pixels incorrectly. Of course, if we replace the intersection operator with union, more pixels will be detected; however, they will not correspond to the correct moving object pixels. Any other definition must not only offer more pixels but also retain the correct ones only.
In fact, if one is interested in using other transformations such as framelets, curvelets, sure-lets, contourlets, etc., the following points must be taken into account. Firstly, the work is divided into two parts: detection and segmentation. Secondly, segmentation is totally independent of detection. If we want to compare, for example, curvelets with framelets and wavelets, this comparison should be carried out in the detection part. Note that a single segmentation algorithm must be applied to all methods to ensure a fair comparison. The results shown in Fig. 12 were obtained under certain conditions, such as the noise, the exact video we used, and more importantly the wavelet filters and alignments in each transformation. If one needs to compare one's own results with Fig. 12, all these algorithms must first be implemented and employed under the new conditions.