DLLBS: Direct LL-mask Band Scheme; LS: Lifting Scheme; 2×2 AFS: 2×2 Average Filter Scheme; DSS: Down-Sampled Scheme; Accuracy rate: successful tracking / original sequence. The moving objects detection and tracking results.
In recent years, video surveillance systems for security purposes have developed rapidly. A growing body of research aims to develop intelligent video surveillance systems to replace traditional passive video surveillance systems (Hu et al., 2004) and (Jacobs & Pless, 2008). An intelligent video surveillance system detects moving objects in its initial stage and subsequently performs functions such as object classification, object tracking, and object behavior description. Moving object detection is a very important aspect of computer vision and has a very wide range of surveillance applications. Accurately locating the moving object not only provides a focus of attention for post-processing but also reduces the redundant computation caused by incorrect motion. Successful moving object detection in a real environment is a difficult task, since problems such as illumination changes, fake motion (Cheng & Chen, 2006), night detection (Huang, 2008), and Gaussian noise in the background (Gonzalez & Woods, 2001) may lead to the detection of incorrect motion. There are three typical approaches to motion detection (Hu et al., 2004), (Jacobs & Pless, 2008), and (Collins, 2000): background subtraction, temporal differencing, and optical flow. The background subtraction method detects moving regions from the difference between the current frame and a reference background frame. It provides the most complete motion mask data, but is susceptible to dynamic scene changes caused by lighting and extraneous events; therefore, the reference background frame must be updated frequently. The temporal differencing approach extracts moving regions from consecutive frames of the image sequence. It is suitable for dynamic environments, but often fails to extract all relevant pixels of the moving object.
The optical flow method uses the characteristics of the flow vectors of moving objects over time to detect moving regions. However, most optical flow methods are computationally complex. In general, all three of these moving object detection methods are sensitive to illumination changes, noise, and fake motion such as moving tree leaves.
In order to solve the problems mentioned above, several approaches for object detection and tracking have been proposed (Ahmed et al., 2005), (Alsaqre & Baozong, 2004), (Cheng & Chen, 2006), (Chen & Yang, 2007), (Collins, 2000), (Cvetkovic et al., 2006), (Hsieh & Hsu, 2007), (Hu et al., 2004), (Hu et al., 2009), (Huang et al., 2008), (Jacobs & Pless, 2008), (Liu et al., 2006), (Mckenna, 2000), (Sugandi, 2007), and (Tab, 2007). Video tracking systems have to deal with input objects of various shapes and sizes, which often results in a massive computing cost for the input images. Cheng et al. (Cheng & Chen, 2006) used the discrete wavelet transform (DWT) to detect and track moving objects. The 2-D DWT decomposes an image into four subband images (LL, LH, HL, and HH), and their method processes only the LL-band image in consideration of low computing cost and noise reduction. Although this method provides a low computing cost (low resolution) for post-processing and noise reduction based on the conventional DWT, producing the LL-band image from the full-size original image through separate row and column computations may cause a high computing cost in pre-processing. In particular, they use the three-level low-low band image (LL3), which not only requires heavy computation to reduce the image size but may also cause the slow motion of real moving objects to disappear. After dealing with the background subtraction, Alsaqre
To overcome the above-mentioned problems, we propose a method called the direct LL-mask band scheme (DLLBS) for detecting and tracking moving objects by using the symmetric mask-based discrete wavelet transform (SMDWT) (Hsia et al., 2009). In DLLBS, only the LL-mask band of SMDWT is selected. Unlike the conventional DWT, which processes the row and column dimensions separately by low-pass filtering and down-sampling, the LL-mask band of SMDWT computes the LL-band image directly. The proposed method reduces the computing cost of image size transformation and removes fake motion that does not belong to the real moving object. For object occlusion, we propose a new approach, characteristic point recognition (CPR). Combining DLLBS with CPR achieves accurate object tracking under various types of occlusion. Furthermore, it preserves the slow motion of objects better than the low resolution method (Sugandi et al., 2007) and provides effective and complete moving object regions.
2. Discrete Wavelet Transform and low resolution technique
Due to imperfections in video acquisition systems and transmission channels, images are often corrupted by noise. This degradation significantly reduces image quality, which is especially harmful for high-level computer vision tasks such as object tracking and recognition. Several methods for removing noise or fake motion and reducing computing cost before moving object detection have been proposed in recent years. DWT (Cheng & Chen, 2006) and the low resolution technique (Andra et al., 2000) are two important approaches, and they are briefly described in the following subsections.
2.1. Discrete Wavelet Transform method
The wavelet transform (Mallat, 1989) was proposed in the mid-1980s and has been used in various fields such as signal processing, image processing, computer vision, image compression, biochemistry, and medicine. For image processing, it provides a highly flexible multi-resolution representation and can decompose an original image into subband images containing low- and high-frequency components. Therefore, users can choose specific resolution data or subband images according to their needs (Hsia et al., 2009), (Mallat, 1989), (Ge et al., 2007), (Liu et al., 2006), (Ahmed et al., 2005), and (Tab et al., 2007).
A 2-D DWT of an image is illustrated in Fig. 1(a). When the original image is decomposed into four subband images, the row and column directions must be processed separately. First, the high-pass filter
Cheng et al. (Cheng & Chen, 2006) applied the 2-D DWT to detect and track moving objects, using only the LL3-band image to detect object motion. Because noise is retained mainly in the high-frequency subbands, using the LL3-band image suppresses noise and reduces the computing cost of post-processing. This method copes with noise and fake motion effectively; however, the conventional DWT scheme requires complicated computation to decompose an original image into the LL-band image. Moreover, using the LL3-band image to deal with fake motion may produce incomplete moving object regions.
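The separable row-then-column decomposition discussed above can be sketched with the reversible 5/3 lifting steps used in JPEG 2000. This is a minimal illustration under stated assumptions, not the implementation of Cheng & Chen (2006): images are plain Python lists of integers with even height and width, borders use whole-sample symmetric extension, and the helper names `lift_53_1d` and `dwt2_53` are hypothetical.

```python
def lift_53_1d(x):
    """One level of the reversible 5/3 lifting DWT on a 1-D list of ints.

    Returns (low, high). Assumes len(x) is even and at least 2.
    """
    n = len(x)
    half = n // 2

    def xe(i):
        # Whole-sample symmetric extension: x[-1] = x[1], x[n] = x[n - 2].
        if i < 0:
            i = -i
        if i >= n:
            i = 2 * (n - 1) - i
        return x[i]

    # Predict step: high-pass (detail) coefficients from the odd samples.
    high = [x[2 * k + 1] - (xe(2 * k) + xe(2 * k + 2)) // 2 for k in range(half)]
    # Update step: low-pass coefficients; d[-1] mirrors to d[0] at the border.
    low = [x[2 * k] + (high[max(k - 1, 0)] + high[k] + 2) // 4 for k in range(half)]
    return low, high


def dwt2_53(img):
    """One separable 2-D level: transform every row, then every column.

    img is a list of rows; returns (LL, LH, HL, HH), where the first letter
    names the row (horizontal) filter. Naming conventions differ in the literature.
    """
    row_low, row_high = [], []
    for row in img:
        lo, hi = lift_53_1d(row)
        row_low.append(lo)
        row_high.append(hi)

    def columns(mat):
        # Apply the 1-D transform down each column, then transpose back.
        cols = [list(c) for c in zip(*mat)]
        lows, highs = zip(*(lift_53_1d(c) for c in cols))
        return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]

    LL, LH = columns(row_low)
    HL, HH = columns(row_high)
    return LL, LH, HL, HH
```

Each call halves both dimensions, so reaching LL3 means three full row passes and three full column passes over successively smaller bands; applying `dwt2_53` again to the returned LL band gives LL2, and so on.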
2.2. Low resolution method
Sugandi et al. (Sugandi, 2007) proposed a simple method that uses the low resolution concept to deal with fake motion such as moving tree leaves. The low resolution image is generated by replacing each pixel value of the original image with the average value of its four neighbor pixels and itself, as shown in Fig. 2. Like the DWT, it also provides a flexible multi-resolution image. Nevertheless, the low resolution images generated by the 2×2 average filter method are more blurred than those generated by the DWT method, as shown in Fig. 3. Average filtering is a low-pass filter that denoises the image through spatial-domain noise reduction. This blurring may reduce the precision of post-processing operations (such as occlusion handling and object identification), because post-processing depends on the correct location and accurate data of the detected moving objects.
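A 2×2 average filter scheme of the kind compared against DLLBS can be sketched as follows. We assume each output pixel is the integer mean of a non-overlapping 2×2 block, so each level halves both dimensions; Sugandi et al.'s exact neighborhood (described above as four neighbors plus the pixel itself) may differ, and the function name `average_2x2` is hypothetical.

```python
def average_2x2(img):
    """Low resolution image: integer mean of each non-overlapping 2x2 block.

    img is a list of rows of gray levels; height and width are assumed even.
    """
    h, w = len(img), len(img[0])
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) // 4
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]
```

Because every output is a plain box average, edges are smeared across the whole block, which is one way to see why this scheme produces a blurrier low resolution image than a wavelet low-pass band.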
3. Direct LL-mask band scheme
To detect and track moving objects more accurately, we propose a new method called the direct LL-mask band scheme (DLLBS), which is based on the 2-D integer symmetric mask-based discrete wavelet transform (SMDWT) (Hsia et al., 2009). It not only retains the multi-resolution flexibility of the wavelet transform but also avoids a high computing cost when computing the different subband images. In addition, its low resolution image preserves more image quality than that of the low resolution method (Sugandi, 2007).
3.1. Symmetric Mask-based Discrete Wavelet Transform (SMDWT)
The conventional 2-D DWT requires a large transpose memory and has a long critical path. The SMDWT has many advantages, such as a short critical path, high-speed operation, regular signal coding, and independent subband processing (Hsia et al., 2009). The coefficients of the 2-D SMDWT are derived from the 2-D 5/3 integer lifting-based DWT (LDWT). For computation speed and simplicity, four masks (3×3, 5×3, 3×5, and 5×5) are used to perform spatial filtering. Moreover, the four-subband processing can be further optimized to increase speed and reduce the temporary memory for the DWT coefficients. The four matrix processors consist of four mask filters, each derived from the 2-D 5/3 integer lifting-based DWT coefficients (Hsia et al., 2009). The coefficients of each subband mask are shown in Fig. 4, and the block diagram of the 2-D SMDWT is shown in Fig. 5.
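Because the 5/3 analysis low-pass filter has taps (-1, 2, 6, 2, -1)/8, the separate row and column low-pass steps can be folded into a single 5×5 mask applied with stride 2, which is the idea behind computing the LL band directly. The sketch below is a floating-point approximation assumed for illustration: it ignores the integer rounding of the lifting steps, so its coefficients are not the exact SMDWT LL mask of Fig. 4 (Hsia et al., 2009), and the names `LL_MASK` and `ll_band_direct` are hypothetical.

```python
# 5/3 analysis low-pass taps; their outer product is a 5x5 mask summing to 1.
H53 = [-1 / 8, 2 / 8, 6 / 8, 2 / 8, -1 / 8]
LL_MASK = [[a * b for b in H53] for a in H53]


def ll_band_direct(img):
    """Approximate LL band in one pass: 5x5 mask evaluated at even positions.

    img is a list of rows; height and width are assumed to be at least 3
    so that a single symmetric reflection covers the mask support.
    """
    h, w = len(img), len(img[0])

    def px(r, c):
        # Whole-sample symmetric extension at the image borders.
        r = -r if r < 0 else (2 * (h - 1) - r if r >= h else r)
        c = -c if c < 0 else (2 * (w - 1) - c if c >= w else c)
        return img[r][c]

    out = []
    for r in range(0, h, 2):
        row = []
        for c in range(0, w, 2):
            acc = 0.0
            for i in range(5):
                for j in range(5):
                    acc += LL_MASK[i][j] * px(r + i - 2, c + j - 2)
            row.append(acc)
        out.append(row)
    return out
```

Applying `ll_band_direct` twice approximates the LL2 band. The point of the mask formulation is that the LL band comes out of one spatial filtering pass, without first producing and storing a row-filtered intermediate image.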
3.2. Detection and tracking flow
The pre-processing flowchart of the proposed DLLBS moving object detection and tracking system is shown in Fig. 6. First, the RGB data are converted to YCbCr data, and only the Y data are used. We then apply the double-change-detection method (Huang et al., 2004) to detect the moving objects. In order to reduce the holes left inside the moving entities, three consecutive frames (F
The motion mask (MM
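The luminance conversion and a three-frame double difference can be sketched as follows. The ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are standard, but the combination rule used here (a pixel is moving only if both consecutive frame differences exceed a threshold) is our assumption for illustration, since the exact equations of the double-change-detection method (Huang et al., 2004) are not reproduced in this text; the function names and the `threshold` parameter are hypothetical.

```python
def rgb_to_y(rgb):
    """rgb: list of rows of (R, G, B) tuples -> list of rows of BT.601 luma Y."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def double_difference(f_prev, f_cur, f_next, threshold):
    """Binary motion mask from three consecutive Y frames.

    A pixel is marked moving (1) only if both |F_n - F_(n-1)| and
    |F_(n+1) - F_n| exceed the threshold (logical AND of the two masks).
    """
    h, w = len(f_cur), len(f_cur[0])
    return [
        [
            1
            if abs(f_cur[r][c] - f_prev[r][c]) > threshold
            and abs(f_next[r][c] - f_cur[r][c]) > threshold
            else 0
            for c in range(w)
        ]
        for r in range(h)
    ]
```

Requiring both differences to fire rejects pixels that change only once (for example, an area uncovered by the object), at the cost of leaving holes where the object's interior is uniform, which is why a hole-filling step follows.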
Holes may still exist in the motion masks, because the changes at some motion pixels are so small that these pixels are wrongly judged as non-motion pixels. In order to increase the motion mask (MM
Then we apply the erosion operator to eliminate redundant pixels on the motion mask boundary as follows:
It scans the eight neighbors of the motion mask MMR
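Hole filling by dilation followed by boundary clean-up by erosion, both over the eight-neighborhood, can be sketched as follows. This is a generic binary morphology sketch under our own assumptions, not the paper's exact equations: out-of-image neighbors are simply ignored, and `dilate` and `erode` are hypothetical helper names.

```python
def _morph(mask, want_all):
    """Shared 3x3 (eight-neighbor) scan for binary dilation and erosion."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # The pixel itself plus its in-image eight neighbors.
            vals = [
                mask[rr][cc]
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if 0 <= rr < h and 0 <= cc < w
            ]
            out[r][c] = 1 if (all(vals) if want_all else any(vals)) else 0
    return out


def dilate(mask):
    """Set a pixel if it or any of its eight neighbors is set."""
    return _morph(mask, want_all=False)


def erode(mask):
    """Keep a pixel only if it and all of its in-image eight neighbors are set."""
    return _morph(mask, want_all=True)
```

Running `erode(dilate(mask))` (a morphological closing) fills holes smaller than the 3×3 window while roughly restoring the outer boundary that dilation expanded.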
Labeling is useful when there is more than one moving object in the scene: connected component labeling is employed to label each moving object so that each one can be tracked individually. Components are labeled based on pixel connectivity (intensity) (Gonzalez & Woods, 2001) by scanning the image pixel by pixel from top left to bottom right and comparing each pixel with those of its eight neighbors that have already been encountered in the scan, thereby grouping the connected pixel regions. If the pixel has at least one labeled neighbor, it is assigned the same label as that neighbor. Once the labeled moving objects are found, we extract the boundary of each moving object as a rectangular bounding box to track it. The bounding box is obtained from the object's motion mask by finding the minimum and maximum row and column coordinates of the mask. In order to track moving objects at the original image size, we transform the coordinates from the LL2 image size back to the original image size according to the spatial relationship of the DWT as follows:
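The labeling, bounding-box, and coordinate-mapping steps can be sketched as follows. For brevity the sketch substitutes a breadth-first flood fill for the raster-scan labeling described above; it yields the same 8-connected regions, though label numbers may be assigned in a different order. The mapping back to full resolution assumes each LLk pixel covers a 2^k × 2^k block of the original frame (k = 2 for LL2); all function names are hypothetical.

```python
from collections import deque


def label_components(mask):
    """8-connected labeling via breadth-first flood fill.

    Returns (labels, count); labels[r][c] is 0 for background pixels.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


def bounding_boxes(labels):
    """Min/max row and column of each label: {label: (r0, c0, r1, c1)}."""
    boxes = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                r0, c0, r1, c1 = boxes.get(lab, (r, c, r, c))
                boxes[lab] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    return boxes


def to_original(box, level):
    """Map an LL-level box to full resolution (each level halves both axes)."""
    s = 2 ** level
    r0, c0, r1, c1 = box
    return (r0 * s, c0 * s, (r1 + 1) * s - 1, (c1 + 1) * s - 1)
```

Working on the small LL2 mask keeps labeling cheap; only the handful of box corners is scaled back to the original frame.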
In block-matching motion estimation, the motion vector is the displacement of the block with the minimum distortion from the reference block. The CamShift block-matching algorithm determines the motion vector by identifying the block with the minimum distortion, using fast search strategies based on diamond-arc-hexagon search patterns in the search area (Chiang et al., 2008).
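The minimum-distortion principle of block matching can be illustrated with an exhaustive search that uses the sum of absolute differences (SAD) as the distortion measure. Note that this swaps in full search for the diamond-arc-hexagon fast search of Chiang et al. (2008), which visits far fewer candidate positions; frames are lists of gray-level rows, and the names `sad` and `full_search` are hypothetical.

```python
def sad(ref, cur, r, c, dy, dx, n):
    """SAD between the n x n block of cur at (r, c) and the block of ref
    displaced by (dy, dx)."""
    return sum(
        abs(cur[r + i][c + j] - ref[r + dy + i][c + dx + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(ref, cur, r, c, n, radius):
    """Exhaustive block matching within +/- radius.

    Returns ((dy, dx), cost) for the minimum-SAD displacement, skipping
    candidates that would fall outside the reference frame.
    """
    h, w = len(ref), len(ref[0])
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if 0 <= r + dy and r + dy + n <= h and 0 <= c + dx and c + dx + n <= w:
                cost = sad(ref, cur, r, c, dy, dx, n)
                if best is None or cost < best:
                    best, best_mv = cost, (dy, dx)
    return best_mv, best
```

Full search evaluates (2·radius + 1)² candidates per block; fast patterns such as diamond or hexagon search trade a small risk of missing the global minimum for a much lower candidate count.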
3.3. Occlusion handling for multiple objects tracking
In post-processing, occlusion handling is a major problem in video surveillance systems. The most popular color space is the RGB color space (Hu et al., 2009). If the bounding boxes of multiple objects overlap, they are merged into a single occlusion bounding box. Here we propose a new approach for occlusion in multiple object tracking, called characteristic point recognition (CPR). Fig. 8 shows the operation flowchart of CPR. CPR uses the bounding boxes obtained in the DLLBS pre-processing stage. For each tracked individual, the system detects whether it is occluded with another object or group. The RGB information can be obtained directly from the video capture device to calculate the color information of the moving pixels. Because the inter-frame difference image is 1/16 the size of the original image, the color information of each moving pixel is taken from the corresponding central pixel in the original image.
To recognize each object, CPR uses its bounding box to find the characteristic point (CP), which is the central point of the bounding box, as shown in the following equation:
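Assuming the bounding box is described by its minimum and maximum column coordinates $x_{\min}, x_{\max}$ and row coordinates $y_{\min}, y_{\max}$ (notation introduced here for illustration), the center point can be written as:

$$
CP = (x_c, y_c) = \left( \frac{x_{\min} + x_{\max}}{2},\; \frac{y_{\min} + y_{\max}}{2} \right)
$$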
When the first frame is input, the CP of every object is stored in the buffer and regarded as the initial sample. In later frames, the CP is matched with the sample. In other words, the CP of 1 to
After the match step, Cd
However, objects in a frame may disappear or be occluded in later frames. To retain the information of such an object, its CP has to be kept, so that a previously seen object can be recognized when it reappears. Because the CP may change due to environmental factors, the buffer has to be updated whenever a new frame is input in order to obtain the latest CP. If a new object appears, its CP is added to the buffer to update the CP information. The CPR flowchart is shown in Fig. 8.
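The CPR buffer logic (store the initial CPs, match later CPs against them, refresh matched entries, add new objects, and retain objects that temporarily disappear) can be sketched as follows. The matching rule here, a greedy nearest-neighbor assignment by Euclidean distance with a hypothetical `max_dist` gate, is our assumption, since the paper's matching criterion is truncated in this text; the names `center` and `update_buffer` are also hypothetical.

```python
import math


def center(box):
    """Characteristic point: center of a (r0, c0, r1, c1) bounding box."""
    r0, c0, r1, c1 = box
    return ((r0 + r1) / 2, (c0 + c1) / 2)


def update_buffer(buffer, boxes, max_dist):
    """Match this frame's boxes to stored CPs and update the buffer in place.

    buffer: {object_id: (row, col)} holding the latest CP of every object seen.
    boxes: bounding boxes detected in the current frame.
    Returns the object id assigned to each box, in order. Matched CPs are
    refreshed, unmatched boxes become new objects, and objects not seen in
    this frame stay in the buffer so they can be recognized when they reappear.
    """
    assigned = []
    used = set()
    for box in boxes:
        cp = center(box)
        best_id, best_d = None, max_dist
        for obj_id, stored in buffer.items():
            if obj_id in used:
                continue  # each stored object matches at most one box per frame
            d = math.dist(cp, stored)
            if d <= best_d:
                best_id, best_d = obj_id, d
        if best_id is None:
            best_id = max(buffer, default=0) + 1  # register a new object
        used.add(best_id)
        buffer[best_id] = cp  # keep the latest CP
        assigned.append(best_id)
    return assigned
```

Greedy matching is order-dependent when two objects are close; an optimal assignment (for example, the Hungarian method) would avoid that, at higher cost.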
4. Experimental results
In this work, experimental results are demonstrated in several different environments, including indoor (all day) and outdoor (all day) environments with static video systems. The original frame sizes are 320×240 and 640×480, and the color frames are in 24-bit RGB format. Gray-level (Y) frames obtained by converting from the RGB system to the YCbCr system are used to detect moving object motion, and the proposed detection and tracking system uses the LL2 band (for 320×240) and the LL3 band (for 640×480), each of size 80×60, generated from the original image by SMDWT. The experimental environment is an Intel 2.83 GHz Core 2 Quad CPU with 2 GB RAM, running Microsoft Windows XP SP3 and Borland C++ Builder (BCB) 6.0. BCB is used as the software development platform. The software is used both to verify the algorithms and to perform the image processing for moving object detection.
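The choice of LL2 for 320×240 frames and LL3 for 640×480 frames follows from each decomposition level halving both dimensions, so both reach 80×60. A small helper with hypothetical names makes the arithmetic explicit; rounding up for odd sizes is our assumption (it matches the length of a 5/3 low-pass band).

```python
def ll_size(width, height, level):
    """(width, height) of the LL band after `level` dyadic decompositions."""
    for _ in range(level):
        width, height = (width + 1) // 2, (height + 1) // 2
    return width, height


def level_for_target(width, height, target=(80, 60)):
    """Smallest level whose LL band is no larger than the target size."""
    level = 0
    w, h = width, height
    while w > target[0] or h > target[1]:
        w, h = (w + 1) // 2, (h + 1) // 2
        level += 1
    return level
```

Using a fixed 80×60 working size keeps the cost of detection and tracking roughly independent of the input resolution; only the decomposition itself scales with the frame size.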
4.1. Dealing with noise issues
There are many kinds of difficulties, such as illumination changes, fake motion, and Gaussian noise in the background. Different LL-band images, including one-level, two-level, three-level, and multi-level LL-band images, are used to deal with noise, and their results are compared. We regard noise elimination as successful when the image contains no motion masks other than those of the moving objects, as shown in Figs. 9 and 10. Table 1 shows the average successful noise-elimination rate (Figs. 9 and 10) for each LL-band level. The first row is for the indoor environment and the second row for the outdoor environment. Every LL-band level gives effective results for indoor noise, such as Gaussian noise produced by random and statistical noise. However, for outdoor noise such as moving tree leaves, the LL1-band image gives poor results, because such outdoor noise is sometimes too large to be eliminated completely.
|Level||Accuracy rate||Accuracy rate||Accuracy rate||Accuracy rate|
|LL1 (160×120)||99.54 %||99.54 %||99.07 %||98.15 %|
|LL2 (80×60)||99.07 %||99.07 %||93.07 %||81.94 %|
|LL3 (40×30)||95.83 %||95.83 %||86.11 %||63.89 %|
4.2. Moving object tracking
We consider a tracking result successful if it contains the complete moving object region, as shown in Fig. 11(a). In Fig. 11(b), the tracked region covers only part of the moving object, which is treated as a tracking failure. Fig. 12(a) shows the original frame before moving objects are detected and tracked. Without the DLLBS technique, many noise masks are tracked, and even when the moving objects are tracked, their regions are fragmented, as shown in Fig. 12(b). By using DLLBS, the noise can be filtered out, as shown in Fig. 12(c). However, the LL1-band image still generates incomplete moving object regions, because the relevance among these pixels is lost in the LL1-band image. When a three-level resolution image is used to detect the moving objects, the moving object regions are also incomplete, because too much of the slow motion belonging to the moving object disappears in the LL3-band image, as shown in Fig. 12(e). Finally, the results of the LL2-band image are shown in Fig. 12(d). The two-level band image gives a better tracking region and also copes with noise and fake motion effectively, as shown in Table 1.
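The accuracy rate in the tables is defined in the table note as successful tracking over the original sequence. Reading this, as an assumption, as the percentage of frames whose tracked region covers the complete moving object (the Fig. 11(a) criterion), it can be computed as follows; the function name is hypothetical.

```python
def accuracy_rate(successful_frames, total_frames):
    """Accuracy rate in percent: successfully tracked frames / all frames."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    return 100.0 * successful_frames / total_frames
```

Under this reading, a partially covered object as in Fig. 11(b) counts as a failed frame, so fragmented regions lower the rate even when the object is detected.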
For comparison, we substitute the 2×2 average filter scheme (AFS) for the DLLBS block in the system to process the moving objects; however, the AFS output is more blurred than that of DLLBS. The accuracy rates of successful object tracking with the 2×2 AFS are shown in Tables 1, 4, and 5. Comparing Tables 1 and 4 at any resolution, the LL-band image generated by DLLBS achieves a higher success rate than the low resolution image generated by the 2×2 AFS.
Several experiments were conducted to demonstrate the feasibility of the proposed approach for moving object detection, tracking, and occlusion handling. We used an entry-level video camera and capture card to capture the test sequences on our campus (Tamkang University), and simulated several conditions for moving objects, such as a single object in the daytime (indoor/outdoor), a single object at night (outdoor), and multiple objects in the daytime (outdoor). All test sequences are stored as raw Microsoft AVI files at resolutions of 320×240 and 640×480 and a frame rate of 30 fps, as shown in Fig. 13.
The choice of the threshold
|Resolution||Night in the outdoor||Day and Night in the indoor||Day and Midday in the outdoor|
|Environments||Fake motions||Low contrast||Reflection|
We established 16 test sequences at Tamkang University in different environments, such as daytime, nighttime, rainy days, fast movement, slow movement, and occlusion, as shown in Fig. 13. Compared with other approaches (2×2 AFS and DSS), DLLBS obtains good sparsity for spatially localized details, such as edges and singularities, as shown in Table 3. Because such details are typically abundant in natural images and convey a significant part of the information embedded therein, the DWT has found significant application in image denoising. Tables 4, 5, and 6 show that some objects are not correctly identified in the test frames of the sequences. Misidentification occurs for two reasons:
(1) The moving object has just entered or is leaving the scene. Because the moving object is detected and tracked at the border of the scene, its extracted features cannot represent it very well in this case.
(2) The moving object is slowing down. In this case, the object's region in the temporal difference image becomes smaller and its position is lost.
|Accuracy rate||Detection + Tracking||Accuracy rate||Detection + Tracking||Accuracy rate||Detection + Tracking|
|Sequence1||98.61 %||53.8 FPS||98.61 %||58.5 FPS||71.76 %||52.1 FPS|
|Sequence2||95.90 %||56.5 FPS||96.31 %||57.1 FPS||56.15 %||49.0 FPS|
|Sequence3||97.40 %||54.1 FPS||92.36 %||60.7 FPS||96.59 %||63.1 FPS|
|Sequence4||93.55 %||54.3 FPS||82.61 %||62.0 FPS||92.65 %||63.2 FPS|
|Sequence5||82.97 %||56.7 FPS||80.44 %||60.2 FPS||77.29 %||65.6 FPS|
|Sequence6||82.57 %||53.9 FPS||78.90 %||61.5 FPS||46.79 %||63.9 FPS|
|Sequence7||90.28 %||54.7 FPS||37.50 %||61.4 FPS||40.28 %||63.5 FPS|
|Sequence8||83.33 %||55.1 FPS||73.46 %||62.4 FPS||78.40 %||61.2 FPS|
|Sequence9||90.16 %||53.5 FPS||75.13 %||59.0 FPS||83.94 %||58.5 FPS|
|Average||90.53 %||54.7 FPS||79.48 %||60.3 FPS||71.53 %||60.1 FPS|
|Accuracy rate||Detection + Tracking + occlusion||Accuracy rate||Detection + Tracking + occlusion||Accuracy rate||Detection + Tracking + occlusion|
|Sequence10||92.94 %||56.7 FPS||88.61 %||60.6 FPS||82.92 %||57.6 FPS|
|Sequence11||89.98 %||54.8 FPS||83.67 %||60.9 FPS||92.60 %||58.7 FPS|
|Sequence12||90.43 %||53.9 FPS||79.79 %||60.8 FPS||82.98 %||55.9 FPS|
|Sequence13||90.00 %||52.6 FPS||86.67 %||59.7 FPS||75.56 %||63.7 FPS|
|Average||90.84 %||54.5 FPS||84.69 %||60.5 FPS||83.52 %||58.9 FPS|
|Accuracy rate||Detection + Tracking + occlusion||Accuracy rate||Detection + Tracking + occlusion||Accuracy rate||Detection + Tracking + occlusion|
|Sequence14||85.60 %||14.1 FPS||73.60 %||16.5 FPS||31.20 %||15.5 FPS|
|Sequence15||88.36 %||14.1 FPS||79.45 %||16.6 FPS||63.01 %||15.3 FPS|
|Sequence16||81.33 %||14.2 FPS||70.67 %||16.6 FPS||33.33 %||15.2 FPS|
|Average||85.10 %||14.1 FPS||74.57 %||16.6 FPS||42.51 %||15.3 FPS|
The direct LL-mask band scheme (DLLBS) for moving object detection and tracking is proposed in this work. It detects and tracks moving objects in indoor and outdoor environments with static video systems. The proposed DLLBS not only overcomes the drawbacks of high computational complexity and slow speed of the conventional DWT, but also preserves the wavelet features of the flexible multi-resolution image and the capability of dealing with noise and fake motion such as moving tree leaves. In real-world applications, the experimental results demonstrate that the 2-D LL2 band (for 320×240) and the 2-D LL3 band (for 640×480) can effectively track moving objects by region-based tracking in all tested environments (day and night) and cope with noise. For occlusion, we propose a new approach, characteristic point recognition (CPR). Combining DLLBS with CPR enables accurate tracking under various types of occlusion. The DLLBS system can be extended to real-time video surveillance applications, such as object classification and object behavior description.