Novel Video Coder Using Multiwavelets

Information has become one of the most valuable assets in the modern era. Recent technology has introduced the paradigm of digital information, with its associated benefits and drawbacks. A thousand pictures require a very large amount of storage. Although computer storage technology continues to advance at a rapid pace, a means of reducing the storage requirements of images and video is still needed in most situations. Thus, the science of digital image and video compression has emerged. For example, one of the formats defined for High Definition Television (HDTV) broadcasting (Ben Waggoner 2002) is 1920 pixels horizontally by 1080 lines vertically, at 30 frames per second. If these numbers are multiplied together with 8 bits for each of the three primary colors, the total data rate required is approximately 1.5 Gbit/s, so compression is clearly necessary. This requirement is all the more demanding given that the intent is to deliver very high quality video to the end user with as few visible artifacts as possible. Current methods of video compression, such as the Moving Picture Experts Group (MPEG) standards (Peter Symes 2000, Keith Jack 1996), provide good performance in terms of retaining video quality while reducing the storage requirements, but even popular standards like MPEG have limitations. Research into new and better methods of image and video compression is ongoing, and recent results suggest that some newer techniques may provide much greater performance. This motivates the present work on video compression: image compression algorithms based on multiwavelets are extended to make them suitable for video, since video consists of a sequence of still pictures.
This chapter gives a summary of the new multiwavelet decomposition algorithm along with quantization techniques and illustrates their potential for inclusion in new video compression applications and standards (Sudhakar et al., 2009, Sudhakar & Jayaraman 2007, Sudhakar & Jayaraman 2008). Video coding for telecommunication applications has evolved through the development of the ISO/IEC MPEG-1, MPEG-2 and ITU-T H.261, H.262 and H.263 video coding standards, and the later enhancements of H.263 known as H.263+ and H.263++ (Iain E.G. Richardson 2002). It has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery.


Introduction
Information has become one of the most valuable assets in the modern era. Recent technology has introduced the paradigm of digital information, with its associated benefits and drawbacks. A thousand pictures require a very large amount of storage. Although computer storage technology continues to advance at a rapid pace, a means of reducing the storage requirements of images and video is still needed in most situations. Thus, the science of digital image and video compression has emerged. For example, one of the formats defined for High Definition Television (HDTV) broadcasting (Ben Waggoner 2002) is 1920 pixels horizontally by 1080 lines vertically, at 30 frames per second. If these numbers are multiplied together with 8 bits for each of the three primary colors, the total data rate required is approximately 1.5 Gbit/s, so compression is clearly necessary. This requirement is all the more demanding given that the intent is to deliver very high quality video to the end user with as few visible artifacts as possible. Current methods of video compression, such as the Moving Picture Experts Group (MPEG) standards (Peter Symes 2000, Keith Jack 1996), provide good performance in terms of retaining video quality while reducing the storage requirements, but even popular standards like MPEG have limitations.
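The HDTV figure quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name and its default arguments are assumptions made for this example.

```python
def raw_video_bit_rate(width, height, fps, bits_per_sample=8, components=3):
    """Return the uncompressed bit rate of a video format in bits per second."""
    return width * height * fps * bits_per_sample * components


rate = raw_video_bit_rate(1920, 1080, 30)
print(f"{rate / 1e9:.2f} Gbit/s")        # roughly 1.5 Gbit/s
print(f"{rate / 8 / 1e6:.1f} MB/s")      # the same rate expressed in megabytes per second
```

The result is a bit rate, so it corresponds to about 1.5 gigabits per second, or a little under 190 megabytes per second.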
Research into new and better methods of image and video compression is ongoing, and recent results suggest that some newer techniques may provide much greater performance. This motivates the present work on video compression: image compression algorithms based on multiwavelets are extended to make them suitable for video, since video consists of a sequence of still pictures. This chapter gives a summary of the new multiwavelet decomposition algorithm along with quantization techniques and illustrates their potential for inclusion in new video compression applications and standards (Sudhakar et al., 2009, Sudhakar & Jayaraman 2007, Sudhakar & Jayaraman 2008).

Proposed video coder
This section describes the proposed video coder and the new concepts it introduces, which fit within the framework of the existing standards. The proposed novel encoder is shown in Figure 1. The new schemes used in this video coder are listed first and are explained in the subsequent sections.


In intra frame coding, the following new schemes are introduced:
- The multiwavelet transform is used for coding the intra frames (I-frames).
- 'SPIHT', 'SPECK' and the 'Novel scheme' are used for coding the multiwavelet coefficients.

In inter frame coding, the following new schemes are introduced:
- Fast algorithms for motion estimation
- Half pixel accuracy motion estimation
- Predictive coding of motion vectors
- Multiple reference frame motion compensation

Intra frame coding
Removing the spatial redundancy within a frame is called intraframe coding. I-frames are normally coded in this way, using a transform. Before the multiwavelet transform is applied to the input images or residuals, the image must be preprocessed. The prefilter (Strela, V., 1996; Strela, V., 1998) is chosen to correspond to the filters used for the multiwavelet transform (Strela, V & Walden A.T., 1998; Strela V et al., 1999). Similarly, post-processing is performed at the receiver side. The 'SPIHT' and 'SPECK' schemes are used to code the multiwavelet coefficients, and their performances are studied. SPIHT performs better at high bit rates but produces poor quality at low bit rates. SPECK performs well at low bit rates but results in poor compression. A novel scheme is therefore introduced, in which the 'Y' and 'U' components are coded using 'SPIHT' and the 'V' component is coded using 'SPECK' at 75% of the rate used for 'SPIHT'.
The very first frame and every twelfth frame of a video sequence are coded as I-frames; every other frame is coded as a P-frame. If the mean square error between the predicted frame and the actual frame is greater than a threshold, the current frame is coded as an I-frame. The 'bpp' setting of the SPIHT encoder for the residual is much lower than the I-frame rate.
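The frame-type rule and the component-wise rate split described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold value, the function names, and the choice to count the twelve-frame period from the start of the sequence (rather than from the last I-frame) are assumptions.

```python
import numpy as np

GOP_LENGTH = 12          # every twelfth frame is intra coded (from the text)
MSE_THRESHOLD = 100.0    # hypothetical value; the chapter does not state the threshold


def mean_square_error(a, b):
    """Mean square error between two frames of equal size."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def choose_frame_type(index, actual, predicted, threshold=MSE_THRESHOLD):
    """Return 'I' or 'P' for the frame at 0-based position `index`."""
    if index % GOP_LENGTH == 0:
        return "I"                      # first frame and every twelfth frame
    if mean_square_error(actual, predicted) > threshold:
        return "I"                      # prediction failed, fall back to intra coding
    return "P"


def novel_scheme_rates(rate_bpp):
    """Coder and bit rate (bpp) for each component in the novel scheme."""
    return {
        "Y": ("SPIHT", rate_bpp),
        "U": ("SPIHT", rate_bpp),
        "V": ("SPECK", 0.75 * rate_bpp),   # V is coded at 75% of the SPIHT rate
    }
```

For example, `novel_scheme_rates(0.8)` assigns SPIHT at 0.8 bpp to 'Y' and 'U' and SPECK at 0.6 bpp to 'V'.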

Entropy coding
The purpose of the entropy coding algorithm (Lei and Sun 1991) is to represent frequently occurring (run, level) pairs with a short code and less frequently occurring pairs with a longer code. In this way, the run-level data may be compressed into a small number of bits. Huffman coding and arithmetic coding are widely used for entropy coding of image and video data. In this chapter, Huffman coding is used for entropy coding.
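As an illustration of how frequent (run, level) pairs receive short codes, the following sketch builds a Huffman code table from symbol counts. It is a generic textbook construction, not the code table used in the chapter; the function name and the sample pairs are assumptions.

```python
import heapq
from collections import Counter


def build_huffman_code(symbols):
    """Return a {symbol: bit string} table built from the symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate single-symbol source
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)       # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical (run, level) pairs: the most common pair gets the shortest code.
pairs = [(0, 1)] * 6 + [(1, 1)] * 3 + [(0, 2)] * 2 + [(3, -1)]
code = build_huffman_code(pairs)
bitstream = "".join(code[p] for p in pairs)
print(code, len(bitstream))
```

The unique tie-breaker in each heap entry keeps the comparison from ever reaching the dictionaries, which cannot be ordered.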

Inter frame coding
The temporal redundancy between successive frames is removed by interframe prediction. This is achieved by motion estimation and compensation. An efficient fast motion estimation algorithm is used to predict the current frame from the previous reference frames. Here the motion estimation is carried out to half pixel accuracy. Detailed explanations are given in the subsequent sections.

Fast motion estimation algorithm
Full search (FS) block motion estimation matches all possible points within a search area in the reference (target) frame to find the block with the minimum block distortion measure (BDM). This algorithm therefore gives the best possible result. However, it is computationally very intensive: full search accounts for about two-thirds of the total computation. Because of this cost, many fast motion estimation algorithms (Peter Symes 2000) have been proposed over the last two decades to give a faster estimate with a block distortion similar to that of the full search method. The best-known fast block matching algorithms (BMA) are the three-step search (TSS) (Li et al 1994; Koga et al 1981), the new three-step search (NTSS), the four-step search (4SS) (Po and Ma 1996) and the diamond search (DS) (Shan Zhu and Kai-Kuang Ma 2000). Diamond search is the most popular in the existing standards. The main aim of these fast search algorithms is to reduce the number of search points in the search window and hence the computation, as is evident from Table 1.
The motion field for a block of a real-world image sequence is usually gentle and smooth, and varies slowly. One of the most important assumptions of all fast motion estimation algorithms is that the error surface is monotonic, i.e. the BDM is smallest at the global minimum of the search area and increases monotonically as the checking point moves away from the global minimum.
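The difference in search effort between full search and diamond search can be sketched as below. This is a simplified illustration under stated assumptions: the sum of absolute differences (SAD) is used as the BDM, the block size and search range are example values, and the diamond search is the basic large/small pattern version rather than the exact variant used in the chapter. Because diamond search relies on the monotonic error surface assumption, it may stop at a local minimum when that assumption does not hold.

```python
import numpy as np


def sad(cur, ref, y, x, dy, dx, n):
    """Sum of absolute differences (used here as the BDM) for displacement (dy, dx)."""
    ry, rx = y + dy, x + dx
    if ry < 0 or rx < 0 or ry + n > ref.shape[0] or rx + n > ref.shape[1]:
        return np.inf                          # candidate falls outside the frame
    block = cur[y:y + n, x:x + n].astype(np.int64)
    cand = ref[ry:ry + n, rx:rx + n].astype(np.int64)
    return float(np.abs(block - cand).sum())


def full_search(cur, ref, y, x, n=16, p=7):
    """Check every displacement in the (2p+1) x (2p+1) window."""
    best, best_cost, points = (0, 0), np.inf, 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            cost = sad(cur, ref, y, x, dy, dx, n)
            points += 1
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost, points


# Large and small diamond search patterns (centre listed first so it wins ties).
LDSP = [(0, 0), (-2, 0), (2, 0), (0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def diamond_search(cur, ref, y, x, n=16, p=7):
    """Repeat the large pattern until its centre is best, then refine with the small one."""
    visited = {}

    def cost_at(dy, dx):
        if abs(dy) > p or abs(dx) > p:
            return np.inf                      # stay inside the search window
        if (dy, dx) not in visited:
            visited[(dy, dx)] = sad(cur, ref, y, x, dy, dx, n)
        return visited[(dy, dx)]

    centre = (0, 0)
    while True:
        best = min(((centre[0] + a, centre[1] + b) for a, b in LDSP),
                   key=lambda d: cost_at(*d))
        if best == centre:
            break
        centre = best
    best = min(((centre[0] + a, centre[1] + b) for a, b in SDSP),
               key=lambda d: cost_at(*d))
    return best, cost_at(*best), len(visited)
```

Both functions return the motion vector, its distortion, and the number of points examined, so the search-point savings can be compared directly for a given block.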

Half pixel accuracy motion estimation
Fractional pixel motion estimation is employed in modern coding standards because the displacement of an object between two frames of a video is not necessarily an integer number of pixels. Here the motion vectors point to candidate blocks placed at half pixel locations. Placing a candidate block at a fractional location gives better matching than an integer location, and it helps to reduce the error between the original and the predicted image. The pixel values at the fractional locations are obtained by linear or bilinear interpolation of the nearest pixels at integer locations. The drawback is that the computational overhead increases.
It is therefore necessary to limit this overhead. In a conventional encoder, motion estimation is carried out in two steps:
1. The minimum of the matching criterion is found at an integer location.
2. The candidate blocks corresponding to the eight nearest half pixel displacements around the best integer position are interpolated, as shown in Fig. 2. The motion vector is then refined to subpixel accuracy by computing the criterion between the current block and each of its eight half pixel candidate blocks.
A real-time encoder finds even this process difficult to implement because of its computational complexity; hence much faster methods have been investigated in the literature (Lee and Chen 1997).
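The second step of the conventional procedure, refining the best integer vector over its eight half pixel neighbours, can be sketched as follows. This is an illustrative sketch only: bilinear interpolation and a SAD criterion are assumed, and the function names are chosen for this example.

```python
import numpy as np


def upsample2(ref):
    """Bilinear interpolation onto a half pixel grid, so that up[2i, 2j] == ref[i, j]."""
    ref = ref.astype(np.float64)
    h, w = ref.shape
    up = np.zeros((2 * h - 1, 2 * w - 1))
    up[::2, ::2] = ref                                         # integer positions
    up[1::2, ::2] = (ref[:-1, :] + ref[1:, :]) / 2             # vertical half positions
    up[::2, 1::2] = (ref[:, :-1] + ref[:, 1:]) / 2             # horizontal half positions
    up[1::2, 1::2] = (ref[:-1, :-1] + ref[:-1, 1:]
                      + ref[1:, :-1] + ref[1:, 1:]) / 4        # diagonal half positions
    return up


def refine_half_pixel(cur, ref, y, x, int_mv, n=16):
    """Test the integer vector and its eight half pixel neighbours; return (mv, cost)."""
    up = upsample2(ref)
    block = cur[y:y + n, x:x + n].astype(np.float64)
    best_mv, best_cost = None, np.inf
    for hy in (0, -1, 1):                      # offsets in half pixel units
        for hx in (0, -1, 1):
            top = 2 * (y + int_mv[0]) + hy
            left = 2 * (x + int_mv[1]) + hx
            last_row = top + 2 * (n - 1)
            last_col = left + 2 * (n - 1)
            if top < 0 or left < 0 or last_row >= up.shape[0] or last_col >= up.shape[1]:
                continue                       # candidate leaves the interpolated frame
            cand = up[top:top + 2 * n:2, left:left + 2 * n:2]
            cost = float(np.abs(block - cand).sum())
            if cost < best_cost:
                best_mv = (int_mv[0] + hy / 2, int_mv[1] + hx / 2)
                best_cost = cost
    return best_mv, best_cost
```

Interpolating the whole reference frame once is wasteful for a single block; it is done here only to keep the sketch short, and an encoder would typically interpolate just the neighbourhood of the best integer position.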

Predictive coding of motion vectors
The motion vectors are predicted from previously coded motion vectors (Lee et al 2000) so as to reduce the number of bits required to code them. Based on the previously found motion vectors, a predicted vector MVp is formed, which depends on the motion compensation partition size and on the availability of nearby vectors. The Motion Vector Difference (MVD) between the current and the predicted vector is encoded and transmitted. Variable length coding is used to encode the difference, with short codes assigned to the most frequently occurring values. Figure 3 shows the actual motion vectors and the difference between the predicted and the actual motion vectors.
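The codeword format given with Fig. 3 (a sign bit, then M ones followed by a zero, then one bit for the half pixel part) can be sketched as below. Taking M as the integer part of the absolute difference is an interpretation chosen to match the two examples in the caption (-1.0 gives 1100 and 0.5 gives 001); the function names are assumptions.

```python
def encode_mvd(diff):
    """Encode one half pixel accurate motion vector difference as a bit string."""
    half_steps = round(abs(diff) * 2)
    if abs(abs(diff) * 2 - half_steps) > 1e-9:
        raise ValueError("difference must be a multiple of 0.5")
    sign = "1" if diff < 0 else "0"            # 1 for negative, 0 for positive
    m, half = divmod(half_steps, 2)            # integer part and half pixel flag
    return sign + "1" * m + "0" + str(half)


def decode_mvd(bits):
    """Invert encode_mvd for a single codeword."""
    sign = -1.0 if bits[0] == "1" else 1.0
    m = bits.index("0", 1) - 1                 # number of ones after the sign bit
    half = int(bits[m + 2])                    # last bit: 1 means an extra 0.5
    return sign * (m + 0.5 * half)


print(encode_mvd(-1.0), encode_mvd(0.5))       # 1100 001
```

Small differences, which are the most frequent when the prediction is good, receive the shortest codewords, since the codeword length grows with the magnitude of the difference.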

Block diagram of the proposed decoder system
The block diagram of the proposed decoder is shown in Figure 4. Every step is the reverse of the corresponding encoder step, except for motion prediction. Using the reference frames and the decoded motion vectors, a new frame is reconstructed by motion compensation.
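The reconstruction of a P-frame at the decoder can be sketched as below. This is a simplified illustration, assuming integer pixel motion vectors, a single reference frame, 8-bit samples, and frame dimensions that are multiples of the block size; the function names are chosen for this example.

```python
import numpy as np


def motion_compensate(ref, mvs, n=16):
    """Build the predicted frame by copying displaced n x n blocks from `ref`.

    `mvs[r][c]` holds the (dy, dx) vector of the block in block row r, column c.
    """
    h, w = ref.shape
    pred = np.zeros_like(ref)
    for by in range(0, h, n):
        for bx in range(0, w, n):
            dy, dx = mvs[by // n][bx // n]
            # Clamp so that the displaced block stays inside the reference frame.
            sy = min(max(by + dy, 0), h - n)
            sx = min(max(bx + dx, 0), w - n)
            pred[by:by + n, bx:bx + n] = ref[sy:sy + n, sx:sx + n]
    return pred


def reconstruct_p_frame(ref, mvs, residual, n=16):
    """Add the decoded residual to the motion compensated prediction."""
    pred = motion_compensate(ref, mvs, n).astype(np.int32)
    return np.clip(pred + residual, 0, 255).astype(np.uint8)
```

No search is performed at the decoder: the motion vectors arrive in the bit stream, which is why motion prediction is the one encoder step that is not mirrored.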

Results and discussion
This section has four subsections. Section 5.1 presents the 'SPIHT' results and shows how the performance of SPIHT varies with the 'I' rate and the 'P' rate. Several comparisons are made there, such as between 'DS' and 'KCDS' and between wavelets and multiwavelets. Section 5.2 discusses the results for 'SPECK' and the 'Novel scheme'. It also covers the performance of 'SPECK' for different videos and the comparison among SPIHT, SPECK and the Novel scheme. The Novel scheme is the one in which the 'Y' and 'U' components are coded with 'SPIHT' while the 'V' component is coded using SPECK at 75% of the rate used for 'SPIHT'. A summary of the results is provided in Section 5.3. Section 5.4 presents reconstructed frames that illustrate the comparison of 'SPIHT', 'SPECK' and the 'Novel scheme'.
In this work, two sets of video sequences are used. The first set is CIF (352 × 288), which includes the "Dancer" and "Football" video sequences. The other set is QCIF (176 × 144), with the video sequences "Claire", "Foreman", "Trevor" and "Miss America". The videos used are listed in Table 2 and their visuals are shown in Figure 5, followed by a short description of each. The other conventions used are the 'I' rate and the 'P' rate: the 'I' rate is the rate at which the reference (intra) frame is coded, and the 'P' rate is the rate at which the residue is coded.

'SPIHT' results
The results are observed with 'I' rate = 0.9 and 'P' rate = 0.05.

'Claire' video
Here "Cardbal2" performs well in terms of average PSNR and "Cl" produces a higher compression ratio. In terms of the search algorithm, 'KCDS' and 'DS' perform almost equally in terms of average PSNR, with 'KCDS' giving a better compression ratio.

'Foreman' video
Here the "GHM" multifilter performs better in terms of average PSNR and "Cl" in terms of compression ratio. "Sa4" also performs well. In all cases 'KCDS' performs marginally better than 'DS'.

'Dancer' video
Here, among the multiwavelets, "Cardbal2" performs better in terms of average PSNR and "Cl" in terms of compression ratio. Here also, "Sa4" performs well. 'DS' performs marginally better than 'KCDS' in terms of average PSNR, and 'KCDS' performs better in terms of compression ratio.

'Football' video
Here, among the multiwavelets, "Cardbal2" performs better in terms of average PSNR and "Cl" in terms of compression ratio, but overall "Sa4" performs better. In all cases 'KCDS' performs marginally better than 'DS'.

'Trevor' video
Here "Cardbal2" performs better in terms of average PSNR and "Cl" in terms of compression ratio, but overall "Sa4" performs better. In all cases 'KCDS' performs marginally better than 'DS'.

Conclusion
The above results lead to the following conclusions on block matching algorithms, transforms, and quantization schemes:
- Block matching algorithm: for motion estimation, video compression based on kite cross diamond search (KCDS) is faster and gives better quality than diamond search (DS), as the numerical results show.
- Transform: video compression based on wavelets is better at high bit rates (above 0.8 bpp) in terms of average PSNR, but it is slow and yields less compression. At low bit rates, multiwavelets perform considerably better than wavelets in terms of average PSNR, compression ratio, and speed.
- Quantization scheme: SPIHT-based video compression is good at high bit rates but fails at low bit rates, where SPECK performs well but with a low compression ratio. The proposed novel scheme performs well at both low and high bit rates.
- Individual multiwavelets: the 'Sa4' and 'Cl' multifilters tend to perform better for all types of video.

Since the Novel scheme employs both the SPIHT and SPECK quantization schemes, it combines the merits of both to give very good results in terms of PSNR, CR, and execution time, and it is a close competitor to each of the two schemes taken individually. Hence, a multiwavelet-based coder uses storage space efficiently because of its higher compression ratio. The lower PSNR at high bit rates can be improved by introducing better prediction schemes that exploit the statistical nature of each video.

Acknowledgment
The author wishes to express his deep sense of gratitude and thanks to his mentor Dr. Jayaraman Subramanian, Professor, Electrical and Electronics Engineering Department at BITS Pilani, Dubai Campus, who made this work possible with his persistent help, continued drive and timely motivation. The author is very grateful to his mother Mrs. Santha Radhakrishnan, his wife Mrs. Vinitha Mohan, his son Master Hemesh S.V. and other family members for their constant encouragement to carry out the project in time. Last but not least, the author thanks the management of the institution where he is currently working.

Fig. 1. Block diagram of the proposed novel encoder

Fig. 3. (a) Actual motion vectors (b) Difference between the predicted and actual motion vectors

The difference is encoded as follows:
- The first bit represents the sign of the difference: a negative difference is represented by 1 and a positive difference by 0.
- The sign bit is followed by M ones and then one zero, where M is the integer part of the absolute value of the difference.
- The last bit represents the fractional part: 0.5 is represented as 1 and 0.0 as 0.

For example, -1.0 and 0.5 are coded as:
-1.0 → 1100
0.5 → 001

Fig. 4. Block diagram of the proposed decoder system

Table 1. Average searching points for different fast searching algorithms

Many fast motion estimation algorithms are based on the centre-biased motion vector distribution, but this assumption may not hold for videos with very fast motion. The Kite Cross Diamond Search (KCDS) algorithm (Chi-Wai Lam et al 2004), which is based on the cross-centre-biased distribution characteristics, is employed in this chapter.

Table 2. List of test videos

The 'Claire' and 'Miss America' videos have very small motions with a still background and contain the motion of only one object. The 'Foreman' video has large motion and a variable background due to camera motion. The 'Trevor' video has random motions involving different objects. The 'Dancer' video has a moving background and contains the slow motions of two objects. The 'Football' video has very large motion, without the background moving in the opposite direction, and contains many objects moving with different velocities.
The parameters used here are PSNR and compression ratio. The video format used is "YUV". Each component, i.e. 'Y', 'U' and 'V', is processed separately and hence the peak signal value is 255. The average of these three values gives the average PSNR for a particular frame. When many frames are considered, the average PSNR over all the frames is used as the performance factor. The PSNR in dB for an M × N video frame for each component is calculated with the standard definition

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ f(i,j) - \hat{f}(i,j) \right]^2$$

where $f(i,j)$ and $\hat{f}(i,j)$ denote the original and the reconstructed component values.
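The averaging of PSNR over components and frames described here can be sketched as below. The sketch assumes the standard PSNR definition with a peak value of 255 and frames supplied as (Y, U, V) tuples of arrays; the function names are chosen for this example.

```python
import numpy as np


def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB of one component of one frame."""
    diff = np.asarray(original, dtype=np.float64) - np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")                    # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))


def frame_psnr(orig_yuv, rec_yuv):
    """Average of the Y, U and V PSNR values for one frame."""
    return sum(psnr(o, r) for o, r in zip(orig_yuv, rec_yuv)) / 3.0


def sequence_psnr(frames_orig, frames_rec):
    """Average PSNR over all frames, used as the performance factor."""
    return float(np.mean([frame_psnr(o, r) for o, r in zip(frames_orig, frames_rec)]))
```

Each frame is passed as a tuple `(Y, U, V)`, so the components may have different sizes, as they do in subsampled YUV formats.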

Table 13. Comparison of average PSNR and CR for different wavelets in the 'Miss America' video (84 frames) using 'KCDS'

Table 16. CR values for different 'I' rates with a constant 'P' rate of 0.05 bpp; 96 frames

The results in Tables 15 and 16 show the effect of varying the 'I' rate at a constant 'P' rate for two different videos, 'Miss America' (slow motion) and 'Trevor' (fast and random motion). Irrespective of the video, the PSNR improves as the 'I' rate increases, while the compression ratio decreases. The compression ratio increases (roughly 5 to 10) with multiwavelets compared to wavelets, irrespective of the video.

3 Comparison between wavelet and multiwavelet in 'SPECK' and 'Novel scheme'
From the results available in Table 25, multiwavelets perform better than wavelets in both 'SPECK' and the 'Novel scheme' for all the videos in terms of PSNR, CR, and execution time. The results available in Table 26 are taken with an 'I' rate of 0.8 and a 'P' rate of 0.08. The first 84 frames are considered for all the videos. In general, 'SPECK' performs better in terms of average PSNR and execution time but with a poor compression ratio for all the videos. The Novel scheme is found to be a close competitor with a better compression ratio. 'SPIHT' yields a high compression ratio but is very slow. The Novel scheme matches 'SPIHT' closely and is also faster than 'SPIHT'. In the overall comparison, the 'Novel scheme' performs better than 'SPIHT' and 'SPECK'.