1. Introduction
Over the past few decades, a considerable number of studies have been conducted on two-dimensional (2D) discrete wavelet transforms (DWT) for image and video signals. Ever since JPEG 2000 was adopted as an international standard for digital cinema applications, there has been renewed interest in hardware and software implementations of the lifting DWT, especially in attaining high throughput and low latency processing for high resolution video signals [1, 2].
Intermediate memory utilization has been studied by introducing a line memory based implementation [3]. A lifting factorization has been proposed to reduce auxiliary buffers and increase throughput for boundary processing in the block based DWT [4]. Parallel and pipeline techniques in the folded architecture have been studied to increase hardware utilization and to reduce the critical path latency [5, 6]. However, in a lifting DWT architecture, the overall delay of the output signal depends crucially on the number of lifting steps inside the DWT.
In this chapter, we discuss constructing a 'nonseparable' 2D lifting DWT with a reduced number of lifting steps, on the condition that the DWT is fully compatible with the 'separable' 2D DWT in JPEG 2000. One straightforward approach to reducing the latency of the DWT is to use 2D memory access (rather than a line memory). The transfer function is then factorized into nonseparable (NS) 2D transfer functions. So far, quite a few NS factorization techniques have been proposed [7, 14]. The residual correlation of the Haar transform was exploited by an NS lifting structure [7]. The Walsh Hadamard transform was composed of an NS lossless transform [8], and applied to construct a lossless discrete cosine transform (DCT) [9]. Morphological operations were applied to construct an adaptive prediction [10]. Filter coefficients were optimized to reduce the aliasing effect [11]. However, these transforms are not compatible with the DWT defined by the JPEG 2000 international standard.
In this chapter, we describe a family of NS 2D lifting DWTs compatible with the DWTs defined by JPEG 2000 [12, 14]. One of them is compatible with the 5/3 DWT developed for lossless coding [12]. The other is compatible with the 9/7 DWT developed for lossy coding [13]; it is composed of a single NS function and is structurally equivalent to [12]. For further reduction of the lifting steps, we also describe another structure composed of double NS functions [14]. The NS 2D DWT family summarized in this chapter has fewer lifting steps than the standard separable 2D DWT, and therefore contributes to reducing the latency of the DWT for faster coding.
This chapter is organized as follows. In section 2, the standard 'separable' 2D DWT and its latency due to the total number of lifting steps are discussed, and a low latency 'nonseparable' 2D DWT is introduced for the 5/3 DWT. The discussion is extended to the 9/7 DWT in section 3. In each section, it is confirmed that the 'nonseparable' DWT reduces the total number of lifting steps without changing the relation between the input and output of the 'separable' DWT. Furthermore, structures for 'lossless' coding are described not only for the 5/3 DWT but also for the 9/7 DWT. The performance of the DWTs is investigated and compared in terms of lossless and lossy coding in section 4, where implementation issues under finite word length of signal values are also discussed. Conclusions are summarized in section 5. References are listed in section 6.
2. The 5/3 DWT and Reduction of its Latency
JPEG 2000 defines two types of one dimensional (1D) DWT: the 5/3 DWT and the 9/7 DWT. Each of them is applied to a 2D input image signal vertically and horizontally; this processing is referred to as the 'separable' 2D structure. In this section, we point out the latency problem due to the total number of lifting steps of the DWT, and introduce a 'nonseparable' 2D structure with a reduced number of lifting steps for the 5/3 DWT.
2.1. One Dimensional 5/3 DWT defined by JPEG 2000
Fig.1 illustrates the forward and backward (inverse) transforms of the 1D 5/3 DWT. The forward transform splits the input signal
The relation between the input and output of the forward transform is expressed as
where
The backward (inverse) transform synthesizes the two band signals
where
In equations (3) and (4), downsampling and upsampling are defined as
respectively for an arbitrary signal
for 5/3 DWT defined by the JPEG 2000 international standard.
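The two lifting steps of Fig.1 can be sketched in Python as follows. This is a minimal sketch, not a transcription of the chapter's equations: it uses the usual JPEG 2000 5/3 lifting weights (1/2 for prediction, 1/4 for update), while the even input length, the symmetric extension at the borders and the function names predict, update, forward_53 and inverse_53 are assumptions made for illustration.

```python
# Minimal sketch of the 1D 5/3 lifting DWT (real-valued, i.e. lossy form).
# Assumptions: even-length input, whole-sample symmetric extension at the
# borders; all function names are hypothetical.


def predict(even):
    """Interpolate each odd sample from its two even neighbours (weight 1/2)."""
    m = len(even)
    return [0.5 * (even[n] + even[min(n + 1, m - 1)]) for n in range(m)]


def update(high):
    """Smoothing term for each even sample from its two neighbouring details (weight 1/4)."""
    return [0.25 * (high[max(n - 1, 0)] + high[n]) for n in range(len(high))]


def forward_53(x):
    even, odd = x[0::2], x[1::2]
    high = [o - p for o, p in zip(odd, predict(even))]  # lifting step 1
    low = [e + u for e, u in zip(even, update(high))]   # lifting step 2 waits for step 1
    return low, high


def inverse_53(low, high):
    even = [l - u for l, u in zip(low, update(high))]   # undo step 2
    odd = [h + p for h, p in zip(high, predict(even))]  # undo step 1
    x = [0.0] * (2 * len(even))
    x[0::2], x[1::2] = even, odd
    return x


if __name__ == "__main__":
    low, high = forward_53([0, 1, 2, 3, 4, 5, 6, 7])
    print(high)  # [0.0, 0.0, 0.0, 1.0]: a ramp is predicted exactly except at the right border
    print(inverse_53(low, high))
```

Because the inverse simply undoes each lifting step in reverse order, perfect reconstruction holds for any predict and update terms; only the step ordering matters.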
2.2. Separable 2D 5/3 DWT of JPEG 2000 and its Latency
Fig.2 illustrates the extension of the 1D DWT to a 2D image signal. The 1D DWT is applied vertically and horizontally. In this case, the input signal is denoted as
Downsampling and upsampling are defined as
respectively for an arbitrary 2D signal
for Fig.2, instead of (7) for Fig.1.
The structure in Fig.2 has 4 lifting steps in total: 2 in the vertical 1D DWT and 2 in the horizontal 1D DWT. It should be noted that a lifting step must wait for the calculation result of the previous lifting step. This causes a delay that is essentially unavoidable. Therefore, the total number of lifting steps (= latency) should be reduced for faster JPEG 2000 coding.
The procedure described above can be expressed in matrix form. Since Fig.2 can be redrawn as Fig.3, the relation between the input vector X and the output vector Y is expressed as
Fig.4 illustrates that each lifting step performs interpolation from neighboring pixels. Since each step must wait for the calculation result of the previous step, the delays accumulate. Our purpose in this chapter is to reduce the total number of lifting steps so that the latency is lowered.
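The four sequential lifting steps of the separable structure can be sketched on the four polyphase channels of an image: vertical predict, vertical update, horizontal predict and horizontal update, each reading the result of the previous one. The sketch below checks that this channel form gives the same subbands as applying the 1D 5/3 DWT down every column and then along every row. The channel names a, b, c, d, the helper functions, the even image size and the symmetric border extension are assumptions for illustration, not notation from the chapter.

```python
# Separable 2D 5/3 DWT as four sequential lifting steps on polyphase channels.
# Assumptions: even image size, symmetric border extension; the channel names
# a, b, c, d and all helper functions are hypothetical.


def along(arr, axis, fn):
    """Apply a 1D function to every column (axis=0) or every row (axis=1)."""
    if axis == 1:
        return [fn(list(row)) for row in arr]
    cols = [fn(list(col)) for col in zip(*arr)]
    return [list(r) for r in zip(*cols)]


def P(v):
    """5/3 predict term: -1/2 times the sum of the two neighbours."""
    m = len(v)
    return [-0.5 * (v[n] + v[min(n + 1, m - 1)]) for n in range(m)]


def U(v):
    """5/3 update term: 1/4 times the sum of the two neighbouring details."""
    return [0.25 * (v[max(n - 1, 0)] + v[n]) for n in range(len(v))]


def add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def split(img):
    """Channels a=(even row, even col), b=(even, odd), c=(odd, even), d=(odd, odd)."""
    ev, od = img[0::2], img[1::2]
    return ([r[0::2] for r in ev], [r[1::2] for r in ev],
            [r[0::2] for r in od], [r[1::2] for r in od])


def separable_53(img):
    a, b, c, d = split(img)
    c, d = add(c, along(a, 0, P)), add(d, along(b, 0, P))  # step 1: vertical predict
    a, b = add(a, along(c, 0, U)), add(b, along(d, 0, U))  # step 2: vertical update, waits for 1
    b, d = add(b, along(a, 1, P)), add(d, along(c, 1, P))  # step 3: horizontal predict, waits for 2
    a, c = add(a, along(b, 1, U)), add(c, along(d, 1, U))  # step 4: horizontal update, waits for 3
    return a, b, c, d


def dwt_1d(v):
    """Plain 1D 5/3 DWT of one row or column, returning (low, high)."""
    even, odd = v[0::2], v[1::2]
    high = [o + p for o, p in zip(odd, P(even))]
    low = [e + u for e, u in zip(even, U(high))]
    return low, high


def row_column_53(img):
    """Reference: 1D DWT down every column, then along every row of each band."""
    cols = [dwt_1d(list(col)) for col in zip(*img)]
    low_rows = [list(r) for r in zip(*(lo for lo, _ in cols))]
    high_rows = [list(r) for r in zip(*(hi for _, hi in cols))]
    return ([dwt_1d(r)[0] for r in low_rows], [dwt_1d(r)[1] for r in low_rows],
            [dwt_1d(r)[0] for r in high_rows], [dwt_1d(r)[1] for r in high_rows])
```

The comments mark the dependency chain: no step can start before the one above it has produced its channels, which is why the step count translates directly into latency.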
2.3. Non Separable 2D 5/3 DWT for Low latency JPEG 2000 Coding
In this subsection, we reduce the latency using a 'nonseparable' structure without changing the relation between X and Y in (13). Fig.5 illustrates a theorem (theorem 1) used in this chapter to construct a nonseparable DWT. It is expressed as
where
for arbitrary value of
with
into (13), we have
for X and Y in (14).
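As a hedged illustration of where the saving comes from, write the four polyphase channels of the image as x_{00}, x_{01}, x_{10}, x_{11} (row parity, column parity), and let P_v, U_v (P_h, U_h) denote the vertical (horizontal) predict and update filters, each added to its target channel. Because vertical and horizontal filters act on different dimensions, they commute (for example P_v P_h = P_h P_v), and the four separable steps can be regrouped into three. The following is our own derivation under these assumptions, not a transcription of (14) to (18); each right-hand side uses the most recent channel values.

```latex
\begin{aligned}
\text{step 1:}\quad & x_{11} \leftarrow x_{11} + P_v x_{01} + P_h x_{10} + P_v P_h\, x_{00}\\
\text{step 2:}\quad & x_{01} \leftarrow x_{01} + P_h x_{00} + U_v x_{11},\qquad
                      x_{10} \leftarrow x_{10} + P_v x_{00} + U_h x_{11}\\
\text{step 3:}\quad & x_{00} \leftarrow x_{00} + U_v x_{10} + U_h x_{01} - U_v U_h\, x_{11}
\end{aligned}
```

The two updates in step 2 do not depend on each other, and the product terms P_v P_h and U_v U_h are the genuinely nonseparable 2D filters that require 2D memory access.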
Finally, the nonseparable 2D 5/3 DWT is constructed as illustrated in Fig.6. It has 3 lifting steps in total. The total number of lifting steps (= latency) is reduced from 4 (100%) to 3 (75%), as summarized in table 1 (nonseparable lossy 5/3). The signal processing of each lifting step is equivalent to the interpolation illustrated in Fig.7. In the 2nd step, two interpolations can be performed simultaneously with parallel processing. Note that the nonseparable 2D DWT requires 2D memory access.
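The claim that three steps reproduce the four-step separable output can be checked numerically. The sketch below is our own illustration rather than a transcription of Fig.6: the three-step version is obtained by commuting vertical and horizontal filters (step 1 updates the odd/odd channel, step 2 updates the two mixed channels independently, step 3 updates the even/even channel, with 2D product terms), and it is compared against the separable four-step version. Channel names, helpers, the even image size and symmetric borders are assumptions.

```python
# Separable (4 steps) versus nonseparable (3 steps) 2D 5/3 lifting.
# Assumptions: even image size, symmetric borders; channel names and helpers
# are hypothetical, and the 3-step form is derived by commuting filters.


def along(arr, axis, fn):
    """Apply a 1D function to every column (axis=0) or every row (axis=1)."""
    if axis == 1:
        return [fn(list(row)) for row in arr]
    cols = [fn(list(col)) for col in zip(*arr)]
    return [list(r) for r in zip(*cols)]


def P(v):  # 5/3 predict term: -1/2 * (v[n] + v[n+1])
    m = len(v)
    return [-0.5 * (v[n] + v[min(n + 1, m - 1)]) for n in range(m)]


def U(v):  # 5/3 update term: 1/4 * (v[n-1] + v[n])
    return [0.25 * (v[max(n - 1, 0)] + v[n]) for n in range(len(v))]


def add(*arrs):
    """Element-wise sum of equally sized 2D lists."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*arrs)]


def neg(arr):
    return [[-v for v in row] for row in arr]


def split(img):
    ev, od = img[0::2], img[1::2]
    return ([r[0::2] for r in ev], [r[1::2] for r in ev],
            [r[0::2] for r in od], [r[1::2] for r in od])


def separable_53(img):
    a, b, c, d = split(img)
    c, d = add(c, along(a, 0, P)), add(d, along(b, 0, P))  # vertical predict
    a, b = add(a, along(c, 0, U)), add(b, along(d, 0, U))  # vertical update
    b, d = add(b, along(a, 1, P)), add(d, along(c, 1, P))  # horizontal predict
    a, c = add(a, along(b, 1, U)), add(c, along(d, 1, U))  # horizontal update
    return a, b, c, d


def nonseparable_53(img):
    a, b, c, d = split(img)
    # step 1: odd/odd channel, including the 2D product term Ph(Pv(a))
    d = add(d, along(b, 0, P), along(c, 1, P), along(along(a, 0, P), 1, P))
    # step 2: the two mixed channels; independent, so they can run in parallel
    b, c = (add(b, along(a, 1, P), along(d, 0, U)),
            add(c, along(a, 0, P), along(d, 1, U)))
    # step 3: even/even channel, with the 2D product term Uv(Uh(d)) subtracted
    a = add(a, along(c, 0, U), along(b, 1, U), neg(along(along(d, 1, U), 0, U)))
    return a, b, c, d
```

Both functions return the same four subbands up to floating-point rounding; only the number of sequential steps differs.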
2.4. Introduction of Rounding Operation for Lossless Coding
In Fig.1, the output signal
However, by introducing a rounding operation in each lifting step, all the DWTs mentioned above become 'lossless'. In this case, a rounding operation is inserted before the addition or subtraction in Fig.1, as illustrated in Fig.8. That is,
which guarantees 'lossless' reconstruction of the input value, namely
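A minimal integer sketch of the rounded lifting steps is shown below. It follows the floor-rounded form of the reversible 5/3 filter used by JPEG 2000 (prediction rounded as floor of half the neighbour sum, update rounded as floor of (sum + 2)/4); the even input length, symmetric borders and function names are assumptions. The inverse recomputes exactly the same rounded terms, which is why reconstruction is lossless.

```python
# Reversible 5/3 lifting with floor rounding in each step (JPEG 2000 style).
# Assumptions: integer input of even length, symmetric borders; names hypothetical.


def forward_53_int(x):
    even, odd = x[0::2], x[1::2]
    m = len(even)
    # step 1: rounded prediction, floor((left + right) / 2)
    high = [odd[n] - (even[n] + even[min(n + 1, m - 1)]) // 2 for n in range(m)]
    # step 2: rounded update, floor((left + right + 2) / 4)
    low = [even[n] + (high[max(n - 1, 0)] + high[n] + 2) // 4 for n in range(m)]
    return low, high


def inverse_53_int(low, high):
    m = len(low)
    # undo step 2 with the identical rounded term, then undo step 1
    even = [low[n] - (high[max(n - 1, 0)] + high[n] + 2) // 4 for n in range(m)]
    odd = [high[n] + (even[n] + even[min(n + 1, m - 1)]) // 2 for n in range(m)]
    x = [0] * (2 * m)
    x[0::2], x[1::2] = even, odd
    return x


if __name__ == "__main__":
    low, high = forward_53_int([10, 12, 14, 13, 9, 8, 8, 11])
    print(low, high)                  # [10, 15, 10, 9] [0, 2, 0, 3]
    print(inverse_53_int(low, high))  # [10, 12, 14, 13, 9, 8, 8, 11]
```

Python's `//` is floor division also for negative operands, so the same rounding rule applies to negative coefficients.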
3. The 9/7 DWT and Reduction of its Latency
In the previous section, it was shown that replacing the normal 'separable' structure with the 'nonseparable' structure reduces the total number of lifting steps. This contributes to faster DWT processing in JPEG 2000 for both lossy and lossless coding. It was also indicated that it reduces the total number of rounding operations in the DWT for lossless coding. All the discussions above are limited to the 5/3 DWT. In this section, we extend our discussion to the 9/7 DWT, not only for lossy coding but also for lossless coding.
3.1. Separable 2D 9/7 DWT of JPEG 2000 and its Latency
JPEG 2000 defines another type of DWT, referred to as the 9/7 DWT, for lossy coding. It can be extended to lossless coding as described in subsection 3.4. Compared with the 5/3 DWT in Fig.1, the 9/7 DWT has two more lifting steps and a scaling pair. Its filter coefficients also differ from (7). They are given as
for 9/7 DWT of JPEG 2000. Fig.9 illustrates the separable 2D 9/7 DWT. In the figure, filters are denoted as
It should be noted that this structure has 8 lifting steps (4 vertical and 4 horizontal).
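The 1D building block of this structure, four lifting steps followed by a scaling pair, can be sketched as below. The lifting constants are the usual JPEG 2000 (CDF 9/7) values; the scaling convention (low band divided by K, high band multiplied by K) is one common convention and is an assumption here, since texts differ in where a factor of 2 is placed. Even input length, symmetric borders and the function names are also assumptions.

```python
# Sketch of the 1D 9/7 lifting DWT: four lifting steps plus a scaling pair.
# Assumptions: scaling low/K and high*K, even-length input, symmetric borders;
# function names are hypothetical.

ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001


def _predict(even, coef):
    m = len(even)
    return [coef * (even[n] + even[min(n + 1, m - 1)]) for n in range(m)]


def _update(odd, coef):
    return [coef * (odd[max(n - 1, 0)] + odd[n]) for n in range(len(odd))]


def _add(x, y):
    return [p + q for p, q in zip(x, y)]


def _sub(x, y):
    return [p - q for p, q in zip(x, y)]


def forward_97(x):
    s, d = list(x[0::2]), list(x[1::2])
    d = _add(d, _predict(s, ALPHA))  # lifting step 1
    s = _add(s, _update(d, BETA))    # lifting step 2
    d = _add(d, _predict(s, GAMMA))  # lifting step 3
    s = _add(s, _update(d, DELTA))   # lifting step 4
    return [v / K for v in s], [v * K for v in d]  # scaling pair


def inverse_97(low, high):
    s, d = [v * K for v in low], [v / K for v in high]
    s = _sub(s, _update(d, DELTA))
    d = _sub(d, _predict(s, GAMMA))
    s = _sub(s, _update(d, BETA))
    d = _sub(d, _predict(s, ALPHA))
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d
    return x
```

With this convention a constant input passes to the low band unchanged and produces a (numerically) zero high band, and the inverse reconstructs any input up to floating-point error.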
Fig.10 illustrates the separable 2D 9/7 DWT in matrix representation. Similarly to (13), it is expressed as
In (29), a scaling pair K_{k} and a filter matrix K_{p,q} are defined as
3.2. Single Non Separable 2D 9/7 DWT for Low latency JPEG 2000 coding
In this subsection, we reduce the latency using a 'nonseparable' structure without changing the relation between X and Y in (27), using theorem 1 in (16)-(18) illustrated in Fig.5. Starting from Fig.10, we unify the four scaling pairs {
where
Next, applying theorem 1, we obtain the single nonseparable 2D DWT illustrated in Fig.12. It is expressed as
As a result, the total number of lifting steps (= latency) is reduced from 8 (100%) to 7 (88%) as summarized in table 1 (nonseparable lossy 9/7).
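Why the four scaling pairs can be unified is easiest to see with a worked example. Assume, as an illustration only (the chapter's exact definition in (29) is not reproduced here), that the 1D DWT multiplies the low band by 1/K and the high band by K. The vertical and horizontal scalings then combine, on the channels ordered LL, LH, HL, HH, into a single diagonal matrix:

```latex
\operatorname{diag}\!\left(K^{-1},\,K\right)\otimes\operatorname{diag}\!\left(K^{-1},\,K\right)
=\operatorname{diag}\!\left(K^{-2},\;1,\;1,\;K^{2}\right),
\qquad
\det\operatorname{diag}\!\left(K^{-2},\,K^{2}\right)=1
```

Under this assumption only the LL and HH channels need scaling, and because the remaining pair has determinant 1 it can be written as lifting steps, which is consistent with the lossless treatment of the scaling in subsection 3.4.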
3.3. Double Non Separable 9/7 DWT for Low latency JPEG 2000 Coding
In the previous subsection, a part of the separable structure was replaced by a nonseparable structure. In this subsection, we remove one more lifting step by using a second nonseparable structure. Starting from equation (31) illustrated in Fig.11, we apply
Namely, (31) becomes
as illustrated in Fig.13. Then theorem 1 can be applied twice as
and finally, we obtain the double nonseparable 2D DWT illustrated in Fig.14. The total number of lifting steps is reduced from 8 (100%) to 6 (75%). This reduction rate remains the same for multistage octave decomposition, since each stage applies the same 2D DWT.
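The step counts 8, 7 and 6 can be checked with a sketch that treats the 9/7 lifting part as two predict/update pairs. Since all vertical filters commute with all horizontal ones, the separable order (full vertical DWT, then full horizontal DWT) can be regrouped pair by pair; merging one pair into three steps gives 7 steps, merging both gives 6. This is our own illustration under stated assumptions, not the chapter's Fig.12 or Fig.14: the scaling pair is omitted (it can be applied afterwards to every version), the choice of which pair to merge in the single version is ours, and the channel names, helpers, even image size and symmetric borders are hypothetical.

```python
# Lifting part of the separable 2D 9/7 DWT (8 steps) versus single (7 steps)
# and double (6 steps) nonseparable regroupings. Assumptions: scaling omitted,
# even image size, symmetric borders; channel names and helpers hypothetical.

ALPHA, BETA = -1.586134342059924, -0.052980118572961
GAMMA, DELTA = 0.882911075530934, 0.443506852043971


def along(arr, axis, fn):
    """Apply a 1D function to every column (axis=0) or every row (axis=1)."""
    if axis == 1:
        return [fn(list(row)) for row in arr]
    cols = [fn(list(col)) for col in zip(*arr)]
    return [list(r) for r in zip(*cols)]


def pred(coef):
    """Predict term coef * (v[n] + v[n+1]) with a mirrored right border."""
    return lambda v: [coef * (v[n] + v[min(n + 1, len(v) - 1)]) for n in range(len(v))]


def upd(coef):
    """Update term coef * (v[n-1] + v[n]) with a mirrored left border."""
    return lambda v: [coef * (v[max(n - 1, 0)] + v[n]) for n in range(len(v))]


def add(*arrs):
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*arrs)]


def neg(arr):
    return [[-v for v in row] for row in arr]


def split(img):
    ev, od = img[0::2], img[1::2]
    return ([r[0::2] for r in ev], [r[1::2] for r in ev],
            [r[0::2] for r in od], [r[1::2] for r in od])


PAIRS = [(pred(ALPHA), upd(BETA)), (pred(GAMMA), upd(DELTA))]


def sep_block(ch, P, U):
    """One predict/update pair, vertical then horizontal: 4 sequential steps."""
    a, b, c, d = ch
    c, d = add(c, along(a, 0, P)), add(d, along(b, 0, P))
    a, b = add(a, along(c, 0, U)), add(b, along(d, 0, U))
    b, d = add(b, along(a, 1, P)), add(d, along(c, 1, P))
    a, c = add(a, along(b, 1, U)), add(c, along(d, 1, U))
    return a, b, c, d


def ns_block(ch, P, U):
    """The same pair merged into 3 sequential steps."""
    a, b, c, d = ch
    d = add(d, along(b, 0, P), along(c, 1, P), along(along(a, 0, P), 1, P))
    b, c = (add(b, along(a, 1, P), along(d, 0, U)),
            add(c, along(a, 0, P), along(d, 1, U)))
    a = add(a, along(c, 0, U), along(b, 1, U), neg(along(along(d, 1, U), 0, U)))
    return a, b, c, d


def separable_97(img):
    """Reference order: full vertical 1D DWT (4 steps), then full horizontal (4 steps)."""
    a, b, c, d = split(img)
    for P, U in PAIRS:
        c, d = add(c, along(a, 0, P)), add(d, along(b, 0, P))
        a, b = add(a, along(c, 0, U)), add(b, along(d, 0, U))
    for P, U in PAIRS:
        b, d = add(b, along(a, 1, P)), add(d, along(c, 1, P))
        a, c = add(a, along(b, 1, U)), add(c, along(d, 1, U))
    return (a, b, c, d), 8


def single_ns_97(img):
    """Returns the subbands and the number of sequential lifting steps (3 + 4)."""
    return sep_block(ns_block(split(img), *PAIRS[0]), *PAIRS[1]), 3 + 4


def double_ns_97(img):
    """Returns the subbands and the number of sequential lifting steps (3 + 3)."""
    return ns_block(ns_block(split(img), *PAIRS[0]), *PAIRS[1]), 3 + 3
```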
3.4. Lifting Implementation of Scaling for Lossless Coding
Due to the scaling pair {
Similarly, the scaling pair in equation (32) is also factorized as
as illustrated in Fig.15. In the equation above,
Fig.16, Fig.17 and Fig.18 illustrate 2D 9/7 DWTs for lossless coding. As summarized in table 1, the total number of lifting steps is reduced from 16 (100%) in Fig.16 to 11 (69%) in Fig.17 and 10 (63%) in Fig.18. Furthermore, the total number of rounding operations is also reduced from 32 (100%) in Fig.16 to 16 (50%) in Fig.17 and 12 (38%) in Fig.18, as summarized in table 2.
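How a scaling pair becomes a sequence of lifting steps, and hence admits rounding for lossless coding, can be sketched with one generic factorization of diag(k, 1/k) into four lifting steps. This factorization is chosen for illustration and may differ from the one in the chapter's equations; the function names and the rounding function are assumptions. In this particular form one step has a unit coefficient and needs no rounding.

```python
# Sketch: a scaling pair diag(k, 1/k) written as four lifting steps, so that
# rounding can be inserted for lossless coding. Generic factorization chosen
# for illustration; function names are hypothetical.


def scale_lift(x, y, k, rnd=round):
    """Forward: (x, y) -> approximately (k*x, y/k) using 4 lifting steps."""
    y = y + rnd((k - 1) * x)
    x = x + y                     # unit coefficient: no rounding needed
    y = y + rnd((1 / k - 1) * x)
    x = x + rnd(-k * y)
    return x, y


def scale_unlift(x, y, k, rnd=round):
    """Inverse: undo the steps in reverse order with identical rounding."""
    x = x - rnd(-k * y)
    y = y - rnd((1 / k - 1) * x)
    x = x - y
    y = y - rnd((k - 1) * x)
    return x, y
```

Passing the identity as `rnd`, for example `scale_lift(3.0, 5.0, 1.5, lambda v: v)`, gives (4.5, 3.333...) up to floating-point error, i.e. exact scaling. With the default integer rounding the output only approximates the scaling, but `scale_unlift` recovers the integer input exactly.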
Table 1. Total number of lifting steps
                          lossy                    lossless
                          5/3         9/7          5/3         9/7
separable                 4 (100%)    8 (100%)     4 (100%)    16 (100%)
nonseparable (single)     3 (75%)     7 (88%)      3 (75%)     11 (69%)
nonseparable (double)     -           6 (75%)      -           10 (63%)
4. Performance Evaluation
In this section, all the DWTs summarized in table 3 are first compared in terms of lossless coding performance. Lossy coding performance is evaluated next, and a problem due to finite word length implementation is pointed out. This problem is avoided by compensating the word length at minimum cost.
Table 2. Total number of rounding operations
                          lossless
                          5/3         9/7
separable                 8 (100%)    32 (100%)
nonseparable (single)     4 (50%)     16 (50%)
nonseparable (double)     -           12 (38%)
Table 3. DWTs compared in the performance evaluation
                          lossless
                          5/3                  9/7
separable                 5/3 Sep (Fig.3)      9/7 Sep (Fig.16)
nonseparable (single)     5/3 Ns1 (Fig.6)      9/7 Ns1 (Fig.17)
nonseparable (double)     -                    9/7 Ns2 (Fig.18)
4.1 Lossless Coding Performance
Table 4 summarizes the lossless coding performance of the DWTs in table 3 for different numbers of stages in the octave decomposition. EBCOT is applied as the entropy coder without quantization or bit truncation. Results are evaluated as bit rate (= average code length per pixel) in bits per pixel [bpp]. Fig.19 illustrates the bit rate averaged over the images. It indicates that '5/3 Ns1' is the best, followed by '5/3 Sep'; the difference between them is only 0.01 to 0.02 [bpp]. Among the 9/7 DWTs, '9/7 Ns1' is the best, followed by '9/7 Sep'; the difference is 0.03 to 0.04 [bpp]. This experiment shows that there is no significant difference in lossless coding performance.
Table 4. Lossless coding performance: bit rate [bpp] versus number of octave decomposition stages
                      Number of stages
Image     DWT         1      2      3      4      5      6
Couple    5/3 Sep     4.74   4.65   4.63   4.62   4.62   4.62
          5/3 Ns1     4.73   4.64   4.62   4.61   4.61   4.61
          9/7 Sep     4.91   4.83   4.81   4.80   4.80   4.80
          9/7 Ns1     4.89   4.80   4.79   4.78   4.78   4.77
          9/7 Ns2     4.93   4.84   4.82   4.81   4.81   4.81
Boat      5/3 Sep     4.78   4.70   4.69   4.69   4.69   4.69
          5/3 Ns1     4.77   4.69   4.69   4.68   4.68   4.68
          9/7 Sep     4.87   4.80   4.80   4.79   4.79   4.79
          9/7 Ns1     4.85   4.78   4.77   4.77   4.77   4.77
          9/7 Ns2     4.87   4.80   4.80   4.79   4.79   4.79
Lena      5/3 Sep     5.06   4.97   4.95   4.95   4.95   4.95
          5/3 Ns1     5.05   4.96   4.94   4.94   4.94   4.94
          9/7 Sep     5.19   5.09   5.07   5.07   5.07   5.07
          9/7 Ns1     5.17   5.06   5.05   5.04   5.05   5.05
          9/7 Ns2     5.18   5.07   5.06   5.05   5.06   5.06
average   5/3 Sep     4.86   4.77   4.76   4.75   4.75   4.75
          5/3 Ns1     4.85   4.76   4.75   4.74   4.74   4.74
          9/7 Sep     4.99   4.91   4.89   4.89   4.89   4.89
          9/7 Ns1     4.97   4.88   4.87   4.86   4.87   4.86
          9/7 Ns2     4.99   4.90   4.89   4.88   4.89   4.89
4.2. Lossy Coding Performance
Fig.20 shows rate-distortion curves of the DWTs in table 3 for the input image 'Lena'. A five-stage octave decomposition of the DWT is applied. Transformed coefficients are quantized with optimum bit allocation, and EBCOT is applied as the entropy coder. In the figure, PSNR is calculated as
where
From an input image
As indicated in Fig.20, there is no difference among '9/7 Sep', '9/7 Ns1' and '9/7 Ns2'; all of them have the same rate-distortion curve. There is also no difference between '5/3 Sep' and '5/3 Ns1'. This indicates that the nonseparable DWTs in table 3 are perfectly compatible with the standard DWTs defined by JPEG 2000. Note that this holds only for a sufficiently long word length. In this experiment, the word length of signals
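For reference, PSNR in the sense used for the rate-distortion curves can be sketched as follows. Since the chapter's own formula is not reproduced here, this assumes the standard definition for 8-bit images: peak value 255 and mean squared error over all pixels. The function name and the list-of-rows image layout are also assumptions.

```python
import math


def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images
    given as lists of rows; returns math.inf for identical images."""
    squared_error, count = 0.0, 0
    for row_o, row_d in zip(original, decoded):
        for p, q in zip(row_o, row_d):
            squared_error += (p - q) ** 2
            count += 1
    mse = squared_error / count
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

An error of one grey level at every pixel (MSE = 1) gives 10 log10(255^2), about 48.13 dB.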
4.3. Finite Word Length Implementation
Fig.21 shows rate-distortion curves for the same image, but with the word length of signals in the forward transform shortened just after each multiplication. Signal values are multiplied by 2
To cope with this problem, the word length is compensated for '9/7 Ns2' at minimum cost. In the case of finite word length implementation, the distortion
where
where
It means that finite word length noise
with parameters
is satisfied where {
5/3 Sep  5/3 Ns1  9/7 Sep  9/7 Ns1  9/7 Ns2  
48.78  47.23  40.13  39.11  35.31  
6.27  6.24  6.01  6.01  5.99 
Fig.22 indicates experimentally measured relations between the compatibility
Fig.23 illustrates rate-distortion curves for the compensated NS DWTs. It is confirmed that the deterioration observed in Fig.21 is eliminated, recovering the same level as the standard separable DWTs of JPEG 2000. This means that the finite word length problem peculiar to the nonseparable 2D DWTs can be fully compensated by adding only 1 bit of word length, in the case of implementation with very short word length, i.e.
5/3 Sep   5/3 Ns1   9/7 Sep   9/7 Ns1   9/7 Ns2
0         0.248     0         0.170     0.805
0         0.0048    0         0.0000    0.0033
                  5/3 Sep   5/3 Ns1   9/7 Sep   9/7 Ns1   9/7 Ns2
Δ                 0.000     0.248     0.000     0.170     0.805
additional bits   0         1         0         1         1
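A rough way to see why one extra bit can be enough uses the standard uniform model of rounding noise. This is an assumption for illustration; the chapter's own noise model is not reproduced here. If B fractional bits are kept after each multiplication, each rounding error has step 2^{-B}, and:

```latex
\sigma_B^2=\frac{\left(2^{-B}\right)^2}{12},\qquad
\frac{\sigma_B^2}{\sigma_{B+1}^2}=4,\qquad
10\log_{10}4\approx 6.02\ \text{dB}
```

Under this model each added bit lowers the power of every rounding-noise source by a factor of four, so a structure whose finite word length noise power exceeds that of the separable DWT by less than a factor of four is brought back to the same noise level by one additional bit.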
5. Conclusions
In this chapter, the 'separable' 2D DWTs defined by JPEG 2000 and their latency due to the total number of lifting steps were discussed. To reduce the latency, 'nonseparable' 2D DWTs were introduced for both the 5/3 DWT and the 9/7 DWT. It was confirmed that the 'nonseparable' DWTs reduce the total number of lifting steps while maintaining good compatibility with the 'separable' DWTs. The performance of these DWTs was evaluated in lossless coding mode, and no significant difference was observed. A problem of finite word length implementation in lossy coding mode was discussed, and it was found that adding only one bit of word length guarantees good compatibility with the 'separable' DWTs.
In future work, the execution time of the DWTs on hardware and software platforms should be investigated.
References
1. ISO/IEC FCD 15444-1, Joint Photographic Experts Group.
2. Descampe A., Devaux F.-O., Rouvroy G., Legat J.-D., Quisquater J.-J., Macq B. (2006). A Flexible Hardware JPEG 2000 Decoder for Digital Cinema.
3. Chrysafis C., Ortega A. (2000). Line-based, Reduced Memory, Wavelet Image Compression.
4. Jiang W., Ortega A. (2001). Lifting Factorization-based Discrete Wavelet Transform Architecture Design. IEEE Trans.
5. Shi G., Liu W., Zhang L. (2009). An Efficient Folded Architecture for Lifting-based Discrete Wavelet Transform.
6. Wu B.-F., Lin C.-F. (2005). A High-performance and Memory-efficient Pipeline Architecture for the 5/3 and 9/7 Discrete Wavelet Transform of JPEG2000 Codec.
7. Iwahashi M., Fukuma S., Kambayashi N. (1997). Lossless Coding of Still Images with Four Channel Prediction.
8. Komatsu K., Sezaki K. (2003). Non Separable 2D Lossless Transform based on Multiplier-free Lossless WHT.
9. Britanak V., Yip P., Rao K. R. (2007). Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations.
10. Taubman D. (1999). Adaptive, Nonseparable Lifting Transforms for Image Compression.
11. Kaaniche M., Pesquet J.-C., Benazza-Benyahia A., Pesquet-Popescu B. (2010). Two-dimensional Non Separable Adaptive Lifting Scheme for Still and Stereo Image Coding.
12. Chokchaitam S., Iwahashi M. (2002). Lossless, Near-Lossless and Lossy Adaptive Coding Based on the Lossless DCT.
13. Iwahashi M., Kiya H. (2009). Non Separable 2D Factorization of Separable 2D DWT for Lossless Image Coding.
14. Iwahashi M., Kiya H. (2010). A New Lifting Structure of Non Separable 2D DWT with Compatibility to JPEG 2000.
15. Daubechies I., Sweldens W. (1998). Factoring Wavelet Transforms into Lifting Steps.
16. Jayant N. S., Noll P. (1984). Digital Coding of Waveforms: Principles and Applications to Speech and Video.