## 1. Introduction

A discrete wavelet transform (DWT) has been widely applied to various digital signal processing techniques. It has been designed under a certain condition such as perfect reconstruction, aliasing cancellation, regularity, vanishing moment, etc. This article introduces a new condition referred to “DC lossless”. It guarantees lossless reconstruction of a constant input signal (DC signal) instead of rounding of signal values and coefficient values inside a transform. The minimum word length of the values under the new condition is theoretically derived and experimentally verified.

Since JPEG 2000 algorithm based on the discrete wavelet transform (DWT) was adopted as an international standard for digital cinema video coding [1], high speed and low power implementation of a DWT has been becoming an issue of great importance [2,3]. In designing a DWT, its coefficient values and signal values are assumed to be real numbers. However, in implementation, they are rounded to rational numbers so that they are expressed with finite word length representation in binary digit. Therefore it is inevitable to have rounding errors inside a DWT processing unit.

In this article, we derive a condition on word length of coefficient values and that of signal values of a DWT such that the transform becomes lossless for a DC signal. Under this condition (DC lossless condition), it is theoretically guaranteed that an output signal contains no error in spite of rounding of coefficients and signals inside the DWT. We treat the irreversible 9-7 DWT adopted by the JPEG 2000 for lossy coding of image signals as an example.

In case of the 5-3 DWT in JPEG 2000 for lossless coding, benefiting from its lifting structure [4-6], lossless reconstruction of any signal is guaranteed even though signals and coefficients are rounded. On the contrary, it does not hold for the 9-7 DWT because of scaling for adjusting DC gain of a low pass filter in a forward transform [7]. However, we have pointed out that it became possible to be lossless for a DC signal under a certain condition on word length of coefficients and signals [8].

This DC lossless condition is a necessary condition for the regularity which has been analyzed by numerous researchers to improve coding performance of a transform. When the regularity is not satisfied, the DWT has some problems such as a checker board artifact which is observed in a reconstructed signal as unnecessary high frequency noise in flat or smooth region of a signal [9]. It also brings about DC leakage which decreases the coding gain of a transform [10].

The regularity has been structurally guaranteed for a two channel quadrature mirror filter bank (QMF) [9] and the DCT [10] respectively. However, since these previous methods were based on the lattice structure, these are not directly applicable to the lifting structure of the 9-7 DWT. Beside these relations to the regularity, the DC lossless condition itself is also considered to be important for white balancing of a video system in which the DC signal is used as a reference input for calibration [11].

This article aims at deriving the DC lossless condition theoretically and clarifying the minimum word length of signals and coefficients. In conventional analysis, errors due to shortening of word length of signals (signal errors) were described as 'additive' to a signal [7,12]. They were treated as independent and uniformly distributed white noise. On the other hand, errors due to rounding of coefficients (coefficient errors) were described as 'multiplicative' to a signal and evaluated with the sensitivity [13-15]. It should be noted that the signal error and the coefficient error have been treated independently. Unlike those conventional approaches, we utilize mutual effect between rounding of signals and that of coefficients. Introducing a new model which unifies the coefficient error and the signal error, we define tolerance for those errors as a parameter to simultaneously control both of word length of signals and that of coefficients.

As a result of our theoretical analysis, the minimum word length of signals and that of coefficients inside the lifting 9-7 DWT are derived under the DC lossless condition. We confirm that the minimum word length derived by our analysis is shorter than that determined by a conventional approach. We also confirm that the DWT under the condition does not have the checker board for a DC signal.

This article is organized as follows. Chapter 2 defines a rounding operation and a rounding error, describes their basic properties in algebraic approach, and derives 'addition' formula and 'multiplication' formula of the rounding (modulo) operation. Application of these formulas to scaling of a signal value is introduced in chapter 3. Chapter 4 introduces the DC lossless DWT. Its usefulness is also described. Derivation process of conditions on word length of signals and coefficients is described in chapter 5. The new condition derived from the basic properties in chapter 2 is summarized in chapter 6. Other related condition derived from a conventional approach is also summarized. Theoretical results are verified and the minimum word length of the DC lossless DWT is clarified in chapter 7. This article is concluded in chapter 8.

## 2. Rounding operation and its basic formulas

This chapter introduces basic properties of the rounding operation focusing on 'quotient', rather than 'remainder', in modulo operation. So far, 'remainder' had been attracted numerous mathematicians' attention and various basic properties were found such as the Chinese remainder theorem in the commutative algebra (commutative ring theory). On the contrary, 'quotient' plays an important role as a 'practical' value in finite word length implementation in modern computer systems. This chapter introduces an algebraic approach of expressing 'quotient' as a practical value, and 'remainder' as a rounding error, so that it can be applied to analyzing exact behavior of rounding errors in a complex calculation procedure.

### 2.1. Definition of rounding operation and rounding error

In a digital calculation system, all the values of both of signals and coefficients are calculated and stored as a binary digit with finite word length. In this article, we treat a case such that a value *x* is expressed with a fixed point binary expression as

where *b*_{p}, *p**F*,*I*-1), is a set of binary digit for a value *x*. It has *I* bit integer part including one sign bit and *F* bit fraction part. Hereinafter, *F* is referred to as word length of a value *x*. This *F* bit value *x* has a range expressed as

For example, in case of *I*=1 and *F*=2, the maximum value is *x*=0.75 for [*b*_{0} *b*_{-1} *b*_{-2}]=[0 1 1], and the minimum value is *x*=-1.00 for [*b*_{0} *b*_{-1} *b*_{-2}]=[1 0 0].

When an *F* bit signal value is multiplied with a coefficient value, in a convolution of a filtering process in DWT for example, a resulting signal value has longer word length than its original value. Therefore it is rounded to *F* bit again. So far there are various types of rounding operations [16]. In this article, we deal with the rounding operation defined by

as an example. This rounding operation generates a rounding error. We denote it as

Expanding these expressions to an *F* bit case, we can define the rounding operation and the rounding error as

for an *F* bit word length implementation case.

Fig.1 illustrates rounding operations expressed by these equations. The term *R*_{F}[*x*] is a quotient, and *Δ*_{F}[*x*] (= *x* - *R*_{F}[*x*]) is related to a remainder. The former is an actual value treated in a digital system under a finite word length implementation, and the latter is a rounding error. We are now trying to develop an algebraic expression approach to exactly trace a practical value and a rounding error in a convolution processing inside a DWT.

Fig. 1. Definition of the rounding operation and the rounding error. (a) An integer implementation case. (b) An *F* bit word length implementation case.

### 2.2. Basic properties of the rounding operation

Since a convolution includes additions and multiplications, we should know behavior of an addition of two values *x* and *y*. Resulting value is *R*_{F}[*x*+*y*] and its rounding error is *Δ*_{F}[*x*+*y*]. A multiplication result *R*_{F}[*xy*] and its error *Δ*_{F}[*xy*] should be also investigated.

First of all, let's derive basic properties of the rounding operation starting with an obvious property;

It represents that only a real number *x* can be rounded if *y* is an integer to calculate a rounded value of *x*+*y*. In this case, its rounding error becomes

It suggests that an integer *y* can be ignored when only the rounding error is considered in an analysis. There is another obvious property;

Since the range of a rounding error is

we can add two more identities;

The equations above for *F*=0 can be straightforwardly extended to an *F*≠0 case as follows.

In addition, Eq.(12) can be extended to a more general case with an integer *n* as

### 2.3. Basic formulas of the rounding operation

Utilizing the basic properties in Eqs.(11)-(14), we can derive an addition formula and a multiplication formula of a practical value (quotient) of the rounding operation as follows.

*Addition formula*

*Proof:*

*Q.E.D.*

*Multiplication formula*

*Proof:*

*Q.E.D.*

Formulas for a rounding error (remainder) can be also derived as

for real numbers *x* and *y*. These formulas have following variations;

*Addition formula*

*Multiplication formula*

Especially when two kinds of word lengths are mixed in a signal processing, the following variation of the multiplication formula is conveniently applied to analyzing behavior of signals and errors in a pair of encoder and decoder [17].

*Proof:*

*Q.E.D.*

## 3. Application of the formulas to basic signal processing

This chapter applies the formulas to some basic signal processing cases.

### 3.1. Mapping invariant condition

Fig.2 illustrates a scaling of a signal value *x* with a coefficient value *h*. As illustrated in Fig.2(a), this processing maps an input value *x* to an output value *y*^{*} with an ideal (infinite word length) coefficient value *h*. Note that *x* has *F* bit word length. Output value of the multiplication is also rounded to F bit (*y*^{*} has *F* bit word length). In implementation, as illustrated in Fig.2(b) , a coefficient value *h* is also rounded to *W* bit word length (*h'* has *W* bit word length). We aim at finding the minimum word length *W* of a coefficient *h'* such that the mapping is invariant (*y -y*^{*} =0).

In case of Fig.2(b), an input value *x* is multiplied by a rounded value *RW*[*h*] (=*h*') of a given coefficient *h*. The result *RW*[*h*]*x* is rounded to R_{F}[*RW*[*h*]*x*] (=*y*). When it is the same as *RF*[*hx*] (=*y*^{*} ), the mapping of *x* is invariant. It means that effect of rounding of *h* is nullified. This mapping invariant case is expressed as

From the basic properties, the mapping invariant condition is derived as

*Proof:*

*Q.E.D.*

The Eq.(23) also means

which gives tolerance to the rounding error of a coefficient [8]. This is the mapping invariant condition on word length *W* of a coefficient under a given word length *F* of signals. It represents exact (not approximated) behavior of rounding errors.

Unlike the condition above, a sufficient condition can be derived by substituting the upper bound of errors and signals;

to Eq.(23). It results in the condition described as

This condition is too strict and requires too long word length to guarantee the mapping invariance. In both cases of Eq.(24) and Eq.(26), the mapping invariant condition determines the minimum of word length W of a coefficient under a given word length *F* of signals.

### 3.2. Lossless condition on a scaling pair

Fig.3 illustrates a pair of two multipliers. In Fig.3(a), an input signal *x* has *F2* bit word length. It is scaled with a coefficient h_{1}, and its output value *y* is rounded to *F*_{1} bit. It is re-scaled with a coefficient *h*_{2} (=1/*h*_{1}), and its final output value *w* is rounded to *F*_{2} bit. This scheme is embedded in a forward transform and a backward transform of DWT for example. It is required to regain *w* exactly the same as *x*, under a given word length set of F_{1} and F_{2}. Note that rounding errors due to finite word length expression of coefficients *h*_{1} and *h*_{2} can be ignored as far as the mapping invariant condition is satisfied.

We apply the formulas and the properties to derive the condition on *F*_{1} and *F*_{2}. The lossless case in Fig.3(b) is described as

From the basic properties, the lossless condition on a scaling pair is derived as

*Proof:*

*Q.E.D.*

This condition determines the word length F_{1} and F_{2} of signals for an input value *x*. It represents exact condition such that total accumulated rounding error is nullified by the rounding just after the final multiplier with *h*_{2}. As a result, the original value *x* is recovered as the final output without any loss.

Unlike the exact condition above, a sufficient condition can be derived by analyzing the upper bound as follows.

As a results, when the sufficient condition given by

holds, the scaling pair becomes lossless. However this condition is too strict and requires too long word length of signals.

## 4. Application of the formulas to DWT

This chapter introduces the DC lossless DWT [8, 18]. Definition and its usefulness are also described. The algebraic approach based on the formulas is applied to derive conditions on word length of signals and coefficients. Derivation process is described in chapter 5.

### 4.1. DWT and its word length of signals and coefficients

Fig.4 illustrates the irreversible 9-7 DWT of the JPEG 2000 standard [1]. The forward transform in Fig.4(a) decomposes an input signal *x*(*n*), *n**n* | 1,2, *L*} into band signals *y*_{1}(*m*) and *y*_{2}(*m*), *m**m* | 1,2, *L*/2}. The backward transform in Fig.4(b) reconstructs the signal *w*(*n*) from the band signals. In the figure, *z*^{-1} and ↓2 indicate the delay and the down sampler respectively.

The multiplier coefficients *ci, i**i* | 1,2, *FS, FB* or *FX* [bit] by a rounding operation illustrated as a circle.

Denoting the integer part as *IS* [bit], total word length *WS* [bit] of a signal s is defined as

including 1 [bit] for the sign part. Similarly, total word length *WC* [bit] of a coefficient c is defined as

In Fig.4, fraction part of the input signal *x*(*n*) is given as *FX* [bit]. Inside the DWT, fraction part of the signals are rounded to F_{S} [bit] just after each of all the multiplications with *ci*, i*FB* [bit] and *FX* [bit] respectively. Note that we do not truncate integer part of signals and that of coefficients. We are determining F_{S} and F_{C} such that the DC lossless property is satisfied.

### 4.2. Definition of DC lossless property and its necessity

In this article, we define the DC lossless as the conjunction of the following two propositions:

for a given constant value *d* with *FX* [bit] fraction part. When the proposition in Eq.(33) holds, the DWT has no DC leakage for the DC input signal with value *d*. Similarly, when the proposition in Eq.(34) is true, the reconstructed signal *w*(*n*) contains no checker board artifact for the DC input signal. In the following chapters, we investigate the minimum fraction part of signals *FS* [bit] which guarantees the DC lossless for given *FX* and *FB* [bit]. We also investigate the minimum fraction part *ci*, i

Fig.5(a) illustrates an example of a video system. It contains an encoder and a decoder which are composed of a forward DWT and a backward DWT. In white balancing, a camera and a display are calibrated with a constant valued input signal (DC signal) [11,19]. Therefore, it is useful for this calibration if the forward DWT and its backward do not generate any error. In this case, the camera and the display can be calibrated ignoring existence of the encoder and the decoder as illustrated in Fig.5(b). Namely, the DC lossless condition provides a low complexity DWT useful for the white balancing.

In addition, the DC lossless condition is a necessary condition for the regularity which controls smoothness of basis functions and coding performance of a transform. A DWT under the regularity does not generate the checker board artifact or the DC leakage. Harada et. al. analyzed a condition for the regularity of a two channel quadrature mirror filter bank (QMF) [9]. They confirmed that a QMF under the condition has reduced checker board artifact for an input step signal. It is expanded to a multirate system under short word length expression [20]. The regularity was structurally guaranteed for a biorthogonal linear phase filter bank [21,22] and the DCT [10] respectively. However, since these previous methods are based on factorization of a transfer function including (1+z^{-1}) or (1-z^{-1}) in the lattice structure, these are not directly applicable to the lifting structure of the 9-7 DWT in Fig.4.

In this article, we derive the DC lossless condition theoretically in chapter 5, and determine the minimum word length of signals and that of coefficients under the condition in following chapters.

## 5. Derivation of condition for the DC lossless DWT

This chapter describes derivation process of the DC lossless condition.

### 5.1. New model for error analysis

Fig.6(a) illustrates a multiplier in the DWT circuit. An input value s has *FS* [bit] fraction part and multiplied by a coefficient *c*'. The coefficient is originally designed as a real number c. It is rounded to a rational number *c*' in implementation. It produces the coefficient error:

Just after the multiplication, the signal is rounded to *s*' with *FS* [bit] fraction part as

where e' is the signal error. From Eq.(35) and (36), the final output becomes

where cs is the ideal output. This conventional model, illustrated in Fig.6(b), describes the coefficient error *e*' as additive [7,12]. In addition, these errors are treated independently as mutually uncorrelated noises.

Unlike these existing approaches, as illustrated in Fig.6(c), we describe the coefficient error e'' as

From Eq.(15), we utilize the fact that *e*'' is observed as a 'particle';

where *p* is an integer. Given the tolerable maximum to an integer *p*, word length of the coefficient c can be controlled independently of other coefficients in other sections inside the DWT. Furthermore, denoting the signal error as *e*' similarly to Eq.(36), the output value is described as

where

as illustrated in Fig.6(d). In this new model, both of the coefficient error e'' and the signal error *e*' are unified to the error *e*. Utilizing Eqs.(13) and (15), its absolute value is limited to

Note that the parameter *p* to control word length of a coefficient c is included in this equation. It is equivalent to

where its proof is given in appendix.

Benefitting from this inequality, it becomes possible to consider mutual effect of the coefficient error and the signal error.

Inside the forward DWT, the error e is propagated and added up with other errors from other multipliers. When its maximum absolute value is less than

### 5.2. DC equivalent circuit

When the input signal is restricted to a DC signal, *x*(*n*) can be described as a scalar *x* independent of *n*. The delay *z*^{-1} can be treated as 1 and (1+*z*^{-1}) can be replaced by 2. Therefore, instead of the circuits in Fig.4, we can use their equivalent circuits for a DC input signal in Fig.7 to derive the condition.

In Fig.7(a) , a scalar *x* with *FX* [bit] fraction part is multiplied by the rational numbers *ci*, *i**FS* [bit]. Finally, the signals are rounded to *FB* [bit] at its output to produce two scalars [*y*_{1} *y*_{2}]. The unified errors inside the circuit are described as

where

Similarly, for the backward transform in Fig.7(b), errors are described as

where

Similarly to Eq.(42), these errors are described with the parameters p_{i} and *q*_{i} to control word length of coefficients as

for a given word length *Fs* [bit] of signals.

### 5.3. Nullification of accumulated errors

In Fig.7(a), the unified errors in Eq.(46) are propagated and accumulated inside the circuit. When the accumulated errors are nullified by the rounding at output of the forward transform, Eq.(33) is satisfied. In the figure, **Y**_{12}=[*y*_{1} *y*_{2}]^{T} is described as

where

It is described with the unified error matrices **E**_{1} and **E**_{2} as

where

and

Similarly, output values **W**_{12}=[*w*_{1} *w*_{2}]^{T} from the backward transform in Fig.7(b) are

where

and

When the DWT is DC lossless, output values of the transforms are

Using this equation, the accumulated errors are defined as

Substituting Eqs.(49), (50), (51) and using the property in Eq.(6), we have

Applying Eq.(12), it becomes clear that when the conditions;

are satisfied, the accumulated errors are nullified by the rounding operations at the final output of each of the forward transform and the backward transform.

## 6. Derived conditions on word length for the DC lossless DWT

This chapter summarizes the new condition derived from the basic properties in chapter 2, and other related condition derived from a conventional approach.

### 6.1. Critical condition on word length

Finally, we derive the condition on word length of coefficients and signals such that Eq.(54) is satisfied. Since the unified errors in **E**_{1}, **E**_{2}, **E**_{3} and **E**_{4} have the maximum in Eqs.(46) and (47) described with the parameters p_{i} and q_{i}, the DC lossless condition is also described with the parameters by substituting

for

into Eq.(54). This is the condition we derived based on the new model described in section 5.1. We investigate the fraction part *F*_{Ci} [bit] of a coefficient *ci, i**Fs* [bit] of signals.

### 6.2. Sufficient condition on word length

As an example, in case of all the parameter in Eq.(55) are given as *pi* = *qi* = *p* and *FCi* = *FC* for

where

for *FX* = *FB* = 0. As a result, the DC lossless condition on the word length is given as

where

and G_{E} is the lower bound. This means a sufficient condition for the DC lossless. Since it is too strict, the word length under this condition is redundant. Unlike this sufficient condition, our critical condition given as Eq.(54) under Eq.(55) determines the word length minimum and necessary for the DC lossless.

## 7. Simulation results

This chapter verifies theoretically derived conditions, and clarifies the minimum word length of the DC lossless DWT.

### 7.1. Word length under the sufficient condition

Utilizing the sufficient condition in section 6.2, we calculated the optimized word length under the cost function defined as *J=2*^{-1}(*FC* *+FS*). The cost *J* is minimized for three examples. Ex.1 trades the word length between *FC* and *FS*, namely *FC + FS* = constant. Ex.2 and Ex.3 are *FC = FS* and *WC = WS* respectively. Results are summarized in table 1. Table 2 summarizes word length of signals and coefficients for an 8 bit system with *Wx*=8 (*Ix*=7 and F_{x}=0). Ex.1 requires (F_{S}, F_{C})=(4, 12) [bit] for signals and coefficients respectively. Ex.2 and Ex.3 require *FS* =*FC* =11 [bit] and *WS* =*WC* =14 [bit] respectively. The condition in Eq.(58) is plotted as a solid line in Fig.8. According to the sufficient condition, it is impossible to be DC lossless for

and it is also confirmed by the figure. It guarantees the DC lossless, however the condition is too strict. Therefore the word length is redundant and there is room for further reduction.

### 7.2. Word length under the critical condition

In Fig.8, a cross "*FS*, *FC*) which satisfies the critical condition in section 6.1 for any 8 bit integer *x* with *WX*=*IX*=8 and *FX*=0. Here in after, we denote an input DC value to a video system as

where *x* is an input value to the DC equivalent circuit in Fig.7. The minimum of *FC* for each *FS* is indicated as a broken line. It is clear that the word length derived by the critical condition is shorter than that determined by the sufficient condition. For example in Fig.8(a), the fraction part *FS* (= *FC*) is reduced from 11 [bit] to 9 [bit] for Ex.2. The word length is not shortened for Ex.1 and Ex.3. In case of Fig.8(b), *FS* (= F_{C}) is reduced from 13 [bit] to 12 [bit] for Ex.2. (*FS*, *FC*) is reduced from (14, 4) to (13, 3) or (12, 4) for Ex.1. *WS* (= *WC*) is reduced from 16 [bit] to 15 [bit] for Ex.3. It is confirmed that the word length is shortened due to the analysis in this article.

### 7.3. Word length for a specific value

Fig.9(a) and Fig.9(b) illustrate the word length under the conditions for the black value "16" and the white value "235" respectively. These specific values are utilized in white balancing of an 8-bit video system [19]. For example, (F_{S}, F_{C}) is reduced from (4, 12) in Fig.8(a) to (2, 9) in Fig.9(a) for Ex.1. Table 3 summarizes the minimum word length for these specific input DC values [23]. It is observed that the word length can be reduced by limiting input DC signals to a specific value. Fig.10 indicates the minimum word length F_{C} of coefficients for an input value x at a given word length F_{S} of signals. This is an example at (*FS*, *WX*)=(3, 8). The sufficient condition gives the same word length for any of input values. Unlike this conventional statistical analysis, our analysis gives the minimum word length shorter than that determined by the sufficient condition for each of input DC values.

### 7.4. Optimum word length assignment

Since we described tolerance for the unified errors as parameters p_{i} and q_{i} in Eq.(46) and (47), it becomes possible to simultaneously control both of word length of signals and that of coefficients. Table 4 summarizes these parameters for an input value 16 and the word length of signals at *FS*=2 [bit] as an example. It indicates that [*p*_{1} *p*_{2} *p6*] in Eq.(46) are [0 0 0 0 0 1] for the word length of coefficients at *FC*=9 [bit]. In this case, all the coefficients c_{i} in the forward transform have the same length. It is worth paying attention to the fact that the parameter *p*_{1} is the same for *FC*=9, 8 and 7 [bit] for example. It means that word length of the coefficient c_{1} can be reduced from 9 to 7 [bit] without any influence to the errors. Therefore, word length [F_{C1} *FC2**FC6*] of coefficients [c_{1} *c2* *c6*] can be reduced from [9 9 9 9 9 9] to [7 9 7 4 6 4] according to the table.

Table 5 summarizes results of this optimum word length assignment for the forward transform. Comparing to table 3, it is observed that word length of coefficients is reduced from 9.00 [bit] to 6.17 [bit] on average for an input value *xin*=16. Table 6 summarizes results for the backward transform. In this case, the word length is furthermore shortened. It is observed that *c*_{6} and *c*_{4} can be omitted since *y*_{2} is equal to zero under the DC lossless. Fig.11 illustrates image signals reconstructed by the DWT which does not satisfy the DC lossless condition. It demonstrates the checker board artifact for reference. It is confirmed that total word length is furthermore shortened utilizing the tolerance parameters *pi* and *qi* introduced in this article.

## 8. Conclusions

Introducing a new model which unifies the coefficient error and the signal error, and utilizing the nullification of the accumulated errors, this article theoretically derived a condition on word length of signals and coefficients such that the 9-7 DWT of JPEG 2000 becomes lossless for a DC input signal. It was confirmed that the minimum word length derived by the newly introduced 'critical' condition was shorter than that determined by a conventionally well known 'sufficient' condition. It was also confirmed that the DWT under the condition does not have the checker board for a DC signal. Analysis in this article contributes to build a low complexity DC lossless DWT.

## 9. Appendix

Proof of Eq.(43)

according to Eqs.(13) and (15). Applying the triangle inequality to the left hand side, we have

According to Eq.(13), each terms in the right hand side are described as

Therefore (A.2) and (A.3) under (A.1) means

and finally we have Eq.(43) as

*Q.E.D.*