Xilinx VirtexE (xcv300epq240-8); D4 = DAUB4; D6 = DAUB6; Tm = latency for multiply operation; Ta = latency for addition operationFPGA1 implementation and hardware comparison.
The Discrete Wavelet Transform (DWT) has extensively been used in a wide range of applications, including numerical analysis, image and video coding, pattern recognition, medical and telemetric imaging, etc. The invention of DWT decomposition by Mallat (Mallat, 1998) shows that the DWT can be viewed as a multiresolution decomposition of signal. This means it decomposes the signal into its components in different frequency bands. The Inverse DWT does the opposite, i.e. it reconstructs the signal from its octave band components. After its inclusion in JPEG2000 compression standard (Seo & Kim, 2007), significant research has been done to optimize the DWT implementation to reduce the computational complexity. Among a wide range of wavelets, the Daubechies wavelets include members ranging from highly localized to highly smooth and can provide excellent performance in image compression (Daubechies, 1992). Among the family members, the first two – Daubechies 4-tap (DAUB4) and Daubechies 6-tap (DAUB6) – are popular choices in medical imaging applications.
While compressing medical images, the key here is to preserve as much critical information as possible in the reconstructed image so that accurate diagnosis is possible. There have been several efficient implementations of wavelet filters proposed for applications in image processing (Lee & Lim, 2006; Martina & Masera, 2007; Acharyya et al., 2009; Shi et al., 2009; Lai et al., 2009). But, the use of conventional fixed-point (FP) binary (or any other weighted) representation for implementing discrete wavelet coefficients (that are irrational in nature) introduces round-off or approximation errors at the very beginning of the process. The error is due to the lack of exact representation of the irrational numbers that form the coefficient basis. These errors tend to expand as the calculations progress through the architecture, degrading the quality of image reconstruction (Wahid et al., 2003). A lossless mapping technique, known as Algebraic Integer Quantization (AIQ), can be used to minimize the approximation error and efficiently compute the DAUB4 and DAUB6 coefficients (Wahid et al., 2004). The AIQ scheme is divided into two parts: the first stage is based on factorization and decomposition of transform matrices exploiting the symmetric structure. After the decomposition, we map the irrational transform basis coefficients using multidimensional algebraic integers that results in exact representation and simpler implementation. As a result, less error is introduced in the computation process that yields significantly better reconstruction of images while keeping critical information, making the scheme suitable for medical and telemetric imaging applications.
As a case study, we apply the scheme to several medical images, such as endoscopic, ultrasound, x-ray, CT-scan images and evaluate the performance. The chapter is organized as follows: Previous related works are presented next. Section 3 presents a brief introduction to Daubechies wavelets. In Section 4, we explain the AIQ scheme applied to Daubechies wavelets. Then the simulation and synthesized results of the case study are summarized in Section 5. Finally, we conclude the work in Section 6.
2. Past work
Lewis and Knowles proposed an architecture for Daubechies wavelets without multipliers (Lewis & Knowles, 1991). A major drawback was that it was heavily dependent on the properties of only one specific wavelet, DAUB4 tap coefficients. At the same time, Aware Inc. came out with a chip called Wavelet Transform Processor (WTP) (Aware, 1991). It essentially consists of a 4-tap filter (4 Multiply-Accumulate cells) and some external memory with control but no specific features that can take advantage of the DWT structure rather it relies heavily on the software to compute the DWT. It is also a complex design requiring extensive user control. Parhi and Nishitani proposed two architectures, folded and digit serial, for 1D DWT (Parhi & Nishitani, 1993). These architectures do not easily scale with the filter size and the number of octaves computed. The number of multipliers is higher, and hence the silicon area is large. In (Vishwanath et al., 1995), the authors proposed linear systolic array architecture. Paek and Kim in proposed recursive and semi-recursive architectures for DWT which has several drawbacks like large area (hardware cost), scheduling control overhead and incomplete data-bus utilization (Paek & Kim, 1998).
Most of the research work to reduce the hardware complexity is inclined towards multiplierless implementations by maneuvering the filter banks (Lee & Lim, 2006; Martina & Masera, 2007; Acharyya et al., 2009) or using lifting schemes (Shi et al., 2009; Lai et al., 2009; Huang et al., 2004). However, in these designs, the use of conventional FP binary representation results in erroneous computation process and degrades image reconstruction. In this chapter, we present an efficient low-cost implementation of the DAUB filters with a demonstration of performance advantages on medical images and noisy environment.
3. Daubechies wavelets
This section provides a brief introduction to Daubechies wavelets. This class of wavelets includes members ranging from highly localized to highly smooth – Daubechies-2 (DAUB2 with two coefficients) to Daubechies-20 (DAUB20 with 20 coefficients) and also provides excellent performance in image compression (Daubechies, 1992). The Daubechies wavelet coefficients are based on computing wavelet coefficients, (where, n = 0, 1, 2,..., N-1 and N is the number of coefficients) to satisfy the following conditions (Mallat, 1998):
The conservation of area under a finite length signal:
The accuracy conditions: (where m = 0, 1, 2,..., p-1 and)
The perfect reconstruction conditions: and
Then the low-pass filter is and the high-pass filter is. One of the simplest and most localized members is the DAUB6 which has six coefficients:
Where, and. For an 8x8 input data, the DAUB6 forward transform matrix (using an assumption of periodicity) is shown in Eq. (2):
The structure of the matrix uses the set of coefficients, as a smoothing filter (low-pass) and the set, as a non-smoothing filter (high-pass). The DWT is invertible and orthogonal - the inverse transform, when viewed as a matrix, is simply the transpose of the forward transform matrix. So, basically we need only 2 sets of multiply-accumulate (MAC) cells each containing 6 multipliers and 5 adders where partial products are computed separately and subsequently added. However, it can be seen that, by introducing additional control circuitry, the same multipliers can be used for both low-pass and high-pass filtering. As a result, the number of multipliers can be reduced to 6 instead of 12. Fig. 1 shows the signal flow graph of the conventional finite-precision (FP) implementation of Eq. (2), where is the input data vector. Since all the coefficients are fixed, for a fixed precision, we can in fact replace all the multipliers by adders and shifters. As a result, the total equivalent additions required to compute the 1-D DAUB6 filter is 44.
4. AIQ-based algorithm
Algebraic integer (AI) is defined by real numbers that are roots of monic polynomials with integer coefficients (Wahid et al., 2004). As an example, let denote a primitive 16th root of unity over the ring of complex numbers. Then satisfies the equation:. The ring can be regarded as consisting of polynomials in of degree 7 with integer coefficients. The elements of are added and multiplied as polynomials, except that the rule is used in the product to reduce the degree of powers to below 8.
In summary, algebraic integers of an extension of degree n can be assumed to be of the form:
Where, is called the AI basis and the coefficients are integers. The process of mapping with AI is known as Algebraic Integer Quantization (AIQ).
The AIQ technique is useful in computing discrete transforms as first explored by Cozzens and Finkelstein (Cozzens & Finkelstein, 1985). In their work, the algebraic integer number representation, in which the signal sample is represented by a set of (typically four to eight) small integers, combines with the Residue Number System (RNS) to produce processors composed of simple parallel channels. The analog samples must first be quantized into the algebraic integer representation and the final algebraic integer result converted back to an analog or digital form. In between these two conversions, the algebraic integer representation must be converted into and out of two levels of RNS parallelism.
4.1. AIQ-based Daubechies wavelets
Now we present the concept of AIQ to encode the DAUB6 coefficients. First of all, consider the polynomial of two variables:
So the corresponding coefficients, , are encoded in the form. Then all the DAUB6 coefficients are exactly encoded (scaled by) and shown below in Eq. (5):
The obvious advantages of this approach are: a) Very small dynamic range (numbers ranging from 0 to 10); b) Multiplication by a constant is very easy and efficient (only 1 addition is required in most cases; so, multiplication can be eliminated by add/shift algorithm); and c) We have 3 parallel channels through which data flows independently (since is zero for all) and also a very simple scheduling is needed. No quantization errors would be incurred. Fig. 2 shows the signal flow graph of the AIQ-based scheme that requires 42 adders.
The encoding scheme can be easily applied to DAUB4 coefficients. In that case, we need a 2nd degree polynomial of one variable:
Where,. As a result, the error-free mapping of DAUB4 coefficients can computed as given below (scaled by):
4.2. Hardware implementation
The AIQ-based architecture is coded in Verilog and prototyped onto Xilinx VirtexE FPGA to assess the performance. A precision of 8-bit is used in the multipliers to minimize the hardware and optimize the operation, which is completed in one clock cycle (CC). Table 1 presents the comparison of the synthesized results along with the fixed-point (FP) designs. It can be seen from the table that the AIQ scheme for both DAUB4 (D4) and DAUB6 (D6) costs lesser hardware resources and has lower critical path delay; in case of DAUB4, the savings is even higher.
|Scheme||Fixed-point||AIQ||Overall savings (%)|
A key advantage of the AIQ approach is error reduction. In order to better understand the sources of error induction, we present in Fig. 3 the entire process of digital computation process. Here each arrow represents one key stage in the computation. In case of FP-based approach, there are three stages of error accumulation (or induction). In stage 1, the quantization (or approximation) error is introduced due to the lack of finite representation of transform basis coefficients that are irrational numbers. The error gets larger and larger in later stages as the process continues.
However, in AIQ-based approach, there is only one stage of error induction at the end: in stage 1, the basis coefficients are mapped with algebraic integers (that is error-free); then all the required computations take place (that is also error-free) using the AI representation. Finally, the data is converted back to binary (in stage 3) where some errors may be introduced; but these errors are much less compared to FP-approach, and are introduced very late in the computation process, and hence less affect the quality of image reconstruction.
Moreover, the error introduced at the final stage in AIQ approach can be further minimized using higher precision AIQ multipliers. We have performed an error analysis that shows the error incurred for different bit-length of the AIQ multipliers (in Fig. 4). The error is computed taking a multiplier of 16-bit width as reference. The signed digit representation (for 8-bits) is shown below in Eq. (8):
5. Performance evaluation
The AIQ-based algorithm to compute the Daubechies Wavelet Transform is intended to be used in applications where the quality of image reconstruction is critical, such as biomedical imaging, telemedicine, capsule endoscopy (Wahid et al., 2008), etc. Due to the error-free nature of integer mapping, the AIQ approach results in a much better reconstruction compared to conventional binary approach. Here, we present the results of our study, where we apply the scheme to several standard benchmark and medical images, such as endoscopic, ultrasound, x-ray, CT-scan images and evaluate the performance.
The section is divided into four sub-sections. In the first section, we evaluate the performance of the AIQ scheme for standard images followed by the analysis of medical images. Next, we show the performance of the scheme in a noisy environment. Finally, the results are compared with existing works related to medical image compression. In all these cases, we have used peak-signal-to-noise-ratio (PSNR) as the visual quality assessment index which is given by Eq. (9):
Where, M and N are the image width and height namely; x and x’ are the original and reconstructed component values namely.
5.1. Performance analysis on standard images
Here, we perform 1-D forward transform on the benchmark “Goldhill” image followed by the inverse transform to get the original image back. No compression was performed, so the image quality degradation is purely due to arithmetic quantization effects. Different hardware precision (number of bit) is used where a full adder is considered as the unit for the hardware cost, and making a simple assumption of a hardware cost of n for an n-bit word (this will be a best case comparison for the fixed-point binary implementation). The results are shown in Fig. 5 and 6.
As shown in Fig. 5(a), the PSNR of the reconstructed image gets higher with the increased bit precision which is expected. In all cases, the AIQ performs much better that FP, i.e., the data recovery rate is higher, especially in lower bit rate region. In other words, for a fixed level of distortion, the number of bits required to transmit the transformed coefficients of AIQ approach would be less than those required for FP technique.
In Fig. 5(b), we present an analysis of reconstruction quality (in PSNR) with the cost of implementation. An interesting comparison is to select similar hardware cost and then compare the reconstruction performance. As an example, for same hardware cost of 150, compare the PSNR of both FP (44dB) and AIQ implementation (82dB). In this case the difference in the PSNR is 38dB in favour of AI – that is for similar cost of implementation, the AIQ scheme produces much better image reconstruction. On the other hand, for same PSNR, say 60 dB, AIQ scheme (95) requires around three times less hardware than FP implementation (270).
Same kind of superiority is seen for DAUB6 (Figure 6) too. So, not only an improvement in image reconstruction quality is obtained but also hardware cost is reduced. Fig. 7 shows the image reconstruction for a 8-bit FP vs. a 14-bit AIQ for DAUB4. The difference in PSNR is about 45dB and the level of improvement is quite noticeable.
5.2. Performance analysis on medical images
We have performed an exhaustive simulation using several medical images, such as, X-ray, CT-scan, Ultrasound (US) and endoscopic images, and the results are presented in Table 2 (showing for two cases of bit precision: 8-bits and 12-bits). In all cases, the AIQ-based scheme produces a high PSNR and outperforms the conventional approach by a far margin. Some sample original and reconstructed images are shown in Fig. 8 – 11. In most cases, the difference in PSNR is around 8dB with noticeable level of improvement.
5.3. Performance analysis on noisy images
As a final study, the AIQ algorithm is tested under a noisy environment. The Gaussian white noise and Poisson noise are added to the images of all types, and the performance is compared with the FP implementation. The results are tabulated in Table 3 (all using 8-bit precision). Like previous cases, due to less error accumulation in the computation process, the AIQ-based approach is seen to have performed better than FP approaches even in a noisy environment.
|Gaussian||D4 - FP||43.9||47.5||43.1||44.6||44.4|
|D4 - AIQ||51.4||55.5||50.0||51.6||52.4|
|D6 - FP||40.6||44.3||39.7||41.3||41.1|
|D6 - AIQ||49.9||53.2||49.3||50.6||49.7|
|Poisson||D4 - FP||44.1||47.6||43.2||44.8||44.6|
|D4 - AIQ||51.6||55.6||50.0||51.8||52.5|
|D6 - FP||40.7||44.5||39.8||41.5||41.3|
|D6 - AIQ||50.1||53.4||49.3||50.9||49.9|
5.4. Comparative analysis
In order to show the effectiveness of the AIQ approach, the algorithm is compared with some compression standards: JPEG, JPEG2000, and existing algorithms targeted to medical imaging. Since there is no benchmark for medical images, we have conducted the experiment with benchmark images like “Lena”, “Barbara” and “Goldhill”. Table 4 summarizes the comparison results (all using 8-bit precision with 0.25 bits per pixel). From the table, it is clearly observed that the error-free algorithm performs competitively compared to other existing compression schemes.
|HS-HIC (Mohammed, 2008)||35.0||26.1||30.5|
|Hybrid (Yu & Mitra, 1997)||35.0||31.5||32.9|
|JPEG (Wahid et al., 2008)||32.4||27.7||29.7|
|JPEG2000 (Seo & Kim, 2007)||34.1||28.8||30.5|
|OB-HIC (Mohammed & Abd, 2010)||35.9||32.8||33.8|
In this chapter, we have presented an efficient approach to compute Daubechies wavelet transforms that is based on encoding the basis set of forward transform coefficients using algebraic integers. The AIQ approach not only reduces the number of arithmetic operations, but also reduces the dynamic range of the computations. Because of error-free mapping in the earlier stages, less error is introduced in the system, as compared to FP implementation, that results in much better data reconstruction. The performance is validated using standard and medical images in both normal and noisy conditions. In all cases, the AIQ-based approach outperforms the conventional FP scheme by far margin. The rate of data recovery is very high while preserving critical information that makes the scheme suitable for medical and telemetric imaging applications.
The author would like to acknowledge the Natural Science and Engineering Research Council of Canada (NSERC) for its support to this research work. The author is also indebted to the Canadian Microelectronics Corporation (CMC) for providing the hardware and software infrastructure used in the development of this design.