VLSI Implementation of Medical Image Fusion using DWT-PCA Algorithms

Digital image processing (DIP) is increasingly important in the medical field for identifying patient conditions related to various diseases. Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) scan images are used here to perform the fusion process. In brain imaging, an MRI scan shows structural information about the brain without functional data, whereas a CT scan image includes functional data on brain activity. To improve low-dose CT scans, this paper introduces a hybrid algorithm implemented on an FPGA. The main objective of this work is to optimize hardware performance. The hybrid algorithm combines the Discrete Wavelet Transform (DWT) with Principal Component Analysis (PCA). The Maximum Selection Rule (MSR) is used to select the high-frequency components from the DWT. RTL architectures for these three algorithms are implemented in Verilog. Application-Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) performance is analyzed for the different methods. In 180 nm technology, the DWT-PCA-IF architecture achieved an area of 5.145 mm², a power of 298.25 mW, and a delay of 124 ms. From the fused medical image, the mean, Standard Deviation (SD), entropy, and Mutual Information (MI) are evaluated for the DWT-PCA method.


Introduction
In recent years, the importance of Image Fusion (IF) has increased rapidly. IF is the process of combining two or more images into a single image, so that information from the different source images is available in one place [1]. Based on the domain in which the images are processed, fusion is classified into two types: transform-domain and spatial-domain fusion [2]. IF is used in many applications, including medicine, automated industry, engineering, and the military [3]. Among these, medical applications are particularly important because IF helps to identify health problems [4]. In medicine, two major modalities, MRI and CT, help to analyze normal and abnormal tissue and the internal structure of the human body, because MRI and CT contain different information about the human brain [5]. An MRI scan is used for soft tissue and detects skull problems, while a CT scan is used for hard tissue to identify the bone structure [6]. Earlier IF techniques operate at the pixel level, the decision level, or the feature level [7]. Many existing algorithms have been used for IF, such as the Electrical Capacitance Tomography (ECT) algorithm [8], the Non-Subsampled Contourlet Transform (NSCT) [9], sparse representation and decision [10], the Curvelet transform [11], a hybrid entropy concept [12], a hybrid dual-tree complex wavelet transform [13], and hybrid IF with image registration [14]. The main problem with these methods is information loss. To check hardware utilization and improve efficiency, IF has also been implemented on FPGAs, using several different implementation approaches. DWT [15], multimodal [16], and configurable pixel-level [17] methods have been implemented on FPGAs for IF. The hardware utilization of these methods is high. To overcome these problems, a hybrid algorithm with the maximum selection rule is implemented in this paper.
After the DWT, only the high-frequency components are processed by the MSR, and the MSR output is passed to the inverse DWT (IDWT). The combination of DWT and PCA is referred to as the hybrid algorithm, and the PCA stage produces the fused image. These methods are implemented in an FPGA architecture to improve the efficiency of IF. The proposed method improves FPGA and ASIC performance compared with conventional methods. The mean, Standard Deviation (SD), entropy, and Mutual Information (MI) are also calculated for all the algorithms. The rest of the paper is organized as follows: Section 2 reviews the literature, Section 3 describes the proposed method, Section 4 discusses the experimental results, and Section 5 concludes the paper.

Literature review
Mishra et al. [18] presented a modified Frei-Chen based image fusion method. The method was applied to Structural Similarity (SS) and Night Vision (NV) contrast using two-scale decomposition. It achieved improvements of 48%, 15%, and 100% in total edge transfer, SS, and NV, respectively. The architecture was implemented in the Xilinx tool, where it consumes 4% of resources, and was analyzed in the Synopsys tool with 90 nm CMOS technology. However, the algorithm provides low accuracy and low fusion efficiency.
Bavirisetti and Dhulli [19] proposed two-scale image fusion using saliency detection. The method uses a saliency extraction process that highlights significant information. This work gave better results than multiscale fusion techniques, but the method failed to process medical images well.
Pemmaraju et al. [20] presented wavelet-based image fusion on an FPGA. The method was implemented in Xilinx EDK 10.1 on a Spartan 3E. This FPGA contains combinational blocks that are flexible for high-speed applications, and the architecture contains memory, flip-flops, and LUTs. The method was applied to multi-focus image fusion. However, the DWT does not provide stationary outputs, and the low-frequency component has low efficiency.
Yang et al. [21] proposed multimodal image fusion based on fuzzy logic. With the help of type-2 fuzzy logic, NSCT was applied to pre-registered source images to obtain low and high bands. The low-frequency bands are handled by a local energy algorithm. The fused image was obtained by applying the inverse NSCT to all sub-bands. Accuracy, contrast, and versatility were also evaluated. The main drawback of this method is low spatial resolution.
Bhaskar and Munde [22] proposed image fusion using the Non-Subsampled Shearlet Transform (NST) in an FPGA implementation. The input image was separated into individual image coefficients using NST. Different rules were applied to fuse the high and low bands, and the fused image was obtained with the inverse NST. The method was implemented in Xilinx System Generator and MATLAB. Power was reduced, but the hardware utilization of the method is high.
Agarwal and Bedi [23] presented hybrid image fusion for medical diagnosis. In that paper, wavelet and Curvelet transforms were used to perform IF. The segmented blocks were fused into sub-bands using the Curvelet transform. The resolution of the fused image is low, which affects image quality.
Sanjay et al. [24] proposed IF based on DWT and type-2 fuzzy logic. In that paper, CT and MRI images were fused with a hybrid method. The fused low-level and high-level bands were reconstructed by performing the IDWT. This hybrid algorithm neither uses more logic functions nor analyzes the hardware utilization.

DWT-PCA-IF architecture
Image fusion is an important process for obtaining more information from different images. The overall process of image fusion is shown in Figure 1.
• The input CT image is read into MATLAB and each pixel is converted to a binary value. These binary values are stored in a text file.
• The same process is applied to the MRI image.
• The binary values of both the CT and MRI images undergo the DWT, which gives four frequency components: Low High (LH), High Low (HL), High High (HH), and Low Low (LL).
• These frequency components are passed to the MSR. Only the high-frequency components are required in this operation.
• So the HH, HL, and LH components undergo the MSR operation, which gives three results. These three results are given to the IDWT process along with the low-frequency component (LL).
• After the IDWT, both results are given to the PCA component, which gives the fused image.
• DWT, MSR, IDWT, and PCA are implemented in Verilog, and the final output is written to a text file.
• With the help of MATLAB, these binary values are converted back to pixels to display the fused image.
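The first and last steps above exchange pixel data with the Verilog design through a text file. A minimal sketch of that conversion is given here in Python rather than MATLAB. It assumes 8-bit grayscale pixels stored one binary word per line in row-major order; the paper does not specify the file layout, and the function names are illustrative only.

```python
def pixels_to_binary_lines(image, bits=8):
    """Flatten a 2-D list of integer pixels into fixed-width binary strings."""
    lines = []
    for row in image:
        for pixel in row:
            if not 0 <= pixel < (1 << bits):
                raise ValueError(f"pixel {pixel} does not fit in {bits} bits")
            lines.append(format(pixel, f"0{bits}b"))
    return lines


def binary_lines_to_pixels(lines, width):
    """Rebuild a 2-D pixel list from binary strings, `width` pixels per row."""
    values = [int(line, 2) for line in lines]
    return [values[i:i + width] for i in range(0, len(values), width)]


# Round trip on a tiny 2x2 example image (assumed data, not from the paper).
image = [[0, 127], [200, 255]]
lines = pixels_to_binary_lines(image)
restored = binary_lines_to_pixels(lines, width=2)
```

One binary word per line is a layout that the Verilog `$readmemb` system task can read, which is one plausible way to load such a file into a testbench memory; the paper does not state how the file is loaded.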

DWT architecture
To analyze a signal, the wavelet transform converts it from the time domain to the frequency domain. The DWT is implemented using two major blocks, namely the Filter Bank (FB) and the Lifting Scheme (LS). The DWT is a decimated wavelet transform, in which the size of the image is halved at each scale. The wavelet transform makes it easy to convert spatial-domain inputs into the frequency domain [25]. High-pass and low-pass coefficient series are obtained from the input series $y_0, y_1, \ldots, y_n$. The high-pass and low-pass coefficients are given by Eqs. (1) and (2),
where the wavelet filters are denoted $s_n(z)$ and $t_n(z)$, the length of the filter is denoted $l$, and $j = 0, \ldots, [n/2] - 1$.
The spatial-domain DWT is applied in two directions. First, the 1D-DWT is applied along the horizontal axis, and its results are then passed to the 1D-DWT along the vertical axis. The 2D-DWT yields four parts, named LL, LH, HL, and HH.
The two-dimensional DWT is applied to all the rows and columns of an image. If the input image is of size $2^k \times 2^k$ pixels, then at level $L + 1$ its size will be $2^k/2 \times 2^k/2$. Various kinds of decomposition methods are used in wavelets over an image. The DWT decomposes the input image into four sub-images, called sub-bands. The LL sub-band is the coarse-level sub-image, and HH, LH, and HL are the diagonal, vertical, and horizontal components of the image, respectively. The decomposition of the input image into these four components is shown in Figure 2. Higher levels of the 2D-DWT are obtained by further decomposing the LL (low-pass) component for multi-resolution analysis.
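The row-then-column procedure can be illustrated with a short software sketch. The example below assumes an averaging Haar filter pair and even image dimensions, because the paper does not name the wavelet it uses; the sub-band naming (first letter for the row filter, second for the column filter) is also an assumed convention.

```python
def haar_1d(signal):
    """One level of an averaging Haar transform (even-length input assumed)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def transpose(matrix):
    """Swap rows and columns of a 2-D list."""
    return [list(col) for col in zip(*matrix)]


def filter_columns(half):
    """Apply the 1-D transform to every column of a row-filtered half."""
    lows, highs = [], []
    for col in transpose(half):
        low, high = haar_1d(col)
        lows.append(low)
        highs.append(high)
    return transpose(lows), transpose(highs)


def haar_2d(image):
    """One-level 2-D DWT: filter the rows first, then the columns."""
    row_low, row_high = [], []
    for row in image:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = filter_columns(row_low)    # row low-pass, column low/high
    hl, hh = filter_columns(row_high)   # row high-pass, column low/high
    return ll, lh, hl, hh
```

Each returned sub-band has half the height and half the width of the input, which matches the halving of the image size at each scale described above.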
Assume the input image is $Y$.
Here, $Y$ is split into two bands, $Y_o$ and $Y_e$.
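In a lifting scheme, a split of this kind is usually followed by a predict step and an update step. The sketch below assumes that $Y_o$ and $Y_e$ are the odd-indexed and even-indexed samples and uses Haar predict and update steps; neither assumption is confirmed by the paper, so this shows the general technique rather than the authors' datapath.

```python
def lifting_forward(signal):
    """Haar lifting on an even-length sequence: split, predict, update."""
    even = signal[0::2]                                   # Y_e (assumed)
    odd = signal[1::2]                                    # Y_o (assumed)
    detail = [o - e for o, e in zip(odd, even)]           # predict odd from even
    approx = [e + d / 2 for e, d in zip(even, detail)]    # update: pair average
    return approx, detail


def lifting_inverse(approx, detail):
    """Undo the update and predict steps, then interleave the samples."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend([e, o])
    return signal
```

Because every lifting step is undone by reversing its sign, the inverse reuses the same arithmetic as the forward transform, which is one reason lifting is attractive for hardware.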

Maximum selection rule
The MSR block diagram is shown in Figure 5. The rule applies to the high-frequency components, so the HH, HL, and LH values undergo the MSR operation. The DWT outputs of both images are connected to a MUX that selects the maximum value.
These outputs are given to the IDWT, which converts them from the frequency domain back to the time domain.
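The selection performed by the MUX can be sketched in software as a coefficient-wise comparison. The example below compares magnitudes, which is a common choice for wavelet detail coefficients; the paper only says that the maximum value is chosen, so the use of absolute values is an assumption.

```python
def max_select(band_a, band_b):
    """Keep, per coefficient, whichever input has the larger magnitude.

    band_a and band_b are same-sized 2-D lists holding one high-frequency
    sub-band (HH, HL, or LH) from each source image. Ties keep band_a.
    """
    return [
        [a if abs(a) >= abs(b) else b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


# Assumed example coefficients, not values from the paper.
fused_band = max_select([[1, -5], [0, 2]], [[-3, 4], [0, -2]])
```

The same function would be called three times, once for each of the HH, HL, and LH sub-bands, to produce the three results passed to the IDWT.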

PCA architecture
The PCA architecture, shown in Figure 6, contains a control engine, covariance matrix, MUX, multiplier, adder, and comparator. The covariance matrix is calculated from the detected spike waveform and is referred to as the PC spike waveform. The MAC unit is used for the distilling and orthogonalization processes to improve PCA efficiency. The comparator and right shift are used for the shift procedure and level checking. The entire algorithm is split into four processing units, and the data are stored in register files. A Finite State Machine (FSM) schedules and allocates resources during PCA processing and is very effective for controlling the remaining signals [26]. These outputs are used to perform the image fusion. The binary output of the fusion architecture is read into MATLAB to display the fused image.
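For orientation, a common software formulation of PCA-based fusion derives two weights from the principal eigenvector of the 2x2 covariance matrix of the two input images and forms a weighted sum. The sketch below follows that formulation; it may differ from the hardware datapath of Figure 6, and taking absolute values of the eigenvector components to keep the weights non-negative is an assumption of this sketch.

```python
import math


def pca_weights(image_a, image_b):
    """Fusion weights from the principal eigenvector of the 2x2 covariance."""
    xs = [p for row in image_a for p in row]
    ys = [p for row in image_b for p in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Largest eigenvalue of [[var_x, cov_xy], [cov_xy, var_y]] in closed form.
    lam = (var_x + var_y) / 2 + math.hypot((var_x - var_y) / 2, cov_xy)
    if cov_xy == 0:
        # Diagonal covariance: the eigenvector lies along the larger variance.
        vec = (1.0, 0.0) if var_x >= var_y else (0.0, 1.0)
    else:
        vec = (cov_xy, lam - var_x)
    total = abs(vec[0]) + abs(vec[1])
    return abs(vec[0]) / total, abs(vec[1]) / total


def pca_fuse(image_a, image_b):
    """Weighted sum of two same-sized images using the PCA weights."""
    w_a, w_b = pca_weights(image_a, image_b)
    return [
        [w_a * a + w_b * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
```

The weights sum to one, so the image with the larger variance contributes more to the fused result, and two identical inputs are weighted equally.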

Experimental results and discussion
This section presents the experimental simulation results and discusses the proposed methodology in terms of its performance measures. The proposed methodology was evaluated in terms of ASIC and FPGA performance.

Discussion
The input images (CT and MRI) are shown in Figures 7 and 8. These images are converted to binary, as shown in Figures 9 and 10. The ASIC performance of the different methods is tabulated in Table 1, which compares Existing-I [18], Existing-II [20], Existing-III [22], and DWT-PCA-IF.
All the methods are implemented in the Cadence RTL Compiler with 180 nm and 45 nm technology, and the results are tabulated. From this table, it is clear that DWT-PCA-IF provides better performance than the existing architectures.

Comparative analysis
In this work, three papers are compared with the proposed method. A. Mishra, S. Mahapatra, and S. Banerjee [18] applied a modified Frei-Chen operator based IF for real-time applications. Scalable decomposition was used to perform the fusion operation, which was implemented on a Virtex 4 FPGA. The overall RTL architecture was too complex for the IF algorithm, which resulted in a larger area. Pemmaraju et al. [20] implemented IF based on DWT using an FPGA. The algorithm was implemented in Xilinx EDK 10.1 on Spartan 3E hardware. That work gives no explanation of the RTL architecture or the .ucf file, and the power consumption is too high due to the use of wavelets. Bhaskar and Munde [22] performed IF based on the non-subsampled shearlet transform. Xilinx System Generator was used with MATLAB to implement this design. The fused image is affected by more noise, and the design requires more hardware. Comparison graphs of area, power, and delay are shown in Figures 11-13, where the dark blue bars represent the DWT-PCA-IF architecture. All the ASIC performance figures are reduced by the hybrid algorithm.
The FPGA performance is tabulated in Table 2. In this table, Virtex 4 and Virtex 5 devices are used to evaluate LUTs, flip-flops, slices, and frequency. These values show that the DWT-PCA-IF architecture achieves better FPGA performance parameters.
Comparison graphs of LUTs, flip-flops, slices, and frequency are shown in Figures 14-17. The hardware utilization is evaluated from this FPGA performance. The RTL schematic diagrams of the top module and of the 2D DWT and 1D DWT are shown in Figures 18 and 19; these RTL schematics are taken from the Xilinx tool. The performance evaluation for the different methods is given in Table 3, which reports the mean, Standard Deviation (SD), entropy, and Mutual Information (MI) of the fused medical image. From this table, it is clear that DWT-PCA performs better than the existing methods. Finally, the fused image is shown in Figure 20.
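The four image-quality measures reported in Table 3 have standard histogram-based definitions, sketched below for integer-valued images. The paper does not state its exact formulas, for example whether MI is computed between the fused image and one source or summed over both sources, so the definitions here are assumptions.

```python
import math
from collections import Counter


def flatten(image):
    """Return the pixels of a 2-D list as a flat list."""
    return [p for row in image for p in row]


def mean_and_sd(image):
    """Mean and population standard deviation of the pixel values."""
    pixels = flatten(image)
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, math.sqrt(variance)


def entropy(image):
    """Shannon entropy in bits of the pixel-value histogram."""
    pixels = flatten(image)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def mutual_information(image_a, image_b):
    """Mutual information in bits from the joint pixel-value histogram."""
    a, b = flatten(image_a), flatten(image_b)
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi
```

Under these definitions, the mutual information of an image with itself equals its entropy, which gives a simple sanity check for an implementation.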

Conclusion
The proposed architecture has been designed to reduce hardware utilization. In this work, the DWT-PCA-IF architecture has been designed to perform image fusion, and medical images such as MRI and CT have been used in the fusion process to obtain more information. The hybrid VLSI architecture provided a better fused image than previous works. The DWT-PCA-IF architecture was implemented in Verilog. The DWT and PCA methods were used to reduce power and area consumption. ASIC and FPGA performance was analyzed for the different architectures. In 180 nm technology, the DWT-PCA-IF architecture achieved an area of 5.145 mm², a power of 298.25 mW, and a delay of 124 ms. On Virtex 4, the proposed architecture achieved 3014 LUTs, 3987 flip-flops, 1968 slices, and a frequency of 355.14 MHz. From the fused image, a mean of 55.658, an SD of 53.14, an entropy of 9.621, and an MI of 3.141 were obtained. In the future, different kinds of optimization algorithms will be designed to improve ASIC and FPGA performance.