Hardware Implementation of a Real-Time Image Data Compression for Satellite Remote Sensing

Image data compression is essential for reducing the data volume and data rate in satellite remote sensing. This chapter describes how image data compression is implemented in hardware, using the FORMOSAT-5 Remote Sensing Instrument (RSI) as an example. FORMOSAT-5 is an optical remote sensing satellite with 2-meter Panchromatic (PAN) image resolution and 4-meter Multispectral (MS) image resolution, under development by the National Space Organization (NSPO) in Taiwan. The payload of the remote sensing instrument consists of one PAN band with 12,000 pixels and four MS bands with 6,000 pixels each. The compression method complies with the Consultative Committee for Space Data Systems (CCSDS) standard CCSDS 122.0-B-1 (2005). The compression ratio is 1.5 for lossless compression and 3.75 or 7.5 for lossy compression. A Xilinx Virtex-5QV FPGA, the XQR5VFX130, is used to achieve near-real-time compression. Parallel and concurrent processing strategies are used to achieve high-performance computing.

For the FDWT, the coefficients in equations (3) and (4) are listed in Table 1. The coefficients used in FORMOSAT-5 differ slightly from those defined in CCSDS 122.0-B-1: only 24 bits, rather than 32 bits, are used for each coefficient to save FPGA multiplexer resources.

Bit plane encoder
After DWT processing, the Bit Plane Encoder (BPE) compresses the DWT coefficients. The BPE encodes a segment of the image from the most significant bit (MSB) to the least significant bit (LSB). Compression is achieved because the BPE encoding uses fewer bits to express the image data. In CCSDS 122.0-B-1, the maximum number of bytes in a compressed segment can be defined to limit the data volume, and a quality limit can be defined to constrain the amount of DWT coefficient information to be encoded.
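The MSB-to-LSB ordering can be illustrated with a minimal Python sketch. It is not the CCSDS 122.0-B-1 coding procedure, which adds block structure and entropy coding on top of this ordering; the function name `bit_planes` and the sample magnitudes are hypothetical.

```python
def bit_planes(magnitudes, bit_depth):
    """Return the bit planes of non-negative integers, MSB plane first."""
    planes = []
    for b in range(bit_depth - 1, -1, -1):          # MSB down to LSB
        planes.append([(m >> b) & 1 for m in magnitudes])
    return planes


coeffs = [5, 0, 3, 1]                  # hypothetical coefficient magnitudes
planes = bit_planes(coeffs, bit_depth=3)
for b, plane in zip(range(2, -1, -1), planes):
    print(f"bit {b}: {plane}")
```

Stopping after the first few planes yields a coarser approximation of every coefficient, which is how a bit-plane encoder can trade image quality against data volume.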
The BPE performs DC and AC data encoding following the flow shown in Fig. 2. During DC encoding, the maximum AC value of each block is computed. A scheme is then used to determine how many bits are needed for "DC_MAX_Depth" and "AC_MAX_Depth" in the segment. In addition, the optimal DC and AC coding type and value for the W/8 blocks are determined. Finally, the DC data and the W/8 AC_MAX data are encoded, and the bit stream is transmitted to the next stage. W is the number of pixels per image line: in FORMOSAT-5, W is 12,000 for the PAN image and 6,000 for the MS image.
AC encoding consists of 5 stages. Encoding and bit output proceed block by block in each stage, using an entropy coding scheme. Stage 0 processes the third part of the DC data. Stage 1 processes the Parent coefficients in each block. Stage 2 processes the Children coefficients in each block. Stage 3 processes the Grand-Children coefficients in each block. Stage 4 simply concatenates the data left over from stages 1, 2 and 3. After the segment header is added, the compressed image data are complete.

Architecture description
The image flow of the Remote Sensing Instrument in FORMOSAT-5 is shown in Fig. 3. Behind the telescope, one CMOS sensor module inside the Focal Plane Assembly (FPA) takes the images. The CMOS sensor module can be accessed by two sets of FPA electronics. The output data stream is sent to the Image Data Pre-processing (IDP) module in the RSI EU for data re-ordering. The resulting data are then sent to the Image Data Compression (IDC) module for compression. The compressed data, together with a format header, are stored in the Mass Memory (MM) modules under the control of the Memory Controller (MC) module. When the satellite passes over the ground station, the image files can be retrieved and transmitted to the ground station.

Design and implementation
The image data input interfaces between the functional modules are shown in Fig. 4. The serial image data from the FPA are re-ordered in the IDP so that the image data are output in the correct pixel order. The image data are then transferred to the IDC in parallel on a 12-bit data bus at a lower transmission clock rate. One channel of PAN data and four channels of MS data are compressed individually in the IDC. The compressed PAN and MS data are stored in separate image files under the control of the MC module.

Hardware design
The image data rate between stages is shown in Fig. 5. To accommodate the high data rate, the PAN sensor output is divided into 8 channels of 80 Mbps each. The channel rate for each MS band is 40 Mbps. This parallel architecture increases the image data handling speed.
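The aggregate rates implied by these channel figures can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes that the quoted channel rates are raw image data with no protocol overhead and that the compression ratios from the abstract apply to the whole stream, neither of which is stated in the text. The function name `compressed_rate` is hypothetical.

```python
PAN_CHANNELS, PAN_MBPS_PER_CHANNEL = 8, 80   # from the text
MS_BANDS, MS_MBPS_PER_BAND = 4, 40           # from the text
COMPRESSION_RATIOS = (1.5, 3.75, 7.5)        # from the chapter abstract

pan_in = PAN_CHANNELS * PAN_MBPS_PER_CHANNEL   # aggregate PAN input, Mbps
ms_in = MS_BANDS * MS_MBPS_PER_BAND            # aggregate MS input, Mbps


def compressed_rate(input_mbps, ratio):
    """Average output rate if the whole input is compressed by `ratio`."""
    return input_mbps / ratio


for ratio in COMPRESSION_RATIOS:
    print(f"ratio {ratio}: PAN {compressed_rate(pan_in, ratio):.1f} Mbps, "
          f"MS {compressed_rate(ms_in, ratio):.1f} Mbps")
```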
The PAN and MS image data compression boards are shown in Fig. 6(a) and 6(b). The architecture block diagram of the PAN channel in the IDC is illustrated in Fig. 7; the MS channels are similar. The space-grade Xilinx FPGA XQR5VFX130 is used for image compression processing. Its major characteristics include 130,000 logic cells, 298 blocks of 36-Kbit RAM, 320 enhanced DSP slices, and a total dose tolerance of 700 krad. The PROM used for FPGA programming is the XQR17V16, which has a memory size of 16 Mbits and a total dose capability of 50 krad. One XQR5VFX130 FPGA is used for PAN data compression, and two XQR5VFX130 FPGAs are used to compress the four MS bands. External memories, 24 chips of 256K x 32 SRAM, are used as data buffers during the compression process.

DWT process
The DWT flows at the three levels are illustrated in Fig. 8a, 8b and 8c. RAM banks are used for buffer storage. In the first level, LL1, LH1, HL1 and HH1 are generated. LL1 is then passed to the level 2 DWT process to generate LL2, LH2, HL2 and HH2, and LL2 is passed to the level 3 DWT process to generate LL3, LH3, HL3 and HH3. LL3 contains most of the information of the original image. These subbands are stored in temporary buffers for the BPE process.
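The structure of this three-level decomposition can be sketched in Python. To keep the example short, a simple Haar averaging filter is used as a stand-in for the 9/7 filters of CCSDS 122.0-B-1, so the coefficient values differ from those of the real FDWT or IDWT; only the way each LL band feeds the next level and the resulting subband sizes are illustrated. The function names and the 8 x 8 test block are hypothetical, and the HL/LH labelling (high-pass along rows for HL) is an assumed convention.

```python
def haar_level(img):
    """One 2-D Haar analysis level (stand-in for the CCSDS 9/7 filters).

    img is a list of rows with even width and height. Returns the four
    subbands (LL, HL, LH, HH), each half the size of img in both directions.
    """
    ll, hl, lh, hh = [], [], [], []
    for r in range(0, len(img), 2):
        row_ll, row_hl, row_lh, row_hh = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            row_ll.append((a + b + d + e) / 4)   # low-pass in both directions
            row_hl.append((a - b + d - e) / 4)   # high-pass along rows
            row_lh.append((a + b - d - e) / 4)   # high-pass along columns
            row_hh.append((a - b - d + e) / 4)   # high-pass in both directions
        ll.append(row_ll)
        hl.append(row_hl)
        lh.append(row_lh)
        hh.append(row_hh)
    return ll, hl, lh, hh


def decompose(img, levels=3):
    """Feed each LL band into the next level; keep the detail subbands."""
    subbands = {}
    ll = img
    for k in range(1, levels + 1):
        ll, hl, lh, hh = haar_level(ll)
        subbands[f"HL{k}"], subbands[f"LH{k}"], subbands[f"HH{k}"] = hl, lh, hh
    subbands[f"LL{levels}"] = ll
    return subbands


block = [[r * 8 + c for c in range(8)] for r in range(8)]   # hypothetical data
bands = decompose(block)
sizes = {name: (len(b), len(b[0])) for name, b in bands.items()}
ac_count = sum(len(b) * len(b[0]) for n, b in bands.items() if n != "LL3")
print(sizes)
print("AC coefficients per 8x8 block:", ac_count)
```

For an 8 x 8 input, the three levels leave a single LL3 coefficient and 63 detail coefficients, which corresponds to the one DC and 63 AC coefficients per block handled by the BPE.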

BPE process
The BPE module is the unit that actually performs the data compression. When the DWT signals that one section of data has been completed and saved in the buffer, the BPE retrieves the wavelet-domain data from the buffer and applies a different compression scheme to each DWT sub-section. Depending on the required compression ratio, the BPE truncates the data or appends zero fill bits. After the necessary header information is added, the compressed data are sent word by word to the mass memory for storage.
The compressed data format is listed in Table 2. Within a segment, BitDepthDC is defined as the bit depth of the maximum value among all DC coefficients, and BitDepthAC is defined as the bit depth of the maximum value among all AC coefficients. The amount of quantization q' of the DC coefficients is determined from the dynamic range of the AC and DC coefficients in the segment, as given in Table 3. The DC quantization factor q is defined as q = max(q', BitShift(LL3)).
Table 3. DC Coefficient Quantization
For example, with BitDepthDC = 16, q' = 16 - 10 = 6. The DC quantization factor q is then 6 and N = 16 - 6 = 10. Bits 15 to 6 of each DC coefficient are therefore encoded using the quantized-coefficient coding method, bits 5 to 4 are simply appended immediately after the coded quantized DC coefficients of the segment, and bits 3 to 0 are encoded in AC stage 0. The detailed coding algorithm is described in CCSDS 122.0-B-1 (2005).
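The selection of q', q and N can be sketched in Python. The rule for q' below is written as it is commonly stated for CCSDS 122.0-B-1 and should be checked against Table 3 and the standard itself before use. BitDepthAC = 4 and BitShift(LL3) = 0 are assumed values, chosen because they are consistent with the worked example above; the function name `dc_quantization` is hypothetical.

```python
def dc_quantization(bit_depth_dc, bit_depth_ac, bit_shift_ll3=0):
    """Return (q_prime, q, n) for one segment.

    Assumed rule for q' (verify against CCSDS 122.0-B-1 before relying on it).
    """
    t = 1 + bit_depth_ac // 2
    if bit_depth_dc <= 3:
        q_prime = 0
    elif bit_depth_dc - t <= 1:
        q_prime = bit_depth_dc - 3
    elif bit_depth_dc - t > 10:
        q_prime = bit_depth_dc - 10
    else:
        q_prime = t
    q = max(q_prime, bit_shift_ll3)       # q = max(q', BitShift(LL3))
    n = max(bit_depth_dc - q, 1)          # bits per quantized DC coefficient
    return q_prime, q, n


# Assumed inputs that reproduce the example in the text.
q_prime, q, n = dc_quantization(bit_depth_dc=16, bit_depth_ac=4)
print(f"q' = {q_prime}, q = {q}, N = {n}")
```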
The AC data make up the major portion of the image (63/64), so AC coding dominates the overall compression performance. CCSDS adopts the bit-plane encoding concept: the most significant bits of each AC sub-section are encoded first, followed by less significant bits, until the specified segment byte limit is reached or bit 0 of each data segment has been encoded. If necessary, zero bits are appended to reach the segment byte limit.
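The effect of the segment byte limit can be shown with a short Python sketch. A string of '0' and '1' characters stands in for the coded bit stream purely for readability; the hardware works on words, and the function name `fit_to_byte_limit` is hypothetical.

```python
def fit_to_byte_limit(bits, seg_byte_limit):
    """Truncate or zero-pad a bit string to exactly seg_byte_limit bytes."""
    target = seg_byte_limit * 8
    if len(bits) >= target:
        return bits[:target]                     # drop the trailing bits
    return bits + "0" * (target - len(bits))     # append zero fill bits


truncated = fit_to_byte_limit("1" * 20, seg_byte_limit=2)
padded = fit_to_byte_limit("101", seg_byte_limit=1)
print(len(truncated), padded)
```

Because the most significant bit planes are emitted first, truncation removes the least important information.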
To achieve good compression efficiency, the CCSDS standard specifies an entropy symbol mapping scheme for the AC Parent, Children, and Grand-Children data. The basic concept of entropy coding is to use shorter bit patterns to represent the bit patterns that occur more frequently.
In the CCSDS standard, a "gaggle" consists of a set of 16 consecutive blocks within a segment. Our design uses two running phases to produce the final entropy-coded result: a pre-running phase and a normal running phase. The pre-running phase obtains the 2-bit, 3-bit, and 4-bit entropy values for each gaggle on each bit plane. The normal running phase uses the entropy table to map the final coded bit string. The detailed coding algorithm is described in CCSDS 122.0-B-1 (2005). The IDC implementation block diagram is shown in Fig. 9.
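The grouping of blocks into gaggles can be sketched as follows. The sketch assumes that a segment holds W/8 blocks, as suggested by the BPE description, and that the last gaggle is simply shorter when the block count is not a multiple of 16; the function name `gaggle_sizes` is hypothetical.

```python
GAGGLE_SIZE = 16   # blocks per gaggle, from the text


def gaggle_sizes(num_blocks, gaggle_size=GAGGLE_SIZE):
    """Split num_blocks into gaggles; the last one may be shorter."""
    full, rest = divmod(num_blocks, gaggle_size)
    return [gaggle_size] * full + ([rest] if rest else [])


pan = gaggle_sizes(12000 // 8)   # assumed W/8 blocks per PAN segment
ms = gaggle_sizes(6000 // 8)     # assumed W/8 blocks per MS segment
print(len(pan), pan[-1], len(ms), ms[-1])
```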

FPGA design optimization
Several design techniques are used to save the limited multiplier and memory resources in the FPGA chip. In Equations (1) and (2), nine multipliers are needed for the Low Pass Filter and seven for the High Pass Filter. In total, 3 x 2 x (9+7) = 96 multipliers are needed for the low-pass and high-pass filters of 3 layers in the horizontal and vertical directions. By using multiplexers, adders and a time-sharing algorithm in our IDC design, as shown in Fig. 10 and 11, only three multipliers are needed for the Low Pass Filter and two for the High Pass Filter. In other words, a total of 3 x 2 x (3+2) = 30 multipliers are needed for the 3-layer, 2-dimensional FDWT architecture, a reduction of 66 multipliers. The timing relation chart of the three DWT layers is shown in Fig. 12. "W" is the original source image width (pixels/line), which is 12000 for PAN and 6000 for MS in FORMOSAT-5.
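The multiplier budget above can be checked with a few lines of Python; the function name `multiplier_count` is hypothetical and the figures are those quoted in the text.

```python
def multiplier_count(lpf_mults, hpf_mults, layers=3, directions=2):
    """Multipliers for `layers` DWT levels, horizontal and vertical."""
    return layers * directions * (lpf_mults + hpf_mults)


direct = multiplier_count(9, 7)   # one multiplier per filter tap
shared = multiplier_count(3, 2)   # with multiplexers and time sharing
saved = direct - shared
print(direct, shared, saved)
```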
The source clock is 45 MHz for PAN and 11.25 MHz for MS. In Layer 1, the LH1, HL1 and HH1 data are generated every two source clocks, with a data size of W/2 words. In Layer 2, the LH2, HL2 and HH2 data are generated every four source clocks, with a data size of W/4 words. In Layer 3, the LL3, LH3, HL3 and HH3 data are generated every 8 source clocks, with a data size of W/8 words. The data in the different layers are generated in an interleaved manner to achieve high throughput for real-time data processing.
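The per-layer schedule can be tabulated with a short Python sketch. The output rate per layer is derived here as the source clock divided by the clock interval, which is an inference from the text rather than a quoted figure; the names `layer_schedule`, `SOURCE_CLOCK_MHZ` and `LINE_WIDTH` are hypothetical.

```python
SOURCE_CLOCK_MHZ = {"PAN": 45.0, "MS": 11.25}   # from the text
LINE_WIDTH = {"PAN": 12000, "MS": 6000}         # W, from the text


def layer_schedule(width, clock_mhz, layers=3):
    """Clock interval, words per line and derived rate for each DWT layer."""
    rows = []
    for k in range(1, layers + 1):
        interval = 2 ** k                       # source clocks per output
        rows.append({
            "layer": k,
            "clocks_per_output": interval,
            "words_per_line": width // interval,
            "output_rate_mhz": clock_mhz / interval,
        })
    return rows


pan_rows = layer_schedule(LINE_WIDTH["PAN"], SOURCE_CLOCK_MHZ["PAN"])
ms_rows = layer_schedule(LINE_WIDTH["MS"], SOURCE_CLOCK_MHZ["MS"])
for row in pan_rows + ms_rows:
    print(row)
```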
The buffer size needed for image compression is Width * Length with the frame-based method. With the strip-based method, only a fixed buffer size of Width * 138 is needed. For 8 minutes of FORMOSAT-5 PAN imaging data, the frame-based buffer would be 200,000 times the size of the strip-based buffer. It is therefore very important to use the strip-based method to save memory, cost, and processing time in satellite applications, and even in ground image processing. The total required memory can be reduced as shown in Table 4, which saves cost and reduces the power consumed by the memory chips.
A space-grade FPGA also has some benefits over an ASIC. The space-grade FPGA has good radiation tolerance, and the line pixel number and clock rate can be reconfigured.
Table 5 compares several data compression chips.

Image quality verification
The 12-bit test images from the official CCSDS website have been tested, and the results are similar to those in the CCSDS report. To consider a more practical case, a North Vancouver image taken by the FORMOSAT-2 satellite on 2009/12/9 is adopted. When the IDWT is used with compression ratio 1.5, the PSNR is very large, indicating near-lossless compression, except in the infrared band. When the FDWT is used with compression ratio 7.5, the PSNR may drop to 35 dB, which is worse than the average PSNR of 56.77 dB obtained with six 12-bit CCSDS test images. This is mainly because the North Vancouver image shown in Fig. 13 is much more complicated than the standard CCSDS test images.
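The chapter does not spell out how the PSNR is computed. A minimal Python sketch of the usual definition is given below, assuming a peak value of 4095 for 12-bit pixels; the function name `psnr` and the sample pixel values are hypothetical.

```python
import math


def psnr(original, reconstructed, bit_depth=12):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf                  # identical images: lossless
    peak = (1 << bit_depth) - 1          # 4095 for 12-bit data (assumed peak)
    return 10 * math.log10(peak ** 2 / mse)


ref = [100, 200, 300, 400]               # hypothetical original pixels
rec = [101, 199, 300, 402]               # hypothetical decompressed pixels
value = psnr(ref, rec)
print(f"PSNR = {value:.2f} dB")
```

With this definition a lossless reconstruction gives an infinite PSNR, which is consistent with the very large values reported for the IDWT case.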
To use the satellite image as data input to the real compression hardware, a simulated Focal Plane Assembly (FPA) is under development, as illustrated in Fig. 14.

Conclusion
This chapter has described the implementation of the CCSDS recommended image data compression. Parallel processing, time sharing and computation in pure hardware on an FPGA chip can achieve high-performance computing. The FPGA-based image data compression module has been developed to provide sufficient compression ratios with the image quality required for the FORMOSAT-5 mission. Its performance has been verified with the standard CCSDS 122.0 test images and FORMOSAT-2 images. The technology can be used in similar image data compression applications in space, and the compression throughput can be increased as FPGA technology improves. The main advantage of this technique is that it allows real-time image compression through an efficient hardware implementation with low power consumption, which makes it especially suitable for satellite remote sensing.

Acknowledgment
This work is supported by the National Space Organization (NSPO) in Taiwan.