The average PSNR (dB) values of different resolution enhancement techniques on the test video sequences.
Mobile phones are among the most commonly used tools in daily life, and many people use their embedded cameras to record videos of various events. Because these cameras usually have low resolution, viewing such videos on high resolution screens is not very pleasant. This is one reason why resolution enhancement of low resolution video sequences is currently of interest to many researchers. There are two main approaches to resolution enhancement in the literature. The first is multi-frame super resolution, which combines image information from several similar images taken from a video sequence (Elad and Feuer, PAMI, 1999). The second, referred to as single-frame super resolution, uses prior training data to enforce super resolution on a single low resolution input image. This work follows the first approach: multi-frame resolution enhancement of low resolution video sequences.
Tsai and Huang (1984) pioneered the idea of super resolution, using a frequency domain approach. Keren et al. (1988) later described a spatial domain procedure that uses a global translation and rotation model to perform image registration. Reddy and Chatterji (1996) introduced a frequency domain registration approach, and Lucchese and Cortelazzo (2000) presented a frequency domain method for estimating planar roto-translations. Irani and Peleg (1991) developed a motion estimation algorithm that considers translations and rotations in the spatial domain. Further research has addressed resolution enhancement of low resolution video sequences (Demirel and Izadpanahi, 2008; Marcel et al., 1997; Demirel et al. [EUSIPCO], 2009; Nguyen and Milanfar, 2000; Robinson et al., 2010). Vandewalle et al. (2006) proposed a frequency domain technique specifically for registering a set of aliased images. Their method models the motion between the images as planar (rotation and translation) and uses only the low frequency part of the images, which has the highest signal-to-noise ratio (SNR) and, in their setup, is free of aliasing.
The wavelet transform is also widely used in many image processing applications, especially in image and video super resolution techniques (Piao et al., 2007; Demirel et al. [IEEE Geoscience and Remote Sensing Letter], 2010; Temizel and Vlachos, 2005; Demirel and Anbarjafari, 2010; Anbarjafari and Demirel [ETRI], 2010). A one-level discrete wavelet transform (DWT) of a single video frame produces one low frequency subband and three high frequency subbands oriented at 0°, 45°, and 90° (Mallat, 1999).
In this work, we propose a new video resolution enhancement technique that generates sharper super resolved video frames. The proposed technique uses the DWT to decompose the low resolution frames of the video sequence into four subbands, namely low-low (LL), low-high (LH), high-low (HL), and high-high (HH). The three high frequency subbands (LH, HL, and HH) of each frame are then enlarged by bicubic interpolation. In parallel, the input low resolution frames are super resolved separately by the Irani and Peleg technique (Irani and Peleg, 1991). Illumination inconsistency between frames can arise from uncontrolled capture environments, and since the Irani and Peleg registration technique is used, it is advantageous for the frames involved in registration to have the same illumination. We therefore also propose a new illumination compensation method based on singular value decomposition (SVD), which is applied to the frames before the Irani and Peleg resolution enhancement. Finally, the interpolated high frequency subbands obtained from the DWT of each frame and the corresponding super resolved frame are combined by the inverse DWT (IDWT) to reconstruct a high resolution output video sequence. The proposed technique has been compared with several state-of-the-art resolution enhancement techniques. The following registration techniques are used for comparison:
Vandewalle et al. (2006)
Marcel et al. (1997)
Lucchese and Cortelazzo (2000)
Keren et al. (1988)
The reconstruction techniques used in this work for comparison are:
Iterated Back Projection (Irani and Peleg, 1991)
Robust super resolution technique (Zomet et al., 2001)
Structure Adaptive Normalized Convolution (Pham et al., 2006)
The experimental results show that the proposed method outperforms the aforementioned resolution enhancement techniques. As shown in the experimental section, the proposed illumination compensation also improves the quality (PSNR) of the super resolved Akiyo sequence by 2.26 dB.
2. State-of-the-art super resolution methods
This section briefly reviews the four super resolution methods against which the performance of the proposed technique is compared.
2.1. L. Lucchese et al. super resolution method
The Lucchese et al. super resolution method operates in the frequency domain. The estimation of the relative motion parameters between the reference image and each of the other input images is based on the following property: the difference between the amplitude of the Fourier transform of an image and the mirrored amplitude of the Fourier transform of a rotated version of that image vanishes along a pair of orthogonal zero-crossing lines. The angle that these lines make with the axes is half the rotation angle between the two images, so the rotation angle can be computed by finding the two zero-crossing lines. The algorithm uses a three-stage, coarse-to-fine procedure for rotation angle estimation that progressively refines the angular accuracy. The shift is estimated afterwards using a standard phase correlation method.
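The final step above, recovering the shift by phase correlation, can be sketched as follows. This is a minimal illustration, not Lucchese et al.'s code: it assumes integer, circular shifts between two equally sized frames, and it uses a naive DFT written out for self-containment (a practical implementation would use an FFT). The function names are our own.

```python
import cmath
import math


def dft2(img, inverse=False):
    """Naive 2-D DFT (inverse=True gives the normalized inverse); fine for tiny frames."""
    n, m = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = []
    for u in range(n):
        row = []
        for v in range(m):
            s = 0j
            for x in range(n):
                for y in range(m):
                    s += img[x][y] * cmath.exp(sign * 2j * math.pi * (u * x / n + v * y / m))
            row.append(s / (n * m) if inverse else s)
        out.append(row)
    return out


def phase_correlation(g1, g2):
    """Return the integer circular shift (dx, dy) with g2[x][y] == g1[x - dx][y - dy]."""
    f1, f2 = dft2(g1), dft2(g2)
    # Normalized cross-power spectrum: keep only the phase difference.
    cross = []
    for row1, row2 in zip(f1, f2):
        row = []
        for a, b in zip(row1, row2):
            c = b * a.conjugate()
            row.append(c / abs(c) if abs(c) > 1e-12 else 0j)
        cross.append(row)
    corr = dft2(cross, inverse=True)  # ideally a unit impulse located at the shift
    n, m = len(g1), len(g1[0])
    dx, dy = max(((x, y) for x in range(n) for y in range(m)),
                 key=lambda p: corr[p[0]][p[1]].real)
    # Peak positions past the midpoint correspond to negative shifts.
    if dx > n // 2:
        dx -= n
    if dy > m // 2:
        dy -= m
    return dx, dy
```

Because a translation only multiplies the spectrum by a linear phase ramp, normalizing the cross-power spectrum leaves just that ramp, and its inverse transform is an impulse at the displacement.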
2.2. Reddy et al. super resolution method
Reddy and Chatterji proposed a registration algorithm that uses a Fourier domain approach to align images that are translated and rotated with respect to one another. Using a log-polar transform of the magnitude of the frequency spectra, image rotation and scale are converted into horizontal and vertical shifts, which can therefore also be estimated with phase correlation. The method exploits the separability of the rotational and translational components under the Fourier transform: translation affects only the phase, whereas rotation affects both the phase and the amplitude of the Fourier transform. Moreover, rotating an image rotates its spectrum by the same angle. Therefore, the rotational component can be estimated first; then, after compensating for the rotation, the translational component can easily be estimated with phase correlation.
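The reason a log-polar mapping turns rotation and scaling into shifts can be checked on individual coordinates. The sketch below (illustrative helper names, not Reddy and Chatterji's code) rotates and scales points of the frequency plane and confirms that, in (log-radius, angle) coordinates, every point moves by the same offset (log s, θ). For images, rotating by θ rotates the magnitude spectrum by θ, while scaling by s scales the spectrum by 1/s; resampling an actual spectrum onto a log-polar grid is omitted here.

```python
import math


def to_log_polar(u, v):
    """Map a frequency-plane point (u, v) to (log radius, angle in radians)."""
    return math.log(math.hypot(u, v)), math.atan2(v, u)


def rotate_and_scale(u, v, theta, s):
    """Rotate (u, v) by theta radians about the origin, then scale by s."""
    c, sn = math.cos(theta), math.sin(theta)
    return s * (c * u - sn * v), s * (sn * u + c * v)


def log_polar_offset(p, theta, s):
    """Displacement of point p in (log r, angle) coordinates after rotate_and_scale."""
    r0, a0 = to_log_polar(*p)
    r1, a1 = to_log_polar(*rotate_and_scale(*p, theta, s))
    # Wrap the angular difference into [-pi, pi).
    return r1 - r0, (a1 - a0 + math.pi) % (2 * math.pi) - math.pi
```

Since the offset is the same for every point, a rotation plus scaling of the whole spectrum becomes a pure translation of its log-polar image, which phase correlation can then recover.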
2.3. Irani et al. super resolution method
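Irani et al. developed a motion estimation algorithm that considers translations and rotations in the spatial domain. The horizontal shift a, vertical shift b, and rotation angle θ between two images g1 and g2 are related by

$$g_2(x,y) = g_1\left(x\cos\theta - y\sin\theta + a,\; y\cos\theta + x\sin\theta + b\right) \qquad (1)$$

and the unknown motion parameters are computed from a set of approximations derived from (1). Approximating $\sin\theta \approx \theta$ and $\cos\theta \approx 1 - \theta^2/2$ gives

$$g_2(x,y) \approx g_1\left(x + a - y\theta - \frac{x\theta^2}{2},\; y + b + x\theta - \frac{y\theta^2}{2}\right) \qquad (2)$$

and a first-order Taylor expansion of $g_1$ then gives

$$g_2(x,y) \approx g_1(x,y) + \left(a - y\theta - \frac{x\theta^2}{2}\right)\frac{\partial g_1}{\partial x} + \left(b + x\theta - \frac{y\theta^2}{2}\right)\frac{\partial g_1}{\partial y} \qquad (3)$$

Using (3), the error measure between images g1 and g2 is approximated by (4), where the summation is taken over the overlapping area of both images:

$$E(a,b,\theta) = \sum\left[g_1(x,y) + \left(a - y\theta - \frac{x\theta^2}{2}\right)\frac{\partial g_1}{\partial x} + \left(b + x\theta - \frac{y\theta^2}{2}\right)\frac{\partial g_1}{\partial y} - g_2(x,y)\right]^2 \qquad (4)$$

To reduce E to its minimum, its partial derivatives are set to zero:

$$\frac{\partial E}{\partial a} = \frac{\partial E}{\partial b} = \frac{\partial E}{\partial \theta} = 0 \qquad (5)$$

Neglecting the $\theta^2$ terms, (5) becomes a linear system in a, b, and θ:

$$\begin{bmatrix} \sum D_x^2 & \sum D_x D_y & \sum R D_x \\ \sum D_x D_y & \sum D_y^2 & \sum R D_y \\ \sum R D_x & \sum R D_y & \sum R^2 \end{bmatrix}\begin{bmatrix} a \\ b \\ \theta \end{bmatrix} = \begin{bmatrix} \sum D_x D_t \\ \sum D_y D_t \\ \sum R D_t \end{bmatrix} \qquad (6)$$

where $D_x = \partial g_1/\partial x$, $D_y = \partial g_1/\partial y$, $R = xD_y - yD_x$, and $D_t = g_2 - g_1$. Denoting the matrix in (6) by $M$ and its right-hand side by $\mathbf{v}$, the horizontal shift a, vertical shift b, and rotation angle θ are computed as

$$\begin{bmatrix} a & b & \theta \end{bmatrix}^T = M^{-1}\mathbf{v} \qquad (7)$$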
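The linearized estimate can be sketched in Python. This minimal version performs a single least-squares step (a practical implementation would likely repeat it after warping g1 by the current estimate) under our own assumptions: grayscale frames stored as square lists of floats, rotation about the frame centre, central-difference gradients, and the θ² terms dropped. It accumulates the 3×3 normal equations in Dx = ∂g1/∂x, Dy = ∂g1/∂y, R = x·Dy - y·Dx and Dt = g2 - g1, then solves them.

```python
import math


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, 3):
            f = a[r][c] / a[c][c]
            for k in range(c, 4):
                a[r][k] -= f * a[c][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def estimate_motion(g1, g2):
    """One linearized least-squares step for [a, b, theta] between two square frames.

    Pixel (row, col) gets coordinates x = col - c, y = row - c, where c is the
    centre index, so the rotation is about the frame centre (an assumption).
    """
    c = len(g1) // 2
    m = [[0.0] * 3 for _ in range(3)]
    v = [0.0] * 3
    for row in range(1, len(g1) - 1):
        for col in range(1, len(g1[0]) - 1):
            x, y = col - c, row - c
            dx = (g1[row][col + 1] - g1[row][col - 1]) / 2  # central differences
            dy = (g1[row + 1][col] - g1[row - 1][col]) / 2
            r = x * dy - y * dx                # R = x*Dy - y*Dx
            dt = g2[row][col] - g1[row][col]   # Dt = g2 - g1
            terms = (dx, dy, r)
            for i in range(3):
                v[i] += terms[i] * dt
                for j in range(3):
                    m[i][j] += terms[i] * terms[j]
    return solve3(m, v)
```

On a smooth synthetic blob moved by a sub-pixel shift and a small rotation, this single step recovers the parameters closely; larger motions need the repeated warp-and-estimate loop.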
Fig. 1 (a-d) shows four consecutive low resolution frames, and (e), (f), and (g) show the high resolution images super resolved by the Lucchese and Cortelazzo, Reddy et al., and Irani et al. methods, respectively.
2.4. Motion-based localized super resolution technique by using discrete wavelet transform
The main loss in an image or video frame after super resolution occurs in its high frequency components (i.e., edges), owing to the smoothing introduced by super resolution techniques. Preserving the edges is therefore essential for increasing the quality of the super resolved image. For this reason, the DWT is employed to preserve the high frequency components of the image by decomposing each frame into subband images, namely Low-Low (LL), Low-High (LH), High-Low (HL), and High-High (HH).
The LH, HL, and HH subband images contain the high frequency components of the input frame. Applying the DWT to every frame of the input video generates four subband video sequences (the LL, LH, HL, and HH sequences). The Irani and Peleg (1991) super resolution method is then applied to each subband sequence separately, producing four super resolved subband sequences. Finally, the IDWT combines the super resolved subbands to produce the high resolution video sequence.
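The subband split and its inverse can be sketched with the Haar wavelet, which we substitute here for brevity (the paper itself uses a db9/7 wavelet). The 1/2 normalization and the LH/HL naming below are our assumptions, since conventions differ between libraries. Applying the decomposition frame by frame yields the four subband sequences described above.

```python
def haar_dwt2(img):
    """One-level 2-D Haar DWT of an even-sized frame -> (LL, LH, HL, HH)."""
    h, w = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * w for _ in range(h)]
    lh = [[0.0] * w for _ in range(h)]
    hl = [[0.0] * w for _ in range(h)]
    hh = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            p, q = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            r, s = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (p + q + r + s) / 2   # low-pass in both directions
            lh[i][j] = (p + q - r - s) / 2   # low horizontally, high vertically
            hl[i][j] = (p - q + r - s) / 2   # high horizontally, low vertically
            hh[i][j] = (p - q - r + s) / 2   # high in both directions (diagonal)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the frame from its four subbands."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            a, b, c, d = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = (a + b + c + d) / 2
            out[2 * i][2 * j + 1] = (a + b - c - d) / 2
            out[2 * i + 1][2 * j] = (a - b + c - d) / 2
            out[2 * i + 1][2 * j + 1] = (a - b - c + d) / 2
    return out


def subband_sequences(frames):
    """Split a video (list of frames) into LL, LH, HL and HH frame sequences."""
    bands = [haar_dwt2(f) for f in frames]
    return [[b[k] for b in bands] for k in range(4)]
```

Each subband sequence is half the frame size in both directions, and the IDWT restores the original frame exactly when the subbands are left unchanged.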
By super resolving the LL, LH, HL, and HH sequences and then applying the IDWT, the output video sequence contains sharper edges than a sequence obtained by applying any of the aforementioned super resolution techniques directly. This is because super resolving the isolated high frequency components in the LH, HL, and HH subbands separately preserves more high frequency content than super resolving the low resolution frames directly.
In this technique, the moving regions are extracted and super resolved with the technique explained above. The static regions are likewise transformed into the wavelet domain, and each static subband sequence is enlarged by bicubic interpolation. The high resolution sequence of the static region is generated by composing the interpolated subbands with the IDWT. Finally, the super resolved sequence is obtained by combining the super resolved moving sequence with the interpolated static sequence.
The method can be summarized with the following steps:
1. Acquire frames from video and extract motion region(s) using frame subtraction.
2. Determine the significant local motion region(s) by applying connected component labeling.
3. Apply DWT to decompose the static background region into different subbands.
4. Apply bicubic interpolation for enhancing resolution of each subband obtained from step 3.
5. Use IDWT to reconstruct the super resolved static background.
6. Apply DWT to decompose the moving foreground region(s) into different subbands.
7. Super resolve the extracted subbands by applying Irani et al. super resolution method.
8. Use IDWT to reconstruct the super resolved moving region(s).
9. Combine the sequences obtained from steps 5 and 8 to generate the final super resolved video sequence.
In the first step, four consecutive frames are used, and each frame is subtracted from the reference frame to extract the differences between frames. Combining all the difference images with an OR operation reveals the local motion(s).
In the second step, the area(s) of local motion are determined by connected component labeling. In the third, fourth, and fifth steps, the remaining static part of the frames, which contains no motion, is decomposed by the DWT, enlarged by bicubic interpolation, and composed by the IDWT.
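Steps 1 and 2 can be sketched as follows. The threshold on the absolute frame difference and the minimum component area used to decide which regions are significant are hypothetical parameters, since the text does not specify them. Frames are assumed to be grayscale 2-D lists, the second frame is the reference (as in Fig. 2), and 4-connectivity is assumed for the labeling.

```python
from collections import deque


def motion_mask(frames, ref_index=1, threshold=10):
    """OR together |frame - reference| > threshold over all non-reference frames."""
    ref = frames[ref_index]
    h, w = len(ref), len(ref[0])
    mask = [[False] * w for _ in range(h)]
    for k, f in enumerate(frames):
        if k == ref_index:
            continue
        for i in range(h):
            for j in range(w):
                if abs(f[i][j] - ref[i][j]) > threshold:  # hypothetical threshold
                    mask[i][j] = True
    return mask


def motion_boxes(mask, min_area=2):
    """4-connected component labeling; returns (top, left, bottom, right) boxes."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for i in range(h):
        for j in range(w):
            if not mask[i][j] or seen[i][j]:
                continue
            seen[i][j] = True
            queue = deque([(i, j)])
            top, left, bottom, right, area = i, j, i, j, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:  # keep only "significant" regions (hypothetical rule)
                boxes.append((top, left, bottom, right))
    return boxes
```

The bounding boxes play the role of the rectangular moving parts; everything outside them is treated as the static background.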
Fig. 2 shows four consecutive frames taken from a video sequence, with the second frame used as the reference frame. The rectangle shown in each frame encloses the moving part, and the rest of the reference frame is the static part. For each group of four frames, the rectangular moving region changes according to the motion in those frames.
In the sixth, seventh, and eighth steps, the moving parts are decomposed into subbands by the DWT, super resolved by the Irani et al. super resolution technique, and composed by the IDWT.
In the final step, we combine super resolved motion frames with the interpolated background to achieve the final high resolution video sequence. The algorithm is shown in Fig. 3.
3. Proposed Resolution Enhancement Technique
As mentioned in the introduction, there are many super resolution techniques for enhancing the resolution of video sequences. The main loss in a video frame after super resolution occurs in its high frequency components (i.e., edges), owing to the smoothing introduced by the super resolution process. In addition, in many video resolution enhancement techniques, slight illumination changes between successive frames lead to poor registration, which lowers the quality of the super resolved sequence. Therefore, preserving the edges (high frequencies) of each frame and correcting the slight illumination differences between frames can both increase the quality of the super resolved sequence.
In the present work, the Irani and Peleg technique is used for registration, with four successive frames at each stage. The frames are denoted f0, f1, f2, and f3, where f1 is the reference frame (the frame whose resolution is enhanced). Illumination compensation is applied to reduce the illumination differences between f1 and each of f0, f2, and f3, for better registration. The compensation is obtained by applying SVD-based illumination enhancement (Demirel et al. [ISCIS], 2008) iteratively. The iterations continue until the absolute difference between the mean of the reference frame and the mean of the frame being corrected falls below a threshold τ, so the number of iterations depends on τ, which is chosen according to the application; in this paper, τ was chosen heuristically as 0.2. The aim of the illumination correction is to give frames f0, f2, and f3 the same illumination as the reference frame. For this purpose, each frame f is decomposed into three matrices using SVD:

$$f = U \Sigma V^T \qquad (8)$$
where U and V are orthogonal square matrices, known as the hanger and aligner respectively, and Σ is a matrix containing the singular values of f, sorted in descending order, on its main diagonal. As reported in (Demirel et al. [IEEE Geoscience and Remote Sensing Letter], 2010; Demirel et al. [ISCIS], 2008), Σ contains the intensity information of the frame. The first singular value, σ1, is usually much larger than the others, which is why manipulating σ1 significantly affects the illumination of the image. Hence, the aim is to correct the illumination of each frame so that the largest singular value of the enhanced frame is close enough to the largest singular value of the reference frame. For this purpose, a correction coefficient is calculated as

$$\xi = \frac{\max(\Sigma_{f_1})}{\max(\Sigma_{f})} \qquad (9)$$

where $\Sigma_{f_1}$ and $\Sigma_f$ are the singular value matrices of the reference frame and of the frame being corrected, respectively.
The enhanced frame is then constructed with the correction coefficient ξ as

$$f_{\mathrm{enhanced}} = U(\xi\Sigma)V^T \qquad (10)$$
Because the enhanced frame is converted to an 8-bit representation (i.e., it is quantized), the highest singular value obtained by applying equation (8) to it again differs slightly from the highest singular value on the right-hand side of equation (10). For this reason, the correction is repeated until the threshold τ is met. The algorithm of the illumination enhancement technique is shown in Fig. 4.
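The iterative compensation can be sketched without computing a full SVD. Assuming, as in SVD-based equalization, that the correction coefficient ξ is the ratio of the reference frame's largest singular value to the current frame's and that every singular value in Σ is multiplied by ξ, U(ξΣ)Vᵀ is simply ξ·f, so each iteration only needs σ1, obtained here by power iteration on fᵀf. The 8-bit conversion (round and clip to [0, 255]) and the iteration cap `max_iter` are our assumptions; τ = 0.2 follows the text.

```python
import math


def largest_singular_value(a, iters=100):
    """sigma_1 of a 2-D list, by power iteration on A^T A."""
    rows, cols = len(a), len(a[0])
    v = [1.0] * cols
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(a[i][j] * av[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in av))  # ||A v|| with v a unit vector


def mean(a):
    return sum(map(sum, a)) / (len(a) * len(a[0]))


def quantize_8bit(a):
    """Round to integers and clip to [0, 255] (assumed 8-bit conversion)."""
    return [[min(255, max(0, round(x))) for x in row] for row in a]


def compensate_illumination(frame, ref, tau=0.2, max_iter=20):
    """Match the frame's illumination to ref by repeatedly rescaling sigma_1."""
    s_ref = largest_singular_value(ref)
    out = [row[:] for row in frame]
    for _ in range(max_iter):
        if abs(mean(out) - mean(ref)) < tau:
            break
        xi = s_ref / largest_singular_value(out)  # assumed correction coefficient
        # U (xi * Sigma) V^T == xi * f, so scale the pixels directly, then quantize.
        out = quantize_8bit([[xi * x for x in row] for row in out])
    return out
```

The quantization after each rescaling is what makes more than one iteration necessary, matching the explanation above.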
Fig. 5 shows, for the Akiyo video sequence, how the mean pixel values of two of the frames converge towards that of the reference frame over successive iterations.
In this work, the discrete wavelet transform (DWT) (Mallat, 1999) is applied to preserve the high frequency components of each frame. A one-level DWT of each frame of the input video generates four subband sequences: the LL subband sequence and three high frequency subband sequences with 0°, 45°, and 90° orientations, known as the LH, HL, and HH subbands. In parallel with the DWT, the Irani and Peleg super resolution technique is applied to the video sequence in the spatial domain. This yields a super resolved frame, which can be regarded as the LL subband of a higher (target) resolution frame. The LH, HL, and HH subbands of the target resolution frame are generated by interpolating the LH, HL, and HH subbands extracted from the input reference frame. Finally, the inverse DWT (IDWT) combines these subbands to produce the resolution enhanced frame, yielding a high resolution video sequence.
By processing the subbands of the video sequence separately and then applying the IDWT, the output video sequence contains sharper edges. This is because the proposed technique isolates the high frequency components and therefore preserves more of them than the other super resolution techniques.
The proposed method can be summarized with the following steps:
1. Acquire frames from a video.
2. Apply the proposed illumination compensation technique before registration.
3. Apply DWT to decompose the low resolution input video sequence into different subband sequences.
4. Super resolve the corresponding original frame by applying the Irani and Peleg super resolution technique to the illumination compensated frames.
5. Apply bicubic interpolation to the extracted high frequency subbands (LH, HL, and HH).
6. Apply IDWT to the output of step 4 and the three outputs of step 5 to reconstruct the high resolution super resolved sequence.
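Steps 5 and 6 of this procedure can be sketched as below, taking the Irani and Peleg output `sr` as given. Several choices are ours rather than the paper's: Keys' cubic convolution kernel with a = -0.5 as the bicubic interpolator (applied separably, with edge pixels replicated), the Haar IDWT in place of db9/7, an interpolation factor inferred from the ratio of the `sr` size to the subband size, and a factor of 2 on `sr` so that it matches the amplitude of an orthonormal Haar LL band.

```python
import math


def keys_weight(t, a=-0.5):
    """Keys cubic-convolution kernel (assumed 'bicubic'; a = -0.5)."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t ** 3 - (a + 3) * t ** 2 + 1
    if t < 2:
        return a * t ** 3 - 5 * a * t ** 2 + 8 * a * t - 4 * a
    return 0.0


def upsample_1d(seq, factor):
    """Enlarge a 1-D sequence by an integer factor, replicating edge samples."""
    n = len(seq)
    out = []
    for j in range(n * factor):
        x = (j + 0.5) / factor - 0.5  # output sample centre in input coordinates
        base = math.floor(x)
        out.append(sum(seq[min(max(k, 0), n - 1)] * keys_weight(x - k)
                       for k in range(base - 1, base + 3)))
    return out


def bicubic_upsample(img, factor):
    """Separable bicubic enlargement: rows first, then columns."""
    rows = [upsample_1d(r, factor) for r in img]
    cols = [upsample_1d(list(c), factor) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def haar_idwt2(ll, lh, hl, hh):
    """Orthonormal Haar inverse DWT (stand-in for the paper's db9/7)."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            a, b, c, d = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = (a + b + c + d) / 2
            out[2 * i][2 * j + 1] = (a + b - c - d) / 2
            out[2 * i + 1][2 * j] = (a - b + c - d) / 2
            out[2 * i + 1][2 * j + 1] = (a - b - c + d) / 2
    return out


def fuse(sr, lh, hl, hh):
    """Use sr as the LL band and enlarged low resolution detail bands, then apply the IDWT."""
    factor = len(sr) // len(lh)  # assumed integer size ratio
    up = [bicubic_upsample(band, factor) for band in (lh, hl, hh)]
    ll = [[2.0 * v for v in row] for row in sr]  # Haar LL carries 2x the local mean
    return haar_idwt2(ll, *up)
```

The output is twice the size of `sr` in each direction; in a flat region (zero detail bands) it simply reproduces the super resolved intensities.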
In the fourth step, four illumination compensated consecutive frames are used for registration in the Irani and Peleg super resolution technique. Fig. 6 illustrates the block diagram of the proposed video resolution enhancement technique.
A possible application of the proposed resolution enhancement technique is a person holding a digital camera while taking a series of four shots of a scene within a short period of time. The small movements of the person's hands while capturing the snapshots, which may also cause some illumination changes, are sufficient to reconstruct a high resolution image.
In all steps of the proposed technique, the db9/7 wavelet function and bicubic interpolation are used. The next section reports the comparison between the proposed technique and the conventional and state-of-the-art techniques mentioned in the introduction; the quantitative results show the superiority of the proposed method over the other techniques.
4. Results and Discussions
The super resolution method proposed in this paper is compared with state-of-the-art super resolution techniques that use Vandewalle et al. (2006), Marcel et al. (1997), Lucchese and Cortelazzo (2000), and Keren et al. (1988) registration, followed by interpolation, iterated back projection, robust super resolution, or structure adaptive normalized convolution for reconstruction. The proposed method has been tested on four well-known video sequences (Xiph.org Test Media, 2010), namely Mother-daughter, Akiyo, Foreman, and Container. Table 1 shows the average PSNR values of the aforementioned super resolution techniques for these video sequences.
| Registration | Reconstruction | Mother-daughter PSNR (dB) | Akiyo PSNR (dB) | Foreman PSNR (dB) | Container PSNR (dB) |
|---|---|---|---|---|---|
| | Iterated Back Projection | 27.1 | 31.49 | 30.17 | 24.3 |
| | Structure Adaptive Normalized Convolution | 28.95 | 32.98 | 33.46 | 26.38 |
| | Iterated Back Projection | 27.12 | 31.52 | 29.84 | 25.2 |
| | Structure Adaptive Normalized Convolution | 28.66 | 33.16 | 33.25 | 26.28 |
| | Iterated Back Projection | 27.06 | 31.52 | 29.88 | 25.28 |
| | Structure Adaptive Normalized Convolution | 29.01 | 32.8 | 33.3 | 26.36 |
| | Iterated Back Projection | 27.17 | 31.53 | 29.87 | 25.31 |
| | Structure Adaptive Normalized Convolution | 28.63 | 32.97 | 33.25 | 26.15 |
| Proposed resolution enhancement technique without illumination compensation | | 31.53 | 34.07 | 35.87 | 28.94 |
| Proposed resolution enhancement technique with illumination compensation | | 32.17 | 35.24 | 36.52 | 30.07 |
The low resolution video sequences are generated by lowpass filtering and downsampling each frame of the high resolution video sequence (Temizel, 2007), so the original high resolution sequences can be kept as ground truth for comparison. All video sequences have 300 frames, and each PSNR value reported in Table 1 is the average of the 300 per-frame PSNR values. The low resolution sequences are 128×128 pixels and the super resolved sequences are 256×256 pixels.
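The evaluation protocol can be sketched as follows. PSNR is computed per frame against the original high resolution frame with a peak value of 255 (8-bit frames assumed) and then averaged over the frames of a sequence. The 2×2 box average used to produce a low resolution frame is only a hypothetical stand-in for the lowpass filtering and downsampling of Temizel (2007), whose exact filter is not given here.

```python
import math


def psnr(ref, test, peak=255.0):
    """PSNR in dB between two equally sized frames (8-bit peak assumed)."""
    n = len(ref) * len(ref[0])
    mse = sum((r - t) ** 2 for rr, tr in zip(ref, test) for r, t in zip(rr, tr)) / n
    return math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)


def average_psnr(ref_frames, out_frames):
    """Mean of the per-frame PSNR values, as reported for each sequence."""
    values = [psnr(r, o) for r, o in zip(ref_frames, out_frames)]
    return sum(values) / len(values)


def box_downsample(img):
    """Hypothetical degradation: 2x2 mean (lowpass) followed by 2:1 decimation."""
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4
             for j in range(len(img[0]) // 2)]
            for i in range(len(img) // 2)]
```

Note that a frame identical to its reference gives an infinite PSNR, which would dominate a plain average; this does not arise for genuinely super resolved frames.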
Fig. 7 shows the visual result of the proposed method compared with other state-of-the-art techniques for the Mother-daughter video sequence. As can be observed from Fig. 7, the proposed method produces a sharper image than the other conventional and state-of-the-art video super resolution techniques.
This paper proposes a new video resolution enhancement technique that applies an SVD-based illumination compensation technique before the registration process and uses the DWT to preserve the high frequency components of the frames. The output of the Irani and Peleg technique is used as the LL subband, while the LH, HL, and HH subbands are obtained by bicubic interpolation of the high frequency subbands of the input frame. All these subbands are then combined using the IDWT to generate the corresponding super resolved frame. The proposed technique has been tested on several well-known video sequences, and the quantitative results show its superiority over conventional and state-of-the-art video super resolution techniques.
The authors would like to thank Prof. Dr. Ivan Selesnick from Polytechnic University for providing the MATLAB DWT code.