The comparison of computation complexity of conventional independent the DCTII, DSTII, DFT, and hybrid DCTII/DSTII/DFT
1. Introduction
The last decade based on orthogonal transform has been seen a quiet revolution in digital video technology as in Moving Picture Experts Group (MPEG)4, H.264, and high efficiency video coding (HEVC) [1–7]. The discrete cosine transform (DCT)II is popular compression structures for MPEG4, H.264, and HEVC, and is accepted as the best suboptimal transformation since its performance is very close to that of the statistically optimal KarhunenLoeve transform (KLT) [15].
The discrete signal processing based on the discrete Fourier transform (DFT) is popular in wide range of applications depending on specific targets: orthogonal frequency division multiplexing (OFDM) wireless mobile communication systems in 3GPPLTE [3], mobile worldwide interoperability for microwave access (WiMAX), international mobile telecommunicationsadvanced (IMTAdvanced), broadcasting related applications such as digital audio broadcasting (DAB), digital video broadcasting (DVB), digital multimedia broadcasting (DMB)) based on DFT. Furthermore, the Haarbased wavelet transform (HWT) is also very useful in the joint photographic experts group committee in 2000 (JPEG2000) standard [2], [8]. Thus, different applications require different types of unitary matrices and their decompositions. From this reason, in this book chapter we will propose a unified hybrid algorithm which can be used in the mentioned several applications in different purposes.
Compared with the conventional individual matrix decompositions, our main contributions are summarized as follows:
We propose the diagonal sparse matrix factorization for a unified hybrid algorithm based on the properties of the Jacket matrix [9], [10] and the recursive decomposition of the sparse matrix. It has been shown that this matrix decomposition is useful in developing the fast algorithms [11]. Individual DCTII [1–3], [6], [7], [12], DSTII [4], [6], [7], [13], DFT [3], [5], [14], and HWT [8] matrices can be decomposed to one orthogonal character matrix and a corresponding special sparse matrix. The inverse of the sparse matrix can be easily obtained from the property of the block (element)wise inverse Jacket matrix. However, there have been no previous works in the development of the common matrix decomposition supporting these transforms.
We propose a new unified hybrid algorithm which can be used in the multimedia applications, wireless communication systems, and broadcasting systems at almost the same computational complexity as those of the conventional unitary matrix decompositions as summarized in Table 1 and 2. Compared with the existing unitary matrix decompositions, the proposed hybrid algorithm can be even used to the heterogeneous systems with hybrid multimedia terminals being serviced with different applications. The block (element)wise diagonal decompositions of DCTII, DSTII, DFT and DWT have a similar pattern as CooleyTukey’s regular butterfly structures. Moreover, this unified hybrid algorithm can be also applied to the wireless communication terminals requiring a multiuser multiple inputmultiple output (MIMO) SVD block diagonalization systems [15], [11,19], [22] and diagonal channels interference alignment management in macro/femto cell coexisting networks [16]. In [1516, 19, 22 23], a blockdiagonalized matrix can be applied to wireless communications MIMO downlink channel.
In Section 2, we present recursive factorization algorithms of DCTII, DSTII, and DFT matrix for fast computation. In Section 3, hybrid architecture is proposed for fast computations of DCTII, DSTII, and DFT matrices. Also numerical simulations follow. The conclusion is given in Section 4.
2. Jacket matrix based recursive decompositions of Fourier matrix
2.1. Recursive decomposition of DCTII
Definition 1
That is, the inverse of the Jacket matrix can be determined by its elementwise inverse [911]. The row permutation matrix,
where
The block column permutation matrix,
where
then
where the lowest order Hadamard matrix is defined by
Note that since the BIJM requires a matrix transposition and then normalization by its size, a class of transforms can be easily inverted as follows:
Due to a simple operation of the BIJM, we can reduce the complexity order as the matrix size increases. In the following, we shall use this property of the BIJM in developing a hybrid diagonal blockwise transform.
According to [14] and [7], the DCTII matrix is defined as follows:
where
Here, the matrix
where
Since
where a lower triangular matrix
and a diagonal matrix
Using (10), we first rewrite (7) as
which can be evaluated recursively as follows:
Note that in (13) a
In [17], the author proposed a recursive decimationinfrequency algorithm, where the same decomposition specified in (10) was used. However, due to using a different permutation matrix, a different recursive form was obtained. Different recursive decomposition was proposed in [18]. Four different matrices, such as the first matrix, the last matrix, the odd numbered matrix, and the even number matrix, were proposed. Compared to the decomposition in [18], the proposed decomposition is seen to be more systematic and requires less numbers of additions and multiplications. We show a complexity comparison among the proposed decomposition and other methods in Table 12.



DCTII 

















DCTII  4  8  6  8  6 
8  26  16  24  16  
16  74  44  64  40  
32  194  116  160  96  
64  482  292  384  224  
128  1154  708  896  512  
256  2690  1668  2048  1152  
DSTII  4  9  5  8  6 
8  29  13  24  16  
16  83  35  64  40  
32  219  91  160  96  
64  547  227  384  224  
128  1315  547  896  512  
256  3075  1283  2048  1152  
DFT  4  8  4  8  4 
8  24  12  24  12  
16  64  32  64  32  
32  160  80  160  80  
64  384  192  384  192  
128  896  448  896  448  
256  2048  1024  2048  1024 
Applying (13), we can readily compute
The corresponding butterfly data flow diagram of
2.2. Recursive decomposition of the DSTII
The DSTII matrix [14] and [7] can be expressed as follows:
Similar to the procedure we have used in the DCTII matrix, we first define the permuted DSTII matrix,
From (16), we can have a recursive form for
where the submatrix
where
whereas the matrix
Further applying (17) to the Kronecker product
Note that if we compare (21) and (13), a similarity can be found in the proposed matrix decompositions. That is, starting from the common lowest order
Now utilizing the properties of the BIJM, we can first obtain
such that the inverse of the matrix
Note that applying again the properties of the BIJM and (18), a recursive form of the inverse DSTII can be easily obtained.
2.3. Recursive decomposition of DFT
The DFT is a Fourier representation of a given sequence
where
for
where
where
Similar to the development for DCTII and DSTII, we first rewrite (26) using (27) as
It is clear that the form of (29) is the same as that of (13), where we only need to change
3. Proposed hybrid architecture for fast computations of DCTII, DSTII, and DFT matrices
We have derived recursive formulas for DCTII, DSTII, and DFT. The derived results show that DCTII, DSTII, and DFT matrices can be unified by using a similar sparse matrix decomposition algorithm, which is based on the blockwise Jacket matrix and diagonal recursive architecture with different characters. The conventional method is only converted from DFT to DCTII, DSTII. But our proposed method can be universally switching from DCTII to DSTII, and DFT vice versa. Figs. 13 exhibit the similar recursive flow diagrams and let us motivate to develop universal hybrid architecture via switching mode selection. Moreover, the butterfly data flow graphs have
3.1. From DCTII to DSTII
The Npoint DCTII of x is given by
The Npoint DSTII of
Let
Using the following trigonometric identity
(32) becomes
where
where,
Then, the DSTII matrix is resulted from the DCTII matrix. Note that compatibility property exists in the DCTII and DSTII.
3.2. From DFT to DCTII
The (m,n) elements of the DCTII kernel matrix is expressed by
A new sequence
For the sequence
where
With the result above we have avoided computing a DFT of double size. We have
Now, the result can be put in the more compact matrixvector form
Then, the DCTII matrix is resulted from the DFT matrix.
3.3. From DCTII and DSTII to DFT
We develop a relation between the circular convolution operation in the discrete cosine and sine transform domains. We need to measure half of the total coefficients. The main advantage of a proposed new relation is that the input sequences to be convolved need not be symmetrical or asymmetrical. Thus, the transform coefficients can be either symmetric or asymmetric [21].
From (30) and (31), it changes to coefficient for circular convolution (C) format. Thus, we have the following equations:
We can rewrite the DFT (24)
Multiplying (42) by
Comparing the first term of (41) with first one of (43), it can be seen that
3.4. Unified hybrid fast algorithm
Based on the above conversions from the proposed decomposition of DCTII, we can form a hybrid fast algorithm that can cover DCTII, DSTII, and DFT. The general block diagram of the proposed hybrid fast algorithm is shown in Fig. 4. The common recursive block of
3.5. Numerical simulations
As shown in [7] the coding performance DST outperforms DCT at high correlation values
Now consider an
Depending on the availability of boundary values (in top boundary and leftboundary) in images the hybrid coding scheme accomplishes the 2D transform of a block pixels as two sequential 1D transforms separately performed on rows and columns. Therefore the choice of 1D transform for each direction is dependent on the corresponding prediction boundary condition.
Vertical transform (for each column vector): employ DST if top boundary is used for prediction; otherwise use DCT.
Horizontal transform (for each row vector): employ DST if left boundary is used for prediction; otherwise use DCT.
What we observed from numerical experiments is that the combined scheme over DCTII only performs better in perceptual clarity as well as PSNR. Jointly optimized spatial prediction and block transform (see Fig. 5 (e) and (f)) using DCT/DSTII compression(PSNR 35.12dB) outperforms only DCTII compression(PSNR 32.38dB). Less blocky artifacts are revealed compared to that of DCTII. Without
4. Conclusion
In this book chapter, we have derived a unified fast hybrid recursive Fourier transform based on Jacket matrix. The proposed analysis have shown that DCTII, DSTII, and DFT can be unified by using the diagonal sparse matrix based on the Jacket matrix and recursive structure with some characters changed from DCTII to DSTII, and DFT. The proposed algorithm also uses the matrix product of recursively lower order diagonal sparse matrix and Hadamard matrix. The resulting signal flow graphs of DCTII, DSTII, and DFT have a regular systematic butterfly structure. Therefore, the complexity of the proposed unified hybrid algorithm has been much less as its matrix size gets larger. This butterfly structure has grown by a recursive nature of the fast hybrid Jacket Hadamard matrix. Based on a systematic butterfly structure, a unified switching system can be devised. We have also applied the circulant channel matrix in our proposed method. Thus, the proposed hybrid scheme can be effectively applied to the heterogeneous transform systems having various matrix dimensions. Jointly optimized DCT and DSTII compression scheme have revealed a better performance (about 3dB) over the DCT or DST only compression method.
Appendix
Appendix A
A Proof of Proposition 1
We use mathematical induction to prove Proposition 1. The lowest order BIJM is defined as
where
equation (4) holds for 2
Appendix B
A Proof of Lemma 1
According to the definition of an
where
Using (47),
which proves (10) in Lemma 1.
Appendix C
By using the sum and difference formulas for the sine function, we can have
where
By taking (49) and into the right hand side of (18), we have
The left hand side of (18) matrix
We can obtain (50) and (51) are the same and the expression of (18) is correct.
References
 1.
Rao, KR. and Yip, P., Discrete Cosine Transform: Algorithms, Advantages, Applications . Boston, MA: Academic Press, 1990.  2.
Richardson, IE., The H.264 Advanced Video Compression Standard , 2nd ed. Hoboken, New Jersey: John Wiley and Sons.  3.
Rao, KR., Kim, DN., and Hwang, J. J., Fast Fourier Transform: Algorithm and Applications . New York, N.Y.: Springer, 2010.  4.
Jain, A. K., Fundamentals of Digital Image Processing . Prentice Hall, 1987.  5.
Wang, R., Introduction to Orthogonal Transforms: With Applications in Data Processing and Analysis . Cambridge, UK: Cambridge University Press, 2012.  6.
ITUT SG16 WP3/JCTVC, CE 7.5, “Performance analysis of adaptive DCT/DST selection,” July 2011.  7.
Hai, J., Saxena, A., Melkote, V., and Rose, K., “Jointly optimized spatial prediction and block transform for video and image coding,” IEEE Trans. Image Process. , vol. 21, no. 4, pp. 1874–1884, April 2012.  8.
Strang, G. and Nguyen, T., Wavelets and Filer Banks . Wellesley, MA: WellesleyCambridge Press, 1996.  9.
Lee, MH., “A new reverse Jacket transform and its fast algorithm,” IEEE Trans. Circuits Syst. II , vol. 47, no. 1, pp. 39–47, Jan. 2000.  10.
Chen, Z.,. Lee, MH, and Zeng, G., “Fast cocyclic Jacket transform,” IEEE Trans. Signal Process. , vol. 56, pp. 2143–2148, May 2008.  11.
Lee, MH., Jacket MatricesConstruction and Its Application for Fast Cooperative Wireless Signal Processin g. LAP LAMBERT Academic publishing, Germany, November, 2012.  12.
Wang, CL. and Chen, CY., “Highthroughput VLSI architectures for the 1D and 2D discrete cosine transform,” IEEE Trans. Circuits Syst. Video Technol ., vol. 5, pp. 31–40, Feb. 1995.  13.
Wang, Z., “Fast Algorithm for the Discrete W Transform and for the Discrete Fourier Transform,” IEEE Trans. on Acoustics, Speech and Signal Process ., vol. 32, No. 4, pp. 803 – 816, Aug. 1984.  14.
Lee, MH., “High speed multidimensional systolic arrays for discrete Fourier transform,” IEEE Trans. Circuits Syst. II, vol. 39, no. 12, pp. 876–879, Dec. 1992.  15.
Kim, KJ., Fan, Y.,Iltis, R. A., Poor, H. V., and Lee, M. H., “A reduced feedback precoder for MIMOOFDM cooperative diversity system,” IEEE Trans. Veh. Technol. , vol. 61, pp. 584–596, Feb. 2012.  16.
Jang, U.,Cho, K.,Ryu, W., and Lee, HJ., “Interference management with block diagonalization for macro/femto coexisting networks,” ETRI Journal , vol. 34, pp. 297–307, June 2012.  17.
Hou, HS., “A fast recursive algorithm for computing the discrete cosine transform,” IEEE Trans. Acoust., Speech, Signal Process ., vol. 35, no. 10, pp. 1455–1461, Oct. 1987.  18.
Chen, WH., Smith, CH., and Fralick, SC., “A fast computational algorithm for the discrete cosine transform,” IEEE Trans. Commun ., vol. 25, no. 9, pp. 1004–1009, Sep. 1977.  19.
Spencer, Q. H., Lee, A. Swindlehurst, M. Haardt, “Zeroforcing methods for downlink spatial multiplexing in multiuser MIMO channels”, IEEE Trans. Signal Process ., vol. 52,no. 2, Feb. 2004.  20.
Andrews, HC., Caspari, KL., “A generalized technique for spectral analysis,” IEEE Trans. Computers , vol. 19, no. 1, pp.1617, 1970.  21.
Reju, VG.,Koh, SN.,Soon, IY., “Convolution using discrete sine and cosine transforms”, IEEE Signal Processing Letters, vol. 14, no. 7, July 2007.  22.
Lee, MH., Khan, MHA., Sarker, MA. L., Guo, Y. and Kim, KJ., “A MIMO LTE precoding based on fast diagonal weighted Jacket matrices”, Fiber and Integrated Optics ,Taylor and Francis , Invited paper, vol. 31, no. 2, pp. 111132, March 2012.  23.
Khan, MHA., Li, J., Lee, MH., “A block diagonal Jacket matrices for MIMO broadcast channel" IEEE International Symposium on Broadband Multimedia Systems and Broadcasting , Brunel University, June 47^{th}, 2013, UK.