Open access

Efficient 2-D DCT Computation from an Image Representation Point of View

Written By

G.A. Papakostas, D.E. Koulouriotis and E.G. Karakasis

Published: December 1st, 2009

DOI: 10.5772/7043

Chapter metrics overview

4,358 Chapter Downloads

View Full Metrics

1. Introduction

Discrete Cosine Transform (DCT) constitutes a powerful tool in signal processing, since its first introduction (Ahmed et al., 1974). It belongs to a class of mathematical operations that includes the well-known Fast Fourier Transform (FFT), having as basic operation taking a signal and transforming it from one type of representation to another. More precisely, DCT transforms a signal from the spatial domain to the frequency space, with minimum information redundancy, since its kernel functions (cosines) comprise an orthogonal basis.

The main advantage of the DCT transform is its high energy compactness and thus the resulted DCT coefficients fully describe the signal in process. This benefit in conjunction with its implementation simplicity has inspired the scientists to use DCT as the basic transform in the well known image compression standard calling JPEG (ISO/IEC, 1994). Particularly, a 2-D version of the DCT is used to compute the projection of an image in the orthogonal basis of cosines functions, by resulting to a set of coefficients that constitutes the DCT image domain. According to the JPEG standard these coefficients are being compressed in a next step by applying a specific quantization table and an entropy coding procedure.

Besides the usage of the 2-D DCT as part of image compression algorithms, it is widely used as feature extraction or dimensionality reduction method in pattern recognition applications (Sanderson & Paliwal, 2003, Er et al., 2005, Jadhav & Holambe, 2008, Liu & Liu, 2008), in image watermarking and data hiding (Qi et al., 2008, Choi et al., 2008, Alturki et al., 2007, Wong et al., 2007) and in various image processing applications (See et al., 2007, Krupinski & Purczynski, 2007, Abe & Iiguni, 2007).

From the above it is obvious that the applicability range of the 2-D DCT is wide and increases continuously. This fact has motivated many researchers to investigate and develop algorithms that accelerate the computation time needed to calculate the DCT coefficients of an image. As a result of this massive research, many algorithms that present high computation rates were proposed (Zeng et al., 2001, Diab et al., 2002, Wu et al., 2008, Shao et al., 2008, Plonka & Tasche, 2005) and many hardware implementations were presented (Nikara et al., 2006, Tan et al., 2001) through the years.

These algorithms manage to reduce the number of operations in terms of multiplications and additions, by applying common techniques with the DFT algorithms, where optimized arithmetic methodologies and matrix operations are applied.

A completely different methodology that is making use of the image’s morphology and intensities distribution, which highly improves the computational rate of the 2-D DCT, is presented in this chapter, as alternative to the conventional one. Moreover, the algorithms’ nature allows the collaboration with the already existing fast methods, in order to provide high computation rates.

2.2. D Discrete Cosine Transform

The Discrete Cosine Transform (DCT) is a popular signal transformation method, which is making use of cosine functions of different frequencies, as kernels. There are several variants of DCT with slightly modified definitions and properties, like DCT I, II, III, IV, V-VIII (Wikipedia) with the corresponding inverse formulas.

Among these types the DCT II, is usually used in image processing and compression (JPEG, MPEG), because it has strong energy compaction, meaning that a few coefficients enclose the most of the signal in process.

The (p+q)thorder DCT-II coefficient for an NxNimage having intensity function f(x,y), is described by the following formulas


with 0 ≤ p,q,x,y ≤ N-1, and


is the cosine kernel function of the orthogonal basis, while


is a normalization factor.

According to (1) there are a lot of computations that have to be performed in calculating a set of DCT coefficients, due to the cosine functions evaluations, but most of all due to the number of image’s pixels. While in the case of blocked-DCT algorithm (1) is applied to an 8x8 image block and thus the number of pixels is quite small, in the case of the global-DCT the number of pixels is increasing and is equal to the total image pixels.

A novel methodology that attempts to reduce the complexity of the above formulas by modifying these definitions in a more simple form is presented, in the next section.


3. Improved 2-D DCT computation

As it can be seen from (1), the high computational complexity of the 2-D DCT computation process, lies on the calculation of the same quantities for the entire image and thus the computation time is highly dependent on the image size by resulting to a O(N2)complexity. This fact was the main point of inspiration of investigating alternative computation methodologies to decrease the complexity by simplifying the way the operations are performed.

Recently, the authors proposed a new algorithm to accelerate the computation of Legendre moments on binary images (Papakostas et al., 2007) and generalized it in the case of gray-scale images with application on the Geometric moments (Papakostas et al., 2008), based on a new image representation scheme called Image Slice Representation (ISR).

This new image representation method is adopted and applied on the computation of 2-D DCT coefficients, in order to improve the overall computation rate, as it will be verified by appropriate experiments.

3.1. Image Slice Representation (ISR)

The main idea behind the ISR is based on a novel image perspective according to which, each gray-scale image consisting of pixels having several intensities in the range [0,255], can be completely decomposed by a finite number of image slices of different intensity.

Each slice can be defined as the following definition describes,

Definition 1:Slice of a gray-scale image, of a certain intensity fi, is the image with the same size and the same pixels of intensity fias the original one, while the rest of the pixels are considered to be black.

For example let us consider the following 4x4 gray-scale image, where the pixels’ intensities have been highlighted, so that the decomposition result is clear.

Figure 1.

Original gray-scale image, where the pixels’ intensities are displayed as a number in the range [0,255].

If we consider the gray-scale image of Fig.1, as the resultant of non-overlapped image slices, having specific intensities in their pixels’ positions, then we can decompose the original image to several slices, which can reconstruct it, by applying fundamental mathematical operations.

In this sense the image of Fig.1, can be decomposed to the following slices,

Figure 2.

Image Slice Representation of the image ofFig.1: 1st image slice of 192 intensity, 2nd image slice of 95 intensity, 3rd image slice of 28 intensity, 4th image slice of 221 intensity and 5th image slice of 234 intensity.

As a result of the Definition 1, is the following Lemma 1 and 2:

Lemma 1:Any 8-bit gray-scale image can be decomposed into a maximum of 255 slices, where each slice has pixels of one intensity value and black. Lemma 2:The binary image as a special case of a gray-scale image consists of only one slice, the binary slice, where only the intensities of 255 and 0 are included.

The main advantage of this image decomposition is by describing a gray-scale image into a set of two-level slices of certain intensity; one can effectively apply the IBR (Papakostas et al., 2007) algorithm to compute the 2-D DCT coefficients of each binary slice and finally the total coefficients of the initial gray-scale, by superposition.

Based on the ISR representation, the intensity function f(x,y)of a gray-scale image, can be defined as an expansion of the slices’ intensity functions,


where nis the number of slices (equal to the number of different intensity values) and fi(x,y)is the intensity function of the ithslice. In the case of a binary image nis equal to 1 and thus f(x,y)=f1(x,y).

By using the ISR representation scheme, the computation of the (p+q)th2-D DCT coefficient (1) of a gray-scale image f(x,y), can be performed according to,


where fiandCpqi, i=1,2,…,sare the intensity functions of the slices and the corresponding (p+q)thorder coefficient of the ithbinary slice, respectively.

The corresponding coefficient of a binary sliceCpqi, is the coefficient computed by considering a block representation of the image (Papakostas et al., 2007), as follows


wherex1,bi,x2,biandy1,bi,y2,biare the coordinates of the block bi, with respect to the horizontal and vertical axis, respectively.

The proposed methodology has already been applied to the computation of binary and gray-scale images’ moments in (Papakostas et al., 2007) and (Papakostas et al., 2008) respectively, with remarkable results.

As a result of the above analysis (5) and (6), is the following Proposition 1,

Proposition 1:The (p+q)thorder 2-D DCT coefficient of a gray-scale image is equal to the “intensity-weighted”sum of the same order coefficients of a number of binary slices.

3.2. Partial Image Block Representation algorithm – PIBR

Once a gray-scale image is decomposed to several slices according to the ISR method presented above, each slice can be considered as a two-level image where the IBR algorithm can be effectively applied.

While the separate application of the IBR algorithm to each slice significantly increases the overall block extraction time, a modified IBR algorithm, which extracts the image blocks in a single pass, is introduced in the sequel. This algorithm is called PIBR (Partial Image Block Representation) and consists of the same steps with the IBR, but with the extra process of extracting blocks of all intensity values.

The PIBR algorithm consists of one pass of the image and a bookkeeping process. More precisely, the algorithm is described by the following steps:

Algorithm – PIBR

Step 1. Consider each line y of the image f and find the object level intervals for each intensity value that exists in line y.

Step 2. Compare the intervals and blocks that have the same intensity in line y-1.

Step 3. If an interval does not match with any block of the same intensity, then it is the beginning of a new block.

Step 4. If a block matches with an interval of the same intensity, the end of the block is in the line y.

As a result of the application of the above algorithm, are several sets of rectangular blocks, having the same intensity value. After the blocks extraction, the image can be redefined in terms of blocks of different intensities, instead of individual pixels as,


where nis the number of image slices (maximum 255) and kiis the number of image blocks having intensity i. Each block is described by two pixels, the upper left and down right corner pixels of the block.

Once the blocks of all slices are extracted, (6) can be applied to compute the 2-D DCT coefficients of each slice and then by using (5) the corresponding coefficient of the original gray-scale image.

It has to be noted that the procedure of extracting image’s slices and slice’s blocks, does not add significant overhead in the overall process, something which is verified by the experimental study taken place and presented in the next section.

3.3. Complexity analysis

Before proceeding with the experiments to investigate the performance of the proposed methodology in comparison with the conventional one, a detailed analysis of the algorithm’s complexity is somewhat necessary in order to study its computational efficiency.

The following analysis, considers the worst case of image’s intensities distribution which corresponds to the image called Chessboard defined as: f(i,j)=0 (black) or f(i,j)=1 (white), f(i,j)≠{f(i-1,j), f(i+1,j), f(i,j-1), f(i,j+1).

In the following Table 1, the computational complexity of the proposed algorithm in computing a single 2-D DCT coefficient of an NxNimage, in the worst case, in comparison with the conventional method, in terms of the number of fundamental operations, is presented.

Direct MethodProposed Method
Multiplications10*N 2 + 29*N 2 + 4
Additions3*N *M3*N 2 + 1
Divisions2*N 2 + 12*N 2 + 1
Kernel Computations2*N 22*N 2

Table 1.

Algorithms’ computational complexity

From Table 1, it is clear that the use of the ISR can significantly reduce the number of multiplications required in order to compute a single coefficient, while the rest operations remain quite the same.

This outperformance, is reinforced in the case of multiple coefficients computation and in arbitrary images where the number of extracted blocks are smaller but with wider area.


4. Experimental study

In order to investigate the computational performance of the proposed methodology, appropriate experiments have taken place, in a 3.2 GHz PC having 1GB RAM, in C++ programming language and their results are presented in this section.

For the sake of the experiments three well known benchmark images and an artificial one have been selected, the Lena, F16, Boat and Chessboard images having 128x128 pixels size, as depicted in the following Fig.3. The images of Fig.3, is in gray-scale format but the same experiments are repeated and for their binary versions, since the usage of binary images are more suitable in real-time pattern recognition systems, where there are constraints on the amount of information that can be processed in real-time sense.

The experiments are performed in two directions: the performance of the proposed computation scheme in computing the 2-D DCT of the whole image, a common practice in pattern recognition applications where the resulted coefficients are used as discriminative features, called global-mode is investigated firstly.

Figure 3.

Benchmark images (a) Chessboard, (b) Lena, (c) F16 and (d) Boat in gray-scale format.

In a second state, the efficiency of computing the 2-D DCT in a block-mode, meaning that the algorithm is applied in a finite number of image sub-blocks, which is a common policy in the case of image compression applications such as JPEG, and the dependency of the computation speed on the block size is derived. It has to be remarked that the complexity of the two algorithms, as already presented in section 3.3, is independent of the order (p,q)of the coefficient and thus the complexity of computing a certain number of coefficients is multiple of the corresponding complexity of one.

Direction I - global-mode

According to this experimental direction, the novel algorithm is applied in the entire image and its performance is compared to the corresponding of the conventional method. In the following Table 2, the computation statistics for the case of the test images are presented in details.

Operation TypeMethodChessboard ImageImageF16 ImageBoat Image
Kernel ComputationsDirect32768327683276832768
Total Reduction (%)5.88%10.00%11.71%10.57%

Table 2.

Performance (in number of operations) of the algorithms of computing a single DCT coefficient, for the set of all gray-scale test images.

The above results justify the theoretical analysis about the algorithms’ complexity presented in section 3.3 and show that the introduced methodology requires less fundamental operations in comparison to the conventional method. Chessboard image represents the worst case image for the ISR since the number of extracted slices is very high equal to the half of image’s pixels. Even in the case of the chessboard image the proposed method outperforms the traditional one, by reducing the total operations of about 5.88%, while for the other images this reduction approaches 10% in almost all cases.

Table 3, illustrates some useful performance statistics of applying ISR to the gray-scale images, such as the time needed to extract image’s blocks, the number of extracted slices and the total blocks. As it can be seen from this table, the block extraction time is a fast procedure that does not add significant overhead in the whole computation and the number of extracted blocks varies depending on the image’s intensities distribution.

Chessboard ImageImageF16 ImageBoat Image
Block Extraction Time(msecs)0.64340.65840.69120.6891
Num. of Blocks16384147081409514501
Num. of Intensity Slices2205211213

Table 3.

Performance statistics of applying ISR to the gray-scale benchmark images.

Since DCT coefficients are widely used as discriminative features to pattern classification applications, there is an increasing interest in implementing the process of coefficients extraction in an embedded system having a microprocessor as basic processing unit.

Chessboard ImageImageF16 ImageBoat Image
Kernel ComputationsDirect32768327683276832768
Total Reduction (%)5.88%60.80%57.27%57.39%

Table 4.

Performance (in number of operations) of the algorithms of computing a single DCT coefficient, for the set of all binary test images.

In these applications the need of real-time processing and resources restrictions (memory resources) constitute major issues of the software developer in implementing any time consuming algorithm such as 2-D DCT. Binary images comprise a good choice that satisfies the two above requirements and thus the study of the proposed algorithm’s performance on bi-level images is of significant importance.

In this way, the gray-scale images of Fig.3 are initially converted to binary format by applying the well known Otsu’s thresholding algorithm (Otsu, 1979) and the same experiments are executed in the resulted binary images. The corresponding results are presented in the above Table 4.

Table 4, shows that the efficiency of using the proposed method for binary images is significant, since it achieves 60% reduction of the total operations in comparison with the conventional method. This result is justified by the fact that in the case of binary images the number of extracted blocks is low while blocks’ area is increased and thus according to (5)-(6), the whole processing is less time consuming.

The corresponding statistics of applying ISR to the binary images are depicted in the following Table 5, where it can be observed that the number of extracted blocks is small indeed.

Chessboard ImageImageF16 ImageBoat Image
Block Extraction Time(msecs)0.64340.19660.17780.1810
Num. of Blocks16384139412991295
Num. of Intensity Slices2222

Table 5.

Performance statistics of applying ISR to the binary benchmark images.

Conclusively, it can be declared that the proposed methodology can be effectively applied to accelerate the computation of the 2-D DCT when the transform is applied on the entire image. While the proposed algorithm presents remarkable performance in gray-scale images, in the special case of binary images, the algorithm outperforms the conventional method totally, by making it appropriate for real-time, performance demanding applications.

Direction II – block-mode

Due to the high popularity of DCT in image compression methods, it is of significant importance to investigate the performance of the proposed methodology when applied on a sub-block of the image (block-mode), as it is performed in the JPEG standard.

As part of the JPEG standard, the 2-D DCT is applied on 8x8 image sub-blocks, but it is important to investigate the behaviour of the method in several dimensions of image’s sub-blocks. Therefore, it is decided to study the algorithm for 2x2, 4x4, 8x8, and 16x16, sub-blocks in both gray-scale and binary benchmark images and the results are illustrated in Fig.4 and Fig.5 respectively.

Significant conclusions can be derived from these figures, about the performance of the proposed methodology when it is applied in a block-mode. The computation of 2-D DCT of an image decomposed to sub-blocks of nxnsize, is accelerated by using the ISR for sub-block size greater than 8x8 considering a gray-scale format, while in binary images this acceleration occurs earlier.

Therefore, the use of the proposed methodology in the case of binary images improves the overall computation rate but when applied on gray-scale images the advantages of the method are restricted for sub-blocks greater than the 8x8 size, which is the official size used in the standard JPEG. This fact generates some questions about the efficiency of the introduced algorithm, since the usefulness of sub-block size greater than 8x8 in JPEG compression has to be studied.

Figure 4.

Total operations in respect to sub-block size for (a) Chessboard, (b) Lena, (c) F16 and (d) Boat gray-scale images.

For this reason, it is decided to investigate the compression performance of the JPEG under variable sub-block sizes. The compression ratio for a fixed image quality (MSE – Mean Squared Error) and for 2x2, 4x4, 8x8 and 16x16 sub-block size is computed and the results are displayed in the following Fig.6. These results testify that the 8x8 sub-block size is the optimum choice regarding the compression ratio and the image quality, an observation which deals with the JPEG standard.

Figure 5.

Total operations in respect to sub-block size for (a) Lena, (b) F16 and (c) Boat binary images.


5. Conclusion

A novel methodology that ensures the computation of 2-D DCT coefficients in gray-scale images as well as in binary ones, with high computation rates, was presented in the previous sections. Through a new image representation scheme, called ISR (Image Slice Representation) the 2-D DCT coefficients can be computed in significantly reduced time, with the same accuracy.

Figure 6.

Compression performance of JPEG for various sub-block sizes.

Appropriate experiments prove that the new algorithm totally outperforms the conventional method, in the case the algorithm applied on the entire image area for any image type, while in its blocked version the advantages were obvious for sub-block sizes greater than 8x8.

The experimental results are very promising and in conjunction to the capability of the methodology to collaborate with any of the existent fast algorithms, it can be a good choice for real-time and time demanding applications.


  1. 1. AbeY.IiguniY.2007Image restoration from a downsampled image by using the DCT.Signal Processing,871023702380.
  2. 2. AhmedN.NatarajanT.RaoK. R.1974Discrete cosine transform.IEEE Trans. Computers,2319093.
  3. 3. AlturkiF. T.AlmutairiA. F.MersereauuR. M.2007Analysis of blind data hiding using discrete cosine transform phase modulation.Signal Processing: Image Communication,224347362.
  4. 4. ChoiH. J.SeoY. H.YooJ. S.KimD. W.2008Digital watermarking technique for holography interference patterns in a transform domain.Optics and Lasers in Engineering,464343348.
  5. 5. DiabC.OueidatM.ProstR.2002A new IDCT-DFT relationship reducing the IDCT computational cost.IEEE Trans. Signal Processing,50716811684.
  6. 6. ErM. J.ChenW.WuS.2005High-Speed face recognition based on discrete cosine transform and RBF neural networks.IEEE Trans. on Neural Networks,163679691.
  7. 7. ISO/IEC109181(1994). Digital compression and coding of continuous-tone still images: Requirements and guidelines.
  8. 8. JadhavD. V.HolambeR. S.2008Radon and discrete cosine transforms based feature extraction and dimensionality reduction approach for face recognition.Signal Processing,881026042609.
  9. 9. KrupinskiR.PurczynskiJ.2007Modeling the distribution of DCT coefficients for JPEG reconstruction.Signal Processing: Image Communication,225439447.
  10. 10. LiuZ.LiuC.2008Fusion of the complementary Discrete Cosine Features in the YIQ color space for face recognition.Computer Vision and Image Understanding,1113249262.
  11. 11. NikaraJ. A.TakalaJ. H.AstolaJ. T.2006Discrete cosine and sine transforms-regular algorithms and pipeline architectures.Signal Processing,862230249.
  12. 12. OtsuN.1979A threshold selection method from gray-level histograms.IEEE Trans. Systems Man Cybernet.,96266.
  13. 13. PapakostasG. A.KarakasisE. G.KoulouriotisD. E.2008Efficient and accurate computation of geometric moments on gray-scale images.Pattern Recognition,41618951904.
  14. 14. PapakostasG. A.KarakasisE. G.KoulouriotisD. E.2007Exact and speedy computation of Legendre moments on binary Images.Proceedings of 8th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS’07),48ISBN, Santorini- Greece, 2007.
  15. 15. PlonkaG.TascheM.2005Fast and numerically stable algorithms for discrete cosine transforms.Linear Algebra and its Applications,394309345.
  16. 16. QiH.ZhengD.ZhaoJ.2008Human visual system based adaptive digital image watermarking.Signal Processing,881174178.
  17. 17. SandersonC.PaliwalK. K.2003Features for robust face-based identity verification.Signal Processing,835931940.
  18. 18. SeeK. W.LokeK. S.LeeP. A.LoeK. F.2007Image reconstruction using various discrete orthogonal polynomials in comparison with DCT.Applied Mathematics and Computation,1932346359.
  19. 19. ShaoX.JohnsonS. G.2008Type-IV DCT, DST, and MDCT algorithms with reduced numbers of arithmetic operations.Signal Processing,88613131326.
  20. 20. TanT. C.BiG.ZengY.TanH. N.2001DCT hardware structure for sequentially presented data.Signal Processing,811123332342.
  21. 21. Wikipedia contributors, Discrete cosine transform, Wikipedia, The Free Encyclopedia,
  22. 22. WongK.QiX.TanakaK.2007A DCT-based Mod4 steganographic method.Signal Processing,87612511263.
  23. 23. WuJ. S.ShuH. Z.SenhadjiL.LuoL. M.2008A fast algorithm for the computation of 2-D forward and inverse MDCT.Signal Processing,88614361446.
  24. 24. ZengY.ChengL.BiG.KotA. C.2001Integer DCTs and Fast Algorithms.IEEE Tans. Signal Processing,491127742782.

Written By

G.A. Papakostas, D.E. Koulouriotis and E.G. Karakasis

Published: December 1st, 2009