Fast Algorithm Designs of Multiple-Mode Discrete Integer Transforms with Cost-Effective and Hardware-Sharing Architectures for Multistandard Video Coding Applications

Chih-Peng Fan

doi:10.5772/64985

Abstract

In this chapter, we first give a brief overview of transform-based video coding. Second, we describe the basic matrix decomposition scheme used for fast-algorithm and hardware-sharing-based integer transform design. Finally, we present two case studies of fast-algorithm and hardware-sharing-based architecture designs of discrete integer transforms: one for a single-standard multiple-mode video transform-coding application, and the other for a multiple-standard multiple-mode video transform-coding application.

Keywords

video coding
transform coding
fast algorithm
matrix factorization
hardware sharing
multiple modes
multiple standards

Author Information

Chih-Peng Fan*
- Department of Electrical Engineering, National Chung Hsing University, Taichung, Taiwan, ROC

*Address all correspondence to: cpfan@dragon.nchu.edu.tw

1. Introduction

Video-coding systems generally use block-based transform coding, combined with quantization and entropy coding, to reduce data rates. Among block-based transforms, the discrete cosine transform (DCT) [1] and integer transforms have been widely adopted in still-image and video-coding specifications, such as JPEG [2], MPEG-1/2 [3, 4], MPEG-4 [5], H.264/AVC [6, 7], AVS [8, 9], VC-1 [10], VP8 [11], and HEVC [12]. Because integer transforms offer low complexity and effective coding performance, the advanced video coding (AVC) standard in ITU-T H.264 [6, 7, 13, 14], which is also known as MPEG-4 Part 10, applies integer transforms in its transform stage. The 4 × 4 and 8 × 8 transforms in [13, 14] are computed exactly, which prevents mismatch problems in the inverse transforms for high-quality video. The VC-1 specification [10, 15, 16], which was developed by Microsoft Corporation and standardized by the Society of Motion Picture and Television Engineers (SMPTE), employs 4 × 4 and 8 × 8 integer transforms. The Audio Video Coding Standard (AVS) of China [8, 9] uses an 8 × 8 integer transform to obtain high coding performance. The VP8 video-coding standard [11] was developed for Internet browser applications. The Joint Collaborative Team on Video Coding proposed the high-efficiency video coding (HEVC) specification [12], whose compression efficiency is considerably better than that of the H.264/AVC high profile.

To support single-standard H.264/AVC video coding, several transform architectures [17–24] have been developed to handle the multiple transform modes in H.264. Likewise, to support single-standard H.265/HEVC video coding, several transform architectures [25–32] have been developed to handle the multiple transform modes in HEVC. In addition, supporting multiple standards, such as H.264/AVC, MPEG-1/2/4, VC-1, AVS, and VP8, has recently become an important issue in multimedia applications, and several transform architectures [33–41] have been developed to provide multiple transform functions. Owing to the growth of multistandard video-coding applications, achieving low computational complexity together with a cost-effective hardware-sharing architecture is an interesting research topic for the VLSI design of video codecs.

2. Matrix decomposition preprocessing for fast algorithm and hardware-sharing-based designs

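Because they share a common structure, the 8 × 8 inverse integer transforms [41] in the H.264/AVC, AVS, VC-1, MPEG-1/2/4, and HEVC specifications can all be written in the form of Eq. (1). Table 1 lists the transform modes supported by each standard.

$$
C_{8\times 8} = \begin{bmatrix}
a & b & f & c & a & d & g & e \\
a & c & g & -e & -a & -b & -f & -d \\
a & d & -g & -b & -a & e & f & c \\
a & e & -f & -d & a & c & -g & -b \\
a & -e & -f & d & a & -c & -g & b \\
a & -d & -g & b & -a & -e & f & -c \\
a & -c & g & e & -a & b & -f & d \\
a & -b & f & -c & a & -d & g & -e
\end{bmatrix} \tag{1}
$$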

Transform sizes	VC-1	AVS	VP8	MPEG-1/2/4	H.264/AVC	HEVC
4 × 4	√	√	√	N/A	√	√
8 × 8	√	√	N/A	√	√	√
16 × 16	N/A	N/A	N/A	N/A	N/A	√
32 × 32	N/A	N/A	N/A	N/A	N/A	√

Table 1.

The transform modes in several video-coding standards [41].

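The matrix in Eq. (1) is decomposed as

$$C_{8\times 8} = P_1 \cdot A_0 \cdot P_r. \tag{2}$$

In Eq. (2), A₀ is divided into two modules, U_{4 × 4} and D_{4 × 4}, where

$$
P_1 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}, \quad
P_r = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix},
$$

$$
A_0 = \begin{bmatrix}
a & f & a & g & 0 & 0 & 0 & 0 \\
a & g & -a & -f & 0 & 0 & 0 & 0 \\
a & -g & -a & f & 0 & 0 & 0 & 0 \\
a & -f & a & -g & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -e & d & -c & b \\
0 & 0 & 0 & 0 & -d & b & -e & -c \\
0 & 0 & 0 & 0 & -c & e & b & d \\
0 & 0 & 0 & 0 & -b & -c & -d & -e
\end{bmatrix}.
$$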

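Thus

$$A_0 = U_{4\times 4} \oplus D_{4\times 4} \tag{3}$$

and C_{8 × 8} becomes

$$C_{8\times 8} = P_1 \cdot (U_{4\times 4} \oplus D_{4\times 4}) \cdot P_r. \tag{4}$$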

In Eq. (3), “⊕” is the direct sum operator, and the two diagonal blocks U_{4 × 4} and D_{4 × 4} are processed in parallel. To reduce the number of computational operations and to share hardware effectively, the upper diagonal block U_{4 × 4} and the lower diagonal block D_{4 × 4} are further decomposed into cascaded products or sums of sparse matrices. After factorization, the chosen sparse matrices have coefficients that are 1, −1, 0, or an integer that can be expressed as a combination of powers of two. In addition, the factorization is chosen so that the sparse matrices contain as many zero entries as possible [42].

By Eq. (1), for VC-1 the values of the coefficient set {a, b, c, d, e, f, g} are {12, 16, 15, 9, 4, 16, 6}, and those for AVS are {8, 10, 9, 6, 2, 10, 4}. Next, those for MPEG-1/2/4 are {362, 502, 426, 284, 100, 473, 196}, and those for H.264/AVC are {8, 12, 10, 6, 3, 8, 4}. Finally, those for HEVC are {64, 89, 75, 50, 18, 83, 36}.
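The decomposition in Eq. (2) can be checked numerically for the five coefficient sets listed above. The following is a minimal sketch in plain Python; the helper names (`matmul`, `c8`, `a0`, `COEFFS`) are illustrative and do not come from [41].

```python
def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def c8(a, b, c, d, e, f, g):
    """8 x 8 inverse transform matrix written out as in Eq. (1)."""
    return [
        [a,  b,  f,  c,  a,  d,  g,  e],
        [a,  c,  g, -e, -a, -b, -f, -d],
        [a,  d, -g, -b, -a,  e,  f,  c],
        [a,  e, -f, -d,  a,  c, -g, -b],
        [a, -e, -f,  d,  a, -c, -g,  b],
        [a, -d, -g,  b, -a, -e,  f, -c],
        [a, -c,  g,  e, -a,  b, -f,  d],
        [a, -b,  f, -c,  a, -d,  g, -e],
    ]


def a0(a, b, c, d, e, f, g):
    """Block-diagonal A0 = U4x4 (+) D4x4 of Eq. (3)."""
    u = [[a, f, a, g], [a, g, -a, -f], [a, -g, -a, f], [a, -f, a, -g]]
    dm = [[-e, d, -c, b], [-d, b, -e, -c], [-c, e, b, d], [-b, -c, -d, -e]]
    return [row + [0] * 4 for row in u] + [[0] * 4 + row for row in dm]


# Butterfly postprocessing matrix P1 of Eq. (2).
P1 = [
    [1, 0, 0, 0, 0, 0, 0, -1],
    [0, 1, 0, 0, 0, 0, -1, 0],
    [0, 0, 1, 0, 0, -1, 0, 0],
    [0, 0, 0, 1, -1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1],
]

# Permutation matrix Pr: row k has its single 1 in column ORDER[k].
ORDER = (0, 2, 4, 6, 1, 3, 5, 7)
PR = [[1 if j == col else 0 for j in range(8)] for col in ORDER]

# Coefficient sets {a, b, c, d, e, f, g} quoted in the text.
COEFFS = {
    "VC-1": (12, 16, 15, 9, 4, 16, 6),
    "AVS": (8, 10, 9, 6, 2, 10, 4),
    "MPEG-1/2/4": (362, 502, 426, 284, 100, 473, 196),
    "H.264/AVC": (8, 12, 10, 6, 3, 8, 4),
    "HEVC": (64, 89, 75, 50, 18, 83, 36),
}

# For each standard, compare P1 * A0 * Pr with the matrix of Eq. (1).
results = {name: matmul(matmul(P1, a0(*v)), PR) == c8(*v)
           for name, v in COEFFS.items()}
print(results)
```

Because the check is purely structural, any other coefficient set substituted into `COEFFS` should pass as well.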

The general 4 × 4 inverse integer transform matrices [41] can be presented in Eq. (5) as

$$
M_{4\times 4} = \begin{bmatrix}
h & i & h & j \\
h & j & -h & -i \\
h & -j & -h & i \\
h & -i & h & -j
\end{bmatrix}. \tag{5}
$$

By Eq. (5), for VC-1 the values of the coefficient set {h, i, j} are {17, 22, 10}, and those for VP8 are {128, 167, 70}. Next, those for AVS-M are {2, 3, 1}, and those for H.264/AVC are {1, 1, 0.5}. Finally, those for HEVC are {64, 83, 36}.

3. Case study [32]: single-standard multiple-mode transform design

3.1. Hardware-sharing based 32 × 32 integer core transform for HEVC

The one-dimensional (1D) 32 × 32 inverse core transform for HEVC is described in [30]. By the symmetrical property, the 32 × 32 inverse core transform is presented as

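$$H_{i32} = P_A \cdot C_{A1}, \tag{6}$$

where

$$
C_{A1} = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}, \quad
P_A = \begin{bmatrix} I_{16\times 16} & -\tilde{I}_{16\times 16} \\ \tilde{I}_{16\times 16} & I_{16\times 16} \end{bmatrix}, \quad
\tilde{I}_{16\times 16} = \begin{bmatrix}
0 & 0 & \cdots & 0 & 1 \\
0 & 0 & \cdots & 1 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 1 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0
\end{bmatrix},
$$

and I_16×16 is a 16 × 16 identity matrix. In Eq. (6), P_A is the butterfly-like postprocessing matrix, and C_A1 is a sparse matrix. Reordering the columns of C_A1 gives

$$C_{A1} = C_{A2} \cdot P_{Ar}. \tag{7}$$

By Eqs. (6) and (7), H_i32 becomes

$$H_{i32} = P_A \cdot C_{A2} \cdot P_{Ar}, \tag{8}$$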

where P_Ar is the permutation matrix. In Eq. (7), C_A2 is expressed by

$$
C_{A2} = \begin{bmatrix} T_{A11} & 0_{16\times 16} \\ 0_{16\times 16} & T_{A22} \end{bmatrix} = T_{A11} \oplus T_{A22}, \tag{9}
$$

where “⊕” means the direct sum operation, and T_A11 and T_A22 are 16 × 16 matrices, which are given in [32]. The matrix P_Ar in Eq. (8) is expressed as

$$P_{Ar} = P(2, 16), \tag{10}$$

where the permutation matrix P(m, n) is defined in [43], and the notation “⊗” means the Kronecker product. In Eq. (9), T_A22 is presented as

$$T_{A22} = T_{M1} + T_{N1}. \tag{11}$$

First, the lower half of T_N1 is divided into sixteen 8 × 1 column vectors X_i, where i = 0, 1, 2, …, 15, and then T_N1 becomes

$$
T_{N1} = \left[\begin{array}{c} 0_{8\times 16} \\ \hline X_0 \;\; X_1 \;\; \cdots \;\; X_{15} \end{array}\right]. \tag{12}
$$

Second, the coefficients in a single column vector can be shared. The vector coefficient computations are achieved by combining several base coefficients [32]. After the column vectors of T_N1 are realized, the lower half of T_N1 is expressed as a stack of eight 1 × 16 row vectors Y_i, where i = 8, 9, …, 15, and T_N1 becomes

$$
T_{N1} = \left[\begin{array}{c} 0_{8\times 16} \\ \hline Y_8 \\ Y_9 \\ \vdots \\ Y_{15} \end{array}\right]. \tag{13}
$$

Adder tree structures are utilized to calculate the summed results for the row vectors Y₈–Y₁₅ [32]. Repeating the same steps used for T_N1, T_M1 is presented as

$$
T_{M1} = \left[\begin{array}{c} \hat{X}_0 \;\; \cdots \;\; \hat{X}_{15} \\ \hline 0_{8\times 16} \end{array}\right], \tag{14}
$$

where X̂_i is an 8 × 1 column vector for i = 0, 1, 2, …, 15. Then, T_M1 becomes

$$
T_{M1} = \left[\begin{array}{c} Y_0 \\ \vdots \\ Y_7 \\ \hline 0_{8\times 16} \end{array}\right], \tag{15}
$$

where Y_i is a 1 × 16 row vector for i = 0, 1, …, 7. The realization of T_M1 is the same as that of T_N1. Finally, the operations of T_M1 and T_N1 are merged into T_A22. Computing T_A22 requires 630 additions and 326 shift operations [32]. The matrix T_A11 in Eq. (9), which is also denoted as H_i16, is the 1D 16 × 16 inverse core transform in HEVC [30].
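The butterfly-like postprocessing P_A in Eq. (6) needs no multiplications: each output is the sum or difference of two inputs. The sketch below assumes only the block form P = [I, −Ĩ; Ĩ, I] given above; the function names are illustrative and are not taken from [32]. It compares the add/subtract form with an explicit matrix-vector product for N = 32.

```python
def butterfly_matrix(n):
    """Explicit n x n matrix [[I, -I~], [I~, I]] with blocks of size n/2."""
    h = n // 2
    rows = []
    for i in range(h):          # upper half: I and -I~
        rows.append([1 if j == i else (-1 if j == n - 1 - i else 0)
                     for j in range(n)])
    for i in range(h):          # lower half: I~ and I
        rows.append([1 if j in (h - 1 - i, h + i) else 0 for j in range(n)])
    return rows


def butterfly(y):
    """Apply the same postprocessing with n additions/subtractions only."""
    n = len(y)
    h = n // 2
    top = [y[i] - y[n - 1 - i] for i in range(h)]
    bottom = [y[h - 1 - i] + y[h + i] for i in range(h)]
    return top + bottom


def matvec(m, v):
    """Matrix-vector product used as the reference result."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


y = list(range(1, 33))          # arbitrary 32-point test vector
print(butterfly(y) == matvec(butterfly_matrix(32), y))
```

The same function covers P_B (N = 16) and P_C (N = 8), since they share the block form.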

3.2. Hardware-sharing-based 16 × 16 integer core transform for HEVC

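The 16 × 16 integer core transform in [30] is written as

$$H_{i16} = P_B \cdot C_{B1}, \tag{16}$$

where

$$
P_B = \begin{bmatrix} I_{8\times 8} & -\tilde{I}_{8\times 8} \\ \tilde{I}_{8\times 8} & I_{8\times 8} \end{bmatrix},
$$

and C_B1 is given in [32]. Reordering the columns of C_B1 gives

$$C_{B1} = C_{B2} \cdot P_{Br}, \tag{17}$$

where P_Br = P(8,2). By Eqs. (16) and (17), H_i16 is expressed by

$$H_{i16} = T_{A11} = P_B \cdot C_{B2} \cdot P_{Br}. \tag{18}$$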

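In Eq. (18), C_B2 is presented as

$$
C_{B2} = \begin{bmatrix} T_{B11} & 0_{8\times 8} \\ 0_{8\times 8} & T_{B22} \end{bmatrix} = T_{B11} \oplus T_{B22}, \tag{19}
$$

and T_B22 becomes

$$T_{B22} = T_{M2} + T_{N2}, \tag{20}$$

where

$$
T_{M2} = \begin{bmatrix}
-9 & 25 & -43 & 57 & -70 & 80 & -87 & 90 \\
-25 & 70 & -90 & 80 & -43 & -9 & 57 & -87 \\
-43 & 90 & -57 & -25 & 87 & -70 & -9 & 80 \\
-57 & 80 & 25 & -90 & 9 & 87 & -43 & -70 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix},
$$

$$
T_{N2} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-70 & 43 & 87 & -9 & -90 & -25 & 80 & 57 \\
-80 & -9 & 70 & 87 & 25 & -57 & -90 & -43 \\
-87 & -57 & -9 & 43 & 80 & 90 & 70 & 25 \\
-90 & -87 & -80 & -70 & -57 & -43 & -25 & -9
\end{bmatrix}.
$$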

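Following the same procedure as for T_N1 in Section 3.1, T_N2 is written as

$$
T_{N2} = \left[\begin{array}{c} 0_{4\times 8} \\ \hline U_0 \;\; \cdots \;\; U_7 \end{array}\right], \tag{21}
$$

where U_i is a 4 × 1 column vector for i = 0, 1, 2, …, 7. Next, T_N2 is also written as

$$
T_{N2} = \left[\begin{array}{c} 0_{4\times 8} \\ \hline V_4 \\ \vdots \\ V_7 \end{array}\right], \tag{22}
$$

where V_i is a 1 × 8 row vector for i = 4, 5, 6, and 7. Adder tree schemes are applied to compute the summed outcomes of V₄–V₇ [32]. Following the same procedure as for T_M1 in Section 3.1, T_M2 becomes

$$
T_{M2} = \left[\begin{array}{c} \hat{U}_0 \;\; \cdots \;\; \hat{U}_7 \\ \hline 0_{4\times 8} \end{array}\right], \tag{23}
$$

where Û_i is a 4 × 1 column vector for i = 0, 1, 2, …, 7. Next, T_M2 is also written as

$$
T_{M2} = \left[\begin{array}{c} V_0 \\ \vdots \\ V_3 \\ \hline 0_{4\times 8} \end{array}\right], \tag{24}
$$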

where V_i is a 1 × 8 row vector for i = 0, 1, 2, and 3. Then, adder trees are used to sum the row vectors V₀–V₃ [32]. Finally, the calculations of T_M2 and T_N2 are merged into T_B22. Computing T_B22 requires 164 additions and 106 shift operations [32]. Meanwhile, T_B11 in Eq. (19), which is also denoted as H_i8, is the 1D 8 × 8 inverse core transform in HEVC [30].
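The column-sharing idea can be illustrated with the explicit T_M2 and T_N2 values given above. The sketch below is not the base-coefficient scheme of [32]: it assumes a plain binary (shift-and-add) expansion of each constant, so its operation counts will differ from the 164 additions and 106 shifts quoted above. It only shows that each input sample needs its constant multiples computed once per column, after which the rows are formed by signed accumulation.

```python
TOP = [  # non-zero rows of T_M2
    [-9, 25, -43, 57, -70, 80, -87, 90],
    [-25, 70, -90, 80, -43, -9, 57, -87],
    [-43, 90, -57, -25, 87, -70, -9, 80],
    [-57, 80, 25, -90, 9, 87, -43, -70],
]
BOTTOM = [  # non-zero rows of T_N2
    [-70, 43, 87, -9, -90, -25, 80, 57],
    [-80, -9, 70, 87, 25, -57, -90, -43],
    [-87, -57, -9, 43, 80, 90, 70, 25],
    [-90, -87, -80, -70, -57, -43, -25, -9],
]
ZERO = [[0] * 8 for _ in range(4)]
T_M2 = TOP + ZERO
T_N2 = ZERO + BOTTOM
# Eq. (20): T_B22 = T_M2 + T_N2
T_B22 = [[m + n for m, n in zip(rm, rn)] for rm, rn in zip(T_M2, T_N2)]


def shift_add_multiply(x, c):
    """Multiply x by a non-negative constant c using shifts and additions."""
    acc, k = 0, 0
    while c:
        if c & 1:
            acc += x << k
        c >>= 1
        k += 1
    return acc


def apply_tb22(x):
    """Compute T_B22 * x, sharing the constant multiples within each column."""
    out = [0] * 8
    for j, xj in enumerate(x):
        magnitudes = {abs(row[j]) for row in T_B22}
        shared = {c: shift_add_multiply(xj, c) for c in magnitudes}
        for i, row in enumerate(T_B22):
            coeff = row[j]
            out[i] += shared[abs(coeff)] if coeff > 0 else -shared[abs(coeff)]
    return out


x = [3, -1, 4, 1, -5, 9, 2, -6]          # arbitrary test input
direct = [sum(c * v for c, v in zip(row, x)) for row in T_B22]
print(apply_tb22(x) == direct)
```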

3.3. Hardware-sharing-based 8 × 8 integer core transform for HEVC

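The 8 × 8 integer transform in [30] is described as

$$H_{i8} = P_C \cdot C_{C1}, \tag{25}$$

where

$$
P_C = \begin{bmatrix} I_{4\times 4} & -\tilde{I}_{4\times 4} \\ \tilde{I}_{4\times 4} & I_{4\times 4} \end{bmatrix}, \quad
C_{C1} = \begin{bmatrix}
64 & 0 & 83 & 0 & 64 & 0 & 36 & 0 \\
64 & 0 & 36 & 0 & -64 & 0 & -83 & 0 \\
64 & 0 & -36 & 0 & -64 & 0 & 83 & 0 \\
64 & 0 & -83 & 0 & 64 & 0 & -36 & 0 \\
0 & -18 & 0 & 50 & 0 & -75 & 0 & 89 \\
0 & -50 & 0 & 89 & 0 & -18 & 0 & -75 \\
0 & -75 & 0 & 18 & 0 & 89 & 0 & 50 \\
0 & -89 & 0 & -75 & 0 & -50 & 0 & -18
\end{bmatrix}.
$$

Reordering the columns of C_C1 gives

$$C_{C1} = C_{C2} \cdot P_{Cr}, \tag{26}$$

where

$$
P_{Cr} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.
$$

Based on Eqs. (25) and (26), H_i8 is presented by

$$H_{i8} = T_{B11} = P_C \cdot C_{C2} \cdot P_{Cr}. \tag{27}$$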

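In Eq. (27), C_C2 becomes

$$
C_{C2} = \begin{bmatrix} T_{C11} & 0_{4\times 4} \\ 0_{4\times 4} & T_{C22} \end{bmatrix} = T_{C11} \oplus T_{C22}, \tag{28}
$$

where

$$
T_{C11} = \begin{bmatrix}
64 & 83 & 64 & 36 \\
64 & 36 & -64 & -83 \\
64 & -36 & -64 & 83 \\
64 & -83 & 64 & -36
\end{bmatrix}, \quad
T_{C22} = \begin{bmatrix}
-18 & 50 & -75 & 89 \\
-50 & 89 & -18 & -75 \\
-75 & 18 & 89 & 50 \\
-89 & -75 & -50 & -18
\end{bmatrix}.
$$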

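In Eq. (28), T_C22 is factorized as

$$T_{C22} = S_1 + S_2, \tag{29}$$

where

$$
S_1 = \begin{bmatrix}
-18 & 0 & 0 & 89 \\
0 & 89 & -18 & 0 \\
0 & 18 & 89 & 0 \\
-89 & 0 & 0 & -18
\end{bmatrix}.
$$

Moreover, S₁ is expressed by

$$S_1 = Z_1 + (18 \cdot Z_2), \tag{30}$$

where

$$
Z_1 = \begin{bmatrix}
0 & 0 & 0 & -1 \\
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
1 & 0 & 0 & 0
\end{bmatrix}, \quad
Z_2 = \begin{bmatrix}
-1 & 0 & 0 & 5 \\
0 & 5 & -1 & 0 \\
0 & 1 & 5 & 0 \\
-5 & 0 & 0 & -1
\end{bmatrix}.
$$

In Eq. (29), S₂ is presented as

$$S_2 = 25 \cdot Z_3, \tag{31}$$

where

$$
Z_3 = \begin{bmatrix}
0 & 2 & -3 & 0 \\
-2 & 0 & 0 & -3 \\
-3 & 0 & 0 & 2 \\
0 & -3 & -2 & 0
\end{bmatrix}.
$$

By Eqs. (29)–(31), T_C22 becomes

$$T_{C22} = Z_1 + (18 \cdot Z_2) + (25 \cdot Z_3). \tag{32}$$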

In Eq. (32), the computations of T_C22 require 36 additions and 28 shift operations [32]. The matrix T_C11 in Eq. (28) is also the 1D 4 × 4 inverse core transform matrix in HEVC.
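Equation (32) can be confirmed entry by entry, and the two scalar factors lend themselves to shift-and-add form, since 18 = 16 + 2 and 25 = 16 + 8 + 1. The sketch below is one possible realization under that assumption; the function names are illustrative, and the small entries of Z₂ and Z₃ are applied with ordinary integer arithmetic rather than being expanded further, so it is not meant to reproduce the operation counts of [32].

```python
Z1 = [[0, 0, 0, -1], [0, -1, 0, 0], [0, 0, -1, 0], [1, 0, 0, 0]]
Z2 = [[-1, 0, 0, 5], [0, 5, -1, 0], [0, 1, 5, 0], [-5, 0, 0, -1]]
Z3 = [[0, 2, -3, 0], [-2, 0, 0, -3], [-3, 0, 0, 2], [0, -3, -2, 0]]

T_C22 = [
    [-18, 50, -75, 89],
    [-50, 89, -18, -75],
    [-75, 18, 89, 50],
    [-89, -75, -50, -18],
]

# Entry-wise check of Eq. (32): T_C22 = Z1 + 18*Z2 + 25*Z3.
rebuilt = [[Z1[i][j] + 18 * Z2[i][j] + 25 * Z3[i][j] for j in range(4)]
           for i in range(4)]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def times18(v):
    """18*v as (v << 4) + (v << 1)."""
    return (v << 4) + (v << 1)


def times25(v):
    """25*v as (v << 4) + (v << 3) + v."""
    return (v << 4) + (v << 3) + v


def apply_tc22(x):
    """Compute T_C22 * x through the three sparse terms of Eq. (32)."""
    z1x, z2x, z3x = matvec(Z1, x), matvec(Z2, x), matvec(Z3, x)
    return [z1x[i] + times18(z2x[i]) + times25(z3x[i]) for i in range(4)]


x = [7, -3, 2, 5]                      # arbitrary test input
print(rebuilt == T_C22, apply_tc22(x) == matvec(T_C22, x))
```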

3.4. Hardware-sharing-based 4 × 4 integer core transform for HEVC

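The 4 × 4 integer core transform matrix is written as

$$H_{i4} = P_D \cdot C_{D1}, \tag{33}$$

where

$$
P_D = \begin{bmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & -1 \\
1 & 0 & -1 & 0
\end{bmatrix}, \quad
C_{D1} = \begin{bmatrix}
64 & 0 & 64 & 0 \\
64 & 0 & -64 & 0 \\
0 & -36 & 0 & 83 \\
0 & -83 & 0 & -36
\end{bmatrix}.
$$

Reordering the columns of C_D1 gives

$$C_{D1} = C_{D2} \cdot P_{Dr}, \tag{34}$$

where

$$
P_{Dr} = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}.
$$

From Eqs. (33) and (34), H_i4 is described by

$$H_{i4} = T_{C11} = P_D \cdot C_{D2} \cdot P_{Dr}. \tag{35}$$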

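In Eq. (34), C_D2 is rewritten as

$$C_{D2} = T_{D11} \oplus T_{D22}. \tag{36}$$

In Eq. (36), T_D11 becomes

$$T_{D11} = 64 \cdot Z_4, \tag{37}$$

where

$$Z_4 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

In Eq. (36), T_D22 is expressed in terms of Z₅ and Z₆ as

$$T_{D22} = 36 \cdot Z_5 + 11 \cdot Z_6, \tag{38}$$

where

$$Z_5 = \begin{bmatrix} 2 & 1 \\ 1 & -2 \end{bmatrix}, \quad Z_6 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$

Thus, computing T_D22 requires 10 additions and 10 shift operations [32]. Based on Eqs. (35)–(38), H_i4 is changed into

$$H_{i4} = P_D \cdot \left[ (64 \cdot Z_4) \oplus (36 \cdot Z_5 + 11 \cdot Z_6) \right] \cdot P_{Dr}. \tag{39}$$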

From the above discussion, the hardware modules of the 4 × 4, 8 × 8, and 16 × 16 inverse core transforms are shared to implement H_i8, H_i16, and H_i32, respectively [32]. By sharing the hardware of H_i4 in Eq. (39), the cost-effective designs of the 8 × 8, 16 × 16, and 32 × 32 inverse core transforms are obtained progressively. First, the hardware-sharing-based eight-point inverse transform is presented as

$$H_{i8} = P_C \cdot \left\{ H_{i4} \oplus \left[ Z_1 + (18 \cdot Z_2) + (25 \cdot Z_3) \right] \right\} \cdot P_{Cr}. \tag{40}$$

Next, the hardware-sharing-based 16-point inverse transform is described as

$$H_{i16} = P_B \cdot \left\{ H_{i8} \oplus \left[ T_{M2} + T_{N2} \right] \right\} \cdot P_{Br}. \tag{41}$$

Finally, the hardware-sharing-based 32-point inverse transform is depicted as

$$H_{i32} = P_A \cdot \left\{ H_{i16} \oplus \left[ T_{M1} + T_{N1} \right] \right\} \cdot P_{Ar}. \tag{42}$$

In this section, the hardware-sharing transform architecture reduces the hardware cost because the submodules and coefficients that the transforms have in common are extracted and shared. Figure 1 illustrates the architecture of the hardware-sharing-based inverse core transform design for 4 × 4/8 × 8/16 × 16/32 × 32 transforms [32].
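The first two levels of this nesting, Eqs. (39) and (40), can be reproduced numerically. The sketch below builds H_i4 from Z₄, Z₅, and Z₆, reuses it inside H_i8, and compares both with the general forms of Eqs. (5) and (1) evaluated with the HEVC coefficients quoted in Section 2. The helper names (`matmul`, `direct_sum`, `lin`) are illustrative; the 16- and 32-point levels are omitted because their full coefficient matrices are only given in [32].

```python
def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def direct_sum(x, y):
    """Block-diagonal matrix with x upper-left and y lower-right."""
    return ([row + [0] * len(y[0]) for row in x] +
            [[0] * len(x[0]) + row for row in y])


def lin(*terms):
    """Weighted sum of equally sized matrices: lin((w1, m1), (w2, m2), ...)."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(w * m[i][j] for w, m in terms) for j in range(cols)]
            for i in range(rows)]


# Eq. (39): 4-point level.
PD = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 0, -1], [1, 0, -1, 0]]
PDR = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
Z4 = [[1, 1], [1, -1]]
Z5 = [[2, 1], [1, -2]]
Z6 = [[1, 0], [0, -1]]
H4 = matmul(matmul(PD, direct_sum(lin((64, Z4)), lin((36, Z5), (11, Z6)))), PDR)

# Eq. (40): 8-point level, reusing H4 as the upper block.
Z1 = [[0, 0, 0, -1], [0, -1, 0, 0], [0, 0, -1, 0], [1, 0, 0, 0]]
Z2 = [[-1, 0, 0, 5], [0, 5, -1, 0], [0, 1, 5, 0], [-5, 0, 0, -1]]
Z3 = [[0, 2, -3, 0], [-2, 0, 0, -3], [-3, 0, 0, 2], [0, -3, -2, 0]]
PC = [[1 if j == i else (-1 if j == 7 - i else 0) for j in range(8)]
      for i in range(4)]
PC += [[1 if j in (3 - i, 4 + i) else 0 for j in range(8)] for i in range(4)]
PCR = [[1 if j == col else 0 for j in range(8)]
       for col in (0, 2, 4, 6, 1, 3, 5, 7)]
H8 = matmul(matmul(PC, direct_sum(H4, lin((1, Z1), (18, Z2), (25, Z3)))), PCR)

# Reference matrices: Eq. (5) with {64, 83, 36}, Eq. (1) with the HEVC set.
HEVC_4 = [[64, 83, 64, 36], [64, 36, -64, -83],
          [64, -36, -64, 83], [64, -83, 64, -36]]
HEVC_8 = [
    [64, 89, 83, 75, 64, 50, 36, 18],
    [64, 75, 36, -18, -64, -89, -83, -50],
    [64, 50, -36, -89, -64, 18, 83, 75],
    [64, 18, -83, -50, 64, 75, -36, -89],
    [64, -18, -83, 50, 64, -75, -36, 89],
    [64, -50, -36, 89, -64, -18, 83, -75],
    [64, -75, 36, 18, -64, 89, -83, 50],
    [64, -89, 83, -75, 64, -50, 36, -18],
]
print(H4 == HEVC_4, H8 == HEVC_8)
```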

3.5. Architecture comparison

The 1D inverse core transform proposed in [32] has four inputs, which support the 4 × 4, 8 × 8, 16 × 16, and 32 × 32 transform modes. Several multiplexers are used to obtain the outputs of the 32 × 32 inverse core transform from the shared design of the 4 × 4, 8 × 8, and 16 × 16 inverse core transforms [32]. Table 2 lists the numbers of adders and shifters needed to calculate the four modes of the 1D inverse core transform for HEVC. The architecture developed in [32] does not require any multiplier, because the fixed-coefficient multiplications are replaced with simple additions and shift operations. Table 3 compares three 16-point inverse transform designs. Compared with the previous works in [29] and [31], the design in Section 3.2 contains fewer adders. It needs fewer shifters than the design in [29] but more than the design in [31]. Because shifters cost less hardware than adders, the architecture reduces the overall hardware cost more effectively than the previous transform schemes do.

Figure 1.
The hardware-sharing-based inverse core transform structure for HEVC.

Transform sizes	32 × 32	16 × 16	8 × 8	4 × 4
No. of shifters	256	93	40	11
No. of adders	461	146	64	10

Table 2.

The 1D inverse transform architecture at different transform modes [32].

Designs	No. of shifters	No. of adders
Ahmed [29]	132	232
Haggag [31]	58	242
Design in Section 3.2	93	146

Table 3.

Hardware comparison of three 1D 16-point transform designs [32].

4. Case study [41]: multiple-standard multiple-mode transform design

4.1. Hardware-sharing design for 8 × 8 transforms mode

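For H.264/AVC, the transform matrix is employed as the foundation matrix for the multistandard hardware-sharing scheme. Based on Eq. (3), the upper diagonal matrix in Eq. (43) costs eight adders and two shifters.

$$
U_{4\times 4\_\mathrm{AVC}} = \begin{bmatrix}
8 & 8 & 8 & 4 \\
8 & 4 & -8 & -8 \\
8 & -4 & -8 & 8 \\
8 & -8 & 8 & -4
\end{bmatrix} = 8 \cdot C_1 \cdot C_2, \tag{43}
$$

where

$$
C_1 = \begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & -1 & 1 & 0 \\
0 & 1 & 1 & 0 \\
1 & 0 & 0 & -1
\end{bmatrix}, \quad
C_2 = \begin{bmatrix}
1 & 0 & 1 & 0 \\
0 & -0.5 & 0 & 1 \\
1 & 0 & -1 & 0 \\
0 & 1 & 0 & 0.5
\end{bmatrix}.
$$

For AVS, the upper diagonal matrix U_{4×4_AVS} in Eq. (44) costs 10 adders and four shifters.

$$
U_{4\times 4\_\mathrm{AVS}} = \begin{bmatrix}
8 & 10 & 8 & 4 \\
8 & 4 & -8 & -10 \\
8 & -4 & -8 & 10 \\
8 & -10 & 8 & -4
\end{bmatrix} = 8 \cdot C_1 \cdot (C_2 + C_3), \tag{44}
$$

where

$$
C_3 = \begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.25 \\
0 & 0 & 0 & 0 \\
0 & 0.25 & 0 & 0
\end{bmatrix}.
$$

In Eq. (45), the upper diagonal matrix U_{4×4_VC1} for VC-1 needs 14 adders and eight shifters.

$$
U_{4\times 4\_\mathrm{VC1}} = \begin{bmatrix}
12 & 16 & 12 & 6 \\
12 & 6 & -12 & -16 \\
12 & -6 & -12 & 16 \\
12 & -16 & 12 & -6
\end{bmatrix} = 8 \cdot C_1 \cdot (C_4 + C_5 \cdot C_2), \tag{45}
$$

where

$$
C_4 = \begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 \\
0 & 0 & 0 & 0 \\
0 & 0.5 & 0 & 0
\end{bmatrix}, \quad
C_5 = \begin{bmatrix}
1.5 & 0 & 0 & 0 \\
0 & 1.5 & 0 & 0 \\
0 & 0 & 1.5 & 0 \\
0 & 0 & 0 & 1.5
\end{bmatrix}.
$$

For HEVC, the 8 × 8 transform matrix is derived from the AVS design in Eq. (44), and the design in Eq. (46) costs 16 adders and 12 shifters.

$$
U_{4\times 4\_\mathrm{HEVC}} = \begin{bmatrix}
64 & 83 & 64 & 36 \\
64 & 36 & -64 & -83 \\
64 & -36 & -64 & 83 \\
64 & -83 & 64 & -36
\end{bmatrix} = 2 \cdot C_1 \cdot \left[ 32 \cdot (C_2 + C_3) - U_1 \right], \tag{46}
$$

where

$$
U_1 = \begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 2 & 0 & -1.5 \\
0 & 0 & 0 & 0 \\
0 & -1.5 & 0 & -2
\end{bmatrix}.
$$

For MPEG-1/2/4, the upper diagonal matrix is factorized by

$$
U_{4\times 4\_\mathrm{MPEG}} = \begin{bmatrix}
362 & 473 & 362 & 196 \\
362 & 196 & -362 & -473 \\
362 & -196 & -362 & 473 \\
362 & -473 & 362 & -196
\end{bmatrix} = C_1 \cdot \left[ 256 \cdot (C_4 + C_5 \cdot C_2) - (U_2 + U_3) \right], \tag{47}
$$

where

$$
U_2 = \begin{bmatrix}
22 & 0 & 22 & 0 \\
0 & 0 & 0 & 0 \\
22 & 0 & -22 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}, \quad
U_3 = \begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 4 & 0 & 39 \\
0 & 0 & 0 & 0 \\
0 & 39 & 0 & -4
\end{bmatrix}.
$$

In Eq. (47), the parameter “22” of U₂ is implemented by (C₅ · C₅ ≪ 4) – (C₁ ≪ 1), where “≪ 1” denotes a left shift by one bit, and the design in Eq. (47) requires 28 adders and 26 shifters.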
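The five factorizations in Eqs. (43)–(47) share the factor C₁ and differ only in the inner sparse term, which is what makes hardware sharing possible. A minimal numerical check, assuming the matrix entries exactly as given above, is sketched below. The helper names are illustrative; floating-point values are safe here because every fractional entry (0.25, 0.5, 1.5) is exactly representable in binary.

```python
def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def lin(*terms):
    """Weighted sum of equally sized matrices: lin((w1, m1), (w2, m2), ...)."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(w * m[i][j] for w, m in terms) for j in range(cols)]
            for i in range(rows)]


def upper(a, f, g):
    """Upper diagonal block U4x4 of A0 for coefficients a, f, g."""
    return [[a, f, a, g], [a, g, -a, -f], [a, -g, -a, f], [a, -f, a, -g]]


C1 = [[1, 0, 0, 1], [0, -1, 1, 0], [0, 1, 1, 0], [1, 0, 0, -1]]
C2 = [[1, 0, 1, 0], [0, -0.5, 0, 1], [1, 0, -1, 0], [0, 1, 0, 0.5]]
C3 = [[0, 0, 0, 0], [0, 0, 0, 0.25], [0, 0, 0, 0], [0, 0.25, 0, 0]]
C4 = [[0, 0, 0, 0], [0, 0, 0, 0.5], [0, 0, 0, 0], [0, 0.5, 0, 0]]
C5 = [[1.5 if i == j else 0 for j in range(4)] for i in range(4)]
U1 = [[0, 0, 0, 0], [0, 2, 0, -1.5], [0, 0, 0, 0], [0, -1.5, 0, -2]]
U2 = [[22, 0, 22, 0], [0, 0, 0, 0], [22, 0, -22, 0], [0, 0, 0, 0]]
U3 = [[0, 0, 0, 0], [0, 4, 0, 39], [0, 0, 0, 0], [0, 39, 0, -4]]

C5C2 = matmul(C5, C2)

checks = {
    "H.264/AVC, Eq. (43)":
        lin((8, matmul(C1, C2))) == upper(8, 8, 4),
    "AVS, Eq. (44)":
        lin((8, matmul(C1, lin((1, C2), (1, C3))))) == upper(8, 10, 4),
    "VC-1, Eq. (45)":
        lin((8, matmul(C1, lin((1, C4), (1, C5C2))))) == upper(12, 16, 6),
    "HEVC, Eq. (46)":
        lin((2, matmul(C1, lin((32, C2), (32, C3), (-1, U1)))))
        == upper(64, 83, 36),
    "MPEG-1/2/4, Eq. (47)":
        matmul(C1, lin((256, C4), (256, C5C2), (-1, U2), (-1, U3)))
        == upper(362, 473, 196),
}
for name, ok in checks.items():
    print(name, ok)
```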

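On the other hand, by Eq. (3), the lower diagonal matrix D_{4×4_AVC} for H.264/AVC is given by Eq. (48), and it needs 17 adders and eight shifters.

$$
D_{4\times 4\_\mathrm{AVC}} = \begin{bmatrix}
-3 & 6 & -10 & 12 \\
-6 & 12 & -3 & -10 \\
-10 & 3 & 12 & 6 \\
-12 & -10 & -6 & -3
\end{bmatrix} = 8 \cdot U_4 \cdot (D_4 + D_5) \cdot (D_2 + U_3), \tag{48}
$$

where

$$
U_4 = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & -1
\end{bmatrix}, \quad
D_4 = \begin{bmatrix}
-1 & -1 & 1 & 0 \\
1 & 0 & 1 & -1 \\
-1 & 1 & 0 & -1 \\
0 & 1 & 1 & 1
\end{bmatrix}, \quad
D_5 = \begin{bmatrix}
-0.5 & 0 & 0 & 0 \\
0 & 0 & 0.5 & 0 \\
0 & 0.5 & 0 & 0 \\
0 & 0 & 0 & 0.5
\end{bmatrix},
$$

$$
D_2 = \begin{bmatrix}
0.25 & 0 & 0 & 0 \\
0 & 0.25 & 0 & 0 \\
0 & 0 & -0.25 & 0 \\
0 & 0 & 0 & 0.25
\end{bmatrix}, \quad
U_3 = \begin{bmatrix}
0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0
\end{bmatrix}.
$$

For AVS, the D_{4×4_AVS} matrix is given by Eq. (49). D₄ and D₅ are shared with the design in Eq. (48), and U₃ and U₄ are also partially shared with the scheme in Eq. (48). The design in Eq. (49) costs 24 adders and 12 shifters.

$$
D_{4\times 4\_\mathrm{AVS}} = \begin{bmatrix}
-2 & 6 & -9 & 10 \\
-6 & 10 & -2 & -9 \\
-9 & 2 & 10 & 6 \\
-10 & -9 & -6 & -2
\end{bmatrix} = 4 \cdot U_4 \cdot (D_4 + D_5) \cdot D_3 \cdot (D_1 + U_3), \tag{49}
$$

where

$$
U_3 = \begin{bmatrix}
0 & -1 & 0 & 0 \\
0 & 0 & 0 & -1 \\
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}, \quad
D_3 = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}, \quad
D_1 = \begin{bmatrix}
1.5 & 0 & 0 & 0 \\
0 & 1.5 & 0 & 0 \\
0 & 0 & -1.5 & 0 \\
0 & 0 & 0 & 1.5
\end{bmatrix}.
$$

For VC-1, the D_{4×4_VC1} matrix is factorized by Eq. (50), and the design requires 21 adders and 12 shifters.

$$
D_{4\times 4\_\mathrm{VC1}} = \begin{bmatrix}
-4 & 9 & -15 & 16 \\
-9 & 16 & -4 & -15 \\
-15 & 4 & 16 & 9 \\
-16 & -15 & -9 & -4
\end{bmatrix} = 8 \cdot U_4 \cdot (D_4 \cdot D_6 + D_5) \cdot (D_2 + U_3), \tag{50}
$$

where

$$
D_6 = \begin{bmatrix}
1.5 & 0 & 0 & 0 \\
0 & 1.5 & 0 & 0 \\
0 & 0 & 1.5 & 0 \\
0 & 0 & 0 & 1.5
\end{bmatrix}.
$$

For HEVC, the D_{4×4_HEVC} matrix is expressed by Eq. (51), and it requires 44 adders and 20 shifters.

$$
D_{4\times 4\_\mathrm{HEVC}} = \begin{bmatrix}
-18 & 50 & -75 & 89 \\
-50 & 89 & -18 & -75 \\
-75 & 18 & 89 & 50 \\
-89 & -75 & -50 & -18
\end{bmatrix} = D_{4\times 4\_\mathrm{AVS}} \cdot 9 + \left[ 4 \cdot (U_5 \cdot D_1 + U_6) - U_7 \right], \tag{51}
$$

where

$$
U_5 = \begin{bmatrix}
0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{bmatrix}, \quad
U_6 = \begin{bmatrix}
0 & -1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0
\end{bmatrix}, \quad
U_7 = \begin{bmatrix}
0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-1 & 0 & 0 & 0
\end{bmatrix}.
$$

For MPEG-1/2/4, based on D_{4×4_AVS}, the D_{4×4_MPEG} matrix is presented by Eq. (52), and the design costs 48 adders and 32 shifters.

$$
D_{4\times 4\_\mathrm{MPEG}} = \begin{bmatrix}
-100 & 284 & -426 & 502 \\
-284 & 502 & -100 & -426 \\
-426 & 100 & 502 & 284 \\
-502 & -426 & -284 & -100
\end{bmatrix} = D_{4\times 4\_\mathrm{AVS}} \cdot 50 + \left[ 16 \cdot (U_5 \cdot D_1 + U_6) + 2 \cdot U_7 \right]. \tag{52}
$$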
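The lower-block factorizations can be checked in the same way. The sketch below covers Eqs. (48), (50), (51), and (52) only, and it takes D_{4×4_AVS} as the explicit matrix printed in Eq. (49) rather than rebuilding it from its factors. Here `U3` denotes the matrix defined together with Eq. (48); the helper names are illustrative.

```python
def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def lin(*terms):
    """Weighted sum of equally sized matrices: lin((w1, m1), (w2, m2), ...)."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(w * m[i][j] for w, m in terms) for j in range(cols)]
            for i in range(rows)]


def diag(*values):
    """Diagonal matrix with the given diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]


U4 = diag(1, 1, 1, -1)
D4 = [[-1, -1, 1, 0], [1, 0, 1, -1], [-1, 1, 0, -1], [0, 1, 1, 1]]
D5 = [[-0.5, 0, 0, 0], [0, 0, 0.5, 0], [0, 0.5, 0, 0], [0, 0, 0, 0.5]]
D2 = diag(0.25, 0.25, -0.25, 0.25)
U3 = [[0, 0, 0, -1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]  # Eq. (48)
D6 = diag(1.5, 1.5, 1.5, 1.5)
D1 = diag(1.5, 1.5, -1.5, 1.5)
U5 = [[0, 0, -1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
U6 = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]]
U7 = [[0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [-1, 0, 0, 0]]

D_AVC = [[-3, 6, -10, 12], [-6, 12, -3, -10], [-10, 3, 12, 6], [-12, -10, -6, -3]]
D_AVS = [[-2, 6, -9, 10], [-6, 10, -2, -9], [-9, 2, 10, 6], [-10, -9, -6, -2]]
D_VC1 = [[-4, 9, -15, 16], [-9, 16, -4, -15], [-15, 4, 16, 9], [-16, -15, -9, -4]]
D_HEVC = [[-18, 50, -75, 89], [-50, 89, -18, -75],
          [-75, 18, 89, 50], [-89, -75, -50, -18]]
D_MPEG = [[-100, 284, -426, 502], [-284, 502, -100, -426],
          [-426, 100, 502, 284], [-502, -426, -284, -100]]

right = lin((1, D2), (1, U3))          # common right factor (D2 + U3)
U5D1 = matmul(U5, D1)

checks = {
    "Eq. (48)": lin((8, matmul(matmul(U4, lin((1, D4), (1, D5))), right)))
                == D_AVC,
    "Eq. (50)": lin((8, matmul(matmul(U4, lin((1, matmul(D4, D6)), (1, D5))),
                               right))) == D_VC1,
    "Eq. (51)": lin((9, D_AVS), (4, U5D1), (4, U6), (-1, U7)) == D_HEVC,
    "Eq. (52)": lin((50, D_AVS), (16, U5D1), (16, U6), (2, U7)) == D_MPEG,
}
for name, ok in checks.items():
    print(name, ok)
```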

4.2. Hardware-sharing design for 4 × 4 transforms mode

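For AVS-M, the matrix M_{4×4_AVS} is presented by Eq. (53), and it requires 10 adders and six shifters.

$$
M_{4\times 4\_\mathrm{AVS}} = \begin{bmatrix}
2 & 3 & 2 & 1 \\
2 & 1 & -2 & -3 \\
2 & -1 & -2 & 3 \\
2 & -3 & 2 & -1
\end{bmatrix} = C_1 \cdot (2 \cdot C_2 + U_8), \tag{53}
$$

where

$$
U_8 = \begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{bmatrix}.
$$

For VC-1, M_{4×4_VC1} is expressed by Eq. (54), and the design requires 14 adders and 12 shifters.

$$
M_{4\times 4\_\mathrm{VC1}} = \begin{bmatrix}
17 & 22 & 17 & 10 \\
17 & 10 & -17 & -22 \\
17 & -10 & -17 & 22 \\
17 & -22 & 17 & -10
\end{bmatrix} = C_1 \cdot (16 \cdot C_2 + U_9), \tag{54}
$$

where

$$
U_9 = \begin{bmatrix}
1 & 0 & 1 & 0 \\
0 & -2 & 0 & 6 \\
1 & 0 & -1 & 0 \\
0 & 6 & 0 & 2
\end{bmatrix}.
$$

For VP8, all coefficients in the 4 × 4 transform matrix are multiplied by 128 to obtain integer values, and the design costs 18 adders and 14 shifters.

$$
M_{4\times 4\_\mathrm{VP8}} = \begin{bmatrix}
128 & 167 & 128 & 70 \\
128 & 70 & -128 & -167 \\
128 & -70 & -128 & 167 \\
128 & -167 & 128 & -70
\end{bmatrix} = C_1 \cdot (128 \cdot C_2 + U_{10}), \tag{55}
$$

where

$$
U_{10} = \begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & -6 & 0 & 39 \\
0 & 0 & 0 & 0 \\
0 & 39 & 0 & 6
\end{bmatrix}.
$$

The matrix U_{4×4_AVC}/8 equals the 4 × 4 inverse transform matrix in H.264/AVC. In addition, the matrix U_{4×4_HEVC} equals the 4 × 4 inverse transform matrix in HEVC. Thus, several multiplexers are used to share the hardware among the submatrices and thereby decrease the hardware cost.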
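Each 4 × 4 mode therefore reuses C₁ and a scaled C₂ and adds one small correction matrix. The following sketch checks Eqs. (53)–(55) against the general form of Eq. (5), together with the statement that U_{4×4_AVC}/8 = C₁ · C₂ is the H.264/AVC 4 × 4 matrix with {h, i, j} = {1, 1, 0.5}. C₁ and C₂ are the matrices defined with Eq. (43), and the helper names are illustrative.

```python
def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def lin(*terms):
    """Weighted sum of equally sized matrices: lin((w1, m1), (w2, m2), ...)."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(w * m[i][j] for w, m in terms) for j in range(cols)]
            for i in range(rows)]


def m4(h, i, j):
    """General 4 x 4 inverse transform matrix of Eq. (5)."""
    return [[h, i, h, j], [h, j, -h, -i], [h, -j, -h, i], [h, -i, h, -j]]


C1 = [[1, 0, 0, 1], [0, -1, 1, 0], [0, 1, 1, 0], [1, 0, 0, -1]]
C2 = [[1, 0, 1, 0], [0, -0.5, 0, 1], [1, 0, -1, 0], [0, 1, 0, 0.5]]
U8 = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 1, 0, 0]]
U9 = [[1, 0, 1, 0], [0, -2, 0, 6], [1, 0, -1, 0], [0, 6, 0, 2]]
U10 = [[0, 0, 0, 0], [0, -6, 0, 39], [0, 0, 0, 0], [0, 39, 0, 6]]

checks = {
    "AVS-M, Eq. (53)": matmul(C1, lin((2, C2), (1, U8))) == m4(2, 3, 1),
    "VC-1, Eq. (54)": matmul(C1, lin((16, C2), (1, U9))) == m4(17, 22, 10),
    "VP8, Eq. (55)": matmul(C1, lin((128, C2), (1, U10))) == m4(128, 167, 70),
    "H.264/AVC, C1*C2": matmul(C1, C2) == m4(1, 1, 0.5),
}
for name, ok in checks.items():
    print(name, ok)
```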

4.3. Architecture comparison

The hardware-sharing-based 1D multistandard inverse integer transform scheme has two inputs, which support the 4 × 4 and 8 × 8 transform modes. The hardware blocks that process the 4 × 4 inverse transforms are shared with those of the upper diagonal matrix U_8×8. Thus, several multiplexers are added to U_8×8 so that it computes the 4 × 4 inverse transforms without additional operations. For multistandard applications, the hardware-sharing architecture of the fast 1D 4 × 4 and 8 × 8 inverse integer transforms is illustrated in [41]. The shifters are realized by wiring. Compared with the individual designs without hardware sharing, Table 4 shows that the scheme in [41] decreases the numbers of shifters and adders by 50 and 75%, respectively.

Different 1D inverse integer transform modes	No. of adders	No. of shifters
Individual designs without hardware shares	336	180
Hardware-sharing-based design in Section 4	82	90
Reduction of cost	75%	50%

Table 4.

Hardware comparison between two architectures [41].

To implement the discussed architecture, a cell-based VLSI design flow is used to design, simulate, and verify the cost-effective hardware-sharing architecture. For fair comparisons among different transform structures, the gate counts are normalized by a normalized mode gain, which is defined as follows. Based on the matrix dimensions and without loss of generality [40], the normalized mode gains for the 32 × 32, 16 × 16, 8 × 8, and 4 × 4 inverse integer transform matrices are 16, 4, 1, and 1/4, respectively.

The hardware-sharing-based design in Section 3 supports the 4 × 4, 8 × 8, 16 × 16, and 32 × 32 inverse transform modes for HEVC. Thus, the normalized mode gain of the design is 21.25 (i.e., 16 + 4 + 1 + 0.25). Similarly, the hardware-shared design in Section 4 provides five 8 × 8 and five 4 × 4 inverse transform functions. Therefore, its normalized mode gain is 6.25 (i.e., 5 + 1.25) [41]. The normalized gate counts are then defined by [40, 41]

$$\text{Normalized gate counts} = \frac{\text{Gate counts}}{\text{Normalized mode gain}}. \tag{56}$$
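As a worked instance of Eq. (56), the sketch below recomputes the normalized mode gains and normalized gate counts from the gate counts and supported modes reported in Table 5. The per-design mode lists are transcribed from the "Supporting standards/Transforms" row of that table, with one list entry per standard and transform size; the data layout and function names are illustrative.

```python
# Normalized mode gain per transform size, as defined in the text.
MODE_GAIN = {32: 16, 16: 4, 8: 1, 4: 0.25}

HEVC_MODES = [4, 8, 16, 32]

# name: (gate count in K gates, supported sizes, one entry per standard/mode)
DESIGNS = {
    "Ahmed et al. [29]": (144.8, HEVC_MODES),
    "Section 3 design": (115.7, HEVC_MODES),
    "Shen et al. [26]": (134.8, [4, 8] * 2 + [8] * 2 + HEVC_MODES),
    "Martuza et al. [28]": (39.4, [4, 8] * 4),
    "Qi et al. [36]": (18.0, [4, 8] * 2 + [8]),
    "Wang et al. [38]": (23.06, [4, 8] * 2 + [8] * 2),
    "Section 4 design": (27.4, [4, 8] * 3 + [8] * 2 + [4] * 2),
}


def normalized_mode_gain(modes):
    """Sum of the per-size gains over all supported transform functions."""
    return sum(MODE_GAIN[m] for m in modes)


def normalized_gate_count(gates, modes):
    """Eq. (56): gate counts divided by the normalized mode gain."""
    return gates / normalized_mode_gain(modes)


TABLE = {
    name: (normalized_mode_gain(modes),
           round(normalized_gate_count(gates, modes), 2))
    for name, (gates, modes) in DESIGNS.items()
}
for name, (gain, norm) in TABLE.items():
    print(f"{name}: gain = {gain}, normalized gate count = {norm} K")
```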

Table 5 shows the hardware cost comparisons among different 1D multiple transform architectures, which include single-standard multiple-mode [32] and multiple-standard multiple-mode [41] transform designs.

Architecture	Ahmed et al. [29]	Hardware-sharing-based design in Section 3	Shen et al. [26]	Martuza et al. [28]	Qi et al. [36]	Wang et al. [38]	Hardware-sharing-based design in Section 4
Gate counts	144.8 K	115.7 K	134.8 K	39.4 K	18 K	23.06 K	27.4 K
Normalized mode gain	21.25	21.25	25.75	5	3.5	4.5	6.25
Normalized gate counts	6.81 K	5.44 K	5.23 K	7.88 K	5.14 K	5.12 K	4.38 K
Supporting modes	Single-standard multiple-mode	Single-standard multiple-mode	Multiple-standard multiple-mode	Multiple-standard multiple-mode	Multiple-standard multiple-mode	Multiple-standard multiple-mode	Multiple-standard multiple-mode
Supporting standards/transforms	HEVC: 4 × 4, 8 × 8, 16 × 16, 32 × 32 modes	HEVC: 4 × 4, 8 × 8, 16 × 16, 32 × 32 modes	H.264/AVC, VC-1: 4 × 4, 8 × 8 modes; MPEG-1/2/4, AVS: 8 × 8 mode; HEVC: 4 × 4, 8 × 8, 16 × 16, 32 × 32 modes	H.264/AVC, VC-1, AVS, HEVC: 4 × 4, 8 × 8 modes	H.264/AVC, VC-1: 4 × 4, 8 × 8 modes; MPEG-1/2/4: 8 × 8 mode	H.264/AVC, VC-1: 4 × 4, 8 × 8 modes; MPEG-1/2/4, AVS: 8 × 8 mode	H.264/AVC, VC-1, HEVC: 4 × 4, 8 × 8 modes; MPEG-1/2/4, AVS: 8 × 8 mode; VP8, AVS-M: 4 × 4 mode

Table 5.

Hardware cost comparisons among different 1D multiple transform architectures [32, 41].

5. Conclusion

For the single-standard multiple-mode transform design, this chapter discussed a cost-effective and hardware-efficient design of the 4 × 4, 8 × 8, 16 × 16, and 32 × 32 inverse core transforms in HEVC. By exploiting the symmetry of the elements, the core transform matrices were factorized into several submatrices. Thus, the hardware of the (N/2) × (N/2) inverse core transform was shared with that of the N × N inverse core transform for N = 32, 16, and 8. Compared with the direct design without hardware sharing, the transform scheme in Section 3 decreased the hardware cost of adders and shifters by 32 and 36%, respectively. In addition, for VLSI implementation, the design in Section 3 requires fewer normalized gate counts than the design in [29].

For the multiple-standard multiple-mode transform design, this chapter also discussed the fast algorithm and hardware-sharing-based design of the 4 × 4 and/or 8 × 8 inverse transforms of H.264/AVC, VC-1, HEVC, MPEG-1/2/4, AVS, and VP8 for multistandard video decoders. The matrix decomposition scheme was used to develop a hardware-shared design that uses only shifters and adders. Compared with the individual fast-algorithm-based implementations, the structure in Section 4 decreased the numbers of shifters and adders by 50 and 75%, respectively. In addition, for VLSI implementation, the design in Section 4 requires fewer normalized gate counts than the designs in [26, 28, 36, 38].

Acknowledgments

This work was supported by Ministry of Science and Technology, Taiwan, R.O.C. under Grant MOST 105-2221-E-005-078.

References

1. K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, New York, NY: Academic, 1990.
2. ISO/IEC JTC 1/SC 29/WG 1—Coding of Still Pictures, 2009.
3. ISO/IEC 11172-2 MPEG-1 Video Coding Standard, Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1,5 Mbit/s – Part 2: Video, 1993.
4. ISO/IEC 13818-2 MPEG-2 Video Coding Standard, Information Technology—Generic Coding of Moving Pictures and Associated Audio Information: Video, 1995.
5. ISO/IEC 14496-2 MPEG-4 Video Coding Standard, Information Technology—Coding of Audio-Visual Objects – Part 2: Visual, 2004.
6. T. Wiegand and G. Sullivan, Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, (ITU-T rec. H.264/ISO/IEC 14496-10 AVC, presented at Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG), 2003.
7. Iain E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, John Wiley & Sons, Hoboken, NJ, United States, 2003.
8. W. Gao, C. Reader, F. Wu, Y. He, L. Yu, H. Lu, S. Yang, T. Huang, and X. Pan, AVS—The Chinese Next-Generation Video Coding Standard, National Association of Broadcasters (NAB) Conference, 2004.
9. L. Yu, S. Chen, and J. Wang, Overview of AVS video coding standards, Signal Processing: Image Communication, vol. 24, issue 4, pp. 247–262, April 2009.
10. SMPTE, Standard for Television: VC-1 Compressed Video Bitstream Format and Decoding Process, SMPTE 421M-2006.
11. J. Bankoski, P. Wilkins, and Y. Xu, Technical overview of VP8, an open source video codec for the web, IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, July 11–15, 2011.
12. M. T. Pourazad, C. Doutre, M. Azimi, and P. Nasiopoulos, HEVC: the new gold standard for video compression: How does HEVC compare with H.264/AVC ?, IEEE Consumer Electronics Magazine, vol. 1, pp. 36–46, July 2012.
13. H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, Low-complexity transform and quantization in H.264/AVC, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 598–603, July 2003.
14. S. Gordon, D. Marple, and T. Wiegand, Simplified use of 8x8 transforms—updated proposal and results, JVT-K028, 11th Meeting, Munich, Germany, March 2004.
15. S. Srinivasan, P. Hsu, T. Holcomb, K. Mukerjee, S. L. Regunathan, B. Lin, J. Liang, M. C. Lee, and J. Ribas-Corbera, Windows media video 9: overview and applications, Signal Processing: Image Communication, vol. 19, issue 9, pp. 851–875, October 2004.
16. S. Srinivasan and S. L. Regunathan, An overview of VC-1, Proceedings of the SPIE, Visual Communications and Image Processing (VCIP), Beijing, China, vol. 5960, pp. 720–728, July 2005.
17. T. C. Wang, Y. W. Huang, H. C. Fang, and L. G. Chen, Parallel 4x4 2D transform and inverse transform architecture for MPEG-4 AVC/H.264, IEEE International Symposium on Circuits and Systems, vol. 2, pp. 800–803, 2003.
18. Z. Y. Cheng, C. H. Chen, B. D. Liu, and J. F. Yang, High throughput 2-D transform architectures for H.264 advanced video coders, IEEE Asia-Pacific Conference on Circuits and Systems, pp. 1141–1144, December 2004.
19. K. H. Chen, J. I. Guo, and J. S. Wang, A high-performance direct 2-D transform coding IP design for MPEG-4 AVC/H.264, IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 4, pp. 472–483, April 2006.
20. G. A. Su and C. P. Fan, Cost effective hardware sharing architecture for fast 1-D 8x8 forward and inverse integer transforms of H.264/AVC high profile, IEEE Asia Pacific Conference on Circuits and Systems, pp. 1332–1335, November 2008.
21. T. T. T. Do and T. M. Le, High throughput area-efficient SoC-based forward/inverse integer transform for H.264/AVC, IEEE International Symposium on Circuits and Systems, pp. 4113–4116, May 2010.
22. W. Hwangbo and C. M. Kyung, A multi-transform architecture for H.264/AVC high-profile coders, IEEE Transactions on Multimedia, vol. 12, no. 3, pp. 157–167, April 2010.
23. M. L. Hsia and Oscal T. C. Chen, Low-complexity inverse integer transform in H.264/AVC, IEEE International Conference on Multimedia & Expo, pp. 826–830, July 2010.
24. M. Nadeem, S. Wong, and G. Kuzmanov, Inverse integer transform in H.264/AVC intra-frame encoder, Sixth IEEE International Symposium on Electronic Design, Test and Application, pp. 228–233, 2011.
25. R. Jeske, J. C. de Souza, G. Wrege, R. Conceicao, M. Grellert, J. Mattos, and L. Agostini, Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard, Conference on Programmable Logic (SPL), pp. 1–6, March 2012.
26. S. Shen, W. Shen, Y. Fan, and Xiaoyang Zeng, A unified 4/8/16/32-point integer IDCT architecture for multiple video coding standards, IEEE International Conference on Multimedia and Expo (ICME), pp. 788–793, July 2012.
27. W. Zhao, T. Onoye, and T. Song, High-performance multiplierless transform architecture for HEVC, IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1668–1671, 2013.
28. M. Martuza and K. A. Wahid, Implementation of a cost shared transform architecture for multiple video codecs, Journal of Real-Time Image Processing, vol. 10, no. 1, pp. 151–162, March 2015.
29. A. Ahmed, M. U. Shahid, and A. Rehman, N point DCT VLSI architecture for emerging HEVC standard, VLSI Design, volume 2012, Article ID 752024, pp. 1–13, 2012.
30. Joint Collaborative Team—Video Coding, CE10: Core transform design for HEVC, JCTVC-G495, Geneva, Switzerland, 21–30, November 2011.
31. M. N. Haggag, M. El-Sharkawy, and G. Fahmy, Efficient fast multiplication-free integer transformation for the 2-D DCT H.265 standard, IEEE International Conference on Image Processing, pp. 3769–3772, September 2010.
32. C. W. Chang, H. F. Hsu, C. P. Fan, C. B. Wu, and Robert C. H. Chang, A fast algorithm-based cost-effective and hardware-efficient unified architecture design of 4×4, 8×8, 16×16, and 32×32 inverse core transforms for HEVC, Journal of Signal Processing Systems, vol. 82, no. 1, pp. 69–89, 2016.
33. S. Lee and K. Cho, Architecture of transform circuit for video decoder supporting multiple standards, Electronics Letters, vol. 44, no. 4, pp. 274–275, February 2008.
34. C. P. Fan and G. A. Su, Efficient low cost sharing design of fast 1-D inverse integer transform algorithms for H.264/AVC and VC-1, IEEE Signal Processing Letters, vol. 15, pp. 926–929, December 2008.
35. G. A. Su and C. P. Fan, Low-cost hardware sharing architecture of fast 1-D inverse transforms for H.264/AVC and AVS applications, IEEE Transactions on Circuits and Systems, Part II, vol. 55, no. 12, pp. 1249–1253, December 2008.
36. H. Qi, Q. Huang, and W. Gao, A low-cost very large scale integration architecture for multistandard inverse transform, IEEE Transactions on Circuits and Systems, Part II, vol. 57, no. 7, pp. 551–555, July 2010.
37. Y. K. Lai and Y. F. Lai, A Reconfigurable IDCT architecture for universal video decoders, IEEE Transactions on Consumer Electronics, vol. 56, no. 3, pp. 1872–1879, August 2010.
38. K. Wang, J. Chen, W. Cao, Y. Wang, L. Wang, and J. Tong, A reconfigurable multi-transform VLSI architecture supporting video codec design, IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 58, no. 7, pp. 432–436, July 2011.
39. K. Wahid, M. Martuza, M. Das, and C. McCrosky, Resource shared architecture of multiple transforms for multiple video codecs, 24th Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 000947–000950, 2011.
40. C. P. Fan, C. W. Chang, and S. J. Hsu, Cost effective hardware sharing design of fast algorithm based multiple forward and inverse transforms for H.264/AVC, MPEG-1/2/4, AVS, and VC-1 video encoding and decoding applications, IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 4, pp. 714–720, April 2014.
41. C. W. Chang, H. F. Hsu, and C. P. Fan, High-efficiency multiple 4x4 and 8x8 inverse transform design with a cost-effective unified architecture for multistandard video decoders, 2014 IEEE Asia Pacific Conference on Circuits & Systems, Okinawa, Japan, pp. 507–510, November 2014.
42. C. W. Chang, Fast algorithm based cost-effective and hardware-sharing architecture designs of multiple-mode discrete integer transforms for multi-standard video Codecs, Ph.D. dissertation, National Chung Hsing University, Taiwan, 2015.
43. http://en.wikipedia.org/wiki/Kronecker_product

[1] 1. J. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantage, Applications, New York, NY: Academic, 1990.

[2] 2. ISO/IEC JTC 1/SC 29/WG 1—Coding of Still Pictures, 2009.

[3] 3. ISO/IEC 11172-2 MPEG-1 Video Coding Standard, Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1,5 Mbit/s – Part 2: Video, 1993.

[4] 4. ISO/IEC 13818-2 MPEG-2 Video Coding Standard, Information Technology—Generic Coding of Moving Pictures and Associated Audio Information: Video, 1995.

[5] 5. ISO/IEC 14496-2 MPEG-4 Video Coding Standard, Information Technology—Coding of Audio-Visual Objects – Part 2: Visual, 2004.

[6] 6. T. Wiegand and G. Sullivan, Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, (ITU-T rec. H.264/ISO/IEC 14496-10 AVC, presented at Joint Video Team (JVC) of ISO/IEC MPEG and ITU-T VCEG), 2003.

[7] 7. Iain E. G. Richardson, H.264 and MPEG-4 Video Compression—Video Coding for Next-generation Multimedia, John Wiley & Sons, 111 River Street, Hoboken NJ07030-5774, New Jersey, United States, 2003.

[8] 8. W. Gao, C. Reader, F. Wu, Y. He, L. Yu, H. Lu, S. Yang, T. Huang, and X. Pan, AVS—The Chinese Next-Generation Video Coding Standard, National Association of Broadcasters (NAB) Conference, 2004.

[9] 9. L. Yu, S. Chen, and J. Wang, Overview of AVS video coding standards, Signal Processing: Image Communication, vol. 24, issue 4, pp. 247–262, April 2009.

[10] 10. SMPTE, Standard for Television: VC-1 Compressed Video Bitstream Format and Decoding Process, SMPTE 421M-2006.

[11] 11. J. Bankoski, P. Wilkins, and Y. Xu, Technical overview of VP8, an open source video codec for the web, IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, July 11–15, 2011.

[12] 12. M. T. Pourazad, C. Doutre, M. Azimi, and P. Nasiopoulos, HEVC: the new gold standard for video compression: How does HEVC compare with H.264/AVC ?, IEEE Consumer Electronics Magazine, vol. 1, pp. 36–46, July 2012.

[13] 13. H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, Low-complexity transform and quantization in H.264/AVC, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 598–603, July 2003.

[14] 14. S. Gordon, D. Marple, and T. Wiegand, Simplified use of 8x8 transforms—updated proposal and results, JVT-K028, 11th Meeting, Munich, Germany, March 2004.

[15] 15. S. Srinivasan, P. Hsu, T. Holcomb, K. Mukerjee, S. L. Regunathan, B. Lin, J. Liang, M. C. Lee, and J. Ribas-Corbera, Windows media video 9: overview and applications, Signal Processing: Image Communication, vol. 19, issue 9, pp. 851–875, October 2004.

[16] 16. S. Srinivasan and S. L. Regunathan, An overview of VC-1, Proceedings of the SPIE, Visual Communications and Image Processing (VCIP), Beijing, China, vol. 5960, pp. 720–728, July 2005.

[17] 17. T. C. Wang, Y. W. Huang, H. C. Fang, and L. G. Chen, Parallel 4x4 2D transform and inverse transform architecture for MPEG-4 AVC/H.264, IEEE International Symposium on Circuits and Systems, vol. 2, pp. 800–803, 2003.

[18] 18. Z. Y. Cheng, C. H. Chen, B. D. Liu, and J. F. Yang, High throughput 2-D transform architectures for H.264 advanced video coders, IEEE Asia-Pacific Conference on Circuits and Systems, pp. 1141–1144, December 2004.

[19] 19. K. H. Chen, J. I. Guo, and J. S. Wang, A high-performance direct 2-D transform coding IP design for MPEG-4 AVC/H.264, IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 4, pp. 472–483, April 2006.

[20] 20. G. A. Su and C. P. Fan, Cost effective hardware sharing architecture for fast 1-D 8x8 forward and inverse integer transforms of H.264/AVC high profile, IEEE Asia Pacific Conference on Circuits and Systems, pp. 1332–1335, November 2008.

[21] 21. T. T. T. Do and T. M. Le, High throughput area-efficient SoC-based forward/inverse integer transform for H.264/AVC, IEEE International Symposium on Circuits and Systems, pp. 4113–4116, May 2010.

[22] 22. W. Hwangbo and C. M. Kyung, A multi-transform architecture for H.264/AVC high-profile coders, IEEE Transactions on Multimedia, vol. 12, no. 3, pp. 157–167, April 2010.

[23] 23. M. L. Hsia and Oscal T. C. Chen, Low-complexity inverse integer transform in H.264/AVC, IEEE International Conference on Multimedia & Expo, pp. 826–830, July 2010.

[24] 24. M. Nadeem, S. Wong, and G. Kuzmanov, Inverse integer transform in H.264/AVC intra-frame encoder, Sixth IEEE International Symposium on Electronic Design, Test and Application, pp. 228–233, 2011.

[25] 25. R. Jeske, J. C. de Souza, G. Wrege, R. Conceicao, M. Grellert, J. Mattos, and L. Agostini, Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard, Conference on Programmable Logic (SPL), pp. 1–6, March 2012

[26] 26. S. Shen, W. Shen, Y. Fan, and Xiaoyang Zeng, A unified 4/8/16/32-point integer IDCT architecture for multiple video coding standards, IEEE International Conference on Multimedia and Expo (ICME), pp. 788–793, July 2012.

[27] 27. W. Zhao, T. Onoye, and T. Song, High-performance multiplierless transform architecture for HEVC, IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1668–1671, 2013.

[28] 28. M. Martuza, K. A. Wahid, Implementation of a cost shared transform architecture for multiple video codecs, Journal of Real-Time Image Processing, vol. 10, no. 1, pp. 151–162, March 2015.

[29] 29. A. Ahmed, M. U. Shahid, and A. Rehman, N point DCT VLSI architecture for emerging HEVC standard, VLSI Design, volume 2012, Article ID 752024, pp. 1–13, 2012.

[30] 30. Joint Collaborative Team—Video Coding, CE10: Core transform design for HEVC, JCTVC-G495, Geneva, Switzerland, 21–30, November 2011.

[31] 31. M. N. Haggag, M. El-Sharkawy, and G. Fahmy, Efficient fast multiplication-free integer transformation for the 2-D DCT H.265 standard, IEEE International Conference on Image Processing, pp. 3769–3772, September 2010.

[32] 32. C. W. Chang, H. F. Hsu, C. P. Fan, C. B. Wu, and Robert C. H. Chang, A fast algorithm-based cost-effective and hardware-efficient unified architecture design of 4×4, 8×8, 16×16, and 32×32 inverse core transforms for HEVC, Journal of Signal Processing Systems, vol. 82, no. 1, pp. 69–89, 2016.

[33] 33. S. Lee and K. Cho, Architecture of transform circuit for video decoder supporting multiple standards, Electronics Letters, vol. 44, no. 4, pp. 274–275, February 2008.

[34] 34. C. P. Fan and G. A. Su, Efficient low cost sharing design of fast 1-D inverse integer transform algorithms for H.264/AVC and VC-1, IEEE Signal Processing Letters, vol. 15, pp. 926–929, December 2008.

[35] 35. G. A. Su and C. P. Fan, Low-cost hardware sharing architecture of fast 1-D inverse transforms for H.264/AVC and AVS applications, IEEE Transactions on Circuits and Systems, Part II, vol. 55, no. 12, pp. 1249–1253, December 2008.

[36] 36. H. Qi, Q. Huang, and W. Gao, A low-cost very large scale integration architecture for multistandard inverse transform, IEEE Transactions on Circuits and Systems, Part II, vol. 57, no. 7, pp. 551–555, July 2010.

[37] 37. Y. K. Lai and Y. F. Lai, A Reconfigurable IDCT architecture for universal video decoders, IEEE Transactions on Consumer Electronics, vol. 56, no. 3, pp. 1872–1879, August 2010.

[38] 38. K. Wang, J. Chen, W. Cao, Y. Wang, L. Wang, and J. Tong, A reconfigurable multi-transform VLSI architecture supporting video codec design, IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 58, no. 7, pp. 432–436, July 2011.

[39] 39. K. Wahid, M. Martuza, M. Das, and C. McCrosky, Resource shared architecture of multiple transforms for multiple video codecs, 24th Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 000947–000950, 2011.

[40] 40. C. P. Fan, C. W. Chang, and S. J. Hsu, Cost effective hardware sharing design of fast algorithm based multiple forward and inverse transforms for H.264/AVC, MPEG-1/2/4, AVS, and VC-1 video encoding and decoding applications, IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 4, pp. 714–720, April 2014.

[41] 41. C. W. Chang, H. F. Hsu, and C. P. Fan, High-efficiency multiple 4x4 and 8x8 inverse transform design with a cost-effective unified architecture for multistandard video decoders, 2014 IEEE Asia Pacific Conference on Circuits & Systems, Okinawa, Japan, pp. 507–510, November 2014.

[42] 42. C. W. Chang, Fast algorithm based cost-effective and hardware-sharing architecture designs of multiple-mode discrete integer transforms for multi-standard video Codecs, Ph.D. dissertation, National Chung Hsing University, Taiwan, 2015.

[43] 43. http://en.wikipedia.org/wiki/Kronecker_product

Recent Advances in Image and Video Coding
