WIMAX has gained wide popularity due to the growing interest in and diffusion of broadband wireless access systems. In order to be flexible and reliable, WIMAX adopts several different channel codes, namely convolutional codes (CC), convolutional turbo codes (CTC), block turbo codes (BTC) and low-density parity-check (LDPC) codes, which are able to cope with different channel conditions and application needs.
On the other hand, high performance digital CMOS technologies have reached such a level of development that very complex algorithms can be implemented in low cost chips. Moreover, embedded processors, digital signal processors, programmable devices such as FPGAs, application specific instruction-set processors and VLSI technologies have come to the point where the computing power and the memory required to execute several real time applications can be incorporated even in cheap portable devices.
Among the several application fields that have been strongly reinforced by this technological progress, channel decoding is one of the most significant and interesting ones. In fact, it is known that the design of efficient architectures to implement such channel decoders is a hard task, made even harder by the high throughput required by WIMAX systems, which is up to about 75 Mb/s per channel. In particular, CTC and LDPC codes, whose decoding algorithms are iterative, are still a major topic of interest in the scientific literature, and the design of efficient architectures is still fostering several research efforts both in industry and academia.
In this chapter, the design of VLSI architectures for WIMAX channel decoders will be analyzed with emphasis on three main aspects: performance, complexity and flexibility. The chapter is divided into two main parts. The first part deals with the impact of system requirements on the decoder design, with emphasis on memory requirements, the structure of the key components of the decoders and the need for parallel architectures. To that purpose, a quantitative approach is adopted to derive key architectural choices from the system specifications; the most important architectures available in the literature are also described and compared.
The second part concentrates on a significant case study: the design of a complete CTC decoder architecture for WIMAX, including also hardware units for the depuncturing (bit-deselection) and external deinterleaving (sub-block deinterleaver) functions.
2. From system specifications to architectural choices
The system specifications and in particular the requirement of a peak throughput of about 75 Mb/s per channel imposed by the WIMAX standard have a significant impact on the decoder architecture. In the following sections we analyze the most significant architectures proposed in the literature to implement CC decoders (Viterbi decoders), BTC, CTC and LDPC decoders.
2.1. Viterbi decoders
The most widely used algorithm to decode CCs is the Viterbi algorithm (Viterbi, 1967), which is based on finding the shortest path along a graph that represents the CC trellis. As an example, Fig. 1 shows a binary 4-state CC as a feedback shift register (a) together with the corresponding state diagram (b) and trellis (c) representations.
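As a concrete illustration of the shortest-path search, the following sketch encodes and hard-decision decodes a hypothetical 4-state, rate-1/2 code with generators (7,5) in octal; it illustrates the algorithm only, not the 64-state WIMAX code discussed below.

```python
def conv_encode(bits, g=(0b111, 0b101)):
    """Encode with a 4-state, rate-1/2 feed-forward shift register."""
    state = 0
    out = []
    for b in bits:
        reg = (b << 2) | state                       # newest bit in the MSB
        out += [bin(reg & gi).count("1") & 1 for gi in g]
        state = reg >> 1                             # keep the two newest bits
    return out


def viterbi_decode(coded, g=(0b111, 0b101), n_states=4):
    """Hard-decision Viterbi decoding: shortest path through the trellis."""
    INF = float("inf")
    metric = [0.0] + [INF] * (n_states - 1)          # encoder starts in state 0
    paths = [[] for _ in range(n_states)]
    for r in zip(coded[0::2], coded[1::2]):          # one symbol per trellis step
        new_metric = [INF] * n_states
        new_paths = [[] for _ in range(n_states)]
        for s in range(n_states):
            if metric[s] == INF:                     # state not yet reachable
                continue
            for b in (0, 1):
                reg = (b << 2) | s
                expected = [bin(reg & gi).count("1") & 1 for gi in g]
                bm = sum(e != x for e, x in zip(expected, r))  # branch metric
                ns = reg >> 1
                if metric[s] + bm < new_metric[ns]:  # compare-select
                    new_metric[ns] = metric[s] + bm
                    new_paths[ns] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    return paths[min(range(n_states), key=lambda s: metric[s])]
```

Each trellis step performs the classical add-compare-select (ACS) update on every state; a real decoder replaces the growing path lists with a traceback memory of fixed depth.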
In the given example, the feedback shift register implementation of the encoder generates two output bits,
At each trellis step
As it can be inferred from (1)
The WIMAX standard specifies a binary 64-state CC with rate 0.5, whose shift register representation is shown in Fig. 3. Usually, Viterbi decoder architectures exploit the intrinsic parallelism of the trellis to compute all the branch metrics and update all the state metrics simultaneously at each trellis step. Thus, said
To improve the decoder throughput, two (Black & Meng, 1992) or more (Fettweis & Meyr, 1989; Kong & Parhi, 2004; Cheng & Parhi, 2008) trellis steps can be processed concurrently. These solutions lead to the so-called higher radix or
Thus, to achieve the throughput required by the WIMAX standard with a clock frequency in the range of tens to a few thousand MHz,
However, since CCs are widely used in many communication systems, some recent works, such as (Batcha & Shameri, 2007) and (Kamuf et al., 2008), address the design of flexible Viterbi decoders that are able to support different CCs. As a further step, (Vogt & Wehn, 2008) proposed a multi-code decoder architecture, able to support both CCs and CTCs.
2.2. BTC decoders
Block Turbo Codes or product codes are serially concatenated block codes. Given two block codes
If the Chase search fails the extrinsic information is approximated as
The decoder that receives the extrinsic information uses an updated version of
Several works in the literature deal with BTC complexity reduction. As an example, (Adde & Pyndiah, 2000) suggests computing
Due to its row-column structure, the block turbo decoder can be parallelized by instantiating several elementary decoders to process more rows or columns concurrently, thus increasing the throughput. As a significant example, a fully parallel BTC decoder is proposed in (Jego et al., 2006). This solution instantiates
A detailed analysis of the throughput and complexity of BTC decoder architectures can be found in (Goubier et al., 2008) and (Le Bidan et al., 2008). In particular, according to (Goubier et al., 2008), a simple one-block decoder architecture that performs the row/column decoding sequentially (interleaved architecture) requires 2
where I is the number of iterations and
Considering the interleaved architecture described in (Goubier et al., 2008), where a fully decoded block is output every 4.5 half iterations, a throughput of 75 Mb/s can be achieved with a clock frequency of 84 MHz, 31 MHz and 14 MHz for H(15,11), H(31,26) and
2.3. CTC decoders
Convolutional turbo codes were proposed in 1993 by Berrou, Glavieux and Thitimajshima (Berrou et al., 1993) as a coding scheme based on the parallel concatenation of two CCs by means of an interleaver (Π), as shown in Fig. 5 (a). The decoding algorithm is iterative and is based on the BCJR algorithm (Bahl et al., 1974) applied to the trellis representation of each constituent CC (Fig. 5 (b)). The key idea relies on the fact that the extrinsic information output by one CC is used as an updated version of the input a-priori information by the other CC. As a consequence, each iteration is made of two half iterations: in one half iteration the data are processed according to the interleaver (Π) and in the other half iteration according to the deinterleaver (Π-1). The same result can be obtained by implementing an in-order read/write half iteration and a scrambled (interleaved) read/write half iteration. The basic block in a turbo decoder is a SISO module that implements the BCJR algorithm in its logarithmic likelihood ratio (LLR) form. If we consider a Recursive Systematic CC (RSC code), the extrinsic information λk(u;O) of an uncoded symbol u at trellis step k output by a SISO is
where ũ is an uncoded symbol taken as a reference (usually ũ=0), e represents a certain transition on the trellis and
The CTC specified in the WIMAX standard is based on a double binary 8-state constituent CC, as shown in Fig. 6, where each CC receives two uncoded bits (A, B) and produces four coded bits, namely two systematic bits (A, B) and two parity bits (Y, W). As a consequence, at each trellis step four transitions connect a starting state to four possible ending states. Due to the trellis symmetry, only 16 branch metrics out of the possible 32 are required at each trellis step. As pointed out in (Muller et al., 2006), high throughput can be achieved by exploiting the trellis parallelism, namely by computing all the branch and state metrics concurrently.
The 16 branch metrics are computed by a BMU that implements (12), as shown in Fig. 7. To reduce the latency of the SISO, the decoding is usually based on a sliding-window approach (Benedetto et al., 1996). As a consequence, at least two BMUs are required to compute the two recursions (forward and backward) according to the BCJR algorithm. However, since the β metrics need to be trained between successive windows, a further BMU is usually required. A solution based on the inheritance of the border metrics of each window (Abbasfar & Yao, 2003) requires only two BMUs. Furthermore, this strategy reduces the SISO latency to the sliding window width W. The state metrics are updated according to (10) and (11) by two state metric processors, each of which is made of a proper number of processing elements (PE). As shown in Fig. 7, 8 PEs are required for the WIMAX CTC. It is worth pointing out that the constituent codes of the WIMAX CTC use the circulation-state tailbiting strategy proposed in (Weiss et al., 2001), which ensures that the ending state of the last trellis step is equal to the starting state of the first trellis step. However, this technique requires estimating the circulation state at the decoder side. Since training operations to estimate the circulation state would increase the SISO latency, an effective alternative (Zhan et al., 2006) is to inherit these metrics from the previous iteration.
As in Viterbi decoder architectures, in CTC decoders the state metrics are often computed by means of the “wrapping” representation technique proposed in (Hekstra, 1989). This solution requires a normalization stage, depicted in Fig. 7, when combining the α, β and γ metrics to compute the extrinsic information as in (8). The last stage of the output processor, which computes the output extrinsic information, is a tree of max blocks for each component of the extrinsic information and a few adders to implement (8). As highlighted in Fig. 7, this scheduling requires a buffer to store the input LLRs that are used to compute the backward recursion (BMU-MEM). Since the output extrinsic information is computed during the backward recursion, the forward recursion metrics are stored in a buffer (α-MEM). Further memory is required to implement the border metric inheritance: α-EXT-MEM, β-EXT-MEM and β-LOC-MEM.
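The max blocks mentioned above implement the max* (Jacobian logarithm) operator of the Log-MAP algorithm, max*(a,b) = max(a,b) + ln(1 + e^-|a-b|); the widely used Max-Log-MAP variant (Robertson et al., 1995) simply drops the correction term. A minimal sketch:

```python
import math

def max_star(a, b):
    """Exact Jacobian logarithm: max*(a, b) = ln(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a, b):
    """Max-Log-MAP approximation: the correction term is dropped."""
    return max(a, b)
```

In hardware the correction term is typically stored in a small look-up table addressed by |a-b|, since it decays quickly to zero.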
The throughput sustained by the CTC decoder, defined as the number of decoded bits over the time required for their computation, is
Usually, optimized architectures (Masera et al., 1999; Bickerstaff et al., 2003; Kim & Park, 2008) are obtained with
It is worth pointing out that parallel architectures increase not only the throughput but also the complexity of the decoder, so that some recent works aim at reducing the amount of memory required to implement the SISO local buffers. In (Liu et al., 2007) and (Kim & Park, 2008) the saturation of forward state metrics and the quantization of border backward state metrics are proposed. Further studies have been performed to reduce the extrinsic information bit width by using adaptive quantization (Singh et al., 2008), pseudo-floating point representation (Park et al., 2008) and bit level representation (Kim & Park, 2009).
2.4. LDPC code decoders
LDPC codes were originally introduced in 1962 by Gallager (Gallager, 1962) and rediscovered in 1996 by MacKay and Neal (MacKay, 1996). Like turbo codes, they achieve near optimum error correction performance and are decoded by means of high complexity iterative algorithms.
An LDPC code is a linear block code defined by a
LDPC codes are usually decoded by means of an iterative algorithm variously known as sum-product, belief propagation or message passing, usually reformulated in a version that processes logarithmic likelihood ratios instead of probabilities. In the first half of each iteration, variable nodes receive data from the adjacent check nodes and from the channel and use them to obtain updated information that is sent to the check nodes; in the second half, check nodes take the updated information received from the connected variable nodes and generate new messages to be sent back to the variable nodes.
In message passing decoders, messages are exchanged along the edges of the Tanner graph, and computations are performed at the nodes. To avoid multiplications and divisions, the decoder usually works in the logarithmic domain.
The message passing algorithm is described in the following equations, where
Each variable node is initialized with the log-likelihood ratio (LLR)
The check node computes new check to variable messages as
After a number of iterations that strongly depends on the addressed application and code rate (typically 5 to 40), variable nodes compute an overall estimation of the decoded bit in the form
where the sign of
A large implementation complexity is associated with (19), which is simplified in different ways. First of all, function
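As an example of such simplifications, the normalized min-sum approximation (Chen et al., 2005) replaces the exact check-node rule with a minimum over the magnitudes of the other incoming messages, scaled by a normalization factor; the sketch below assumes an illustrative factor α = 0.8:

```python
def check_node_update(v2c, alpha=0.8):
    """Normalized min-sum check-node update.

    v2c: incoming variable-to-check messages (LLRs) on the edges of one
    check node. Returns one check-to-variable message per edge; each is
    computed from all the *other* incoming messages (extrinsic principle)."""
    c2v = []
    for i in range(len(v2c)):
        others = v2c[:i] + v2c[i + 1:]
        sign = 1
        for m in others:                      # product of the signs
            if m < 0:
                sign = -sign
        mag = min(abs(m) for m in others)     # minimum of the magnitudes
        c2v.append(alpha * sign * mag)
    return c2v
```

The sign of each outgoing message is the product of the signs of the other incoming messages, exactly as in the exact rule; only the magnitude computation is approximated, which removes the need for the non-linear function and its look-up tables.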
A further change is usually applied to the scheduling of variable and check node updates in order to improve the communications performance. In the two-phase scheduling, the updating of variable and check nodes is accomplished in two separate phases. On the contrary, turbo decoding message passing (TDMP) (Mansour & Shanbhag, 2003), also known as layered or shuffled decoding, allows for overlapped update operations: messages calculated by a subset of check nodes are immediately used to update variable nodes. This scheduling has been shown to reduce the number of iterations by up to 50% at a fixed communications performance.
The required number of functional units in a decoder can be estimated based on the concept of processing power
As two messages are associated with each edge (to be sent from the CN to the VN and vice versa),
Actually, most of the implementation concerns come from the communication structure that must be allocated to support message passing from bit to check nodes and vice versa. Several hardware realizations proposed in the literature focus on how to efficiently pass messages between the two types of processing units.
Three approaches can be followed in the high level organization of the decoder, leading to three kinds of architectures:
- Serial architectures: bit and check processors are allocated as single instances, each serving multiple nodes sequentially; messages are exchanged by means of a memory.
- Fully parallel architectures: processing units are allocated for each single bit and check node and all messages are passed in parallel on dedicated routes.
- Partially parallel architectures: several processing units work in parallel, serving all bit and check nodes within a number of cycles; suitable organization and hardware support are required to exchange messages.
For most codes and applications, the first approach results in slow implementations, while the second one has an excessive cost. As a result, the only generally viable solution is the third, partially parallel approach, which on the other hand introduces the collision problem, already known from the implementation of parallel turbo decoders. Two main approaches have been proposed to deal with collisions:
Even though the first approach has proven to be effective, it significantly limits the supported code classes. The second approach, on the other hand, is well suited to flexible and general architectures. An even more challenging task is the design of LDPC decoders that are flexible in terms of supported block sizes and code rates (Masera et al., 2007).
In partially parallel structures, permutation networks are used to establish the correct connections between functional units. However, structured LDPC codes, such as those specified in WIMAX, allow for replacing the permutation networks with low complexity barrel shifters (Boutillon et al., 2000; Mansour & Shanbhag, 2003).
Early termination schemes can be adopted to improve the decoding efficiency by dynamically adjusting the number of iterations according to the SNR value. The simplest approach requires that the decoding decisions are stored and compared across two consecutive iterations: if no changes are detected, the decoding is terminated, otherwise it is continued up to a maximum number of iterations. More sophisticated iteration control schemes are able to further reduce the mean number of iterations, thus saving both latency and energy (Kienle & Wehn, 2005; Shin et al., 2007).
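The simplest criterion described above can be sketched as follows; decode_iteration is a hypothetical callback returning the current hard-decision vector, and practical controllers would typically also stop as soon as all parity checks are satisfied:

```python
def decode_with_early_stop(decode_iteration, max_iters=40):
    """Run decoding iterations, stopping when the hard decisions are
    identical across two consecutive iterations (simplest criterion).

    decode_iteration: callable returning the current hard-decision vector.
    Returns (decisions, number_of_iterations_performed)."""
    prev = None
    for it in range(1, max_iters + 1):
        decisions = decode_iteration()
        if decisions == prev:
            return decisions, it          # stable decisions: stop early
        prev = decisions
    return prev, max_iters                # iteration budget exhausted
```

The comparison only needs one bit per decoded symbol of extra storage, which is why this criterion is attractive in hardware.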
3. Case of study: complete WIMAX CTC decoder design
The WIMAX CTC decoder is made of three main blocks: symbol deselection (SD), subblock deinterleaver and CTC decoder, as highlighted in Fig. 9, where N represents the number of couples included in a data frame. The SD, subblock deinterleaver and CTC decoder blocks are connected together by means of memory buffers in order to guarantee that the non-iterative part of the decoder (namely SD and subblock deinterleaver) and the decoding loop work simultaneously on consecutive data frames. Since the maximum decoder throughput is about 75 Mb/s and the native CTC rate is 1/3 (two uncoded bits produce six coded bits), at the input of the decoding loop the maximum throughput can rise up to 225 million LLRs per second. The same throughput ought to be sustained by the subblock deinterleaver, whereas an even higher throughput has to be sustained by the SD unit in case of repetition.
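The quoted figures follow directly from the native code rate; a minimal check of the arithmetic:

```python
# Throughput requirement at the decoding-loop input, derived from the
# parameters stated above (rate-1/3 CTC: 2 uncoded bits -> 6 coded bits).
BITS_PER_SECOND = 75e6           # peak WIMAX throughput per channel
LLRS_PER_UNCODED_BIT = 6 / 2     # six LLRs for every couple of uncoded bits

llr_rate = BITS_PER_SECOND * LLRS_PER_UNCODED_BIT   # decoding-loop input rate
sd_rate = 4 * llr_rate           # SD input rate with repetition 4 (see below)
```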
3.1. Symbol deselection
Depending on the amount of data sent by the encoder (puncturing or repetition), the throughput sustained by the symbol deselection (SD) can rise up to 900 million LLRs per second (repetition 4). When the encoder performs repetition, the same symbol is sent more than once. Thus, the decoder combines the LLRs referred to the same symbol to improve the reliability of that symbol. As shown in Fig. 9, this can be achieved by partitioning the symbol deselection input buffer into four memories, each containing up to 6N LLRs.
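Since the repeated copies are independent observations of the same symbols, their LLRs can simply be accumulated position-wise. A hypothetical sketch of the combining step (names and interface are illustrative):

```python
def combine_repeated_llrs(llr_stream, block_len, repetitions):
    """Combine the LLRs of a block repeated `repetitions` times by summing
    the copies position-wise (LLRs of independent observations add)."""
    combined = [0.0] * block_len
    for r in range(repetitions):
        for i in range(block_len):
            combined[i] += llr_stream[r * block_len + i]
    return combined
```

In the architecture of Fig. 9, the four partitioned memories allow the copies to be read in parallel, so the sum is formed in a single pass.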
Since the symbol deselection architecture can read up to four LLRs per clock cycle, it reduces the incoming throughput to 225 million LLRs per second. However, the symbol deselection has to compute the starting location and the number of LLRs to be written into the output buffer. The number of LLRs and the starting location are obtained as in (23) and (24) respectively, where
The efficient implementation of (25) is obtained with an adder whose inputs are
A block scheme of the architecture employed to compute
Furthermore, in order to support the puncturing mode, the output memory locations corresponding to unsent bits must be set to zero. To ease the SD architecture implementation, all the output memory locations are set to zero while
As can be observed, to sustain 225 million LLRs per second a clock frequency of 450 MHz would be required. To overcome this problem, we not only partition the input buffer into four memories, but also increase the memory parallelism, so that each memory location contains p LLRs. Thus, we can rewrite (27) as (28) and, by setting p to a conservative value such as p=4, the SD architecture processes simultaneously up to sixteen LLRs with
3.2. Subblock deinterleaver
The received LLRs belong to six possible subblocks depending on the coded bits they are referred to (
1: k ← 0
2: i ← 0
3: while i < N do
4:
Algorithm 1. Subblock deinterleaver address generator.
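Algorithm 1 follows the pruned bit-reversal structure of the 802.16 subblock interleaver. Assuming tentative addresses of the form T_k = 2^m (k mod J) + BRO_m(floor(k/J)), where m and J are per-block-size parameters and tentative addresses greater than or equal to N are discarded, a software model is:

```python
def bro(x, m):
    """m-bit bit-reversal of x."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def subblock_addresses(N, m, J):
    """Generate the N valid subblock interleaver addresses by pruning:
    tentative addresses >= N are simply discarded, as in Algorithm 1.
    Assumes J * 2**m >= N so that the generation terminates."""
    addrs = []
    k = 0
    while len(addrs) < N:
        t = (2 ** m) * (k % J) + bro(k // J, m)   # tentative address
        if t < N:
            addrs.append(t)                        # accept
        k += 1                                     # otherwise skip it
    return addrs
```

The pruning is what makes the number of generated tentative addresses larger than N, which is the point discussed in the following paragraph.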
As a consequence, the number of tentative addresses generated,
Since all the subblocks can be processed simultaneously, this architecture deinterleaves six LLRs per clock cycle. As a consequence, the subblock deinterleaver sustains a throughput
Thus, a throughput of 225 million LLRs per second is sustained using
To implement lines 4 and 5 of Algorithm 1, three steps are required, namely the calculation of
3.3. CTC decoder
As detailed in section 2.3, to sustain the throughput required by the WIMAX standard a parallel decoder architecture is required. To that purpose we set
Moreover, the window width impacts both the decoder throughput and the depth of the SISO local buffers, so that a proper W value must be selected for each frame size. In particular, if
Exhaustive simulations show that collisions occur for
Thus, the parallel CTC interleaver-deinterleaver system is obtained as a cascaded two stage architecture (see Fig. 12). The first stage efficiently implements the WIMAX interleaver algorithm, whereas the second one extracts the common memory address
The CTC interleaver algorithm specified in the WIMAX standard is structured in two steps. The first step switches the LLRs referred to A and B that are stored at odd addresses. The second step provides the interleaved address i of the j-th couple as
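A software model of this second step is sketched below; it assumes the four-case form of the per-couple permutation with parameters (P0, P1, P2, P3) taken from the standard's table for the given N (the test values use (5, 0, 0, 0), the entry commonly listed for the N = 24 block size):

```python
def ctc_interleave_addresses(N, P0, P1, P2, P3):
    """Second CTC interleaver step (per-couple address permutation):
    the interleaved address i of the j-th couple depends on j mod 4."""
    addrs = []
    for j in range(N):
        if j % 4 == 0:
            i = (P0 * j + 1) % N
        elif j % 4 == 1:
            i = (P0 * j + 1 + N // 2 + P1) % N
        elif j % 4 == 2:
            i = (P0 * j + 1 + P2) % N
        else:
            i = (P0 * j + 1 + N // 2 + P3) % N
        addrs.append(i)
    return addrs
```

The first step (swapping A and B at odd addresses) is omitted here, since it affects only the order of the two LLRs within a couple, not the couple address.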
A small Look-Up-Table (LUT) is employed to store
The second stage of the parallel CTC interleaver-deinterleaver architecture works as follows.
The straightforward implementation of (33) needs to calculate
The global architecture of the designed parallel SISO decoder is given in Fig. 13, where each SISO contains the processors devoted to computing the different metrics required by the BCJR algorithm, as detailed in section 2.3. A simple network is used to properly connect the SISOs according to the current value of P by setting the signal last_SISO. Furthermore, one address crossbar-switch (radx-switch) is used to implement the reading operation, a LIFO stores the addresses and makes them available for the writing phase, and two data crossbar-switches (rdata-switch/wdata-switch) are used to properly send (receive) the data to (from) the memory (EI-MEM) according to the parallel interleaver
In Table 2 the complexity of all the blocks for a 130 nm standard cell technology is reported. The bit-width is: 6 bit for
|Architecture|SD|Subblock Deinterl.|SISO x1|Parallel Interl.|
This work is partially supported by the WIMAGIC project funded by the European Community.
Abbasfar A. Yao K. 2003 An Efficient and Practical Architecture for High Speed Turbo Decoders., 337 341, Orlando, USA, Oct. 2003
Adde P. Pyndiah R. 2000 Recent Simplifications and Improvements in Block Turbo Codes., , 133 136, Brest, France, Sep. 2000
Bahl L. R. Cocke J. Jelinek F. Raviv J. 1974 Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate., 20 2Mar. 1974, 284 287
Batcha M. F. N. Shameri A. Z. 2007 Configurable, Adaptive Viterbi Decoder for GPRS, EDGE and WIMAX, Proceedings of the IEEE International Conference on Telecommunications and Malaysia International Conference on Communications, 237 241, Penang, Malaysia, May 2007
Benedetto S. Montorsi G. Divsalar D. Pollara F. 1996 Algorithm for Continuous Decoding of Turbo Codes., , 32 4Apr. 1996, 314 315
Berrou C. Glavieux A. Thitimajshima P. 1993 Near Shannon Limit Error-Correcting Codes : Turbo Codes., , 1064 1070, Geneva, Switzerland, May 1993
Berrou C. Jezequel M. Douillard C. Kerouedan S. 2001 The Advantages of non-binary turbo codes, , 61 63, Cairns, Australia, Sep. 2001
Bickerstaff M. Davis L. Thomas C. Garrett D. Nicol C. 2003 A 24 Mb/s Radix-4 LogMAP Turbo Decoder for 3GPP-HSDPA Mobile Wireless, Proceedings of the IEEE International Solid State Circuits Conference, Session 8, Paper 8.5, San Francisco, USA, Feb. 2003
Black P. J. Meng T. H. 1992 A 140-Mb/s, 32-State, Radix-4 Viterbi Decoder., 27 12 Dec. 1992, 1877 1885
Boutillon E. Castura J. Kschischang F. 2000 Decoder-first code design., Proceedings of the 2nd International Symposium on Turbo Codes & Related Topics, 459 462, Brest, France, Sep. 2000
Chase D. 1972 A Class of Algorithms for Decoding Block Codes with Channel Measurement Information., IT-18, 1 Jan. 1972, 170 182
Chen J. Dholakia A. Eleftheriou E. Fossorier M. P. C. Hu X. Y. 2005 Reduced-Complexity Decoding of LDPC Codes., , 53 8Aug. 2005, 1288 1299
Cheng C. Parhi K. K. 2008 Hardware Efficient Low-Latency Architecture for High-Throughput Rate Viterbi Decoders., , 55 12Dec. 2008, 1254 1258
Cheng J. F. Ottosson T. 2000 Linearly Approximated Log-MAP Algorithm for Turbo Decoding, Proceedings of the IEEE Vehicular Technology Conference, Tokyo, Japan, May 2000, 2252 2256
Chi Z. Song L. Parhi K. K. 2004 On the Performance/Complexity Tradeoff in Block Turbo Decoder Design., , 52 2Feb. 2004, 173 175
Classon B. Blankenship K. Desai V. 2002 Channel Coding for 4G Systems with Adaptive Modulation and Coding., , 9 2Apr. 2002, 8 13
Dinoi L. Martini R. Masera G. Quaglio F. Vacca F. 2006 ASIP design for partially structured LDPC codes., , 42 18Aug. 2006, 1048 1049
Fettweis G. Meyr H. 1989 Parallel Viterbi algorithm implementation: Breaking the ACS-bottleneck., , 37 8Aug. 1989, 785 790
Gallager R. G. 1962 Low-Density Parity-Check Codes., , 8 1Jan. 1962, 21 28
Gilbert F. Thul M. J. Wehn N. 2003 Communication Centric Architectures for Turbo-Decoding on Embedded Multiprocessors, Proceedings of Design Automation and Test in Europe Conference and Exhibition, 356 361, Munich, Germany, Mar. 2003
Giulietti A. van der Perre L. Strum M. 2002 Parallel Turbo Coding Interleavers: Avoiding Collisions in Accesses to Storage Elements., , 38 5Feb. 2002, 232 234
Gnaedig D. Boutillon E. Jezequel M. Gaudet V. C. Gulak P. G. 2003 Multiple Slice Turbo Codes., , 343 346, Brest, France, Sep. 2003
Goubier T. Dezan C. Pottier B. Jego C. 2008 Fine Grain Parallel Decoding of Product Turbo Codes: Algorithm and Architecture., Proceedings of the 5th International Symposium on Turbo Codes & Related Topics, 90 95, Lausanne, Switzerland, Sep. 2008
Gross W. J. Gulak P. G. 1998 Simplified MAP Algorithm Suitable for Implementation of Turbo Decoders., , 34 16Aug. 1998, 1577 1578
Guilloud F. Boutillon E. Tousch J. Danger J. 2007 Generic description and synthesis of LDPC decoders., , 55 11Nov. 2007, 2084 2091
Hekstra A. P. 1989 An Alternative to Metric Rescaling in Viterbi Decoders., , 37 11Nov. 1989, 1220 1222
Hocevar D. E. 2003 LDPC code construction with flexible hardware implementation., , 2708 2712, Anchorage, USA, May 2003
Jego C. Adde P. Leroux C. 2006 Full-Parallel Architecture for Turbo Decoding of Product Codes., IET Electronics Letters, 42 18Aug. 2006, 1052 1053
Kamuf M. Owall V. Anderson J. B. 2008 Optimization and Implementation of a Viterbi Decoder Under Flexibility Constraints., IEEE Trans. on Circuits and Systems I, 55 9Sep. 2008, 2411 2422
Kerouedan S. Adde P. 2000 Implementation of a Block Turbo Decoder on a Single Chip., Proceedings of the 2nd International Symposium on Turbo Codes & Related Topics, 133 136, Brest, France, Sep. 2000
Kienle F. Thul M. J. Wehn N. 2003 Implementation issue of scalable LDPC-decoders., Proceedings of the 3rd International Symposium on Turbo Codes & Related Topics, 291 294, Brest, France, Sep. 2003
Kienle F. Wehn N. 2005 Low Complexity Stopping Criterion for LDPC Code Decoders, Proceedings of the IEEE Vehicular Technology Conference, 606 609, Stockholm, Sweden, May 2005
Kim J. H. Park I. C. 2008 Double Binary Circular Turbo Decoding Based on Border Metric Encoding., IEEE Trans. on Circuits and Systems II, 55 1Jan. 2008, 79 83
Kim J. H. Park I. C. 2009 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes., IEEE Trans. on Circuits and Systems II, 56 1Jan. 2009, 81 85
Kong J. J. Parhi K. K. 2004 Low-Latency Architectures for High-Throughput Rate Viterbi Decoders., IEEE Trans. on VLSI, 12 6Jun. 2004, 642 651
Kwak J. Lee K. 2002 Design of Dividable Interleaver for Parallel Decoding in Turbo Codes., IET , 38 22Oct. 2002, 1362 1364
Le N. Soleymani M. R. Shayan Y. R. 2005 Distance-Based Decoding of Block Turbo Codes., IEEE Communications Letters, 9 11Nov. 2005, 1006 1008
Le Bidan R. Leroux C. Jego C. Adde P. Pyndiah R. 2008 Reed-Solomon Turbo Product Codes for Optical Communications: From Code Optimization to Decoder Design., Journal on Wireless Communications and Networking, 2008Article ID 658042, 14 pages
Liu H. Diguet J. P. Jego C. . Jezequel M. Boutillon E. 2007 Energy Efficient Turbo Decoder with Reduced State Metric Quantization., Proceedings of the IEEE Workshop on Signal Processing Systems, 237 242, Shanghai, China, Oct. 2007
Mansour M. M. Shanbhag N. R. 2003 High throughput LDPC decoders., IEEE Trans. on VLSI., 11 6Dec. 2003, 976 996
Martina M. Nicola M. Masera G. 2008-a A Flexible UMTS-WiMax Turbo Decoder Architecture., IEEE Trans. on Circuits and Systems II, 55 4 Apr. 2008, 369 373
Martina M. Nicola M. Masera G. 2008-b Hardware Design of a Low Complexity, Parallel Interleaver for WiMax Duo-Binary Turbo Decoding., 12 11 Nov. 2008, 846 848
Martina M. Nicola M. Masera G. 2009 VLSI Implementation of WiMax Convolutional Turbo Code Encoder and Decoder., Journal of Circuits, Systems, and Computers, 18 3 May 2009, 535 564
Masera G. Piccinini G. Ruo Roch. M. Zamboni M. 1999 VLSI Architectures for Turbo Codes., IEEE Trans. on VLSI, 7 3Sep. 1999, 369 379
Masera G. Quaglio F. Vacca F. 2005 Finite precision implementation of LDPC decoders., IEE Proceedings- Communications, 152 6Dec. 2005, 1098 1102
Masera G. Quaglio F. Vacca F. 2007 Implementation of a Flexible LDPC Decoder., IEEE Trans. on Circuits and Systems II, 54 6Jun 2007, 542 546
Muller O. Baghdadi A. Jezequel M. 2006 Exploiting Parallel Processing Levels for Convolutional Turbo Decoding, Proceedings of the IEEE International Conference on Information and Communication Technologies, 2353 2358, Damascus, Syria, Apr. 2006
Muller O. Baghdadi A. Jezequel M. 2009 From Parallelism Levels to a Multi-ASIP Architecture for Turbo Decoding., IEEE Trans. on VLSI, 17 1Jan. 2009, 92 102
Park S. M. Kwak J. Lee K. 2008 Extrinsic Information Memory Reduced Architecture for Non-Binary Turbo Decoder Implementation., Proceedings of the IEEE Vehicular Technology Conference, 539 543, Marina Bay, Singapore, May 2008
Pyndiah R. M. 1998 Near-Optimum Decoding of Product Codes: Block Turbo Codes., IEEE Trans. on Communications, 46 8Aug. 1998, 1003 1010
Quaglio F. Vacca F. Castellano C. Tarable A. Masera G. 2006, Proceedings of Design, Automation and Test in Europe Conference and Exhibition, 1 6, Munich, Germany, Mar. 2006,
Rader C. M. 1981 Memory Management in a Viterbi Decoder., IEEE Trans. on Communications, COM-29, 9 Sep. 1981, 1399 1401
Robertson P. Villebrun E. Hoeher P. 1995 A Comparison of Optimal and Sub-Optimal MAP Decoding Algorithms Operating in the Log Domain, Proceedings of the IEEE International Conference on Communications, 1009 1013, Seattle, USA, Jun. 1995
Shin D. Heo K. Oh S. Ha J. 2007 A Stopping Criterion for Low-Density Parity-Check Codes., Proceedings of the IEEE Vehicular Technology Conference, 1529 1533, Dublin, Ireland, Apr. 2007.
Singh A. Boutillon E. Masera G. 2008, Proceedings of the 5th International Symposium on Turbo Codes & Related Topics, 134 138, Lausanne, Switzerland, Sep. 2008
Speziali F. Zory J. 2004 Scalable and Area Efficient Concurrent Interleaver for High Throughput Turbo-Decoders, Proceedings of the Euromicro Symposium on Digital System Design, 334 341, Rennes, France, Sep. 2004
Talakoub S. Sabeti L. Shahrrava B. Ahmadi M. 2007 An Improved Max-Log-MAP Algorithm for Turbo Decoding and Turbo Equalization., , 56 3Jun. 2007, 1058 1063
Tarable A. Benedetto S. Montorsi G. 2004 Mapping interleaver laws to parallel turbo and LDPC decoders architectures., 50 9Sep. 2004, 2002 2009
Thul M. J. Wehn N. Rao L. P. 2002 Enabling High-Speed Turbo-Decoding through Concurrent Interleaving, Proceedings of the IEEE International Symposium on Circuits and Systems, 897 900, Scottsdale, USA, May 2002
Thul M. J. Gilbert F. Wehn N. 2003 Concurrent Interleaving Architectures for High-Throughput Channel Coding, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 613 616, Hong Kong, Apr. 2003
Vanstraceele C. Geller B. Brossier J. M. Barbot J. P. 2008 A Low Complexity Block Turbo Decoder Architecture., 56 12 Dec. 2008, 1985 1987
Viterbi A. J. 1967 Error bounds for convolutional codes and an asymptotically optimum decoding algorithm., , IT-13, Apr. 1967, 260 269
Vogt J. Finger A. 2000 Improving the Max-Log-MAP Turbo Decoder., 36 23 Nov. 2000, 1937 1939
Vogt T. Wehn N. 2008 A Reconfigurable ASIP for Convolutional and Turbo Decoding in an SDR Environment., 16 10 Oct. 2008, 1309 1320
Wang H. Yang H. Yang D. 2006 Improved Log-MAP Decoding Algorithm for Turbo-like Codes., , 10 3Mar. 2006, 186 188
Weiss C. Bettstetter C. Riedel S. 2001 Code Construction and Decoding of Parallel Concatenated Tailbiting Codes., 47 1Jan. 2001, 366 368
Zhan C. Arslan T. Erdogan A. T. MacDougall S. 2006 An Efficient Decoder Scheme for Double Binary Circular Turbo Codes, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 229 232, Toulouse, France, May 2006
Zhang J. Fossorier M. P. C. 2005 Shuffled iterative decoding., , 53 2Feb. 2005, 209 213