WIMAX has gained a wide popularity due to the growing interest and diffusion of broadband wireless access systems. In order to be flexible and reliable WIMAX adopts several different channel codes, namely convolutional-codes (CC), convolutional-turbo-codes (CTC), block-turbo-codes (BTC) and low-density-parity-check (LDPC) codes, that are able to cope with different channel conditions and application needs.
On the other hand, high performance digital CMOS technologies have reached such a level of development that very complex algorithms can be implemented in low cost chips. Moreover, embedded processors, digital signal processors, programmable devices such as FPGAs, application specific instruction-set processors and VLSI technologies have come to the point where the computing power and the memory required to execute several real time applications can be incorporated even in cheap portable devices.
Among the several application fields that have been strongly reinforced by this technological progress, channel decoding is one of the most significant and interesting. The design of efficient architectures to implement such channel decoders is known to be a hard task, made harder by the high throughput required by WIMAX systems, which is up to about 75 Mb/s per channel. In particular, CTC and LDPC codes, whose decoding algorithms are iterative, are still a major topic of interest in the scientific literature, and the design of efficient architectures is still fostering several research efforts in both industry and academia.
In this chapter, the design of VLSI architectures for WIMAX channel decoders will be analyzed with emphasis on three main aspects: performance, complexity and flexibility. The chapter is divided into two main parts. The first part deals with the impact of system requirements on the decoder design, with emphasis on memory requirements, the structure of the key components of the decoders and the need for parallel architectures. To that purpose, a quantitative approach is adopted to derive key architectural choices from system specifications; the most important architectures available in the literature are also described and compared.
The second part concentrates on a significant case study: the design of a complete CTC decoder architecture for WIMAX, including hardware units for the depuncturing (bit-deselection) and external deinterleaving (sub-block deinterleaver) functions.
The system specifications, and in particular the requirement of a peak throughput of about 75 Mb/s per channel imposed by the WIMAX standard, have a significant impact on the decoder architecture. In the following sections we analyze the most significant architectures proposed in the literature to implement CC decoders (Viterbi decoders), BTC, CTC and LDPC decoders.
The most widely used algorithm to decode CCs is the Viterbi algorithm [Viterbi, 1967], which is based on finding the shortest path along a graph that represents the CC trellis. As an example, Fig. 1 shows a binary 4-state CC as a feedback shift register (a), together with the corresponding state diagram (b) and trellis (c) representations.
Fig. 1. Binary 4-state CC example: shift register (a), state diagram (b) and trellis (c) representations.
In the given example, the feedback shift register implementation of the encoder generates two output bits, $c_1$ and $c_2$, for each received information bit $u$; $c_1$ is the systematic bit. The state diagram is essentially a Mealy finite state machine describing the encoder behaviour in a time independent way: each node corresponds to a valid encoder state, represented by the flip-flop contents $e_1$ and $e_2$, while edges are labelled with input and output bits. The trellis representation also provides time information, explicitly showing the evolution from one state to another over different time steps (a single step is drawn in the picture).
At each trellis step $n$, the Viterbi algorithm associates to each trellis state $S$ a state metric $\Gamma_S^n$ that is calculated along the shortest path, and stores a decision $d_S^n$ that identifies the entering transition on the shortest path. First, the decoder computes the branch metrics $\gamma^n$, that is, the distances between the symbols labelling each edge of the trellis and the actual received soft symbols. In the case of a binary CC with rate 0.5, the soft symbols are $\lambda_{1n}$ and $\lambda_{2n}$ and the branch metrics are $\gamma^n(c_2,c_1)$ (see Fig. 2 (a)). Starting from these values, the state metrics are updated by selecting the larger metric among those related to each incoming edge of a trellis state and storing the corresponding decision $d_S^n$. Finally, decoded bits are obtained by means of a recursive procedure usually referred to as trace-back. In order to estimate the sequence of bits that were encoded for transmission, a state is first selected at the end of the trellis portion to be decoded; then the decoder iteratively goes backward through the state history memory where the decisions $d_S^n$ have been previously stored. This allows the decoder to select, for the current state, the state listed in the state history trace as its predecessor. Different implementation methods are available to make the initial state choice and to size the portion of trellis where the trace-back operation is performed: these methods affect both decoder complexity and error correcting capability. For further details on the algorithm the reader can refer to [Viterbi, 1967] and [Forney, 1973].
Looking at the global architecture, the main blocks required in a Viterbi decoder are the branch metric unit (BMU), devoted to computing $\gamma^n$, the state metric unit (SMU), which calculates $\Gamma_S^n$, and the trace-back unit (TBU), which obtains the decoded sequence. The BMU is made of adders and subtracters that properly combine the input soft symbols (see Fig. 2 (a)). The SMU is based on the so called add-compare-select (ACS) structure, as shown in Fig. 2 (b). Let $i$ denote the $i$-th starting state that is connected to an arriving state $S$ by an edge whose branch metric is $\gamma_i^{n-1}$; then $\Gamma_S^n$ is calculated as in (1):
$$\Gamma_S^n = \max_i \left\{ \Gamma_i^{n-1} + \gamma_i^{n-1} \right\} \qquad (1)$$
Fig. 2. BMU and ACS architectures for a rate 0.5 CC.
As can be inferred from (1), $\Gamma_S^n$ is obtained by adding branch metrics to state metrics, then comparing the results and selecting the higher metric, which represents the shortest incoming path. The corresponding decision $d_S^n$ is stored in a memory that is later read by the TBU to reconstruct the surviving path. Due to the recursive form of (1), as $n$ increases the number of bits needed to represent $\Gamma_S^n$ tends to grow. This problem can be solved by normalizing the state metrics at each step. However, this solution requires adding a normalization stage, which increases both the SMU complexity and the critical path. An effective technique based on two's complement representation helps limit the growth of state metrics, as described in [Hekstra, 1989].
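The add-compare-select recursion described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware architecture of Fig. 2: the function name `acs_step`, the dictionary-based trellis description and the toy metric values are assumptions introduced for the example. Following the text, the larger metric wins.

```python
from typing import Dict, List, Tuple


def acs_step(
    prev_metrics: Dict[int, float],
    incoming: Dict[int, List[Tuple[int, float]]],
) -> Tuple[Dict[int, float], Dict[int, int]]:
    """One add-compare-select step over all trellis states.

    incoming[S] lists (i, gamma) pairs: predecessor state i and the branch
    metric of the edge i -> S. Larger metrics are better, as in the text.
    """
    new_metrics: Dict[int, float] = {}
    decisions: Dict[int, int] = {}
    for state, edges in incoming.items():
        # Add: one candidate metric per incoming edge.
        candidates = [(prev_metrics[i] + gamma, i) for i, gamma in edges]
        # Compare and select: keep the largest candidate and its source state
        # (ties are broken in favour of the larger predecessor index).
        best_metric, best_pred = max(candidates)
        new_metrics[state] = best_metric
        decisions[state] = best_pred
    return new_metrics, decisions


# Toy two-state trellis step with made-up metrics (hypothetical values).
prev = {0: 0.0, 1: -1.0}
incoming = {
    0: [(0, 0.5), (1, 2.0)],    # edges 0->0 and 1->0
    1: [(0, -0.5), (1, 0.25)],  # edges 0->1 and 1->1
}
metrics, decisions = acs_step(prev, incoming)
```

The `decisions` dictionary plays the role of the decision memory that the trace-back unit would later read; a real decoder would store one such set of decisions per trellis step.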
Fig. 3. WIMAX binary 64-state CC with rate 0.5: shift register representation.
The WIMAX standard specifies a binary 64-state CC with rate 0.5, whose shift register representation is shown in Fig. 3. Viterbi decoder architectures usually exploit the intrinsic parallelism of the trellis to compute all the branch metrics and update all the state metrics simultaneously at each trellis step. Thus, denoting by $n$ the number of states of a CC, a parallel architecture employs a BMU and $n$ ACS modules. Moreover, to reduce the decoding latency, the trace-back is performed as a sliding-window process [Rader, 1981] on portions of trellis of width $W$. This approach reduces not only the latency but also the size of the decision memory, which usually requires 3W or 4W cells depending on the TBU radix [Black & Meng, 1992].
To improve the decoder throughput, two [Black & Meng, 1992] or more [Fettweis & Meyr, 1989], [Kong & Parhi, 2004], [Cheng & Parhi, 2008] trellis steps can be processed concurrently. These solutions lead to the so called higher radix or M-look-ahead step architectures. According to [Kong & Parhi, 2004], the throughput sustained by an M-look-ahead step architecture, defined as the number of decoded bits over the decoding time, is
where $f_{clk}$ is the clock frequency, $N_T$ is the number of trellis steps, $k=1$ for a binary CC, $k=2$ for a double binary CC, and the rightmost expression is obtained under the condition $W \ll N_T$, which is a reasonable assumption in real cases.
Thus, to achieve the throughput required by the WIMAX standard with a clock frequency limited to tens to a few hundreds of MHz, M=1 (radix-2) or M=2 (radix-4) is a reasonable choice.
However, since CCs are widely used in many communication systems, some recent works such as [Batcha & Shameri, 2007] and [Kamuf et al., 2008] address the design of flexible Viterbi decoders that are able to support different CCs. As a further step, [Vogt & Wehn, 2008] proposed a multi-code decoder architecture able to support both CCs and CTCs.
Block Turbo Codes, or product codes, are serially concatenated block codes. Given two block codes $C_1=(n_1,k_1,\delta_1)$ and $C_2=(n_2,k_2,\delta_2)$, where $n_i$, $k_i$ and $\delta_i$ represent the code-word length, the number of information bits and the minimum Hamming distance respectively, the corresponding product code is obtained according to [Pyndiah, 1998] as an array with $k_1$ rows and $k_2$ columns containing the information bits. Coding is then performed on the $k_1$ rows with $C_2$ and on the $n_2$ obtained columns with $C_1$. The decoding of BTC codes can be performed iteratively, row-wise and column-wise, by using the sub-optimal algorithm detailed in [Pyndiah, 1998]. The basic idea relies on using the Chase search [Chase, 1972], a near-maximum-likelihood (near-ML) searching strategy, to find a list of code-words and an ML decided code-word $d=\{d_0,\ldots,d_{n-1}\}$ with $d_j \in \{-1,+1\}$. According to the notation used in [Vanstraceele et al., 2008], decision reliabilities are computed as
where $r=\{r_0,\ldots,r_{n-1}\}$ is the received code-word and $c^{-1(j)}$ and $c^{+1(j)}$ are the code-words in the Chase list at minimum Euclidean distance from $r$ such that the $j$-th bit of the code-word is -1 and +1 respectively. Then one decoder sends to the other the extrinsic information
If the Chase search fails, the extrinsic information is approximated as
where β is a weight factor increasing with the number of iterations.
The decoder that receives the extrinsic information uses an updated version of $r$ obtained as
where α is a weight factor increasing with the number of iterations. A scheme of the elementary block turbo decoder is shown in Fig. 4, where the block named “decoder” is a Soft-In-Soft-Out (SISO) module that performs the Chase search and implements (3), (4) and (5). An effective solution to implement the SISO module is based on a three-stage pipelined architecture whose stages are identified as reception, processing and transmission units [Kerouedan & Adde, 2000]. As detailed in [LeBidan et al., 2008], during each stage the N soft values of the received word $r$ are processed sequentially in N clock periods. The reception stage is devoted to finding the least reliable bits in the received code-word. The processing stage performs the Chase search, and the transmission stage calculates $\lambda(d_j)$, $w_j$ and $r_j^{new}$. Another solution is proposed in [Goubier et al., 2008], where the elementary decoder is implemented as a pipeline resorting to the mini-maxi algorithm, namely by using mini-maxi arrays to store the best metrics of all decoded code-words in the Chase list.
Fig. 4. Elementary block turbo decoder scheme.
Several works in the literature deal with BTC complexity reduction. As an example, [Adde & Pyndiah, 2000] suggests computing β in (5) on a per-code-word basis, whereas in [Chi et al., 2004] the dependency on α in (6) is removed by replacing the term $\alpha \cdot w_j$ with $\tanh(w_j/2)$. In [Le et al., 2005] both α in (6) and β in (5) are avoided by exploiting a Euclidean distance property.
Due to its row-column structure, the block turbo decoder can be parallelized by instantiating several elementary decoders that concurrently process several rows or columns, thus increasing the throughput. As a significant example, a fully parallel BTC decoder is proposed in [Jego et al., 2006]. This solution instantiates $n_1+n_2$ decoders that work concurrently. Moreover, by properly managing the scheduling of the decoders and interconnecting them through an Omega network, intermediate results (row decoded data or column decoded data) need not be stored.
A detailed analysis of throughput and complexity of BTC decoder architectures can be found in [Goubier et al., 2008] and [LeBidan et al., 2008]. In particular, according to [Goubier et al., 2008], a simple one-block decoder architecture that performs the row/column decoding sequentially (interleaved architecture) requires $2(n_1+n_2)$ cycles to complete an iteration; as a consequence it achieves a throughput
where $I$ is the number of iterations and $f_{clk}$ is the clock frequency. The BTC specified for WIMAX is obtained using twice a binary extended Hamming code out of the ones shown in Table 1.
n | k
15 | 11
31 | 26
63 | 57
Table 1. WIMAX binary extended Hamming codes H(n,k) used for BTC.
Considering the interleaved architecture described in [Goubier et al., 2008], where a fully decoded block is output every 4.5 half iterations, 75 Mb/s can be reached with a clock frequency of 84 MHz, 31 MHz and 14 MHz for H(15,11), H(31,26) and H(63,57) respectively.
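The clock frequencies quoted above can be cross-checked with a short calculation. The sketch below rests on two assumptions that are not spelled out in the text: one half iteration is taken to cost $n_1+n_2$ cycles (half of the $2(n_1+n_2)$ cycles per iteration stated earlier), and a decoded block is taken to carry $k_1 \cdot k_2$ information bits, with the same H(n,k) code used on rows and columns. The function name `btc_clock_mhz` is hypothetical.

```python
import math


def btc_clock_mhz(n: int, k: int, throughput_mbps: float = 75.0,
                  half_iterations: float = 4.5) -> float:
    """Clock (MHz) needed when H(n,k) is used on both rows and columns.

    Assumes n1 + n2 = 2n cycles per half iteration and k * k information
    bits per decoded block (assumptions made for this sketch).
    """
    cycles_per_block = half_iterations * (n + n)
    info_bits_per_block = k * k
    return throughput_mbps * cycles_per_block / info_bits_per_block


for n, k in [(15, 11), (31, 26), (63, 57)]:
    # Round up to the next integer MHz, as a designer would.
    print(f"H({n},{k}): {math.ceil(btc_clock_mhz(n, k))} MHz")
```

Under these assumptions the three values round up to 84, 31 and 14 MHz, matching the figures given in the text.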
Convolutional turbo codes were proposed in 1993 by Berrou, Glavieux and Thitimajshima [Berrou et al., 1993] as a coding scheme based on the parallel concatenation of two CCs by means of an interleaver (Π), as shown in Fig. 5 (a). The decoding algorithm is iterative and is based on the BCJR algorithm [Bahl et al., 1974] applied to the trellis representation of each constituent CC (Fig. 5 (b)). The key idea is that the extrinsic information output by one CC is used by the other CC as an updated version of the input a-priori information. As a consequence, each iteration is made of two half iterations: in one half iteration the data are processed according to the interleaver (Π) and in the other according to the deinterleaver ($\Pi^{-1}$). The same result can be obtained by implementing an in-order read/write half iteration and a scrambled (interleaved) read/write half iteration. The basic block in a turbo decoder is a SISO module that implements the BCJR algorithm in its logarithmic likelihood ratio (LLR) form. If we consider a Recursive Systematic CC (RSC code), the extrinsic information $\lambda_k(u;O)$ of an uncoded symbol $u$ at trellis step $k$ output by a SISO is
where ũ is an uncoded symbol taken as a reference (usually ũ=0), $e$ represents a certain transition on the trellis and $u(e)$ is the uncoded symbol $u$ associated to $e$. The max* function is usually implemented as a max followed by a correction term [Robertson et al., 1995], [Gross & Gulak, 1998], [Cheng & Ottosson, 2000], [Classon et al., 2002], [Wang et al., 2006], [Talakoub et al., 2007]. A scaling factor can also be applied to further improve the max or max* approximation [Vogt & Finger, 2000]. The correction term, usually adopted when decoding binary codes, can be omitted for double binary turbo codes [Berrou et al., 2001] with minor error rate performance degradation. The term $b(e)$ in (8) is defined as
where $s^S(e)$ and $s^E(e)$ are the starting and the ending states of $e$, $\alpha_k[s^S(e)]$ and $\beta_k[s^E(e)]$ are the forward and backward state metrics associated to $s^S(e)$ and $s^E(e)$ respectively (see Fig. 5 (b)), and $\gamma_k[e]$ is the branch metric associated to $e$. The $\pi_k[c(e);I]$ term is computed as a weighted sum of the $\lambda_k[c;I]$ produced by the soft demodulator as
where $c_i(e)$ is one of the coded bits associated to $e$, $n_c$ is the number of bits forming a coded symbol $c$, and $\pi_k[c_u(e);I]$ in (8) is obtained as $\pi_k[c(e);I]$ considering only the systematic bits corresponding to the uncoded symbol $u$ out of the $n_c$ coded bits. The $\pi_k[u(e);I]$ term is obtained by combining the input a-priori information $\lambda_k(u;I)$ and, for a double binary code, can be written as in (14), where A and B represent the two bits forming an uncoded symbol $u$.
The CTC specified in the WIMAX standard is based on a double binary 8-state constituent CC, as shown in Fig. 6, where each CC receives two uncoded bits (A, B) and produces four coded bits: two systematic bits (A, B) and two parity bits (Y, W). As a consequence, at each trellis step four transitions connect a starting state to four possible ending states. Due to the trellis symmetry, only 16 branch metrics out of the possible 32 are required at each trellis step. As pointed out in [Muller et al., 2006], high throughput can be achieved by exploiting the trellis parallelism, namely by computing all the branch and state metrics concurrently.
Fig. 5. Convolutional turbo code: coder and iterative SISO based decoder (a), notation for a trellis step in the SISO (b).
The 16 branch metrics are computed by a BMU that implements (12), as shown in Fig. 7. To reduce the latency of the SISO, the decoding is usually based on a sliding-window approach [Benedetto et al., 1996]. As a consequence, at least two BMUs are required to compute the two recursions (forward and backward) according to the BCJR algorithm. However, since β metrics need to be trained between successive windows, a further BMU is usually required. A solution based on the inheritance of the border metrics of each window [Abbasfar & Yao, 2003] requires only two BMUs. Furthermore, this strategy reduces the SISO latency to the sliding window width $W$. The state metrics are updated according to (10) and (11) by two state metric processors, each of which is made of a proper number of processing elements (PE). As shown in Fig. 7, 8 PEs are required for the WIMAX CTC. It is worth pointing out that the constituent codes of the WIMAX CTC use the circulation state tailbiting strategy proposed in [Weiss et al., 2001], which ensures that the ending state of the last trellis step is equal to the starting state of the first trellis step. However, this technique requires estimating the circulation state at the decoder side. Since training operations to estimate the circulation state would increase the SISO latency, an effective alternative [Zhan et al., 2006] is to inherit these metrics from the previous iteration.
Fig. 6. WIMAX CTC: encoder and constituent CC structures.
As in Viterbi decoder architectures, in CTC decoders the state metrics are often computed by means of the “wrapping” representation technique proposed in [Hekstra, 1989]. This solution requires a normalization stage, depicted in Fig. 7, when combining α, β and γ metrics to compute the extrinsic information as in (8). The last stage of the output processor, which computes the output extrinsic information, is a tree of max blocks for each component of the extrinsic information plus a few adders to implement (8). As highlighted in Fig. 7, this scheduling requires a buffer to store the input LLRs that are used to compute the backward recursion (BMU-MEM). Since the output extrinsic information is computed during the backward recursion, forward recursion metrics are stored in a buffer (α-MEM). Further memory is required to implement the border metric inheritance: α-EXT-MEM, β-EXT-MEM and β-LOC-MEM.
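The difference between the max* operator and the plain max used in the output tree can be illustrated with a small numerical sketch. It uses the standard Jacobian logarithm identity, max*(a, b) = ln(e^a + e^b) = max(a, b) + ln(1 + e^{-|a-b|}); the function names `max_star` and `max_log` are chosen for this example only, and a hardware implementation would typically replace the logarithmic correction with a small look-up table.

```python
import math


def max_star(a: float, b: float) -> float:
    """Exact max*: ln(exp(a) + exp(b)) written as max plus a correction term."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """Max-log approximation: the correction term is simply dropped."""
    return max(a, b)


# The correction is largest (ln 2) when the two inputs are equal and
# vanishes as they move apart.
gap_equal = max_star(0.0, 0.0) - max_log(0.0, 0.0)
gap_far = max_star(0.0, 10.0) - max_log(0.0, 10.0)
```

Because the correction term is bounded by ln 2 and decays quickly with |a - b|, dropping it costs little, which is consistent with the remark that it can be omitted for double binary turbo codes with minor performance degradation.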
The throughput sustained by the CTC decoder, defined as the number of decoded bits over the time required for their computation, is
where $f_{clk}$ is the clock frequency, $N_T$ is the number of trellis steps, $k=1$ for a binary CTC, $k=2$ for a double binary CTC, $2I$ is the number of half iterations, and $N_{cyc}^{SISO}$ and $N_{cyc}^{ID}$ represent the number of clock cycles required by one SISO and by the interleaving/deinterleaving structure. Since both $N_{cyc}^{SISO}$ and $N_{cyc}^{ID}$ are functions of $N_T$, they can be rewritten as $N_{cyc}^{SISO} = N_T \cdot SP + SISO_{cyc}^{lat}$ and $N_{cyc}^{ID} = N_T \cdot SP + ID_{cyc}^{oh}$, where $SP$ is the sending period, namely the number of clock cycles between two consecutive valid output data (SP=1 means that new valid output data are ready at each clock cycle), $SISO_{cyc}^{lat}$ is the decoder latency, namely the number of clock cycles spent to produce the first valid output data, and $ID_{cyc}^{oh}$ is the interleaver/deinterleaver architecture overhead expressed in clock cycles. Usually, resorting to pipelining, $N_{cyc}^{SISO}$ and $N_{cyc}^{ID}$ can be partially overlapped; thus, the number of cycles required by one SISO decoder is $N_{cyc}^{dec} = N_T \cdot SP + SISO_{cyc}^{lat} + ID_{cyc}^{oh}$. Using the sliding window technique with the border metric inheritance strategy [Abbasfar & Yao, 2003], [Zhan et al., 2006], we obtain $SISO_{cyc}^{lat} \approx SP \cdot W$, and so (15) can be rewritten as (16), where the rightmost expression is obtained considering $W \ll N_T$ and $ID_{cyc}^{oh} \ll SP \cdot N_T$, which is a reasonable assumption in real cases.
Fig. 7. WIMAX SISO block scheme.
Optimized architectures [Masera et al., 1999], [Bickerstaff et al., 2003], [Kim & Park, 2008] are usually obtained with SP=1, whereas flexible architectures have higher SP values [Vogt & Wehn, 2008], [Muller et al., 2009]. However, even with SP=1, a double binary turbo decoder architecture that achieves the throughput imposed by WIMAX with eight iterations (I=8) would require $f_{clk}$=600 MHz. A possible solution, which improves the throughput by a factor ranging in [1.2, 1.9], is based on decoder level parallelism [Muller et al., 2006] and is usually referred to as “shuffling” [Zhang & Fossorier, 2005]. However, to further improve the throughput, a parallel decoder made of P SISOs working concurrently is required. As a consequence, a parallel architecture achieves a throughput
Thus, setting P=4, I=8 and SP=1, the WIMAX throughput is obtained with $f_{clk}$=150 MHz. It is worth pointing out that a P-parallel CTC decoder is made of P SISOs connected to P memories devoted to storing the extrinsic information. However, in a parallel decoder collisions can occur during the scrambled half iteration, namely several SISOs may need to access the same memory during the same cycle. Since the collision phenomenon increases $ID_{cyc}^{oh}$, several algorithmic approaches to design collision free interleavers [Giulietti et al., 2002], [Kwak & Lee, 2002], [Gnaedig et al., 2003], [Tarable et al., 2004] have been proposed. On the other hand, architectures to manage collisions in a parallel turbo decoder have also been proposed in the literature [Thul et al., 2002], [Gilbert et al., 2003], [Thul et al., 2003], [Speziali & Zory, 2004], [Martina et al., 2008-a], [Martina et al., 2008-b]. In particular, [Martina et al., 2008-b] deals with the parallelization of the WIMAX CTC interleaver and avoids collisions by means of a throughput/parallelism scalable architecture that features $ID_{cyc}^{oh}$=0.
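The two clock frequencies quoted for the CTC decoder (600 MHz for a single SISO, 150 MHz for P=4) can be reproduced with a short calculation. The sketch assumes the approximate relation T ≈ P·k·f_clk / (2·I·SP), which is inferred here from those two figures under the conditions $W \ll N_T$ and negligible interleaver overhead; it is not a transcription of the chapter's equations, and the function name `ctc_clock_mhz` is hypothetical.

```python
def ctc_clock_mhz(throughput_mbps: float, iterations: int, k: int = 2,
                  sp: int = 1, parallel_sisos: int = 1) -> float:
    """Clock frequency (MHz) from the assumed T ~ P*k*f_clk / (2*I*SP).

    k is 2 for a double binary code; 2 * iterations is the number of half
    iterations; latency and interleaver overhead are neglected.
    """
    return throughput_mbps * 2 * iterations * sp / (parallel_sisos * k)


single_siso = ctc_clock_mhz(75.0, iterations=8)                    # one SISO
four_sisos = ctc_clock_mhz(75.0, iterations=8, parallel_sisos=4)   # P = 4
print(single_siso, four_sisos)
```

The same helper makes the cost of a flexible architecture visible: doubling SP doubles the required clock frequency for the same throughput and number of iterations.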
It is worth pointing out that parallel architectures increase not only the throughput but also the complexity of the decoder, so some recent works aim at reducing the amount of memory required to implement SISO local buffers. In [Liu et al., 2007] and [Kim & Park, 2008], saturation of forward state metrics and quantization of border backward state metrics are proposed. Further studies have been performed to reduce the extrinsic information bit width by using adaptive quantization [Singh et al., 2008], pseudo-floating point representation [Park et al., 2008] and bit level representation [Kim & Park, 2009].
LDPC codes were originally introduced in 1962 by Gallager [Gallager, 1962] and rediscovered in 1996 by MacKay and Neal [MacKay, 1996]. Like turbo codes, they achieve near optimum error correction performance and are decoded by means of high complexity iterative algorithms.
An LDPC code is a linear block code defined by a $C \times B$ parity check matrix $H$ characterized by a low density of ones: $B$ is the number of bits in the code (block length), while $C$ is the number of parity checks. A one in a given cell of the $H$ matrix indicates that the bit corresponding to the cell column is used in the calculation of the parity check associated to the row. A popular description of an LDPC code is the bipartite (or Tanner) graph, shown in Fig. 8 for a small example, where $B$ variable nodes (VN) are connected to $C$ check nodes (CN) through edges corresponding to the positions of the ones in $H$.
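The correspondence between the parity check matrix and the Tanner graph can be made concrete with a small sketch. The 3×6 matrix `H_EXAMPLE` below is a made-up toy matrix, not the one of Fig. 8 and far too small and dense to be a real LDPC code; the helper names `tanner_graph` and `satisfies_checks` are likewise assumptions for illustration.

```python
from typing import Dict, List, Tuple


def tanner_graph(H: List[List[int]]) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Return (cn_to_vn, vn_to_cn) adjacency lists for parity check matrix H."""
    num_checks, num_bits = len(H), len(H[0])
    # Each one in row i, column j is an edge between CN i and VN j.
    cn_to_vn = {i: [j for j in range(num_bits) if H[i][j]] for i in range(num_checks)}
    vn_to_cn = {j: [i for i in range(num_checks) if H[i][j]] for j in range(num_bits)}
    return cn_to_vn, vn_to_cn


def satisfies_checks(H: List[List[int]], bits: List[int]) -> bool:
    """True when every parity check (row of H) sums to zero modulo 2."""
    return all(sum(h * b for h, b in zip(row, bits)) % 2 == 0 for row in H)


# Hypothetical toy matrix: C = 3 checks, B = 6 bits.
H_EXAMPLE = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
cn_to_vn, vn_to_cn = tanner_graph(H_EXAMPLE)
num_edges = sum(len(vns) for vns in cn_to_vn.values())
```

The number of edges equals the number of ones in $H$, which is why a low density of ones translates directly into fewer messages to exchange per iteration.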
LDPC codes are usually decoded by means of an iterative algorithm variously known as sum-product, belief propagation or message passing, reformulated in a version that processes logarithmic likelihood ratios instead of probabilities. In the first half of each iteration, variable nodes receive data from adjacent check nodes and from the channel and use them to compute updated information that is sent to the check nodes; in the second half, check nodes take the updated information received from the connected variable nodes and generate new messages to be sent back to the variable nodes.
In message passing decoders, messages are exchanged along the edges of the Tanner graph, and computations are performed at the nodes. To avoid multiplications and divisions, the decoder usually works in the logarithmic domain.
Fig. 8. Example Tanner graph.
The message passing algorithm is described by the following equations, where $k$ represents the current iteration, $Q_{ji}$ is the message generated by VN $j$ and directed to CN $i$, and $R_{ij}$ is the message computed by CN $i$ and sent to VN $j$. $C[j]$ is the whole set of incoming messages for VN $j$ and $R[i]$ is the whole set of incoming messages for CN $i$.
Each variable node is initialized with the log-likelihood ratio (LLR) $\lambda_j$ associated to the received bit. Next, messages are propagated from the variable nodes to the check nodes along the edges of the Tanner graph. At the first iteration only the $\lambda_j$ are delivered, while starting from the second iteration VNs sum up all the messages $R_{ij}$ coming from CNs and combine them with $\lambda_j$ according to
The check node computes new check to variable messages as
where $|R[j]|$ is the cardinality of the CN and $\Phi(x)$ is a non linear function defined as
After a number of iterations that strongly depends on the addressed application and code rate (typically 5 to 40), variable nodes compute an overall estimation of the decoded bit in the form
where the sign of $\Lambda_j$ can be understood as the hard decision on the decoded bit.
A large implementation complexity is associated with (19), which is simplified in different ways. First of all, the function $\Phi(x)$ can be obtained by means of reduced complexity estimations [Masera et al., 2005]. Moreover, sub-optimal, low complexity algorithms have been successfully proposed to simplify (19), such as the normalized Min-Sum algorithm [Chen et al., 2005], where only the two smallest magnitudes are used.
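A minimal sketch of a normalized Min-Sum check node update shows why only the two smallest magnitudes matter. It follows the common textbook formulation of the algorithm rather than any specific architecture from the cited works; the function name `min_sum_check_node` and the normalization factor 0.75 are assumptions chosen for the example, and the node is assumed to have at least two incoming messages.

```python
from typing import List


def min_sum_check_node(q: List[float], scale: float = 0.75) -> List[float]:
    """Normalized Min-Sum update for one check node (needs len(q) >= 2).

    q holds the incoming variable-to-check messages; the result holds one
    outgoing check-to-variable message per edge, in the same order.
    """
    magnitudes = [abs(x) for x in q]
    # Only the smallest magnitude, its position and the second smallest
    # magnitude are needed to build every outgoing message.
    idx_min1 = min(range(len(q)), key=magnitudes.__getitem__)
    min1 = magnitudes[idx_min1]
    min2 = min(m for i, m in enumerate(magnitudes) if i != idx_min1)
    # Product of all incoming signs; each edge removes its own sign below.
    total_sign = 1.0
    for x in q:
        total_sign *= -1.0 if x < 0 else 1.0
    out = []
    for i, x in enumerate(q):
        own_sign = -1.0 if x < 0 else 1.0
        # The edge that supplied the minimum must use the second minimum.
        magnitude = min2 if i == idx_min1 else min1
        out.append(scale * total_sign * own_sign * magnitude)
    return out


r = min_sum_check_node([2.0, -0.5, 1.5, -3.0])
```

Each outgoing message excludes the contribution of its own edge, so every edge receives the overall minimum except the one that produced it, which receives the second minimum.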
A further change is usually applied to the scheduling of variable and check nodes in order to improve communications performance. In the two-phase scheduling, the updating of variable and check nodes is accomplished in two separate phases. On the contrary, the turbo decoding message passing (TDMP) [Mansour & Shanbhag, 2003], also known as layered or shuffled decoding, allows for overlapped update operations: messages calculated by a subset of check nodes are immediately used to update variable nodes. This scheduling has been shown to reduce the number of iterations by up to 50% at a fixed communications performance.
The required number of functional units in a decoder can be estimated based on the concept of processing power $P_c$ [Gouillod et al., 2007], which can be evaluated on the basis of the rate $R_c$ of the code, the number $K$ of information bits transmitted per codeword, the block size $N=K/R_c$, the required information throughput $D$, the operating clock frequency $f_{clk}$, the maximum number of iterations $i_{MAX}$ and the total number of edges to be processed per iteration. This relation is expressed as
As two messages are associated with each edge (to be sent from the CN to the VN and vice versa), $2P_c$ gives the number of messages that must be concurrently processed at each decoding iteration in order to achieve the target throughput $D$. Equation (22) does not consider the message exchange overhead: it assumes that all messages dispatched during a cycle are delivered simultaneously during the same cycle. The $P_c$ value must therefore be regarded as a lower bound, and the actual degree of parallelism strongly depends on both the structure of the $H$ matrix [Dinoi et al., 2006] and the adopted interconnect architecture among processing units [Quaglio et al., 2006], [Masera et al., 2007].
Actually, most of the implementation concerns come from the communication structure that must be allocated to support message passing from bit to check nodes and vice versa. Several hardware realizations proposed in the literature focus on how to pass messages efficiently between the two types of processing units.
Three approaches can be followed in the high level organization of the decoder, leading to three kinds of architectures.
- Serial architectures: bit and check processors are allocated as single instances, each serving multiple nodes sequentially; messages are exchanged by means of a memory.
- Fully parallel architectures: processing units are allocated for each single bit and check node and all messages are passed in parallel on dedicated routes.
- Partially parallel architectures: several processing units work in parallel, serving all bit and check nodes within a number of cycles; suitable organization and hardware support are required to exchange messages.
For most codes and applications, the first approach results in slow implementations, while the second one has an excessive cost. As a result, the only generally viable solution is the third, partially parallel approach, which on the other hand introduces the collision problem already known from the implementation of parallel turbo decoders. Two main approaches have been proposed to deal with collisions:
-To design collision-free codes (Mansour & Shanbhag, 2003; Hocevar, 2003),
-To design decoder architectures able to avoid or at least mitigate collision effects (Kienle et al., 2003; Tarable et al., 2004).
Even if the first approach has proven to be effective, it significantly limits the supported code classes. The second approach, on the other hand, is well suited for flexible and general architectures. An even more challenging task is the design of LDPC decoders that are flexible in terms of supported block sizes and code rates (Masera et al., 2007).
In partially parallel structures, permutation networks are used to establish the correct connections between functional units. However, structured LDPC codes, such as those specified in WIMAX, allow permutation networks to be replaced by low complexity barrel shifters (Boutillon et al., 2000; Mansour & Shanbhag, 2003).
Early termination schemes can be adopted to improve the decoding efficiency by dynamically adjusting the number of iterations according to the SNR. The simplest approach requires that decoding decisions are stored and compared across two consecutive iterations: if no changes are detected, the decoding is terminated; otherwise it continues up to a maximum number of iterations. More sophisticated iteration control schemes are able to reduce the mean number of iterations, thus saving both latency and energy (Kienle & Wehn, 2005; Shin et al., 2007).
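The simplest stopping rule described above can be sketched as follows. This is only an illustrative sketch: the function name, the `iterate` callback (standing for one full decoding iteration) and the sign convention (a negative LLR is decided as bit 1) are assumptions, not part of the architectures cited in the text.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def decode_with_early_stop(
    llrs: Sequence[float],
    iterate: Callable[[List[float]], List[float]],
    max_iterations: int,
) -> Tuple[Optional[List[int]], int]:
    """Iterate until two consecutive iterations give the same hard decisions.

    `iterate` is a hypothetical callback performing one decoding iteration.
    Returns the last hard decisions and the number of iterations executed.
    """
    state = list(llrs)
    previous: Optional[List[int]] = None
    for iteration in range(1, max_iterations + 1):
        state = iterate(state)
        # Assumed convention: negative LLR -> bit 1, otherwise bit 0.
        decisions = [1 if value < 0 else 0 for value in state]
        if decisions == previous:
            return decisions, iteration  # no change detected: stop early
        previous = decisions
    return previous, max_iterations
```

With this rule at least two iterations are always executed, since the first iteration has no earlier decisions to compare against.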
The WIMAX CTC decoder is made of three main blocks: symbol deselection (SD), subblock deinterleaver and CTC decoder, as highlighted in Fig. 9, where N represents the number of couples included in a data frame. The SD, subblock deinterleaver and CTC decoder blocks are connected by means of memory buffers in order to guarantee that the non-iterative part of the decoder (namely SD and subblock deinterleaver) and the decoding loop work simultaneously on consecutive data frames. Since the maximum decoder throughput is about 75 Mb/s and the native CTC rate is 1/3 (two uncoded bits produce six coded bits), the maximum throughput at the input of the decoding loop can rise up to 225 million LLRs per second. The same throughput has to be sustained by the subblock deinterleaver, whereas an even higher throughput has to be sustained at the SD unit in case of repetition.
Depending on the amount of data sent by the encoder (puncturing or repetition), the throughput sustained by the symbol deselection (SD) can rise up to 900 million LLRs per second (repetition 4). When the encoder performs repetition, the same symbol is sent more than once. Thus, the decoder combines the LLRs referring to the same symbol to improve the reliability of that symbol. As shown in Fig. 9, this can be achieved by partitioning the symbol deselection input buffer into four memories, each containing up to 6N LLRs.
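As an illustration of the combining step, the sketch below adds, position by position, the LLRs stored in the repetition buffers. The text only states that the LLRs are combined; plain summation is an assumption here (it is the usual choice for independent observations of the same bit), and the function name is hypothetical.

```python
from typing import List, Sequence


def combine_repeated_llrs(buffers: Sequence[Sequence[float]], length: int) -> List[float]:
    """Combine the copies of each LLR held in the repetition buffers.

    Assumption: copies of the same symbol are combined by summation.
    Buffers shorter than `length` (or empty) simply contribute nothing.
    """
    combined = [0.0] * length
    for buffer in buffers:
        for position, value in enumerate(buffer[:length]):
            combined[position] += value
    return combined
```

With four buffers read in parallel, one combined LLR can be produced from up to four received copies per step, which mirrors the four-memory partitioning described above.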
Since the symbol deselection architecture can read up to four LLRs per clock cycle, it reduces the incoming throughput to 225 million LLRs per second. However, the symbol deselection has to compute the starting location and the number of LLRs to be written into the output buffer. The number of LLRs and the starting location are obtained as in (23) and (24) respectively, where NSCHk, mk and SPIDk are parameters specified by the WIMAX standard for the k-th subpacket when HARQ is enabled: NSCHk is the number of concatenated slots, mk is the modulation order and SPIDk is the subpacket ID.
Since NSCHk ∈ [1, 480] and mk ∈ {2, 4, 6}, we can rewrite (23) as
The efficient implementation of (25) is obtained with an adder whose inputs are NSCHk and a selection between two hardwired left-shifted versions of NSCHk (by one position and by three positions), followed by a programmable left shifter (by five or six positions). Similarly, since SPIDk ∈ {0, 1, 2, 3}, the multiplication in (24) is avoided as
Fig. 9. Complete CTC decoder block scheme.
A block scheme of the architecture employed to compute Fk and Lk is depicted in Fig. 10 (a).
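The shift-and-add structure described above can be checked with a short sketch. It assumes that the number of LLRs has the form Lk = 48·NSCHk·mk, which is the form used for HARQ subpackets in IEEE 802.16e; the equation itself is not reproduced here, so this form, like the function name, should be read as an assumption.

```python
def lk_shift_add(n_sch: int, m_k: int) -> int:
    """Compute 48 * n_sch * m_k with one adder and shifters only.

    Assumption: Lk = 48 * NSCHk * mk, so that
      mk = 2 -> 96 * N  = (N + 2N) << 5
      mk = 4 -> 192 * N = (N + 2N) << 6
      mk = 6 -> 288 * N = (N + 8N) << 5
    """
    if m_k not in (2, 4, 6):
        raise ValueError("modulation order must be 2, 4 or 6")
    # Selection between the two hardwired shifted copies (x2 or x8).
    shifted = n_sch << 3 if m_k == 6 else n_sch << 1
    partial = n_sch + shifted            # the single adder: 3N or 9N
    final_shift = 6 if m_k == 4 else 5   # programmable left shifter
    return partial << final_shift
```

Under this assumption no multiplier is needed: the three modulation orders only change which shifted copy is selected and the amount of the final shift.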
Furthermore, in order to support the puncturing mode, the output memory locations corresponding to unsent bits must be set to zero. To ease the implementation of the SD architecture, all the output memory locations are set to zero while Lk and Fk are computed. As a consequence, about two clock cycles per sample are required to complete the symbol deselection, namely 6N LLRs are output in 12N clock cycles, so that the symbol deselection throughput can be estimated as
As can be observed, sustaining 225 million LLRs per second requires a clock frequency of 450 MHz. To overcome this problem, we not only partition the input buffer into four memories, but also increase the memory parallelism, so that each memory location contains p LLRs. Thus, we can rewrite (27) as (28), and by setting p to a conservative value, p=4, the SD architecture processes up to sixteen LLRs simultaneously with fclk=113 MHz.
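The figures quoted above can be reproduced with a small calculation. The sketch assumes that the estimate has the form throughput = p · fclk / 2, which follows from the statement that 6N LLRs are output in 12N clock cycles with p LLRs per memory location; the function name and parameters are illustrative.

```python
def sd_required_clock_hz(target_llrs_per_s: float, p: int = 1,
                         cycles_per_llr: float = 2.0) -> float:
    """Clock frequency needed by the symbol deselection unit.

    Assumption: throughput = p * fclk / cycles_per_llr, with about two
    clock cycles per LLR (6N LLRs in 12N cycles) and p LLRs per location.
    """
    return target_llrs_per_s * cycles_per_llr / p


print(sd_required_clock_hz(225e6) / 1e6)        # 450.0 MHz with p = 1
print(sd_required_clock_hz(225e6, p=4) / 1e6)   # 112.5 MHz with p = 4
```

The value of 112.5 MHz obtained for p = 4 is consistent with the rounded 113 MHz reported in the text.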
The received LLRs belong to six possible subblocks depending on the coded bits they refer to (A, B, Y1, W1, Y2, W2), and each subblock is made of N LLRs. The subblock deinterleaver treats each subblock separately and scrambles its LLRs according to Algorithm 1, given below, where m and J are constants specified by the WIMAX standard and BROm(y) is the bit-reversed m-bit value of y.
1: k ← 0
2: i ← 0
3: while i < N do
4:     Tk ← 2^m (k mod J) + BROm(⌊k/J⌋)
5:     if Tk < N then
6:         i ← i + 1
7:     else
8:         discard Tk
9:     end if
10:     k ← k + 1
11: end while
Algorithm 1. Subblock deinterleaver address generator.
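A direct software transcription of Algorithm 1 is sketched below. The function names are illustrative, and the pair m = 6, J = 3 used in the usage note for N = 144 is an assumed parameter set taken to be the one in the standard's table; it should be checked against the standard before being relied upon.

```python
from typing import List, Tuple


def bit_reverse(value: int, width: int) -> int:
    """Return `value` with its `width` least significant bits reversed."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def subblock_addresses(n: int, m: int, j: int) -> Tuple[List[int], int]:
    """Run Algorithm 1; return the N valid addresses and the number of
    tentative addresses generated (valid plus discarded)."""
    if n > (1 << m) * j:
        raise ValueError("N exceeds the 2^m * J addresses that can be generated")
    addresses: List[int] = []
    k = 0
    while len(addresses) < n:          # len(addresses) plays the role of i
        t_k = (1 << m) * (k % j) + bit_reverse(k // j, m)
        if t_k < n:
            addresses.append(t_k)      # valid address
        k += 1                         # discarded addresses still cost a step
    return addresses, k
```

For example, `subblock_addresses(144, 6, 3)` returns a permutation of 0..143 and a count of 191 tentative addresses, so some generation steps are spent on addresses that are discarded.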
As a consequence, the number of tentative addresses generated, NM, can be greater than N. Exhaustive simulations, performed on the possible N specified by the standard, show that the worst case is NM=191, which occurs with N=144. Since 191/144=1.326, a conservative approximation is NM=4N/3. The whole subblock deinterleaver architecture is obtained with one single address generator implementing Algorithm 1 to simultaneously write one LLR from each of the six subblock memories. In particular, as imposed by the WIMAX standard, the interleaved LLRs belonging to the A and B subblocks are stored separately, whereas the interleaved LLRs belonging to Y1 and Y2 are stored as a symbol-by-symbol multiplexed sequence, creating a "macro-subblock" made of 2N LLRs. Similarly, a macro-subblock made of 2N LLRs is generated by storing a symbol-by-symbol multiplexed sequence of the interleaved W1 and W2 subblocks.
Since all the subblocks can be processed simultaneously, this architecture deinterleaves six LLRs per clock cycle. As a consequence, the subblock deinterleaver sustains a throughput
Thus, a throughput of 225 million LLRs per second is sustained using fclk=50 MHz.
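The 50 MHz figure can be recovered with a short worked calculation. It assumes that the address generator produces one tentative address per clock cycle and that each valid address serves all six subblocks at once; the symbol T_deint is introduced here only for illustration.

```latex
T_{\mathrm{deint}} = \frac{6N}{N_M}\, f_{clk}
  \approx \frac{6N}{4N/3}\, f_{clk} = 4.5\, f_{clk},
\qquad
f_{clk} = \frac{225 \times 10^{6}\ \text{LLR/s}}{4.5\ \text{LLR/cycle}} = 50\ \text{MHz}.
```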
To implement lines 4 and 5 of Algorithm 1, three steps are required: the calculation of k mod J and ⌊k/J⌋, the calculation of 2^m (k mod J) and BROm(⌊k/J⌋), and the generation of Tk while checking Tk<N. It is worth pointing out that k mod J can be efficiently implemented as an up-counter followed by a mod J block. Moreover, each time the mod J block detects k=J, a second counter is incremented: the final value in the second counter is ⌊k/J⌋. Since m ∈ [3, 10], the 2^m (k mod J) term is implemented as a programmable shifter in the range [0, 7] followed by a hardwired three-position left shifter. The BROm(⌊k/J⌋) term is obtained by multiplexing eight hardwired bit-reversal networks. Finally, a valid Tk address is obtained with an adder and is validated by a comparator. The address generation architecture is shown in Fig. 10 (b).
As detailed in section 2.3, a parallel decoder architecture is required to sustain the throughput demanded by the WIMAX standard. To that purpose we set SP=1, I=8, and fclk=200 MHz; then from (17) we analyze the throughput as a function of N for W=32. As shown in Fig. 11, only P=4 achieves the target throughput (horizontal solid line) for N≥480.
Moreover, the window width affects both the decoder throughput and the depth of the SISO local buffers, so a proper W value must be selected for each frame size. In particular if N/(P∙W)
Exhaustive simulations show that collisions occur for P=2 and P=4 only with N=108. As a consequence, we select P as a function of N to simultaneously obtain a monotonically increasing throughput as a function of N and to avoid collisions. It is worth pointing out that, when collisions are avoided, the resulting parallel interleaver is a circular shifting interleaver: the address generation is simplified with all SISOs simultaneously accessing the same location of different memories.
Let idx_0^t denote the memory accessed by SISO-0 at time t during a scrambled half iteration; the memory concurrently accessed by SISO-k is then idx_k^t = (idx_0^t ± k) mod P.
Fig. 10. Symbol deselection starting address and number of elements generation block scheme (a), subblock deinterleaver address generation block scheme (b).
Fig. 11. Parallel CTC decoder throughput as a function of the block size N for different parallelism degree values P. The horizontal line represents the target throughput.
Thus, the parallel CTC interleaver-deinterleaver system is obtained as a cascaded two-stage architecture (see Fig. 12). The first stage efficiently implements the WIMAX interleaver algorithm, whereas the second one extracts the common memory address adx^t and the memory identifiers idx_k^t from the scrambled address i.
The CTC interleaver algorithm specified in the WIMAX standard is structured in two steps. The first step switches the LLRs referring to A and B that are stored at odd addresses. The second step provides the interleaved address i of the j-th couple as
where P0 and P'j are constants that depend only on N and are specified by the standard. It is worth pointing out that the two steps can be swapped; as a consequence, the first step can be performed on-the-fly, avoiding the use of an intermediate buffer to store switched LLRs. A simple architecture to implement (30) can be derived by rewriting (30) as
\n\t\t\t\twhere
A small Look-Up-Table (LUT) is employed to store the P0 mod N and P'j mod N terms; then (31) is implemented in two parts, as depicted in Fig. 12. The first part accumulates P0 to implement the P0∙j term, and the mod N block produces the correct modulo N result. The second part employs the two least significant bits of a counter (j−cnt) to select the proper P'j mod N value, which is added to the (P0∙j) mod N term. A further modulo N operation is performed at the output. Since in this architecture both the first and the second part work on data belonging to [0, 2N−1], all the mod N operations are implemented by means of a subtracter and a multiplexer.
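The accumulate-and-correct scheme described above can be sketched in software as follows. The sketch assumes that the interleaved address has the general form i = (P0·j + P'j) mod N, with P'j selected by j mod 4, as suggested by the text; the function names are illustrative and the constants used in the usage note are placeholders, not values from the standard.

```python
from typing import List, Sequence


def mod_n_step(value: int, n: int) -> int:
    """Modulo N for a value known to lie in [0, 2N-1]:
    one subtracter plus a multiplexer driven by the sign of the difference."""
    difference = value - n
    return difference if difference >= 0 else value


def interleaved_addresses(n: int, p0: int, offsets: Sequence[int]) -> List[int]:
    """Generate i(j) = (P0*j + P'_j) mod N for j = 0..N-1 without multipliers.

    `offsets` holds the four constants selected by the two LSBs of j
    (hypothetical values in the example below)."""
    p0_mod = p0 % n                         # LUT content: P0 mod N
    offsets_mod = [o % n for o in offsets]  # LUT content: P'_j mod N
    accumulator = 0                         # holds (P0 * j) mod N
    result: List[int] = []
    for j in range(n):
        result.append(mod_n_step(accumulator + offsets_mod[j & 3], n))
        accumulator = mod_n_step(accumulator + p0_mod, n)
    return result
```

For instance, `interleaved_addresses(24, 5, [1, 13, 1, 13])` gives the same sequence as evaluating `(5 * j + offset) % 24` directly, while using only additions, subtractions and selections.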
\n\t\t\t\tThe second stage of the parallel CTC interleaver-deinterleaver architecture works as follows.
Since adx^t ∈ [0, N/P−1], it can be obtained from the scrambled address i produced by the first stage as
The straightforward implementation of (33) needs to calculate N/P and to allocate P−2 multipliers, P−1 subtracters, a P-way multiplexer and a small amount of logic for selecting the proper adx^t value. The N/P division can be simplified by restricting the possible P values to powers of two. Thus, we obtain a CTC decoder architecture that exploits throughput/parallelism scalability to avoid collisions, namely we employ P=1 when N≤180, P=2 when 192≤N≤240 and P=4 when 480≤N≤2400. Moreover, as can be inferred from Fig. 12, multiplications are avoided by resorting to simple shift operations (x >> i = x/2^i). The sign of the subtractions (dashed lines in Fig. 12) makes it possible not only to select the proper adx^t but also to find idx_0^t. Then, with P−1 modulo P adders, the other idx_k^t values are straightforwardly generated. As can be observed, choosing P as a power of two reduces the modulo P adders to simpler binary adders. The actual throughput sustained by the described throughput/parallelism scalable architecture is represented by the bold line in Fig. 11.
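A behavioural sketch of the second stage is given below. It assumes that P is a power of two and that N is a multiple of P, and it models the selection as a chain of subtractions whose sign picks the candidate; the function names and the `sign` parameter (covering the ± in the memory-identifier relation) are illustrative.

```python
from typing import List, Tuple


def split_address(i: int, n: int, p: int) -> Tuple[int, int]:
    """Return (adx, idx0): the common address in [0, N/P - 1] and the
    memory accessed by SISO-0, for a scrambled address i."""
    shift = p.bit_length() - 1       # log2(P), P assumed a power of two
    segment = n >> shift             # N/P obtained with a shift
    adx, idx0 = i, 0
    for k in range(1, p):            # P-1 subtracters
        difference = i - k * segment
        if difference >= 0:          # sign of the subtraction selects
            adx, idx0 = difference, k
    return adx, idx0


def memory_ids(idx0: int, p: int, sign: int = 1) -> List[int]:
    """idx_k = (idx_0 +/- k) mod P; with P a power of two the modulo
    reduces to keeping the log2(P) least significant bits."""
    return [(idx0 + sign * k) & (p - 1) for k in range(p)]
```

For N = 480 and P = 4, `split_address(250, 480, 4)` yields `(10, 2)`, and `memory_ids(2, 4)` yields `[2, 3, 0, 1]`, so all SISOs access location 10 of four different memories.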
Fig. 12. Parallel CTC address generator.
The global architecture of the designed parallel SISO is given in Fig. 13, where each SISO contains the processors devoted to computing the different metrics required by the BCJR algorithm, as detailed in section 2.3. A simple network is used to properly connect the SISOs according to the current value of P by setting the signal last_SISO. Furthermore, one address crossbar-switch (radx-switch) is used to implement the reading operation, a LIFO stores the addresses and makes them available for the writing phase, and two data crossbar-switches (rdata-switch/wdata-switch) are used to properly send (receive) the data to (from) the memory (EI-MEM) according to the parallel interleaver idx_k^t values.
Fig. 13. Parallel CTC decoder architecture.
In Table 2 the complexity of all the blocks for a 130 nm standard cell technology is reported. The bit-width is 6 bits for λ[c;I], 8 bits for λ[u;I], and 12 bits for the state metrics. For further details the reader can refer to (Martina et al., 2009).
Architecture | SD | Subblock Deinterl. | SISOx1 | Parallel Interl.
Logic [kgate] | 11 | 1.7 | 37 | 2.8
Memory [kbit] | 0 | 0 | 14.2 | 59
Table 2. Complexity of the whole receiver.
This work is partially supported by the WIMAGIC project funded by the European Community.
Cylindrically layered structures have various exotic applications. For instance, a metal-core dielectric-shell nanowire has been proposed for cloaking applications in the visible spectrum. The functionality of this structure is based on the induction of antiparallel currents in the core and shell regions, and the design procedure is the so-called scattering cancelation technique [1]. Experimental realization of a hybrid gold/silicon nanowire photodetector proves the practicality of these structures [2]. As an alternative approach for achieving an invisible cloak, cylindrically wrapped impedance surfaces are designed by a periodic arrangement of metallic patches, an approach known as mantle cloaking [3]. Conversely, cylindrically layered structures can be designed so that they exhibit a scattering cross-section far exceeding the single-channel limit. This phenomenon is known as super-scattering and has various applications in sensing, energy harvesting, bio-imaging, communication, and optical devices [4, 5]. Moreover, a cylindrical stack of alternating metals and dielectrics behaves as an anisotropic cavity and exhibits a dramatic drop of the scattering cross-section in the transition from the hyperbolic to the elliptic dispersion regime [6, 7]. The Mie-Lorenz theory is a powerful, exact, and simple approach for designing and analyzing the aforementioned structures.
Multilayered spherical structures have also attracted considerable interest in the field of optical devices. A dielectric sphere made of a high-index material supports electric and magnetic dipole resonances, which result in peaks in the extinction cross-section [8]. Moreover, by covering the dielectric sphere with a plasmonic metal shell, an invisible cloak is realizable, which is useful for sensors and optical memories [9]. By stacking multiple metal-dielectric shells, an anisotropic medium for scattering shaping can be achieved [10].
From the above discussion, it can be deduced that tailoring the Mie-Lorenz resonances in curved particles leads to novel optical devices. In this chapter, we extend the realization of various optical applications based on the excitation of localized surface plasmons (LSP) in graphene-wrapped cylindrical and spherical particles. To this end, we first briefly discuss the modeling of graphene through its surface conductivity or an equivalent dielectric model. We then extract the modified Mie-Lorenz coefficients for some curved structures with graphene interfaces. The usefulness of the developed formulas is illustrated through various design examples. It is worth noting that graphene-wrapped particles with different numbers of layers have been proposed previously as refractive index sensors, waveguides, super-scatterers, invisible cloaks, and absorbers [11, 12, 13, 14, 15]. Our formulation provides a unified approach for the plane wave and eigenmode analysis of graphene-based optical devices.
Graphene is a 2D carbon material in a honeycomb lattice that exhibits extraordinary electrical and mechanical properties. In order to solve Maxwell's equations in the presence of graphene, two approaches are applied by various authors, and we review them in the following paragraphs. It should be noted that although we discuss the planar graphene model, we use the same formulas for curved geometries when the number of carbon atoms exceeds 10^4, which lets us neglect the effect of defects. Therefore, the radii of all cylinders and spheres are considered to be greater than 5 nm [16]. Moreover, bending the graphene does not have any considerable impact on the properties of its surface plasmons, except for a small downshift of the frequency. Figure 1 shows the propagation of the graphene surface plasmons on S-shaped and G-shaped curves [17].
Propagation of graphene surface plasmons on curved structures: (a) S-shaped and (b) G-shaped [17].
Since graphene material is atomically thin, in order to consider its impact on the electromagnetic response of a given structure, boundary conditions at the interface can be simply altered. To this end, graphene surface currents that are proportional to its surface conductivity should be accounted for ensuring the discontinuity of tangential magnetic fields. In the infrared range and below, we can describe the graphene layer with a complex-valued surface conductivity
The parameters
where subscripts
Figure 2(a) and (b) show the real and imaginary parts of the graphene surface conductivity at a temperature of T = 300 K. The real part of the conductivity accounts for the losses, while the positive-valued imaginary part represents the plasmonic properties [20]. Moreover, the real and imaginary parts of the graphene equivalent bulk permittivity are shown in Figure 2(c) and (d). The negative-valued real relative permittivity represents the plasmonic excitation, and the imaginary part of the permittivity represents the losses [21]. It should be noted that all of the formulas of this chapter are adapted with
(a) and (b) the real and imaginary parts of graphene surface conductivity [20] and (c) and (d) the real and imaginary parts of graphene equivalent permittivity [21].
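As a rough numerical illustration of the behavior described above, the sketch below evaluates the widely used intraband (Drude-like) term of the Kubo surface conductivity. It is not necessarily the exact expression used in this chapter: the formula, the exp(−iωt) time convention, the relaxation time `tau_s` and the function name are all assumptions made for this example, and the interband term is omitted.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge [C]
HBAR = 1.054571817e-34       # reduced Planck constant [J s]
K_B = 1.380649e-23           # Boltzmann constant [J/K]


def sigma_intra(freq_hz: float, mu_c_ev: float,
                temperature_k: float = 300.0, tau_s: float = 1e-13) -> complex:
    """Intraband graphene surface conductivity [S], exp(-i*omega*t) convention.

    Assumed form:
      sigma = (2 e^2 k_B T / (pi hbar^2)) * ln(2 cosh(mu_c / (2 k_B T)))
              * i / (omega + i / tau)
    tau_s is a hypothetical relaxation time chosen only for illustration.
    """
    omega = 2.0 * math.pi * freq_hz
    thermal = K_B * temperature_k
    x = abs(mu_c_ev * E_CHARGE) / (2.0 * thermal)
    # ln(2 cosh x) written in an overflow-safe way.
    log_term = x + math.log1p(math.exp(-2.0 * x))
    prefactor = 2.0 * E_CHARGE ** 2 * thermal / (math.pi * HBAR ** 2)
    return prefactor * log_term * 1j / (omega + 1j / tau_s)
```

Under these assumptions the real part (losses) and the imaginary part are both positive, in line with the description of the plasmonic regime given above.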
In this section, the modified Mie-Lorenz coefficients of a single-layered graphene-coated cylindrical tube will be extracted. The formulation is expanded into the multilayered graphene-based tubes through exploiting the TMM method, and later, various applications of the analyzed structures, including emission and radiation properties, complex frequencies, super-scattering, and super-cloaking, will be explained.
Let us consider a graphene-wrapped infinitely long cylindrical tube. The structure is shown in Figure 3(a), and it is considered that a TEz-polarized plane wave illuminates the cylinder. In general, TE and TM waves are coupled in the cylindrical geometries. For the normally incident plane waves, they become decoupled, and they can be treated separately. For simplicity, we consider the normal incidence of plane waves where the wave vector
(a) A single-layered graphene-coated cylinder under TEz plane wave illumination and (b) corresponding scattering efficiency for ε1 = 3.9 and μc = 0.5 eV. The normalization factor in this figure is the diameter of the cylinder [23].
In order to obtain the modified Mie-Lorenz coefficients, the incident, scattered, and internal electromagnetic fields are expanded in terms of cylindrical coordinates special functions which are, respectively, the Bessel functions and exponentials in the radial and azimuthal directions. In order to exploit a terse mathematical notation, the vector wave functions are introduced as [22]:
A complete explanation of the above vector wave functions and their self and mutual orthogonality relations can be found in classic electromagnetics textbooks [22]. In the above equation,
In the graphene-based cylindrical structures, the plasmonic state is achieved via illuminating a TEz wave to the structure. Therefore, for the normal illumination, the incident, scattered, and dielectric electromagnetic fields are shown with the superscripts
where
The boundary conditions at the graphene interface at
By applying the boundary conditions in the expanded fields, the linear system of equations for extracting the unknowns can be readily obtained. The solution of the extracted equations for the scattering coefficients leads to:
The same procedure can be repeated for the TMz illumination. The normalized scattering cross-section (NSCS) reads as:
where the normalization factor is the single-channel scattering limit of cylindrical structures. In order to gain some insight into the scattering performance of graphene-wrapped wires, the scattering efficiency for ε1 = 3.9 and μc = 0.5 eV is plotted in Figure 3(b) as a function of the wire radius. As the figure illustrates, a peak-valley line shape occurs at each wavelength. The peaks and valleys correspond to scattering and invisibility states, respectively, and will be further manipulated in the next sections to develop some novel devices. The excitation frequencies of the plasmons are the complex poles of the extracted coefficients [24], which will be discussed in the next subsection. Interestingly, the scattering states of graphene-coated dielectric cores are polarization-dependent. By using a left-handed metamaterial as the core, this limitation can be obviated [25].
As in any resonant problem, additional information can be obtained by studying the solutions to the boundary value problem in the absence of external sources (eigenmode approach). Although, from a formal point of view, this approach has many aspects in common with those developed in previous sections, the eigenmode problem presents an additional difficulty related to the analytic continuation of certain physical quantities into the complex plane. Because electromagnetic energy leaves the LSP (either through ohmic losses or through radiation towards the surrounding medium), the LSP should be described by a complex frequency whose imaginary part accounts for the finite lifetime of the LSP. The eigenmode approach is not new in physics; it arises in any resonance process (at an elementary level, an RLC circuit), where the complex frequency is a pole of the analytic continuation to the complex plane of the response function of the system (e.g., the current in the circuit). Similarly, in the eigenmode approach presented here, the complex frequencies correspond to poles of the analytic continuation of the multipole terms (Mie-Lorenz coefficients) in the electromagnetic field expansion.
In order to derive complex frequencies of LSP modes in terms of the geometrical and constitutive parameters of the structure, we use an accurate electrodynamic formalism which closely follows the usual separation of variable approach developed in Section 2.1. We can obtain a set of two homogeneous equations for the
where the prime denotes the first derivative with respect to the argument of the function and
where
When the size of the cylinder is small compared to the eigenmode wavelength, i.e.,
Taking into account that in the non-retarded regime the propagation constant of the plasmon propagating along perfectly flat graphene sheet can be approximated by:
it follows that the dispersion relation (14) for LSPs in dielectric cylinders wrapped with a graphene sheet can be written as:
where
For large doping (
which can be analytically solved for the plasmon eigenfrequencies,
where
In the following example, we consider a graphene-coated wire with a core radius
1 | ||
2 | ||
3 | ||
4 |
Resonance frequencies
In this section, multilayered cylindrical tubes with multiple graphene interfaces are of interest. In order to ease the derivation of the unknown expansion coefficients, the matrix-based TMM formulation is generalized to tubes with several graphene interfaces. Initially, consider a layered cylinder constructed from stacked ordinary materials under TEz plane wave illumination, as shown in Figure 4. The total magnetic field in the environment can be expressed as the superposition of incident and scattered waves, as in Section 2.1. The unknown expansion coefficients of the scattered wave can be determined by means of the
Multilayered cylindrical structure consisting of alternating graphene-dielectric stacks under plane wave illumination. The 2D graphene shells are represented volumetrically for the sake of illustration [31].
where C represents the core layer. In the above equation, the dynamical matrix
The argument of the above special functions is
In order to incorporate the graphene surface conductivity in the above formulas, let us consider each graphene interface as a thin dielectric with the equivalent complex permittivity defined in Eq. (3) and utilize the TMM formulation in the limiting case of a small radius at the graphene interface with the wave number of kg, i.e.,
where the free-space impedance
Widely tunable scattering cancelation is feasible by using patterned graphene-based patch meta-surface around the dielectric cylinder as shown in Figure 5. The surface impedance of the graphene patches can be simply and accurately calculated by closed-form formulas, to be inserted in the modified Mie-Lorenz theory [32].
(a) Electromagnetic cloaking of a dielectric cylinder using graphene meta-surface and (b) corresponding electric field distribution [32].
Let us consider a triple-shell graphene-based nanotube under plane wave illumination, as shown in Figure 6(a). This structure is used to design a dual-band super-scatterer at infrared frequencies. To this end, the modified Mie-Lorenz coefficients of various scattering channels should be made to coincide through the proper choice of geometrical and optical parameters. In order to construct the Tn matrix for this geometry, one needs to multiply nine 2 × 2 dynamical matrices, which is too cumbersome for analytical scattering manipulation. Therefore, the associated planar structure, shown in Figure 6(b), is used to develop the dispersion engineering method as a quantitative design procedure for the super-scatterer. The separations of the free-standing graphene layers are d1 = d2 = 45 nm in the planar structure, and the transmission line model is used to analyze it. Moreover, the chemical potential of the lossless graphene material is μc = 0.2 eV in all layers. The dispersion diagram of the planar structure is illustrated in Figure 7(a), which predicts the presence of three plasmonic resonances in each scattering channel of the tube at around the frequencies that fulfill
(a) Multilayered cylindrical nanotube with three graphene shells and (b) associated planar structure [30]. R1 is denoted with Rc in the text.
(a) Dipole and quadrupole Mie-Lorenz scattering coefficients for the tube of Figure 6 and (b) dispersion diagram of the associated planar structure [30]. f1p, f2p, and f3p are the plasmonic resonances of the dipole mode predicted by the planar configuration. The prime denotes the same information for the quadrupole mode. f1c, f2c, and f3c are the same information calculated by the exact modified Mie-Lorenz theory of the multilayered cylindrical structure.
In order to design a dual-band super-scatterer, the plasmonic resonances of two scattering channels are made to coincide by fine-tuning the results of Bohr's model. The optimized geometrical and constitutive parameters are Rc = 45.45 nm, d1 = 45.05 nm, d2 = 43.23 nm, ε1 = 3.2, ε2 = 2.1, ε3 = 2.2, and ε4 = 1. Figure 8 shows the NSCS and magnetic field distribution for the dual operating bands of the structure. It is clear that the NSCS exceeds the single-channel limit by a factor of 4, and in the corresponding magnetic field there is a large shadow around the nanometer-sized cylinder at each operating frequency. Other designs are also feasible by altering the optical and geometrical parameters. Furthermore, the far-field radiation pattern is a hybrid dipole-quadrupole due to the simultaneous excitation of the first two channels. It should be noted that an inherent characteristic of super-scatterer designs using plasmonic graphene material is an extreme sensitivity to the parameters. Moreover, in the presence of losses, the scattering amplitudes no longer reach the single-channel limit, and this restricts the practical applicability of the concept to low-frequency windows.
(a) and (b) The NSCS of dual-band super-scatterer respectively, in the first and second operating frequencies and (c) and (d) corresponding magnetic field distributions [30].
As another example, the dispersion diagram of Figure 7(a), along with Foster's theorem, has been used to conclude that each scattering channel of the triple-shell tube contains two zeros lying between the plasmonic resonances predicted by Bohr's model. The zeros and poles of the first two scattering channels were then made to coincide in order to observe super-scattering and super-cloaking simultaneously [33]. The optimized material and geometrical parameters are εc = 3.2, ε1 = ε2 = 2.1, Rc = 45.45 nm, d1 = 46.25 nm, and d2 = 46.049 nm. The NSCS curves corresponding to the super-cloaking and super-scattering regimes are illustrated in Figure 9(a) and (b), and the expected phenomenon is clearly observed. The corresponding magnetic field distributions, shown in Figure 9(c) and (d), also manifest the reduced and enhanced scattering in the respective operating bands. Similar to the dual-band super-scatterer of the previous section, the performance of this structure is very sensitive to the optical, material, and geometrical parameters. By further increasing the number of graphene shells, other plasmonic resonances and zeros can be achieved for the manipulation of the optical response.
Simultaneous super-scattering and super-cloaking using the structure of Figure 6. NSCS for (a) super-cloaking and (b) super-scattering regimes and corresponding magnetic field distributions, respectively, in (c) and (d) [33].
In this section, multilayered graphene-coated particles with spherical morphology are investigated, and the corresponding modified Mie-Lorenz coefficients are extracted by expanding the incident, scattered, and transmitted electromagnetic fields in terms of spherical harmonics. Increasing the number of graphene layers provides further degrees of freedom for manipulating the optical response. To simplify performance optimization, an equivalent RLC circuit is proposed in the quasistatic regime for the sub-wavelength plasmons, and various practical examples are presented.
In this section, the most general graphene-based structure with N dielectric layers, as shown in Figure 10, is considered, and plane wave scattering is analyzed by extracting recurrence relations for the modified Mie-Lorenz coefficients. Because the TMM method requires multiple matrix inversions, the spherical geometries, unlike the cylindrically layered structures of the previous section, are analyzed through recurrence relations. Scattering from a single graphene-coated sphere has been formulated elsewhere [16], and it is recovered as a special case of our formulation.
Spherical graphene-dielectric stack: (a) 2D and (b) 3D views [34]. Note that the layers are numbered starting from the outermost layer in order to remain consistent with the reference paper [35].
The scattering analysis is very similar to that of the single-shell sphere [16], except that the Kronecker delta function is used so that the electromagnetic fields of any desired layer can be written as compact expansions. Therefore [34]:
By considering
where the superscript (1) in the vector wave functions indicates that Hankel functions are used in the field expansions. The boundary conditions at the interface of adjacent layers read as:
Therefore, the linear system of equations resulting from the above conditions is:
where
where the sub/superscripts H and V represent the TE and TM waves, respectively. The directions of propagation of these waves are indicated through the subscripts F (outgoing waves) and P (incoming waves). The effective reflection coefficients are extracted as:
Moreover, it can be readily shown that the transmission coefficients read as:
where
where symbol
The extinction efficiencies of graphene-based particles with different number of layers: (a) two, (b) three, and (c) four [34].
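As a minimal sketch of how an extinction efficiency is assembled once the scattering coefficients are known, the snippet below evaluates the standard Mie series Q_ext = (2/x²) Σ (2n + 1) Re(a_n + b_n), where x is the size parameter of the outermost radius. The function name and the sign convention (Bohren and Huffman style) are assumptions made for illustration; the modified Mie-Lorenz coefficients of [34] may be defined with a different sign or normalization, in which case the sum must be adapted accordingly.

```python
def extinction_efficiency(a, b, x):
    """Standard Mie series for the extinction efficiency.

    a, b : sequences of complex TM/TE scattering coefficients a_n, b_n,
           ordered from n = 1 upward (assumed convention, not taken from [34]).
    x    : size parameter k0 * R of the outermost sphere radius (must be > 0).
    """
    if x <= 0:
        raise ValueError("size parameter x must be positive")
    total = 0.0
    for n, (a_n, b_n) in enumerate(zip(a, b), start=1):
        # Each multipole order n contributes with weight (2n + 1).
        total += (2 * n + 1) * (complex(a_n) + complex(b_n)).real
    return 2.0 * total / x**2


# Illustrative call with made-up coefficients (dipole term only).
q_ext = extinction_efficiency([0.5 + 0.1j], [0.0 + 0.0j], x=1.0)
```

In the sub-wavelength regime discussed later, truncating the lists to the first one or two orders is usually sufficient, which is why the series is written to accept coefficient lists of any length.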
To demonstrate the advantage of the closed-form analytical formulation over the numerical analysis, the simulation times of both methods are listed in Table 2. The exact solution yields a considerable reduction in simulation time. Moreover, since 3D meshing and perfectly matched layers are not required in this method, it is efficient in terms of memory as well.
Structure | Simulation time (analytical) | Simulation time (CST) |
---|---|---|
Figure 11(a) | 0.053214 s | 32 h, 50 m, 18 s |
Figure 11(b) | 0.045831 s | 33 h, 45 m, 25 s |
Figure 11(c) | 0.151555 s | 33 h, 34 m, 55 s |
Table 2. Comparison of the simulation times of CST and our codes [34].
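To put the reported timings in perspective, the short sketch below converts the CST run times to seconds and computes the speed-up factor of the analytical code for each structure. The helper name and the dictionary layout are illustrative; the numbers are copied from the table above.

```python
def to_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration to seconds."""
    return 3600 * hours + 60 * minutes + seconds


# (analytical time in s, CST time in s) for each structure in the table.
timings = {
    "Figure 11(a)": (0.053214, to_seconds(32, 50, 18)),
    "Figure 11(b)": (0.045831, to_seconds(33, 45, 25)),
    "Figure 11(c)": (0.151555, to_seconds(33, 34, 55)),
}

# Speed-up = CST time divided by analytical time.
speedups = {name: cst / analytical for name, (analytical, cst) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: speed-up of about {factor:.2e}")
```

For all three structures the ratio is on the order of 10^6, which is consistent with the statement that the time reduction is considerable.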
Based on the results of Section 3.1, the modified Mie-Lorenz coefficients of the graphene-based spherical particles are infinite summations in terms of spherical Bessel and Hankel functions. In general, graphene plasmons are excited in the sub-wavelength regime, and the leading-order term of the summation is sufficient for achieving results with acceptable precision. In this regime, the polynomial expansions of the special functions can also be truncated after the first few terms [22]. The extracted modified Mie-Lorenz coefficients can then be rewritten in polynomial form. To further simplify the real-time monitoring and performance optimization of the graphene-coated nanoparticles, an equivalent RLC circuit can be proposed by representing the rational functions in the continued fraction form as [36]:
The equivalent circuit corresponding to the above representation is shown in Figure 12.
The proposed equivalent circuit for the scattering analysis of electrically small graphene-coated spheres [36].
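The small-argument truncation that underlies the quasistatic circuit model can be checked numerically. The sketch below compares the exact first-order spherical Bessel function j_1(x) = sin(x)/x² − cos(x)/x with its leading-order term x/3 and reports the relative error. The function names are illustrative, and only j_1 is shown; the same kind of check applies to the other special functions appearing in the coefficients.

```python
import math


def spherical_j1(x):
    """Exact first-order spherical Bessel function of the first kind."""
    return math.sin(x) / x**2 - math.cos(x) / x


def spherical_j1_leading(x):
    """Leading term of the small-argument expansion, j_1(x) ~ x / 3."""
    return x / 3.0


def relative_error(x):
    """Relative error of the leading-order approximation at argument x."""
    exact = spherical_j1(x)
    return abs(spherical_j1_leading(x) - exact) / abs(exact)


for x in (0.05, 0.1, 0.5, 1.0):
    print(f"x = {x}: relative error = {relative_error(x):.3e}")
```

The error grows roughly as x²/10, so the truncation is accurate for electrically small particles and degrades as the size parameter approaches unity.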
The continued fraction representation for the TM coefficients is:
where
The elements of the equivalent circuit for the TM coefficients read as:
In order to illustrate the application of Mie analysis to graphene-wrapped structures, let us consider vertical and horizontal dipoles in the proximity of a graphene-coated sphere, as shown in Figure 13. Although the excitation in the Mie analysis is a plane wave, the scattering coefficients can be used to calculate the total decay rates of dipole emitters, and it can be shown that the localized surface plasmons of the graphene-wrapped spheres enhance the total decay rate, which is connected to the Purcell factor [16, 37]. The electric field enhancements for radially oriented and tangentially oscillating dipoles at a distance xd read, respectively, as:
(a) Vertical and horizontal dipole emitters in the proximity of the graphene-coated sphere and (b) the local field enhancement for various dipole distances with averaged orientation [37].
Figure 13(b) shows the local field enhancement for the average orientation of the dipole emitter in the vicinity of a sphere with R1 = 20 nm, coated by graphene with a chemical potential of μc = 0.1 eV. As the figure shows, an electric field enhancement on the order of 10^4 is obtained for a dipole distance of 1 nm with averaged orientation, and it decreases as the dipole moves away from the sphere.
The possibility of a super-scatterer design using graphene-coated spherical particles is illustrated in Figure 14. The design parameters are ε1 = 1.44, R1 = 0.24 μm, and μc = 0.3 eV. The structure can be analyzed directly with the modified Mie-Lorenz coefficients. The general design concepts are similar to those of the cylindrical counterparts, namely, dispersion engineering using the associated planar structure, as shown in the inset of the figure. Due to the excitation of TM surface plasmons, the normalized extinction cross-section is five times greater than that of the bare dielectric sphere. Moreover, similar to the cylindrical super-scatterers, by considering a small amount of loss for the graphene coating by assigning
(a) Atomically thin super-scatterer and associated planar structure shown in the inset and (b) corresponding normalized scattering cross-sections by considering lossless and lossy graphene shells [38].
By patterning graphene-based disks with various radii around a dielectric sphere, it is feasible to design a wide-band electromagnetic cloak at infrared frequencies. The geometry of this structure is illustrated in Figure 15. In order to analyze the proposed cloak with the modified Mie-Lorenz theory, the polarizability of the disks can be inserted in the equivalent conductivity method. The extracted equivalent surface conductivity can be used to tune the surface reactance of the sphere for the purpose of cloaking [39].
Wide-band cloaking using graphene disks with varying radii [39].
Another application that can be treated with our proposed formulation of multilayered spherical structures is multi-frequency cloaking. As Figure 16 shows, with proper design, a single graphene coating can eliminate the dipole resonance at a single reconfigurable frequency. The radius of the sphere is R1 = 100 nm and its core permittivity is ε1 = 3. Double graphene shells can suppress the scattering at two frequencies, since each graphene shell, with its own geometrical and optical properties, supports localized surface plasmon resonances at a specific frequency. By further increasing the number of graphene shells, additional frequency bands can be generated. Figure 16(b) shows the cloaking performance of a spherical particle with multiple graphene shells. The radii of the spheres are 107.5, 131.5, and 140 nm, and the corresponding chemical potentials are 900, 500, and 700 meV, respectively. The permittivity of the dielectric filler is 2.1 [21].
(a) Single and (b) multi-frequency cloaking using single/multiple graphene shells around a spherical particle [21].
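The role of the chemical potential in setting each shell's response can be illustrated with the intraband (Drude-like) approximation of the graphene surface conductivity, σ(ω) = i e² μc / [π ħ² (ω + i/τ)], which holds when μc is much larger than the thermal energy. This is a sketch under stated assumptions: the e^{−iωt} time convention, the 30 THz evaluation frequency, and the relaxation time τ = 0.1 ps are hypothetical choices made for illustration, and the cited works may use the full Kubo formula instead.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in C
HBAR = 1.054571817e-34      # reduced Planck constant in J*s


def drude_conductivity(omega, mu_c_ev, tau):
    """Intraband graphene surface conductivity in siemens.

    omega   : angular frequency in rad/s
    mu_c_ev : chemical potential in eV
    tau     : relaxation time in s (assumed value, not from the source)
    """
    mu_c = mu_c_ev * E_CHARGE  # convert eV to J
    return 1j * E_CHARGE**2 * mu_c / (math.pi * HBAR**2 * (omega + 1j / tau))


omega = 2 * math.pi * 30e12  # 30 THz, illustrative infrared frequency
tau = 1e-13                  # 0.1 ps, hypothetical relaxation time

# Chemical potentials of the three shells quoted in the text (900, 500, 700 meV).
for mu_c_ev in (0.9, 0.5, 0.7):
    sigma = drude_conductivity(omega, mu_c_ev, tau)
    print(f"mu_c = {mu_c_ev} eV: sigma = {sigma.real:.3e} + {sigma.imag:.3e}j S")
```

In this approximation the conductivity scales linearly with μc and has a positive imaginary part, so shells with different chemical potentials present different surface reactances, which is what allows each shell to address its own frequency band.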
As another example, a dielectric-metal core-shell spherical resonator (DMCSR) with its resonance frequency lying in the near-infrared spectrum is considered. In order to increase the optical absorption, the outer layer of the structure is covered with graphene. The localized surface plasmons of graphene are mainly excited at far-infrared frequencies, whereas in the near-infrared and visible range graphene behaves like a dielectric. By hybridizing the graphene with a resonator, its optical absorption can be greatly enhanced. Figure 17 shows the performance of the structure for various core radii [15].
Strong tunable absorption using a graphene-coated spherical resonator with fixed dielectric core refractive index of n and silver shell thickness of t [15].
The examples provided are just a few instances of the scattering analysis of graphene-based structures. Based on the derived formulas, other novel optoelectronic devices based on graphene plasmons can be proposed. Moreover, since assemblies of polarizable particles fabricated from graphene exhibit interesting properties such as enhanced absorption, negative permittivity, giant near-field enhancement, and large enhancements in the emission and radiation of dipole emitters [40, 41, 42, 43], the research can be extended to multiple scattering theory.