Implementation results and comparisons
Due to the fast development of Internet, the traffic load of data network has increased dramatically in past decades. Accordingly, optical network, as the major carrier for data transmission, needs to increase its capacity to meet the increasing data rate requirement. As stated in the white paper of Optical Internetworking Forum (OIF) (see ), the rapid growth of data flow demands optical network to double its capacity every 12-18 months. As a result, this critical requirement pushes various types of optical transmission systems to improve their delivered data rate at the same time.
Generally, in high-speed optical communication, the increase on data rate usually comes with the increase on signal bandwidth and sampling rate. In this case, due to the sensitiveness of optical and electronic devices, the additive transmission noise will inevitably increase as well. Therefore, how to increase the throughput of optical network without loss of robustness is an essential task when designing modern high-speed optical network.
To date, various techniques have been employed to enhance the quality of data transmission in optical network. Among those approaches, Forward error correction (FEC), or commonly called error correction coding (ECC), is viewed as the most cost-effective solution, and has been widely adopted in many industrial optical transmission systems. Many specific FEC codes, including Reed-Solomon (RS), BCH, and LDPC codes, are proposed in different industrial standards for error correction in physical layer. Among them, RS code is the earliest and most widely used FEC code in optical communication. For example, RS (255, 239) was the first generation FEC code for submarine fiber-optical transmission in , and its code rate is still the standard parameter for frame design. In addition, for Ethernet network such as 10GBase-LR in , Reed-Solomon (255, 239) code is also the standard FEC code.
There are several reasons for the wide application of RS code. First, modern long-distance optical network, especially long-haul network, is very high-speed system (10Gbps and beyond). For other promising FEC codes, such as LDPC or Turbo code, the corresponding decoding throughput usually can not meet such stringent requirement on data rate, or with the penalty of very high hardware complexity. Instead, RS decoder can achieve such high throughput with affordable hardware resource. Second, for local optical network, such as Ethernet network, the real-time response is an important metric for system design. Compared with its counterpart, RS code has the particular advantage on low decoding latency. Therefore, RS codes are widely employed in modern optical transmission system and are believed to play an important role in next generation optical networks.
Considering the importance of RS code, its efficient implementation is quite important for the optical transmission system. A low-complexity high-speed RS encoding/decoding system will improve the overall performance significantly. Particularly, since RS decoding is the most complex procedure in the RS-based FEC system, efficient RS decoder design should be well-studied. Therefore, targeted to different level of optical communication ranging from short-distance Ethernet network to long-haul backbone system, this chapter fully introduces efficient VLSI design of RS decoder. In addition, to meet the requirement of 100Gbps era, this chapter also discusses some new FEC schemes for ultra high-speed application (beyond 100Gbps).
The chapter is organized as follows. Section 2 reviews the RS decoding. The low-complexity high-speed RS decoders for short-distance network are discussed in Section 3. Section 4 analyzes performance-improved RS burst-error decoder for medium-distance system. Some recent FEC schemes targeted to 100Gbps long-haul network are introduced in Section 5. Section 6 draws the conclusion.
2. Review of RS decoding
According to the coding theory in , the procedure for decoding RS code contains three main steps: syndrome computation (SC), key equation solving (KES) and Chien search & error evaluation (CSEE). Therefore, the decoding procedure of RS code is summarized as below:
The architecture of SC block of an example RS (255, 239) decoder is shown in Fig. 1. Here
Generally, Berlekamp-Massey (BM) algorithm or modified Euclidean (ME) algorithm can be employed to solve key equation. To data, many efforts have addressed for efficient VLSI implementation of the above two algorithms. In , BM algorithm was reformulated as RiBM with the same regular architecture format compared with conventional ME algorithm in  and , and a folded BM algorithm based on RiBM was introduced in . Reference  and  implemented conventional ME algorithm with systolic and recursive architecture. In Section 3 and Section 4, based on the above efforts, some improved KES algorithms and their corresponding hardware implementations will be discussed for efficient RS decoder design.
Chien search is a widely employed approach to look for error position. Its basic idea is simple but efficient: If Λ(
Based on the above described Chien search and Forney algorithm, the architecture of CSEE block for example RS (255, 239) code is illustrated in Fig. 2. It consists of several unit cells (shown in Fig. 2(a)). Both of the sub-blocks that carry out Chien search and Forney algorithm consist of these basic cells. In the beginning,
The overall architecture of RS decoding is summarized in Fig. 3.
As mentioned in previous paragraph, since KES is the dominating step in the whole RS decoding, Section 3 and 4 will focus on the algorithm and architecture optimization of KES block.
3. Low-complexity high-speed RS decoders for short-distance network
For short-distance optical transmission, such as 10GBase-LR, since the noise rendered from transmission distance is quite limited, the requirement of coding gain is not as strict as long-distance backbone network (which will be discussed later). Therefore, as discussed in Section 1, Reed-Solomon (255, 239) code is widely used in this kind of network due to its high code rate and good error correction capability.
Although coding gain is not the major concern in this scenario, because of limited hardware resource, in order to implement efficient RS decoder, the designers have to consider the challenge of achieving high data rate with low hardware complexity. Accordingly, optimization of RS decoding architectures is necessary for high-efficiency hardware implementation.
In this section, based on the two main RS decoding algorithms, the improved ME-based and BM-based decoders are introduced.
3.1. rDCME-based RS decoder
In traditional ME algorithm, the inherent degree computation and systolic architecture renders large consumption of area and power (see  and ), which is not suitable for the discussed application. To reduce the unnecessary degree computation, DCME algorithm was introduced in . By generating internal switch and shift signals, the DCME algorithm can achieve the same function as ME algorithm without degree computation.
It is needed to point out that the initialization of DCME and ME is different due to the consideration of the design of the following introduced FSM.
Fig. 4 shows the FSM for generating control signals. In each iteration, there are two possible states: S0 and S1. S0 represents the case when both of
When the leading coefficients of
It should be pointed out that after every polynomial computation, if being carried out, the original leading coefficient of
If the previous state and the current state are both S0, it indicates that the polynomial computation is able to be executed in the two successive iterations. So
If the previous state is S0 and the current state is S1, KES block would process shift operation to eliminate leading zero (
If the previous state and the current state are both S1, it indicates that the two successive iterations are both in shifting operations. Similar with the above condition,
If the previous state is S1 and the current state is S0, the polynomial computation would be executed (
Fig. 5 shows the detailed architecture of rDCME algorithm. The KES block is designed with single PE. It is commonly known that recursive architecture usually can not be pipelined due to data dependency. And recursive architecture is always a bottleneck for high-speed. However, in the rDCME architecture, these disadvantages can be avoided. A 11-stages (2
Table 1 presents performance comparisons between the rDCME RS decoder and other existing RS decoders. It can be observed that the rDCME decoder has very low hardware complexity and high throughput. Compared with the existing ME architectures, the total gate count of the rDCME architecture is reduced by at least 30.4%. Therefore, the hardware efficiency is improved at least 1.84 times, which means under the same technology condition our design would be much more area-efficient compared with other existing RS decoder designs for multi-Gb/s optical communication systems.
3.2. PI-iBM-based RS decoder
Besides ME algorithm, BM algorithm is another main decoding approach for RS codes. An important and inevitable disadvantage of traditional iBM/RiBM algorithms is the high cost of area or iteration time for computing error value polynomial Ω(
In each iteration, scratch polynomial B(
Furthermore, in order to reduce hardware complexity significantly without sacrificing throughput per unit area, pipeline interleaving techniques in  is employed in the PI-iBM algorithm and architecture proposed in .
As depicted in the following PI-iBM algorithm, interleaving factor
The PI-iBM architecture consists of two blocks: pipeline interleaving error locator update (PI-ELU) block and pipeline interleaving discrepancy computation (PI-DC) block. As it is illustrated in Fig. 6, PI-ELU block is designed to execute Step3 for updating polynomials. Fig. 6(a) shows the internal architecture of the
In addition, PI-DC block mainly implements the function of updating discrepancy
Table 2 gives the implementation results of PI-iBM decoder and also lists some other designs. From this table we can find that the PI-iBM architecture deliver very high throughput with relatively low hardware complexity: the total throughput rate and throughput per unit area in the PI-iBM design are at least 200% more than those existing works. To achieve data rates from 10 Gb/s to 100 Gb/s, PI-iBM decoder has the lowest hardware complexity. If 65 nm CMOS technology is used in the implementation, the throughput of our design can be increased significantly. Thus the current designs can fit well for 10 Gb/s-40 Gb/s optical communication systems. For 100 Gb/s applications, we may need two to three independent hardware copies of the designs. However, the PI-iBM architecture will remain to have the lowest hardware complexity compared with existing designs. In short, the PI-iBM decoder is very area-efficient for very high-speed optical applications.
RiBM in 
4. High-performance low-complexity RS burst-error decoders for mediate distance network
For mediate distance optical network, such as Metro Ethernet network, traditional RS decoding can not provide enough coding gain for the data transmission in this scenario. Instead, enhanced FEC scheme should be employed for improved error-correcting capability. Notice that in this kind of systems, long burst error is the major error pattern in transmission procedure; therefore, burst-error decoding algorithm and architecture are attractive solutions for this case. In this chapter, we introduce efficient burst-error-correcting RS decoder to meet the requirement in this type of application.
4.1. Reformulated inversionless burst-error correcting (RiBC) algorithm
As excellent Maximum Distance Separable (MDS) code , RS code is very effective in correcting long burst errors. However, previous RS burst decoding algorithms in  and  are infeasible for hardware implementation due to their high computation complexity. In , Wu proposed a new approach to track the position of burst of errors. By introducing a new polynomial that is a special linear function of syndromes, this approach can correct a long burst of errors with length up to 2
Although the approach in  has reduced computation complexity, it still contains inversion operation and long data path, which impedes its efficient VLSI implementation, therefore, the algorithm in  was reformulated to the RiBC algorithm. The RiBC algorithm is a kind of list decoding algorithm. 8 polynomials are updated simultaneously in each iteration. After every 2
The RiBC algorithm is targeted for correcting burst error plus some random errors. By observing step2.3 and step 2.4, it can be founded that both of them are quite similar to the essential update equations in RiBM algorithm (see ). Therefore, it inspires us that both of the RiBC algorithm for burst-error correction and RiBM for random error correction can be implemented one the same hardware. Furthermore, considering single burst error correcting algorithm in  is a specific instance of RiBC algorithm with
4.2 Unified hybrid decoding (UHD) architecture
The overall UHD architecture is shown in Fig. 9. Here different blocks are used to process different steps in algorithm. Since excluding KES and PT blocks, other blocks are quite straightforward to be implemented; in this section we only introduce the architectures of KES and PT blocks and focus the discussion for the case of RiBC work mode. Interested readers can refer to  for the introduction of other blocks and other modes.
4.2.1. KES block architecture
For RiBC algorithm, KES block is employed to carry out steps 2.4. Fig. 10 presents the overall architecture of KES block and the internal structure of its two types of processing elements (PE): PE0 and PE1. As shown in Fig. 10(a), the KES block consists of 2
4.2.2. Position track (PT) block architecture
PT block is used to track the longest consecutive polynomials that are identical (step 3). Fig. 11 illustrates the architecture of PT block. The input, andfrom KES block at the
Table 3 presents the comparison between UHD and RiBM decoder. Here for the example RS (255, 239) code,
|4846||Unavailable to decode|
(the worst case)
|Unavailable to decode|
5. Ultra high performance FEC schemes for long-haul network
For long-haul optical transport networks (OTN), because the performance loss mainly results from long distance transmission, the requirement on coding gain is very strict. This requirement even gets more and more strict when optical backbone networks enter 100Gbps era. Based on OIF whitepaper, the new FEC schemes applied in 100G long-haul systems should achieve waterfall performance at very low BER region. Meanwhile the other requirements for OTN FEC such as capable of achieving very high speed and having very low error floor still remain. Therefore, 100Gbps era puts more challenges on FEC schemes. In this section, some recent FEC schemes targeted to 100Gbps applications are introduced.
5.1. Hard-decision BCH product codes
One candidate for 100Gbps application is BCH-based binary product codes such as the one presented in . The component BCH (992, 960) and (987, 956) codes are constructed carefully over
5.2. Some other soft-decision based concatenated codes
The above binary product scheme is based on hard-decision. If soft information is available in the system, soft-decision decoding approach can work with the product codes to enhance the overall decoding performance. Fig. 15 illustrates a LDPC code concatenated with BCH-based product code for long-haul network systems in . In this scheme, LDPC code is used as inner code and BCH-based product code is used as outer code. Some other soft-decision based concatenated FEC scheme such as RS code concatenated with LDPC coding system in  can also provide significant coding gain for targeted ultra high-speed optical communication.
With the evolution of optical network, the employed FEC scheme has been developed in several generations. The requirement on high data rata and large coding gain is always challenging the design for efficient FEC decoder. In this chapter, targeted to different types of optical transmission networks, ranging from local Ethernet to long-haul backbone system, different FEC solutions with efficient VLSI implementations are discussed. For short-distance networks, two kinds of area-efficient high-speed RS decoders are analyzed for the scenario. For mediate distance networks, which require some tradeoff between decoding performance and hardware efficiency, the introduced RS burst-error decoder can be employed to meet such requirement. For long-haul systems, which have stringent requirement on decoding performance, some candidate FEC schemes targeted to the future 100Gbps era are discussed. In summary, these various FEC architectures and schemes are good candidates for their specific targeted optical transmission applications.