Open access peer-reviewed chapter

A New Approximation Method for Constant Weight Coding and Its Hardware Implementation

By Jingwei Hu and Ray C.C. Cheung

Submitted: November 7th 2017. Reviewed: February 7th 2018. Published: October 31st 2018

DOI: 10.5772/intechopen.75041

Abstract

In this chapter, a more memory-efficient method for encoding binary information into words of prescribed length and weight is presented. Existing solutions involve either complex floating-point arithmetic or extra memory overhead, which makes them demanding for resource-constrained computing platforms. The solution we propose here avoids these problems yet achieves better coding efficiency. We also correct a crucial error in previous implementations of code-based cryptography by exploiting and tweaking the proposed encoder. At present, the design presented in this work is the most compact one for any code-based encryption scheme. We show, for instance, that our lightweight implementation of a Niederreiter encrypting unit can encrypt approximately 1 million plaintexts per second on a Xilinx Virtex-6 FPGA, requiring 183 slices and 18 memory blocks.

Keywords

  • code-based cryptography
  • McEliece/Niederreiter cryptosystem
  • constant weight coding
  • FPGA implementation

1. Introduction

Most modern public-key cryptographic systems rely on either the integer factorization or the discrete logarithm problem, both of which are expected to be solvable on large-scale quantum computers using Shor’s algorithm [1]. Recent breakthroughs in quantum computing have demonstrated progress toward solving these hard mathematical problems [2, 3]. The cryptographic research community has recognized the urgency of the vulnerabilities rooted in these cryptosystems and has, in recent years, begun to base security on alternative hard problems, such as those underlying multivariate-quadratic, lattice-based, and code-based cryptosystems [4]. In this chapter, we address the problem of encoding information into binary words of predefined length and Hamming weight in resource-constrained computing environments, e.g., reconfigurable hardware, embedded microcontroller systems, etc. This is of interest, in particular, for efficient implementations of McEliece’s scheme [5, 6] or Niederreiter’s scheme [7], the most promising candidates for code-based cryptography.

In the case of Niederreiter/McEliece encryption, the binary plaintext stream must be converted into constant weight words. Constant weight means that the binary word contains a fixed number of “1”s. Note that in hybrid Niederreiter encryption systems (KEM/DEM encryption) [8, 9, 10], KEMs are designed to exchange symmetric keys securely, and DEMs use these symmetric keys for transmitting long messages. This class of encryption techniques does not involve constant weight coding. Nevertheless, if we want to construct a complete code-based cryptography suite including standard public-key encryption, digital signatures, and hybrid encryption, constant weight coding must be efficiently implemented, as it is required by both public-key encryption [11, 12] and signatures [13]. The exact solution [14] for constant weight coding needs to compute large binomial coefficients and has quadratic complexity, though it is optimal as a source coding algorithm. In [15], Sendrier proposed the first solution of linear complexity by incorporating Huffman codes. Later, the author of [15] further improved the coding efficiency to very close to 1 by means of Golomb’s run-length encoding [16]. The new proposal is particularly easy to implement in hardware and has linear time complexity. The disadvantage, though, is that the encoding length is variable [17]. He also proposed an approximate version in this paper by restricting the value of d to powers of two. This approach significantly simplifies the encoding procedure and thus improves coding throughput.

Heyse et al. [18, 19] continued this line of research and proposed to adapt the approximation method in [17] to embedded system applications. Their design is implemented on AVR microcontrollers and Xilinx FPGAs. However, we observe that this method is not applicable to large parameter sets for the Niederreiter scheme (see Table 1) [20, 21, 22]. The work in [18, 19] keeps a lookup table of pre-stored data with space complexity O(n) to encode input messages into constant weight words. The memory overhead of this table is still intolerable for small embedded systems, and therefore their design does not scale when n is large. The CFS signature scheme [20] exploits a very large Goppa code, and it requires compressing the lengthy constant weight signatures into a binary stream. MDPC-McEliece/Niederreiter encryption [23] uses a very large n for practical security levels. For instance, n is set as large as 19,712 for 128-bit security and 65,536 for 256-bit security. Baldi et al. proposed a novel LDGM sparse syndrome signature scheme [13] with compact key size, which also requires a large constant weight coding within the signature generation. Their method was successfully attacked in 2016 by a French research group [24]. At present, we do not have a suitably lightweight yet efficient solution for constant weight encoding when realizing it in real-world applications.

| n | t | Security level | Application | Coding system | Public/secret key size (kbits) | Prestored data for CW coding (kbits) |
|---|---|---|---|---|---|---|
| 1024 | 38 | 60 bit | Encryption | Goppa code | 239/15 | 1.44 |
| 2048 | 27 | 80 bit | Encryption | Goppa code | 507/10 | 8.38 |
| 2690 | 57 | 128 bit | Encryption | Goppa code | 913/182.4 | 10.5 |
| 6624 | 117 | 256 bit | Encryption | Goppa code | 7488/2268.5 | 25.9 |
| 65536 | 9 | 80 bit | Signature | Goppa code | 9000/1019 | 256 |
| 262144 | 9 | 80 bit | Signature | Goppa code | 40,500/4525 | 1024 |
| 1048576 | 8 | 80 bit | Signature | Goppa code | 160,000/20,025 | 4096 |
| 9800 | 18 | 80 bit | Signature | LDGM code | 936/— | 19.1 |
| 24960 | 23 | 120 bit | Signature | LDGM code | 4560/— | 58.4 |
| 46000 | 29 | 160 bit | Signature | LDGM code | 13,480/— | 117.2 |
| 19712 | 134 | 128 bit | Encryption | MDPC code | 9.6/— | 77 |
| 65536 | 264 | 256 bit | Encryption | MDPC code | 32/— | 256 |

Table 1.

Parameters recommended for use in the Niederreiter cryptosystem, referenced partly from [13, 19, 23, 25].

The purpose of this work is to tweak Sendrier’s approximation method [17] so as to make it easy to implement for all secure parameter sets of the Niederreiter cryptosystem proposed in the literature while maintaining efficiency. Our contributions include:

  1. We propose a new approximation method for constant weight coding that is free from complicated floating-point arithmetic and heavy memory footprint. This method permits us to implement a compact yet fast constant weight encoder on resource-constrained computing platforms.

  2. We improve the coding efficiency by fine-tuning the optimal precision for computing the value of d, in comparison with other approximation methods. Experiments show that the performance of our new method is better than Heyse’s approximate version [18] and even comparable to Sendrier’s original proposal [17].

  3. We integrate our design with Niederreiter encryption and obtain a more compact result. We fix a critical security flaw in Heyse et al.’s Niederreiter encryptor [19]. Our secure implementation of the Niederreiter encryptor can encrypt approximately 1 million plaintexts per second on a Xilinx Virtex-6 FPGA, requiring 183 slices and 18 memory blocks.

This chapter is organized as follows. Sendrier’s proposal for constant weight coding and its approximation variant [17, 18] are first revisited in Section 2. After analyzing the downsides of these schemes, we propose a new approximation method and fine-tune it for optimal source coding performance in Section 3. Our detailed implementations of the proposed constant weight encoder/decoder and of the Niederreiter encryption unit on FPGAs are described in Sections 4 and 5. We present our experimental results compared with the state of the art in Section 6. Finally, Section 7 summarizes this chapter.

2. Sendrier’s methods for constant weight coding

Sendrier presented an algorithm for encoding binary information into words of prescribed length and weight [17]. His encoding algorithm returns a t-tuple (δ1, δ2, …, δt) in which the δi are the lengths of the runs of consecutive “0”s. This method is easy to implement and has linear complexity with a small loss of information efficiency. In this work, we unfold the recursive encoding and decoding algorithms originating from [17] and rewrite them as Algorithm 1 and Algorithm 2 [26].

We use the same notations from [17, 26] in the above two algorithms for consistency. For example, read(B, i) moves forward and reads i bits from the stream B, returning the integer whose binary decomposition has been read, most significant bit first; write(B, i) moves forward and writes the binary string of i into the stream B; encodefd(δ, d) returns a binary string, and decodefd(d, B) returns an integer. These two functions are in fact the run-length encoding and decoding methods proposed by Golomb [16, 17]. best_d(n, t) returns an integer such that 1 ≤ best_d(n, t) ≤ n − t, and Sendrier suggested choosing it close to the number defined by Eq. (1). In fact, best_d(n, t) can take any value in this range, though the efficiency is reduced if the value is too far from Eq. (1):

d = (n − (t − 1)/2) · (1 − 2^(−1/t))    (1)
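As a quick numerical check, Eq. (1) is easy to evaluate in software (a small sketch of ours, not code from the chapter; the parameter set n = 2^11, t = 27 from Table 1 is used purely for illustration):

```python
def best_d_exact(n, t):
    # Eq. (1): d = (n - (t - 1)/2) * (1 - 2^(-1/t))
    return (n - (t - 1) / 2) * (1 - 2 ** (-1 / t))

# For n = 2^11, t = 27 this yields d of roughly 51.6.
print(best_d_exact(2 ** 11, 27))
```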

Sendrier also presented an approximation of the best d in which the value of d (given by Eq. (1)) is restricted to a power of two [17]. More precisely, d is first computed via Eq. (1) and then rounded to 2^⌊log2 d⌋. This approximation greatly simplifies the functions encodefd and decodefd and therefore excels in speed, while the loss of coding efficiency is trivial. The simplified versions of encodefd and decodefd after approximation are described as follows [26]:

encodefd(δ, d) = base2(δ, u)    (2)
decodefd(d, B) = read(B, u)    (3)

where base2(δ, u) denotes the u least significant bits of the integer δ written in base 2, and u = log2 d. For the above two equations, the minimum allowed value d = 1 deserves attention. In this case we have u = 0, and we therefore define base2(δ, 0) = null and read(B, 0) = 0 to guarantee that the algorithm applies to all possible u.

Recently, Heyse et al. implemented the Niederreiter encryption scheme on embedded microcontrollers, using a lookup table to compute the value of d for constant weight encoding [18]. Their method is based on the approximation method from [17]. One major contribution of their work is the observation that the last few bits of n can be ignored for constant weight encoding because these bits make little difference to the value of d. They do not keep (n, t) entries but instead n entries; the least significant log2 t bits of n are not considered and are substituted by t. This significantly reduces the size of the lookup table. According to our analysis, the lookup table shrinks to roughly O(n). It works quite well for small parameters of n, for example, n = 2^11 in the applications of Goppa code-based McEliece or Niederreiter encryption schemes. However, we found that it does not work well when we were implementing a Niederreiter signature scheme, the CFS signature. CFS requires an extremely large value of n, typically n = 2^18 or n = 2^20. On the one hand, the size of the lookup table increases linearly with n, which limits scalability. On the other hand, the coding efficiency drops dramatically and thus lowers the throughput of the constant weight encoder as n increases. All these downsides motivate us to find better ways of computing d. We describe and analyze our new method in the next section.

3. Proposed approximation method of d

3.1. Reduce memory footprint and computational steps

The computation of the value of d is the most crucial step of constant weight encoding and decoding, as suggested by Eq. (1), which involves floating-point arithmetic. However, many embedded/hardware systems do not have dedicated floating-point units for such computations. [19] proposed replacing floating-point units by a lookup table of predefined data on reconfigurable hardware. The problem with their method is that, for some large (n, t), the lookup table can be sizeable. For example, n = 2^16, t = 9 requires a lookup table of 256 kb, which is clearly not a negligible memory overhead for embedded systems.

To solve this problem, we propose to eliminate this lookup table by computing d directly using fixed-point arithmetic. We separate the computation of d into two parts. In the first part, θ(t) = 1 − (1/2)^(1/t) is precomputed and stored in fixed-point format. In the second part, (n − (t − 1)/2) · θ(t) is efficiently calculated by a fixed-point multiplier. In this fashion we notably shrink the size of the lookup table from O(n) to O(t) [26].

Furthermore, we substitute n for n − (t − 1)/2 due to the following observations [26]:

  • t ≪ n, such that n − (t − 1)/2 ≈ n.

  • Eventually d must be rounded to an integer, and hence the difference between (n − (t − 1)/2) · θ(t) and n · θ(t) can very likely be ignored.

This substitution removes the computational step for n − (t − 1)/2, and hence a faster and simpler realization of constant weight coding using a single integer multiplication is achievable.

In summary, our new proposal of the approximation of d is as follows [26]:

d = n · θ(t)    (4)

where θ(t) = 1 − (1/2)^(1/t) is a function of t and is precomputed. Our new approximation of d is lightweight, requiring only one multiplication. In the following, we demonstrate that this method also permits reasonably high coding efficiency as a source coding algorithm.
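A software sketch of this integer-only computation of Eq. (4) may help (our illustration, not the chapter’s code; θ(t) is pre-truncated to i fractional bits, and n = 2^18, t = 9 with i = 4 is used as an example parameter set from Section 3.3):

```python
def theta_fixed(t, i):
    """theta(t) = 1 - (1/2)^(1/t), truncated to i fractional bits (an integer)."""
    return int((1 - 2 ** (-1 / t)) * 2 ** i)

def d_approx(n, t, i):
    """Eq. (4), d = n * theta(t): one integer multiply, a shift, and a
    rounding of the integer part to a power of two d = 2^u."""
    v = (n * theta_fixed(t, i)) >> i          # integer part of n * theta(t)
    u = (v - 1).bit_length() if v > 1 else 0  # 2^(u-1) < v <= 2^u
    return 2 ** u, u

print(d_approx(2 ** 18, 9, 4))  # (16384, 14)
```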

3.2. Representing θ(t) in fixed-point format

As mentioned above, θ(t) is in fact a vector of fractional numbers and should be stored in fixed-point format. Note that the integer part of θ(t) is always 0, and therefore we only need to keep its fractional part. Hereafter, we denote our fixed-point format as fixed_0_i, where i indicates the number of bits used for storing the fractional part of θ(t).

In practice, we wish to use fixed_0_i with the smallest i that maintains the desired coding efficiency [26]. A smaller i means less data storage and a simpler multiplication with smaller operand size, which is particularly desirable for resource-constrained computing platforms. Indeed, one of the key issues of this chapter is to determine the best fixed_0_i for θ(t). In the next section, we describe our experiments on exploring the optimal fixed_0_i.

3.3. Find the optimal precision for constant weight encoding

The purpose of the precision tuning is to find the lowest precision that still maintains a relatively high coding efficiency. A lower precision means we can use a multiplier of smaller operand size, leading to better path delay and slice utilization. A higher coding efficiency means one can encode more bits from the source into a constant weight word. This is of interest, in particular, when someone encrypts a relatively large file using code-based crypto: it takes much less time for encryption if the coding efficiency is close to 1 [26].

To find the optimal precision of fixed_0_i, we studied the relationship between distinct i’s and their coding performance. In our experiments, all precisions from fixed_0_32 down to fixed_0_1 were investigated and compared with Sendrier’s methods [17] and Heyse’s approximate version [18].

We measured the coding efficiency by calculating the average coding length for a successful encoding, because a longer coding length indicates a better coding efficiency. Since constant weight coding is variable-length coding, we must consider how different plaintext inputs affect the performance in order to determine which approximation is best. To be thorough, different types of binary plaintexts, classified by the proportion of “1”s they contain, should be tested to evaluate the real performance of the different approximation methods. In our instances, we measure three particular types to simplify the model: “0”-dominated texts (p = 0.1, i.e., “1” occurs with probability 0.1 in the plaintext), balanced texts (p = 0.5), and “1”-dominated texts (p = 0.9) (Figure 1).

Figure 1.

The performance of different methods for choosing the optimal d. We list the five most frequently used sets of (n, t) for the Niederreiter cryptosystem. We ran three experiments for each (n, t), in which the input binary message contains “1” with probability p = 0.1, 0.5, and 0.9, respectively. The results of each experiment are obtained by running 10,000 different messages. The X-axis lists different methods, including Sendrier’s primitive [17], Sendrier’s approximation [17], Heyse’s approximation [19], and our n*fixed_0_16 — n*fixed_0_2. The Y-axis represents the average length (bits) of the input message read for a successful constant weight encoding.

Figure 1 describes the coding performance when we adjust the precision of θ(t). Taken as a whole, the p = 0.1 group and the p = 0.5 group show a similar trend in the average message length encoded as the arithmetic precision decreases: the message length drops slightly and consistently from n*fixed_0_16 to n*fixed_0_2. In contrast, the p = 0.9 group behaves quite differently: the number of bits read for a single constant weight coding first stays stable and then drops as the approximation precision decreases. The numbers first stay stable because the loss of precision in θ(t) is comparatively trivial, but if the precision drops too low (for instance, in the fixed_0_2 representation θ(t) = 0 for 2 ≤ t ≤ 38 and hence d = 0), n stays constant and the value of i read in each step of Algorithm 1 is small, forcing us to read more bits of the input stream before the algorithm halts. According to the evaluation criteria mentioned in the last paragraph, we compute the average length over the three types of plaintexts and identify the best approximation of d from our proposal after analyzing the statistics obtained. On the one hand, the n*fixed_0_5 group performs best at n = 2^10, t = 38 and n = 2^11, t = 27. On the other hand, the n*fixed_0_4 group beats the others at n = 2^16, t = 9; n = 2^18, t = 9; and n = 2^20, t = 8.

Figure 2.

The best_d module for the CW encoder and decoder. We list here the detailed configuration for n = 2^11, t = 27 for demonstration purposes.

Table 2 compares our proposed method with Sendrier’s original method [17], Sendrier’s power-of-two approximation [17], and Heyse’s table-lookup approximation [18]. From this table, it can be seen that our proposal achieves better coding efficiency than the original approximation and Heyse’s approximation on all five parameter sets used for the Niederreiter scheme. Note that the average number of bits we have to read before producing a constant weight word is upper bounded by log2 C(n, t), and thus the ratio of the average number of bits read to this upper bound measures the coding efficiency [17]. Additionally, our proposal even outperforms Sendrier’s original method on two of these parameter sets, n = 2^10, t = 38 and n = 2^11, t = 27, with improvements of 5.05 and 0.95%, respectively. It is also worth mentioning that for n = 2^16, t = 9; n = 2^18, t = 9; and n = 2^20, t = 8, the performance of our proposal falls slightly behind Sendrier’s method, with losses of 2.37, 2.08, and 2.22%; it nonetheless outperforms Sendrier’s approximation and Heyse’s approximation. In particular, the performance of Heyse’s approximation becomes unfavorable, with a 23.56% loss at n = 2^20, t = 8; we are pushing the limits of Heyse’s method here, as the lower bits of n are non-negligible and cannot be removed for such large n.

| n, t | Method | Bits read (max) | Bits read (min) | Bits read (avg) | Coding efficiency | Efficiency improved |
|---|---|---|---|---|---|---|
| 2^10, 38 | Sendrier’s [17] | 236 | 164 | 214.70 | 93.19% | — |
| | Sendrier’s approx. [17] | 263 | 124 | 210.45 | 91.32% | −1.98% |
| | Heyse’s approx. [19] | 261 | 143 | 210.73 | 91.44% | −1.85% |
| | n*fixed_0_5 | 311 | 183 | 225.54 | 97.89% | +5.05% |
| 2^11, 27 | Sendrier’s [17] | 208 | 160 | 194.46 | 95.51% | — |
| | Sendrier’s approx. [17] | 227 | 119 | 187.55 | 92.11% | −3.56% |
| | Heyse’s approx. [19] | 224 | 127 | 190.96 | 93.78% | −1.80% |
| | n*fixed_0_5 | 361 | 132 | 196.30 | 96.41% | +0.95% |
| 2^16, 9 | Sendrier’s [17] | 133 | 113 | 124.87 | 99.50% | — |
| | Sendrier’s approx. [17] | 133 | 77 | 117.80 | 93.84% | −5.10% |
| | Heyse’s approx. [19] | 133 | 86 | 116.34 | 92.68% | −6.83% |
| | n*fixed_0_4 | 135 | 95 | 121.98 | 97.20% | −2.37% |
| 2^18, 9 | Sendrier’s [17] | 148 | 132 | 142.61 | 99.38% | — |
| | Sendrier’s approx. [17] | 151 | 91 | 135.17 | 94.18% | −5.22% |
| | Heyse’s approx. [19] | 300 | 101 | 133.21 | 92.81% | −6.60% |
| | n*fixed_0_4 | 154 | 112 | 139.64 | 97.31% | −2.08% |
| 2^20, 8 | Sendrier’s [17] | 149 | 135 | 144.00 | 99.52% | — |
| | Sendrier’s approx. [17] | 151 | 99 | 136.94 | 94.64% | −4.90% |
| | Heyse’s approx. [19] | 265 | 17 | 110.07 | 76.07% | −23.56% |
| | n*fixed_0_4 | 158 | 109 | 140.81 | 97.31% | −2.22% |

Table 2.

The coding performance of the optimal d chosen from our approximation method.
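The coding-efficiency column of Table 2 can be reproduced from the average bit counts, since efficiency is the ratio of the average number of bits read to log2 C(n, t) (a quick software check of ours; the average of 196.30 bits for n = 2^11, t = 27 is taken from the table):

```python
import math

def log2_binom(n, t):
    """log2 of the binomial coefficient C(n, t), computed via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(t + 1)
            - math.lgamma(n - t + 1)) / math.log(2)

# Average of 196.30 bits read for n = 2^11, t = 27 (Table 2):
eff = 196.30 / log2_binom(2 ** 11, 27)
print(eff)  # roughly 0.964, i.e., the 96.41% efficiency reported in Table 2
```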

4. Proposed constant weight encoder and decoder

4.1. best_d module

The best_d module is the most critical arithmetic unit; it computes the best value of d from the inputs n and t. Our computation of best_d consists of three stages, which perform the following tasks in sequence:

  1. Compute θ(t) via a priority encoder. As discussed in Section 3, the format fixed_0_5 is chosen to represent θ(t) for n = 2^10, t = 38 and n = 2^11, t = 27, and fixed_0_4 is used for n = 2^16, t = 9; n = 2^18, t = 9; and n = 2^20, t = 8. We notice via analysis that for some t, the values of θ(t) are identical. For instance, θ(6) = θ(7) = (0.00011)₂ = 0.09375 in fixed_0_5 format, as shown in Table 3. This observation inspires us to exploit a priority encoder to streamline the encoding of θ(t).

  2. Compute n · θ(t) via a fixed-point multiplier. Xilinx LogiCORE IP is configured to implement high-performance, optimized multipliers for the different pairs of n and t. The fractional part of the multiplication result is truncated, but its integer part is preserved for the next stage to process.

  3. Output the values of d and u. Recall that the value of n · θ(t) must be rounded to d = 2^u. Another priority encoder is utilized to decode the integer part of n · θ(t). The detailed decoding process is structured as the lookup-table mapping illustrated in Table 4.
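The three stages can be modeled behaviorally in software (our sketch; stage 1 is modeled as plain arithmetic truncation rather than the actual priority-encoder ranges, and stage 3 mirrors the mapping of Table 4):

```python
def stage1_theta(t, frac_bits=4):
    # Stage 1: theta(t) from the priority encoder, modeled here as
    # truncation of 1 - 2^(-1/t) to frac_bits fractional bits
    return int((1 - 2 ** (-1 / t)) * 2 ** frac_bits)

def stage2_multiply(n, theta_fp, frac_bits=4):
    # Stage 2: fixed-point multiply; keep only the integer part of n * theta(t)
    return (n * theta_fp) >> frac_bits

def stage3_priority_decode(v):
    # Stage 3: Table 4 mapping 2^(u-1) < v <= 2^u  ->  d = 2^u
    u = (v - 1).bit_length() if v > 1 else 0
    return 1 << u, u

v = stage2_multiply(2 ** 16, stage1_theta(9))
print(stage3_priority_decode(v))  # (4096, 12): d = 2^12 for n = 2^16, t = 9
```

Stage 3 amounts to locating the most significant set bit of v, which is exactly what a priority encoder does in hardware.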

| Value of t | θ(t) = 0.θ1θ2θ3θ4θ5* | θ(t) = 0.θ1θ2θ3θ4† |
|---|---|---|
| 22 ≤ t ≤ 38 | 00000 | N/A |
| 11 ≤ t ≤ 21 | 00001 | — |
| 8 ≤ t ≤ 10 | 00010 | 0001 |
| 6 ≤ t ≤ 7 | 00011 | — |
| t = 5 | 00100 | 0010 |
| t = 4 | 00101 | — |
| t = 3 | 00110 | 0011 |
| t = 2 | 01001 | 0100 |
| t = 1 | 10000 | 1000 |

Table 3.

Encoding of θ(t).

*This θ(t) is represented in fixed_0_5 form, i.e., θ(t) = Σ(i = 1 to 5) θi · 2^(−i). This format is used for n = 2^10, t = 38 and n = 2^11, t = 27.


†This θ(t) is represented in fixed_0_4 form, i.e., θ(t) = Σ(i = 1 to 4) θi · 2^(−i). This format is used for n = 2^16, t = 9; n = 2^18, t = 9; and n = 2^20, t = 8.


| Integer part v of n · θ(t) | Value of d | Value of u |
|---|---|---|
| v > 2^18 | 2^19 | 19 |
| 2^17 < v ≤ 2^18 | 2^18 | 18 |
| 2^16 < v ≤ 2^17 | 2^17 | 17 |
| 2^15 < v ≤ 2^16 | 2^16 | 16 |
| 2^14 < v ≤ 2^15 | 2^15 | 15 |
| 2^13 < v ≤ 2^14 | 2^14 | 14 |
| 2^12 < v ≤ 2^13 | 2^13 | 13 |
| 2^11 < v ≤ 2^12 | 2^12 | 12 |
| 2^10 < v ≤ 2^11 | 2^11 | 11 |
| 2^9 < v ≤ 2^10 | 2^10 | 10 |
| 2^8 < v ≤ 2^9 | 2^9 | 9 |
| 2^7 < v ≤ 2^8 | 2^8 | 8 |
| 2^6 < v ≤ 2^7 | 2^7 | 7 |
| 2^5 < v ≤ 2^6 | 2^6 | 6 |
| 2^4 < v ≤ 2^5 | 2^5 | 5 |
| 2^3 < v ≤ 2^4 | 2^4 | 4 |
| 2^2 < v ≤ 2^3 | 2^3 | 3 |
| 2 < v ≤ 2^2 | 2^2 | 2 |
| 1 < v ≤ 2 | 2 | 1 |
| v ≤ 1 | 1 | 0 |

Table 4.

Decoding of n · θ(t).

Figure 2 depicts our best_d unit. The unit works as a three-stage pipeline. It first computes θ(t) and then obtains n · θ(t) using a multiplier. Finally, the value of d is determined by a priority decoder.

4.2. Bin2CW encoder

Figure 3 gives an overview of the architecture of the proposed constant weight encoder. The input binary message is passed into the encoder by means of a nonsymmetric 8-to-1 FIFO read, which exactly imitates the function of read(B, 1). A serial-in-parallel-out shift register is instantiated to perform read(B, u), 0 ≤ u ≤ log2(n) − 2. The proposed best_d module is exploited here to calculate the value of d. The values of n, t, and δ are refreshed accordingly using three separate registers.

Figure 3.

General architecture of CW encoder.

4.3. CW2Bin decoder

Figure 4 shows the architecture of the proposed constant weight decoder. A symmetric m-to-m bit FIFO is used to read the input t-tuple word by word. This logic is in fact the bottleneck of the constant weight decoder compared with the encoder. Three registers are utilized to update the values of n, t, and δ, as in the Bin2CW encoder. The major difference is that the shift register here outputs the value of δ bit by bit, as step 9 of Algorithm 2 demands.

Figure 4.

General architecture of CW decoder.

5. Integrating with the Niederreiter encryptor

In this section, we demonstrate that the proposed Bin2CW encoder can be integrated into the Niederreiter encryptor for data encryption, as shown in Algorithm 3.

Algorithm 3. Niederreiter Message Encryption, referenced from [25]

  Input: message vector m, public key pk = {Ĥ, t}, where Ĥ is an mt × n matrix
  Output: ciphertext c
1 Bob encodes the message m as a binary vector of length n and weight at most t.
2 Bob computes the ciphertext as c = Ĥ·mᵀ, where mᵀ is the transpose of the vector m.
3 return c.

The Bin2CW encoder performs the first step of Algorithm 3. Note that the Bin2CW encoder returns a t-tuple of integers (δ1, …, δt), which represents the distances between consecutive “1”s in the string. Nevertheless, such a t-tuple cannot be used directly to compute the ciphertext. We believe that the way Heyse et al. [19] encrypt c = Ĥ·mᵀ with m = (δ1, …, δt) is incorrect for two reasons [26]:

  1. It is very likely that δi = δj for some i ≠ j, in which case the number of errors is less than t, which is assumed to be insecure against cryptanalysis.

  2. Each δi ranges from 0 to n − t, whereas the coordinates of a constant weight word range up to n. In other words, the last t rows of the public key Ĥᵀ are never used.

To correct this weakness in [19], we propose to generate the “real” constant weight binary words of length n and Hamming weight t. Assume the constant weight word is represented by (i1, …, it), the coordinates of the “1”s in ascending order; then i1 = δ1, i2 = δ2 + i1 + 1, …, and it = δt + i(t−1) + 1 are computed as the input of the second step of Algorithm 3.
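The index recovery above and the row-XOR view of step 2 can be sketched together in software (our toy illustration; `rows` is a made-up 8-row stand-in for Ĥᵀ with 4-bit rows, not real key data, and indices are 0-based):

```python
def deltas_to_positions(deltas):
    """i1 = d1, ik = dk + i(k-1) + 1: ascending coordinates of the t ones."""
    positions, prev = [], -1
    for dk in deltas:
        prev = dk + prev + 1
        positions.append(prev)
    return positions

def niederreiter_encrypt(deltas, rows):
    """Ciphertext as the XOR of the rows of H^T selected by the message."""
    c = 0
    for pos in deltas_to_positions(deltas):
        c ^= rows[pos]              # accumulate one selected row per index
    return c

# Toy example: n = 8, rows of H^T given as 4-bit integers.
rows = [0b0001, 0b0010, 0b0100, 0b1000, 0b0011, 0b0110, 0b1100, 0b1001]
print(bin(niederreiter_encrypt([0, 1, 2], rows)))  # 0b11
```

The running XOR mirrors how the hardware accumulates a selected row as soon as each index ik becomes valid.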

Figure 5 illustrates our revision of the Niederreiter encryption unit on the basis of [19]. The public key Ĥᵀ is stored in an internal BRAM and row-wise addressed by an 11-bit register. Two 11-bit integer adders are instantiated to transform (δ1, …, δt) into (i1, …, it), which are eventually stored in the 11-bit register. The vector-matrix multiplication in step 2 of Algorithm 3 is equivalent to an XOR of the selected rows of Ĥᵀ, which can be implemented as the GF(2^297) adder in this figure. It is also worth noting that the vector-matrix multiplication works concurrently with the CW encoding: whenever a valid ik has been computed, it is transferred immediately to the GF(2^297) adder, which sums up the selected row. Once the last it has been computed, the last indexed row of Ĥᵀ has also been accumulated into the sum. This sum, stored in the 297-bit register, can now be interpreted as the ciphertext.

Figure 5.

Block diagram of the Niederreiter encryptor.

Our final remark concerns side channel attacks on code-based crypto using constant weight encoding (CWE). Admittedly, we cannot give a satisfying answer at this time; we believe this remains an open problem. For now, if users decide not to take the risk of timing attacks, we suggest forcing the CWE to be constant time: we can always run for the maximum time (which occurs for an all-zero input), regardless of the input. The price, however, is a significant drop in timing performance.

We give here our analysis of timing attacks. Attackers can compromise the CWE if and only if they can analyze the timing differences among different inputs and use this information to recover the entire message. The timing characteristics of reading “1” and “0” differ: reading “1” consumes only 1 clock cycle, whereas reading “0” continues to read log2 d more bits from B, consuming 1 + log2 d cycles. These behaviors appear at first sight to be vulnerable to timing attacks: for different inputs, the execution time is slightly different, and the distinction between reading “0” and reading “1” is also significant. However, statistical approaches such as those introduced against RSA cryptosystems [27, 28] do not seem helpful here. To our way of thinking, the situation we encounter is much more difficult. (1) We need to recover the particular message under attack, but the timing differences among the different messages we collect do not leak any useful information about the targeted message. In RSA, by contrast, one can use a large number of messages and compare timings to recover the secret key in a bit-by-bit fashion. (2) Note that d changes in each iteration according to the current state of n and t. That is to say, when reading the stream, the timing is variable, sensitive to how the “1”s and “0”s are permuted in the message, and thus difficult to predict. Most importantly, even if the CWE is somehow compromised, it does not reveal any information about secret keys. In the case of decryption, the ciphertext is first decrypted by an error-correcting decoder (typically a Goppa-code or MDPC-code decoder), which holds the secret key. The result after error correction is a vector in GF(2)^n, and this vector is then encoded by the CWE to recover the plaintext. We can see the key point here: timing attacks should be mounted on error-correcting decoders rather than on constant weight decoders to retrieve the secret keys.
Perhaps a better strategy is to mount timing attacks on the CWE to recover the plaintext directly. This raises one more question: how do we distinguish or measure the particular timing of the CWE within the total execution time, given that error-correcting decoders also take nonconstant time? This is indeed a very interesting topic, which we will investigate in future work.

6. Results and comparisons

We captured our constant weight coding architecture in Verilog and prototyped our design on a Xilinx Virtex-6 FPGA (Table 5). The choice of Virtex-6 is principally a matter of convention: recent FPGA implementations of code-based cryptography adopt Virtex-6 or even lower-end devices [19, 29, 30, 31, 32]. The benefit of using Virtex-6, from our standpoint, is that we can fairly compare our design with others, given that most of them are also implemented on Virtex-6. To the best of our knowledge, the only compact implementations of constant weight coding were proposed by Heyse et al. [19]. Their lightweight architecture is generally identical to ours except for the design of the best_d module. Their best_d module works in two pipeline stages: in the first stage, it retrieves the value of u by table lookup; in the second stage, it outputs d according to the value of u using a simple decoder. Our best_d module, by comparison, has three pipeline stages and thus a lower throughput, but our architectures are smaller and improve the area-time tradeoff over the constant weight coding implementations of Heyse et al. [19], as shown in Table 5. In particular, we use only one 18 kb memory block for all parameter sets in our experiments.

| Design | Algorithm | Platform | Area [slices] | Memory blocks [36 + 18 kb RAM] | Frequency [MHz] | Throughput [Mbps] | Coding efficiency |
|---|---|---|---|---|---|---|---|
| (a) n = 2^10, t = 38 | | | | | | | |
| This work | Bin2CW, new approx. | Xilinx xc6vlx240t | 74 | 0 + 1 | 330 | 160.2 | 97.89% |
| This work | CW2Bin, new approx. | Xilinx xc6vlx240t | 79 | 0 + 1 | 330 | 148.1 | |
| Heyse et al. [19] | Bin2CW, Heyse’s approx. | Xilinx xc6vlx240t | 110 | 0 + 1 | 310 | 178.8 | 91.44% |
| Heyse et al. [19] | CW2Bin, Heyse’s approx. | Xilinx xc6vlx240t | 88 | 0 + 1 | 310 | 162.1 | |
| (b) n = 2^11, t = 27 | | | | | | | |
| This work | Bin2CW, new approx. | Xilinx xc6vlx240t | 91 | 0 + 1 | 350 | 187.2 | 96.41% |
| This work | CW2Bin, new approx. | Xilinx xc6vlx240t | 95 | 0 + 1 | 340 | 168.6 | |
| Heyse et al. [19] | Bin2CW, Heyse’s approx. | Xilinx xc6vlx240t | 118 | 0 + 1 | 340 | 208.4 | 93.78% |
| Heyse et al. [19] | CW2Bin, Heyse’s approx. | Xilinx xc6vlx240t | 110 | 0 + 1 | 340 | 164.6 | |
| Sendrier [17]* | Bin2CW, original | Intel Pentium 4 | — | — | 2400 | 17.3 | 95.51% |
| Sendrier [17]* | Bin2CW, approximate | Intel Pentium 4 | — | — | 2400 | 33.0 | 92.11% |
| (c) n = 2^16, t = 9 | | | | | | | |
| This work | Bin2CW, new approx. | Xilinx xc6vlx240t | 103 | 0 + 1 | 440 | 316.1 | 97.20% |
| This work | CW2Bin, new approx. | Xilinx xc6vlx240t | 109 | 0 + 1 | 310 | 212.9 | |
| Heyse et al. [19] | Bin2CW, Heyse’s approx. | Xilinx xc6vlx240t | 90 | 10 + 3 | 240 | 182.3 | 92.68% |
| Heyse et al. [19] | CW2Bin, Heyse’s approx. | Xilinx xc6vlx240t | 90 | 10 + 3 | 230 | 170.3 | |
| Sendrier [17] | Bin2CW, original | Intel Pentium 4 | — | — | 2400 | 18.3 | 99.50% |
| Sendrier [17] | Bin2CW, approximate | Intel Pentium 4 | — | — | 2400 | 22.0 | 93.84% |
| (d) n = 2^18, t = 9 | | | | | | | |
| This work | Bin2CW, new approx. | Xilinx xc6vlx240t | 138 | 0 + 1 | 410 | 295.5 | 97.31% |
| This work | CW2Bin, new approx. | Xilinx xc6vlx240t | 118 | 0 + 1 | 320 | 219.5 | |
| Heyse et al. [19] | Bin2CW, Heyse’s approx. | Xilinx xc6vlx240t | 94 | 40 + 3 | 170 | 106.0 | 92.81% |
| Heyse et al. [19] | CW2Bin, Heyse’s approx. | Xilinx xc6vlx240t | 93 | 40 + 3 | 180 | 104.9 | |
| (e) n = 2^20, t = 8 | | | | | | | |
| This work | Bin2CW, new approx. | Xilinx xc6vlx240t | 156 | 0 + 1 | 370 | 284.9 | 97.31% |
| This work | CW2Bin, new approx. | Xilinx xc6vlx240t | 122 | 0 + 1 | 300 | 222.7 | |
| Heyse et al. [19] | Bin2CW, Heyse’s approx. | Xilinx xc6vlx240t | 124 | 160 + 3 | 130 | 96.6 | 76.07% |
| Heyse et al. [19] | CW2Bin, Heyse’s approx. | Xilinx xc6vlx240t | 125 | 160 + 3 | 130 | 98.8 | |

Table 5.

Compact implementations of CW encoder and decoder on Xilinx Virtex-6 FPGA.

*Sendrier implemented a different but very close parameter set, n = 2^11, t = 30; we include it here for reference.


We also observe that in our designs the memory footprint does not increase as the parameters grow, and the high clock frequency is likewise maintained. This is because the main difference among encoders or decoders with distinct parameters n and t is the data width of the multiplier embedded in the best_d module, which increases only logarithmically, from 10 bit × 5 bit to 20 bit × 4 bit. On the other hand, the memory overhead of Heyse's implementations grows linearly with n and might introduce problems when n is large, as mentioned before. To verify this argument, we re-implemented Heyse's work for n = 2^16, t = 9; n = 2^18, t = 9; and n = 2^20, t = 8, and the experimental results validate this point. Another negative side effect of the heavy memory overhead is that the working frequency of the circuits drops rapidly, as shown in Table 5. For the small parameter sets (a) and (b), the lookup table in Heyse's design can be built from distributed memory (LUTs) and therefore has little impact on frequency. For the large parameter sets (c), (d), and (e), however, the lookup table can no longer be instantiated in LUTs because the Xilinx Virtex-6 distributed memory generator only allows a maximum data depth of 65,536. We instead use the block memory resources of the FPGA to construct the table, which hinders speed performance due to the relatively long and complicated routing. This block memory usage is the real bottleneck of Heyse's design as n grows.
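The scaling argument can be made concrete with a back-of-the-envelope sketch; the 5-bit table entry width here is an illustrative assumption of ours, not a measured value:

```python
# Rough scaling sketch (entry width is an assumed illustrative value): a
# best_d lookup table indexed by the remaining length needs O(n) bits of
# storage, while the multiplier operand only needs O(log2 n) bits.
import math

def table_bits(n, entry_bits=5):
    # one entry per possible length value -> linear growth in n
    return n * entry_bits

def multiplier_width(n):
    # first operand width of the small multiplier, cf. the 10 bit to
    # 20 bit range quoted in the text
    return math.ceil(math.log2(n))

for n in (2 ** 10, 2 ** 16, 2 ** 20):
    print(n, table_bits(n), multiplier_width(n))
```

Going from n = 2^10 to n = 2^20 multiplies the table size by 1024 but only doubles the multiplier operand width, which is exactly the tradeoff the measurements above reflect.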

| Aspect (Virtex-6 LX240T) | Niederreiter [12] | This work |
|---|---|---|
| Slices | 315 | 183 |
| LUTs | 926 | 505 |
| FFs | 875 | 498 |
| BRAMs | 17 | 18* |
| Frequency | 300 MHz | 340 MHz |
| CW encode e = Bin2CW(m) | ≈200 cycles | 349.1 cycles |
| Encrypt c = eĤ | ≈200 cycles | 352.1 cycles |

Table 6.

FPGA implementation results of Niederreiter encryption with n = 2048, t = 27, compared with [19], after place and route.

*We used 16 × 36 kb RAMs and 2 × 18 kb RAMs.


We finally implemented the Niederreiter encryptor, a cryptographic application in which constant weight coding is used exactly as described in Section 5. Table 6 compares our work with the state of the art [19, 26]. Our new implementation is the most compact and offers better area-time tradeoffs. It occupies the same amount of block memory as [19]: 16 × 36 kb + 1 × 18 kb RAMs store the public-key matrix Ĥ, and one 18 kb RAM implements the 8-to-1 FIFO within the constant weight encoder.
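The encryption flow above can be illustrated with a toy sketch. Here bin2cw_toy is a hypothetical stand-in (naive lexicographic enumeration, not the chapter's efficient encoder), and the syndrome computation is the usual matrix-vector product over GF(2):

```python
# Toy Niederreiter encryption flow: map the message to a constant weight
# word e (length n, weight t), then output the syndrome c = H * e^T over
# GF(2). bin2cw_toy is only a stand-in for the real Bin2CW encoder.
from itertools import combinations

def bin2cw_toy(m, n, t):
    # pick the m-th weight-t word of length n in lexicographic order
    for i, support in enumerate(combinations(range(n), t)):
        if i == m:
            e = [0] * n
            for pos in support:
                e[pos] = 1
            return e
    raise ValueError("message index out of range")

def encrypt(H, m, n, t):
    e = bin2cw_toy(m, n, t)
    # syndrome: XOR of the columns of H selected by e
    return [sum(h & b for h, b in zip(row, e)) % 2 for row in H]

# toy example: n = 6, t = 2, a 3 x 6 parity-check matrix
H = [[1, 0, 0, 1, 0, 1],
     [0, 1, 0, 1, 1, 0],
     [0, 0, 1, 0, 1, 1]]
c = encrypt(H, 0, 6, 2)   # m = 0 -> e = [1, 1, 0, 0, 0, 0]
```

The hardware design replaces the enumeration with the streaming constant weight encoder and reads the columns of Ĥ from block RAM, but the algebra is the same.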

7. Conclusion

A new approach for determining the optimal value d in constant weight coding has been proposed in this chapter. The method enables a more compact yet efficient architecture for constant weight encoders and decoders in resource-constrained computing systems. We then exploited the new encoder to implement a Niederreiter encryptor on a Xilinx device. Experiments show that our work is competitive with the state of the art and performs better in terms of both RAM usage and processing throughput for large parameters.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite and reference

Jingwei Hu and Ray C.C. Cheung (October 31st, 2018). A New Approximation Method for Constant Weight Coding and Its Hardware Implementation. In: Pinaki Mitra (Ed.), Recent Advances in Cryptography and Network Security. IntechOpen. DOI: 10.5772/intechopen.75041
