A Continuous-Time Recurrent Neural Network for Joint Equalization and Decoding – Analog Hardware Implementation Aspects

Equalization and channel decoding are "traditionally" two cascaded processes at the receiver side of a digital transmission. They aim to achieve a reliable and efficient transmission. For high data rates, the energy consumption of the corresponding algorithms is expected to become a limiting factor. For mobile devices with limited battery size, the energy consumption, mirrored in the lifetime of the battery, becomes even more crucial. Therefore, an energy-efficient implementation of equalization and decoding algorithms is desirable. The prevailing approach is to increase the energy efficiency of the underlying digital circuits. However, we address here promising alternatives offered by mixed (analog/digital) circuits. We are concerned with modeling joint equalization and decoding as a whole in a continuous-time framework. In doing so, continuous-time recurrent neural networks play an essential role because of their nonlinear characteristic and their special suitability for analog very-large-scale integration (VLSI). Based on the proposed model, we show that the superiority of joint equalization and decoding (a well-known fact from the discrete-time case) is preserved in the analog domain. Additionally, aspects related to analog circuit design, such as adaptivity, connectivity and accuracy, are discussed and linked to theoretical aspects of recurrent neural networks such as Lyapunov stability and simulated annealing.


Introduction
Energy efficiency has been attracting increasing interest for economic and environmental reasons. The mobile communications sector currently accounts for 0.2% of global carbon emissions. This share is expected to double between 2007 and 2020 due to the ever-increasing demand for wireless devices [1,2]. The sustained interest in higher data rate transmission strengthens this impact. While major resources are being invested in increasing the energy efficiency of digital circuits, there is, on the other hand, a growing interest in alternatives to the digital realization [3], including a mixed (analog/digital) approach. In such an approach, specific energy-consuming (sub)tasks are implemented in analog instead of a "conventional" digital realization. The analog implementation has a high potential to significantly improve the energy efficiency [4] because of the inherent parallel processing of signals that are continuous in both time and amplitude. This has been shown in the field of error correction coding, with a focus on decoding of low-density parity-check (LDPC) codes. Our ongoing research on equalization reveals similar results. By "analog" we do not mean linear signal processing, with all its disadvantages such as component inaccuracies, susceptibility to noise and temperature dependency [5], but nonlinear processing instead. The work of Mead [6] and others on neuromorphic analog very-large-scale integration (VLSI) has shown that "analog signal processing systems can be built that share the robustness of digital systems but outperform digital systems by several orders of magnitude in terms of speed and/or power consumption" [5].
The nonlinearity makes the analog implementation of an algorithm as robust as its digital counterpart [3,5]. This robustness stems from the match between the nonlinear operations required by the algorithm and the physical properties of analog devices [7].
The capability of artificial neural networks (in the following, neural networks) to successfully solve many scientific and engineering tasks has been demonstrated many times. Moreover, mapping algorithms to neural network structures can simplify the circuit design because of the regular (and repetitive) structure of neural networks and their limited number of well-defined arithmetic operations. Digital implementations can be considered precise (reproducibility of results under similar circumstances) but accurate (closeness of a result to the "true" value) only to the extent that they have enough digits for the representation [8]. This means that accuracy in digital implementations is achieved at the cost of efficiency (e.g., a relatively larger chip area and more power consumption) [9]. An analog implementation is usually efficient in terms of chip area and processing speed [9], however, at the price of an inherent lack of reproducibility of results [8] (e.g., because of the limited accuracy of the network components [9]). However, by exploiting the distributed nature of neural structures, the precision of the analog implementation can be improved despite inaccurate components and subsystems [8].¹ In other words, it is the distributed, massively parallel, nonlinear collective behavior of an analog implementation (of neural networks) that offers the possibility to make it as robust as its digital counterpart but more energy efficient² (in addition to a smaller chip area). Particularly for recurrent neural networks (the class we focus on, considered as nonlinear dynamical systems), robustness can additionally be achieved by exploiting "attracting" equilibrium points. In light of this discussion, we map in this chapter a joint equalization and decoding algorithm into a novel continuous-time recurrent neural network structure. This class of neural networks has been attracting a lot of interest because of its widespread applications.
These networks can either be trained, e.g., for system identification [10], or be considered as dynamical systems (dynamical solvers). In the latter case, there is no need for a computationally complex and time-consuming training phase. This relies on the ability of these networks (under specific conditions) to be Lyapunov stable.
Equalization and channel decoding (together referred to in the following as detection) are processes at the receiver side of a digital transmission. They aim to provide a reliable and efficient transmission. Equalization is needed to cope with the interference caused by multipath propagation, multiple users, multiple subchannels, multiple antennas and combinations thereof [11]. Channel (de)coding is applied to further improve the power efficiency. Equalization and decoding are nonlinear discrete optimization problems. The optimum solutions are, in general, computationally very demanding. Therefore, suboptimum solutions are applied, often soft-valued iterative schemes because of their good complexity-performance trade-off.
For high data rates, the energy consumption of equalization and decoding algorithms is expected to become a limiting factor. The need for floating-point computation and the nonlinear and iterative nature of (some of) these algorithms revive the option of an analog electronic implementation [12,13], embedded in an essentially digital receiver. This option has been strengthened by the emergence of "soft-valued" computation in this context [4], since soft values are a natural property of analog signals. In contrast to analog decoding, analog equalization has not attracted as much attention.
Furthermore, joint equalization and decoding (a technique where equalizer and decoder exchange their locally available knowledge) further improves the efficiency of the transmission, e.g., in terms of lower bit error rates, however, at the cost of more computational complexity [14]. Most of the work related to joint equalization and decoding is limited to the discrete-time realization. One of the very few contributions focusing on continuous-time joint equalization and decoding is given in [13]. The approach in [13] is not based on neural networks. Stability and convergence are observed there but not analyzed in depth.
We introduce in this chapter a novel continuous-time joint equalization and decoding structure. For this purpose, continuous-time single-layer recurrent neural networks play an essential role because of their nonlinear and recursive characteristic, their special suitability for analog VLSI, and since they serve as promising computational models for analog hardware implementation [15]. Both equalizer and decoder are modeled as continuous-time recurrent neural networks. An additional, proper feedback between equalizer and decoder is established for joint equalization and decoding. We also review, individually, continuous-time equalization and continuous-time decoding based on recurrent neural network structures. No training is needed since the recurrent neural network serves as a dynamical solver or a computational model [15,16]. This means that transmission properties are used to define the recurrent neural network (number of neurons, weight coefficients, activation functions, etc.) such that no training is needed. In addition, we highlight challenges emerging from the analog hardware implementation such as adaptivity, connectivity and accuracy. We also introduce our circuit for analog equalization based on continuous-time recurrent neural networks [3]. Characteristic properties of recurrent neural networks such as stability and convergence are addressed too. Based on the introduced model, we show by simulations that the superiority of joint equalization and decoding can be preserved in the analog "domain".
The main motivation for performing joint equalization and decoding in analog instead of using conventional digital circuits is to improve the energy efficiency and to minimize the area consumption of the VLSI chips [17]. The proposed continuous-time recurrent neural network serves as a promising computational model for analog hardware implementation.
The remainder of this chapter is organized as follows: In Section 2, we describe the block transmission model. Sections 3 and 4 are dedicated to the equalization process, the application of continuous-time recurrent neural networks and the analog circuit design and its corresponding performance and energy efficiency. Sections 5 and 6 are devoted to the channel decoding and the application of continuous-time recurrent neural networks for belief propagation (a decoding algorithm for LDPC codes). For both equalization and decoding cases, analog hardware design aspects and challenges and the behavior of the continuous-time recurrent neural network as a dynamical system are discussed. The continuous-time joint equalization and decoding based on recurrent neural networks is presented in Sections 7 and 8. Simulation results are shown in Section 9. We finish this chapter with a conclusion in Section 10.
Throughout this chapter, bold small and bold capital letters designate vectors (or finite discrete sets) and matrices, respectively.³ All nonbold letters are scalars. $\mathrm{diag}_m\{\mathbf{B}\}$ returns the matrix $\mathbf{B}$ with its nondiagonal elements set to zero. $\mathrm{diag}_v\{\mathbf{b}\}$ returns a matrix with the vector $\mathbf{b}$ on its diagonal. $\mathbf{0}_N$ is the all-zero vector of length $N$. $\mathbf{0}$, $\mathbf{1}$ and $\mathbf{I}$ represent the all-zero, all-one and identity matrices of suitable size, respectively. We consider column vectors. $(\cdot)^H$ denotes the conjugate transpose of a vector or a matrix, whereas $(\cdot)^T$ denotes the transpose. $z_r = \Re(z)$ and $z_i = \Im(z)$ return the real and imaginary parts of the complex-valued argument $z$. $t$ and $l$ designate the continuous-time variable and the discrete-time index, respectively.

Block transmission model
The block transmission model for linear modulation schemes is shown in Figure 1. For details, see [18]:
• SRC (SNK) represents the digital source (sink). SRC repeatedly generates successive streams of $k$ bits, i.e., $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_M$.
• $\mathbf{q}$ ($\hat{\mathbf{q}}$) $\in \{0, 1\}^k$ is the vector of source (detected) bits of length $k$.
• $\mathbf{q}_c \in \{0, 1\}^n$ is the vector of encoded source bits of length $n > k$. For an uncoded transmission, $\mathbf{q}_c = \mathbf{q}$ (and thus $k = n$).
• COD performs a bijective map from $\mathbf{q}$ to $\mathbf{q}_c$, where $n > k$ (adding redundancy). We consider in this chapter binary LDPC codes. Only $2^k$ combinations of $n$ bits out of the overall $2^n$ combinations are used. The set of these $2^k$ combinations represents the codebook $\mathbf{C}$. $r_c = k/n$ is the code rate.
• $\mathbf{x} \in \boldsymbol{\psi}^N$ is the transmit vector of length $N$.
• $N$ is the block size. Successive transmit vectors are separated by a guard time to avoid interference between different blocks. Thus, Figure 1 describes the transmission of a single block and stays valid for the next block (possibly with a different $\mathbf{R}$).
• $\boldsymbol{\psi} = \{\psi_1, \psi_2, \ldots, \psi_{2^m}\}$, $m \in \mathbb{N}\setminus\{0\}$, is the symbol alphabet. There exist $2^{m\cdot N}$ possible transmit vectors. The set of all possible transmit vectors is $\boldsymbol{\chi}$. The mapping from $\mathbf{q}_c$ to $\mathbf{x}$ is performed by M. Each symbol $\psi$ represents $m$ bits. A special class of symbol alphabets are the so-called separable symbol alphabets $\boldsymbol{\psi}^{(s)}$ [19,20].
• $\tilde{\mathbf{x}}$ is the receive vector of length $N$. In general, $\tilde{\mathbf{x}} \in \mathbb{C}^N$.
• We distinguish:
  - For an uncoded transmission: $M \cdot k = m \cdot N$.
  - For a coded transmission and $N < n/m$: one codeword lasts over many transmit blocks.
  - For a coded transmission and $N = n/m$: one codeword lasts exactly over a single transmit block.
  - For a coded transmission and $N = M \cdot n/m$: $M$ codewords are contained in a single transmit block.
• $\mathbf{R} = \{r_{ij} : i, j \in \{1, 2, \ldots, N\}\}$ is the block transmit matrix of size $N \times N$. $\mathbf{R}$ is Hermitian and positive semidefinite. The block transmit matrix $\mathbf{R}$ contains the whole knowledge about the transmission scheme (transmit and receive filters) and the physical propagation channel between transmitter(s) and receiver(s) [18].
• $\tilde{\mathbf{n}}$ is a sample function of an additive Gaussian noise vector process of length $N$ with zero mean and covariance matrix $\boldsymbol{\Phi}_{\tilde{n}\tilde{n}} = \frac{N_0}{2}\cdot\mathbf{R}$, where $\frac{N_0}{2}$ is the double-sided noise power spectral density.
• DET is the detector including equalization and decoding.
The model in Figure 1 is a general model and fits different transmission schemes such as orthogonal frequency division multiplexing (OFDM), code division multiple access (CDMA), multicarrier CDMA (MC-CDMA) and multiple-input multiple-output (MIMO). The relation to the original continuous-time (physical) model can be found in [11,18]. The model in Figure 1 can be described mathematically as follows [11]:
$$\tilde{\mathbf{x}} = \mathbf{R}\cdot\mathbf{x} + \tilde{\mathbf{n}}. \tag{1}$$
By decomposing $\mathbf{R}$ into a diagonal part $\mathbf{R}_d = \mathrm{diag}_m\{\mathbf{R}\}$ and a nondiagonal part $\mathbf{R}_{\backslash d} = \mathbf{R} - \mathbf{R}_d$, Eq. (1) can be rewritten as
$$\tilde{\mathbf{x}} = \mathbf{R}_d\cdot\mathbf{x} + \mathbf{R}_{\backslash d}\cdot\mathbf{x} + \tilde{\mathbf{n}}. \tag{2}$$
For the $j$-th element of the receive vector, $j \in \{1, 2, \ldots, N\}$, Eq. (2) can be expressed as
$$\tilde{x}_j = r_{jj}\cdot x_j + \sum_{j'=1,\, j' \neq j}^{N} r_{jj'}\cdot x_{j'} + \tilde{n}_j. \tag{3}$$
We notice from Eqs. (2) and (3) that the nondiagonal elements of $\mathbf{R}$ describe the interference between the elements of the transmit vector at the receiver side. For an interference-free transmission, $\mathbf{R}_{\backslash d} = \mathbf{0}$. For an interference-free transmission over an additive white Gaussian noise (AWGN) channel, $\mathbf{R} = \mathbf{I}$. Figure 2 shows the channel matrix for a MIMO transmission scheme for different numbers of transmit/receive antennas. Figure 3 shows the channel matrix for OFDM with/without spreading. Figure 4 shows the channel matrix for MIMO-OFDM. In Figures 2-4, the darker the elements, the larger the absolute values of the entries of the corresponding matrix $\mathbf{R}$, and hence the larger the interference [21].
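To make the roles of $\mathbf{R}$, $\mathbf{R}_d$ and $\mathbf{R}_{\backslash d}$ concrete, the following minimal NumPy sketch simulates Eq. (1) for the real-valued (BPSK) case. The 3×3 matrix is a hypothetical example, not one of the matrices of Figures 2-4, and the function names are illustrative only. The noise is colored through an eigendecomposition of $\mathbf{R}$ so that its covariance is $\frac{N_0}{2}\cdot\mathbf{R}$, which also works when $\mathbf{R}$ is only positive semidefinite.

```python
import numpy as np

def split_block_transmit_matrix(R):
    """Decompose R into its diagonal part R_d and nondiagonal part R_nd = R - R_d, cf. Eq. (2)."""
    R_d = np.diag(np.diag(R))
    return R_d, R - R_d

def noise_samples(R, N0, num, rng):
    """Draw `num` real-valued Gaussian noise vectors (one per column) with covariance (N0/2) * R.

    R is only assumed symmetric positive semidefinite, so an eigendecomposition
    (rather than a Cholesky factorization) builds the coloring matrix.
    """
    eigval, eigvec = np.linalg.eigh(R)
    coloring = eigvec @ np.diag(np.sqrt(np.clip(eigval, 0.0, None)))
    z = rng.standard_normal((R.shape[0], num))
    return np.sqrt(N0 / 2.0) * (coloring @ z)

def receive(R, x, N0, rng):
    """Block transmission model of Eq. (1): x_tilde = R x + n_tilde."""
    return R @ x + noise_samples(R, N0, 1, rng)[:, 0]

# Hypothetical 3x3 block transmit matrix (symmetric, diagonally dominant -> positive definite)
R = np.array([[1.0, 0.4, 0.1],
              [0.4, 1.0, 0.4],
              [0.1, 0.4, 1.0]])
x = np.array([1.0, -1.0, 1.0])             # BPSK transmit vector
x_tilde = receive(R, x, N0=0.5, rng=np.random.default_rng(0))

R_d, R_nd = split_block_transmit_matrix(R)
interference = R_nd @ x                     # element j is the interference sum in Eq. (3)
```

Here `interference` evaluates to [-0.3, 0.8, -0.3]: the middle element suffers most because it has two strong neighbors in $\mathbf{R}$.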
Remark 1. For a clear distinction between channel matrix and block transmit matrix, we refer to [11,18]. Generally speaking, the block transmit matrix R is a block diagonal matrix of "many" channel matrices.
The detector DET in Figure 1 has to deliver a vector $\hat{\mathbf{q}}$ with a minimum bit error rate compared to $\mathbf{q}$ (subject to the available computational power), given that COD, M and $\mathbf{R}$ are known at the receiver side. The optimum detection (maximum likelihood detection) is often infeasible for realistic cases. Therefore, suboptimum schemes are used, mainly based on separating the detection into an equalization EQ (to cope with the interference caused by $\mathbf{R}_{\backslash d}$) and a decoding DEC (to utilize the redundancy added by COD). In this case, we distinguish between separate and joint equalization and decoding, cf. Figure 5. The superiority of the latter is widely accepted: separate equalization and decoding as in Figure 5(a) in general leads to a performance loss, since the equalizer does not utilize the knowledge available at the decoder [14]. Each of the components DET, EQ and DEC can be seen as a pattern classifier. By separating the detection into equalization and decoding, an optimum detection in general cannot be achieved anymore (even if optimum equalization and optimum decoding are individually applied). Nevertheless, this is common practice. DECI in Figure 5 is a hard decision function. For a coded transmission, DECI is a unit step function. For an uncoded transmission, COD and DEC are removed from Figure 1 and Figure 5, respectively. DECI in this case is a stepwise function depending on the symbol alphabet $\boldsymbol{\psi}$, which maps the (in general complex-valued) elements of the equalized vector to the vector of detected symbols $\hat{\mathbf{x}} \in \boldsymbol{\psi}^N$, cf. Figure 8. The map from $\hat{\mathbf{x}}$ to $\hat{\mathbf{q}}$ is then straightforward. In summary

Vector equalization
For an uncoded transmission, the detection DET reduces to a vector equalization EQ as shown in Figure 6.
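The optimum vector equalization rule (the maximum likelihood one) is based on the minimum Mahalanobis distance and is given as [21]
$$\mathbf{x}_{ML} = \arg\min_{\boldsymbol{\xi} \in \boldsymbol{\chi}} \left\{ \left(\tilde{\mathbf{x}} - \mathbf{R}\cdot\boldsymbol{\xi}\right)^H \boldsymbol{\Phi}_{\tilde{n}\tilde{n}}^{-1} \left(\tilde{\mathbf{x}} - \mathbf{R}\cdot\boldsymbol{\xi}\right) \right\}. \tag{4}$$
For each receive vector $\tilde{\mathbf{x}}$, the optimum vector equalizer calculates the Mahalanobis distance, Eq. (4), to all possible transmit vectors $\boldsymbol{\chi}$ of cardinality $2^{m\cdot N}$ and decides in favor of the possible transmit vector $\mathbf{x}_{ML}$ with the minimum Mahalanobis distance to the receive vector $\tilde{\mathbf{x}}$, i.e., an exhaustive search is required in general. This is feasible only for small $2^{m\cdot N}$, which is usually not the case in practice. Therefore, suboptimum equalization schemes are applied, which trade off performance against complexity.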
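The exhaustive search behind Eq. (4) can be sketched in a few lines for a real-valued alphabet. This is an illustrative brute-force implementation, not the authors' code; it assumes an invertible $\mathbf{R}$ and drops the constant factor $\frac{2}{N_0}$ of $\boldsymbol{\Phi}_{\tilde{n}\tilde{n}}^{-1}$, which does not change the arg min. The matrix and transmit vector are hypothetical.

```python
import itertools
import numpy as np

def ml_vector_equalizer(x_tilde, R, alphabet=(-1.0, 1.0)):
    """Exhaustive-search ML vector equalizer in the spirit of Eq. (4).

    For every candidate xi in alphabet**N it evaluates the Mahalanobis-type metric
    (x_tilde - R xi)^T R^{-1} (x_tilde - R xi); the factor 2/N0 of the noise
    covariance is dropped because it does not change the arg min.
    Assumes a real-valued alphabet and an invertible R.
    """
    R_inv = np.linalg.inv(R)
    best, best_metric = None, np.inf
    for cand in itertools.product(alphabet, repeat=len(x_tilde)):  # |alphabet|**N candidates
        xi = np.array(cand)
        d = x_tilde - R @ xi
        metric = d @ R_inv @ d
        if metric < best_metric:
            best, best_metric = xi, metric
    return best

# Hypothetical example: noise-free reception, so the ML decision must equal x
R = np.array([[1.0, 0.3, 0.0, 0.2],
              [0.3, 1.0, 0.3, 0.0],
              [0.0, 0.3, 1.0, 0.3],
              [0.2, 0.0, 0.3, 1.0]])
x = np.array([1.0, -1.0, -1.0, 1.0])
x_ml = ml_vector_equalizer(R @ x, R)
```

The loop visits $2^{m\cdot N}$ candidates (here $2^4 = 16$), which is exactly why this rule becomes infeasible for realistic block sizes.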

Continuous-time single-layer recurrent neural networks for vector equalization
The dynamical behavior of continuous-time single-layer recurrent neural networks of dimension $N'$, abbreviated in the following by RNN⁴, is given by the state-space equations [22]:
$$\boldsymbol{\Upsilon}_e\cdot\frac{d\mathbf{u}(t)}{dt} = -\mathbf{u}(t) + \mathbf{W}\cdot\mathbf{v}(t) + \mathbf{W}_0\cdot\mathbf{e}, \qquad \mathbf{v}(t) = \boldsymbol{\phi}\left(\mathbf{u}(t)\right). \tag{5}$$
In Eq. (5), $\boldsymbol{\Upsilon}_e$ is a diagonal and positive definite matrix of size $N' \times N'$, $\mathbf{u}(t)$ is the inner state, $\mathbf{v}(t)$ is the output, $\mathbf{e}$ is the external input, $\boldsymbol{\phi}(\cdot)$ is the activation function, and $\mathbf{W}$ and $\mathbf{W}_0$ are the weight matrices. The real-valued RNN (all variables and functions in Eq. (5) are real-valued) is shown in Figure 7 and is known as the "additive model" or "resistance-capacitance model" [23]. In this case, $w_{jj'} = R_j / R_{jj'}$ is the weight coefficient between the output of the $j'$-th neuron and the input of the $j$-th neuron, and $w_{j0} = R_j / R_{j0}$ is the weight coefficient of the $j$-th external input. We also notice that the feedback $\mathbf{W}\cdot\mathbf{v}$ in Eq. (5) and Figure 7 is a linear function of the output $\mathbf{v}$. Moreover, $\boldsymbol{\Upsilon}_e$ can be given in this case as
$$\boldsymbol{\Upsilon}_e = \mathrm{diag}_v\left\{\left[R_1 C_1, R_2 C_2, \ldots, R_{N'} C_{N'}\right]^T\right\},$$
where $C_j$ is the capacitance at the input of the $j$-th neuron. As a nonlinear dynamical system, the stability of the RNN is of primary interest [16]. Stability has been proven under specific conditions by Lyapunov's stability theory in [24] for real-valued RNNs and in [22,25] for complex-valued ones, among others. The RNN in Eq. (5) represents a general-purpose structure. Depending on $N'$, $\boldsymbol{\phi}$, $\mathbf{W}$ and $\mathbf{W}_0$, a wide range of optimization problems can be solved. The first and best-investigated applications of the RNN include the content addressable memory [24,26], the analog-to-digital converter (ADC) [27] and the traveling salesman problem [28]. In all these cases, no training is needed since the RNN acts as a dynamical solver. This feature is desirable in many engineering fields like signal processing, communications, automatic control, etc., and was first exploited by Hopfield in his pioneering work [24,29], where information was stored in a dynamically stable RNN. In the following, we focus on vector equalization.
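As an illustration of Eq. (5) acting as a dynamical solver, the sketch below integrates the real-valued additive model with a forward-Euler step. All values are hypothetical and time is in arbitrary units. $\mathbf{W}$ is chosen symmetric with spectral norm well below one. Together with the 1-Lipschitz tanh activation, this makes the right-hand side a contraction, so a unique, attracting equilibrium exists. This is a simpler sufficient condition than the Lyapunov analysis of [24], used here only to keep the example predictable.

```python
import numpy as np

def simulate_rnn(W, W0, e, upsilon, phi=np.tanh, t_end=150.0, dt=0.01, u0=None):
    """Forward-Euler integration of Eq. (5):
        Upsilon_e du/dt = -u + W v + W0 e,   v = phi(u).
    `upsilon` holds the diagonal of Upsilon_e (one time constant per neuron).
    Returns the final inner state u and output v.
    """
    u = np.zeros(len(e)) if u0 is None else np.asarray(u0, dtype=float).copy()
    for _ in range(int(round(t_end / dt))):
        v = phi(u)
        u = u + dt * (-u + W @ v + W0 @ e) / upsilon
    return u, phi(u)

# Hypothetical 3-neuron example (arbitrary time units)
W = np.array([[0.0, 0.2, -0.1],
              [0.2, 0.0, 0.15],
              [-0.1, 0.15, 0.0]])      # symmetric, spectral norm < 1
W0 = np.eye(3)
e = np.array([0.5, -1.0, 0.2])
upsilon = np.array([1.0, 1.5, 2.0])    # diagonal of Upsilon_e, e.g., R_j * C_j

u_ep, v_ep = simulate_rnn(W, W0, e, upsilon)
residual = -u_ep + W @ v_ep + W0 @ e   # close to zero at an equilibrium point
```

The different time constants only change how fast each neuron settles, not where the network settles.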
Remark 2. The dimension of a real-valued RNN is the same as the number of neurons.
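Remark 3. Two real-valued RNNs, each of $N'$ neurons, are required to represent one complex-valued RNN (with dimension $N'$). This is possible by separating Eq. (5) into real and imaginary parts. However, this in general doubles the number of connections per neuron (and hence the number of multiplications) because of the required connections (represented by $\mathbf{W}_i$) between the two real-valued RNNs, as can be seen from the following equation:
$$\boldsymbol{\Upsilon}_e\cdot\frac{d}{dt}\begin{bmatrix}\mathbf{u}_r(t)\\ \mathbf{u}_i(t)\end{bmatrix} = -\begin{bmatrix}\mathbf{u}_r(t)\\ \mathbf{u}_i(t)\end{bmatrix} + \begin{bmatrix}\mathbf{W}_r & -\mathbf{W}_i\\ \mathbf{W}_i & \mathbf{W}_r\end{bmatrix}\begin{bmatrix}\mathbf{v}_r(t)\\ \mathbf{v}_i(t)\end{bmatrix} + \begin{bmatrix}\mathbf{W}_{0,r} & -\mathbf{W}_{0,i}\\ \mathbf{W}_{0,i} & \mathbf{W}_{0,r}\end{bmatrix}\begin{bmatrix}\mathbf{e}_r\\ \mathbf{e}_i\end{bmatrix}.$$
Figure 7. Continuous-time single-layer real-valued recurrent neural network. $v(t)$ is the output, $u(t)$ is the inner state, $e$ is the external input and $\phi(\cdot)$ is the activation function. This model is known as the "additive model" or "resistance-capacitance model" [23].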
$\boldsymbol{\Upsilon}_e$ in this case is a diagonal positive definite matrix of size $2N' \times 2N'$.
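The real/imaginary separation of Remark 3 can be checked numerically. The sketch below, with a hypothetical random Hermitian $\mathbf{W}$, builds the real block matrix that the two coupled real-valued RNNs implement and confirms that it reproduces the complex product $\mathbf{W}\cdot\mathbf{v}$. It also shows a useful side fact: a Hermitian $\mathbf{W}$ becomes a symmetric real matrix, so the symmetry condition used for real-valued stability results carries over.

```python
import numpy as np

def complex_to_real_weights(W):
    """Real-valued block weight matrix [[W_r, -W_i], [W_i, W_r]] acting on [v_r; v_i].

    The off-diagonal blocks (W_i) are the extra connections between the two
    real-valued RNNs that Remark 3 refers to.
    """
    W_r, W_i = W.real, W.imag
    return np.block([[W_r, -W_i], [W_i, W_r]])

# Hypothetical Hermitian 3x3 weight matrix and complex output vector
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
W = A + A.conj().T                       # Hermitian by construction
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

W_real = complex_to_real_weights(W)      # shape (6, 6)
y = W_real @ np.concatenate([v.real, v.imag])
# y[:3] is the real part and y[3:] the imaginary part of W @ v
```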

A. Vector equalization based on RNN
The usage of the RNN for vector equalization became known for multiuser interference cancellation in CDMA environments [30,31]. However, this was limited to the binary phase-shift keying (BPSK) symbol alphabet $\boldsymbol{\psi} = \{-1, +1\}$. This has been generalized to complex-valued symbol alphabets in [21] by combining the results of references [20,22,32]⁵. Based thereon, it has been proven that the RNN ends in a local minimum of Eq. (4) if the following relations are fulfilled [21], cf. Eqs. (1), (2), (5) and Figures 6 and 7:
$$N' = N, \quad \mathbf{W} = \mathbf{I} - \mathbf{R}_d^{-1}\cdot\mathbf{R}, \quad \mathbf{W}_0 = \mathbf{R}_d^{-1}, \quad \mathbf{e} = \tilde{\mathbf{x}}, \quad \boldsymbol{\phi}(\cdot) = \boldsymbol{\theta}^{(opt)}(\cdot), \tag{7}$$
and therefore $\hat{\mathbf{x}} = \mathrm{DECI}(\mathbf{v})$. Figure 8 shows an example of an eight quadrature amplitude modulation (8-QAM) symbol alphabet and its corresponding DECI function. The relations in Eq. (7) are obtained by comparing the maximum likelihood function of the vector equalization with the Lyapunov function of the RNN.
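The dynamical behavior of the vector equalization based on the RNN can be given as, cf. Eqs. (1), (5), (7):
$$\boldsymbol{\Upsilon}_e\cdot\frac{d\mathbf{u}(t)}{dt} = -\mathbf{u}(t) + \left(\mathbf{I} - \mathbf{R}_d^{-1}\cdot\mathbf{R}\right)\cdot\boldsymbol{\theta}^{(opt)}\left(\mathbf{u}(t)\right) + \mathbf{R}_d^{-1}\cdot\tilde{\mathbf{x}}. \tag{8}$$
The local asymptotic stability of Eq. (8) based on Lyapunov functions has been proven in [21] (based on [22]) for separable symbol alphabets $\boldsymbol{\psi}^{(s)}$. When Eq. (8) reaches an equilibrium point $\mathbf{u}_{ep}$, i.e., $\frac{d\mathbf{u}(t)}{dt} = \mathbf{0}_N$, Eq. (8) can be rewritten as
$$\mathbf{u}_{ep} = \mathbf{x}_{ep} + \mathbf{R}_d^{-1}\cdot\left(\tilde{\mathbf{x}} - \mathbf{R}\cdot\mathbf{x}_{ep}\right), \qquad \mathbf{x}_{ep} = \boldsymbol{\theta}^{(opt)}\left(\mathbf{u}_{ep}\right). \tag{9}$$
If, additionally, a correct equalization is achieved, i.e., $\mathbf{x}_{ep} = \mathbf{x}$, the inner state is, cf. Eq. (1),
$$\mathbf{u}_{ep} = \mathbf{x} + \mathbf{R}_d^{-1}\cdot\tilde{\mathbf{n}} = \mathbf{x} + \mathbf{n}_e. \tag{10}$$
Thus, the RNN as vector equalizer, Eq. (8), acts as an "analog dynamical solver," and there is no need for training. The covariance matrix of $\mathbf{n}_e$ is $\boldsymbol{\Phi}_{n_e n_e} = \frac{N_0}{2}\cdot\mathbf{R}_d^{-1}\cdot\mathbf{R}\cdot\mathbf{R}_d^{-1}$. In Eq. (7), $\boldsymbol{\theta}^{(opt)}(\cdot)$ is the optimum activation function and depends on the symbol alphabet $\boldsymbol{\psi}$.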
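A minimal forward-Euler sketch of the RNN vector equalizer for BPSK follows. It assumes the relations $\mathbf{W} = \mathbf{I} - \mathbf{R}_d^{-1}\mathbf{R}$, $\mathbf{W}_0 = \mathbf{R}_d^{-1}$, $\mathbf{e} = \tilde{\mathbf{x}}$ and $\boldsymbol{\Upsilon}_e = \tau\cdot\mathbf{I}$. A tanh with a hypothetical slope `beta` stands in for $\boldsymbol{\theta}^{(opt)}$, which this sketch does not derive. The hypothetical $\mathbf{R}$ has weak interference, so the example converges to a unique equilibrium; the test checks the equilibrium relation of Eq. (9) and the sign decisions.

```python
import numpy as np

def rnn_vector_equalizer(x_tilde, R, beta=1.0, tau=1.0, t_evolve=60.0, dt=0.01):
    """Forward-Euler sketch of the RNN vector equalizer for BPSK.

    Assumed relations: W = I - R_d^{-1} R, W0 = R_d^{-1}, e = x_tilde, and a
    tanh activation with a hypothetical slope `beta` standing in for theta^(opt).
    Returns the inner state u, the output v and the hard decisions sign(v).
    """
    N = len(x_tilde)
    R_d_inv = np.diag(1.0 / np.diag(R))
    W = np.eye(N) - R_d_inv @ R
    W0 = R_d_inv
    u = np.zeros(N)                        # start from the all-zero inner state (after reset)
    for _ in range(int(round(t_evolve / dt))):
        v = np.tanh(beta * u)
        u = u + (dt / tau) * (-u + W @ v + W0 @ x_tilde)
    v = np.tanh(beta * u)
    return u, v, np.sign(v)                # DECI for BPSK is the sign function

# Hypothetical example: weak interference (off-diagonal 0.2), noise-free reception
R = 0.8 * np.eye(4) + 0.2 * np.ones((4, 4))
x = np.array([1.0, -1.0, 1.0, 1.0])
x_tilde = R @ x
u_ep, v_ep, x_hat = rnn_vector_equalizer(x_tilde, R)
```

With `beta = 1` the outputs stay well inside (-1, 1), yet the signs are already correct. A steeper activation pushes $\mathbf{v}$ toward the symbol alphabet, at the cost of possibly several equilibria.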
B. Analog hardware implementation aspects: equalization
Analog signal processing, as a matter of topical importance for modern receiver architectures, was recognized in [34], where an analog vector equalizer, designed in BiCMOS technology, was considered a promising application for the analog processing of baseband signals. The equalizer accepts sampled vector symbols in analog form, with the advantage that it does not require an ADC at the input interface. At very high data rates, the exclusion of an ADC softens the trade-off between chip area requirement and overall power consumption. In the following, we discuss the main features and challenges of the analog implementation of the vector equalizer based on the RNN.
Structure: An RNN of dimension $N'$ (in general $2\cdot N'$ neurons) can act as a vector equalizer as long as the block size $N$ at the transmitter side (over all possible symbol alphabets, coding schemes and block sizes) does not exceed $N'$, i.e., $N \le N'$.
Activation function: The definition of the optimum activation function $\boldsymbol{\theta}^{(opt)}(\cdot)$ is not general but depends on the symbol alphabet under consideration. Different symbol alphabets need different activation functions. However, we have proven in [20] that for square QAM symbol alphabets, the most relevant ones in practice, $\boldsymbol{\theta}^{(opt)}(\cdot)$ can be approximated as a sum of a limited number of shifted and weighted hyperbolic tangent functions. Square QAM symbol alphabets are separable ones, cf. Remark 4. The hyperbolic tangent is well suited to analog implementation, since it matches the large-signal transfer function of transconductance stages based on bipolar differential amplifiers [3,34].
Adaptivity: A vector equalizer must be able to adapt to different and time-variant interference levels. The adaptivity is regulated by the measurement of the block transmit matrix $\mathbf{R}$, a task performed by a "channel estimation unit" (CEU). The weight matrices $\mathbf{W}$ and $\mathbf{W}_0$ are then computed as in Eq. (7) and forwarded to the RNN (Figure 9). Thus, the weight matrices $\mathbf{W}$ and $\mathbf{W}_0$ are not the outcome of any training algorithm but are related directly to $\mathbf{R}$, cf. Eq. (7). This is a typical example of a mixed-signal integrated circuit, where the weight coefficients are (obtained and) stored digitally, converted into analog values, and then used as weight coefficients for the analog RNN [8].
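The idea of building a multilevel activation from a few shifted tanh functions can be illustrated for one dimension of square 16-QAM, i.e., the levels $\{-3, -1, 1, 3\}$. In this sketch the shifts sit at the midpoints between adjacent levels, each tanh is weighted by half the corresponding jump, and the slope `beta` is hypothetical. This illustrates the structure only and does not reproduce the optimized approximation of $\boldsymbol{\theta}^{(opt)}$ derived in [20].

```python
import numpy as np

def staircase_activation(u, levels=(-3.0, -1.0, 1.0, 3.0), beta=5.0):
    """Sum of shifted and weighted tanh functions approaching the staircase of a PAM alphabet.

    Shifts are the midpoints between adjacent levels; each tanh (range -1..+1) is
    weighted by half of the corresponding jump. `beta` controls the steepness.
    """
    levels = np.asarray(levels, dtype=float)
    thresholds = (levels[:-1] + levels[1:]) / 2.0     # shifts
    weights = (levels[1:] - levels[:-1]) / 2.0        # weights
    offset = (levels[0] + levels[-1]) / 2.0
    u = np.asarray(u, dtype=float)[..., None]          # broadcast over the tanh terms
    return offset + np.sum(weights * np.tanh(beta * (u - thresholds)), axis=-1)

# Far from the thresholds the output sits on a symbol level: -3, -1, 1 or 3
samples = staircase_activation(np.array([-10.0, -1.0, 1.0, 10.0]))
```

For the symmetric 16-QAM levels this reduces to $\tanh(\beta(u+2)) + \tanh(\beta u) + \tanh(\beta(u-2))$, i.e., three bipolar differential pairs per real dimension.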
For the $j$-th neuron in the additive model of Figure 7, the ratio between the two resistors $R_j$ and $R_{jj'}$ ($R_j$ and $R_{j0}$) is used to configure each weight coefficient $w_{jj'}$ ($w_{j0}$). According to the additive model, $R_{jj'}$ and $R_{j0}$ can assume both positive and negative values, and their absolute values theoretically extend from $R_j$ to infinity (for $w_{jj'} \in [-1, +1]$). This puts serious limitations on a direct implementation of the model. In [3], we showed how this difficulty can be overcome by using a Gilbert cell as a four-quadrant analog multiplier. A Gilbert cell [35] is composed of two pairs of differential amplifiers with cross-coupled collectors and is controlled by a differential voltage input $G_{ji}$ applied at the bases of the transistors. When biased with a differential tail current $I_{ji} = I_{ji}^{+} - I_{ji}^{-}$, the differential output current $I_{ji,w} = I_{ji,w}^{+} - I_{ji,w}^{-}$ is a fraction $w$ of the tail current $I_{ji}$, as a function of the input voltage $G_{ji}$ (ideal bipolar transistors):
$$I_{ji,w} = w\cdot I_{ji}, \qquad w = \tanh\left(\frac{G_{ji}}{2 V_t}\right),$$
where $V_t$ is the thermal voltage.
Figure 9. Uncoded block transmission model. The detection reduces to a vector equalization EQ. The channel estimation unit (CEU) estimates the block transmit matrix $\mathbf{R}$.
Accuracy: Local asymptotic Lyapunov stability can be guaranteed for the RNN in Eqs. (5) and (8) if, among other conditions, the weight matrix $\mathbf{W}$ is Hermitian (symmetric in the real-valued case). Inaccuracies in the representation of the weights may jeopardize the Lyapunov stability and impact the performance of the vector equalizer. A first cause of weight inaccuracy is the limited accuracy of the analog design in terms of component parasitics, device mismatch and process variation, to name just a few. Those inaccuracies (if modest) are expected to degrade the performance only slightly, without causing a catastrophic failure, thanks to the high nonlinearity of the equalization algorithm.
Moreover, it has been shown in [8,36] that in some cases such imperfections produce beneficial effects: they introduce a kind of simulated annealing, which enables escaping local minima by occasionally allowing "uphill steps". This matters because the Lyapunov stable RNN is a gradient-like system that would otherwise only descend. This feature is emulated in discrete time by stochastic Hopfield networks [23]. Imprecision of the weights may also arise from an insufficient resolution of the digital-to-analog converter (DAC) (Figure 9). On the other hand, an overzealous DAC design increases the chip area and the power consumption and adds complexity to the interface between the analog vector equalizer and the digital CEU. In this case, a conservative approach suggests using a DAC with enough resolution to match the precision used by the CEU.
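The digital-to-analog weight path can be sketched end to end: the CEU delivers a weight, a $b$-bit DAC quantizes it, and the Gilbert cell control voltage follows from inverting the ideal relation $w = \tanh(G_{ji}/(2V_t))$. The uniform mid-tread quantizer, the 8-bit resolution and the random symmetric $\mathbf{W}$ are assumptions for illustration, not the design of [3]. Elementwise quantization of a symmetric matrix keeps it symmetric, so the symmetry condition for stability survives the DAC even though the individual weights change by up to half a step.

```python
import numpy as np

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C

def thermal_voltage(T=300.0):
    """V_t = k T / q, about 25.9 mV at 300 K."""
    return K_B * T / Q_E

def dac_quantize(w, bits):
    """Hypothetical uniform mid-tread quantizer with step 2**-(bits-1).

    Levels are clipped to +-(1 - step) because |w| = 1 would require an
    infinite Gilbert cell control voltage.
    """
    step = 2.0 ** -(bits - 1)
    q = np.round(np.asarray(w, dtype=float) / step) * step
    return np.clip(q, -1.0 + step, 1.0 - step)

def gilbert_control_voltage(w, T=300.0):
    """Invert the assumed ideal relation w = tanh(G / (2 V_t))."""
    return 2.0 * thermal_voltage(T) * np.arctanh(w)

def gilbert_weight(G, T=300.0):
    """Ideal Gilbert cell weight as a function of the control voltage G."""
    return np.tanh(G / (2.0 * thermal_voltage(T)))

# Hypothetical symmetric weight matrix with zero diagonal, as delivered by a CEU
rng = np.random.default_rng(0)
A = rng.uniform(-0.9, 0.9, (4, 4))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)

W_q = dac_quantize(W, bits=8)           # max error <= half a step = 2**-8
G = gilbert_control_voltage(W_q)        # control voltages G_ji in volts
```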
Interneuron connectivity and reconfigurability: Scaling the architecture of an analog VLSI design is not straightforward. A vector equalizer based on recurrent neural networks is composed of repeated identical subsystems, i.e., the neurons. Using a bottom-up approach, the first step to scale the system involves the redesign of the single neuron in order to handle more feedback inputs. In a subsequent step, the neurons are connected together, and a system-level simulation is performed to check the functionality of the system. However, several design choices must be made during this process, and it is not guaranteed that the optimum architecture for a certain number of neurons is still the best choice when the number of neurons changes. For large $N$, the block transmit matrix $\mathbf{R}$, which defines the weight matrix $\mathbf{W}$, is usually sparse. If a maximum number of nonzero elements per row of $\mathbf{R}$ is assumed, the requirement for full connectivity between the neurons in Figure 7 can be relaxed, and only a maximum number of connections per neuron is necessary. In this case, however, in addition to the "adaptivity", the RNN must be reconfigured according to the positions of the nonzero elements in $\mathbf{R}$. The hardware simplification given by the partial connectivity may be counterbalanced by the need for further routing (e.g., multiplexing/demultiplexing) of the feedback. For special cases where the block transmit matrix can be reordered around the diagonal, several independent RNNs can simply be used in parallel. In Figures 3(b) and 3(c), four independent RNNs, each of dimension four, can be used in parallel. Additionally, for specific transmission schemes such as MIMO-OFDM in Figure 4, the connectivity can be assumed to be limited (number of transmit antennas minus one) and fixed (crosstalk only between the same subcarriers, when used simultaneously on different transmit antennas).
Example 1. In Figure 4, eight RNNs (number of subcarriers), each of dimension three (number of transmit antennas), can be used in parallel. Each neuron has two feedback inputs.
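The partitioning in Example 1 can be found automatically from the nonzero pattern of $\mathbf{R}$: each connected component of the graph whose edges are the nonzero off-diagonal entries can be realized as an independent RNN. The sketch below assumes a hypothetical MIMO-OFDM matrix with 3 transmit antennas and 8 subcarriers, ordered antenna-major (index = antenna · 8 + subcarrier), in which only equal subcarriers interfere. The function name and the coupling values are illustrative.

```python
from collections import deque
import numpy as np

def independent_rnns(R, tol=0.0):
    """Split the neurons into groups that can be realized as independent RNNs.

    Two neurons are connected if the corresponding off-diagonal entry of R is
    nonzero (|r| > tol); each connected component of this graph is one RNN.
    Also returns the maximum number of feedback inputs any neuron needs.
    """
    N = R.shape[0]
    adj = np.abs(R) > tol
    np.fill_diagonal(adj, False)       # W = I - R_d^{-1} R has a zero diagonal anyway
    seen = np.zeros(N, dtype=bool)
    groups = []
    for start in range(N):
        if seen[start]:
            continue
        seen[start] = True
        queue, group = deque([start]), []
        while queue:                   # breadth-first search over the interference graph
            j = queue.popleft()
            group.append(int(j))
            for k in np.flatnonzero(adj[j] | adj[:, j]):
                if not seen[k]:
                    seen[k] = True
                    queue.append(k)
        groups.append(sorted(group))
    max_feedback = int(adj.sum(axis=1).max())
    return groups, max_feedback

# Hypothetical MIMO-OFDM block transmit matrix: crosstalk only between equal subcarriers
C = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])        # antenna coupling, assumed identical on every subcarrier
R = np.kron(C, np.eye(8))              # 24 x 24, index = antenna * 8 + subcarrier
groups, max_feedback = independent_rnns(R)
```

For this matrix the function returns eight groups of three neurons and two feedback inputs per neuron, matching Example 1.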

C. Circuit design
We review here the main features of the analog circuit design of an RNN as vector equalizer working with the BPSK symbol alphabet and composed of four neurons. A detailed explanation can be found in [3]. The RNN is realized in IHP 0.25 μm SiGe BiCMOS technology (SG25H3). A simplified schematic of a neuron is shown in Figure 10. Schematics of the gray boxes are presented in Figure 11.
The dynamical behavior of the circuit in Figures 10 and 11 is described as [3] ϒ which is equivalent to Eq. (5). τ = R · C is the time constant of the circuit. R is shown in Figure 10 and C is a fictitious capacitance between the nodes and u þ j and u − j .V t is the thermal voltage and I t is the tail current in Figure 11. The circuit is fully differential and the differential currents and voltages are denoted as, cf. Figures 10 and 11: (1) Performance: Simulation results based on the above described analog RNN are shown in Figure 12. The interference is described by the channel matrix R test . The black dashed line shows the bit error rate (BER) for a BPSK symbol alphabet in an AWGN channel (an interference-free channel). Performance achieved by the maximum likelihood Figure 10. A simplified schematic of a single neuron as a part of a (four neurons) RNN analog vector equalizer. u ′ j is the inner state, e ′ j is the external input and G ji is used for adapting the weight coefficient w ji from the output of the i-th neuron to the input of the j-th neuron. The circuit is fully differential [3]. Figure 11. Details of the circuit building blocks. Gilbert cell used as a four-quadrant analog multiplier, buffer stages, BJT differential pairs for the generation of the hyperbolic tangent function and a metal-oxide-semiconductor field-effect transistor (MOSFET) switch used as a sequencer [3]. algorithm in Eq. (4) is included as a solid black line. The performance of the analog RNN vector equalizer 6 is presented in a solid red line with square markers. Compared to the optimum algorithm, the signal-to-noise ratio (SNR) loss for the analog RNN vector equalizer can be quantified in approximately 1.7 dB at a BER of 10 −4 . This loss in SNR emphasizes the suboptimality of the RNN as vector equalizer and depends on the channel matrix. Figure 13 shows an example of a transient simulation for the analog RNN vector equalizer. The time constant is approximately τ = 40 ps. 
The SNR ratio is set to 2 dB and a series of three receive vectors are equalized in sequence. Because of the channel matrix and noise, the sampled vectors at the input of the equalizerx present different signs and values, compared to the sent   vectors x (shown in square brackets). The equalization of each receive vector lasts 10 Á τ. First half of this interval (evolution time) is used to reach a stable state, while the second half of the interval (reset time) is used to return to a predefined inner state (all-zero state) before the equalization of a new vector starts. At the end of the evolution time, a decision is made based on the sign of the output vector (the decision function DECI for BPSK is a sign function). In our example, a comparison between the sent and the recovered bits shows an error of one bit out of twelve, equivalent to a BER≈ 1 12 , a result in line with the BER shown in Figure 12.
Remark 5. The evolution and reset times are the two factors limiting the maximum throughput of the analog RNN vector equalizer. However, they cannot be reduced arbitrarily, since the RNN needs a minimum evolution time to reach an equilibrium point, which represents a local minimum of the Lyapunov function, i.e., a local minimum of Eq. (4).
(2) Energy efficiency: The energy efficiency of a hardware architecture is the ratio between its power consumption (in watts) and what it achieves in a given time period. In our case, the achievement is the throughput of the equalizer, so the ratio is an energy per equalized bit, and lower values are better. Combining the value of $\tau$ with the power consumption, the analog vector equalizer described above is expected to outperform conventional digital signal processing, with an energy efficiency three to four orders of magnitude better [3].
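Under the evolution/reset scheme of the transient example, throughput and energy per bit follow from simple arithmetic. The sketch assumes one BPSK bit per neuron and a $5\tau$ evolution plus a $5\tau$ reset per vector. It uses a hypothetical power value, since the measured power consumption of [3] is not reproduced in this section.

```python
def throughput_bps(bits_per_vector, tau, evolution_factor=5.0, reset_factor=5.0):
    """Equalized bits per second when each vector occupies (evolution + reset) time."""
    return bits_per_vector / ((evolution_factor + reset_factor) * tau)


def energy_per_bit(power_w, throughput):
    """Energy efficiency as power divided by throughput, in joules per bit."""
    return power_w / throughput


# Four BPSK neurons (4 bits per vector) and tau = 40 ps, as in the transient example.
rate = throughput_bps(bits_per_vector=4, tau=40e-12)
print(f"throughput: {rate / 1e9:.1f} Gbit/s")  # 4 bits / 400 ps -> 10.0 Gbit/s

# power_w is a hypothetical placeholder, not a value reported in [3].
print(f"energy per bit: {energy_per_bit(power_w=0.01, throughput=rate) * 1e12:.2f} pJ")
```

Halving $\tau$ (or the evolution/reset factors) doubles the throughput and halves the energy per bit at constant power. This is why Remark 5 matters for energy efficiency.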

Channel coding
Channel coding (encoding COD at the transmitter side and decoding DEC at the receiver side) aims to enable error-free transmission over noisy channels at the maximum possible transmit rate. This is done by adding redundancy (extra bits) at the transmitter side, i.e., the bijective map from $q$ to $q_c$ (Figure 14). The codewords $q_c$ then remain sufficiently distinguishable at the receiver side even if the noisy channel corrupts some bits during transmission. Figure 14 shows a coded transmission over an AWGN channel.
For every received codeword, optimum (maximum likelihood) decoding needs to calculate the distance between the received word and all possible codewords $\mathcal{C}$. Since the number of codewords grows as $2^k$, this is infeasible for realistic cases (except for convolutional codes, which are not considered here). We focus on binary LDPC codes and their suboptimum decoding algorithm, belief propagation, assuming a BPSK symbol alphabet. LDPC codes [37] belong to the class of binary linear block codes. They have been shown to achieve error rates very close to the Shannon limit (the theoretical performance bound) for the AWGN channel, and they have been implemented in many practical systems such as the second-generation satellite digital video broadcasting standard (DVB-S2) [38]. A binary linear block code is characterized by a binary parity check matrix $H$ of size $(n-k) \times n$ for $n > k$.

Figure 14. Coded transmission over an AWGN channel.
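The exhaustive search that makes maximum likelihood decoding infeasible can be illustrated on a small code. The sketch below uses one common parity check matrix of the (7,4) Hamming code, which may differ from Eq. (17). It enumerates the codebook as the null space of $H$ over GF(2). It also assumes the BPSK mapping $0 \mapsto +1$, $1 \mapsto -1$, for which ML decoding over AWGN reduces to choosing the codeword nearest in Euclidean distance.

```python
import itertools

import numpy as np

# One common parity check matrix of the (7,4) Hamming code; it may differ from Eq. (17).
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])


def codebook(H):
    """All binary words c with H c = 0 (mod 2); there are 2**k of them."""
    n = H.shape[1]
    words = [np.array(w) for w in itertools.product([0, 1], repeat=n)]
    return [w for w in words if not np.any(H @ w % 2)]


def ml_decode(x_tilde, C):
    """ML decoding for BPSK over AWGN: the codeword nearest in Euclidean distance."""
    # Assumed BPSK mapping: bit 0 -> +1, bit 1 -> -1.
    distances = [np.sum((x_tilde - (1 - 2 * c)) ** 2) for c in C]
    return C[int(np.argmin(distances))]


C = codebook(H)
print(len(C))  # 16 = 2**4 codewords; the search cost grows as 2**k
sent = C[5]
x_tilde = (1 - 2 * sent) + np.array([0.2, -0.9, 0.1, 0.3, -0.2, 0.4, 0.1])
print(ml_decode(x_tilde, C), sent)
```

For an LDPC code with $k$ in the thousands, the list `C` alone would have $2^k$ entries. This motivates suboptimum decoders such as belief propagation.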

Continuous-time single-layer high-order recurrent neural networks for belief propagation
One of the main drawbacks of RNNs is that their Lyapunov function is quadratic [39]. Optimization problems with cost functions of higher degree therefore cannot be solved "satisfactorily" by RNNs. Increasing the order of the Lyapunov function leads to a nonlinear feedback in the network. In doing so, we obtain the single-layer high-order recurrent neural network, which appears under different names in the literature depending on the form of the nonlinear feedback [39][40][41][42].
Remark 6. In the literature, high-order recurrent neural networks are exclusively real-valued.

Figure 15 shows the continuous-time single-layer high-order recurrent neural network, abbreviated in the following as HORNN. (In this chapter, the abbreviation HORNN implicitly includes the continuous-time, single-layer and real-valued properties.)

The dynamical behavior is given by Eq. (16). The parameters in Eq. (16) can be linked to Figure 15 in the same way as Eq. (5) is linked to Figure 7. $f(v)$ is a real-valued, continuously differentiable vector function with $f(0_n) = 0_n$. Note that the term "high-order" here refers to the interconnections between the neurons rather than to the order of the differential equation describing the dynamics. As for RNNs, the latter is still of first order, cf. Eq. (16).

Remark 7. In the special case $f(v) = v$, the HORNN reduces to the (real-valued) RNN.

Before HORNNs can be applied to optimization tasks, their stability has to be investigated, since without this property the behavior of a dynamical system is often questionable [39]. This has been the topic of many publications [39][40][41][42]. The proofs of local asymptotic stability of the HORNN based on Lyapunov functions share the following conditions:
• $\phi(\cdot)$ is a continuously differentiable and strictly increasing function.
• The right-hand side of the first line of Eq. (16) can be rewritten as the gradient of a scalar function.

A. Belief propagation based on HORNN
Originally proposed by Gallager [37], belief propagation is a suboptimum graph-based decoding algorithm for LDPC codes. The corresponding graph is bipartite ($n$ variable nodes and $n-k$ check nodes) and is known as the Tanner graph [43]. This is shown in Figure 16 for the Hamming code with $n = 7$, $k = 4$ and the parity check matrix $H_{\mathrm{Hamming}}$ given in Eq. (17). The belief propagation algorithm iteratively exchanges "messages" between variable and check nodes. For every binary linear block code characterized by a binary parity check matrix $H$ of size $(n-k) \times n$ for $n > k$, three binary matrices $P_{(n_h \times n_h)}$, $S_{(n_h \times n_h)}$ and $B_{(n_h \times n)}$ can be uniquely defined [44,45]. With these matrices, Eq. (16) and Figure 15 describe belief propagation with the parameters given in Eq. (18). In Eq. (18):
• $k$ is the length of the information word ($q$ in Figures 1 and 14).
• $n$ is the length of the codeword ($q_c$ in Figures 1 and 14).
• $n_h = \sum_{i=1}^{n-k} \sum_{j=1}^{n} H_{i,j}$ is the number of nonzero elements in $H$.
• $L_{\mathrm{ch},(n \times 1)}$ is the vector of intrinsic log-likelihood ratios (LLRs), which depends on the transition probability of the channel. For $q_{c,j}$ (the $j$-th element of $q_c$ for $j \in \{1, 2, \dots, n\}$), it is given as
$$L_{\mathrm{ch},j} = \ln \frac{p(\breve{x}_j = \tilde{x}_j \mid q_{c,j} = 0)}{p(\breve{x}_j = \tilde{x}_j \mid q_{c,j} = 1)},$$
where $\breve{x}_j$ is the variable of the conditional probability density function $p(\breve{x}_j \mid q_{c,j})$ and $\ln(\cdot)$ is the natural logarithm. For an AWGN channel with noise $\mathcal{N}(0, \sigma_n^2)$ and the BPSK mapping $0 \mapsto +1$, $1 \mapsto -1$, this gives $L_{\mathrm{ch},j} = 2\tilde{x}_j / \sigma_n^2$.
• $L_{(n_h \times 1)}$ is the "message" sent from the variable nodes to the check nodes.
• $f_{(n_h \times 1)}$ is the "message" sent from the check nodes to the variable nodes.
• $I_{(n_h \times n_h)}$ is the identity matrix of size $n_h \times n_h$.

The dynamical behavior of belief propagation can then be described by Eq. (20), based on Eqs. (16), (18) and Figures 14, 15 and 16 [45], where $L(t)$ is the soft output of the decoding algorithm, cf. Figures 5 and 14. The corresponding discrete-time description, Eq. (21), is given in [44].
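The message exchange described above can be made concrete with a discrete-time sum-product sketch. It keeps one message per nonzero element of $H$ ($n_h$ messages in each direction) and computes the intrinsic LLRs as $L_{\mathrm{ch},j} = 2\tilde{x}_j/\sigma_n^2$, assuming the BPSK mapping $0 \mapsto +1$, $1 \mapsto -1$. Messages are updated with a flooding schedule. The (7,4) Hamming matrix is one common choice and may differ from Eq. (17). This is a textbook sum-product decoder, not the $P$, $S$, $B$ formulation of [44,45].

```python
import numpy as np

# One common parity check matrix of the (7,4) Hamming code (may differ from Eq. (17)).
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])


def channel_llr(x_tilde, sigma2_n):
    """Intrinsic LLRs for BPSK (0 -> +1, 1 -> -1) over AWGN: L_ch = 2 x / sigma_n^2."""
    return 2.0 * np.asarray(x_tilde, dtype=float) / sigma2_n


def belief_propagation(H, L_ch, iterations=10):
    """Flooding-schedule sum-product decoding with one message per nonzero entry of H."""
    checks, variables = np.nonzero(H)      # edge e joins check checks[e] and variable variables[e]
    n_h = len(checks)                      # number of nonzero elements of H
    L_vc = L_ch[variables].copy()          # variable-to-check messages, initialized with L_ch
    total = L_ch.copy()                    # soft output (a posteriori LLRs)
    for _ in range(iterations):
        # Check-node update (tanh rule) over all other edges of the same check node.
        t = np.tanh(L_vc / 2.0)
        L_cv = np.empty(n_h)
        for e in range(n_h):
            others = (checks == checks[e]) & (np.arange(n_h) != e)
            L_cv[e] = 2.0 * np.arctanh(np.clip(np.prod(t[others]), -0.999999, 0.999999))
        # Variable-node update: intrinsic LLR plus all other incoming check messages.
        total = L_ch + np.bincount(variables, weights=L_cv, minlength=H.shape[1])
        L_vc = total[variables] - L_cv
    return total


# All-zero codeword sent (all symbols +1); the third sample is received with the wrong sign.
x_tilde = np.array([0.8, 1.1, -0.2, 0.9, 1.3, 0.7, 1.0])
L = belief_propagation(H, channel_llr(x_tilde, sigma2_n=0.5))
bits = (L < 0).astype(int)                 # hard decision on the soft output
print(bits)
```

The wrongly received third bit is corrected because the two check nodes it belongs to send strong positive messages. In the continuous-time version, the same update rules appear as the feedback of a HORNN.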

B. Dynamical behavior of belief propagation
In a series of papers, Hemati et al. [12,17,46–49] also modeled the dynamics of analog belief propagation as a set of first-order nonlinear differential equations, cf. Eq. (20). This was motivated by circuit design. $\Upsilon_d$ (and likewise $\Upsilon_e$) can be seen as a bandwidth limitation of the analog circuit, realized by taking advantage of the low-pass filter behavior of transmission lines (Figure 17). We have shown in [45] that the model in Figure 17 also has important dynamical properties when compared with the discrete-time belief propagation of Eq. (21) [44]. In particular, the equilibrium points of the continuous-time belief propagation of Eq. (20) coincide with the fixed points of the discrete-time belief propagation of Eq. (21), as proved in [45]. The absolute stability of belief propagation, Eqs. (20) and (21), was proven for repetition codes (one of the simplest binary linear block codes) in [44,45]. Beyond repetition codes, iterative decoding algorithms (belief propagation being one of them) have been observed to exhibit, depending on the SNR, a wide range of phenomena associated with nonlinear dynamical systems. Examples include multiple fixed points, oscillatory behavior, bifurcation, chaos and transient chaos [50]. Equilibrium points are reached at "relatively" high SNR. The analysis in [50] is limited to the discrete-time case.
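The coincidence of equilibrium points and fixed points can be checked numerically. Any $L$ with $L = F(L)$ makes the right-hand side of $\tau \dot{L} = -L + F(L)$ vanish, and vice versa. The sketch below assumes this simple form with a single time constant $\tau$, where $F$ is one full sum-product update; the actual Eqs. (20) and (21) in [45] and [44] may differ in detail. It uses a (3,1) repetition code, whose Tanner graph is cycle-free, and hypothetical channel LLRs.

```python
import numpy as np

H = np.array([[1, 1, 0],
              [0, 1, 1]])                 # (3,1) repetition code, cycle-free Tanner graph
checks, variables = np.nonzero(H)        # one message (neuron) per nonzero entry of H
n_h = len(checks)


def bp_step(L_vc, L_ch):
    """F(L): one full message update. Returns new variable-to-check messages and the soft output."""
    t = np.tanh(L_vc / 2.0)
    L_cv = np.empty(n_h)
    for e in range(n_h):
        others = (checks == checks[e]) & (np.arange(n_h) != e)
        L_cv[e] = 2.0 * np.arctanh(np.clip(np.prod(t[others]), -0.999999, 0.999999))
    soft = L_ch + np.bincount(variables, weights=L_cv, minlength=H.shape[1])
    return soft[variables] - L_cv, soft


L_ch = np.array([1.2, -0.4, 0.9])        # hypothetical intrinsic LLRs

# Discrete time: L[l+1] = F(L[l]).
L_dt = np.zeros(n_h)
for _ in range(50):
    L_dt = bp_step(L_dt, L_ch)[0]

# Continuous time (forward Euler): tau * dL/dt = -L + F(L).
tau, dt = 1.0, 0.01
L_ct = np.zeros(n_h)
for _ in range(5000):
    L_ct = L_ct + dt / tau * (-L_ct + bp_step(L_ct, L_ch)[0])

print(np.round(L_dt, 6), np.round(L_ct, 6))
print(bp_step(L_dt, L_ch)[1])            # soft output, 1.2 - 0.4 + 0.9 = 1.7 per bit up to rounding
```

Both trajectories end at the same point. The soft output equals the sum of all intrinsic LLRs, which is the exact a posteriori LLR for a repetition code.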

C. Analog hardware implementation aspects: decoding
Many analog hardware implementation aspects have already been mentioned in Section 4-B.
Here we mention only the additional aspects that relate exclusively to analog belief propagation based on HORNN.
Structure: In practice, different coding schemes (different parity check matrices $H$) with various $(k, n)$ constellations are applied to adapt the code rate $r_c = k/n$ to the channel state. The HORNN in Figure 15 can act as a continuous-time belief propagation decoder as long as the number of neurons $n$ in Figure 15 is equal to or larger than the maximum number of nonzero elements over all parity check matrices and all $(k, n)$ constellations, i.e., $n \ge \max_H n_h$.
Adaptivity: No training is needed, since $W_0$ and $W$ are directly related to the parity check matrix $H$. In contrast to the analog RNN vector equalizer, the weight coefficients are binary, i.e., each entry of the weight matrices $W_0$ and $W$ defines a feedback that either exists or does not. In this case, $R_{jj'}, R_{j0} \in \{R_j, \infty\}$ in Figure 15, where $\infty$ corresponds to an open connection. Moreover, there is no need for high-resolution DACs for the weight coefficients.
Interneuron connectivity: Full connectivity is not needed, since the matrix $P$ is sparse for LDPC codes. The number of connections per neuron must equal the maximum row-wise number of nonzero elements in $P$ over all considered coding schemes, i.e., the largest entry of $P \cdot \mathbf{1}_{(n_h \times 1)}$, maximized over $H$. If this is fulfilled and interneuron connectivity control is available, the structure in Figure 15 is valid for all considered coding schemes.

Remark 9. For a specific coding scheme, the interneuron connectivity can be fixed. The resulting HORNN structure is then also valid for all codeword lengths obtained by puncturing the original code.
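The sizing rules for structure and connectivity can be evaluated for a set of codes. The sketch uses a hypothetical construction of $P$, $S$ and $B$ that is consistent with the description above, with one edge per nonzero entry of $H$. In it, $P$ links edges sharing a variable node, $S$ links edges sharing a check node, and $B$ attaches edges to variable nodes. The authoritative definitions are in [44,45]. The two example matrices are illustrative only.

```python
import numpy as np


def edge_matrices(H):
    """Hypothetical construction of P, S, B from H (one edge per nonzero entry of H).

    P[e, e'] = 1 if edges e != e' share a variable node,
    S[e, e'] = 1 if edges e != e' share a check node,
    B[e, j]  = 1 if edge e is attached to variable node j.
    """
    checks, variables = np.nonzero(H)
    same_var = variables[:, None] == variables[None, :]
    same_chk = checks[:, None] == checks[None, :]
    off_diag = ~np.eye(len(checks), dtype=bool)
    P = (same_var & off_diag).astype(int)
    S = (same_chk & off_diag).astype(int)
    B = (variables[:, None] == np.arange(H.shape[1])[None, :]).astype(int)
    return P, S, B


def hardware_requirements(parity_check_matrices):
    """Neurons (max n_h) and connections per neuron (max row sum of P) over all codes."""
    neurons, connections = 0, 0
    for H in parity_check_matrices:
        P, _, _ = edge_matrices(H)
        neurons = max(neurons, int(H.sum()))                       # n_h = nonzeros of H
        connections = max(connections, int(P.sum(axis=1).max()))   # row-wise nonzeros of P
    return neurons, connections


H_hamming = np.array([[1, 1, 0, 1, 1, 0, 0],
                      [1, 0, 1, 1, 0, 1, 0],
                      [0, 1, 1, 1, 0, 0, 1]])
H_rep = np.array([[1, 1, 0],
                  [0, 1, 1]])
print(hardware_requirements([H_hamming, H_rep]))  # (12, 2)
```

Under this construction, the row sums of $P$ are the variable-node degrees minus one. They stay small for LDPC codes even when $n_h$ is large, which reflects the sparsity argument above.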
Remark 10. Both the interneuron connectivity and the weight adaptation play a significant role in equalization as well as in decoding. It can safely be said that they represent the major challenge of the circuit, since the analog circuit must be able to perform equalization and decoding for a given number of possible combinations of block size, symbol alphabet, coding scheme, etc. For decoding in particular, the advantage of non-full connectivity is counterbalanced by a double (and very complex) (de)multiplexing of the signals, once for the vector function $f$ and once for the interneuron connectivity.

Joint equalization and decoding
Turbo equalization is a joint iterative equalization and decoding scheme. In this case, a symbol-by-symbol maximum a posteriori probability (s/s MAP) equalizer iteratively exchanges reliability values $L$ with an s/s MAP decoder [51,52]. This concept is inspired by the decoding of turbo codes, where two s/s MAP decoders iteratively exchange reliability values [53]. Despite its good performance, the main drawback of the turbo equalizer is the very high complexity of the s/s MAP equalizer for multipath channels with long impulse responses (compared with the symbol duration) and/or symbol alphabets with large cardinality. Therefore, the s/s MAP equalizer (and decoder) are usually replaced by suboptimum ones (Figure 18).

Figure 18. Two examples of joint equalization and decoding. Notice the feedback from the decoder to the equalizer, i.e., the turbo principle.

One discrete-time joint equalization and decoding approach has been introduced in [52] and is shown in Figure 19. $\tilde{x}$, $R_d$ and $R$ are as in Eq. (2), and $z^{-1}$ is a delay unit. There are two different (iteration) loops in Figure 19: the equalization loop (blue) on a symbol basis (in the sense of $\psi$) and the decoding loop (dashed) on a bit basis. The decoding loop is active at every $a\rho$-th equalization iteration, with $a \in \{1, 2, 3, \dots\}$ and $\rho \in \mathbb{N}$, i.e., after each $\rho$ equalization loops, one decoding loop is performed. The conversion from symbol basis to bit basis ($u$ to $L_{\mathrm{ch}}$) is performed by $\theta_{S/L}(\cdot)$, and the reverse conversion ($L$ to $x$) by $\theta_{L/S}(\cdot)$. The expressions for $\theta_{S/L}(\cdot)$ and $\theta_{L/S}(\cdot)$ can be found in [52]. For BPSK, they are given in Eq. (24), where $\sigma^2$ is given in Eq. (11). If we consider only the equalization loop in Figure 19, it describes exactly the dynamical behavior of discrete-time recurrent neural networks [19,25,33,54–56], cf. Eq. (25).

Remark 11. If $\theta_{L/S}\left(\theta_{S/L}(u)\right) = \theta^{(\mathrm{opt})}(u)$, Eqs. (8) and (25) share the same equilibrium/fixed points.
For BPSK, it can easily be shown based on Eqs. (12) and (24) that this condition is fulfilled.
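For BPSK, the condition of Remark 11 can be checked numerically. The sketch assumes the standard forms $\theta_{S/L}(u) = 2u/\sigma^2$, $\theta_{L/S}(L) = \tanh(L/2)$ and $\theta^{(\mathrm{opt})}(u) = \tanh(u/\sigma^2)$. These forms are assumptions for illustration and should be checked against Eqs. (12) and (24); the value of `sigma2` is arbitrary.

```python
import numpy as np


def theta_s_to_l(u, sigma2):
    """Assumed BPSK symbol-to-LLR conversion: L_ch = 2 u / sigma^2."""
    return 2.0 * u / sigma2


def theta_l_to_s(L):
    """Assumed BPSK LLR-to-soft-symbol conversion: E{x} = tanh(L / 2)."""
    return np.tanh(L / 2.0)


def theta_opt(u, sigma2):
    """Assumed optimum BPSK activation function of the RNN vector equalizer."""
    return np.tanh(u / sigma2)


u = np.linspace(-3.0, 3.0, 13)
sigma2 = 0.7
composed = theta_l_to_s(theta_s_to_l(u, sigma2))
print(np.max(np.abs(composed - theta_opt(u, sigma2))))  # zero up to rounding
```

Splitting $\theta^{(\mathrm{opt})}$ into these two functions exposes the LLR domain between them. The decoder output can then be added there before converting back to soft symbols.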

Continuous-time joint equalization and decoding
Motivated by the expected improvement in energy efficiency of an analog implementation over the conventional digital one, in this section we map the joint equalization and decoding structure of Figure 19 to a continuous-time framework. The s/s MAP DEC in Figure 19 is replaced by a suboptimum decoding algorithm, belief propagation. Moreover, the equalization and decoding loops in Figure 19 are replaced by the RNN and the HORNN discussed in Sections 4-A and 6-A, respectively. The introduced structure serves as a computational model for an analog hardware implementation and does not need any training. Figure 20 shows this novel continuous-time joint equalization and decoding structure based on recurrent neural networks. The dynamical behavior of the whole system is described by the differential equations (26a) to (26c), where:
• Eq. (26c) and Figure 20(b) describe the continuous-time belief propagation, cf. Eqs. (16), (18), (20).

Figure 19. Joint equalization and decoding as described in [52]. $L_{\mathrm{ext}}$ represents the "knowledge" obtained by exploiting the redundancy of the code.
Comparing Figure 20 with Figures 7 and 15, we notice that:
• The function $\phi(\cdot)$ in Figure 7 (the optimum activation function $\theta^{(\mathrm{opt})}(\cdot)$) has been split into the two functions $\theta_{S/L}(\cdot)$ and $\theta_{L/S}(\cdot)$ in Figure 20.

A. Special cases
The novel structure in Figure 20 is general and remains valid for the following cases:
• Separate equalization and decoding: In this case, Figure 20(a) is modified such that no feedback from the decoder to the equalizer is applied, as shown in Figure 21(a). Only at the end of the separate equalization and decoding process is the output given as $L = L_{\mathrm{ext}} + L_{\mathrm{ch}}$. We distinguish between two cases:
1. Equalization and decoding take place separately but at the same time.
2. Successive equalization and decoding: only after the equalization process has ended are the $L_{\mathrm{ch}}$ forwarded to the decoder, which then starts its evolution. We focus on this case.
• Uncoded transmission over an "interference-causing" channel: In this case, $P = 0$, $B = 0$ and Eq. (26c) becomes $L = 0_{n_h}$. Under these conditions, Eq. (26a) reduces to Eqs. (5), (7), (8) (notice, however, Remark 11).

B. Throughput, asynchronicity and scheduling
The diagonal elements of $\Upsilon_d$ define the duration of the transient response the HORNN needs to converge (in case of convergence). The larger they are, the longer the transient response and, consequently, the lower the decoding throughput. The same holds for $\Upsilon_e$. Based on our analog RNN vector equalizer, the diagonal elements of $\Upsilon_e$ are in the range of a few tens of picoseconds.
Unequal diagonal elements in $\Upsilon_e$ (and $\Upsilon_d$) represent a kind of continuous-time asynchronicity [46]. Asynchronicity in discrete-time RNNs is desirable, since it helps avoid limit cycles, which can occur in synchronous discrete-time RNNs [54,57].
Assuming $\Upsilon_d = \tau_d \cdot I$ and $\Upsilon_e = \tau_e \cdot I$, choosing the ratio $\tau_e/\tau_d$ is comparable to the scheduling problem in discrete-time joint equalization and decoding. There, the question is how many iterations $\rho$ within the equalizer should be performed before a decoding step takes place, cf. Figure 19. This is usually optimized by simulation and is case dependent.
From a dynamical point of view, the case $\tau_e/\tau_d \ll 1$ (or $\tau_d/\tau_e \ll 1$) can be seen as a singular perturbation (in time). In this case, one part of Figure 20 can be seen as "frozen" compared with the other part.
Remark 12. The parameters of the transmission model (block transmit matrix, symbol alphabet, block size, channel coding scheme) are used to define the parameters of the continuous-time recurrent neural network structure in Figure 20, such that no training is needed. In practice, this is a big advantage, especially for analog hardware. However, to enable different coding schemes and symbol alphabets, either full connectivity or vector and interneuron connectivity control is needed. Both options are challenging from a hardware implementation point of view.
Remark 13. For ease of depiction, Figures 20 and 21 assume that one transmitted block contains exactly one codeword. This is not necessarily the case in practice. For example, if one transmitted block contains two codewords, one RNN and two parallel HORNNs are needed. Conversely, if one codeword spans two transmitted blocks, two parallel RNNs and one HORNN are needed.
• Multipath channels [60]:
- Proakis-a, denoted in the following by its channel impulse response $h_a$, which causes small interference.
- Proakis-b, denoted in the following by its channel impulse response $h_b$, which causes moderate interference.
The block transmission matrix $R$ is a banded Toeplitz matrix built from the autocorrelation function of the channel impulse response [61].

The following cases are considered:
• Uncoded transmission over an AWGN channel. The bit error rate can be obtained analytically as $P_b = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{E_b/N_0}\right)$ [62], where $\mathrm{erfc}(\cdot)$ is the complementary error function, $E_b$ is the energy per bit and $N_0$ is the one-sided noise power spectral density.
• Coded transmission over an AWGN channel with continuous-time decoding at the receiver (HORNN belief propagation).
• Coded transmission over the above-mentioned multipath channels. We distinguish between joint equalization and decoding (Figure 20) and separate equalization and decoding (Figure 21). In the latter case, equalization is performed first, followed by decoding.
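The construction of $R$ from the channel impulse response can be sketched as follows, assuming real-valued taps. The Proakis-b taps below are the values commonly quoted in the literature and should be checked against [60]. The block size is arbitrary.

```python
import numpy as np


def block_transmission_matrix(h, block_size):
    """Banded Toeplitz matrix built from the autocorrelation of a real impulse response h."""
    h = np.asarray(h, dtype=float)
    r = np.correlate(h, h, mode="full")    # autocorrelation at lags -(len(h)-1) .. len(h)-1
    center = len(h) - 1                    # index of lag 0
    R = np.zeros((block_size, block_size))
    for i in range(block_size):
        for j in range(block_size):
            lag = abs(i - j)
            if lag <= center:              # outside the band, R stays zero
                R[i, j] = r[center + lag]  # real h: r[lag] == r[-lag]
    return R


h_b = [0.407, 0.815, 0.407]   # Proakis-b taps as commonly quoted; check against [60]
R = block_transmission_matrix(h_b, block_size=6)
print(np.round(R[:3, :4], 4))
```

The bandwidth of $R$ equals the channel memory (here two off-diagonals). The ratio between off-diagonal and diagonal entries gives a rough feel for why $h_b$ causes more interference than a channel with one dominant tap.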
The evolution time of the whole system is $20\tau$ in all cases, i.e., all simulated scenarios deliver the same throughput. For separate equalization and decoding, the equalization and the decoding each evolve for $10\tau$. The simulation results are shown in Figure 22. We notice the following:
• Joint equalization and decoding outperforms separate equalization and decoding, a fact known from the discrete-time case. Our proposed model in Figure 20 carries this advantage over to the continuous-time case.
• For the channel h a , the BER (for continuous-time joint equalization and decoding) is close to the coded BER curve.
• For the channel h b , there exists a gap between the obtained results and the coded AWGN curve. This was expected, since h b represents a more severe multipath channel compared with h a .
• If only the equalization performance is considered, we compare the "Uncoded & EQ" curves with the "Uncoded BER" curves. In Figure 22(a), the vector equalizer based on continuous-time recurrent neural networks already removes all interference caused by the multipath channel $h_a$. In Figure 22(b), by contrast, the "Uncoded & EQ" curve approaches an error floor.
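The "Uncoded BER" reference curves can be reproduced from the closed-form expression $P_b = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{E_b/N_0}\right)$ for uncoded BPSK over AWGN. The sketch assumes that the SNR axis of Figure 22 is $E_b/N_0$ in dB, which is not stated in this section.

```python
import math


def ber_bpsk_awgn(ebn0_db):
    """Uncoded BPSK over AWGN: P_b = 0.5 * erfc(sqrt(Eb/N0)), with Eb/N0 given in dB."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebn0))


for ebn0_db in (0, 2, 4, 6, 8, 10):
    print(f"Eb/N0 = {ebn0_db:2d} dB -> BER = {ber_bpsk_awgn(ebn0_db):.3e}")
```

An "Uncoded & EQ" curve that lies on top of this reference indicates that the equalizer removes all interference. A curve that flattens out above it corresponds to the error floor seen for $h_b$.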

Remark 14. Interleaving and anti-Gray mapping, often encountered in the context of iterative equalization and decoding, can easily be integrated into the proposed model in Figure 20. Anti-Gray mapping influences the functions $\theta_{S/L}(\cdot)$ and $\theta_{L/S}(\cdot)$, whereas interleaving affects the matrix $B$.

Conclusion
Joint equalization and decoding is a detection technique with the potential to improve the bit error rate of a transmission at the cost of additional computational complexity at the receiver. So far, joint equalization and decoding has been considered only for the discrete-time case. However, for high data rates, the energy consumption of a digital implementation becomes a limiting factor and shortens battery lifetime. The need for better energy efficiency revives the option of implementing joint equalization and decoding algorithms in analog, particularly taking advantage of the nonlinearity of these algorithms.
Continuous-time recurrent neural networks serve as promising computational models for analog hardware implementation. They stand out due to their Lyapunov stability (the proven existence of attracting equilibrium points under specific conditions) and their special suitability for analog VLSI. They have often been applied to solve optimization problems without the need for training. Avoiding training is particularly favorable for analog hardware implementation.
In this chapter, we introduced a novel continuous-time recurrent neural network structure capable of performing continuous-time joint equalization and decoding. This structure is based on continuous-time recurrent neural networks for equalization and continuous-time high-order recurrent neural networks for belief propagation, a well-known decoding algorithm for low-density parity-check codes. In both cases, the behavior of the underlying dynamical system has been addressed; Lyapunov stability and simulated annealing are examples. The parameters of the transmission system (channel matrix, symbol alphabet, block size, channel coding scheme) are used to define the parameters of the proposed recurrent neural network, such that no training is needed.
Simulation results showed that the superiority of joint equalization and decoding is preserved when it is carried out in analog according to our proposed model. Compared with a digital implementation, an analog one is expected to improve the energy efficiency and to consume less chip area. We confirmed this for the analog hardware implementation of the equalization part: the analog vector equalizer achieves an energy efficiency of a few picojoules per equalized bit, which is three to four orders of magnitude better than its digital counterparts. Additionally, analog hardware implementation aspects have been discussed. As an example, we showed the importance of the interneuron connectivity. In particular, we pointed out the challenges posed either by the hardware implementation of a massively distributed network or by the routing of signals using (de)multiplexers.