In this chapter, a hardware processing architecture for a real-time echo state network based on a field-programmable gate array (FPGA) is proposed, which solves the problem that the output weight of the network is difficult to obtain in real time. The design strictly follows reservoir computing (RC) theory, and its five components are established in the FPGA: input module, reservoir module, output module, training module, and system switch module. The architecture is implemented in an Altera FPGA chip and verified through pattern recognition, waveform generation, and multiple-input multiple-output (MIMO) channel prediction. Experimental results show that, after training, the hardware-implemented real-time echo state network can identify the duty cycle of different input signals, generate floating-point waveforms, and predict the MIMO channel. The proposed network offers fast computation, low resource consumption, and good performance on simple tasks.
- pattern recognition
- waveform generation
- MIMO channel prediction
The echo state network simplifies the training task into a linear regression task. It mainly addresses the large resource consumption, long running time, and slow convergence of recurrent neural network (RNN) training. There are many studies on applications of the echo state network, such as wind power ramp time prediction [2, 3], medical image recognition and classification, and water flow prediction. There are also many studies on the structure of the echo state network, such as dynamic reservoirs with increased stochastic properties or delay characteristics, the replacement of the traditional error function with correlation entropy, and changes to the calculation model [9, 10]. There is less research on hardware platform implementations of neural networks. One study proposed a software framework for simulating RNN circuits, and [12, 13] proposed FPGA/software frameworks; however, these frameworks are always trained in software such as MATLAB, so they cannot strictly be called hardware implementations. The FPGA-based real-time echo state network structure proposed in this chapter trains the output weights on the FPGA platform itself, without calculating the relevant parameters in software. To verify the performance of the proposed architecture, two benchmark tasks were performed, one with a binary output signal and one with a floating-point output signal, together with a MIMO channel prediction task.
2. Theory and model
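This section describes the mathematical model of the echo state network. The structural model is shown in Figure 1. Let the model have K input units, whose vector form is

$$u(n) = (u_1(n), \ldots, u_K(n))^{T}, \tag{1}$$

N reservoir units, whose vector form is

$$x(n) = (x_1(n), \ldots, x_N(n))^{T}, \tag{2}$$

and L output units, whose vector form is

$$y(n) = (y_1(n), \ldots, y_L(n))^{T}, \tag{3}$$

where $(\cdot)^{T}$ denotes the transpose and n is the discrete time. The input, reservoir, and output connection weights are represented by weight matrices of size N × K, N × N, and L × (K + N), respectively, i.e.,

$$W^{in} = (w^{in}_{ij}), \quad W = (w_{ij}), \quad W^{out} = (w^{out}_{ij}). \tag{4}$$

The output units can optionally feed back to the reservoir units; this connection is represented by a feedback weight matrix of size N × L:

$$W^{back} = (w^{back}_{ij}). \tag{5}$$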
The reservoir state is updated according to

$$x(n+1) = f\big(W^{in} u(n+1) + W x(n) + W^{back} y_{target}(n)\big), \tag{6}$$

where u(n + 1) is the external input at time n + 1, in the form of Eq. (1); ytarget(n) is the ideal output at time n, in the form of Eq. (3); and f is the transfer function of the reservoir units, usually a sigmoid function, although a linear unit (f = 1) is sometimes used. The output is calculated as

$$y(n+1) = f^{out}\big(W^{out}(u(n+1), x(n+1))\big), \tag{7}$$

where (u(n + 1), x(n + 1)) denotes the concatenation of the input and reservoir state vectors, shown as the dashed arrow from the input layer to the output layer in Figure 1. In some applications this data path does not exist, and the output is calculated directly from the reservoir state. The output transfer function is usually fout = tanh or fout = 1, depending on whether the output unit is nonlinear or linear. The output weight Wout is calculated as

$$W^{out} = Y^{target} X^{T}\big(X X^{T} + \alpha I\big)^{-1}, \tag{8}$$

where $I$ is the identity matrix, $\alpha$ is the regularization factor, $X$ is the matrix that collects the vectors (u(n), x(n)) over the training sequence, $Y^{target}$ is the matrix that collects the ideal outputs, and $(\cdot)^{-1}$ denotes matrix inversion.
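As an illustration only, Eqs. (6)–(8) can be sketched in software with NumPy. This is not the FPGA implementation described in this chapter: the function names, the reservoir size, the weight ranges, the spectral-radius scaling of W, the value of α, and the toy target signal are all assumptions chosen to make the example self-contained.

```python
import numpy as np

def run_reservoir(w_in, w, w_back, u, y_target, f=np.tanh):
    """Eq. (6) with teacher forcing; column n of the result is the state for input column n."""
    n_res, steps = w.shape[0], u.shape[1]
    x = np.zeros(n_res)                      # assumed zero initial state
    y_prev = np.zeros(y_target.shape[0])     # assumed zero feedback before the first sample
    states = np.zeros((n_res, steps))
    for n in range(steps):
        x = f(w_in @ u[:, n] + w @ x + w_back @ y_prev)
        states[:, n] = x
        y_prev = y_target[:, n]
    return states

def train_output_weights(u, states, y_target, alpha):
    """Eq. (8): W_out = Y_target X^T (X X^T + alpha I)^-1, with X stacking (u, x)."""
    big_x = np.vstack([u, states])           # shape (K + N, T)
    gram = big_x @ big_x.T + alpha * np.eye(big_x.shape[0])
    return y_target @ big_x.T @ np.linalg.inv(gram)

def predict(w_out, u, states):
    """Eq. (7) with a linear output unit (f_out = 1)."""
    return w_out @ np.vstack([u, states])

# Hypothetical toy problem: K = 1 input, N = 20 reservoir units, L = 1 output.
rng = np.random.default_rng(0)
K, N, L, T = 1, 20, 1, 300
w_in = rng.uniform(-0.5, 0.5, (N, K))
w = rng.uniform(-0.5, 0.5, (N, N))
w *= 0.8 / np.max(np.abs(np.linalg.eigvals(w)))   # assumed spectral radius of 0.8
w_back = rng.uniform(-0.5, 0.5, (N, L))

u = rng.uniform(-1.0, 1.0, (K, T))
u_prev = np.concatenate([[0.0], u[0, :-1]])
y_target = (0.5 * u[0] - 0.3 * u_prev)[None, :]   # assumed target, for illustration

states = run_reservoir(w_in, w, w_back, u, y_target)
w_out = train_output_weights(u, states, y_target, alpha=1e-4)
train_mse = float(np.mean((predict(w_out, u, states) - y_target) ** 2))
target_power = float(np.mean(y_target ** 2))
print(w_out.shape, train_mse, target_power)
```

Only Wout is learned; Win, W, and Wback stay fixed after random initialization, which is what reduces training to the single linear regression of Eq. (8).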
3. Real-time FPGA echo state network structure
The real-time FPGA echo state network execution structure maps Eqs. (6)–(8) to six modules: the input module, reservoir module, output module, training module, system switch module, and random number generator, as shown in Figure 2. The input module is a two-input single-output module. Its inputs are the random input weight Win generated by the random number generator and the external signal u(n + 1). It performs the multiplication Win u(n + 1) and encodes the input signal into a data signal that the reservoir module can process. The reservoir module is a five-input single-output module that strictly performs the remainder of Eq. (6). Its inputs are the encoded external signal from the input module, the reservoir initial state value (as the operation progresses, this input becomes the state value output by the reservoir module on the previous clock), the expected network output, the reservoir interconnection weights generated by the random number generator, and the feedback connection weights. Its output is the high-dimensional state signal. The specific circuit is shown in Figure 3.
(The circuit in Figure 3 outputs the reservoir state x1; the circuits for the other states are similar.) The output module is a three-input single-output module. Its inputs are the external input signal, the state value calculated by the reservoir module, and the output connection weight calculated by the training module; its output is the final network output. It performs simple multiplication and addition on the input data and decodes the result to form the output signal. The training module is a three-input single-output module. Its inputs are the externally given desired output, the state signal generated by the reservoir module, and the enable signal S sent by the system switch module. When S = 0, the module stops working; when S = 1, the module works, and its output is the core parameter of the echo state network, the output connection weight. Its function is expressed by Eq. (8), and the implementation mechanism is shown in Figure 4: the intermediate terms of Eq. (8) are computed in turn, and the output weight is then obtained. The system switch module is a two-input single-output module. Its inputs are the network output signal and the desired output signal, and its output is the enable signal that controls the training module. Its main function is to judge the network performance: when the network output signal matches the desired output signal, or the difference is within the acceptable range, it outputs S = 0; otherwise it continues to output S = 1. The random number generator generates the input, reservoir, and feedback weights.
The following describes how to train the real-time FPGA echo state network. FPGAs implement digital systems that primarily process digital signals, providing a logic cell array that can be configured for a given function via a bitstream file. Their basic digital building block, the smallest programmable logic unit, is the logic gate. FPGAs are therefore well suited to executing echo state networks, and the architecture is implemented entirely in the FPGA. Two things must be established: the dynamic reservoir (i.e., the middle layer), and a way to obtain the output weights in real time when the input signal is the digital signal “0” or “1”.
The dataflow and training process of the real-time FPGA echo state network structure are briefly described as follows:
Given: Input and target output sequences u(n) and ytarget(n).
Objective: Obtain the output weight by training on the teacher input/output signals, and then obtain the network output signal by loading the input signal.
Proceed as follows:
1. The random number generator module generates the input, reservoir, and feedback weights of the echo state network, sends the input and feedback weights to the input module, and sends the reservoir weight to the reservoir module.
2. The input module loads the input signal, encodes it, and sends it to the reservoir module.
3. The reservoir module receives the encoded signal sent by the input module, loads the target output signal, computes the feature value, and sends it to the output module and the training module.
4. The training module loads the input signal, the target output signal, and the feature value sent by the reservoir module; calculates the output weight; sends the result to the output module; and determines whether to stop training according to the signal sent by the system switch module.
5. The output module loads the input signal, calculates the network output from the feature value sent by the reservoir module and the output weight sent by the training module, sends the network output value to the system switch module, and outputs the actual network value.
6. The system switch module loads the target output signal and the network output signal, determines whether they match, and sends the result back to the training module: S = 0 if they match, S = 1 if they do not.
7. Repeat steps 2–6 until the system switch module decides to stop training, that is, S = 0.
After training, the network operates as a regular echo state network: only the input, reservoir, and output modules run, and the network prediction values are output.
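The training procedure above can be sketched in software as a loop gated by the enable signal S. This is a minimal sketch under stated assumptions, not the circuit of Figures 2–4: the running sums used to re-evaluate Eq. (8) at each step, the `tolerance` and `patience` parameters (S drops to 0 only after several consecutive matches), the weight ranges, and the toy signals are all hypothetical choices made for illustration.

```python
import numpy as np

def train_online(u, y_target, n_res=20, alpha=1e-4, tolerance=0.05,
                 patience=20, seed=0):
    """Software analogue of steps 1-7; returns (w_out, steps_used, s)."""
    rng = np.random.default_rng(seed)
    k, steps = u.shape
    l = y_target.shape[0]
    dim = k + n_res

    # Step 1: the random number generator produces the fixed weights.
    w_in = rng.uniform(-0.5, 0.5, (n_res, k))
    w = rng.uniform(-0.5, 0.5, (n_res, n_res))
    w *= 0.8 / np.max(np.abs(np.linalg.eigvals(w)))   # assumed scaling
    w_back = rng.uniform(-0.5, 0.5, (n_res, l))

    x = np.zeros(n_res)
    y_prev = np.zeros(l)
    xxt = np.zeros((dim, dim))        # running X X^T
    yxt = np.zeros((l, dim))          # running Y_target X^T
    w_out = np.zeros((l, dim))
    s, matches = 1, 0

    for n in range(steps):
        encoded = w_in @ u[:, n]                          # step 2: input module
        x = np.tanh(encoded + w @ x + w_back @ y_prev)    # step 3: reservoir module
        z = np.concatenate([u[:, n], x])

        if s == 1:                                        # step 4: training module
            xxt += np.outer(z, z)
            yxt += np.outer(y_target[:, n], z)
            w_out = yxt @ np.linalg.inv(xxt + alpha * np.eye(dim))

        y_net = w_out @ z                                 # step 5: output module

        # Step 6: system switch module (the patience window is an assumption).
        error = np.max(np.abs(y_net - y_target[:, n]))
        matches = matches + 1 if error <= tolerance else 0
        s = 0 if matches >= patience else 1

        y_prev = y_target[:, n]
        if s == 0:                                        # step 7: stop training
            return w_out, n + 1, s
    return w_out, steps, s

# Hypothetical toy signals.
rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, (1, 400))
u_prev = np.concatenate([[0.0], u[0, :-1]])
y_target = (0.5 * u[0] - 0.3 * u_prev)[None, :]

w_out_online, steps_used, s_final = train_online(u, y_target)
print(w_out_online.shape, steps_used, s_final)
```

Requiring several consecutive matches avoids stopping on the first sample, which a regularized fit to a single point would reproduce almost exactly; the text above does not say how the hardware handles this case.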
4. Experiment and analysis
Two benchmark experiments were performed, one with a binary output signal and one with a floating-point output signal, together with a multiple-input multiple-output channel prediction experiment. The design is written in Verilog HDL, the target chip is an Altera Stratix III FPGA, and synthesis and place and route are performed in Quartus II.
4.1 Different duty cycle signal pattern recognition
The pattern recognition benchmark with different duty cycles is very similar to a memristor-based reservoir computing (RC) benchmark. The pattern signals with different duty cycles are shown in Figure 5. The input signal of the echo state network is U, the expected output is Y_target, and the actual output signal is Yr; Wo1 and Wo2 are output weights, and B is the bias signal. The second line (signal Y_target) represents the expected response to the first line (signal U). When the duty cycle of the input signal is less than 50%, Y_target should converge to 0; for a duty cycle greater than 50%, it should converge to 1. As shown in Figure 5, online training of the echo state network in the FPGA yields Wo1, Wo2, and B, with values 00Eh, 00Ah, and FC1h, respectively. After the output weights are obtained, pattern signals with different duty cycles are loaded into the trained echo state network, and the result is shown in Figure 6. The output signal (Yr) switches as the duty cycle of the input signal (U) crosses 50%.
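The relationship between U and Y_target in this task can be illustrated with a short Python sketch. The helper names, the period length of 10 samples, and the list of duty cycles are hypothetical; the text also does not say what the target is at exactly 50%, so mapping that case to 0 is an assumption.

```python
def pwm_period(length, duty):
    """One period of a rectangular signal: high for round(length * duty) samples."""
    high = round(length * duty)
    return [1] * high + [0] * (length - high)

def duty_cycle(samples):
    """Fraction of samples at logic 1 within one period."""
    return sum(samples) / len(samples)

def target_for_period(samples):
    """Expected response: 1 above 50% duty cycle, 0 otherwise (50% itself is an assumption)."""
    return 1 if duty_cycle(samples) > 0.5 else 0

# Build a teacher sequence from several periods with different duty cycles.
u_signal, y_target_signal = [], []
for duty in [0.2, 0.8, 0.4, 0.6]:
    period = pwm_period(10, duty)
    u_signal.extend(period)
    y_target_signal.extend([target_for_period(period)] * len(period))

print(u_signal)
print(y_target_signal)
```

Here the target is held constant over each period, which is one simple way to express "should converge to 0 or 1"; the actual teacher waveform is the one shown in Figure 5.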
Figure 7 plots the FPGA logic utilization and the output error against the number of neurons implemented in the FPGA. The logic utilization stays below 60% for up to 512 neurons. When the number of neurons exceeds 16, the error between the actual output signal and the ideal output is zero. The proposed circuit therefore consumes few resources and converges quickly.
4.2 Sine wave generator
Here we test whether the echo state network can be trained to generate a sine wave signal, which is a floating-point signal. The task is very similar to the simple sine wave test performed in MATLAB when the echo state network was originally proposed. The target is an ideal sine wave signal. There is no input in this experiment, so the input is set to 0; W and the other fixed weights are matrices formed from random numbers, and the basic unit of the echo state network is the standard sigmoid unit (i.e., the activation function is tanh). The training result is shown in Figure 8. Because the ideal output signal is a floating-point number during sine wave training, the actual output signal (Yr), the normalized root mean square error signal (E), and the output weight signals (Wout1, Wout2, Wout3, Wout4) are all floating-point numbers. The training error is calculated in the system switch module as the normalized root mean square error, giving E = 3D0228E7h. The optimized output weights Wout1, Wout2, Wout3, and Wout4 from the training module are BDFEDD9Dh, 3CB22AA8h, 3CA0BDD8h, and 3F55E542h, respectively. The waveform shown in Figure 9 was acquired with the SignalTap II logic analyzer and is the floating-point sine wave generated by the trained echo state network. The Altera floating-point IP core was used to build the echo state network in this experiment. In Figure 10, the top trace is the Y_target signal, the middle trace is the actual network output y, and the bottom trace is the error between the expected and actual network outputs.
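The hexadecimal words above can be converted to decimal values for inspection. The sketch below assumes they are IEEE 754 single-precision values in big-endian byte order, which is consistent with their 32-bit width and the use of a floating-point IP core but is not stated explicitly in the text; the helper name `hex_to_float32` is hypothetical.

```python
import struct

def hex_to_float32(word):
    """Decode a 32-bit word such as '3D0228E7h' as an IEEE 754 single-precision value."""
    digits = word.rstrip("hH")                  # drop the trailing hex marker
    return struct.unpack(">f", bytes.fromhex(digits))[0]

reported = {
    "E": "3D0228E7h",
    "Wout1": "BDFEDD9Dh",
    "Wout2": "3CB22AA8h",
    "Wout3": "3CA0BDD8h",
    "Wout4": "3F55E542h",
}

decoded = {name: hex_to_float32(word) for name, word in reported.items()}
for name, value in decoded.items():
    print(f"{name} = {value:.6f}")
```

Under this assumption, E decodes to roughly 0.032 and Wout1 is the only negative weight.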
4.3 MIMO channel prediction
Recurrent neural networks have been widely used in MIMO systems [17, 18, 19, 20, 21, 22, 23]. Echo state networks are one way to train recurrent neural networks. They converge faster and track channel state changes more efficiently than other traditional training methods. For a 2 × 2 multiple-input multiple-output system with a binary phase shift keying (BPSK) modulator, as shown in Figure 11, the zero-forcing equalizer used in the receiver can reduce inter-symbol interference (ISI), provided the channel estimate is accurate. Because its performance depends on how well the degraded radio channel is known, the proposed architecture is used for MIMO channel prediction.
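The matrix equation of the MIMO system shown in Figure 11 is given as

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} n_1 \\ n_2 \end{pmatrix}. \tag{9}$$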
The system can be written in the compact form Y = HX + n, where Y is the 2 × 1 received signal vector, H is the 2 × 2 channel coefficient matrix, X is the 2 × 1 transmitted signal vector, and n is a 2 × 1 additive white Gaussian noise vector. The channel is modeled as Rayleigh fading with a mean of 0 and a variance of 0.5. At the receiving end, zero-forcing equalization estimates the transmitted signal as

$$\hat{X} = H^{+} Y, \tag{10}$$

where $H^{+}$ represents the pseudo-inverse of H. The estimate is then loaded into the BPSK demodulator, which recovers the original information.
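A minimal NumPy sketch of zero-forcing equalization for the 2 × 2 model Y = HX + n follows. It assumes perfect knowledge of H at the receiver, a BPSK mapping of bit 0 to +1 and bit 1 to −1, a variance of 0.5 for each of the real and imaginary parts of the channel coefficients, and an arbitrary small noise level; none of these details are specified in the text, and the function names are hypothetical.

```python
import numpy as np

def rayleigh_channel(rng, size=(2, 2)):
    """Zero-mean complex Gaussian coefficients (variance 0.5 per component, assumed)."""
    scale = np.sqrt(0.5)
    return rng.normal(0.0, scale, size) + 1j * rng.normal(0.0, scale, size)

def bpsk_modulate(bits):
    """Assumed mapping: bit 0 -> +1, bit 1 -> -1."""
    return 1.0 - 2.0 * np.asarray(bits, dtype=float)

def bpsk_demodulate(symbols):
    """Decide on the sign of the real part."""
    return (np.real(symbols) < 0).astype(int)

def zero_forcing(h, y):
    """Estimate the transmitted vector with the pseudo-inverse of H."""
    return np.linalg.pinv(h) @ y

rng = np.random.default_rng(1)
bits = np.array([0, 1])
x = bpsk_modulate(bits)                     # transmitted vector X
h = rayleigh_channel(rng)                   # channel matrix H

# Small additive noise (level chosen arbitrarily for the demonstration).
noise = 0.01 * (rng.normal(size=2) + 1j * rng.normal(size=2))
y = h @ x + noise                           # received vector Y

x_hat = zero_forcing(h, y)
print(x_hat, bpsk_demodulate(x_hat))
```

In the system described next, the matrix passed to the equalizer would be the channel predicted by the echo state network rather than the true H used here.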
To update the channel state dynamically at each step, an echo state network is added to the 2 × 2 MIMO system in Figure 11. The resulting system structure is shown in Figure 12, and the echo state network channel prediction strategy is shown in Figure 13. The echo state network channel predictor is trained on the channel coefficients. Once training is complete, it automatically generates the predicted channel coefficients, which are loaded into the zero-forcing equalizer to complete the MIMO channel prediction.
Figure 14 shows the RTL-level circuit diagram generated for the FPGA. The system mainly includes a transmitter (fx), a receiver (rx), and an echo state network module (ch_test). In the transmitting module (fx), the signals x1r, x2r and x1i, x2i are the real and imaginary parts of the MIMO system input signals X1 and X2, respectively; h11r, h12r, h21r, h22r and h11i, h12i, h21i, h22i are the real and imaginary parts of the MIMO channel coefficients, respectively; and y1r, y2r and y1i, y2i are the real and imaginary parts of the received signal, respectively. In the receiving module (rx), a11, a12, a21, a22 and b11, b12, b21, b22 are the real and imaginary parts of the channel prediction coefficients, which are calculated by the echo state network module (ch_test). The signals c1, c2 and d1, d2 are the real and imaginary parts, respectively, of the estimate corresponding to the demodulated signal X, and the signal “detu” is the demodulation scale factor of c1, c2, d1, and d2.
The designed MIMO system is downloaded to the FPGA chip through a bitstream file, and the waveforms are captured with the SignalTap II logic analyzer, as shown in Figure 15. The waveform diagram shows that the signal waveforms of the C1 and C2 parts are identical. In the C1 part, the signals in1 and in2 correspond to x1r and x2r, respectively, and in the C2 part, the signals oute1 and oute2 correspond to the real part of the signal X. The signals x1r, x2r and x1i, x2i are the real and imaginary parts, respectively, of the signal before demodulation.
To explain the C3 part, it is enlarged in Figure 16. As the C3 part shows, the received signals Y1 and Y2 and the estimated channel matrix all change. However, zero-forcing equalization can still predict the values x1r, x2r, and C through the echo state network. After the processing is completed, the BPSK demodulation signals “oute1” and “oute2” are obtained.
A real-time FPGA echo state network structure has been proposed and studied. The input weight and the reservoir weight are determined randomly before training, and the output weight is calculated in real time in the FPGA by training the echo state network. In the two benchmark experiments (pattern recognition and waveform generation) and the MIMO channel prediction experiment, the proposed hardware architecture can recognize the duty cycle of different input signals, generate floating-point waveforms, and predict channel coefficients. The experimental results show that the hardware echo state network computes quickly, occupies few resources, and performs well on simple tasks. In future work, the proposed FPGA real-time echo state network will be applied to more complex 5G-based wireless MIMO-OFDM systems.