Coherent Receiver for Turbo Coded Single-User Massive MIMO-OFDM with Retransmissions

Single-user massive multiple-input multiple-output (MIMO) systems have a large number of antennas at the transmitter and receiver. This results in a large overall throughput (bit-rate), of the order of tens of gigabits per second, which is the main objective of the recent fifth-generation (5G) wireless standard. It is feasible to have a large number of antennas at mm-wave frequencies, due to the small size of the antennas. This chapter deals with the coherent detection of orthogonal frequency division multiplexed (OFDM) signals transmitted through frequency-selective Rayleigh fading MIMO wireless channels. Low-complexity, discrete-time algorithms are developed for channel estimation, carrier and timing synchronization, and finally turbo decoding of the data at the receiver. Computer simulation results are presented to validate the theory.


Introduction
The main objective of the fifth-generation (5G) [1-15] wireless communication standard is to provide peak data rates of 10 gigabits per second (Gbps) for each user, ultralow latency (the time between transmitting information and getting a response) of less than 1 ms, and, last but not least, very low bit error rates (BER) ($< 10^{-10}$). High data rates are essential for streaming ultrahigh definition (4k) video. Low latency is required for future driverless cars and remote surgeries. An important feature of the 5G network is that it involves not only people but also smart devices. For example, it may be possible to control a microwave oven or geyser located in the home, from the office. High data rates are feasible by using a large number of transmitting antennas. For example, if each transmit antenna transmits at a rate of 100 megabits per second (Mbps), then using 100 transmit antennas would result in an overall bit-rate of 10 Gbps. This technique of increasing the overall bit-rate by using a large number of transmit antennas is also known as spatial multiplexing (not to be confused with spatial modulation [16-20], wherein not all the transmit antennas are simultaneously active). This is illustrated in Figure 1, where the $i$th transmit antenna sends $C_i$ bits of information and each of the receive antennas gets $C/N$ bits of information, in each transmission (see Propositions A.1 and A.2 in [21]). It must be noted that a large array of transmit antennas can also be used for beamforming [22,23] and beam steering (the ability to focus the transmitted signal in a particular direction, without moving the antenna), which is not the topic of this chapter. In fact, the basic idea used in this chapter is captured in the following proposition.
Proposition 1.1 Signals transmitted and received by antennas separated by at least $\lambda/2$ ($\lambda = c/\nu$, where $c$ is the velocity of light and $\nu$ is the carrier frequency) undergo independent fading.
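To get a feel for the spacing in Proposition 1.1, the following minimal sketch computes $\lambda/2$ for a given carrier frequency and counts how many antennas fit along a line of a given length. The function names, the 30 GHz carrier, and the 10 cm aperture are illustrative assumptions and do not come from the chapter.

```python
C_LIGHT = 299_792_458.0  # velocity of light in m/s


def half_wavelength_m(carrier_hz):
    """Minimum antenna spacing lambda/2 in metres, with lambda = c / nu."""
    return C_LIGHT / carrier_hz / 2.0


def antennas_per_side(aperture_m, carrier_hz):
    """Number of antennas on a line of length aperture_m at lambda/2 spacing."""
    spacing = half_wavelength_m(carrier_hz)
    return int(aperture_m / spacing) + 1


# Hypothetical mm-wave example: 30 GHz carrier, 10 cm linear aperture.
spacing_30ghz = half_wavelength_m(30e9)
count_30ghz = antennas_per_side(0.1, 30e9)
print(f"lambda/2 = {spacing_30ghz * 1e3:.3f} mm, antennas on 10 cm = {count_30ghz}")
```

Under these assumed numbers the spacing is about 5 mm, which is why large arrays become practical at mm-wave frequencies.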
A typical massive MIMO antenna array is shown in Figure 2. The black dots denote the antennas, and the circles denote obstructions used to prevent mutual coupling between the antennas. While spatial multiplexing is a big advantage in massive MIMO, the main problem lies in the high complexity of data detection at the receiver. To understand this issue, consider the signal model:

$$\tilde{\mathbf{R}} = \tilde{\mathbf{H}}\mathbf{S} + \tilde{\mathbf{W}} \tag{1}$$

where $\tilde{\mathbf{R}} \in \mathbb{C}^{N \times 1}$ is the received vector, $\tilde{\mathbf{H}} \in \mathbb{C}^{N \times N}$ is the channel matrix, $\mathbf{S} \in \mathbb{C}^{N \times 1}$ is the symbol vector drawn from an $M$-ary 2D constellation, and $\tilde{\mathbf{W}} \in \mathbb{C}^{N \times 1}$ is the additive white Gaussian noise (AWGN) vector. Here $\mathbb{C}$ denotes the set of complex numbers. Due to Proposition 1.1, the elements of $\tilde{\mathbf{H}}$ are statistically independent. Moreover, if there is no line-of-sight (LOS) path between the transmitter and receiver, the elements of $\tilde{\mathbf{H}}$ are zero-mean Gaussian. The elements of $\tilde{\mathbf{W}}$ are also assumed to be independent. The real and imaginary parts of the elements of $\tilde{\mathbf{H}}$ and $\tilde{\mathbf{W}}$ are also assumed to be independent. Now, the problem statement is: find $\mathbf{S}$ given $\tilde{\mathbf{R}}$. There are several methods of solving this problem, assuming that $\tilde{\mathbf{H}}$ is known.
1. Perform an exhaustive search over all the $M^N$ possibilities of $\mathbf{S}$. This is known as the maximum likelihood (ML) approach, which has exponential complexity in $N$.
2. Pre-multiply $\tilde{\mathbf{R}}$ with $\tilde{\mathbf{H}}^{-1}$. This is known as the zero-forcing approach and has a complexity of the order of $2N^3$ ($N^3$ for computing the inverse and another $N^3$ for matrix multiplication). This approach usually leads to noise enhancement and a poor symbol error rate (SER) performance.
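The gap between the two complexity figures quoted in the list above can be made concrete with a short sketch. It simply evaluates $M^N$ and $2N^3$; the function names and the example values of $M$ and $N$ are assumptions chosen for illustration.

```python
def ml_candidates(M, N):
    """Number of symbol vectors an exhaustive ML search must test: M^N."""
    return M ** N


def zf_operations(N):
    """Order-of-magnitude operation count quoted for zero forcing: 2 N^3."""
    return 2 * N ** 3


# QPSK (M = 4) with a few illustrative antenna counts.
for n_antennas in (4, 8, 100):
    print(n_antennas, ml_candidates(4, n_antennas), zf_operations(n_antennas))
```

Even for $N = 8$ with QPSK, the ML search already has 65536 candidates against roughly a thousand operations for zero forcing, and the ML count becomes astronomically large well before $N$ reaches the hundreds.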
Data detection in single-user massive MIMO systems using retransmissions, having a complexity of $N_{rt} \times N^3$, where $N_{rt}$ is the number of retransmissions, has been proposed in [34], where it was assumed that $\tilde{\mathbf{H}}$ is known at the receiver. In this work, which is an extension of [34], we present a coherent receiver for massive MIMO systems, where not only $\tilde{\mathbf{H}}$ but also the carrier frequency offset and timing are estimated. Moreover, the signal model in Eq. (1) is valid for flat fading channels. When the channel is frequency selective (the length of the discrete-time channel impulse response is greater than unity), orthogonal frequency division multiplexing needs to be used, since OFDM converts a frequency-selective channel into a flat fading channel (the length of the discrete-time channel impulse response is equal to unity) [35]. To this end, the channel estimation and carrier and timing synchronization algorithms developed in [36] for single-input single-output (SISO) OFDM, [37,38] for single-input multiple-output (SIMO) OFDM, and [21,39] for multiple-input multiple-output (MIMO) OFDM are used in this work. In [40], a linear prediction-based detection of serially concatenated QPSK is presented, which does not require any preamble. The prospect of using superimposed training [41] in the context of massive MIMO looks quite intimidating, since the signal at each receive antenna is already a superposition of the signals from a large number of transmit antennas.
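The statement that OFDM turns a frequency-selective channel into a flat fading channel on each subcarrier can be checked numerically. The sketch below is a noise-free, single-antenna illustration that uses only the Python standard library: it sends one OFDM block with a cyclic prefix through a three-tap channel and compares the FFT output with the per-subcarrier product $H_i S_i$. The symbol values, the channel taps, and all function names are assumptions made for this example.

```python
import cmath


def dft(x):
    """Unnormalised L-point DFT."""
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * i * n / L) for n in range(L))
            for i in range(L)]


def idft(X):
    """L-point inverse DFT with 1/L normalisation."""
    L = len(X)
    return [sum(X[i] * cmath.exp(2j * cmath.pi * i * n / L) for i in range(L)) / L
            for n in range(L)]


def ofdm_through_channel(S, h):
    """Transmit one OFDM block with a cyclic prefix through channel h (no noise)."""
    L, L_cp = len(S), len(h) - 1
    s = idft(S)
    tx = s[L - L_cp:] + s                      # prepend the cyclic prefix
    rx = [0j] * (len(tx) + len(h) - 1)
    for n, sample in enumerate(tx):            # linear convolution with h
        for l, tap in enumerate(h):
            rx[n + l] += sample * tap
    return dft(rx[L_cp:L_cp + L])              # drop the prefix, take the FFT


S = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
h = [0.8 + 0.1j, 0.3 - 0.2j, 0.1 + 0.05j]      # hypothetical 3-tap channel
R = ofdm_through_channel(S, h)
H = dft(h + [0j] * (len(S) - len(h)))
max_error = max(abs(R[i] - H[i] * S[i]) for i in range(len(S)))
print(max_error)
```

The residual is at the level of floating-point rounding, so each subcarrier sees a single complex gain $H_i$, which is the flat-fading model of Eq. (1) applied per subcarrier.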
This work is organized as follows. Section 2 presents the system model. The discrete-time receiver algorithms are presented in Section 3. The computer simulation results are discussed in Section 4, and the chapter concludes with Section 5.

System model
The transmitted frame structure is shown in Figure 3(a). The signal in the blue boxes is sent from transmit antenna $n_t$. The signal in the red boxes is sent from other antennas. Note that in the preamble phase, only one transmit antenna is active at a time, whereas in the data phase, all transmit antennas are active simultaneously. In practice, each transmit antenna could use a different preamble. However, in this work, we assume that all transmit antennas use the same preamble. The signals in Figure 3(a) are defined as follows (similar to [21]):

$$
\begin{aligned}
\tilde{s}_{1,n} &= \frac{1}{L_p}\sum_{i=0}^{L_p-1} S_{1,i}\, e^{\,j 2\pi n i/L_p} && \text{for } 0 \le n \le L_p - 1\\
\tilde{s}_{3,n,n_t} &= \frac{1}{L_d}\sum_{i=0}^{L_d-1} S_{3,i,n_t}\, e^{\,j 2\pi n i/L_d} && \text{for } 0 \le n \le L_d - 1\\
\tilde{s}_{2,n,n_t} &= \tilde{s}_{3,\,L_d-L_{cp}+n,\,n_t} && \text{for } 0 \le n \le L_{cp} - 1\\
\tilde{s}_{4,n} &= \tilde{s}_{1,n} && \text{for } 0 \le n \le L_{cp} - 1.
\end{aligned} \tag{2}
$$

The term $i$ in the above equations denotes the $i$th subcarrier, $n$ denotes the time index, and $1 \le n_t \le N$ is the index to the transmit antenna. Note that in this work, the same preamble is transmitted one after the other by each of the transmit antennas, as shown in Figure 3(a). In [21], different preambles are transmitted simultaneously from all the transmit antennas. The channel coefficients $\tilde{h}_{k,n,n_r,n_t}$ associated with the receive antenna $n_r$ ($1 \le n_r \le N$) and transmit antenna $n_t$ ($1 \le n_t \le N$) for the $k$th retransmission are zero-mean complex Gaussian random variables and satisfy the following relations [21]:

$$
\begin{aligned}
E\!\left[\tilde{h}_{k,n,n_r,n_t}\,\tilde{h}^{*}_{k,m,n_r,n_t}\right] &\propto \delta_K(n-m)\\
E\!\left[\tilde{h}_{k,n,n_r,n_t}\,\tilde{h}^{*}_{k,n,m_r,n_t}\right] &\propto \delta_K(n_r-m_r)
\end{aligned} \tag{3}
$$

where "$*$" denotes complex conjugate and $\delta_K(\cdot)$ is the Kronecker delta function. Observe that Eq. (3) implies a uniform power delay profile. Even though an exponential power delay profile is more realistic, we have used a uniform power delay profile, since it is expected to give the worst-case BER performance, as all the multipath components have the same power [21]. The channel is assumed to be quasi-static, that is, $\tilde{h}_{k,n,n_r,n_t}$ is time-invariant over one frame (retransmission).
The length of all the $N^2$ channel impulse responses is assumed to be $L_h$, which is proportional to the difference between the longest and shortest multipath [21]. The channel span assumed by the receiver is [21,36,39]

$$L_{hr} = 2L_h - 1. \tag{4}$$

The length of the cyclic prefix or suffix is [21,36,39]

$$L_{cp} = L_{hr} - 1. \tag{5}$$
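The length of the preamble is $L_p$, and the length of the data is $L_d$. The AWGN noise samples $\tilde{w}_{k,n,n_r}$ for the $k$th retransmission at time $n$ and receive antenna $n_r$ are $\mathcal{CN}(0, 2\sigma_w^2)$ and satisfy

$$
\begin{aligned}
\tfrac{1}{2}E\!\left[\tilde{w}_{k,n,n_r}\,\tilde{w}^{*}_{k,m,n_r}\right] &= \sigma_w^2\,\delta_K(n-m)\\
\tfrac{1}{2}E\!\left[\tilde{w}_{k,n,n_r}\,\tilde{w}^{*}_{k,n,m_r}\right] &= \sigma_w^2\,\delta_K(n_r-m_r).
\end{aligned} \tag{6}
$$

The noise and channel coefficients are assumed to be independent. The frequency offset $\omega_0$ is uniformly distributed over $[-0.03, 0.03]$ radians, and the ML frequency offset estimator searches in the range $[-\omega_{0,\max}, \omega_{0,\max}]$ radians [42], where

$$\omega_{0,\max} = 0.04 \text{ radian}. \tag{7}$$

For convenience, and without loss of generality, we assume that $\omega_0$ is constant over $N_{rt}$ retransmissions.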
During the preamble phase, the signal at receive antenna $n_r$, for the $k$th retransmission, can be written as (for $0 \le n \le L_p + L_{cp} + L_h - 2$)

$$\tilde{r}_{k,n,n_r,n_t,p} = \left(\tilde{s}_{5,n} \star \tilde{h}_{k,n,n_r,n_t}\right) e^{\,j\omega_0 n} + \tilde{w}_{k,n,n_r,n_t,p} = \tilde{y}_{k,n,n_r,n_t,p}\, e^{\,j\omega_0 n} + \tilde{w}_{k,n,n_r,n_t,p} \tag{8}$$

where "$\star$" denotes linear convolution, $\tilde{s}_{5,n}$ is depicted in Figure 3(a), $\tilde{h}_{k,n,n_r,n_t}$ denotes the channel impulse response between transmit antenna $n_t$ and receive antenna $n_r$ for the $k$th retransmission, and

$$\tilde{y}_{k,n,n_r,n_t,p} = \tilde{s}_{5,n} \star \tilde{h}_{k,n,n_r,n_t}. \tag{9}$$

The subscript "$p$" in Eqs. (8) and (9) denotes the preamble. Note that any random carrier phase can be absorbed in the channel impulse response. We have

$$\tilde{s}_{1,n} \odot_{L_p} \tilde{s}^{*}_{1,-n} \;\rightleftharpoons_{L_p}\; \left|S_{1,i}\right|^2 \tag{10}$$

where "$\odot_{L_p}$" denotes an $L_p$-point circular convolution and "$\rightleftharpoons_{L_p}$" denotes the $L_p$-point discrete Fourier transform (DFT) or the fast Fourier transform (FFT). Due to the presence of the cyclic suffix, the steady-state part of $\tilde{y}_{k,n,n_r,n_t,p}$ equals, up to a cyclic shift, the $L_p$-point circular convolution of $\tilde{s}_{1,n}$ with the channel impulse response. Assuming perfect carrier and timing synchronization ($\omega_0$ is perfectly canceled and the frame boundaries are perfectly known) at the receiver, the signal at the output of the $L_p$-point FFT for the $i$th ($0 \le i \le L_p - 1$) subcarrier and receive antenna $n_r$, due to the preamble sent from transmit antenna $n_t$ during the $k$th retransmission, is

$$\tilde{R}_{k,i,n_r,n_t,p} = \tilde{H}_{k,i,n_r,n_t}\, S_{1,i} + \tilde{W}_{k,i,n_r,n_t,p} \tag{13}$$

where $\tilde{H}_{k,i,n_r,n_t}$ and $\tilde{W}_{k,i,n_r,n_t,p}$ denote the $L_p$-point FFT of $\tilde{h}_{k,n,n_r,n_t}$ and $\tilde{w}_{k,n,n_r,n_t,p}$, respectively. The average SNR per bit corresponding to Eq. (13) is given by Eq. (14), with the required second moments taken from [36] (Eq. (15)). Here $A$ is a constant to be determined, and it is assumed that each sample of each receive antenna gets $2/(N N_{rt})$ bits of information during the preamble phase (see Proposition A.2 in [21]).
During the data phase, the signal for the $k$th retransmission at receive antenna $n_r$ can be written as (for $0 \le n \le L_d + L_{cp} + L_h - 2$)

$$\tilde{r}_{k,n,n_r,d} = \sum_{n_t=1}^{N} \left(\tilde{s}_{6,n,n_t} \star \tilde{h}_{k,n,n_r,n_t}\right) e^{\,j\omega_0 n} + \tilde{w}_{k,n,n_r,d} = \tilde{y}_{k,n,n_r,d}\, e^{\,j\omega_0 n} + \tilde{w}_{k,n,n_r,d} \tag{16}$$

where $\tilde{s}_{6,n,n_t}$ is depicted in Figure 3(a) and

$$\tilde{y}_{k,n,n_r,d} = \sum_{n_t=1}^{N} \tilde{s}_{6,n,n_t} \star \tilde{h}_{k,n,n_r,n_t}. \tag{17}$$

The subscript "$d$" in Eqs. (16) and (17) denotes data. Assuming perfect carrier and timing synchronization at the receiver, the signal at the output of the $L_d$-point FFT for the $i$th ($0 \le i \le L_d - 1$) subcarrier and receive antenna $n_r$, during the $k$th retransmission, is

$$\tilde{R}_{k,i,n_r,d} = \sum_{n_t=1}^{N} \tilde{H}_{k,i,n_r,n_t}\, S_{3,i,n_t} + \tilde{W}_{k,i,n_r,d} \tag{18}$$

where $\tilde{H}_{k,i,n_r,n_t}$ and $\tilde{W}_{k,i,n_r,d}$ denote the $L_d$-point FFT of $\tilde{h}_{k,n,n_r,n_t}$ and $\tilde{w}_{k,n,n_r,d}$, respectively. The average SNR per bit corresponding to Eq. (18) is given by Eq. (19), with the required second moments taken from [36] (Eq. (20)), and it is assumed that each receive antenna gets $1/(2N_{rt})$ bits of information in each transmission [34]. We impose the constraint in Eq. (21), which fixes $A$.

Let us now compare the average power of the preamble with that of the data, at the transmitter. The average power of the preamble in the time domain is given by Eq. (22), where $A$ is defined in Eqs. (15) and (21). Similarly, the average power of the data in the time domain is given by Eq. (23). The ratio of these two powers, Eq. (24), sets the minimum dynamic range required of the radio frequency (RF) amplifiers at the transmitter (note that the RF amplifiers also have to deal with the peak-to-average power ratio (PAPR) problem [43-).

Let us now consider the case where the preamble power is equal to the data power at each transmit antenna. Equating Eqs. (22) and (23) gives the value of $A$ in Eq. (25). Substituting for $A$ from Eq. (25), we obtain the average SNR per bit of the preamble phase and the data phase (Eq. (26)), which implies

$$10 \log_{10}\!\left(\frac{\mathrm{SNR}_{\mathrm{av},\,b,\,d}}{\mathrm{SNR}_{\mathrm{av},\,b,\,p}}\right) = 10 \log_{10}(4) \text{ dB}.$$

In other words, the average SNR per bit of the preamble phase would be less than that of the data phase by 6 dB. In what follows, we assume that $A$ is given by Eq. (21).

Receiver algorithms
The receiver algorithms have been adapted from [21,36,37,39] and will be briefly described in the following subsections.

Start of frame and frequency offset estimation
The first task of the receiver is to detect the presence of a valid signal, that is, the start of frame (SoF). The SoF detection and coarse frequency offset estimation are performed for each receive antenna $1 \le n_r \le N$, transmit antenna $1 \le n_t \le N$, and retransmission $1 \le k \le N_{rt}$ as given by the following rule (similar to Eq. (17) in [21]): choose that value of $\hat{m}_k(n_r, n_t)$ and $\hat{\nu}_k(n_r, n_t)$ which maximizes

$$\left| \left( \tilde{r}_{k,m,n_r,n_t,p}\, e^{-j \hat{\nu}_k(n_r,n_t)\, m} \right) \star \tilde{s}^{*}_{1,\,L_p-1-m,\,n_t} \right| \tag{27}$$

where $\tilde{r}_{k,m,n_r,n_t,p}$ is given in Eq. (8) and

$$\hat{\nu}_k(n_r, n_t) \in \left\{ -\omega_{0,\max} + \frac{2 l\, \omega_{0,\max}}{B_1} \right\} \tag{28}$$

for $0 \le l \le B_1$, where $l$ and $B_1$ [21] are positive integers and $\omega_{0,\max}$ is given in Eq. (7). Observe that $\hat{m}_k(n_r, n_t)$ satisfies Eqs. (18) and (19) in [21]. The average value of the frequency offset estimate is given by

$$\hat{\omega}_0 = \frac{1}{N^2 N_{rt}} \sum_{k=1}^{N_{rt}} \sum_{n_r=1}^{N} \sum_{n_t=1}^{N} \hat{\nu}_k(n_r, n_t). \tag{29}$$
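The joint search over timing and frequency offset described above can be sketched for a single link (one $k$, $n_r$, $n_t$). The example below is a noise-free, single-tap illustration using only the Python standard library; it is not the chapter's Scilab implementation. The preamble here is a hypothetical random QPSK sequence in the time domain, the frequency grid is assumed to be uniform over $[-\omega_{0,\max}, \omega_{0,\max}]$ with `num_bins` steps, and all names and parameter values are assumptions.

```python
import cmath
import random


def sof_and_offset_search(r, preamble, omega_max, num_bins):
    """Grid search for the start index m and offset nu maximising
    |sum_n r[m+n] * exp(-j*nu*(m+n)) * conj(preamble[n])|."""
    L = len(preamble)
    grid = [-omega_max + 2.0 * l * omega_max / num_bins for l in range(num_bins + 1)]
    best_metric, best_m, best_nu = -1.0, None, None
    for nu in grid:
        # Derotate the whole received sequence by the candidate offset.
        derot = [x * cmath.exp(-1j * nu * n) for n, x in enumerate(r)]
        for m in range(len(r) - L + 1):
            metric = abs(sum(derot[m + n] * preamble[n].conjugate() for n in range(L)))
            if metric > best_metric:
                best_metric, best_m, best_nu = metric, m, nu
    return best_m, best_nu


random.seed(0)
L_P, DELAY, OMEGA_0 = 64, 5, 0.02          # illustrative values only
preamble = [complex(random.choice((-1, 1)), random.choice((-1, 1))) for _ in range(L_P)]
padded = [0j] * DELAY + preamble + [0j] * 5
received = [x * cmath.exp(1j * OMEGA_0 * n) for n, x in enumerate(padded)]

m_hat, nu_hat = sof_and_offset_search(received, preamble, omega_max=0.04, num_bins=8)
print(m_hat, nu_hat)
```

In this toy setting the true offset lies on the search grid, so the search returns the inserted delay and offset. With a multipath channel and noise, the metric would have several significant peaks, and the estimate is then only accurate to within the grid resolution.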

Channel estimation
We assume that the SoF has been estimated using Eq. (27) with outcome $m_0$ given by Eq. (30) (assuming that condition (19) in [21] is satisfied for all $k$, $n_r$, and $n_t$) and that the frequency offset has been perfectly canceled [36,38]. Observe that any value of $k$, $n_r$, and $n_t$ can be used in the computation of Eq. (30). We have taken $k = n_r = n_t = 1$. The index $m_1$ is then defined from $m_0$ as in [21,36,39] (Eq. (31)).
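The steady-state, preamble part of the received signal for the $k$th retransmission and receive antenna $n_r$ can be written as [21,36,39]

$$\tilde{\mathbf{r}}_{k,m_1,n_r,n_t,p} = \tilde{\mathbf{s}}_5\, \tilde{\mathbf{h}}_{k,n_r,n_t} + \tilde{\mathbf{w}}_{k,m_1,n_r,n_t,p} \tag{32}$$

where

$$
\begin{aligned}
\tilde{\mathbf{r}}_{k,m_1,n_r,n_t,p} &= \left[ \tilde{r}_{k,m_1,n_r,n_t,p} \ \ldots\ \tilde{r}_{k,m_1+L_p-1,n_r,n_t,p} \right]^T_{L_p \times 1}\\
\tilde{\mathbf{w}}_{k,m_1,n_r,n_t,p} &= \left[ \tilde{w}_{k,m_1,n_r,n_t,p} \ \ldots\ \tilde{w}_{k,m_1+L_p-1,n_r,n_t,p} \right]^T_{L_p \times 1}\\
\tilde{\mathbf{h}}_{k,n_r,n_t} &= \left[ \tilde{h}_{k,0,n_r,n_t} \ \ldots\ \tilde{h}_{k,L_{hr}-1,n_r,n_t} \right]^T_{L_{hr} \times 1}
\end{aligned} \tag{33}
$$

and $\tilde{\mathbf{s}}_5$ is the $L_p \times L_{hr}$ matrix constructed from the preamble samples. Observe that $\tilde{\mathbf{s}}_5$ is independent of $m_1$ and, due to the relations in Eqs. (10), (15), and (21), we have

$$\tilde{\mathbf{s}}_5^H \tilde{\mathbf{s}}_5 \propto \mathbf{I}_{L_{hr}} \tag{34}$$

where $\mathbf{I}_{L_{hr}}$ is an $L_{hr} \times L_{hr}$ identity matrix. The estimate of the channel is [21,36,39]

$$\hat{\mathbf{h}}_{k,n_r,n_t} = \left( \tilde{\mathbf{s}}_5^H \tilde{\mathbf{s}}_5 \right)^{-1} \tilde{\mathbf{s}}_5^H\, \tilde{\mathbf{r}}_{k,m_1,n_r,n_t,p}. \tag{35}$$

To see the effect of noise on the channel estimate in Eq. (35), consider

$$\tilde{\mathbf{u}} = \left( \tilde{\mathbf{s}}_5^H \tilde{\mathbf{s}}_5 \right)^{-1} \tilde{\mathbf{s}}_5^H\, \tilde{\mathbf{w}}_{k,m_1,n_r,n_t,p}. \tag{36}$$

It can be shown that [21,39]

$$E\!\left[ \tilde{\mathbf{u}}\, \tilde{\mathbf{u}}^H \right] = 2\sigma_w^2 \left( \tilde{\mathbf{s}}_5^H \tilde{\mathbf{s}}_5 \right)^{-1}. \tag{37}$$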
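The least-squares estimate in Eq. (35) becomes a simple correlation when $\tilde{\mathbf{s}}_5^H \tilde{\mathbf{s}}_5$ is a scaled identity. The sketch below illustrates this for one link under stated assumptions: the preamble is the IDFT of constant-modulus (QPSK) frequency-domain symbols, the steady-state received block is a circular convolution of the preamble with the channel, and there is no noise. Block length, channel taps, and function names are hypothetical.

```python
import cmath


def idft(X):
    """L-point inverse DFT with 1/L normalisation."""
    L = len(X)
    return [sum(X[i] * cmath.exp(2j * cmath.pi * i * n / L) for i in range(L)) / L
            for n in range(L)]


def ls_channel_estimate(r, s1, L_hr):
    """Least-squares estimate when the preamble matrix has orthogonal columns.

    Column l of the assumed preamble matrix is s1 cyclically shifted by l, so
    (s^H s)^{-1} s^H r reduces to a correlation divided by the preamble energy.
    """
    L = len(s1)
    energy = sum(abs(x) ** 2 for x in s1)
    return [sum(r[n] * s1[(n - l) % L].conjugate() for n in range(L)) / energy
            for l in range(L_hr)]


L_P, L_HR = 16, 5
S1 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j] * 4          # constant-modulus preamble
s1 = idft(S1)
h_true = [0.9 - 0.2j, 0.4 + 0.3j, -0.1 + 0.2j]       # hypothetical channel, L_h = 3
# Steady-state, noise-free received block: circular convolution of s1 with h_true.
r = [sum(h_true[l] * s1[(n - l) % L_P] for l in range(len(h_true))) for n in range(L_P)]

h_hat = ls_channel_estimate(r, s1, L_HR)
h_ref = h_true + [0j] * (L_HR - len(h_true))
max_error = max(abs(a - b) for a, b in zip(h_hat, h_ref))
print(max_error)
```

Because the assumed receiver span `L_HR` exceeds the true channel length, the extra taps are estimated as (numerically) zero, which mirrors the role of $L_{hr} > L_h$ in the text.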

Noise variance estimation
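The noise variance per dimension is estimated as

$$\hat{\sigma}_w^2 = \frac{1}{2 L_p N^2 N_{rt}} \sum_{k=1}^{N_{rt}} \sum_{n_r=1}^{N} \sum_{n_t=1}^{N} \left( \tilde{\mathbf{r}}_{k,m_1,n_r,n_t,p} - \tilde{\mathbf{s}}_5 \hat{\mathbf{h}}_{k,n_r,n_t} \right)^H \left( \tilde{\mathbf{r}}_{k,m_1,n_r,n_t,p} - \tilde{\mathbf{s}}_5 \hat{\mathbf{h}}_{k,n_r,n_t} \right). \tag{38}$$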
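A residual-based noise variance estimate of this kind can be sketched for a single link as follows. The function takes the received preamble block and its reconstruction from the channel estimate; dividing the residual energy by twice the block length (per real dimension) is an assumption made for this illustration, and the averaging over retransmissions and antenna pairs is left out. The sample values are made up.

```python
def noise_variance_per_dimension(received, reconstructed):
    """Residual energy divided by twice the number of complex samples."""
    residual_energy = sum(abs(a - b) ** 2 for a, b in zip(received, reconstructed))
    return residual_energy / (2 * len(received))


# Hypothetical 4-sample block: three residuals of magnitude 1, one of magnitude 0.
received = [1 + 1j, 2 + 0j, 0j, -1j]
reconstructed = [1 + 0j, 2 + 1j, 0j, 0j]
sigma2_hat = noise_variance_per_dimension(received, reconstructed)
print(sigma2_hat)
```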

Post-FFT operations
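In this section, we assume that the residual frequency offset $\omega_r$, that is, the difference between the true frequency offset $\omega_0$ and its average estimate (Eq. (39)), is such that

$$\left| \omega_r L_d \right| < 0.1 \text{ radians} \tag{40}$$

so that the effect of intercarrier interference (ICI) is negligible. Let $m_2$, obtained from $m_1$ in Eq. (31), denote the starting point of the data phase (Eq. (41)). Define the FFT input in the time domain for the $k$th retransmission and receive antenna $n_r$ as

$$\tilde{\mathbf{r}}_{k,m_2,n_r,d} = \left[ \tilde{r}_{k,m_2,n_r,d} \ \ldots\ \tilde{r}_{k,m_2+L_d-1,n_r,d} \right] \tag{42}$$

where we have followed the notation in Eq. (16). The $L_d$-point FFT of Eq. (42) yields $\tilde{R}_{k,i,n_r,d}$ for $0 \le i \le L_d - 1$, where $\tilde{R}_{k,i,n_r,d}$ is given by Eq. (18). Stacking $\tilde{R}_{k,i,n_r,d}$ over the $N$ receive antennas gives, for each subcarrier $i$ and retransmission $k$, the $N \times 1$ vector

$$\tilde{\mathbf{R}}_{k,i} = \tilde{\mathbf{H}}_{k,i}\, \mathbf{S}_{3,i} + \tilde{\mathbf{W}}_{k,i}$$

where $\tilde{\mathbf{H}}_{k,i}$ is the $N \times N$ matrix whose $(n_r, n_t)$ element is $\tilde{H}_{k,i,n_r,n_t}$, which is similar to Eq. (1) in [34]. Let $\hat{\mathbf{Y}}_{k,i}$ be computed from $\tilde{\mathbf{R}}_{k,i}$ and $\hat{\mathbf{H}}_{k,i}$, where $\hat{\mathbf{H}}_{k,i}$ is constructed from the $L_d$-point FFT of $\hat{\mathbf{h}}_{k,n_r,n_t}$ in Eq. (35) and $\hat{\mathbf{Y}}_{k,i}$ is similar to $\tilde{\mathbf{Y}}_k$ in Eq. (6) of [34]. The analysis when $\hat{\mathbf{H}}_{k,i} = \tilde{\mathbf{H}}_{k,i}$ is given in [34]. Let $\tilde{\mathbf{Y}}_i$ denote the result of combining $\hat{\mathbf{Y}}_{k,i}$ over the $N_{rt}$ retransmissions. Note that $\tilde{\mathbf{Y}}_i$ is an $N \times 1$ matrix, whose $n_t$th element $\tilde{Y}_{i,n_t}$ is a noisy version of $S_{3,i,n_t}$ in Eq. (18). The matrix constructed from the elements of $\tilde{\mathbf{Y}}_i$ in Eq. (49) is fed to the turbo decoder.

The forward ($\alpha$) and backward ($\beta$) recursions for decoder 1 of the turbo code are given by Eqs. (28) and (31) in [34]. The term $\gamma_{1,i,m,n}$ in Eq. (30) of [34] should be replaced by $\gamma_{1,i,m,n,n_t}$ of Eq. (51), where $\tilde{Y}_{i,n_t}$ is an element of $\tilde{\mathbf{Y}}_{n_t}$ in Eq. (50) and $n_t$ is an odd integer. The term $\sigma_U^2$ in Eq. (51) is given by Eq. (22) in [34], which is repeated here as Eq. (53) and is expressed in terms of $\hat{\sigma}_w^2$, given by Eq. (38), and $|\hat{H}_{k,i,n_r,n_t}|^2$, where $\hat{H}_{k,i,n_r,n_t}$ is obtained by taking the $L_d$-point FFT of Eq. (35). The term $F_{i,n_t}$ in Eq. (51) is given by Eq. (54) in terms of $|\hat{H}_{k,i,n_r,n_t}|^2$. The extrinsic information from decoder 1 to decoder 2 is computed using Eqs. (32) and (33) of [34], with $\gamma_{1,i,n,\rho^{+}(n)}$ replaced by $\gamma_{1,i,n,\rho^{+}(n),n_t}$. The equations for decoder 2 are similar, except that $\gamma_{2,i,m,n}$ in Eq. (34) of [34] should be replaced by $\gamma_{2,i,m,n,n_t}$, where again $n_t$ is an odd integer.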
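The per-subcarrier combining over retransmissions can be illustrated with a small sketch. It assumes, as a working hypothesis rather than a statement of the method in [34], that the combiner is a matched filter $\hat{\mathbf{H}}_{k,i}^H \tilde{\mathbf{R}}_{k,i}$ averaged over the $N_{rt}$ retransmissions, and that the gain on the desired symbol is the average of $\sum_{n_r} |\hat{H}_{k,i,n_r,n_t}|^2$. The $2 \times 2$ matrices are chosen with orthogonal columns so that the noise-free result can be checked by hand; all names and values are hypothetical.

```python
def mat_vec(H, S):
    """Multiply an N x N matrix (list of rows) by a length-N vector."""
    return [sum(H[nr][nt] * S[nt] for nt in range(len(S))) for nr in range(len(H))]


def matched_filter_combine(H_list, R_list):
    """Average H_k^H R_k over the retransmissions k (assumed combiner)."""
    n_rt, N = len(H_list), len(R_list[0])
    Y = [0j] * N
    for H, R in zip(H_list, R_list):
        for nt in range(N):
            Y[nt] += sum(H[nr][nt].conjugate() * R[nr] for nr in range(N))
    return [y / n_rt for y in Y]


def diagonal_gain(H_list, nt):
    """Average over k of sum_nr |H_k[nr][nt]|^2, the gain on symbol nt."""
    return sum(abs(H[nr][nt]) ** 2 for H in H_list for nr in range(len(H))) / len(H_list)


# Two retransmissions of one subcarrier, 2 x 2 channel, no noise.
H_list = [[[1, 1], [1, -1]], [[1j, 1], [1, 1j]]]
S = [1 + 1j, -1 + 1j]
R_list = [mat_vec(H, S) for H in H_list]

Y = matched_filter_combine(H_list, R_list)
soft_symbols = [Y[nt] / diagonal_gain(H_list, nt) for nt in range(len(S))]
print(soft_symbols)
```

With random channel matrices the columns are not exactly orthogonal, so each combined element also contains interference from the other transmit antennas; averaging over more retransmissions reduces that interference relative to the desired term.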

Throughput and spectral efficiency
Recall from Figure 3(a) that during the preamble phase, only one transmit antenna is active at a time, whereas during the data phase, all the transmit antennas are simultaneously active. Thus the throughput $T$ can be defined as in [36,37] (Eq. (57)).
The numerator of Eq. (57) denotes the total number of data bits transmitted, and the denominator represents the total number of QPSK symbol durations over $N_{rt}$ retransmissions. The symbol rate during the preamble phase and data phase is the same. In the data phase, we are transmitting coded QPSK, that is, in each data bit duration, two coded QPSK symbols are sent simultaneously from two transmit antennas (see Figure 3(b)). Thus, during the data phase, each transmit antenna sends half a bit of information in each transmission. The spectral efficiency follows from this (Eq. (58)). The throughput for various simulation parameters is given in Table 1. Observe that when $L_p = L_d/2$, $L_{cp} \ll L_d$, and $N \gg 1$, $T \to 1/N_{rt}$. In this work, we have used a rate-1/2 turbo code, that is, each data bit generates two coded QPSK symbols. The throughput can be doubled by using a rate-1 turbo code, obtained by puncturing.
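The limiting behaviour of the throughput can be explored with a short sketch. The frame length used here is an assumption inferred from the description of Figure 3(a): $N$ preambles of $L_p + L_{cp}$ samples sent one after the other, followed by one data block of $L_{cp} + L_d$ samples, with $N L_d / 2$ data bits per frame for the rate-1/2 code. The exact expression in Eq. (57) may differ in detail, and the function name and parameter values are illustrative.

```python
def throughput(N, L_p, L_d, L_cp, N_rt):
    """Data bits per QPSK symbol duration under the assumed frame structure."""
    data_bits = N * L_d / 2                       # half a bit per antenna per symbol
    frame_symbols = N * (L_p + L_cp) + L_cp + L_d  # assumed frame length
    return data_bits / (N_rt * frame_symbols)


# Illustrative parameters with L_p = L_d / 2.
t_small = throughput(N=8, L_p=512, L_d=1024, L_cp=18, N_rt=2)
t_limit = throughput(N=10 ** 6, L_p=512, L_d=1024, L_cp=0, N_rt=4)
print(t_small, t_limit)
```

Under this assumed frame structure the value approaches $1/N_{rt}$ as $N$ grows with $L_p = L_d/2$ and a negligible cyclic prefix, in agreement with the limit stated above.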

Simulation results
The simulation parameters are given in Table 2. A "run" in Table 2 is defined as transmitting and receiving the frame in Figure 3(a) over $N_{rt}$ retransmissions. The generating matrix of each of the constituent encoders of the turbo code is given by Eq. (49) in [21]. A question might arise: how does $N = 4, 8$ correspond to a massive MIMO system, whereas in [34] $N$ was as large as 512? The answer is that in [34] an ideal massive MIMO was considered, wherein the channel, timing, and carrier frequency offset were assumed to be known, whereas in this work, the channel, timing, and carrier frequency offset are estimated. The estimation complexity and memory requirement increase as $N^2$, for an $N \times N$ MIMO system. For example, the memory requirement of Eq. (27) when the number of frequency bins $B_1 = 1024$ [21], preamble length $L_p = 4096$, cyclic prefix length $L_{cp} = 18$, channel length $L_h = 10$, $N = 8$ transmit and receive antennas, and $N_{rt} = 4$ retransmissions is given in Eq. (59), in units of double precision values. In fact, Eq. (27) is implemented using multidimensional arrays in Scilab, instead of using for loops. Note that from Eq. (8), the length of the received signal during the preamble phase is $L_p + L_{cp} + L_h - 1$. If for loops are used, the memory requirement, again in double precision values, would be much less than Eq. (59); however, the simulations would run much slower. Does this mean that we cannot go higher than an $8 \times 8$ MIMO system? The answer is no. The solution lies in using multiple carrier frequencies as illustrated in Figure 4. Observe that with $8 \times 8$ MIMO and $M$ carrier frequencies, we get an overall $8M \times 8M$ MIMO system. The bit error rate results for a $4 \times 4$ MIMO system are shown in Figure 5. The bit error rate results for an $8 \times 8$ MIMO system are shown in Figure 6. The following observations can be made from Figures 5 and 6:
1. There is only 0.75 dB difference in performance between the ideal (id) and estimated (est) receiver, for $L_d = 1024$, $N_{rt} = 2$, and bit error rate equal to $10^{-4}$. On the other hand, there is hardly any performance difference between the ideal and estimated receiver for $L_d = 8192$. This is because the noise variance ($\sigma_w^2$) decreases with increasing $L_p$, $L_d$, for a given average SNR per bit, as shown by Eq. (21).
2. There is only 0.5 dB improvement in performance for $L_d = 8192$ over $L_d = 1024$, at a BER of $10^{-4}$.
3. There is a significant improvement in performance between $N_{rt} = 2$ and $N_{rt} = 4$, for both $L_d = 1024$ and $L_d = 8192$. On the other hand, there is no significant difference in the BER for $N_{rt} = 2$ and $N_{rt} = 4$ in [34]. The reason is that in this work, the channel $\tilde{H}_{k,i,n_r,n_t}$ in Eq. (18) is highly correlated over the subcarrier index $i$, since it is obtained by taking an $L_d$-point FFT of an $L_h$-tap channel (see Eq. (37) of [36]). On the other hand, the channel $\tilde{H}_{k,i,j}$ in [34] is independent over all the indices $k$, $i$, and $j$, where $k$ is the retransmission index, $i$ denotes the receive antenna index, and $j$ denotes the transmit antenna index. See also the discussion leading to Eq. (82) in [21].
4. There is not much BER performance difference between the $4 \times 4$ and $8 \times 8$ MIMO systems. A $4 \times 4$ MIMO is computationally less complex than $8 \times 8$; however, the $4 \times 4$ requires twice the number of carrier frequencies to achieve the same spectral efficiency as $8 \times 8$, for the same number of retransmissions.
5. The BER performance of the $8 \times 8$ MIMO system with $N_{rt} = 4$ and $L_d = 8192$ could not be simulated due to the large amount of memory involved (see Eq. (59)) and Scilab limitations.

Conclusions
This work describes the discrete-time algorithms for the implementation of a massive MIMO system. Due to implementation complexity considerations, more than one carrier frequency is required to obtain a truly single-user massive MIMO system. Each carrier frequency needs to be associated with an $8 \times 8$ or $4 \times 4$ MIMO subsystem. The average SNR per bit has been used as a performance measure, which has not been done earlier in the literature. Perhaps the channel can also be estimated using Eq. (27), instead of using Eq. (35). This needs investigation.