Synthesis performance comparison of CFO compensation with four techniques
Synchronization is unavoidable in every wireless receiver and plays a key role in system performance. It comprises timing synchronization and frequency synchronization. Timing synchronization detects a valid packet in noise and locates the accurate start position of the fast Fourier transform (FFT) window. Frequency synchronization corrects the phase error caused by the mismatch between the local oscillators (LO) of the transmitter and the receiver.
Synchronization techniques have been studied extensively for years. Although a UWB system can leverage the successful experience of orthogonal frequency division multiplexing (OFDM), it cannot use traditional synchronization techniques directly because of its distinct features. In the IEEE 802.15.3a standard, the specified emission power spectral density is only -41 dBm/MHz, which is extremely small compared with other wireless systems. This means that timing synchronization for a UWB system must be robust in a high-noise environment. In addition, to sustain a throughput of 528 Msps, the UWB baseband receiver should be designed with a parallel architecture. The inherent high complexity, together with the requirements of high performance, high speed, low cost and low power consumption, makes the design of synchronization blocks for UWB quite challenging.
This chapter will be divided into three parts: timing synchronization, coarse frequency synchronization and fine frequency synchronization. The traditional algorithms and innovative methods with low complexity and good performance will be introduced. Architecture design of each part is also provided.
2. Timing synchronization
As soon as the receiver starts up, it searches for the presence of an OFDM-based UWB packet in the received signal. Packet detection usually acquires only rough timing information by exploiting the repetition in the received signal. Accurate timing information, such as the symbol boundary or the start position of the FFT window, is also necessary; it is obtained by matching the received waveform against the preamble waveform with a matched filter.
2.1. Effects of timing offset
Assume the channel maximum delay is shorter than the guard interval; the position of FFT window can have several situations, as shown in Fig. 1. The exact start position of FFT window is at the boundary of region B and C. If the start position is in region B, the signals in FFT window will not be contaminated by the previous symbol and thus no inter-symbol interference (ISI) occurs. The only effect is introducing phase shift. After demodulation, the received signal with timing offset in region B is expressed in (1).
When the FFT window leads or lags by a large degree, such as in region A or C, ISI will be introduced and both the magnitude and the phase of the received signal will be distorted, as shown in (2).
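The region B case can be checked numerically. The following is a minimal sketch, assuming an ideal channel and a cyclic-prefix guard interval; the sizes `N` and `CP`, the test symbols and the helper names are illustrative choices, not taken from the chapter. It shows that starting the FFT window `d` samples early, but still inside the guard interval, only multiplies subcarrier k by exp(-j2πkd/N).

```python
import cmath

def dft(x):
    """Naive N-point DFT (adequate for a small illustration)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive N-point inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

N, CP = 16, 4  # illustrative FFT size and guard length (assumed values)
# Unit-modulus frequency-domain symbols (arbitrary test data).
X = [cmath.exp(2j * cmath.pi * ((3 * k * k) % 8) / 8) for k in range(N)]
body = idft(X)
tx = body[-CP:] + body  # cyclic prefix followed by the useful part

def demodulate(offset):
    """FFT over a window that starts `offset` samples before the ideal position."""
    start = CP - offset
    return dft(tx[start:start + N])

d = 2  # early start, still inside the guard interval (region B)
Y = demodulate(d)
rotation = [Y[k] / X[k] for k in range(N)]
expected = [cmath.exp(-2j * cmath.pi * k * d / N) for k in range(N)]
```

The magnitude of every subcarrier is preserved and only a phase ramp proportional to k appears, which is why an offset in region B can be absorbed by channel estimation.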
2.2. Timing synchronization algorithms
Timing synchronization can be divided into two categories: coarse timing synchronization and fine timing synchronization. Coarse timing synchronization is usually based on auto-correlation (AC), while fine timing synchronization is based on cross-correlation (CC). The traditional algorithms of AC, maximum likelihood (ML), minimum mean square error (MMSE) and CC will be introduced.
The AC algorithm (Schmidl & Cox, 1997) for coarse timing synchronization is quite straightforward. It searches for the repetition in the received signal with a correlator and a maximum searcher. Let the repetition interval length be denoted as
where * is the conjugated operation. The estimated time index of the maximum
If the maximum
σs²/σn² is the SNR. The estimated symbol boundary is found by searching for the maximum output of the ML function. The complexity of ML is quite high because estimating the SNR is difficult, and errors in the SNR estimate make the system less reliable.
MMSE metric (Minn et al., 2003) is equivalent to a special case of the ML metric with ρ = 1. It shows almost the same timing estimation performance as ML. The principle is to search the minimum output of the metric, as shown in (7).
For AC, ML and MMSE algorithms, when the preamble has more than two identical segments, there will be a plateau or a wide basin in the correlator output waveforms. Theoretically, the plateau or basin indicates the ISI-free region for FFT window. However, noise in the received signal may cause the max/min to drift away from the optimal point. So AC, ML and MMSE are the methods to detect packet coarsely and the detection of accurate symbol boundary or FFT window needs fine timing synchronization, such as CC.
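The plateau can be reproduced with a small experiment. This is a minimal sketch of a delay-and-correlate metric in the style of Schmidl & Cox, assuming a noise-free preamble made of three identical segments; the segment length `L`, the packet position `START` and the random test data are hypothetical.

```python
import random

def ac_metric(r, L):
    """Normalized metric |P(d)|^2 / R(d)^2 for a repetition interval of L samples."""
    out = []
    for d in range(len(r) - 2 * L + 1):
        p = sum(r[d + m].conjugate() * r[d + m + L] for m in range(L))
        e = sum(abs(r[d + m + L]) ** 2 for m in range(L))
        out.append(abs(p) ** 2 / e ** 2 if e > 0 else 0.0)
    return out

rng = random.Random(1)

def cgauss(scale):
    """Complex Gaussian sample with the given per-component standard deviation."""
    return complex(rng.gauss(0, scale), rng.gauss(0, scale))

L, START = 32, 40  # assumed segment length and packet start index
segment = [cgauss(1.0) for _ in range(L)]
received = ([cgauss(0.05) for _ in range(START)]    # weak noise before the packet
            + segment * 3                           # three identical preamble segments
            + [cgauss(1.0) for _ in range(2 * L)])  # payload-like samples
metric = ac_metric(received, L)
plateau = [d for d, v in enumerate(metric) if v > 0.999]
```

Every index from `START` to `START + L` reaches the maximum, so the peak position alone cannot pin down the symbol boundary; this is the ambiguity that fine timing synchronization resolves.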
CC is the mechanism for fine timing synchronization. Instead of correlating the noisy received waveform with its delayed version, CC is defined as correlating the received signal with preamble waveform (Fort et al., 2003). It can fit into the low SNR situation and can be expressed as
Dual-threshold (DT) detection scheme is based on the idea in (Fan et al., 2009) for OFDM-based UWB system. Fig. 2 shows the block diagram of the DT detection scheme. The signal detection process is divided into two steps. The first step is based on CC algorithm. Express the peak CC energy of each symbol as
The second step reads the moving sum from the FIFO and auto-correlates it with its delayed version. The energy of the cascaded auto-correlator can be derived as
where M is the repeated preamble interval length of UWB system. The delay interval of the auto-correlator is decided by the period of time frequency code (TFC). In order to ensure the moving sum and its delayed version are in the same band no matter what kind of TFC mode is adopted, the delay interval is set to six-symbol length. If the output energy of cascaded auto-correlator
Fig. 3 depicts the output waveforms of ML and MMSE algorithms at 10 dB SNR. There are plateaus and basins in the output waveforms of ML and MMSE, which make the peak energy ambiguous. It is much easier to find accurate timing information in the output waveform of CC in Fig. 4. However, there are glitches in CC output waveform, which will corrupt the detection of symbol boundary and increase the false alarm probability. The waveform of DT has much lower noise floor compared with CC and there is not any glitch.
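For comparison, the sharp peak of cross-correlation can be sketched as follows. The preamble here is a random ±1 sequence rather than the standard UWB preamble, and the length `M`, the position `START` and the noise level are assumptions chosen only for illustration.

```python
import random

def cross_correlate(r, preamble):
    """Energy |sum_m r[d+m] * conj(c[m])|^2 for every candidate start index d."""
    m_len = len(preamble)
    return [abs(sum(r[d + m] * preamble[m].conjugate() for m in range(m_len))) ** 2
            for d in range(len(r) - m_len + 1)]

rng = random.Random(7)

def noise():
    """Complex Gaussian noise sample (assumed noise level)."""
    return complex(rng.gauss(0, 0.7), rng.gauss(0, 0.7))

M, START = 64, 50  # assumed preamble length and true start index
preamble = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(M)]
received = ([noise() for _ in range(START)]
            + [p + noise() for p in preamble]
            + [noise() for _ in range(40)])
cc = cross_correlate(received, preamble)
estimated_start = max(range(len(cc)), key=cc.__getitem__)
```

Unlike the plateau of the auto-correlation metrics, the CC output has a single dominant peak at the true start, although smaller side peaks caused by noise and partial overlaps remain and correspond to the glitches mentioned above.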
2.3. Architecture of the matched filter
The matched filter is the basic component in timing synchronization for detecting a known piece of signal in noise. The architecture of the matched filter determines the complexity and the power consumption of the timing synchronizer. An optimum architecture of the matched filter for OFDM-based UWB is provided, as shown in Fig. 5. To sustain a throughput of 528 Msps, the UWB baseband receiver is designed at a 132 MHz clock frequency with four parallel paths and twelve pipeline stages. For low complexity, both the received signal and the preamble coefficients are truncated to their sign bits. In this case, five-bit multipliers can be replaced with XNOR gates. In addition, the 128 sign bits of the preamble coefficients are generated by spreading a 16 sign-bit sequence with an 8 sign-bit sequence as follows
3. Coarse frequency synchronization
OFDM-based UWB system is sensitive and vulnerable to carrier frequency offset (CFO), which can be estimated and compensated by coarse frequency synchronization in time domain. Due to the Doppler Effect, even very small CFO will lead to very serious accumulated phase shift after a certain period.
3.1. Effects of carrier frequency offset
Define the normalized CFO,
where Es/No is the ratio of symbol energy to noise power spectral density.
3.2. Frequency synchronization algorithm
The most straightforward frequency synchronization algorithm is based on AC functions. CFO can be estimated by the phase difference between two symbols. For traditional OFDM system, the CFO can be estimated as
Although the SWL can be further reduced for lower complexity, the performance degradation requires a much longer period sum average to compensate. Tradeoff in complexity, performance and the processing period,
For UWB, the CFO compensation algorithm can be optimized as well. The basic idea is to treat the CFO values on the four parallel paths as identical when the differences among them are very small (Fan & Choy, 2010a). In the UWB specification, the center frequency is about 4 GHz and the maximum impairment of the clock synthesizer is ±20 ppm (parts per million). With ±20 ppm at both the transmitter and the receiver, the worst-case offset is 40 ppm × 4 GHz = 160 kHz, or about 0.039 of the 4.125 MHz subcarrier spacing. Therefore, the normalized CFO should be less than 0.04, and the maximum CFO difference between any two parallel samples should be less than 2.5 × 10^-4, which is small enough to be ignored. The optimized CFO compensation scheme can be expressed as
3.3. Implementation of frequency synchronizer
The design of frequency synchronizer is divided into two parts. The first part is to estimate the phase difference between two preambles by AC and arctangent calculation. The second part is to compensate the signals by multiplying a complex rotation vector. In this part, the phase accumulator and sin/cos generator are involved.
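The two parts can be sketched in a few lines. This is a simplified floating-point illustration, not the four-parallel hardware design: `estimate_cfo`, `compensate_cfo`, the repetition spacing `D` and the test preamble are hypothetical, and the TFC band hopping of UWB is ignored.

```python
import cmath
import math

def estimate_cfo(r, n_fft, delay):
    """Normalized CFO (in subcarrier spacings) from two repetitions `delay` samples apart."""
    acc = sum(r[n].conjugate() * r[n + delay] for n in range(len(r) - delay))
    return cmath.phase(acc) * n_fft / (2 * math.pi * delay)

def compensate_cfo(r, eps, n_fft):
    """Multiply sample n by the complex rotation vector exp(-j*2*pi*eps*n/N)."""
    return [x * cmath.exp(-2j * math.pi * eps * n / n_fft) for n, x in enumerate(r)]

N, D = 128, 165  # FFT size and assumed spacing between repeated symbols
eps_true = 0.03  # assumed normalized CFO, below the 0.04 bound
base = [cmath.exp(2j * math.pi * ((5 * n * n) % 17) / 17) for n in range(D)]
clean = base + base  # two identical preamble symbols
received = [x * cmath.exp(2j * math.pi * eps_true * n / N)
            for n, x in enumerate(clean)]
eps_hat = estimate_cfo(received, N, D)
restored = compensate_cfo(received, eps_hat, N)
```

The arctangent appears as `cmath.phase` and the sine and cosine generator as `cmath.exp`; these are the functions whose hardware cost the rest of this section addresses. With this spacing the estimate is unambiguous only while |ε| < N/(2D).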
Fig. 7 shows the architecture of the CFO compensation block. The phase accumulator produces a digital sweep with a slope proportional to the input phase. The phase offset is scaled from [0, 2π] to [0, 8] by multiplying by a factor of 4/π, so that the three most significant bits (MSBs) alone select the phase offset region (octant). During CFO compensation, the sine and cosine values need to be calculated only for phase offsets in the range [0, π/4]. If the phase offset lies in another range, input complement, output complement or output swap is applied correspondingly.
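The octant mapping can be written out explicitly. In this minimal sketch `base_sincos` stands in for whatever kernel evaluates sine and cosine on [0, π/4] (here simply the math library); the function names are hypothetical and floating-point arithmetic replaces the fixed-point datapath.

```python
import math

def base_sincos(t):
    """sin and cos for t in [0, 1], where t maps to the angle t*pi/4."""
    a = t * math.pi / 4
    return math.sin(a), math.cos(a)

def sincos_by_octant(phase):
    """sin and cos of any phase using only evaluations on [0, pi/4]."""
    p = (phase % (2 * math.pi)) * 4 / math.pi  # scale [0, 2*pi) to [0, 8)
    octant = int(p) % 8                        # plays the role of the three MSBs
    frac = p - int(p)                          # remaining fractional bits
    if octant & 1:                             # input complement in odd octants
        frac = 1.0 - frac
    s, c = base_sincos(frac)
    if octant in (1, 2, 5, 6):                 # output swap
        s, c = c, s
    if octant >= 4:                            # output complement of sine
        s = -s
    if octant in (2, 3, 4, 5):                 # output complement of cosine
        c = -c
    return s, c
```

Only one eighth of the circle has to be approximated accurately, which is what keeps the lookup tables or polynomial units small.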
In the design of the frequency synchronizer, implementing the arctangent, sine and cosine functions is the most critical task, since it determines the complexity of the synchronizer and the performance of the UWB receiver. Traditional OFDM-based or CDMA-based systems usually employ the classic coordinate rotation digital computer (CORDIC) algorithm for function evaluation (Tsai & Chiueh, 2007; Troya et al., 2008). There are also other techniques for function evaluation, such as the polynomial hyperfolding technique (PHT) (Caro et al., 2004), the piecewise-polynomial approximation (PPA) technique (Caro & Steollo, 2005), the hybrid CORDIC algorithm (Caro et al., 2009) and the multipartite table method (MTM) (Caro et al., 2008).
PHT calculates sine and cosine functions using an optimized polynomial expression with constant coefficients. The sine and cosine functions can be expressed by polynomial expressions of degree K.
where 0 ≤ x < 1 is the scaled input of the sine and cosine functions. Optimization is conducted on second-order (K = 2) and third-order (K = 3) polynomial approximations, expressed as (19) and (20) respectively (Caro et al., 2004). The second-order PHT can achieve about 60 dBc spurious-free dynamic range (SFDR), while the third-order PHT can achieve 80 dBc SFDR.
The PPA technique is based on the idea of subdividing the interval into shorter subintervals. Polynomials of a given degree are used in each subinterval to approximate the trigonometric functions. The signal x represents the input phase scaled to a binary fraction in the interval [0, 1], which is subdivided into s subintervals, with s = 2^u. The u MSBs of x encode the segment starting point xk and are used as an address to the small lookup tables that store the polynomial coefficients. The remaining bits of x represent the offset x - xk. The quadratic PPA of sine and cosine functions can be expressed as (Caro & Steollo, 2005)
Fig. 9 shows the architecture of the sine and cosine blocks with PPA. The first-order and second-order coefficients are quantized with r bits and t bits respectively. The constant coefficients are (Q - 1) bits. The input and output of the sine and cosine functions are represented by P bits and Q bits. The constant, linear and quadratic coefficients are read from ROMs to carry out the polynomial calculation. The partial products are generated by the PPGen block to compute the linear terms, and the carry-save addition tree adds the partial products together after aligning all the bits according to their weights.
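The segment addressing of PPA can be illustrated with a floating-point sketch. The coefficients below come from a Taylor expansion at each segment start, which is an assumption made for simplicity; the cited work uses optimized, quantized coefficients, so the figures here are only indicative. `U`, `build_table` and `ppa` are hypothetical names.

```python
import math

U = 3                # u address bits (assumed value)
S = 1 << U           # s = 2**u segments
W = math.pi / 4      # x in [0, 1] maps to the angle x*pi/4

def build_table(f, d1, d2):
    """Constant, linear and quadratic coefficient of each segment, expanded at x_k."""
    return [(f(k / S), d1(k / S), d2(k / S) / 2) for k in range(S)]

SIN_TABLE = build_table(lambda x: math.sin(W * x),
                        lambda x: W * math.cos(W * x),
                        lambda x: -W * W * math.sin(W * x))
COS_TABLE = build_table(lambda x: math.cos(W * x),
                        lambda x: -W * math.sin(W * x),
                        lambda x: -W * W * math.cos(W * x))

def ppa(table, x):
    """Quadratic piecewise approximation: the MSBs pick the segment, the rest is the offset."""
    k = min(int(x * S), S - 1)   # segment index from the u MSBs
    dx = x - k / S               # offset x - x_k
    c0, c1, c2 = table[k]
    return c0 + dx * (c1 + dx * c2)

max_sin_err = max(abs(ppa(SIN_TABLE, i / 1000) - math.sin(W * i / 1000))
                  for i in range(1001))
max_cos_err = max(abs(ppa(COS_TABLE, i / 1000) - math.cos(W * i / 1000))
                  for i in range(1001))
```

Doubling the number of segments costs one more address bit but reduces the worst-case error of a quadratic segment by roughly a factor of eight, which is the tradeoff between ROM size and arithmetic width.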
The hybrid CORDIC approach splits the phase rotation into three steps. The first two steps are CORDIC-based, with the rotation directions computed in parallel. The final step is multiplier-based (Caro et al., 2009).
Suppose the word lengths of the input vector [Xin, Yin] and the output vector [Xout, Yout] are 12 and 13 bits respectively. Represent the rotation phase φ ∈ [0, π/4] with a binary fractional value in [0, 1] as
The least significant bit (LSB) of φ has a weight that will be denoted in the following as φLSB = (π/4)·2^-13. In the first step, the phase is divided into two subwords, φ = α + β, where
The goal of the first stage is to perform a rotation by an angle close to α + φLSB/2. To that purpose, the first rotation uses the CORDIC algorithm, which can be described by the following equations.
where σi is equal to the sign of Zi. The algorithm starts with X1 = Xin, Y1 = Yin and Z1 = α + φLSB/2.
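For readers unfamiliar with the sign-driven iteration, the sketch below implements the conventional (textbook) CORDIC rotation in floating point. It is not the hybrid variant of Caro et al.: the iteration index starts at 0, the gain is corrected at the end, and the function name and iteration count are assumptions.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` with shift-and-add micro-rotations (textbook CORDIC)."""
    z = angle
    gain = 1.0
    for i in range(iterations):
        sigma = 1 if z >= 0 else -1          # rotation direction = sign of the residual
        shift = 2.0 ** -i                    # a right shift by i bits in hardware
        x, y = x - sigma * y * shift, y + sigma * x * shift
        z -= sigma * math.atan(shift)        # elementary angle, from a small table
        gain *= math.sqrt(1.0 + shift * shift)
    return x / gain, y / gain                # remove the constant CORDIC gain

xr, yr = cordic_rotate(1.0, 0.0, 0.6)        # approximates (cos 0.6, sin 0.6)
```

Each direction σi here depends on the previous residual, which forces the iterations to run in sequence. Computing the directions in parallel from the bits of the phase word, as the hybrid scheme does, removes that dependency.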
The second and third stages rotate the output vector of the first stage by a phase γ = Zresidual + β, which is represented with 11 bits. γ is then split as the sum of two subwords γ1 + γ2, where
The second rotation is aimed to perform the rotation by the phase γ1. The rotation directions are obtained by the bits of γ1 as follows.
The corresponding CORDIC equations are
And the operation to be performed in the final rotation block can be written as
where [XT2, YT2] is the output vector of the second rotation. The absolute value of γ2 is smaller than 2^-6. Therefore, the sine and cosine functions can be approximated as sin γ2 ≈ γ2 and cos γ2 ≈ 1.
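The size of the error introduced by this approximation is easy to check. The sketch below assumes γ2 is expressed in radians and uses floating point; `small_angle_rotate` is a hypothetical name for the multiplier-based final rotation.

```python
import math

def small_angle_rotate(x, y, gamma):
    """Rotate (x, y) by gamma using sin(gamma) ~ gamma and cos(gamma) ~ 1."""
    return x - gamma * y, y + gamma * x

gamma = 2.0 ** -6                      # upper bound on |gamma2|
x, y = small_angle_rotate(1.0, 0.0, gamma)
cos_error = abs(math.cos(gamma) - x)   # close to gamma**2 / 2
sin_error = abs(math.sin(gamma) - y)   # close to gamma**3 / 6
```

At the bound, the cosine error is about γ2²/2 = 2^-13, which is on the order of one LSB of a 13-bit output, and the sine error is far smaller. Two multiplications therefore replace the remaining CORDIC stages at negligible cost in accuracy.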
The architecture of the hybrid CORDIC rotator is shown in Fig. 10. Each elementary stage is composed of adders and shifters. The two final vector merging adders (VMAs) convert the results to two’s complement representation.
MTM is a very effective lookup table compression technique for function evaluation. It has been found ideally suited for high-performance synthesizers, requiring both very small ROM size and simple arithmetic circuitry (Caro et al., 2008). The principle of MTM is to decompose the Q-bit input signal x into K + 1 non-overlapping sub-words x0, x1, …, xK with lengths q0, q1, …, qK respectively, where x = x0 + x1 + … + xK and Q = q0 + q1 + … + qK. The angle [0, π/4] is scaled to a binary fraction in [0, 1]. A piecewise linear approximation of f(x) can be expressed as
f(x) ≈ A(x0) + B(α1)x1 + … + B(αK)xK (29)
The interval of x is divided into 2^q0 subintervals. x0 represents the starting point of each subinterval and x1 + … + xK is the offset between x and x0 within that subinterval. α1 is a sub-word of x0 consisting of its p1 ≤ q0 MSBs. Likewise, αi (i = 2, …, K) is a sub-word of x0 consisting of its pi MSBs, with pi ≤ p(i-1). The term A(x0) can be realized with a ROM, named the table of initial values (TIV), with 2^q0 entries. The terms B(αi)xi (i = 1, …, K) can be implemented with K ROMs, named tables of offsets (TOi), with 2^(pi+qi) entries each. Making the TOs symmetric reduces the size of the ROMs by a factor of two. Then equation (29) becomes
where the coefficients can be calculated as follows (Caro et al., 2008).
The architecture of MTM with symmetric TOs is shown in Fig. 11. The content of TOs is conditionally added or subtracted from the content stored in TIV. The addition or subtraction of the content in ROMs and complement operation of the inputs are controlled by the MSB of each subword.
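The table decomposition can be illustrated with a small floating-point model for f(x) = sin(πx/4). This sketch assumes K = 2, a 12-bit input and the sub-word lengths shown. It does not exploit the TO symmetry, and it takes each slope at the centre of its α region rather than using the optimized coefficients of the cited work, so all names and parameters are hypothetical.

```python
import math

Q0, Q1, Q2 = 6, 3, 3     # sub-word lengths, Q = 12 (assumed split)
P1, P2 = 4, 2            # MSBs of x0 that address each table of offsets
W = math.pi / 4

def f(x):
    return math.sin(W * x)

def df(x):
    return W * math.cos(W * x)

# Table of initial values: f at the start of each of the 2**Q0 subintervals.
TIV = [f(a / 2 ** Q0) for a in range(2 ** Q0)]

def build_to(p, weight_bits, q):
    """TO[alpha][xi] = B(alpha) * xi, with the slope taken at the centre of the alpha region."""
    return [[df((a + 0.5) / 2 ** p) * (v / 2 ** weight_bits) for v in range(2 ** q)]
            for a in range(2 ** p)]

TO1 = build_to(P1, Q0 + Q1, Q1)          # 2**(P1+Q1) entries
TO2 = build_to(P2, Q0 + Q1 + Q2, Q2)     # 2**(P2+Q2) entries

def mtm_sin(word):
    """word is a 12-bit unsigned integer representing x = word / 4096."""
    x0 = word >> (Q1 + Q2)
    x1 = (word >> Q2) & (2 ** Q1 - 1)
    x2 = word & (2 ** Q2 - 1)
    return TIV[x0] + TO1[x0 >> (Q0 - P1)][x1] + TO2[x0 >> (Q0 - P2)][x2]

total_entries = len(TIV) + sum(len(row) for row in TO1) + sum(len(row) for row in TO2)
max_err = max(abs(mtm_sin(w) - f(w / 4096)) for w in range(4096))
```

With these parameters the three tables hold 224 entries in total instead of the 4096 entries of a direct lookup table, and the result is obtained with two additions and no multiplier.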
In order to give a fair comparison of the four techniques, each is used to implement the CFO compensation block. The design parameters are set to make the SFDR of the four techniques nearly the same. The inputs and outputs of the four algorithms are 12 bits. Synthesized with the UMC 0.13 μm high-speed library at a 132 MHz clock frequency, the power, area and latency of the four methods are listed in Table 1. MSE is a statistical value, so it is not easy to make the MSEs of the four approaches exactly the same, but they are very close. With the smallest MSE, MTM outperforms the other algorithms in area, power and latency. Since MTM proves to be an efficient approach for function evaluation, it can also be applied to implement the arctangent function in the CFO estimation block.
4. Fine frequency synchronization
Although CFO can be coarsely estimated by frequency synchronizer in time domain, the residual CFO (RCFO), sampling frequency offset (SFO) and common phase error will lead to accumulated phase shift after a certain period and thus degrade the system performance if they are not carefully tracked. In OFDM-based UWB systems, pilot subcarriers can help to solve the residual phase distortion issue in frequency domain, which is also called fine frequency synchronization.
4.1. Effects of sampling frequency offset
The oscillators used to generate the DAC and ADC sampling instants at the transmitter and receiver will never have exactly the same period. Thus, the sampling instants slowly shift relative to each other. The SFO has two main effects: a slow shift of the symbol timing, which rotates the subcarriers; and a loss of SNR due to the ICI generated by the slightly incorrect sampling instants, which destroys the orthogonality of the subcarriers.
Define the normalized sampling error as Δt = (T’ - T)/T, where T’ and T are the receiver and transmitter sampling periods respectively. Then the overall effect on the received signal in frequency domain is expressed as
where Ts and Tu are the durations of the total symbol and the useful data respectively. Wk,l is additive white Gaussian noise (AWGN) and the last term NΔt(k, l) is the additional interference due to the SFO. The power of the last term is approximated by
Hence the degradation grows as the square of the product of the offset Δt and the subcarrier index k. This means that the outermost subcarriers are most severely affected. The degradation can also be expressed directly as an SNR loss (Pollet et al., 1995)
The OFDM-based UWB system does not have a large number of subcarriers and the value of Δt is quite small. So kΔt << 1, and the interference caused by SFO can usually be ignored. However, the term describing the rotation angle experienced by the different subcarriers leads to a serious problem. Since the rotation angle depends on both the subcarrier index and the symbol index, it is largest for the outermost subcarrier and increases over consecutive symbols. Although Δt is very small, as the symbol index increases the phase shift will eventually corrupt the demodulation. In this case, tracking the SFO is necessary.
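A quick calculation shows how fast the rotation accumulates. The sketch below assumes the commonly used model in which subcarrier k of symbol l is rotated by 2π·k·Δt·l·Ts/Tu, together with illustrative values: a 40 ppm sampling offset, a symbol of 165 samples of which 128 are useful, and k = 55 as the outermost subcarrier considered.

```python
import math

def sfo_phase(k, l, delta_t, ts_over_tu):
    """Rotation angle in radians of subcarrier k in symbol l (assumed model)."""
    return 2 * math.pi * k * delta_t * l * ts_over_tu

DELTA_T = 40e-6            # assumed normalized sampling error
TS_OVER_TU = 165 / 128     # assumed ratio of total to useful symbol duration
K_OUTER = 55               # outermost subcarrier considered here

k_delta_t = K_OUTER * DELTA_T   # the quantity that must be << 1 for ICI to be negligible
phases = {l: sfo_phase(K_OUTER, l, DELTA_T, TS_OVER_TU) for l in (1, 10, 100)}
```

Under these assumptions kΔt is only about 0.002, so the ICI term is indeed negligible, yet after 100 symbols the outermost subcarrier has rotated by well over one radian.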
4.2. Phase tracking algorithms
Conventionally, SFO can be estimated by computing a slope from the plot of pilot subcarrier differences versus pilot subcarrier indices (Speth et al., 2001). Recently, joint estimation of CFO and SFO has also been studied extensively, such as the linear least squares (LLS) algorithm (Liu & Chong, 2002) and joint weighted least squares (WLS) algorithm (Tsai et al., 2005).
The received signal with residual phase distortion in frequency domain after removing the channel noise can be modeled as
where Pk,l is the phase distortion vector and Φk,l is the residual phase error. The relationship of α, βl and Φk,l is shown in Fig. 12. α is the slope of the phase distortion and is contributed by SFO. βl is the intercept of the phase distortion and is caused by the RCFO of symbol l.
The basic idea of AC is to get the phase differences of pilot subcarriers between two symbols.
The pilot subcarriers are divided into two parts, C1 and C2. C1 is on the left of the spectrum, and C2 is on the right of the spectrum. Then the estimated intercept phase βl and the slope α are written as (Speth et al., 2001)
Such an estimation algorithm that is based on the phase differences between two symbols can remove the common channel fading terms in slow-fade scenarios. Consequently, this estimation scheme can be applied before channel estimation and equalization.
Though the joint LLS estimation algorithm provides accurate estimates in the AWGN channel, diverse channel responses on the pilot subcarriers can render its estimates useless. For instance, the phases of several deeply faded pilot subcarriers, when used in the joint LLS estimation, can lead to a large error in the results. On the other hand, the phases of subcarriers with little fading are naturally more reliable. Therefore, weighting the subcarrier data is advantageous, and the data of seriously faded subcarriers should be assigned smaller weights to minimize their adverse effect on estimation accuracy. The WLS algorithm for joint estimation of RCFO and SFO can be expressed as (Tsai et al., 2005)
The weight ωi should be inversely proportional to the variance of the phase error, which depends on noise, ICI and the complex channel gain. Usually, the residual synchronization error is so small that the ICI term can be neglected and ωi depends only on the channel gain of the pilot subcarriers. The disadvantage is that this algorithm is very complicated, especially the computation of ωi. Unless ωi is estimated accurately, there will be a large error in phase tracking.
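The benefit of weighting can be seen with a generic weighted least-squares line fit of the pilot phases to αk + β. This is a sketch of the principle only, not necessarily the exact estimator of Tsai et al.; the pilot indices, the weights and the corrupted pilot are made-up test data.

```python
def wls_line_fit(k, phi, w):
    """Minimize sum_i w_i * (phi_i - alpha*k_i - beta)**2; return (alpha, beta)."""
    sw = sum(w)
    sk = sum(wi * ki for wi, ki in zip(w, k))
    skk = sum(wi * ki * ki for wi, ki in zip(w, k))
    sp = sum(wi * pi for wi, pi in zip(w, phi))
    skp = sum(wi * ki * pi for wi, ki, pi in zip(w, k, phi))
    det = sw * skk - sk * sk
    alpha = (sw * skp - sk * sp) / det
    beta = (skk * sp - sk * skp) / det
    return alpha, beta

pilots = [-55, -45, -35, -25, -15, -5, 5, 15, 25, 35, 45, 55]  # assumed pilot indices
alpha_true, beta_true = 1e-3, 0.05                              # assumed slope and intercept
phases = [alpha_true * k + beta_true for k in pilots]

# A deeply faded pilot: its measured phase is badly wrong, so it gets a small weight.
faded = list(phases)
faded[0] += 0.8
equal_w = [1.0] * len(pilots)
small_w = [0.01] + [1.0] * (len(pilots) - 1)

alpha_u, beta_u = wls_line_fit(pilots, faded, equal_w)   # unweighted (LLS-like) fit
alpha_w, beta_w = wls_line_fit(pilots, faded, small_w)   # weighted fit
```

With equal weights the single unreliable pilot pulls both the slope and the intercept away from their true values, while the down-weighted fit stays close to them. The difficulty in practice lies in obtaining reliable weights.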
In traditional phase tracking solutions, arctangent, sine and cosine functions are necessary, and they are quite complicated to implement in hardware. The algorithm presented in (Troya et al., 2007) reduces the hardware cost significantly compared with the traditional approaches. However, it sacrifices system performance slightly. In (Fan & Choy, 2010b), a novel phase tracking method for UWB is proposed. It not only has low complexity, but also improves the performance.
Considering that the condition |αk| << 1 is satisfied for k ∈ [-55, 55], the first-order approximations cos(αk) ≈ 1 and sin(αk) ≈ αk can be made. Then the phase distortion in (35) can be rewritten as
Pk,l ≈ (cosβl - αk sinβl) + j(sinβl + αk cosβl) (42)
In (42), four parameters are of interest: sinβl, cosβl, α sinβl and α cosβl. The former two can be easily obtained by
where Re{·} and Im{·} denote the real and imaginary parts respectively. There are 12 pilots in each symbol of the OFDM-based UWB system. Since 1/8 is much easier to implement than 1/12, and the pilots near the DC subcarrier suffer more channel noise than those far from it, as many of the outermost pilots as possible should be used.
Approximating the scaling factor 1/260 to 1/256, which can be easily implemented by 8-bit right-shifting, the parameters of α sinβl and α cosβl are given by
Among the traditional algorithms, LLS and WLS have better phase tracking performance than AC, but their complexity is too high for practical application. For hardware implementation, AC offers low complexity and moderate phase correction performance. Therefore, the MSE performance of the novel approach for UWB is compared with that of AC under different phase distortion conditions, as shown in Fig. 13. The novel phase tracking method for UWB performs much better than the traditional AC algorithm. In addition, as the phase error increases, the traditional AC algorithm degrades seriously, whereas the novel method does not.
4.3. Architecture of the phase tracking block
The architecture of the phase tracking block with the novel approach for UWB is shown in Fig. 14. The signals after channel equalization are stored in the pilot buffer and the data buffer separately. Considering that the transmitted pilots are known and have a modulus of one, the phase error vector of the pilots can be derived by multiplying by the conjugate of the transmitted pilots. As shown in Fig. 14, no arctangent, sine or cosine function is needed; they are replaced by eight complex adders and two complex shifters.
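The idea of tracking the phase without any trigonometric function can be sketched as follows. This is an illustration under stated assumptions, not the exact estimator of Fan & Choy: it uses all 12 pilots at assumed indices ±5, ±15, …, ±55, an exact normalizer instead of the shift-friendly scaling, a noise-free equalized signal, and hypothetical function names.

```python
import cmath
import math

PILOTS = [-55, -45, -35, -25, -15, -5, 5, 15, 25, 35, 45, 55]  # assumed pilot indices

def track_phase(rx_pilots, tx_pilots):
    """Estimate cos(beta), sin(beta), alpha*cos(beta), alpha*sin(beta) with sums only."""
    # Phase error vectors: received pilot times the conjugate of the known pilot.
    p = [r * t.conjugate() for r, t in zip(rx_pilots, tx_pilots)]
    mean = sum(p) / len(p)                     # ~ cos(beta) + j*sin(beta)
    # Right-half pilots minus left-half pilots ~ j*alpha*exp(j*beta)*sum(|k|).
    s = sum(v if k > 0 else -v for k, v in zip(PILOTS, p))
    scale = sum(abs(k) for k in PILOTS)
    return mean.real, mean.imag, s.imag / scale, -s.real / scale

def correct(y, k, cos_b, sin_b, a_cos, a_sin):
    """Undo the first-order phase distortion on subcarrier k."""
    est = complex(cos_b - k * a_sin, sin_b + k * a_cos)
    return y * est.conjugate()

alpha, beta = 2e-4, 0.1                        # assumed slope and intercept
tx = [cmath.exp(1j * math.pi / 4)] * len(PILOTS)
rx = [t * cmath.exp(1j * (alpha * k + beta)) for k, t in zip(PILOTS, tx)]
cos_b, sin_b, a_cos, a_sin = track_phase(rx, tx)
residual = cmath.phase(correct(cmath.exp(1j * (alpha * 50 + beta)), 50,
                               cos_b, sin_b, a_cos, a_sin))
```

All four parameters come from additions, one sign selection and a constant scaling, and the correction itself is a complex multiplication, which mirrors the adder and shifter structure described for Fig. 14.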
The values of the parameters α sinβl and α cosβl are very small, so the phase errors contributed by SFO on four parallel data samples can be treated as approximately the same and rewritten as α⌈k/4⌉sinβl and α⌈k/4⌉cosβl (⌈k/4⌉ ∈ [-12, 12]). Calculating the parameters of 4α sinβl and 4α cosβl instead of
This chapter provides a comprehensive review of the algorithms and architectures for timing and frequency synchronization. Although there is a large body of literature on UWB synchronization techniques, most of it does not take real applications or implementation into account. This chapter introduces the three parts of the synchronization process.
In timing synchronization, the DT detection scheme improves the detection performance significantly thanks to the cascaded auto-correlator. Although it also increases the hardware cost slightly, the optimum low-complexity architecture of the matched filter saves hardware. In coarse frequency synchronization, the CFO estimation approach can be simplified by shortening the SWL, and the sum average over three subbands compensates for the SNR degradation. MTM proves to be a low-cost, low-power and high-speed approach to implementing the arctangent, sine and cosine functions compared with other function evaluation techniques. In fine frequency synchronization, a novel phase tracking approach for UWB is presented that achieves good performance. Additionally, it needs no computation-intensive arctangent, sine or cosine unit; these are replaced by adders and shifters, which indicates that the implementation complexity of the novel phase tracking method is low.
These low-complexity and power-efficient synchronization techniques make it possible to develop robust, low-cost, low-power and high-speed OFDM-based UWB receivers.