Synthesis performance comparison of CFO compensation with four techniques


## 1. Introduction

Synchronization is an unavoidable issue in every wireless communication receiver, and it plays a key role in system performance. Synchronization comprises timing synchronization and frequency synchronization. Timing synchronization detects a valid packet in the presence of noise and locates the accurate start position of the fast Fourier transform (FFT) window. Frequency synchronization corrects the phase error caused by the mismatch between the local oscillators (LOs) of the transmitter and the receiver.

Synchronization techniques have been studied extensively for years. Ultra-wideband (UWB) systems can leverage the successful experience of orthogonal frequency division multiplexing (OFDM). However, their distinct features prevent them from using traditional synchronization techniques directly. In the IEEE 802.15.3a standard, the specified emission power spectral density is only -41 dBm/MHz, which is extremely small compared with other wireless systems. Timing synchronization for UWB must therefore be robust in a high-noise environment. In addition, to satisfy the 528 Msps throughput, the UWB baseband receiver has to be designed as a parallel architecture. The inherent high complexity, together with the requirements of high performance, high speed, low cost, and low power consumption, makes the design of synchronization blocks for UWB a challenging task.

This chapter is divided into three parts: timing synchronization, coarse frequency synchronization, and fine frequency synchronization. For each part, the traditional algorithms are reviewed, innovative methods with low complexity and good performance are introduced, and the architecture design is provided.

## 2. Timing synchronization

As soon as the receiver starts up, it searches the received signal for the presence of an OFDM-based UWB packet. Packet detection usually acquires only rough timing information by exploiting the repetition in the received signal. Accurate timing information, such as the symbol boundary or the start position of the FFT window, is also necessary. It is obtained by matching the received waveform with the preamble waveform through a matched filter.

### 2.1. Effects of timing offset

Assuming the maximum channel delay is shorter than the guard interval, the start position of the FFT window falls into one of several regions, as shown in Fig. 1. The exact start position of the FFT window is at the boundary between regions B and C. If the start position is in region B, the samples in the FFT window are not contaminated by the previous symbol, so no inter-symbol interference (ISI) occurs; the only effect is a phase shift. After demodulation, the received signal with a timing offset in region B is expressed in (1).

where $X_{k,l}$, $H_{k,l}$ and $W_{k,l}$ are the transmitted signal, the channel impulse response (CIR) and the noise, respectively, at the $k$-th subcarrier and the $l$-th symbol in the frequency domain, and $\Delta n$ is the delay, in samples, relative to the correct FFT window position.
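The region-B behaviour can be checked numerically. The following is a minimal Python sketch that assumes a generic cyclic-prefix OFDM symbol, a noise-free single-path channel, and an FFT window that starts `delta_n` samples early while staying inside the guard interval. The helper names are hypothetical. Under these assumptions, each subcarrier is multiplied by the unit-modulus factor $e^{-j2\pi k\Delta n/N}$, so only the phase changes. The sign of the exponent flips if $\Delta n$ is counted in the opposite direction.

```python
import cmath
import random


def dft(x):
    """Naive N-point DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    n_fft = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)]


def idft(X):
    """Naive N-point inverse DFT."""
    n_fft = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / n_fft) for k in range(n_fft)) / n_fft
            for n in range(n_fft)]


def early_window_ratios(n_fft=16, cp_len=4, delta_n=2, seed=1):
    """Return Y[k] / X[k] when the FFT window starts delta_n samples early (inside the CP)."""
    rng = random.Random(seed)
    # QPSK symbol on every subcarrier (noise-free, single-path channel assumed)
    X = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(n_fft)]
    x = idft(X)
    tx = x[-cp_len:] + x            # prepend the cyclic prefix
    start = cp_len - delta_n        # early start, still ISI-free
    Y = dft(tx[start:start + n_fft])
    return [Y[k] / X[k] for k in range(n_fft)]


ratios = early_window_ratios()
```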

When the FFT window leads or lags by a large amount, as in region A or C, ISI is introduced. Both the magnitude and the phase of the received signal are then distorted, as shown in (2).

where $W_{\mathrm{ISI}}$ is the introduced ISI noise. Because of the ISI and the phase rotation, the signal also suffers a slight magnitude attenuation.

### 2.2. Timing synchronization algorithms

Timing synchronization can be divided into two categories: coarse and fine. Coarse timing synchronization is usually based on auto-correlation (AC), whereas fine timing synchronization is based on cross-correlation (CC). The traditional AC, maximum likelihood (ML), minimum mean square error (MMSE), and CC algorithms are introduced below.

The AC algorithm (Schmidl & Cox, 1997) for coarse timing synchronization is quite straightforward. It searches for the repetition in the received signal with a correlator and a maximum searcher. Let $L$ denote the length of the repetition interval and $r_n$ the received signal in the time domain. The timing metric can be defined as

$$M(n)=\frac{\left|\sum_{m=0}^{L-1} r_{n+m}^{*}\,r_{n+m+L}\right|^{2}}{\left(\sum_{m=0}^{L-1}\left|r_{n+m+L}\right|^{2}\right)^{2}}$$

where $(\cdot)^{*}$ denotes complex conjugation. The estimated time index is the position of the maximum of $M(n)$:

$$\hat{n}=\arg\max_{n} M(n)$$

If the maximum of $M(n)$ exceeds the threshold, a packet is declared present and the estimated timing index is taken as the symbol boundary. The drawback of this scheme is that the timing metric $M(n)$ may not fall off as expected when the correlation window moves away from the repeated period, especially at low signal-to-noise ratio (SNR). In this case, the detected symbol boundary may have a large error.
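As a minimal sketch of the AC metric, assume the Schmidl & Cox form $M(n)=|P(n)|^2/R(n)^2$, with $P(n)=\sum_{m=0}^{L-1} r_{n+m}^{*} r_{n+m+L}$ and $R(n)=\sum_{m=0}^{L-1}|r_{n+m+L}|^2$. The following Python function evaluates $M(n)$ and picks its maximum. The test signal is a hypothetical example chosen so that the peak is unambiguous: unit-modulus random samples, no noise, and a two-segment preamble starting at index 20.

```python
import cmath
import random


def ac_timing_metric(r, L):
    """Schmidl & Cox style metric M(n) = |P(n)|^2 / R(n)^2 for every valid start n."""
    metric = []
    for n in range(len(r) - 2 * L + 1):
        p = sum(r[n + m].conjugate() * r[n + m + L] for m in range(L))
        energy = sum(abs(r[n + m + L]) ** 2 for m in range(L))
        metric.append(abs(p) ** 2 / energy ** 2)
    return metric


def unit_phasors(count, rng):
    """Random constant-modulus samples exp(j*theta)."""
    return [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(count)]


rng = random.Random(7)
L = 8
half = unit_phasors(L, rng)
# 20 samples of "noise", a preamble made of two identical halves, then data
r = unit_phasors(20, rng) + half + half + unit_phasors(20, rng)
M = ac_timing_metric(r, L)
n_hat = max(range(len(M)), key=M.__getitem__)   # estimated symbol boundary
```

With constant-modulus samples, $M(n) \le 1$, and equality holds only where the two windows match. With a preamble that has more than two identical segments, the same code would show the plateau discussed below.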

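The ML algorithm (Van de Beek et al., 1997; Coulson, 2001) improves on the performance of AC. The ML function can be expressed as

$$\Lambda(n)=\left|\sum_{m=0}^{L-1} r_{n+m}\,r_{n+m+L}^{*}\right|-\frac{\rho}{2}\sum_{m=0}^{L-1}\left(\left|r_{n+m}\right|^{2}+\left|r_{n+m+L}\right|^{2}\right),\qquad \rho=\frac{\sigma_{s}^{2}/\sigma_{n}^{2}}{\sigma_{s}^{2}/\sigma_{n}^{2}+1}$$

where $\sigma_{s}^{2}/\sigma_{n}^{2}$ is the SNR. The estimated symbol boundary is found by searching for the maximum output of the ML function. The complexity of ML is high because the SNR is difficult to estimate, and errors in the SNR estimate make the system less reliable.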

The MMSE metric (Minn et al., 2003) is equivalent to a special case of the ML metric with ρ = 1, and its timing estimation performance is almost the same as that of ML. The symbol boundary is found by searching for the minimum output of the metric, as shown in (7).
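The ML and MMSE metrics differ from AC mainly in how the energy term is weighted. The sketch below assumes the Van de Beek form $\Lambda(n)=|\gamma(n)|-\rho\Phi(n)$, with $\gamma(n)=\sum_{m} r_{n+m} r_{n+m+L}^{*}$, $\Phi(n)=\tfrac12\sum_{m}(|r_{n+m}|^2+|r_{n+m+L}|^2)$ and $\rho=\mathrm{SNR}/(\mathrm{SNR}+1)$. It takes the MMSE-type metric as $\Phi(n)-|\gamma(n)|$, the $\rho=1$ case, which is minimized. Function names and the synthetic signal are illustrative assumptions.

```python
import cmath
import random


def ml_and_mmse_metrics(r, L, snr):
    """ML metric |gamma| - rho*Phi (maximize) and rho = 1 MMSE-type metric Phi - |gamma| (minimize)."""
    rho = snr / (snr + 1.0)
    ml, mmse = [], []
    for n in range(len(r) - 2 * L + 1):
        gamma = sum(r[n + m] * r[n + m + L].conjugate() for m in range(L))
        phi = 0.5 * sum(abs(r[n + m]) ** 2 + abs(r[n + m + L]) ** 2 for m in range(L))
        ml.append(abs(gamma) - rho * phi)
        mmse.append(phi - abs(gamma))
    return ml, mmse


def unit_phasors(count, rng):
    """Random constant-modulus samples exp(j*theta)."""
    return [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(count)]


rng = random.Random(7)
L = 8
half = unit_phasors(L, rng)
r = unit_phasors(20, rng) + half + half + unit_phasors(20, rng)  # preamble starts at 20
ml, mmse = ml_and_mmse_metrics(r, L, snr=10.0)
n_ml = max(range(len(ml)), key=ml.__getitem__)
n_mmse = min(range(len(mmse)), key=mmse.__getitem__)
```

In practice the ML metric needs the `snr` argument, which is exactly the SNR estimate the text identifies as the weak point. The $\rho=1$ metric avoids it.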

For the AC, ML, and MMSE algorithms, the correlator output exhibits a plateau or a wide basin when the preamble has more than two identical segments. Theoretically, the plateau or basin indicates the ISI-free region for the FFT window. However, noise in the received signal may cause the maximum or minimum to drift away from the optimal point. AC, ML, and MMSE are therefore coarse packet detection methods. Detecting the accurate symbol boundary or FFT window position requires fine timing synchronization, such as CC.

CC is the mechanism for fine timing synchronization. Instead of correlating the noisy received waveform with its delayed version, CC correlates the received signal with the preamble waveform (Fort et al., 2003). It is suitable for low-SNR conditions and can be expressed as

where $c_k$ denotes the preamble coefficients and $Q$ is the length of the preamble.

The dual-threshold (DT) detection scheme for OFDM-based UWB systems is based on the idea in (Fan et al., 2009). Fig. 2 shows the block diagram of the DT detection scheme. Signal detection is divided into two steps. The first step is based on the CC algorithm. The peak CC energy of each symbol is expressed as

Here the moving sum of the CC values is indexed by $n$, $c[k]$ denotes the preamble coefficients, $r[k]$ is the received signal, and $N$ is the FFT size. If the peak CC energy $T_1$ exceeds the first threshold, the estimated sample index of the symbol boundary and the moving sum are stored in a FIFO for later use by the following auto-correlator. Otherwise, the peak CC energy of the next symbol is calculated.

The second step reads the moving sum from the FIFO and auto-correlates it with its delayed version. The energy of the cascaded auto-correlator can be derived as

where $M$ is the repeated preamble interval length of the UWB system. The delay interval of the auto-correlator is determined by the period of the time-frequency code (TFC). The delay interval is set to six symbols so that the moving sum and its delayed version are in the same band regardless of the TFC mode. If the output energy of the cascaded auto-correlator, $T_2$, exceeds the second threshold, the packet is detected. Otherwise, the next value in the FIFO is fetched and $T_2$ is calculated again.

Fig. 3 depicts the output waveforms of the ML and MMSE algorithms at 10 dB SNR. The plateaus and basins in these waveforms make the peak position ambiguous. Accurate timing information is much easier to find in the CC output waveform in Fig. 4. However, the CC output contains glitches, which corrupt the detection of the symbol boundary and increase the false alarm probability. The DT output has a much lower noise floor than CC and shows no glitches.

### 2.3. Architecture of the matched filter

The matched filter is the basic component in timing synchronization for detecting a known signal in noise. Its architecture determines the complexity and power consumption of the timing synchronizer. An optimized matched filter architecture for OFDM-based UWB is shown in Fig. 5. To satisfy the 528 Msps throughput, the UWB baseband receiver is designed at a 132 MHz clock frequency with four parallel paths and twelve pipeline levels. For low complexity, both the received signal and the preamble coefficients are truncated to their sign bits, so the five-bit multipliers can be replaced with XNOR gates. In addition, the 128 sign bits of the preamble coefficients are generated by spreading a 16-sign-bit sequence with an 8-sign-bit sequence as follows

where the coefficients of the 16-bit sequence (indexed by $i$) and of the 8-bit sequence (indexed by $j$) are each 1 or -1. According to (12), the 128-tap matched filter can be decomposed into a 16-tap filter cascaded with an 8-tap filter, as shown in Fig. 5. This decomposition reduces the processing period of the matched filter to 19% and the length of the circular shift register to 20. In CC operation, when the shift register is full, the data at addresses [5:20] are shifted to [1:16] and the four incoming sign bits are saved to addresses [17:20]. The data at addresses [1:16], [2:17], [3:18], and [4:19] are distributed to the four parallel data paths and cross-correlated with the 16 coefficients indexed by $i$. This architecture guarantees high speed while also reducing the hardware cost.
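The cascade decomposition and the XNOR trick can be checked in a few lines. This Python sketch assumes the 128 coefficients are formed as $c[16j+i]=a_i b_j$; this index ordering in (12) is our assumption. `a` (16 chips) and `b` (8 chips) are hypothetical random ±1 sequences. The sketch computes the correlation two ways: directly with 128 taps, and as a 16-tap filter followed by an 8-tap filter whose taps are spaced 16 samples apart.

```python
import random


def to_sign(bit):
    """Map a sign bit to its value: 1 -> +1, 0 -> -1."""
    return 1 if bit else -1


def xnor_product(bit_a, bit_b):
    """Sign-bit multiplication: the product of two +/-1 values is the XNOR of their bits."""
    return 1 - (bit_a ^ bit_b)


def direct_cc(r, c):
    """Full-length correlation y[n] = sum_k c[k] * r[n + k]."""
    return [sum(c[k] * r[n + k] for k in range(len(c)))
            for n in range(len(r) - len(c) + 1)]


def decomposed_cc(r, a, b):
    """16-tap filter on contiguous samples, cascaded with an 8-tap filter of stride len(a)."""
    z = direct_cc(r, a)                    # z[n] = sum_i a[i] * r[n + i]
    stride = len(a)
    span = stride * (len(b) - 1)
    return [sum(b[j] * z[n + stride * j] for j in range(len(b)))
            for n in range(len(z) - span)]


rng = random.Random(3)
a = [rng.choice((-1, 1)) for _ in range(16)]            # hypothetical 16-chip sequence
b = [rng.choice((-1, 1)) for _ in range(8)]             # hypothetical 8-chip sequence
c = [a[i] * b[j] for j in range(8) for i in range(16)]  # c[16*j + i] = a[i] * b[j]
r = [rng.choice((-1, 1)) for _ in range(300)]           # sign-bit received samples
same = direct_cc(r, c) == decomposed_cc(r, a, b)
```

Per output sample, the cascade needs 16 + 8 = 24 sign multiplications instead of 128, i.e. 24/128 ≈ 19%. This is consistent with the reduction quoted above, provided the processing period scales with the tap count.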

## 3. Coarse frequency synchronization

The OFDM-based UWB system is sensitive and vulnerable to carrier frequency offset (CFO), which can be estimated and compensated by coarse frequency synchronization in the time domain. CFO arises from LO mismatch and from the Doppler effect. Even a very small CFO leads to a serious accumulated phase shift after a certain period.

### 3.1. Effects of carrier frequency offset

Define the normalized CFO, $\varepsilon_f = \Delta f / f_s$, as the ratio of the CFO $\Delta f$ to the subcarrier frequency spacing $f_s$. The received signal with CFO in the frequency domain can be expressed as (Moose, 1994)

where $X_{k,l}$, $H_{k,l}$ and $W_{k,l}$ stand for the transmitted signal, the channel impulse response, and the noise, respectively, at the $k$-th subcarrier and the $l$-th symbol. $W_{\mathrm{ICI}}$ is the noise contributed by inter-carrier interference (ICI). ICI not only destroys the orthogonality of the subcarriers in the OFDM-based UWB system but also degrades the SNR. The SNR degradation can be approximated as (Pollet et al., 1995)

$$D \approx \frac{10}{3\ln 10}\left(\pi\varepsilon_f\right)^{2}\frac{E_s}{N_0}\ \text{dB}$$

where $E_s/N_0$ is the ratio of symbol energy to noise power spectral density.
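To give a feel for the magnitudes involved, the sketch below evaluates the commonly cited small-offset form of this degradation, $D \approx \frac{10}{3\ln 10}(\pi\varepsilon_f)^2 E_s/N_0$ dB, which is taken here as an assumption. The function name is hypothetical. With $\varepsilon_f = 0.04$ (the UWB worst case discussed later in this section) and $E_s/N_0 = 10$ dB, the loss is roughly 0.23 dB.

```python
import math


def cfo_snr_degradation_db(eps_f, es_n0_db):
    """Approximate SNR degradation (dB) caused by normalized CFO eps_f (small-offset form)."""
    es_n0 = 10.0 ** (es_n0_db / 10.0)          # dB -> linear
    return 10.0 / (3.0 * math.log(10.0)) * (math.pi * eps_f) ** 2 * es_n0


loss_worst = cfo_snr_degradation_db(0.04, 10.0)
loss_small = cfo_snr_degradation_db(0.01, 10.0)
```

Because the loss grows with $\varepsilon_f^2$, quadrupling the offset multiplies the degradation by 16.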

### 3.2. Frequency synchronization algorithm

The most straightforward frequency synchronization algorithm is based on AC functions, which estimate the CFO from the phase difference between two symbols. For a traditional OFDM system, the normalized CFO can be estimated as

$$\hat{\varepsilon}_f=\frac{N}{2\pi M}\arg\left(\sum_{n=0}^{N-1} r_{n}^{*}\,r_{n+M}\right)$$

where $N$ is the FFT size and $M$ is the interval between the two symbols. Applying the traditional AC algorithm to the UWB system gives a sliding window length (SWL) of 128, and a four-parallel architecture with an SWL of 128 has high complexity. Shortening the SWL reduces the complexity at the cost of estimation performance. To improve performance while keeping complexity low, an optimized AC algorithm shortens the SWL to 64 and averages the sum over three symbols located in three different subbands, as expressed in (16).

where $L$ denotes the SWL of each symbol. The values of $G_i$ ($i = 1, 2$) depend on the TFC. If the TFC is {1 2 3 1 2 3} or {1 3 2 1 3 2}, $G_1 = 3$ and $G_2 = 1$; if the TFC is {1 1 2 2 3 3} or {1 1 3 3 2 2}, $G_1 = 1$ and $G_2 = 2$.

The SWL can be reduced further for lower complexity, but the resulting performance degradation requires averaging over a much longer period to compensate. Considering the tradeoff among complexity, performance, and processing period, $L = 64$ is the best choice. Fig. 6 compares the MSE performance for different SWLs with the normalized CFO set to 0.01. Thanks to the sum average over three subbands, the optimized AC algorithm with an SWL of 64 performs better than the traditional AC algorithm with an SWL of 128. The optimized AC algorithm with an SWL of 32 cannot perform as well as the traditional algorithm with an SWL of 128; it needs a longer averaging period to compensate for the performance degradation.
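The AC estimator can be sketched as follows, assuming the standard form $\hat{\varepsilon}_f=\frac{N}{2\pi M}\arg\left(\sum_n r_n^{*} r_{n+M}\right)$ evaluated over a sliding window of `window` samples. The noise-free test signal, the interval `M = 165`, and the function name are hypothetical. The subband averaging of the optimized algorithm is not modeled. Without noise, the 64- and 128-sample windows give the same estimate; any advantage of a longer window only shows up once noise is present. The estimate is unambiguous only while $|\varepsilon_f| < N/(2M)$.

```python
import cmath
import random


def estimate_cfo(r, n_fft, delay, window):
    """Normalized CFO: N / (2*pi*M) * angle(sum_n conj(r[n]) * r[n + M]) over `window` samples."""
    acc = sum(r[n].conjugate() * r[n + delay] for n in range(window))
    return n_fft * cmath.phase(acc) / (2 * cmath.pi * delay)


rng = random.Random(11)
N, M, EPS = 128, 165, 0.01     # hypothetical FFT size, symbol interval, true normalized CFO
seg = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(N)]
gap = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(M - N)]
tx = seg + gap + seg           # the same N samples repeated M samples later
# CFO rotates sample n by 2*pi*eps*n/N (sample rate = N times the subcarrier spacing)
rx = [s * cmath.exp(2j * cmath.pi * EPS * n / N) for n, s in enumerate(tx)]
eps_128 = estimate_cfo(rx, N, M, window=128)
eps_64 = estimate_cfo(rx, N, M, window=64)
```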

For UWB, the CFO compensation algorithm can be optimized as well. The basic idea is to treat the CFO values on the four parallel paths as identical when their differences are very small (Fan & Choy, 2010a). In the UWB specification, the center frequency is about 4 GHz and the maximum clock synthesizer impairment is ±20 ppm (parts per million). The worst-case offset is then 2 × 20 ppm × 4 GHz = 160 kHz, about 0.039 of the 528 MHz/128 = 4.125 MHz subcarrier spacing. The normalized CFO should therefore be less than 0.04. The maximum CFO difference between any two parallel samples should be less than 2.5 × 10^{-4}, which is small enough to be ignored. The optimized CFO compensation scheme can be expressed as

where $4(m-1)+q$ is the sample index. This CFO compensation strategy not only reduces the four parallel digital synthesizers to one but also alleviates the workload of the phase accumulator.

### 3.3. Implementation of frequency synchronizer

The design of the frequency synchronizer is divided into two parts. The first part estimates the phase difference between two preambles by AC and arctangent calculation. The second part compensates the signal by multiplying it by a complex rotation vector, which involves the phase accumulator and the sine/cosine generator.

Fig. 7 shows the architecture of the CFO compensation block. The phase accumulator produces a digital sweep with a slope proportional to the input phase. The phase offset is scaled from [0, 2π] to [0, 8] by multiplying by 4/π, so that the three most significant bits (MSBs) alone identify the phase offset region. During CFO compensation, the sine and cosine values only need to be calculated for phase offsets in [0, π/4]. If the phase offset lies in another range, input complement, output complement, or output swap operations are applied accordingly.
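The octant folding can be sketched as below. The sketch assumes a first-octant evaluator that is valid only on $[0, \pi/4]$. Python's `math.sin`/`math.cos` stand in for it here; a hardware design would use one of the function-evaluation techniques discussed next. The integer part of $4\theta/\pi$ plays the role of the three MSBs, and odd octants use the input complement. The swap and sign rules follow from the usual trigonometric symmetries. The function name is hypothetical.

```python
import math


def sin_cos_by_octant(theta, first_octant=lambda phi: (math.sin(phi), math.cos(phi))):
    """sin/cos of theta using only an evaluator valid on [0, pi/4]."""
    scaled = (theta * 4.0 / math.pi) % 8.0     # [0, 2*pi) -> [0, 8)
    octant = int(scaled)                       # value of the three MSBs
    frac = scaled - octant
    if octant & 1:                             # odd octant: input complement
        frac = 1.0 - frac
    s0, c0 = first_octant(frac * math.pi / 4.0)
    if octant in (1, 2, 5, 6):                 # output swap
        s0, c0 = c0, s0
    sin_val = -s0 if octant >= 4 else s0       # output complement for sine
    cos_val = -c0 if octant in (2, 3, 4, 5) else c0  # output complement for cosine
    return sin_val, cos_val
```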

In the design of the frequency synchronizer, implementing the arctangent, sine, and cosine functions is the most critical task. It determines both the complexity of the synchronizer and the performance of the UWB receiver. Traditional OFDM-based or CDMA-based systems usually employ the classic coordinate rotation digital computer (CORDIC) algorithm for function evaluation (Tsai & Chiueh, 2007; Troya et al., 2008). Other function evaluation techniques also exist, such as the polynomial hyperfolding technique (PHT) (Caro et al., 2004), the piecewise-polynomial approximation (PPA) technique (Caro & Strollo, 2005), the hybrid CORDIC algorithm (Caro et al., 2009), and the multipartite table method (MTM) (Caro et al., 2008).

PHT calculates sine and cosine functions using an optimized polynomial expression with constant coefficients. The sine and cosine functions can be expressed by polynomial expressions of degree K.

where 0 ≤ x < 1 is the scaled input of the sine and cosine functions. Optimization is conducted on second-order (K = 2) and third-order (K = 3) approximating polynomials, expressed as (19) and (20), respectively (Caro et al., 2004). The second-order PHT achieves about 60 dBc spurious-free dynamic range (SFDR), while the third-order PHT achieves 80 dBc SFDR.

The PPA technique subdivides the interval into shorter subintervals and approximates the trigonometric functions in each subinterval with a polynomial of a given degree. The signal x represents the input phase scaled to a binary fraction in [0, 1], which is subdivided into s subintervals, with s = 2^{u}. The u MSBs of x encode the segment starting point x_{k} and serve as the address of the small lookup tables that store the polynomial coefficients. The remaining bits of x represent the offset x − x_{k}. The quadratic PPA of the sine and cosine functions can be expressed as (Caro & Strollo, 2005)

Fig. 9 shows the architecture of the sine and cosine blocks with PPA. The first-order and second-order coefficients are quantized with r bits and t bits, respectively, and the constant coefficients use (Q − 1) bits. The inputs and outputs of the sine and cosine functions are represented with P bits and Q bits, respectively. The constant, linear, and quadratic coefficients are read from ROMs for the polynomial calculation. The PPGen block generates the partial products that compute the linear terms. A carry-save addition tree then adds the partial products together after aligning all bits according to their weights.
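A minimal sketch of quadratic PPA for the sine on the first octant ($f(x)=\sin(\pi x/4)$, $x\in[0,1)$) is given below. As a simplifying assumption, the coefficients come from a Taylor expansion at each segment start $x_k$, not from the optimized, quantized coefficients of Caro & Strollo (2005). All arithmetic is floating point. With $u = 4$ (16 segments), the Taylor remainder bound $(\pi/4)^3 (1/16)^3/6$ keeps the error below $2\times10^{-5}$.

```python
import math

SCALE = math.pi / 4  # f(x) = sin(SCALE * x) for x in [0, 1): sine on the first octant


def build_ppa_tables(u, func, d1, d2):
    """Quadratic coefficients per segment, here from a Taylor expansion at each start x_k."""
    segments = 2 ** u
    tables = []
    for seg in range(segments):
        xk = seg / segments
        tables.append((func(xk), d1(xk), d2(xk) / 2.0))
    return tables


def ppa_eval(x, u, tables):
    """c0 + c1*(x - x_k) + c2*(x - x_k)**2, addressed by the u MSBs of x."""
    segments = 2 ** u
    seg = min(int(x * segments), segments - 1)   # u MSBs of x select the segment
    dx = x - seg / segments                      # remaining bits: offset x - x_k
    c0, c1, c2 = tables[seg]
    return c0 + dx * (c1 + dx * c2)              # Horner form of the quadratic


U = 4
tables = build_ppa_tables(U,
                          lambda x: math.sin(SCALE * x),
                          lambda x: SCALE * math.cos(SCALE * x),
                          lambda x: -SCALE ** 2 * math.sin(SCALE * x))
max_err = max(abs(ppa_eval(i / 4096, U, tables) - math.sin(SCALE * i / 4096))
              for i in range(4096))
```

Replacing the Taylor coefficients with minimax-optimized ones lets the same table size reach a smaller error, or a smaller table reach the same error, which is the point of the optimized PPA designs.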

The hybrid CORDIC algorithm splits the phase rotation into three steps. The first two steps are CORDIC-based, with the rotation directions computed in parallel, and the final step is multiplier-based (Caro et al., 2009).

Suppose the word lengths of the input vector [X_{in}, Y_{in}] and the output vector [X_{out}, Y_{out}] are 12 and 13 bits, respectively. The rotation phase φ ∈ [0, π/4] is represented by a binary fractional value in [0, 1] as

The least significant bit (LSB) of φ has a weight denoted φ_{LSB} = (π/4)2^{-13}. In the first step, the phase is divided into two subwords, φ = α + β, where

The goal of the first stage is to perform a rotation by an angle close to α + φ_{LSB}/2. To that end, the first rotation uses the CORDIC algorithm, described by the following equations.

where σ_{i} is equal to the sign of Z_{i}. The algorithm starts with X_{1} = X_{in}, Y_{1} = Y_{in} and Z_{1} = α + φ_{LSB}/2.

The second and third stages rotate the output vector of the first stage by a phase γ = Z_{residual} + β, which is represented with 11 bits. γ is then split into the sum of two subwords, γ_{1} + γ_{2}, where

The second rotation performs the rotation by the phase γ_{1}. The rotation directions are obtained from the bits of γ_{1} as follows.

The corresponding CORDIC equations are

And the operation to be performed in the final rotation block can be written as

where [X_{T2}, Y_{T2}] is the output vector of the second rotation. The absolute value of γ_{2} is smaller than 2^{-6}, so the sine and cosine functions can be approximated as sin γ_{2} ≈ γ_{2} and cos γ_{2} ≈ 1.

The architecture of the hybrid CORDIC rotator is shown in Fig. 10. The elementary stage is composed of adders and shifters. The two final vector merging adders (VMAs) convert the results to two's complement representation.
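The sketch below is a simplified floating-point illustration of the hybrid idea, not the exact three-stage partition of Caro et al. (2009). Conventional CORDIC micro-rotations are applied first, with directions taken from the sign of the residual angle as in the first stage. They are followed by one multiplier-based rotation by the small residual angle $\gamma$, using $\sin\gamma\approx\gamma$ and $\cos\gamma\approx1$. The iteration count and function name are assumptions. After 8 iterations the residual is at most $\arctan(2^{-7})\approx 0.0078$ rad, so the small-angle step adds an error of order $\gamma^2/2$.

```python
import math


def cordic_then_small_angle(x, y, phi, iterations=8):
    """Rotate (x, y) by phi in [0, pi/4]: CORDIC micro-rotations, then a
    multiplier-based rotation by the residual angle with sin(g) ~ g, cos(g) ~ 1."""
    z = phi
    for i in range(iterations):
        sigma = 1.0 if z >= 0 else -1.0              # direction = sign of residual angle
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)
    gain = 1.0
    for i in range(iterations):                      # undo the CORDIC gain
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y = x * gain, y * gain
    return x - z * y, y + z * x, z                   # final step uses [1, -z; z, 1]
```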

MTM is a very effective lookup-table compression technique for function evaluation. It has been found to be ideally suited to high-performance synthesizers, requiring both a very small ROM and simple arithmetic circuitry (Caro et al., 2008). The principle of MTM is to decompose the Q-bit input x into K + 1 non-overlapping subwords x_{0}, x_{1}, …, x_{K} with lengths q_{0}, q_{1}, …, q_{K}, respectively, where x = x_{0} + x_{1} + … + x_{K} and Q = q_{0} + q_{1} + … + q_{K}. The angle range [0, π/4] is scaled to a binary fraction in [0, 1]. A piecewise linear approximation of f(x) can be expressed as

The interval of x is divided into 2^{q_0} subintervals. x_{0} represents the starting point of each subinterval, and x_{1} + … + x_{K} is the offset between x and x_{0} within the subinterval. α_{1} is a subword of x_{0} consisting of its p_{1} ≤ q_{0} MSBs. Likewise, α_{i} (i = 2, …, K) is a subword of x_{0} consisting of its p_{i} ≤ p_{i-1} MSBs. The term A(x_{0}) can be realized with a ROM with 2^{q_0} entries, called the table of initial values (TIV). The terms B(α_{i}) x_{i} (i = 1, …, K) can be implemented with K ROMs, called tables of offsets (TO_{i}), with 2^{p_i + q_i} entries each. Making the TOs symmetric reduces their size by a factor of two. Then, equation (29) becomes

where the coefficients can be calculated as follows (Caro et al., 2008).

The architecture of MTM with symmetric TOs is shown in Fig. 11. The contents of the TOs are conditionally added to or subtracted from the content stored in the TIV. The MSB of each subword controls two things: whether the ROM contents are added or subtracted, and whether the inputs are complemented.
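To make the table organization concrete, the following Python sketch implements a small multipartite table for $f(x)=\sin(\pi x/4)$ with $K = 2$ offset tables and symmetric TOs. The split $q_0=q_1=q_2=4$, $p_1=4$, $p_2=2$ is a hypothetical choice, not the design of Table 1. The table contents come from a simple first-order construction rather than the optimized coefficients of Caro et al. (2008): the TIV is sampled at the subinterval point plus the mid-offsets, and the TO slopes are taken at the centre of each $\alpha_i$ interval. Each TO stores only the half of its entries whose subword MSB is 1; the other half is reached by complementing the input bits and subtracting.

```python
import math

Q_BITS = (4, 4, 4)   # hypothetical q0, q1, q2 (Q = 12 input bits)
P_BITS = (4, 2)      # hypothetical p1, p2: MSBs of x0 that address TO1 and TO2
SCALE = math.pi / 4  # x in [0, 1) maps to an angle in [0, pi/4)


def f(x):
    return math.sin(SCALE * x)


def df(x):
    return SCALE * math.cos(SCALE * x)


def split(code):
    """Split a Q-bit integer into the integer fields x0, x1, x2 (MSB first)."""
    _, q1, q2 = Q_BITS
    return code >> (q1 + q2), (code >> q2) & ((1 << q1) - 1), code & ((1 << q2) - 1)


def build_tables():
    q0, q1, q2 = Q_BITS
    weights = (2.0 ** -q0, 2.0 ** -(q0 + q1), 2.0 ** -(q0 + q1 + q2))
    # Midpoint of each offset field, so that the offsets are symmetric around zero
    mids = [((1 << q) - 1) / 2.0 * w for q, w in zip((q1, q2), weights[1:])]
    tiv = [f(x0 * weights[0] + sum(mids)) for x0 in range(1 << q0)]
    tos = []
    for q, w, p in zip((q1, q2), weights[1:], P_BITS):
        half = 1 << (q - 1)
        # Row = alpha (p MSBs of x0); only entries whose subword MSB is 1 are stored
        tos.append([[df((alpha + 0.5) * 2.0 ** -p) * (j - ((1 << q) - 1) / 2.0) * w
                     for j in range(half, 1 << q)]
                    for alpha in range(1 << p)])
    return tiv, tos


def mtm_eval(code, tiv, tos):
    q0 = Q_BITS[0]
    x0, x1, x2 = split(code)
    acc = tiv[x0]
    for xi, q, p, table in zip((x1, x2), Q_BITS[1:], P_BITS, tos):
        alpha = x0 >> (q0 - p)
        half = 1 << (q - 1)
        if xi & half:                                   # MSB = 1: add the stored entry
            acc += table[alpha][xi - half]
        else:                                           # MSB = 0: complement bits, subtract
            acc -= table[alpha][((1 << q) - 1 - xi) - half]
    return acc


tiv, tos = build_tables()
```

With this split, the TIV holds 16 entries and the two symmetric TOs hold 16 × 8 and 4 × 8 entries. That is 176 words in total, compared with 4096 for a direct 12-bit table.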

To compare the four techniques fairly, each one is used to implement the CFO compensation block. The design parameters are chosen so that the four techniques have nearly the same SFDR, and the inputs and outputs of all four are 12 bits. All four were synthesized with the UMC 0.13 μm high-speed library at a 132 MHz clock frequency; their power, area, and latency are listed in Table 1. Because MSE is a statistical quantity, the MSEs of the four approaches cannot easily be made exactly equal, but they are very close. MTM has the smallest MSE, the smallest area and power, and the lowest latency (3 clock cycles, tied with PPA). Since MTM proves to be an efficient approach for function evaluation, it can also be applied to implement the arctangent function in the CFO estimation block.

| Technique | MTM | PPA | PHT | Hybrid CORDIC |
|---|---|---|---|---|
| Design parameters | $q_0 = 4$, $q_1 = 2$, $q_2 = 3$, $q_3 = 3$; $p_1 = 3$, $p_2 = 3$, $p_3 = 1$ | $s = 64$, $r = 6$, $t = 7$ | $K = 3$ | (1) 4 rep., (2) 3 rep., (3) 8 × 8 |
| MSE ($\times 10^{-7}$) | 2.97 | 4.91 | 7.82 | 5.73 |
| Area (mm$^2$) | 0.018 | 0.027 | 0.031 | 0.146 |
| Power (mW) | 0.84 | 0.88 | 1.55 | 13.93 |
| Latency (clock cycles) | 3 | 3 | 4 | 6 |

## 4. Fine frequency synchronization

Although CFO can be coarsely estimated by frequency synchronizer in time domain, the residual CFO (RCFO), sampling frequency offset (SFO) and common phase error will lead to accumulated phase shift after a certain period and thus degrade the system performance if they are not carefully tracked. In OFDM-based UWB systems, pilot subcarriers can help to solve the residual phase distortion issue in frequency domain, which is also called fine frequency synchronization.

### 4.1. Effects of sampling frequency offset

The oscillators that generate the DAC and ADC sampling instants at the transmitter and receiver never have exactly the same period, so the sampling instants slowly shift relative to each other. The SFO has two main effects. The first is a slow drift of the symbol timing, which rotates the subcarriers. The second is a loss of SNR: the slightly incorrect sampling instants generate ICI, which destroys the orthogonality of the subcarriers.

Define the normalized sampling error as Δt = (T’ - T)/T, where T’ and T are the receiver and transmitter sampling periods respectively. Then the overall effect on the received signal in frequency domain is expressed as

where T_{s} and T_{u} are the durations of the total symbol and of the useful data, respectively, W_{k,l} is additive white Gaussian noise (AWGN), and the last term N_{Δt}(k, l) is the additional interference due to the SFO. The power of the last term is approximated by

Hence the degradation grows as the square of the product of the offset Δt and the subcarrier index k, which means that the outermost subcarriers are the most severely affected. The degradation can also be expressed directly as an SNR loss (Pollet et al., 1995)

The OFDM-based UWB system does not have a large number of subcarriers, and the value of Δt is quite small. Hence kΔt ≪ 1, and the interference caused by SFO can usually be ignored. However, the term describing the rotation angle experienced by each subcarrier causes a serious problem. The rotation angle depends on both the subcarrier index and the symbol index, so it is largest for the outermost subcarrier and grows over consecutive symbols. Although Δt is very small, the phase shift eventually corrupts the demodulation as the symbol index increases, so tracking the SFO is necessary.
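The contrast between the negligible ICI and the growing rotation can be quantified with a short sketch. It assumes a rotation angle of $2\pi k\,\Delta t\,l\,T_s/T_u$ for subcarrier $k$ at symbol $l$, and the commonly cited SNR-loss approximation $10\log_{10}\left(1+\tfrac{1}{3}\tfrac{E_s}{N_0}(\pi k\Delta t)^2\right)$. The following values are illustrative assumptions: a 40 ppm offset (twice the ±20 ppm impairment mentioned earlier), the outermost pilot $k = 55$, and $T_s/T_u = 165/128$.

```python
import math


def sfo_rotation_rad(k, delta_t, symbol_index, ts_over_tu):
    """Phase rotation 2*pi*k*delta_t*l*Ts/Tu on subcarrier k at symbol l."""
    return 2.0 * math.pi * k * delta_t * symbol_index * ts_over_tu


def sfo_snr_loss_db(k, delta_t, es_n0_db):
    """SNR loss 10*log10(1 + (Es/N0)/3 * (pi*k*delta_t)**2) in dB."""
    es_n0 = 10.0 ** (es_n0_db / 10.0)
    return 10.0 * math.log10(1.0 + es_n0 / 3.0 * (math.pi * k * delta_t) ** 2)


DELTA_T = 40e-6          # hypothetical worst case: 2 x 20 ppm
K_OUTER = 55             # outermost pilot subcarrier
TS_OVER_TU = 165 / 128   # hypothetical symbol-to-FFT-duration ratio
per_symbol = sfo_rotation_rad(K_OUTER, DELTA_T, 1, TS_OVER_TU)
after_100 = sfo_rotation_rad(K_OUTER, DELTA_T, 100, TS_OVER_TU)
loss = sfo_snr_loss_db(K_OUTER, DELTA_T, 10.0)
```

With these numbers the SNR loss at 10 dB $E_s/N_0$ is below 0.001 dB. Meanwhile, the outermost subcarrier rotates by about 0.018 rad per symbol, which adds up to more than 90° after 100 symbols.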

### 4.2. Phase tracking algorithms

Conventionally, SFO can be estimated by computing a slope from the plot of pilot subcarrier differences versus pilot subcarrier indices (Speth et al., 2001). Recently, joint estimation of CFO and SFO has also been studied extensively, such as the linear least squares (LLS) algorithm (Liu & Chong, 2002) and joint weighted least squares (WLS) algorithm (Tsai et al., 2005).

The received signal with residual phase distortion in the frequency domain, after removing the channel noise, can be modeled as

where P_{k,l} is the phase distortion vector and Φ_{k,l} is the residual phase error. The relationship among α, β_{l}, and Φ_{k,l} is shown in Fig. 12. α is the slope of the phase distortion and is contributed by SFO. β_{l} is the intercept of the phase distortion and is caused by the RCFO of symbol l.

The basic idea of AC is to get the phase differences of pilot subcarriers between two symbols.

The pilot subcarriers are divided into two parts, C_{1} and C_{2}. C_{1} is on the left of the spectrum, and C_{2} is on the right of the spectrum. Then the estimated intercept phase β_{l} and the slope α are written as (Speth et al., 2001)

where

Applying LLS estimation to (37) with K pilots in one symbol, where the i-th pilot is located at subcarrier k_{i}, yields the RCFO and SFO estimates (Liu & Chong, 2002)

where

Such an estimation algorithm that is based on the phase differences between two symbols can remove the common channel fading terms in slow-fade scenarios. Consequently, this estimation scheme can be applied before channel estimation and equalization.

The joint LLS estimation algorithm provides accurate results in the AWGN channel, but diverse channel responses on the pilot subcarriers can render its estimates useless. For instance, when the joint LLS estimation uses the phases of several deeply faded pilot subcarriers, the results can contain large errors. The phases of subcarriers with little fading, by contrast, are naturally more reliable. Weighting the subcarrier data is therefore advantageous: seriously faded subcarriers should be assigned smaller weights to minimize their adverse effect on estimation accuracy. The WLS algorithm for joint estimation of RCFO and SFO can be expressed as (Tsai et al., 2005)

The weight ω_{i} should be inversely proportional to the variance of the phase error, which depends on the noise, the ICI, and the complex channel gain. Usually, the residual synchronization error is so small that the ICI term can be neglected, and ω_{i} then depends only on the channel gains of the pilot subcarriers. The disadvantage is that this algorithm is very complicated, especially the computation of ω_{i}. Without accurate estimates of ω_{i}, phase tracking suffers large errors.
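The difference between LLS and WLS is easiest to see on a straight-line fit of pilot phase against pilot index, $\phi_i \approx \alpha k_i + \beta$. The Python sketch below is a generic weighted least-squares line fit; unit weights reduce it to plain LLS. It is not the exact estimator of Liu & Chong (2002) or Tsai et al. (2005). The test values are hypothetical: pilot positions ±5, ±15, …, ±55, the true α and β, and a corrupted pilot with weight 0.01.

```python
def wls_line_fit(k, phase, weights=None):
    """Weighted least-squares fit of phase ~ alpha*k + beta; unit weights give plain LLS."""
    if weights is None:
        weights = [1.0] * len(k)
    sw = sum(weights)
    k_bar = sum(w * ki for w, ki in zip(weights, k)) / sw
    p_bar = sum(w * ph for w, ph in zip(weights, phase)) / sw
    num = sum(w * (ki - k_bar) * (ph - p_bar) for w, ki, ph in zip(weights, k, phase))
    den = sum(w * (ki - k_bar) ** 2 for w, ki in zip(weights, k))
    alpha = num / den                      # slope (SFO term)
    return alpha, p_bar - alpha * k_bar    # intercept (RCFO term)


pilots = [-55, -45, -35, -25, -15, -5, 5, 15, 25, 35, 45, 55]   # assumed pilot indices
alpha, beta = 1e-3, 0.2
phase = [alpha * k + beta for k in pilots]
phase[-1] += 1.0                 # deeply faded pilot at k = 55: unreliable phase
w = [1.0] * 11 + [0.01]          # small weight for the faded pilot
a_lls, b_lls = wls_line_fit(pilots, phase)
a_wls, b_wls = wls_line_fit(pilots, phase, w)
```

Here the LLS slope is pulled off by roughly 4 × 10^{-3}, whereas the down-weighted WLS fit stays within about 10^{-4}. The catch, as noted above, is obtaining good weights in the first place.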

Traditional phase tracking solutions require arctangent, sine, and cosine functions, which are quite complicated to implement in hardware. The algorithm presented in (Troya et al., 2007) reduces the hardware cost significantly compared with the traditional approaches, but at a slight cost in system performance. In (Fan & Choy, 2010b), a novel phase tracking method for UWB is proposed that not only has low complexity but also improves performance.

Since |αk| ≪ 1 for k ∈ [-55, 55], the first-order approximations cos(αk) ≈ 1 and sin(αk) ≈ αk can be made. The phase distortion in (35) can then be rewritten as

In (42), four parameters are of interest: sinβ_{l}, cosβ_{l}, α sinβ_{l}, and α cosβ_{l}. The first two can be easily obtained by

where

Approximating the scaling factor 1/260 by 1/256, which can be implemented simply as an 8-bit right shift, the parameters α sinβ_{l} and α cosβ_{l} are given by

Among the traditional algorithms, LLS and WLS have better phase tracking performance than AC, but their complexity is too high for practical applications. For hardware implementation, AC offers low complexity and moderate phase correction performance. Therefore, Fig. 13 compares the MSE performance of the novel approach for UWB with that of AC under different phase distortion conditions. The novel phase tracking method for UWB performs much better than the traditional AC algorithm. In addition, the traditional AC algorithm degrades seriously as the phase error increases, whereas the novel method does not.
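One way to obtain the four parameters sinβ_{l}, cosβ_{l}, α sinβ_{l}, and α cosβ_{l} using sums only is sketched below. It relies on the first-order model $e^{j(\alpha k+\beta)}\approx e^{j\beta}(1+j\alpha k)$ and on a pilot set that is symmetric around DC. Under these assumptions, $\sum_k P_k \approx K e^{j\beta}$ and $\sum_k k P_k \approx j\alpha e^{j\beta}\sum_k k^2$. The normalizations and function names are our own and are not necessarily the expressions used by Fan & Choy (2010b). The pilot set ±5, ±15, …, ±55 and the test values of α and β are assumptions.

```python
import cmath


def first_order_phase_params(pilots, pilot_phasors):
    """Return (cos b, sin b, a*cos b, a*sin b) from sums, assuming a symmetric pilot set."""
    s0 = sum(pilot_phasors)                                  # ~ K * exp(j*b)
    s1 = sum(k * p for k, p in zip(pilots, pilot_phasors))   # ~ j*a*sum(k^2)*exp(j*b)
    sum_k2 = sum(k * k for k in pilots)
    n = len(pilots)
    return s0.real / n, s0.imag / n, s1.imag / sum_k2, -s1.real / sum_k2


def derotate(sample, k, params):
    """Multiply data subcarrier k by the conjugate of exp(j*b) * (1 + j*a*k)."""
    cos_b, sin_b, a_cos_b, a_sin_b = params
    model = complex(cos_b - k * a_sin_b, sin_b + k * a_cos_b)
    return sample * model.conjugate()


pilots = [-55, -45, -35, -25, -15, -5, 5, 15, 25, 35, 45, 55]
alpha, beta = 1e-4, 0.3
# Pilot phase error vectors after removing the known pilot symbols (noise-free)
phasors = [cmath.exp(1j * (alpha * k + beta)) for k in pilots]
params = first_order_phase_params(pilots, phasors)
x_data = 1 + 1j
restored = derotate(x_data * cmath.exp(1j * (alpha * 30 + beta)), 30, params)
```

The correction path contains only multiplications, additions, and constant scalings, with no arctangent, sine, or cosine evaluation.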

### 4.3. Architecture of the phase tracking block

The architecture of the phase tracking block using the novel approach for UWB is shown in Fig. 14. The signals after channel equalization are stored separately in a pilot buffer and a data buffer. Since the transmitted pilots are known and have unit modulus, the phase error vector of the pilots can be derived by multiplying by the conjugate of the transmitted pilots. As shown in Fig. 14, no arctangent, sine, or cosine function is needed; eight complex adders and two complex shifters replace them.

The values of α sinβ_{l} and α cosβ_{l} are very small, so the SFO-induced phase errors of four parallel data samples can be approximated as equal. They are rewritten as α⌈k/4⌉sinβ_{l} and α⌈k/4⌉cosβ_{l} (⌈k/4⌉ ∈ [-12, 12]). Calculating 4α sinβ_{l} and 4α cosβ_{l} instead of α sinβ_{l} and α cosβ_{l} further simplifies the architecture of the phase tracking block.

## 5. Conclusion

This chapter provides a comprehensive review of the algorithms and architectures for timing and frequency synchronization. Although there is a large body of literature on UWB synchronization techniques, most of it does not take real applications or implementation into account. This chapter covers three parts of the synchronization process.

In timing synchronization, the DT detection scheme improves detection performance significantly thanks to the cascaded auto-correlator. It slightly increases the hardware cost, but the low-complexity matched filter architecture saves hardware elsewhere. In coarse frequency synchronization, the CFO estimation approach can be simplified by shortening the SWL, and the sum average over three subbands compensates for the SNR degradation. MTM proves to be a low-cost, low-power, and high-speed approach to implementing arctangent, sine, and cosine functions compared with the other function evaluation techniques. In fine frequency synchronization, a novel phase tracking approach for UWB is proposed for good performance. Additionally, it needs no arctangent, sine, or cosine computation units; adders and shifters replace them, which indicates that the implementation complexity of the novel phase tracking method is low.

These low-complexity and power-efficient synchronization techniques make it possible to develop robust, low-cost, low-power, and high-speed OFDM-based UWB receivers.