Open access

Synchronization Technique for OFDM-Based UWB System

Written By

Wen Fan and Chiu-Sing Choy

Submitted: October 21st, 2010 Published: July 27th, 2011

DOI: 10.5772/17507


1. Introduction

Synchronization is unavoidable in every wireless receiver and plays a key role in system performance. It comprises timing synchronization and frequency synchronization. Timing synchronization detects a valid packet in noise and locates the accurate start position of the fast Fourier transform (FFT) window. Frequency synchronization corrects the phase error caused by the mismatch between the local oscillators (LO) of the transmitter and the receiver.

Synchronization techniques have been studied extensively for years. Although a UWB system can build on the successful experience of orthogonal frequency division multiplexing (OFDM), it cannot use traditional synchronization techniques directly because of its distinct features. In the IEEE 802.15.3a standard, the specified emission power spectral density is only -41 dBm/MHz, which is extremely small compared with other wireless systems. Timing synchronization for UWB must therefore be robust in a high-noise environment. In addition, to sustain a throughput of 528 Msps, the UWB baseband receiver has to be designed with a parallel architecture. The inherent high complexity, together with the requirements of high performance, high speed, low cost and low power consumption, makes the design of synchronization blocks for UWB quite challenging.

This chapter is divided into three parts: timing synchronization, coarse frequency synchronization and fine frequency synchronization. Traditional algorithms are introduced together with innovative methods that offer low complexity and good performance. The architecture design of each part is also provided.


2. Timing synchronization

As soon as the receiver starts up, it searches the received signal for the presence of an OFDM-based UWB packet. Packet detection usually acquires only rough timing information by exploiting the repetition in the received signal. Accurate timing information, such as the symbol boundary or the start position of the FFT window, is also necessary; it is obtained by matching the received waveform against the preamble waveform with a matched filter.

2.1. Effects of timing offset

Assume that the maximum channel delay is shorter than the guard interval. The FFT window can then fall in one of several positions, as shown in Fig. 1. The exact start position of the FFT window is at the boundary of regions B and C. If the start position is in region B, the signal in the FFT window is not contaminated by the previous symbol, so no inter-symbol interference (ISI) occurs and the only effect is a phase shift. After demodulation, the received signal with a timing offset in region B is expressed in (1).

$$R_{k,l} = S_{k,l} H_{k,l}\, e^{j 2\pi \Delta n / N} + W_{k,l} \tag{1}$$

where $S_{k,l}$, $H_{k,l}$ and $W_{k,l}$ are the transmitted signal, the channel impulse response (CIR) and the noise, respectively, at the $k$-th subcarrier of the $l$-th symbol in the frequency domain, $N$ is the FFT size, and $\Delta n$ is the delay, in samples, relative to the correct FFT window position.

Figure 1.

The scenario of timing offset

When the FFT window leads or lags by a large degree, such as in region A or C, ISI will be introduced and both the magnitude and the phase of the received signal will be distorted, as shown in (2).

$$R_{k,l} = S_{k,l} H_{k,l}\, e^{j 2\pi \Delta n / N}\, \frac{N - \Delta n}{N} + W_{k,l} + W_{ISI} \tag{2}$$

where $W_{ISI}$ is the introduced ISI noise. Due to the introduced ISI and the phase rotation, there is slight magnitude attenuation in the signal.

2.2. Timing synchronization algorithms

Timing synchronization can be divided into two categories: coarse timing synchronization and fine timing synchronization. Coarse timing synchronization is usually based on auto-correlation (AC), while fine timing synchronization is based on cross-correlation (CC). The traditional algorithms of AC, maximum likelihood (ML), minimum mean square error (MMSE) and CC will be introduced.


Auto-correlation

The AC algorithm (Schmidl & Cox, 1997) for coarse timing synchronization is straightforward. It searches for the repetition in the received signal with a correlator and a maximum searcher. Let $L$ denote the length of the repetition interval and $r_n$ the received signal in the time domain. The timing metric is defined as

$$M(n) = \frac{\left| \sum_{k=0}^{L-1} r_{n+k}^{*}\, r_{n+k+L} \right|^{2}}{\left( \sum_{k=0}^{L-1} \left| r_{n+k+L} \right|^{2} \right)^{2}} \tag{3}$$

where $*$ denotes complex conjugation. The time index of the maximum of $M(n)$ is estimated as

$$\hat{n} = \arg\max_{n} M(n) \tag{4}$$

If the maximum of $M(n)$ exceeds the threshold, a packet is present and the estimated time index is taken as the symbol boundary. The drawback of this scheme is that when the correlation window moves away from the repeated period, the timing metric $M(n)$ may not fall off as expected, especially at low signal-to-noise ratio (SNR). In that case the detected symbol boundary may have a large error.
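As an illustration, the metric in (3) and the peak search in (4) can be sketched in a few lines of Python. This is a minimal sketch, not the receiver implementation: the function names, the toy signal (noise, a unit-modulus segment repeated twice, then a random payload) and the threshold of 0.5 are assumptions made only for the example.

```python
import cmath
import random

def ac_timing_metric(r, L):
    """Timing metric M(n) of (3) for every n where both L-sample windows fit."""
    metric = []
    for n in range(len(r) - 2 * L + 1):
        corr = sum(r[n + k].conjugate() * r[n + k + L] for k in range(L))
        energy = sum(abs(r[n + k + L]) ** 2 for k in range(L))
        metric.append(abs(corr) ** 2 / energy ** 2 if energy > 0 else 0.0)
    return metric

def detect_packet(r, L, threshold):
    """Peak search of (4); returns (n_hat, M(n_hat)) or None if below threshold."""
    metric = ac_timing_metric(r, L)
    n_hat = max(range(len(metric)), key=metric.__getitem__)
    return (n_hat, metric[n_hat]) if metric[n_hat] > threshold else None

# Toy signal (assumed for illustration): noise, two identical segments, payload.
rng = random.Random(0)

def unit_phasors(count):
    return [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(count)]

L, start = 32, 50
segment = unit_phasors(L)
clean = [0j] * start + segment + segment + unit_phasors(80)
received = [s + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05)) for s in clean]

result = detect_packet(received, L, threshold=0.5)
print(result)
```

In this toy case the metric rises as the two windows slide onto the repeated segments and peaks near the start of the first one, which is the behaviour the threshold test relies on.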

Maximum likelihood

The ML algorithm (Van de Beek et al., 1997; Coulson, 2001) improves on the performance of AC. The ML function can be expressed as

$$M(n) = 2\left| \sum_{k=0}^{L-1} r_{n+k}^{*}\, r_{n+k+L} \right| - \rho \sum_{k=0}^{L-1} \left( |r_{n+k}|^{2} + |r_{n+k+L}|^{2} \right) \tag{5}$$

$$\rho \triangleq \left| \frac{E\{ r_{n+k}^{*}\, r_{n+k+L} \}}{\sqrt{E\{ |r_{n+k}|^{2} \}\, E\{ |r_{n+k+L}|^{2} \}}} \right| = \frac{\sigma_s^{2}}{\sigma_s^{2} + \sigma_n^{2}} = \frac{SNR}{SNR + 1} \tag{6}$$

where $\sigma_s^{2}/\sigma_n^{2}$ is the SNR. The symbol boundary is estimated by searching for the maximum output of the ML function. ML is costly in practice because it requires an SNR estimate, which is difficult to obtain, and errors in that estimate make the system less reliable.

Minimum mean square error

The MMSE metric (Minn et al., 2003) is equivalent to a special case of the ML metric with $\rho = 1$, and it shows almost the same timing estimation performance as ML. The principle is to search for the minimum output of the metric shown in (7).

$$M(n) = \sum_{k=0}^{L-1} |r_{n+k}|^{2} + \sum_{k=0}^{L-1} |r_{n+k+L}|^{2} - 2\left| \sum_{k=0}^{L-1} r_{n+k}^{*}\, r_{n+k+L} \right| \tag{7}$$

For the AC, ML and MMSE algorithms, when the preamble has more than two identical segments, the correlator output shows a plateau or a wide basin. Theoretically, the plateau or basin indicates the ISI-free region for the FFT window. However, noise in the received signal may cause the maximum or minimum to drift away from the optimal point. AC, ML and MMSE are therefore used to detect the packet coarsely, while the accurate symbol boundary or FFT window position requires fine timing synchronization, such as CC.
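The relationship between the ML function (5) and the MMSE metric (7) can be checked numerically. The sketch below is illustrative only: the function names and the short toy sequence are assumptions, and the SNR passed to the ML function is simply picked by hand rather than estimated.

```python
def correlation_terms(r, n, L):
    """Return (|sum r*[n+k] r[n+k+L]|, total energy of both L-sample windows)."""
    corr = sum(r[n + k].conjugate() * r[n + k + L] for k in range(L))
    energy = sum(abs(r[n + k]) ** 2 + abs(r[n + k + L]) ** 2 for k in range(L))
    return abs(corr), energy

def rho_from_snr(snr_linear):
    """Correlation coefficient of (6) for a linear (not dB) SNR."""
    return snr_linear / (snr_linear + 1.0)

def ml_metric(r, n, L, rho):
    """ML function of (5); the boundary estimate maximizes it."""
    corr, energy = correlation_terms(r, n, L)
    return 2.0 * corr - rho * energy

def mmse_metric(r, n, L):
    """MMSE metric of (7); the boundary estimate minimizes it."""
    corr, energy = correlation_terms(r, n, L)
    return energy - 2.0 * corr

# Toy example (assumed values): a 4-sample segment repeated once, starting at n = 2.
segment = [1 + 1j, 2 - 1j, -1 + 0.5j, 0.5j]
r = [0.3 + 0j, -0.2j] + segment + segment + [1 + 0j, -1j]
L = 4
candidates = range(len(r) - 2 * L + 1)
n_mmse = min(candidates, key=lambda n: mmse_metric(r, n, L))
n_ml = max(candidates, key=lambda n: ml_metric(r, n, L, rho_from_snr(100.0)))
print(n_mmse, n_ml)
```

With $\rho = 1$ the ML function is the negative of the MMSE metric, so maximizing one is the same as minimizing the other; for a noiseless repetition the MMSE metric reaches zero at the true boundary.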


Cross-correlation

CC is the mechanism for fine timing synchronization. Instead of correlating the noisy received waveform with its delayed version, CC correlates the received signal with the known preamble waveform (Fort et al., 2003). It remains usable at low SNR and can be expressed as

$$M(n) = \sum_{k=0}^{Q-1} r_{n+k}\, c_k^{*} \tag{8}$$

where $c_k$ are the preamble coefficients and $Q$ is the length of the preamble.
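A minimal sketch of the cross-correlation in (8) follows. The Barker-13 sequence is used purely as a stand-in preamble because its sidelobes are small; it is not the UWB preamble, and the function names, offset and noise level are assumptions for the example.

```python
import random

BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]  # stand-in preamble

def cross_correlation(r, c):
    """M(n) of (8) for every n where the Q-sample preamble fits in r."""
    Q = len(c)
    return [sum(r[n + k] * complex(c[k]).conjugate() for k in range(Q))
            for n in range(len(r) - Q + 1)]

def fine_timing(r, c):
    """Index of the largest |M(n)|, taken as the symbol boundary estimate."""
    m = cross_correlation(r, c)
    return max(range(len(m)), key=lambda n: abs(m[n]))

# Toy received signal (assumed): the preamble placed at a known offset in noise.
rng = random.Random(3)
offset = 20
clean = [0j] * offset + [complex(v) for v in BARKER13] + [0j] * 15
received = [s + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for s in clean]
estimate = fine_timing(received, BARKER13)
print(estimate)
```

Unlike the plateau produced by auto-correlation, the output here has a single sharp peak at the preamble position.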

Dual-threshold detection

The dual-threshold (DT) detection scheme for OFDM-based UWB systems is based on the idea in (Fan et al., 2009). Fig. 2 shows its block diagram. The signal detection process is divided into two steps. The first step is based on the CC algorithm. The peak CC energy of each symbol is expressed as

$$T_1 = \max_{n} |A_n|^{2} = \max_{n} \left| \sum_{k=0}^{N-1} r[n+k]\, c^{*}[k] \right|^{2} \tag{9}$$

$$\sum_{k=0}^{N-1} c^{2}[k] = N \tag{10}$$

where $A_n$ is the moving sum of the CC value, $c[k]$ are the preamble coefficients, $r[k]$ is the received signal and $N$ is the FFT size. If the peak CC energy $T_1$ exceeds the first threshold, the estimated sample index of the symbol boundary and the moving sum are stored in a FIFO for use by the following auto-correlator. Otherwise, the peak CC energy of the next symbol is calculated.

Figure 2.

Block diagram of DT detection scheme

The second step reads the moving sum from the FIFO and auto-correlates it with its delayed version. The energy of the cascaded auto-correlator is

$$T_2 = \left| A_{\hat{n}}\, A^{*}_{\hat{n}+6M} \right|^{2} \tag{11}$$

where $M$ is the length of the repeated preamble interval of the UWB system. The delay of the auto-correlator is determined by the period of the time frequency code (TFC). To ensure that the moving sum and its delayed version are in the same band whichever TFC mode is adopted, the delay is set to six symbol lengths. If the output energy $T_2$ of the cascaded auto-correlator exceeds the second threshold, the packet is detected. Otherwise, the next value is fetched from the FIFO and $T_2$ is calculated again.
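The two steps of the DT scheme can be sketched as follows. This is a simplified illustration under several assumptions: the FIFO is collapsed into direct evaluation, a Barker-13 sequence stands in for the preamble, the same preamble is repeated in every symbol (no band hopping), and the thresholds are set as fractions of $N^2$ and $N^4$ purely for the example.

```python
import random

BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]  # stand-in preamble

def moving_sum(r, c, n):
    """A_n: cross-correlation of r with the preamble c starting at sample n."""
    return sum(r[n + k] * complex(c[k]).conjugate() for k in range(len(c)))

def dual_threshold_detect(r, c, M, th1, th2, delay_symbols=6):
    """Return the detected boundary index, or None if no packet is found."""
    N = len(c)
    last_n = len(r) - delay_symbols * M - N      # largest n with a valid delayed sum
    for block in range(0, last_n - M + 2, M):    # one search window per symbol
        sums = {n: moving_sum(r, c, n) for n in range(block, block + M)}
        n_hat = max(sums, key=lambda n: abs(sums[n]))
        t1 = abs(sums[n_hat]) ** 2               # step 1: peak CC energy, (9)
        if t1 <= th1:
            continue
        delayed = moving_sum(r, c, n_hat + delay_symbols * M)
        t2 = abs(sums[n_hat] * delayed.conjugate()) ** 2   # step 2: (11)
        if t2 > th2:
            return n_hat
    return None

# Toy signals (assumed): eight repeated preamble symbols in noise, and noise alone.
rng = random.Random(7)
N, M, offset = len(BARKER13), 16, 5
symbol = [complex(v) for v in BARKER13] + [0j] * (M - N)
clean = [0j] * offset + symbol * 8 + [0j] * 10

def add_noise(samples):
    return [s + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for s in samples]

detected = dual_threshold_detect(add_noise(clean), BARKER13, M, 0.5 * N**2, 0.25 * N**4)
missed = dual_threshold_detect(add_noise([0j] * len(clean)), BARKER13, M, 0.5 * N**2, 0.25 * N**4)
print(detected, missed)
```

A glitch that passes the first threshold is unlikely to be matched by a second peak exactly six symbols later, which is why the second step lowers the false alarm probability.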

Figure 3.

Output waveforms of ML and MMSE algorithms at 10 dB SNR

Figure 4.

Output waveforms of CC and DT algorithms at 10 dB SNR

Fig. 3 depicts the output waveforms of the ML and MMSE algorithms at 10 dB SNR. The plateaus and basins in these waveforms make the peak energy ambiguous. It is much easier to find accurate timing information in the CC output waveform in Fig. 4. However, the CC output contains glitches, which corrupt the detection of the symbol boundary and increase the false alarm probability. The DT waveform has a much lower noise floor than CC and is free of glitches.

2.3. Architecture of the matched filter

The matched filter is the basic component in timing synchronization for detecting a known signal in noise. Its architecture determines the complexity and the power consumption of the timing synchronizer. An optimized matched filter architecture for OFDM-based UWB is shown in Fig. 5. To sustain a throughput of 528 Msps, the UWB baseband receiver is designed at a 132 MHz clock frequency with four parallel paths and a twelve-level pipeline. For low complexity, both the received signal and the preamble coefficients are truncated to their sign bits, so that five-bit multipliers can be replaced with NXOR gates. In addition, the 128 sign bits of the preamble coefficients are generated by spreading a 16-element sign sequence with an 8-element sign sequence as follows

$$\mathrm{sgn}\left( c_{16(j-1)+i} \right) = a_i \times b_j, \quad i = 1, 2, \ldots, 16, \quad j = 1, 2, \ldots, 8 \tag{12}$$

where $a_i$ and $b_j$ are 1 or -1. According to (12), the 128-tap matched filter can be decomposed into a 16-tap filter cascaded with an 8-tap filter, as shown in Fig. 5. With this decomposition, the processing period of the matched filter is reduced to 19% and the length of the circular shift register is reduced to 20. In the CC operation, when the shift register is full, the data at addresses [5:20] are shifted to [1:16] and the four incoming sign bits are saved at addresses [17:20]. The data at addresses [1:16], [2:17], [3:18] and [4:19] are distributed to the four parallel data paths and cross-correlated with the coefficients $a_i$. This architecture not only guarantees high speed but also reduces the hardware cost.
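The decomposition implied by (12) can be verified with a short sketch. The sign sequences below are random placeholders, not the actual preamble sequences, and the function names are assumptions; the point is only that a 16-tap correlation followed by an 8-tap combination at a stride of 16 samples gives the same output as the direct 128-tap filter.

```python
import random

rng = random.Random(1)
a = [rng.choice((-1, 1)) for _ in range(16)]   # placeholder values for a_i
b = [rng.choice((-1, 1)) for _ in range(8)]    # placeholder values for b_j
# Zero-based form of (12): c[16*j + i] = a[i] * b[j]
c = [a[i] * b[j] for j in range(8) for i in range(16)]

def direct_filter(x, n):
    """128-tap sign correlation evaluated directly."""
    return sum(x[n + m] * c[m] for m in range(128))

def cascaded_filter(x, n):
    """16-tap correlation with a, then an 8-tap combination with b at stride 16."""
    partial = [sum(x[n + 16 * j + i] * a[i] for i in range(16)) for j in range(8)]
    return sum(b[j] * partial[j] for j in range(8))

def xnor_product(u, v):
    """Product of two signs stored as bits (1 for +1, 0 for -1) via XNOR."""
    return 1 - (u ^ v)

x = [rng.choice((-1, 1)) for _ in range(300)]  # sign bits of received samples
matches = all(direct_filter(x, n) == cascaded_filter(x, n)
              for n in range(len(x) - 127))
tap_ratio = (16 + 8) / 128
print(matches, tap_ratio)
```

The cascade needs 16 + 8 = 24 taps instead of 128, a ratio of 18.75%, which agrees with the roughly 19% quoted above if that figure is read as a tap count (an interpretation, not something the text states).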

Figure 5.

Architecture of the matched filter for UWB


3. Coarse frequency synchronization

OFDM-based UWB systems are sensitive to carrier frequency offset (CFO), which can be estimated and compensated by coarse frequency synchronization in the time domain. Due to the Doppler effect, even a very small CFO leads to a serious accumulated phase shift after a certain period.

3.1. Effects of carrier frequency offset

Define the normalized CFO, $\varepsilon_f = \Delta f / f_s$, as the ratio of the CFO to the subcarrier frequency spacing. The received signal with CFO in the frequency domain can be expressed as (Moose, 1994)

$$R_{k,l} = S_{k,l} H_{k,l}\, \frac{\sin(\pi \varepsilon_f)}{N \sin(\pi \varepsilon_f / N)}\, e^{j 2\pi \varepsilon_f (N-1)/N} + W_{k,l} + W_{ICI} \tag{13}$$

where $S_{k,l}$, $H_{k,l}$ and $W_{k,l}$ stand for the transmitted signal, the channel impulse response and the noise, respectively, at the $k$-th subcarrier of the $l$-th symbol, and $W_{ICI}$ is the noise contributed by inter-carrier interference (ICI). ICI not only destroys the orthogonality of the subcarriers in an OFDM-based UWB system, but also degrades the SNR. The SNR degradation can be approximated as (Pollet et al., 1995)

$$D_{SNR} \approx \frac{10}{3 \ln 10} \left( \pi \varepsilon_f \right)^{2} \frac{E_s}{N_0} \tag{14}$$

where $E_s/N_0$ is the ratio of symbol energy to noise power spectral density.

3.2. Frequency synchronization algorithm

The most straightforward frequency synchronization algorithm is based on the AC function: the CFO is estimated from the phase difference between two symbols. For a traditional OFDM system, the CFO can be estimated as

$$\hat{\varepsilon}_f = \frac{N}{2\pi M} \tan^{-1}\left( \sum_{k=0}^{N-1} r_{n+k}^{*}\, r_{n+k+M} \right) \tag{15}$$

where $N$ is the FFT size and $M$ is the interval between the two symbols. If the traditional AC algorithm is applied to a UWB system, the sliding window length (SWL) is 128, and a four-parallel architecture with an SWL of 128 has high complexity. Shortening the SWL reduces the complexity but degrades the estimation performance. To improve the performance at low complexity, an optimized AC algorithm shortens the SWL to 64 and averages the sum over three symbols located in three different subbands, as expressed in (16).

$$\hat{\varepsilon}_f = \frac{N}{2\pi G_1 M} \tan^{-1}\Bigg( \sum_{k=1}^{L} r[N-L-k+G_1 M]\, r^{*}[N-L-k] + \sum_{k=1}^{L} r[N-L-k+(G_1+G_2)M]\, r^{*}[N-L-k+G_2 M] + \sum_{k=1}^{L} r[N-L-k+(G_1+2G_2)M]\, r^{*}[N-L-k+2G_2 M] \Bigg) \tag{16}$$

where $L$ denotes the SWL of each symbol. The values of $G_i$ ($i = 1, 2$) depend on the TFC. If the TFC is {1 2 3 1 2 3} or {1 3 2 1 3 2}, $G_1 = 3$ and $G_2 = 1$; if the TFC is {1 1 2 2 3 3} or {1 1 3 3 2 2}, $G_1 = 1$ and $G_2 = 2$.
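The estimators above share one structure: correlate samples a fixed delay apart and scale the phase of the result. The sketch below illustrates that structure under stated assumptions. The function name and the toy signal are invented for the example, the signal stays in a single band (no frequency hopping), and the window start indices are chosen for convenience rather than taken from (16).

```python
import cmath
import math
import random

def estimate_cfo(r, starts, window, delay, N):
    """Normalized CFO from the phase of correlations between samples `delay` apart.

    With starts=[n], window=N and delay=M this is (15); several start indices
    with a shorter window mimic the averaging structure of (16).
    """
    corr = sum(r[s + k].conjugate() * r[s + k + delay]
               for s in starts for k in range(window))
    return N / (2 * math.pi * delay) * cmath.phase(corr)

# Toy signal (assumed): one random symbol repeated six times, rotated by a
# normalized CFO of 0.01 and lightly corrupted by noise.
rng = random.Random(5)
N, M, eps = 128, 165, 0.01
symbol = [cmath.exp(2j * math.pi * rng.random()) for _ in range(M)]
received = [symbol[n % M] * cmath.exp(2j * math.pi * eps * n / N)
            + complex(rng.gauss(0, 0.02), rng.gauss(0, 0.02))
            for n in range(6 * M)]

eps_traditional = estimate_cfo(received, [0], N, M, N)
eps_averaged = estimate_cfo(received, [0, M, 2 * M], 64, 3 * M, N)
print(eps_traditional, eps_averaged)
```

The longer delay in the averaged form scales down the effect of phase noise on the estimate, at the cost of a narrower unambiguous range for the offset.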

Although the SWL can be reduced further for lower complexity, the resulting performance degradation requires averaging over a much longer period to compensate. As a trade-off among complexity, performance and processing period, L = 64 is the best choice. Fig. 6 compares the MSE performance for different SWLs with the normalized CFO set to 0.01. Owing to the averaging over three subbands, the optimized AC algorithm with an SWL of 64 performs better than the traditional AC algorithm with an SWL of 128. The optimized AC algorithm with an SWL of 32 cannot perform as well as the traditional AC algorithm with an SWL of 128; it needs a longer averaging period to compensate for the performance degradation.

For UWB, the CFO compensation algorithm can be optimized as well. The basic idea is to treat the CFO values on the four parallel paths as identical when the differences among them are very small (Fan & Choy, 2010a). In the UWB specification, the center frequency is about 4 GHz and the maximum impairment of the clock synthesizer is ±20 ppm (parts per million). Therefore, the normalized CFO should be less than 0.04, and the maximum CFO difference between any two parallel samples should be less than $2.5 \times 10^{-4}$, which is small enough to be ignored. The optimized CFO compensation scheme can be expressed as

$$\tilde{r}[4(m-1)+q] = r[4(m-1)+q]\, \exp\left( -j 2\pi\, 4m\, \hat{\varepsilon}_f / M \right), \quad m = 1, 2, \ldots, [M/4], \quad q = 1, 2, 3, 4 \tag{17}$$

where $4(m-1)+q$ is the sample index. This compensation strategy not only reduces the four parallel digital synthesizers to one, but also lightens the workload of the phase accumulator.
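The effect of sharing one rotation among four parallel samples can be checked with a small sketch. It assumes the sign convention that the offset rotates the signal by $+2\pi\hat{\varepsilon}_f/M$ per sample and that compensation applies the opposite rotation; the function name and the test signal are likewise assumptions.

```python
import cmath
import math

def compensate_cfo(r, eps_hat, M):
    """Apply one phasor per group of four parallel samples (group index m)."""
    out = []
    for idx, sample in enumerate(r):          # idx = 4(m-1) + q - 1, zero-based
        m = idx // 4 + 1
        out.append(sample * cmath.exp(-2j * math.pi * 4 * m * eps_hat / M))
    return out

# Toy check (assumed signal model): a unit phasor rotating by 2*pi*eps/M per sample.
eps, M = 0.04, 165
received = [cmath.exp(2j * math.pi * (i + 1) * eps / M) for i in range(M)]
residual = [abs(cmath.phase(v)) for v in compensate_cfo(received, eps, M)]
worst = max(residual)
bound = 2 * math.pi * 3 * eps / M
print(worst, bound)
```

The residual phase never exceeds three sample steps of rotation, about 0.0046 rad at the largest normalized CFO of 0.04, which is why a single synthesizer can serve all four paths.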

Figure 6.

MSE performance comparison with different SWL

3.3. Implementation of frequency synchronizer

The design of the frequency synchronizer is divided into two parts. The first part estimates the phase difference between two preambles by AC and an arctangent calculation. The second part compensates the signal by multiplying it with a complex rotation vector; it involves the phase accumulator and the sin/cos generator.

Fig. 7 shows the architecture of the CFO compensation block. The phase accumulator produces a digital sweep with a slope proportional to the input phase. The phase offset is scaled from [0, 2π] to [0, 8] by multiplying it by 4/π, so that the three most significant bits (MSBs) select the phase offset region. During CFO compensation, only the sine and cosine values of phase offsets in the range [0, π/4] need to be calculated. For phase offsets in other ranges, input complement, output complement or output swap operations are applied correspondingly.
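The octant folding described above can be sketched in floating point as follows. The function name is an assumption, and `math.sin` and `math.cos` stand in for whatever sin/cos generator covers [0, π/4]; the mapping of octants to complement and swap operations follows from standard trigonometric symmetries rather than from Fig. 7.

```python
import math

def sincos_by_octant(theta):
    """Sine and cosine of theta using evaluations restricted to [0, pi/4]."""
    scaled = (theta % (2 * math.pi)) * 4 / math.pi   # phase scaled to [0, 8)
    whole = int(scaled)
    octant = whole % 8                               # plays the role of the three MSBs
    x = scaled - whole                               # fraction inside the octant
    if octant % 2 == 1:
        x = 1.0 - x                                  # input complement
    phi = x * math.pi / 4                            # always within [0, pi/4]
    s, c = math.sin(phi), math.cos(phi)              # stand-in for the sin/cos generator
    if octant in (1, 2, 5, 6):
        s, c = c, s                                  # output swap
    if octant >= 4:
        s = -s                                       # output complement (sine)
    if octant in (2, 3, 4, 5):
        c = -c                                       # output complement (cosine)
    return s, c

angles = [i * 0.01 - 7.0 for i in range(1500)]
worst = max(max(abs(sincos_by_octant(t)[0] - math.sin(t)),
                abs(sincos_by_octant(t)[1] - math.cos(t))) for t in angles)
print(worst)
```

Only one eighth of the circle has to be evaluated, which is what makes the compact function generators discussed next sufficient for the whole phase range.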

In the design of the frequency synchronizer, the implementation of the arctangent, sine and cosine functions is the most critical task, since it determines the complexity of the synchronizer and the performance of the UWB receiver. Traditional OFDM-based or CDMA-based systems usually employ the classic coordinate rotation digital computer (CORDIC) algorithm for function evaluation (Tsai & Chiueh, 2007; Troya et al., 2008). Other techniques for function evaluation exist, such as the polynomial hyperfolding technique (PHT) (Caro et al., 2004), the piecewise-polynomial approximation (PPA) technique (Caro & Steollo, 2005), the hybrid CORDIC algorithm (Caro et al., 2009) and the multipartite table method (MTM) (Caro et al., 2008).

Figure 7.

Architecture of the CFO compensation block

Polynomial hyperfolding technique

PHT calculates the sine and cosine functions using optimized polynomial expressions with constant coefficients. The sine and cosine functions are approximated by polynomials of degree $K$.

$$\begin{aligned} S(x) &= \sin\left( \frac{\pi}{4} x + \frac{LSB}{2} \right) \approx a_K x^{K} + a_{K-1} x^{K-1} + \cdots + a_0 \\ C(x) &= \cos\left( \frac{\pi}{4} x + \frac{LSB}{2} \right) \approx b_K x^{K} + b_{K-1} x^{K-1} + \cdots + b_0 \end{aligned} \tag{18}$$

where $0 \le x < 1$ is the scaled input of the sine and cosine functions. Optimization is conducted on the second-order ($K = 2$) and third-order ($K = 3$) approximating polynomials, expressed as (19) and (20) respectively (Caro et al., 2004). The second-order PHT achieves about 60 dBc spurious-free dynamic range (SFDR), while the third-order PHT achieves 80 dBc SFDR.

$$\begin{aligned} S(x) &\approx -0.004713 + 0.838015\,x - 2^{-3} x^{2} \\ C(x) &\approx 0.9995593 - 0.011408\,x + \left( -2^{-2} - 2^{-5} \right) x^{2} \end{aligned} \tag{19}$$

$$\begin{aligned} S(x) &\approx -0.00015005 + 0.77436217\,x - 0.00530040\,x^{2} + \left( -2^{-2} + 2^{-5} \right) x^{3}/3 \\ C(x) &\approx 0.98423596 + 0.00452969\,x - 0.32417224\,x^{2} + \left( 2^{-3} - 2^{-5} \right) x^{3}/3 \end{aligned} \tag{20}$$
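As a quick numerical check, the second-order polynomials can be compared with the exact functions. This is only a sketch: the coefficient signs used below are the ones for which the polynomials follow sin and cos on [0, 1) and should be treated as an assumption about the printed formula, the small LSB/2 offset in (18) is neglected, and the function names are invented for the example.

```python
import math

def pht2_sin(x):
    """Second-order sine polynomial, 0 <= x < 1 (signs assumed, see lead-in)."""
    return -0.004713 + 0.838015 * x - 2.0 ** -3 * x * x

def pht2_cos(x):
    """Second-order cosine polynomial, 0 <= x < 1 (signs assumed, see lead-in)."""
    return 0.9995593 - 0.011408 * x - (2.0 ** -2 + 2.0 ** -5) * x * x

grid = [i / 1024 for i in range(1024)]
max_sin_error = max(abs(pht2_sin(x) - math.sin(math.pi / 4 * x)) for x in grid)
max_cos_error = max(abs(pht2_cos(x) - math.cos(math.pi / 4 * x)) for x in grid)
print(max_sin_error, max_cos_error)
```

The quadratic coefficients are sums of powers of two, so the corresponding multiplications reduce to shifts and additions in hardware.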

Piecewise polynomial approximation

The PPA technique is based on subdividing the interval into shorter subintervals and using a polynomial of a given degree in each subinterval to approximate the trigonometric functions. The signal $x$ represents the input phase scaled to a binary fraction in the interval [0, 1], which is subdivided into $s$ subintervals, with $s = 2^{u}$. The $u$ MSBs of $x$ encode the segment starting point $x_k$ and are used as an address to the small lookup tables that store the polynomial coefficients. The remaining bits of $x$ represent the offset $x - x_k$. The quadratic PPA of the sine and cosine functions can be expressed as (Caro & Steollo, 2005)

$$\begin{aligned} f_s(x) &= y_{sk} + m_{sk}\,(x - x_k) - p_{sk}\,(x - x_k)^{2} \\ f_c(x) &= y_{ck} - m_{ck}\,(x - x_k) - p_{ck}\,(x - x_k)^{2} \end{aligned} \qquad x_k \le x \le x_{k+1};\ k = 1, 2, \ldots, s;\ x_1 = 0;\ x_{s+1} = 1 \tag{21}$$

Fig. 8 shows the architecture of the sine and cosine blocks with PPA. The first-order and second-order coefficients are quantized with $r$ bits and $t$ bits respectively, and the constant coefficients with $(Q - 1)$ bits. The inputs and outputs of the sine and cosine functions are represented by $P$ bits and $Q$ bits. The constant, linear and quadratic coefficients are read from ROMs to carry out the polynomial calculation. The PPGen block generates the partial products that form the linear terms, and the carry-save addition tree adds the partial products together after aligning all the bits according to their weights.

Figure 8.

Architecture of sine and cosine blocks with PPA (Caro & Steollo, 2005)
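The table lookup behind the quadratic PPA in (21) can be sketched in floating point. The coefficients below are plain Taylor coefficients at each segment start, which is an assumption made for simplicity; the cited work optimizes and quantizes them. The function names are also invented for the example.

```python
import math

A = math.pi / 4   # the scaled input x in [0, 1) maps to the angle A * x

def build_tables(u):
    """Coefficient tables for s = 2**u segments (Taylor coefficients, an assumption)."""
    s = 2 ** u
    sin_rom, cos_rom = [], []
    for k in range(s):
        xk = k / s
        sn, cs = math.sin(A * xk), math.cos(A * xk)
        sin_rom.append((sn, A * cs, 0.5 * A * A * sn))   # y_sk, m_sk, p_sk
        cos_rom.append((cs, A * sn, 0.5 * A * A * cs))   # y_ck, m_ck, p_ck
    return sin_rom, cos_rom

def ppa_sincos(x, u, sin_rom, cos_rom):
    """Evaluate (21): the u MSBs of x address the tables, the rest is the offset."""
    s = 2 ** u
    k = min(int(x * s), s - 1)
    d = x - k / s
    ys, ms, ps = sin_rom[k]
    yc, mc, pc = cos_rom[k]
    return ys + ms * d - ps * d * d, yc - mc * d - pc * d * d

U = 6                                   # s = 64 segments, the PPA setting in Table 1
SIN_ROM, COS_ROM = build_tables(U)
grid = [i / 4096 for i in range(4096)]
worst = 0.0
for x in grid:
    fs, fc = ppa_sincos(x, U, SIN_ROM, COS_ROM)
    worst = max(worst, abs(fs - math.sin(A * x)), abs(fc - math.cos(A * x)))
print(worst)
```

With 64 segments the truncation error of the quadratic is already below one part in a million, so in practice the coefficient word lengths $r$ and $t$ dominate the accuracy.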

Hybrid coordinate rotation digital computer

This approach splits the phase rotation into three steps. The first two steps are CORDIC-based, with the rotation directions computed in parallel. The final step is multiplier-based (Caro et al., 2009).

Suppose the word lengths of the input vector $[X_{in}, Y_{in}]$ and the output vector $[X_{out}, Y_{out}]$ are 12 and 13 bits respectively. The rotation phase $\varphi \in [0, \pi/4]$ is represented by a binary fractional value in [0, 1] as

$$\frac{4}{\pi} \varphi = f_1 2^{-1} + f_2 2^{-2} + \cdots + f_{13} 2^{-13} \tag{22}$$

The least significant bit (LSB) of $\varphi$ has a weight denoted in the following by $\varphi_{LSB} = (\pi/4)\,2^{-13}$. In the first step, the phase is divided into two subwords, $\varphi = \alpha + \beta$, where

$$\alpha = \left( f_1 2^{-1} + \cdots + f_3 2^{-3} + 2^{-4} \right) \frac{\pi}{4}, \qquad \beta = \left( -\bar{f_4}\, 2^{-4} + f_5 2^{-5} + \cdots + f_{13} 2^{-13} \right) \frac{\pi}{4} \tag{23}$$

The goal of the first stage is to perform a rotation by an angle close to $\alpha + \varphi_{LSB}/2$. For that purpose, the first rotation uses the CORDIC algorithm described by the following equations.

$$\begin{cases} X_{i+1} = X_i - \sigma_i 2^{-i} Y_i \\ Y_{i+1} = Y_i + \sigma_i 2^{-i} X_i \\ Z_{i+1} = Z_i - \sigma_i \tan^{-1} 2^{-i} \end{cases} \qquad i = 1, \ldots, 4 \tag{24}$$

where $\sigma_i$ is equal to the sign of $Z_i$. The algorithm starts with $X_1 = X_{in}$, $Y_1 = Y_{in}$ and $Z_1 = \alpha + \varphi_{LSB}/2$.

The second and third stages rotate the output vector of the first stage by a phase $\gamma = Z_{residual} + \beta$, which is represented with 11 bits. $\gamma$ is then split into the sum of two subwords, $\gamma_1 + \gamma_2$, where

$$\gamma_1 = 2^{-3} \left( -g_0 + g_1 2^{-1} + g_2 2^{-2} + 2^{-3} \right), \qquad \gamma_2 = 2^{-3} \left( -\bar{g_3}\, 2^{-3} + g_4 2^{-4} + \cdots + g_{10} 2^{-10} \right) \tag{25}$$

The second rotation performs the rotation by the phase $\gamma_1$. The rotation directions are obtained from the bits of $\gamma_1$ as follows.

$$\tau_0 = 2\bar{g_0} - 1, \qquad \tau_i = 2 g_i - 1, \quad i = 1, 2 \tag{26}$$

The corresponding CORDIC equations are

$$\begin{cases} X'_{i+1} = X'_i - \tau_i 2^{-(i+4)} Y'_i \\ Y'_{i+1} = Y'_i + \tau_i 2^{-(i+4)} X'_i \end{cases} \qquad i = 0, 1, 2 \tag{27}$$

The operation performed in the final rotation block can be written as

$$\begin{cases} X_{out} = X_{T2} \cos\gamma_2 - Y_{T2} \sin\gamma_2 \\ Y_{out} = X_{T2} \sin\gamma_2 + Y_{T2} \cos\gamma_2 \end{cases} \tag{28}$$

where $[X_{T2}, Y_{T2}]$ is the output vector of the second rotation. The absolute value of $\gamma_2$ is smaller than $2^{-6}$. Therefore, the sine and cosine functions can be approximated as $\sin\gamma_2 \approx \gamma_2$ and $\cos\gamma_2 \approx 1$.

The architecture of the hybrid CORDIC rotator is shown in Fig. 9. Each elementary stage is composed of adders and shifters. The two final vector merging adders (VMAs) convert the results to two's complement representation.

Figure 9.

Architecture of hybrid CORDIC technique (Caro et al., 2009)
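The idea of finishing a CORDIC rotation with a small-angle multiplier stage can be illustrated with a simplified floating-point sketch. It is not the three-stage hybrid of the cited work: it uses sequential sign decisions throughout, seven micro-rotations instead of the split into four and three, and an explicit gain correction, all of which are assumptions made to keep the example short.

```python
import math

def rotate_cordic_then_multiply(x, y, phi, iterations=7):
    """Rotate (x, y) by phi in [0, pi/4]: micro-rotations, then a small-angle step."""
    z = phi
    gain = 1.0
    for i in range(1, iterations + 1):
        sigma = 1.0 if z >= 0 else -1.0               # sign of the residual angle
        step = 2.0 ** -i
        x, y = x - sigma * step * y, y + sigma * step * x
        z -= sigma * math.atan(step)
        gain *= math.sqrt(1.0 + step * step)          # CORDIC gain, removed below
    x, y = x / gain, y / gain
    # Final multiplier-based stage with sin(z) ~ z and cos(z) ~ 1.
    return x - y * z, x * z + y, z

worst_error, worst_residual = 0.0, 0.0
for n in range(1001):
    phi = n / 1000 * math.pi / 4
    xo, yo, z = rotate_cordic_then_multiply(1.0, 0.0, phi)
    worst_error = max(worst_error, abs(xo - math.cos(phi)), abs(yo - math.sin(phi)))
    worst_residual = max(worst_residual, abs(z))
print(worst_error, worst_residual)
```

Once the residual angle is below $2^{-7}$, replacing the cosine by 1 costs an error of roughly half the square of that angle, which is why a single multiplication can finish the rotation.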

Multipartite table method

MTM is a very effective lookup table compression technique for function evaluation. It has been found ideally suited to high-performance synthesizers, requiring both a very small ROM and simple arithmetic circuitry (Caro et al., 2008). The principle of MTM is to decompose the $Q$-bit input signal $x$ into $K + 1$ non-overlapping sub-words $x_0, x_1, \ldots, x_K$ with lengths $q_0, q_1, \ldots, q_K$ respectively, where $x = x_0 + x_1 + \cdots + x_K$ and $Q = q_0 + q_1 + \cdots + q_K$. The angle [0, π/4] is scaled to a binary fraction in [0, 1]. A piecewise linear approximation of $f(x)$ can be expressed as

$$\begin{aligned} f(x) = f(x_0 + x_1 + \cdots + x_K) &\approx A(x_0) + B(x_0)\,(x_1 + \cdots + x_K) \\ &= A(x_0) + B(x_0)\,x_1 + \cdots + B(x_0)\,x_K \\ &\approx A(x_0) + B_1(\alpha_1)\,x_1 + \cdots + B_K(\alpha_K)\,x_K \end{aligned} \tag{29}$$

The interval of $x$ is divided into $2^{q_0}$ subintervals. $x_0$ represents the starting point of each subinterval and $x_1 + \cdots + x_K$ is the offset between $x$ and $x_0$ within the subinterval. $\alpha_1$ is a sub-word of $x_0$ consisting of its $p_1 \le q_0$ MSBs. Likewise, $\alpha_i$ ($i = 2, \ldots, K$) is a sub-word of $x_0$ consisting of its $p_i \le p_{i-1}$ MSBs. The term $A(x_0)$ can be realized with a ROM, named the table of initial values (TIV), with $2^{q_0}$ entries. The terms $B_i(\alpha_i)\,x_i$ ($i = 1, \ldots, K$) can be implemented with $K$ ROMs, named tables of offsets (TO$_i$), with $2^{p_i + q_i}$ entries each. Making the TOs symmetric reduces the ROM size by a factor of two. Equation (29) then becomes

$$f(x) \approx \tilde{A}(x_0) + B_1(\alpha_1) \left( x_1 - \frac{\delta_1}{2} \right) + \cdots + B_K(\alpha_K) \left( x_K - \frac{\delta_K}{2} \right) \tag{30}$$

where the coefficients are calculated as follows (Caro et al., 2008).

$$\begin{aligned} \tilde{A}(x_0) &= \frac{f(x_0) + f(x_0 + \Delta_0)}{2} \\ B_i(\alpha_i) &= \frac{f(\alpha_i + \delta_i) - f(\alpha_i) + f(\alpha_i + \delta_i + \sigma_i) - f(\alpha_i + \sigma_i)}{2 \delta_i} \\ \mathrm{TO}_i(\alpha_i, x_i) &= B_i(\alpha_i) \left( x_i + 2^{-s_i - 1} \right) \\ \delta_i &= \left( 2^{q_i} - 1 \right) 2^{-s_i}; \quad s_i = \sum_{j=0}^{i} q_j; \quad \sigma_i = 2^{-p_i} - 2^{q_i - s_i}; \quad \Delta_0 = \sum_{j=1}^{K} \delta_j = 2^{-q_0} - 2^{-Q} \end{aligned} \tag{31}$$

The architecture of MTM with symmetric TOs is shown in Fig. 10. The content of the TOs is conditionally added to or subtracted from the content stored in the TIV. The addition or subtraction of the ROM contents and the complement operation on the inputs are controlled by the MSB of each sub-word.

Figure 10.

Architecture of MTM with symmetric TOs
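The table decomposition behind MTM can be illustrated with a simplified sketch. It implements only the plain multipartite form, one table of initial values plus one table of offsets per sub-word, without the symmetry trick; the slopes are sampled at the middle of each $\alpha_i$ range, which is an assumption rather than the coefficient formula of the cited work. The word lengths are the MTM parameters listed in Table 1, and the function names are invented for the example.

```python
import math

Q_BITS = (4, 2, 3, 3)      # q0..q3, the MTM parameters listed in Table 1
P_BITS = (3, 3, 1)         # p1..p3
Q = sum(Q_BITS)            # 12-bit input word

def f(x):
    return math.sin(math.pi / 4 * x)

def slope(x):
    return math.pi / 4 * math.cos(math.pi / 4 * x)

def build_tables():
    """TIV holds f at each subinterval start; TO_i holds slope * offset products."""
    q0 = Q_BITS[0]
    tiv = [f(x0 / 2 ** q0) for x0 in range(2 ** q0)]
    tos, used_bits = [], q0
    for q_i, p_i in zip(Q_BITS[1:], P_BITS):
        used_bits += q_i
        table = {}
        for alpha in range(2 ** p_i):
            b = slope((alpha + 0.5) / 2 ** p_i)      # slope at the middle of the alpha range
            for x_i in range(2 ** q_i):
                table[(alpha, x_i)] = b * x_i / 2 ** used_bits
        tos.append(table)
    return tiv, tos

def mtm_eval(x, tiv, tos):
    """Approximate f(x / 2**Q) as TIV[x0] + sum_i TO_i[alpha_i, x_i] for an integer x."""
    q0 = Q_BITS[0]
    x0 = x >> (Q - q0)
    total, used_bits = tiv[x0], q0
    for table, q_i, p_i in zip(tos, Q_BITS[1:], P_BITS):
        used_bits += q_i
        x_i = (x >> (Q - used_bits)) & (2 ** q_i - 1)
        total += table[(x0 >> (q0 - p_i), x_i)]
    return total

TIV, TOS = build_tables()
worst = max(abs(mtm_eval(x, TIV, TOS) - f(x / 2 ** Q)) for x in range(2 ** Q))
rom_entries = len(TIV) + sum(len(t) for t in TOS)
print(worst, rom_entries)
```

The four tables hold 128 entries in total, compared with 4096 for a direct 12-bit lookup, and the evaluation needs only additions. The unoptimized coefficients used here leave a larger error than a properly designed MTM would.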

To give a fair comparison of the four techniques, each is used to implement the CFO compensation block. The design parameters are set so that the SFDR of the four techniques is nearly the same. The inputs and outputs of the four algorithms are 12 bits wide. The designs are synthesized with the UMC 0.13 μm high-speed library at a 132 MHz clock frequency, and the power, area and latency of the four methods are listed in Table 1. Because MSE is a statistical value, it is not easy to make the MSEs of the four approaches exactly the same, but they are very close. With the smallest MSE, MTM outperforms the other algorithms in area, power and latency. Since MTM proves to be an efficient approach for function evaluation, it can also be applied to implement the arctangent function in the CFO estimation block.

| Technique | MTM | PPA | PHT | Hybrid |
|---|---|---|---|---|
| | q0 = 4, q1 = 2, q2 = 3, q3 = 3, p1 = 3, p2 = 3, p3 = 1 | s = 64, r = 6, t = 7 | K = 3 | (1) 4 rep., (2) 3 rep., (3) 8b × 8b |
| | 2.97 | 4.91 | 7.82 | 5.73 |
| | 0.018 | 0.027 | 0.031 | 0.146 |
| | 0.84 | 0.88 | 1.55 | 13.93 |
| (Clock cycles) | 3 | 3 | 4 | 6 |

Table 1.

Synthesis performance comparison of CFO compensation with four techniques


4. Fine frequency synchronization

Although CFO can be coarsely estimated by frequency synchronizer in time domain, the residual CFO (RCFO), sampling frequency offset (SFO) and common phase error will lead to accumulated phase shift after a certain period and thus degrade the system performance if they are not carefully tracked. In OFDM-based UWB systems, pilot subcarriers can help to solve the residual phase distortion issue in frequency domain, which is also called fine frequency synchronization.

4.1. Effects of sampling frequency offset

The oscillators that generate the DAC and ADC sampling instants at the transmitter and the receiver never have exactly the same period, so the sampling instants slowly shift relative to each other. The SFO has two main effects: a slow shift of the symbol timing, which rotates the subcarriers; and a loss of SNR due to the ICI generated by the slightly incorrect sampling instants, which destroys the orthogonality of the subcarriers.

Define the normalized sampling error as $\Delta t = (T' - T)/T$, where $T'$ and $T$ are the receiver and transmitter sampling periods respectively. The overall effect on the received signal in the frequency domain is then expressed as

$$R_{k,l} = S_{k,l} H_{k,l}\, e^{j 2\pi k \Delta t\, l\, T_s / T_u}\, \mathrm{sinc}(\pi k \Delta t) + W_{k,l} + N_{\Delta t}(k, l) \tag{32}$$

where $T_s$ and $T_u$ are the durations of the total symbol and of the useful data respectively, $W_{k,l}$ is additive white Gaussian noise (AWGN) and the last term $N_{\Delta t}(k, l)$ is the additional interference due to the SFO. The power of the last term is approximated by

$$P_{\Delta t} \approx \frac{\pi^{2}}{3} \left( k \Delta t \right)^{2} \tag{33}$$

Hence the degradation grows as the square of the product of the offset $\Delta t$ and the subcarrier index $k$, which means that the outermost subcarriers are the most severely affected. The degradation can also be expressed directly as an SNR loss (Pollet et al., 1995)

$$D_n \approx 10 \log_{10} \left( 1 + \frac{\pi^{2}}{3} \frac{E_s}{N_0} \left( k \Delta t \right)^{2} \right) \ (\mathrm{dB}) \tag{34}$$

The OFDM-based UWB system does not have a large number of subcarriers and the value of $\Delta t$ is quite small, so $k \Delta t \ll 1$ and the interference caused by the SFO can usually be ignored. However, the rotation experienced by the subcarriers leads to a serious problem. Since the rotation angle depends on both the subcarrier index and the symbol index, it is largest for the outermost subcarriers and increases over consecutive symbols. Although $\Delta t$ is very small, the phase shift eventually corrupts the demodulation as the symbol index increases. Tracking the SFO is therefore necessary.
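A short calculation makes the contrast between the two effects concrete. The numbers are assumptions chosen for illustration: subcarrier index 55, a worst-case offset of 40 ppm (±20 ppm at each end), a symbol-to-useful-data ratio of 165/128, and 10 dB of $E_s/N_0$. The function names are invented for the example.

```python
import math

def sfo_snr_loss_db(k, delta_t, es_n0_db):
    """SNR loss for subcarrier k: 10*log10(1 + pi^2/3 * Es/N0 * (k*delta_t)^2)."""
    es_n0 = 10 ** (es_n0_db / 10)
    return 10 * math.log10(1 + math.pi ** 2 / 3 * es_n0 * (k * delta_t) ** 2)

def sfo_rotation_rad(k, delta_t, l, ts_over_tu):
    """Phase rotation 2*pi*k*delta_t*l*Ts/Tu of subcarrier k at symbol l."""
    return 2 * math.pi * k * delta_t * l * ts_over_tu

# Assumed numbers, see the lead-in.
loss = sfo_snr_loss_db(55, 40e-6, 10.0)
rotation = [sfo_rotation_rad(55, 40e-6, l, 165 / 128) for l in (1, 10, 100)]
print(loss, rotation)
```

Under these assumptions the SNR loss stays below a thousandth of a decibel, while the rotation grows linearly with the symbol index and exceeds one radian after a hundred symbols.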

4.2. Phase tracking algorithms

Conventionally, SFO can be estimated by computing a slope from the plot of pilot subcarrier differences versus pilot subcarrier indices (Speth et al., 2001). Recently, joint estimation of CFO and SFO has also been studied extensively, such as the linear least squares (LLS) algorithm (Liu & Chong, 2002) and joint weighted least squares (WLS) algorithm (Tsai et al., 2005).


Auto-correlation

After removing the channel noise, the received signal with residual phase distortion in the frequency domain can be modeled as

$$Z_{k,l} = S_{k,l} P_{k,l} = S_{k,l} \exp\left( j \Phi_{k,l} \right) = S_{k,l} \exp\left( j (\alpha k + \beta_l) \right) \tag{35}$$

where $P_{k,l}$ is the phase distortion vector and $\Phi_{k,l}$ is the residual phase error. The relationship among $\alpha$, $\beta_l$ and $\Phi_{k,l}$ is shown in Fig. 11. $\alpha$ is the slope of the phase distortion and is contributed by the SFO. $\beta_l$ is the intercept of the phase distortion and is caused by the RCFO of symbol $l$.

The basic idea of AC is to obtain the phase differences of the pilot subcarriers between two symbols.

Figure 11.

The relationship of phase distortion and subcarriers

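The pilot subcarriers are divided into two parts, $C_1$ on the left of the spectrum and $C_2$ on the right. The estimated intercept phase $\hat{\beta}_l$ and slope $\hat{\alpha}$ are then written as (Speth et al., 2001)

$$\hat{\beta}_l = \frac{1}{2} \left( \Phi^{-}_{k,l}\big|_{k \in C_1} + \Phi^{+}_{k,l}\big|_{k \in C_2} \right), \qquad \hat{\alpha} = \frac{\Phi^{+}_{k,l}\big|_{k \in C_2} - \Phi^{-}_{k,l}\big|_{k \in C_1}}{\sum_{k \in C_2} k - \sum_{k \in C_1} k} \tag{36}$$

$$\Phi^{-}_{k,l} = \tan^{-1} \left( \sum_{k \in C_1} Z_{k,l-1}\, Z^{*}_{k,l} \right), \qquad \Phi^{+}_{k,l} = \tan^{-1} \left( \sum_{k \in C_2} Z_{k,l}\, Z^{*}_{k,l-1} \right) \tag{37}$$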

Linear least squares

Applying LLS estimation to (37) with $K$ pilots in one symbol, where the $i$-th pilot is located at subcarrier $k_i$, the RCFO and SFO estimates are (Liu & Chong, 2002)

$$\Delta\hat{\theta}_f = \frac{\sum_{i=1}^{K} \Phi_{k_i,l}}{\frac{2\pi M K}{N}}, \qquad \hat{\delta} = \frac{\sum_{i=1}^{K} \Phi_{k_i,l}\, k_i}{\frac{2\pi M K}{N} \sum_{i=1}^{K} k_i^{2}} \tag{38}$$

$$\Phi_{k_i,l} = \tan^{-1} \left( \sum_{i=1}^{K} Z_{k_i,l}\, Z^{*}_{k_i,l-1} \right) \tag{39}$$

Because this estimation algorithm is based on the phase differences between two symbols, it removes the common channel fading terms in slow-fading scenarios. Consequently, it can be applied before channel estimation and equalization.

Weighted least squares

Although the joint LLS estimation algorithm provides accurate results in the AWGN channel, diverse channel responses on the pilot subcarriers can render its estimates useless. For instance, the phases of several deeply faded pilot subcarriers, when used in the joint LLS estimation, can cause a large error in the results. The phases of subcarriers with little fading, on the other hand, are naturally more reliable. Weighting the subcarrier data is therefore advantageous: seriously faded subcarriers should be assigned smaller weights to minimize their adverse effect on the estimation accuracy. The WLS algorithm for joint estimation of the RCFO and SFO can be expressed as (Tsai et al., 2005)

$$\Delta\hat{\theta}_f = \frac{\sum_{i=0}^{K} \omega_i k_i^{2} \sum_{i=0}^{K} \omega_i \Phi_{k_i,l} - \sum_{i=0}^{K} \omega_i k_i \sum_{i=0}^{K} \omega_i \Phi_{k_i,l}\, k_i}{\frac{2\pi M K}{N} \left[ \sum_{i=0}^{K} \omega_i \sum_{i=0}^{K} \omega_i k_i^{2} - \left( \sum_{i=0}^{K} \omega_i k_i \right)^{2} \right]} \tag{40}$$

$$\hat{\delta} = \frac{\sum_{i=0}^{K} \omega_i \sum_{i=0}^{K} \omega_i \Phi_{k_i,l}\, k_i - \sum_{i=0}^{K} \omega_i k_i \sum_{i=0}^{K} \omega_i \Phi_{k_i,l}}{\frac{2\pi M K}{N} \left[ \sum_{i=0}^{K} \omega_i \sum_{i=0}^{K} \omega_i k_i^{2} - \left( \sum_{i=0}^{K} \omega_i k_i \right)^{2} \right]} \tag{41}$$

The weight $\omega_i$ should be inversely proportional to the variance of the phase error, which depends on the noise, the ICI and the complex channel gain. Usually the residual synchronization error is so small that the ICI term can be neglected, and $\omega_i$ depends only on the channel gain of the pilot subcarriers. The disadvantage of this algorithm is its complexity, especially in computing $\omega_i$. If $\omega_i$ is not estimated accurately, the phase tracking error is large.
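The benefit of weighting can be seen in a generic weighted straight-line fit of pilot phase against subcarrier index. The sketch below is an illustration only: it returns the raw intercept and slope in radians without the $2\pi MK/N$ scaling, the pilot positions and weights are assumed values, and the function name is invented for the example.

```python
def weighted_line_fit(k, phase, weights=None):
    """Weighted least-squares fit of phase[i] ~ intercept + slope * k[i]."""
    if weights is None:
        weights = [1.0] * len(k)                 # equal weights give the LLS fit
    sw = sum(weights)
    swk = sum(w * ki for w, ki in zip(weights, k))
    swkk = sum(w * ki * ki for w, ki in zip(weights, k))
    swp = sum(w * p for w, p in zip(weights, phase))
    swkp = sum(w * ki * p for w, ki, p in zip(weights, k, phase))
    det = sw * swkk - swk ** 2
    intercept = (swkk * swp - swk * swkp) / det  # same numerator structure as the RCFO estimate
    slope = (sw * swkp - swk * swp) / det        # same numerator structure as the SFO estimate
    return intercept, slope

# Assumed pilot positions and a phase line with intercept 0.02 and slope 1e-4.
pilots = [-55, -45, -35, -25, -15, -5, 5, 15, 25, 35, 45, 55]
clean = [0.02 + 1e-4 * k for k in pilots]
faded = list(clean)
faded[6] += 1.0                                  # one deeply faded pilot with a bad phase

exact = weighted_line_fit(pilots, clean)
lls = weighted_line_fit(pilots, faded)
weights = [1.0] * len(pilots)
weights[6] = 1e-6                                # small weight for the faded pilot
wls = weighted_line_fit(pilots, faded, weights)
print(exact, lls, wls)
```

With equal weights the single bad pilot shifts the intercept by about one twelfth of its phase error, while the weighted fit stays close to the true line. The difficulty noted above lies in obtaining such weights reliably.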

Novel approach for UWB

In traditional phase tracking solutions, arctangent, sine and cosine functions are necessary, which are quite complicated in hardware implementation. The algorithm presented in (Troya et al., 2007) simplifies the hardware cost significantly compared with the traditional approaches. However, it sacrifies system performance slightly. In (Fan & Choy, 2010b), a novel phase tracking method for UWB is proposed. It not only has low complexity, but also improves the performance.

Considering the condition |αk|<<1 is satisfied with k ∈ [-55, 55], the first order approximation can be made as cos(αk) ≈ 1 and sin(αk) ≈ αk. Then the phase distortion in (35) can be rewritten as

P k , l = c o s β l α k s i n β l + j ( s i n β l + α k c o s β l ) E42

In (42), four parameters are of interests: sinβl, cosβl, α sinβl and α cosβl. The former two can be easily obtained by

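$$\begin{cases} \cos\beta_l = \dfrac{1}{8}\displaystyle\sum_{k=\pm25,\pm35,\pm45,\pm55} \Re\{P_{k,l}\} \\[2ex] \sin\beta_l = \dfrac{1}{8}\displaystyle\sum_{k=\pm25,\pm35,\pm45,\pm55} \Im\{P_{k,l}\} \end{cases} \tag{43}$$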

where ℜ(·) and ℑ(·) denote the real and imaginary parts, respectively. There are 12 pilots in each symbol of the OFDM-based UWB system. Since a scaling factor of 1/8 is much easier to implement than 1/12, and the pilots near the DC subcarrier suffer more channel noise than those far from it, as many of the outermost pilots as possible should be used.

The exact scaling factor is 1/260, since 2 × (15 + 25 + 35 + 55) = 260. Approximating it by 1/256, which can be implemented by an 8-bit right shift, the parameters α sinβl and α cosβl are given by

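$$\begin{cases} \alpha\sin\beta_l \approx \dfrac{1}{256}\left(\displaystyle\sum_{k=-55,-35,-25,-15} \Re\{P_{k,l}\} - \displaystyle\sum_{k=15,25,35,55} \Re\{P_{k,l}\}\right) \\[2ex] \alpha\cos\beta_l \approx \dfrac{1}{256}\left(\displaystyle\sum_{k=55,35,25,15} \Im\{P_{k,l}\} - \displaystyle\sum_{k=-15,-25,-35,-55} \Im\{P_{k,l}\}\right) \end{cases} \tag{44}$$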
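The estimators in (43) and (44) can be checked numerically with a short Python sketch. It builds ideal pilot phase errors exp(j(αk + βl)) for assumed values of α and βl and then recovers the four parameters using only additions and fixed scalings. The function name `track_phase`, the dictionary layout and the test values are illustrative assumptions; a hardware implementation would replace the divisions by right shifts.

```python
import cmath
import math

OUTER_PILOTS = (25, 35, 45, 55)   # used with both signs in (43)
SLOPE_PILOTS = (15, 25, 35, 55)   # used with both signs in (44)


def track_phase(pilot_err):
    """Estimate cos(b), sin(b), a*sin(b), a*cos(b) from pilot phase errors.

    pilot_err maps the pilot index k to the complex phase error P[k, l].
    """
    # (43): the terms proportional to k cancel over symmetric pilot pairs.
    cos_b = sum(pilot_err[s * k].real for k in OUTER_PILOTS for s in (1, -1)) / 8
    sin_b = sum(pilot_err[s * k].imag for k in OUTER_PILOTS for s in (1, -1)) / 8

    # (44): differences between negative and positive pilots isolate the
    # terms proportional to k; 1/256 approximates the exact factor 1/260.
    neg_re = sum(pilot_err[-k].real for k in SLOPE_PILOTS)
    pos_re = sum(pilot_err[k].real for k in SLOPE_PILOTS)
    pos_im = sum(pilot_err[k].imag for k in SLOPE_PILOTS)
    neg_im = sum(pilot_err[-k].imag for k in SLOPE_PILOTS)
    a_sin_b = (neg_re - pos_re) / 256
    a_cos_b = (pos_im - neg_im) / 256
    return cos_b, sin_b, a_sin_b, a_cos_b


# Hypothetical noise-free test case.
alpha, beta = 1e-4, 0.2
pilot_idx = (-55, -45, -35, -25, -15, 15, 25, 35, 45, 55)
pilot_err = {k: cmath.exp(1j * (alpha * k + beta)) for k in pilot_idx}

cos_b, sin_b, a_sin_b, a_cos_b = track_phase(pilot_err)
print(cos_b, math.cos(beta))
print(sin_b, math.sin(beta))
print(a_sin_b, alpha * math.sin(beta))
print(a_cos_b, alpha * math.cos(beta))
```

In this noise-free setting, the 1/256 scaling overestimates α sinβl and α cosβl by a factor of 260/256, that is, by about 1.6%.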

Among the traditional algorithms, LLS and WLS offer better phase tracking performance than AC, but their complexity is too high for practical application. In hardware, AC has low complexity and moderate phase correction performance. Therefore, the MSE performance of the novel approach for UWB is compared with that of AC under different phase distortion conditions, as shown in Fig. 12. The novel phase tracking method for UWB clearly outperforms the traditional AC algorithm. In addition, as the phase error increases, the traditional AC algorithm degrades severely, whereas the novel method does not.

Figure 12. MSE performance comparison between the traditional AC algorithm and the novel approach for UWB

4.3. Architecture of the phase tracking block

The architecture of the phase tracking block with the novel approach for UWB is shown in Fig. 13. The signals after channel equalization are stored in the pilot buffer and the data buffer separately. Since the transmitted pilots are known and have unit modulus, the phase error vector of the pilots can be derived by multiplying the received pilots by the conjugates of the transmitted pilots. As shown in Fig. 13, no arctangent, sine or cosine function is required; they are replaced by eight complex adders and two complex shifters.

Figure 13. Highly simplified architecture of the phase tracking block

The values of α sinβl and α cosβl are very small, so the SFO-induced phase errors of four parallel data can be treated as approximately equal and rewritten as α⌈k/4⌉sinβl and α⌈k/4⌉cosβl (⌈k/4⌉ ∈ [-12, 12]). Calculating 4α sinβl and 4α cosβl instead of α sinβl and α cosβl further simplifies the architecture of the phase tracking block.
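The effect of sharing one SFO term among four parallel data can be gauged with a small Python sketch. It evaluates the first-order phase distortion of (42) with k replaced by 4⌈k/4⌉ and compares it with the exact value exp(j(αk + βl)). The function name `grouped_distortion`, the use of an integer ceiling for the group index and the values of α and βl are assumptions made for illustration only.

```python
import cmath
import math


def grouped_distortion(k, cos_b, sin_b, a4_sin_b, a4_cos_b):
    """First-order phase distortion with one SFO term per group of four.

    a4_sin_b and a4_cos_b stand for 4*a*sin(b) and 4*a*cos(b).
    """
    g = -(-k // 4)  # integer ceil(k / 4), shared by four adjacent subcarriers
    return complex(cos_b - g * a4_sin_b, sin_b + g * a4_cos_b)


# Hypothetical parameter values.
alpha, beta = 1e-4, 0.2
cos_b, sin_b = math.cos(beta), math.sin(beta)
a4_sin_b, a4_cos_b = 4 * alpha * sin_b, 4 * alpha * cos_b

# Worst-case deviation from the exact distortion over k in [-55, 55].
max_err = max(
    abs(grouped_distortion(k, cos_b, sin_b, a4_sin_b, a4_cos_b)
        - cmath.exp(1j * (alpha * k + beta)))
    for k in range(-55, 56)
)
print(max_err)
```

Because k and 4⌈k/4⌉ differ by at most 3, the grouping adds a phase error of at most about 3α, which is small whenever α itself is small.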


5. Conclusion

This chapter provides a comprehensive review of the algorithms and architectures for timing and frequency synchronization. Although there is a large body of literature on UWB synchronization techniques, most of it does not take real applications or implementation into account. This chapter covers three parts of the synchronization process.

In timing synchronization, the DT detection scheme significantly improves detection performance thanks to the cascaded auto-correlator. Although this slightly increases the hardware cost, the optimized low-complexity architecture of the matched filter saves hardware. In coarse frequency synchronization, the CFO estimation approach can be simplified by shortening the SWL, and averaging over three subbands compensates for the resulting SNR degradation. MTM is shown to be a low-cost, low-power and high-speed way to implement arctangent, sine and cosine functions compared with other function evaluation techniques. In fine frequency synchronization, a novel phase tracking approach for UWB is proposed that achieves good performance. In addition, it requires no computation-intensive arctangent, sine or cosine units, which are replaced by adders and shifters, so the implementation complexity of the novel phase tracking method is low.

The low-complexity and power-efficient synchronization techniques make it possible to develop robust, low-cost, low-power and high-speed OFDM-based UWB receivers.


References

1. De Caro, D.; Napoli, E. & Strollo, A. G. M. (2004). Direct digital frequency synthesizer with polynomial hyperfolding technique. IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 51, No. 7, Jul. 2004, pp. 337-344, ISSN 1549-7747
2. De Caro, D. & Strollo, A. G. M. (2005). High-performance direct digital frequency synthesizer using piecewise-polynomial approximation. IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 52, No. 2, Feb. 2005, pp. 324-337, ISSN 1549-8328
3. De Caro, D.; Petra, N. & Strollo, A. G. M. (2008). Reducing lookup-table size in direct digital frequency synthesizers using optimized multipartite table method. IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 55, No. 7, Aug. 2008, pp. 2116-2127, ISSN 1549-8328
4. De Caro, D.; Petra, N. & Strollo, A. G. M. (2009). Digital synthesizer/mixer with hybrid CORDIC-multiplier architecture: error analysis and optimization. IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 56, No. 2, Feb. 2009, pp. 364-373, ISSN 1549-8328
5. Coulson, A. J. (2001). Maximum likelihood synchronization for OFDM using a pilot symbol: algorithms. IEEE Journal on Selected Areas in Communications, Vol. 19, No. 12, Dec. 2001, pp. 2486-2494, ISSN 0733-8716
6. Fan, W.; Choy, C.-S. & Leung, K.-N. (2009). Robust and low complexity packet detector design for MB-OFDM UWB. Proceedings of IEEE Int. Symposium on Circuits and Systems, pp. 693-696
7. Fan, W. & Choy, C.-S. (2010a). Power efficient and high speed frequency synchronizer design for MB-OFDM UWB. Proceedings of IEEE Int. Conference on UWB, pp. 669-673
8. Fan, W. & Choy, C.-S. (2010b). Efficient and low complexity phase tracking method for MB-OFDM UWB receiver. Proceedings of IEEE Midwest Symposium on Circuits and Systems, pp. 221-224
9. Fort, A.; Weijers, J. W.; Derudder, V. et al. (2003). A performance and complexity comparison of auto-correlation and cross-correlation for OFDM burst synchronization. Proceedings of IEEE Int. Conference on Acoustics, Speech, and Signal Processing, pp. 341-344
10. Liu, S. Y. & Chong, J. W. (2002). A study of joint tracking algorithms of carriers frequency offset and sampling clock offset for OFDM-based WLANs. Proceedings of IEEE Int. Conference on Communications, Circuits and Systems and West Sino Expositions, pp. 109-133
11. Minn, H.; Bhargava, V. K. & Letaief, K. B. (2003). A robust timing and frequency synchronization for OFDM systems. IEEE Transactions on Wireless Communications, Vol. 2, No. 4, Jul. 2003, pp. 822-839, ISSN 1536-1276
12. Moose, P. H. (1994). A technique for orthogonal frequency division multiplexing frequency offset correction. IEEE Transactions on Communications, Vol. 42, No. 10, Oct. 1994, pp. 2908-2914, ISSN 0090-6778
13. Pollet, T.; Van Bladel, M. & Moeneclaey, M. (1995). BER sensitivity of OFDM systems to carrier frequency offset and Wiener phase noise. IEEE Transactions on Communications, Vol. 43, No. 2, Mar.-Apr. 1995, pp. 191-193, ISSN 0090-6778
14. Schmidl, T. M. & Cox, D. C. (1997). Robust frequency and timing synchronization for OFDM. IEEE Transactions on Communications, Vol. 45, No. 12, Dec. 1997, pp. 1613-1621, ISSN 0090-6778
15. Speth, M.; Fechtel, S.; Fock, G. et al. (2001). Optimum receiver design for OFDM-based broadband transmission-part II: a case study. IEEE Transactions on Communications, Vol. 49, No. 4, Apr. 2001, pp. 571-578, ISSN 0090-6778
16. Troya, A.; Maharatna, K.; Krstic, M. et al. (2007). Efficient inner receiver design for OFDM-based WLAN systems: algorithm and architecture. IEEE Transactions on Wireless Communications, Vol. 6, No. 4, Apr. 2007, pp. 1374-1385, ISSN 1536-1276
17. Troya, A.; Maharatna, K. & Krstic, M. (2008). Low-power VLSI implementation of the inner receiver for OFDM-based WLAN systems. IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 55, No. 2, Mar. 2008, pp. 672-686, ISSN 1549-8328
18. Tsai, P.-Y.; Kang, H.-Y. & Chiueh, T.-D. (2005). Joint weighted least squares estimation of carrier frequency offset and timing offset for OFDM systems over multipath fading channel. IEEE Transactions on Vehicular Technology, Vol. 54, No. 1, Jan. 2005, pp. 211-224, ISSN 0018-9545
19. Tsai, P.-Y. & Chiueh, T.-D. (2007). A low-power multicarrier-CDMA downlink baseband receiver for future cellular communication systems. IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 54, No. 10, Oct. 2007, pp. 2229-2239, ISSN 1549-8328
20. Van de Beek, J. J.; Sandell, M. & Borjesson, P. O. (1997). ML estimation of time and frequency offset in OFDM systems. IEEE Transactions on Signal Processing, Vol. 45, No. 7, Jul. 1997, pp. 1800-1805, ISSN 1053-587X
