FPGA Implementation of Inverse Fast Fourier Transform in Orthogonal Frequency Division Multiplexing Systems



Introduction
In modern communication systems, Orthogonal Frequency Division Multiplexing (OFDM) is used to transmit at high data rates while avoiding Inter-Symbol Interference (ISI). The OFDM transmitter and receiver contain an Inverse Fast Fourier Transform (IFFT) and a Fast Fourier Transform (FFT), respectively. The IFFT block provides orthogonality between adjacent subcarriers, and this orthogonality makes the signal frame relatively robust against the fading caused by natural multipath environments. As a result, OFDM has become very popular in modern telecommunication systems. Despite these advantages, OFDM has one main drawback: a high Peak-to-Average Power Ratio (PAPR). Many approaches to reducing PAPR have been proposed. Some work in the time domain, such as Partial Transmit Sequence (PTS), and others work in the frequency domain, such as Dummy Sequence Insertion (DSI) and Selected Mapping (SLM) (Bauml et al., 1996; Muller et al., 1997). According to Baxley et al. (2007), SLM reduces PAPR with the least computational complexity and the fewest additional modifications to current technology; therefore, most recent research has focused on modified SLM-based methods. Most of these methods modify the OFDM transmitter so that multiple IFFT processors are required, which increases the number of additions and multiplications needed for implementation.
In this chapter, the OFDM system and its main block, the IFFT, are introduced. The IFFT block is implemented on an FPGA and the verification results are discussed. The Optimum Phase Sequence Insertion with Dummy Sequence Insertion (OPS-DSI) method, a recent PAPR reduction technique and a good example of an application of the IFFT processor, is also studied, and its FPGA implementation result is verified against simulation results. Fig. 1 shows how an OFDM signal is processed. The high-rate input data stream is split into narrowband channels with lower data rates, which are modulated using a general signal modulation (PSK, QAM) and then passed through the Inverse Fast Fourier Transform (IFFT), which provides orthogonality between adjacent sub-channels. After the IFFT, the last portion of the signal is copied to its head to provide immunity to Inter-Symbol Interference (ISI); this copy is the Cyclic Prefix (CP) shown in Fig. 1.
www.intechopen.com Fourier Transform - Signal Processing
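The transmitter chain described above can be sketched in a few lines of Python with NumPy. This is a minimal illustrative model, not the chapter's FPGA design: QPSK mapping, N = 256 subcarriers, and a 16-sample cyclic prefix are assumptions chosen only for the example, and NumPy's `ifft` already includes the 1/N scaling.

```python
import numpy as np

def qpsk_map(bits: np.ndarray) -> np.ndarray:
    """Gray-mapped QPSK: bit pair (b0, b1) -> ((1 - 2*b0) + 1j*(1 - 2*b1)) / sqrt(2)."""
    pairs = bits.reshape(-1, 2)
    return ((1 - 2 * pairs[:, 0]) + 1j * (1 - 2 * pairs[:, 1])) / np.sqrt(2)

def ofdm_symbol(bits: np.ndarray, n_subcarriers: int, cp_len: int) -> np.ndarray:
    """Map bits to QPSK (one symbol per subcarrier), apply the IFFT, prepend a cyclic prefix."""
    symbols = qpsk_map(bits)
    if symbols.size != n_subcarriers:
        raise ValueError("need exactly 2 bits per subcarrier")
    time_signal = np.fft.ifft(symbols)               # frequency domain -> time domain
    cyclic_prefix = time_signal[-cp_len:]            # last cp_len samples of the symbol...
    return np.concatenate([cyclic_prefix, time_signal])  # ...copied to its head

rng = np.random.default_rng(0)
N, CP = 256, 16                                      # illustrative sizes only
bits = rng.integers(0, 2, 2 * N)
tx = ofdm_symbol(bits, N, CP)

# Receiver side: discard the cyclic prefix and apply the FFT
rx_symbols = np.fft.fft(tx[CP:])
```

Because the cyclic prefix is a copy of the symbol tail, a receiver can discard it and apply the FFT, which in this noiseless case returns exactly the transmitted QPSK symbols.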

Fig. 2. Comparison between FDM and OFDM signals
Although the OFDM process is similar to Frequency Division Multiplexing (FDM), there are some differences. In FDM, the signal is divided into frequency bands, each carrying a single carrier, with guard intervals between them to avoid interference. This is not an issue in OFDM, since neighboring subcarriers are orthogonal to each other: their overlap does not create interference, and the bandwidth is used more efficiently, as shown in Fig. 2.
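The orthogonality that allows OFDM subcarriers to overlap can be checked numerically. The sketch below assumes N = 64 samples per symbol and forms the inner product of every pair of sampled subcarriers. Subcarriers at integer multiples of the subcarrier spacing give N on the diagonal and zero elsewhere, while a carrier shifted by half a spacing (an arbitrary offset chosen for illustration) is no longer orthogonal to its neighbours.

```python
import numpy as np

N = 64                      # assumed number of samples per OFDM symbol
n = np.arange(N)

def subcarrier(k: float) -> np.ndarray:
    """Sampled complex exponential for subcarrier index k over one symbol."""
    return np.exp(2j * np.pi * k * n / N)

# Gram matrix: gram[k, m] = sum_n s_k[n] * conj(s_m[n])
basis = np.array([subcarrier(k) for k in range(N)])
gram = basis @ basis.conj().T          # equals N * identity for orthogonal subcarriers

# A carrier offset by half a subcarrier spacing leaks into subcarrier 3
offset_carrier = subcarrier(3.5)
leakage = abs(np.vdot(subcarrier(3), offset_carrier))
```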

IFFT implementation on Field Programmable Gate Array (FPGA)
Field Programmable Gate Arrays (FPGAs) are configurable and re-programmable digital logic devices whose designs are usually written in a Hardware Description Language (HDL).

Fig. 3. Virtex-5 Pro development board
The Virtex-5 FXT Evaluation Kit, which contains the Xilinx Virtex-5 XC5VFX30T-FF665 FPGA chip, is used for the implementation. The datasheet of this FPGA is provided in Appendix D. The board also has 64 MB of DDR2 SDRAM and 16 MB of flash memory, along with a variety of I/O interfaces, which makes it suitable for a typical implementation.
As shown in Fig. 3, the JTAG-USB cable connects the JTAG connector of the FPGA board to a USB port of the PC for programming the FPGA. The other connector, the serial port, is used to send commands while the program is running.
This section introduces the fundamentals of the IFFT block prototype at the transmitter and the FFT block at the receiver. The basic equations of the FFT and the Inverse FFT (IFFT) are given by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, \dots, N-1 \qquad (1)$$

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi k n / N}, \quad n = 0, \dots, N-1 \qquad (2)$$

where N is the transform size, or the number of sample points in the data frame, and $j = \sqrt{-1}$. X(k) is the frequency output of the FFT at the k-th point, where k = 0, 1, …, N-1, and x(n) is the time sample at the n-th point, with n = 0, 1, …, N-1.
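As a reference for checking an FFT or IFFT core, the two definitions can be evaluated directly. This O(N^2) sketch is only a software model of the FFT and IFFT definitions above (exponent sign -j for the FFT; +j and a 1/N factor for the IFFT); it is not how the hardware computes the transform. It is compared against NumPy's `fft` and `ifft`, which follow the same conventions.

```python
import numpy as np

def dft(x) -> np.ndarray:
    """Direct evaluation of X(k) = sum_n x(n) exp(-j 2 pi k n / N)."""
    x = np.asarray(x, dtype=complex)
    N = x.size
    idx = np.arange(N)
    kernel = np.exp(-2j * np.pi * np.outer(idx, idx) / N)   # kernel[k, n]
    return kernel @ x

def idft(X) -> np.ndarray:
    """Direct evaluation of x(n) = (1/N) sum_k X(k) exp(+j 2 pi k n / N)."""
    X = np.asarray(X, dtype=complex)
    N = X.size
    idx = np.arange(N)
    kernel = np.exp(2j * np.pi * np.outer(idx, idx) / N)    # kernel[n, k]
    return kernel @ X / N
```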
Because of the symmetry of the exponential term $e^{-j 2\pi k n / N}$, it can be represented by the twiddle factor $W_N^{kn} = e^{-j 2\pi k n / N}$. Using twiddle factors speeds up the computation: they depend only on the number of points, so they need not be recalculated, and their values can be read from a precomputed table of twiddle factors. Because transform time is crucial in the FFT process, there is always a trade-off between core size and transform time. The Xilinx FFT core offers four architectures, Pipelined Streaming I/O, Radix-4 Burst I/O, Radix-2 Burst I/O, and Radix-2 Lite Burst I/O, with different features to meet different time and size requirements.
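The twiddle-factor table can be illustrated as follows. In this sketch the table layout and helper names are assumptions for illustration, not the Xilinx core's internal memory format: the N values $W_N^m = e^{-j2\pi m/N}$ are computed once, any $W_N^{kn}$ is fetched by reducing kn modulo N, and the half-period symmetry $W_N^{m+N/2} = -W_N^m$ shows why a hardware table can store fewer entries.

```python
import numpy as np

def twiddle_table(N: int) -> np.ndarray:
    """Precompute W_N^m = exp(-j 2 pi m / N) for m = 0..N-1, once per transform size."""
    return np.exp(-2j * np.pi * np.arange(N) / N)

def twiddle(table: np.ndarray, k: int, n: int) -> complex:
    """Look up W_N^{kn}; periodicity W_N^{kn} = W_N^{(kn) mod N} keeps the table N entries long."""
    return table[(k * n) % table.size]

N = 16
W = twiddle_table(N)
# Symmetry: the second half of the table is the negated first half
second_half_is_negated = np.allclose(W[N // 2:], -W[:N // 2])
```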
In the Pipelined Streaming I/O architecture, the data is processed continuously. The Radix-4 Burst I/O architecture uses an iterative approach in which data is loaded and processed separately; it is smaller than the pipelined solution but has a longer transform time. The third architecture, Radix-2 Burst I/O, uses the same iterative approach as Radix-4 but has a longer transform time. The Radix-2 architecture is based on Decimation In Frequency (DIF) and separates the input data into two halves, x(n) and x(n + N/2), for n = 0, 1, …, N/2 - 1. The FFT formula for the even- and odd-indexed outputs can then be written as two summations:

$$X(2r) = \sum_{n=0}^{N/2-1} \left[x(n) + x(n + N/2)\right] W_{N/2}^{rn}$$

$$X(2r+1) = \sum_{n=0}^{N/2-1} \left[x(n) - x(n + N/2)\right] W_N^{n}\, W_{N/2}^{rn}, \quad r = 0, 1, \dots, N/2 - 1$$

This two-point operation (the butterfly) is presented graphically in Fig. 4. Because the Radix-2 butterfly is smaller than the Radix-4 butterfly, this architecture is smaller than the Radix-4 solution. The fourth scheme, Radix-2 Lite Burst I/O, is based on the Radix-2 architecture but uses a time-multiplexed approach to the butterfly; the butterfly is even smaller, but the transform time is longer. In this project, the Radix-2 Burst I/O architecture shown in Fig. 5 is used because it requires fewer hardware resources than the Pipelined Streaming and Radix-4 architectures for prototyping the FFT (Yiqun et al., 2006). Point sizes from 8 to 65536 are supported, and this architecture uses a minimum of block memory. When the point size is 1024 or less, either block memory or distributed memory can be used for the data and phase-factor memories.
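Applying the DIF split recursively until two-point butterflies remain gives the radix-2 algorithm. The following Python sketch is a software reference model under that assumption, not the Xilinx Radix-2 Burst I/O core: each stage combines x(n) and x(n + N/2) into a sum (feeding the even outputs) and a twiddled difference (feeding the odd outputs), and the result emerges in bit-reversed order, which is then reordered. The IFFT reuses the same routine through the conjugation identity x = conj(FFT(conj(X)))/N.

```python
import numpy as np

def bit_reverse_indices(n_points: int) -> np.ndarray:
    """Bit-reversed permutation of 0..n_points-1 (n_points a power of two)."""
    bits = n_points.bit_length() - 1
    return np.array([int(format(i, f"0{bits}b")[::-1], 2) for i in range(n_points)])

def radix2_dif_fft(x) -> np.ndarray:
    """Iterative radix-2 decimation-in-frequency FFT with natural-order input."""
    a = np.asarray(x, dtype=complex).copy()
    n_points = a.size
    if n_points == 0 or n_points & (n_points - 1):
        raise ValueError("length must be a power of two")
    span = n_points // 2
    while span >= 1:
        w = np.exp(-2j * np.pi * np.arange(span) / (2 * span))  # W_{2*span}^n for this stage
        for start in range(0, n_points, 2 * span):
            top = a[start:start + span].copy()
            bottom = a[start + span:start + 2 * span].copy()
            a[start:start + span] = top + bottom                  # sums -> even-indexed outputs
            a[start + span:start + 2 * span] = (top - bottom) * w  # twiddled differences -> odd
        span //= 2
    # DIF leaves the spectrum in bit-reversed order; reorder to natural order
    return a[bit_reverse_indices(n_points)]

def radix2_dif_ifft(X) -> np.ndarray:
    """IFFT via the conjugation identity x = conj(FFT(conj(X))) / N."""
    X = np.asarray(X, dtype=complex)
    return np.conj(radix2_dif_fft(np.conj(X))) / X.size
```

Returning the raw array `a` without the final reordering would correspond to the DIF core option with bit-reversed output order.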
To obtain an accurate IFFT block, the target FPGA model must be specified in the AccelDSP tool window. AccelDSP is a synthesis tool that transforms a MATLAB design into a hardware module written in VHDL or Verilog. It provides an integrated environment with other design tools such as MATLAB and the Xilinx ISE tools.
The GUI includes a browser that shows the design hierarchy, the M-files, and the generated HDL source files. In this project, AccelDSP is used to generate the IFFT and FFT blocks. The design objects in the project explorer window are used to guide the synthesis process, and several parameters must be defined there.
Fig. 6. System Generator block diagram of IFFT block and input output blocks
One of the important parameters in the design of the IFFT block is the algorithm used to implement it, which was discussed above; here, a Radix architecture is selected. Another parameter is the IFFT length, which denotes the number of points in the IFFT. There is also an option for the I/O format: with the data I/O format option in the AccelDSP GUI, the input and output data formats can be initialized. Single buffering does not perform any operations in parallel, whereas double buffering overlaps the loading and unloading of data frames. Natural Order I/O only applies to Single Fly architectures. A decimation algorithm naturally has its inputs or outputs in digit/bit-reversed order: DIF has natural-order input and digit/bit-reversed output, and DIT has digit/bit-reversed input and natural-order output. Setting Natural Order I/O to 'Yes' forces natural-order input and output regardless of the decimation type. The input data can be set to complex or real. The Decimation Algorithm parameter selects the Decimation In Time (DIT) or Decimation In Frequency (DIF) algorithm. Scaling applies the 1/(IFFT length) factor. The Complex Multiplier option selects among different complex multiplier architectures. Round Mode sets the quantizer rounding mode for all data path quantizers. If Floor is selected, values are rounded toward negative infinity, that is, to the next lower representable number; for example, -1.8 becomes -2 and 1.8 becomes 1. There is also a parameter for the input data width, which is the number of bits used to represent the input. Input Data Fract Width is the number of bits used to represent the fractional part of the input word. Twiddle Width is the number of bits used to represent the twiddle factors, and the twiddle fractional width is the number of bits used to represent the fractional part of the phase factors.
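The effect of the rounding mode can be reproduced with a small software quantizer. The function below is a hypothetical NumPy model of a signed two's-complement quantizer with saturation, written for illustration only; it is not the AccelDSP API. In floor mode values are rounded toward minus infinity, and a round-to-nearest mode is included for comparison.

```python
import numpy as np

def quantize(x, word_len: int, frac_len: int, mode: str = "floor") -> np.ndarray:
    """Hypothetical fixed-point quantizer: signed two's complement, saturating.

    mode="floor":   round toward minus infinity (e.g. -1.8 -> -2, 1.8 -> 1 with frac_len=0)
    mode="nearest": round to the closest step (NumPy rounds exact ties to the even step)
    """
    step = 2.0 ** -frac_len
    scaled = np.asarray(x, dtype=float) / step
    if mode == "floor":
        q = np.floor(scaled)
    elif mode == "nearest":
        q = np.round(scaled)
    else:
        raise ValueError(f"unknown mode {mode!r}")
    lo, hi = -(2 ** (word_len - 1)), 2 ** (word_len - 1) - 1  # representable integer codes
    return np.clip(q, lo, hi) * step
```

Floor mode is cheap in hardware because it amounts to discarding the low-order bits, but it adds a small negative bias; round-to-nearest removes most of that bias at the cost of extra logic.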
The range of the phase factors is [-1, 1], and therefore 2 bits are always needed for the integer part of the phase factors. Consequently, the twiddle fractional width is always twiddle width - 2. The data width can also be modified for the IFFT output, and the output data width depends on the Scaling option: if Scaling is set to 'Yes', output data width = input data width; if Scaling is set to 'No', output data width = input data width + log2(IFFT length) + 1. The output data fractional width is the greater of the input data fractional width and the twiddle fractional width.
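The word-width rules above can be written as simple functions, which is convenient when checking a generated block's port widths. The function names are hypothetical; the rules themselves are the ones stated in the text (two integer bits for phase factors, log2(IFFT length) + 1 bits of growth when scaling is off). For example, a 16-bit input to an unscaled 256-point IFFT needs a 16 + 8 + 1 = 25-bit output.

```python
import math

def twiddle_frac_width(twiddle_width: int) -> int:
    """Two integer bits are reserved because phase factors span -1 to 1."""
    return twiddle_width - 2

def ifft_output_width(input_width: int, ifft_length: int, scaled: bool) -> int:
    """Output word width: unchanged when scaled, else grows by log2(length) + 1 bits."""
    if scaled:
        return input_width
    stages = int(math.log2(ifft_length))
    if 2 ** stages != ifft_length:
        raise ValueError("IFFT length must be a power of two")
    return input_width + stages + 1

def output_frac_width(input_frac_width: int, twiddle_frac: int) -> int:
    """Output fractional width is the greater of the two fractional widths."""
    return max(input_frac_width, twiddle_frac)
```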
Another important setting in the AccelDSP tool is the design flow. For this particular application, the flow should be set to System Generator. At the end of the design, a library containing the IFFT block with the desired name is created, and the library and the IFFT block are accessible from the MATLAB Simulink environment.
When the IFFT block is inserted in the Simulink window, some other components are required to complete the model, as shown in Fig. 6. These components are the input data, a signal synchronization line, an output gate, and a complex-to-real-imaginary converter. At this point, the model should run successfully. As shown in Fig. 6, the central block with the Xilinx logo in the background is the IFFT model generated by the AccelDSP tool.
Then, with the System Generator block, the NGC netlist file can be generated. The ISE software can analyze this file and estimate the hardware resource consumption, which is presented as a table in ISE.

Results and discussions
When the IFFT block is designed in the AccelDSP tool, a fixed-point model of the design is generated, and AccelDSP automatically runs a MATLAB fixed-point simulation. Verification can then be done visually by comparing the fixed-point plot with the floating-point plot to confirm that they match. Verification can also be performed with real and imaginary constellation graphs, which the AccelDSP tool generates for the designed IFFT. Fig. 9 presents the constellations for the IFFT with N=256 and Radix-2. In Fig. 9, the real part of the fixed-point model, shown in (a), agrees with the real part of the floating-point model, shown in (c). The imaginary part of the fixed-point model, shown in (b), agrees with the imaginary part of the floating-point model, shown in (d).
As a result of these verifications, the error between the fixed-point model and the floating-point model is negligible, as presented in Fig. 10. Other important parameters in designing hardware modules are hardware resource consumption and power consumption, which can be estimated with the Xilinx ISE tool. First, the NGC netlist file is generated using the System Generator block in MATLAB Simulink; then the hardware consumption can be measured in ISE. In the ISE GUI, the saved project is opened, and clicking the Implement Top Module button generates the hardware resource consumption table presented in Table 1.
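The fixed-point versus floating-point comparison can be approximated in software. This sketch is a simplified stand-in for the AccelDSP fixed-point simulation, under stated assumptions: it quantizes only the IFFT input and output (floor mode, 14 fractional bits, chosen arbitrarily), ignores internal word growth and twiddle quantization, and reports the maximum error and the signal-to-quantization-noise ratio (SQNR) against a floating-point IFFT of N = 256 QPSK symbols.

```python
import numpy as np

def quantize(x: np.ndarray, frac_bits: int) -> np.ndarray:
    """Floor real and imaginary parts to a multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return (np.floor(x.real / step) + 1j * np.floor(x.imag / step)) * step

FRAC_BITS = 14                      # assumed fractional width, for illustration only
N = 256
rng = np.random.default_rng(1)
symbols = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)

floating = np.fft.ifft(symbols)                                          # reference model
fixed = quantize(np.fft.ifft(quantize(symbols, FRAC_BITS)), FRAC_BITS)  # crude fixed-point model
error = fixed - floating

max_error = np.max(np.abs(error))
sqnr_db = 10 * np.log10(np.mean(np.abs(floating) ** 2) / np.mean(np.abs(error) ** 2))
```

A real fixed-point core also quantizes after every butterfly stage, so its error is larger than this input/output-only estimate, but the same two metrics can be used to judge whether the error is negligible.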
The main considerations are the percentages of DSP48 and I/O utilization. As shown in Table 1, the DSP48 and I/O utilization of the IFFT are 6% and 16%, respectively.
The power consumed by the implemented IFFT processor is estimated after the place-and-route process by XPower Analyzer, a Xilinx ISE tool that generates a power consumption report. The processor consumes a total power of about 630 mW and a dynamic power of 10 mW. Table 2 presents the details of the power report.
As mentioned before, one of the main considerations in designing OFDM systems is computational complexity, which can be defined as the number of real additions and multiplications required for the hardware implementation of a system.
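According to Baxley et al. (2007), each IFFT requires (N/2) log N + N/2 complex multiplications and N log N complex additions. A complex multiplication takes four real multiplications and two real additions, and a complex addition takes two real additions. The total number of real additions required for an IFFT, A_IFFT, is given by Eq. (1), and the total number of real multiplications, M_IFFT, by Eq. (2):

$$A_{IFFT} = 2\left(\frac{N}{2}\log N + \frac{N}{2}\right) + 2N\log N = 3N\log N + N \qquad (1)$$

$$M_{IFFT} = 4\left(\frac{N}{2}\log N + \frac{N}{2}\right) = 2N\log N + 2N \qquad (2)$$

Hence, the total number of multiplications and additions of one IFFT is given by Eq. (3):

$$T_{IFFT} = A_{IFFT} + M_{IFFT} = 5N\log N + 3N \qquad (3)$$

where N is the number of subcarriers. When N=256, T_IFFT = 3850; for N=512 and 1024, T_IFFT is 8471 and 18484, respectively. These values correspond to taking log as the base-10 logarithm; with the base-2 logarithm conventionally used for radix-2 FFT operation counts, Eq. (3) gives 11008, 24576, and 54272, respectively. In either case, the number of additions and multiplications required to implement the IFFT increases with the IFFT length.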
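Eqs. (1)-(3) can be evaluated with a short script. The sketch below counts real operations from the complex-operation counts quoted above, and its `log` argument makes the logarithm base explicit because the choice of base changes T_IFFT considerably. Passing `math.log10` reproduces the values 3850, 8471, and 18484 quoted for N = 256, 512, and 1024, whereas `math.log2`, the base normally used for radix-2 FFT operation counts, gives 11008, 24576, and 54272.

```python
import math

def ifft_complexity(N: int, log=math.log2) -> tuple[float, float, float]:
    """Real additions, real multiplications, and their total for one N-point IFFT.

    Complex multiplications: (N/2) log N + N/2, each 4 real mults + 2 real adds.
    Complex additions:       N log N,           each 2 real adds.
    """
    cmul = N / 2 * log(N) + N / 2
    cadd = N * log(N)
    adds = 2 * cmul + 2 * cadd        # = 3 N log N + N,  Eq. (1)
    mults = 4 * cmul                  # = 2 N log N + 2N, Eq. (2)
    return adds, mults, adds + mults  # total = 5 N log N + 3N, Eq. (3)

for N in (256, 512, 1024):
    print(N, int(ifft_complexity(N, math.log10)[2]), int(ifft_complexity(N)[2]))
# prints: 256 3850 11008 / 512 8471 24576 / 1024 18484 54272 (one line each)
```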
The design process of the IFFT for higher numbers of subcarriers (N=512 and 1024) and for Radix-4 is very similar to that of the IFFT with Radix-2 and N=256. The hardware resource consumption for N=512 is presented in Table 3, and its power consumption in Table 4.
Table 3. Hardware resource consumption of IFFT block with Radix-2 and N=512

As shown in Table 3, the DSP48 and I/O utilization of this IFFT are 4% and 16%, respectively.
According to the ISE estimation, this IFFT processor consumes a total power of about 631 mW and a dynamic power of 11 mW. Table 4 presents the details of this power report.
The hardware resource consumption of the IFFT processor with Radix-2 and N=1024 is presented in Table 5. The DSP48 and I/O utilization of this IFFT block are 6% and 16%, respectively.
This IFFT processor consumes a total power of about 630 mW and a dynamic power of 10 mW, as estimated by the ISE tools. Table 6 presents the details of the power report. Comparing Table 7 with Table 1 shows that the I/O utilization is unchanged, whereas the consumption of DSP48 slices increases by about 12%.

The IFFT processor with Radix-4 and N=256 consumes a total power of 0.65099 W and a dynamic power of 0.02939 W, as estimated by the ISE tools. Table 8 presents the details of the power report. Comparing Table 8 with Table 2 shows that the power consumption of Radix-4 is higher than that of Radix-2 by about 0.02 W.
Table 9. Hardware resource consumption of IFFT block with Radix-4 and N=1024

The hardware resource consumption of the IFFT processor with Radix-4 and N=1024 is presented in Table 9. The DSP48 and I/O utilization of this IFFT block are 18% and 16%, respectively. This processor consumes a total power of 0.64698 W and a dynamic power of 0.02573 W, as estimated by the ISE tools. Table 10 presents the details of the power report.

Recent application of IFFT processor
The simple structure of an OFDM symbol consisting of 4 sinusoids is shown in Fig. 11. The OFDM signal is the sum of multiple sinusoidal signals. Constructive interference produces high peaks, as shown in Fig. 3, while destructive interference can drive the signal close to zero, so the average power remains low relative to the peaks. Hence, the ratio between peak and average power is high (Higashinaka et al., 2009). High Peak-to-Average Power Ratio (PAPR) is a major design challenge in OFDM systems (Krongold et al., 2003; Bauml et al., 1996; Wei et al., 2006).
The reason is that when an OFDM signal with high PAPR enters the amplification stage, the Power Amplifier (PA), which is usually peak-power limited, is forced to operate in its nonlinear region (Nieto, 2005). This has two effects: out-of-band distortion, or spectral spreading, which can be measured by the Adjacent Channel Power Ratio (ACPR), and in-band distortion, which can be measured by the Error Vector Magnitude (EVM).
Some PAs (Class AB) have a wide linear dynamic range; however, they are generally expensive, consume more power, and are less efficient (Cooper, 2008; Varahram et al., 2009; Sharma et al., 2010). Hence, to obtain a high-efficiency OFDM transmitter and extended battery life, the PAPR must be reduced and the linearity of the PA maximized.

The PAPR is calculated as the ratio of the maximum power to the average power of the signal and can be defined by:

$$\mathrm{PAPR} = \frac{\max_{0 \le t < T} |s(t)|^2}{E\left[|s(t)|^2\right]}$$

where s(t) is the N-carrier OFDM envelope given by (Ochiai, 2003):

$$s(t) = \frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} A_i\, e^{j 2\pi i t / T}, \quad 0 \le t < T$$

where A = (A_0, A_1, ..., A_{N-1}) is a modulated data sequence of length N in the time interval (0, T), A_i is a symbol from a signal constellation, and T is the OFDM symbol duration.
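The PAPR definition can be evaluated numerically by sampling the envelope s(t). In this sketch the continuous-time maximum is approximated by oversampling, implemented as zero-padding of the spectrum by an assumed factor of 4; any constant scaling of s(t), including NumPy's 1/N factor in `ifft`, cancels in the ratio. For the four-sinusoid example with equal amplitudes and zero phases, all carriers add in phase at t = 0, so the PAPR equals N = 4, about 6.02 dB.

```python
import numpy as np

def papr_db(symbols: np.ndarray, oversample: int = 4) -> float:
    """PAPR (dB) of the OFDM envelope built from data sequence A = symbols.

    Zero-padding the spectrum to oversample * N points samples s(t) at
    t = n T / (oversample * N), approximating the continuous-time peak.
    """
    N = symbols.size
    spectrum = np.concatenate([symbols, np.zeros((oversample - 1) * N, dtype=complex)])
    s = np.fft.ifft(spectrum)
    power = np.abs(s) ** 2
    return float(10 * np.log10(power.max() / power.mean()))

four_tone_papr = papr_db(np.ones(4, dtype=complex))   # about 6.02 dB, i.e. 10*log10(4)
```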
The performance of a PAPR reduction technique is usually measured with a Complementary Cumulative Distribution Function (CCDF) graph. The CCDF denotes the probability that the PAPR of a data symbol exceeds a predefined threshold, as expressed by (Han et al., 2005; Heo et al., 2009):

$$\Pr(\mathrm{PAPR} > z) = 1 - \left(1 - e^{-z}\right)^{N}$$

where N is the number of subcarriers and z is the threshold (on a linear scale). This probability function is plotted to determine the ability of an algorithm to reduce the PAPR of the OFDM signal, and the PAPR is usually compared with that of the unmodified OFDM signal at 0.01% CCDF, shown as 10^-4 on the CCDF axis of the graphs. A typical OFDM signal without any PAPR reduction technique has a PAPR of about 8dB to 13dB at 10^-4 CCDF (Raab et al., 2011). Therefore, when a PAPR reduction technique is applied to the OFDM system, it is expected to reduce this PAPR of up to 13dB to a lower value. According to the IEEE standard (IEEE STD 802.16e™-2005), the reduction should be at least 3dB.
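A CCDF curve can be estimated by Monte Carlo simulation and compared with the closed-form approximation 1 - (1 - e^{-z})^N, which treats the N Nyquist-rate samples as independent. The sketch below assumes QPSK data, N = 256 subcarriers, and 2000 random symbols; these values are illustrative only, and without oversampling the estimated PAPR slightly underestimates the true continuous-time peak.

```python
import numpy as np

def ccdf_theory(z_db, N: int) -> np.ndarray:
    """Pr(PAPR > z) = 1 - (1 - exp(-z))**N, with the threshold converted from dB to linear."""
    z = 10 ** (np.asarray(z_db, dtype=float) / 10)
    return 1 - (1 - np.exp(-z)) ** N

def ccdf_empirical(papr_values_db, z_db) -> np.ndarray:
    """Fraction of simulated symbols whose PAPR exceeds each threshold."""
    papr_values_db = np.asarray(papr_values_db)
    return np.array([(papr_values_db > z).mean() for z in np.atleast_1d(z_db)])

rng = np.random.default_rng(2)
N, trials = 256, 2000                       # illustrative sizes
qpsk = (rng.choice([-1, 1], (trials, N)) + 1j * rng.choice([-1, 1], (trials, N))) / np.sqrt(2)
x = np.fft.ifft(qpsk, axis=1)               # Nyquist-rate samples, as the formula assumes
power = np.abs(x) ** 2
papr = 10 * np.log10(power.max(axis=1) / power.mean(axis=1))

thresholds = np.arange(6.0, 12.5, 0.5)      # dB
sim = ccdf_empirical(papr, thresholds)
theory = ccdf_theory(thresholds, N)
```

Plotting `sim` and `theory` on a logarithmic probability axis against `thresholds` gives the usual CCDF graph; reading the threshold where a curve crosses 10^-4 gives the PAPR figure used for comparisons.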
Several techniques have been developed to reduce the PAPR of the OFDM signal. They fall into two main categories: distortion-based methods, which introduce out-of-band distortion, and distortionless methods, which do not. The first category includes Clipping, Windowing (Van et al., 1998), Envelope Scaling (Foomooljareon et al., 2002), Random Phase Updating (Nikookar et al., 2002), Peak Reduction Carrier (Tan et al., 2003), Companding (Hao et al., 2006; Hao et al., 2010; Cao et al., 2007; Chang et al., 2010; Hao et al., 2008; Kim et al., 2008), and other modified versions of these methods.
Clipping is a simple PAPR reduction technique in which the signal is clipped to a desired level at the transmitter while the phase information remains unchanged. Clipping introduces distortion into the system; therefore, it is normally combined with filtering, at the expense of additional IFFT and FFT blocks, which increase the complexity of the system. In the windowing technique, a large signal peak is multiplied by a certain window. Envelope scaling reduces PAPR by scaling the input envelope of some subcarriers before they are sent to the IFFT. In the random phase updating algorithm, random phases are generated and assigned to each carrier, and the updating process continues until the peak value of the OFDM signal falls below the threshold. The peak reduction carrier technique uses a higher-order modulation scheme to represent a lower-order modulation symbol (Vijayarangan et al., 2009). Companding compresses and expands the OFDM signal in order to reduce PAPR. Speech processing is the main application of companding, since speech signals have large peaks that occur only infrequently.
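Clipping with the phase left unchanged can be sketched as follows. The clipping level is set relative to the RMS amplitude through a clipping ratio in dB (3 dB here, an assumed value): samples whose magnitude exceeds the level are scaled down to it, and all other samples pass through untouched. The filtering stage that would normally follow, with its extra IFFT and FFT blocks, is not modelled.

```python
import numpy as np

def clip_envelope(x: np.ndarray, clip_ratio_db: float) -> np.ndarray:
    """Limit |x| to A = CR * rms(x), with CR given in dB, keeping each sample's phase."""
    x = np.asarray(x, dtype=complex)
    rms = np.sqrt(np.mean(np.abs(x) ** 2))
    a_max = rms * 10 ** (clip_ratio_db / 20)          # clipping level
    magnitude = np.abs(x)
    # Real, non-negative gain <= 1: only samples above a_max are scaled down
    gain = np.minimum(1.0, a_max / np.maximum(magnitude, np.finfo(float).tiny))
    return x * gain

def papr_db(x: np.ndarray) -> float:
    power = np.abs(x) ** 2
    return float(10 * np.log10(power.max() / power.mean()))

rng = np.random.default_rng(9)
qpsk = (rng.choice([-1, 1], 256) + 1j * rng.choice([-1, 1], 256)) / np.sqrt(2)
x = np.fft.ifft(qpsk)
y = clip_envelope(x, 3.0)
print(f"PAPR before: {papr_db(x):.2f} dB, after: {papr_db(y):.2f} dB")
```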
Most recent research concentrates on modified SLM and PTS methods (Ghassemi et al., 2010; Naeiny et al., 2011; Kim et al., 2006; Jeon et al., 2011; Hong et al., 2010). According to this review, most of the modified methods reduce PAPR at the expense of transmitter complexity or reduced spectral efficiency. Improving the performance of SLM-based techniques requires a high number of IFFT processors, which leads to high complexity. PTS-based methods also suffer from complexity, from another aspect: improving them requires extra additions and multiplications to find the optimum value. Hence, there is good scope for designing a new method that overcomes these drawbacks and enhances PAPR performance.
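To make the complexity argument concrete, the following sketch implements conventional SLM with M candidate signals. It is an illustrative model under stated assumptions, not any specific cited method: phase sequences are drawn at random from {+1, -1, +j, -j}, candidate 0 is the unmodified signal, and PAPR is measured at the Nyquist rate. Each candidate requires its own N-point IFFT, which is why M IFFT processors (or M passes through one) are needed, and the index of the selected candidate must be sent to the receiver as side information.

```python
import numpy as np

def papr_db(x: np.ndarray) -> float:
    """PAPR in dB of Nyquist-rate samples (no oversampling)."""
    power = np.abs(x) ** 2
    return float(10 * np.log10(power.max() / power.mean()))

def slm(symbols: np.ndarray, M: int, rng: np.random.Generator):
    """Conventional selected mapping: M candidates, one IFFT each, keep the lowest PAPR."""
    N = symbols.size
    phases = rng.choice(np.array([1, -1, 1j, -1j]), size=(M, N))  # assumed phase alphabet
    phases[0] = 1                                                  # candidate 0 = original
    candidates = np.fft.ifft(symbols * phases, axis=1)             # M separate N-point IFFTs
    paprs = [papr_db(c) for c in candidates]
    best = int(np.argmin(paprs))                                   # side information
    return candidates[best], best, paprs

rng = np.random.default_rng(10)
N = 256
symbols = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
tx_slm, best_idx, paprs = slm(symbols, M=8, rng=rng)
```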
Here, one of the recently proposed methods for reducing the PAPR of the OFDM signal is presented (Mohammady et al., 2011). This method is named Optimum Phase Sequence insertion with Dummy Sequence Insertion (OPS-DSI). As shown in the block diagram of the OPS-DSI scheme in Fig. 13, the OPS-DSI algorithm has two loops. If the PAPR is not less than the threshold, Loop_a is performed for a specific number of iterations and the PAPR is compared with the threshold. If the PAPR is less than the threshold, the signal is transmitted without running the second loop; otherwise, the second loop, Loop_b, is executed for a predefined number of iterations and the PAPR is calculated in the same way. In each iteration of Loop_a, a new random dummy sequence is generated and inserted into the signal, while the phase sequence stays the same as in the last iteration. The number of iterations is specified based on the PAPR reduction requirement and the data rate, and the PAPR threshold is set according to each wireless broadband standard.
When Loop_b is running, Loop_a is repeated: in each Loop_b iteration, a new random phase sequence is selected and multiplied with the signal, and then new random dummy sequences are inserted into the signal. When the threshold condition is met, the signal is transmitted; if the iterations of both loops are completed and the PAPR is still not less than the threshold, the signal with the minimum PAPR among the candidates is transmitted. The CCDF of a typical OFDM system with the OPS-DSI scheme is compared with those of the conventional SLM (C-SLM) and DSI methods. As shown in Fig. 13 (a), the PAPR of the original OFDM signal is about 11.8dB at 10^-4 CCDF (0.01% CCDF). When the DSI method is applied to this system, the PAPR is reduced to about 9.9dB, as shown in Fig. 13 (b), which means that the PAPR performance is improved by about 1.9dB compared with the original OFDM signal.
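The two-loop control flow described above can be expressed as a short sketch. This is not the implementation of Mohammady et al. (2011): placing the dummy sequence on extra subcarriers at the end of the frame, the BPSK dummy and ±1 phase alphabets, the threshold, and the iteration counts are all hypothetical choices made for illustration. What the sketch does show is that only one IFFT is evaluated per candidate, sequentially, so a single IFFT processor suffices.

```python
import numpy as np

def papr_db(x: np.ndarray) -> float:
    power = np.abs(x) ** 2
    return float(10 * np.log10(power.max() / power.mean()))

def ops_dsi_sketch(data, n_dummy, threshold_db, loop_a_iters, loop_b_iters, rng):
    """Two-loop search following the OPS-DSI description (hypothetical parameters).

    Loop_a: keep the phase sequence, draw a new random dummy sequence each iteration.
    Loop_b: draw a new random phase sequence, then rerun Loop_a.
    """
    phase = np.ones(data.size, dtype=complex)         # first pass: unmodified phases
    best_x, best_papr = None, np.inf
    for _ in range(loop_b_iters + 1):                 # pass 0 is Loop_a alone
        for _ in range(loop_a_iters):
            dummy = rng.choice([-1.0, 1.0], n_dummy)  # random BPSK dummy sequence
            frame = np.concatenate([data * phase, dummy])
            x = np.fft.ifft(frame)                    # one IFFT per candidate
            p = papr_db(x)
            if p < best_papr:
                best_x, best_papr = x, p
            if p < threshold_db:                      # threshold met: transmit now
                return x, p
        phase = rng.choice(np.array([1, -1], dtype=complex), data.size)  # Loop_b step
    return best_x, best_papr                          # else send the minimum-PAPR candidate

rng = np.random.default_rng(3)
data = (rng.choice([-1, 1], 240) + 1j * rng.choice([-1, 1], 240)) / np.sqrt(2)
tx_ops, tx_papr = ops_dsi_sketch(data, n_dummy=16, threshold_db=7.0,
                                 loop_a_iters=4, loop_b_iters=3, rng=rng)
```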
When the C-SLM method with 8 IFFTs (number of candidate signals M=8) is applied to the OFDM signal, a PAPR of 8.5dB is achieved, as shown in Fig. 13 (c). In this case, the PAPR is improved by about 3.4dB.
As shown in Fig. 13 (d), when the OPS-DSI scheme is applied to the OFDM signal, a PAPR of about 7.7dB is achieved; in other words, the PAPR is improved by about 4.2dB compared with the original signal. The PAPR performance of the implemented OPS-DSI scheme is shown in Fig. 13 (e). The implemented system shows slightly degraded PAPR performance, which is due to the limited input bit resolution of the hardware. The ISE tool reports a total data path delay of 10.937 ns for the OPS-DSI design. This delay is within the accepted range according to the Shannon-Hartley theorem (Hartley, 1928).
Compared with recent works (Jeon et al., 2011; Naeiny et al., 2011; Hong et al., 2010; Kim et al., 2006), the fact that a 4.2dB reduction is achieved with only one IFFT and the lowest complexity makes OPS-DSI a very attractive method, suitable for FPGA implementation.
In some papers, PAPR performance is studied using time-domain symbols. Fig. 14 presents 1024 samples of the output signal with and without PAPR reduction. The blue samples are the output signal without PAPR reduction, and the red samples are the output signal with PAPR reduction applied. The OFDM signal peaks are suppressed, although the reduction appears insignificant. The reason is that OPS-DSI is a probabilistic method and the reduction is based on signal modification; therefore, a time-domain graph is not an accurate tool for studying it.

Conclusion
In this chapter, the OFDM transmission system is studied and its main component, the IFFT processor, is introduced. The hardware implementation of this block is performed and the results are compared with simulation. A very important application of the IFFT, the OPS-DSI PAPR reduction scheme, is reviewed. Different types of IFFT algorithms are tested, and their hardware resource consumption and power consumption are estimated using ISE tools. The complexity of implementing one IFFT block in the FPGA is computed mathematically.