Open access peer-reviewed chapter

Flexible Baseband Modulator Architecture for Multi-Waveform 5G Communications

Written By

Mário Lopes Ferreira and João Canas Ferreira

Submitted: 11 June 2019 Reviewed: 21 January 2020 Published: 28 February 2020

DOI: 10.5772/intechopen.91297

From the Edited Volume

Field Programmable Gate Arrays (FPGAs) II

Edited by George Dekoulis

Abstract

The fifth-generation (5G) revolution represents more than a mere performance enhancement of previous generations: it will deeply transform the way humans and/or machines interact, enabling a heterogeneous expansion in the number of use cases and services. Crucial to the realization of this revolution is the design of hardware components characterized by high degrees of flexibility, versatility and resource/power efficiency. This chapter proposes a field-programmable gate array (FPGA)-oriented baseband processing architecture suitable for fast-changing communication environments such as 4G/5G waveform coexistence, noncontiguous carrier aggregation (CA) or centralized cloud radio access network (C-RAN) processing. The proposed architecture supports three 5G waveform candidates and is shown to be upgradable, resource-efficient and cost-effective. Through hardware virtualization, enabled by dynamic partial reconfiguration (DPR), the design space covered by our architecture exceeds the hardware resources available on the Zynq xc7z020 device. Moreover, dynamic frequency scaling (DFS) enables the runtime adjustment of processing throughput and dynamic power reductions of up to 88%. The combined resource overhead for DPR and DFS is very low, and the reconfiguration latency stays two orders of magnitude below the control plane latency requirements proposed for 5G communications.

Keywords

  • FPGA
  • reconfigurable computing
  • dynamic partial reconfiguration
  • baseband processing
  • OFDM
  • FBMC
  • UFMC
  • waveform coexistence
  • carrier aggregation

1. Introduction

The fifth-generation (5G) cellular network technology will have a tremendous impact on society by optimizing existing telecommunication services and applications and enabling solutions in new application fields, such as transportation, education or medical science. The scope of the anticipated changes is clear from the three main types of 5G use cases and services defined by the International Telecommunication Union (ITU): enhanced mobile broadband (eMBB), ultra-reliable and low-latency communications (URLLC) and massive machine-type communications (mMTC) [1]. Therefore, the handling of the physical layer (PHY) for 5G systems will be far more complex than in the current generation.

Orthogonal frequency-division multiplexing (OFDM) is the preferred waveform in 4G standards, and 3GPP Release 15 [2] recently defined it as the multiple access scheme for the 5G New Radio (NR) PHY, especially due to its high frequency selectivity, flexibility, efficient hardware implementation by FFT/IFFT modules, and good Multiple-Input Multiple-Output (MIMO) compatibility [3]. However, the spectrum of OFDM symbols presents large side lobes that cause high out-of-band (OOB) emissions. Moreover, the interference between adjacent time-domain symbols is mitigated by adding redundancy to each symbol (the cyclic prefix), which reduces spectral efficiency. Together, these characteristics may make 5G requirements hard to achieve in certain communication scenarios, which has led to the proposal of other waveforms [4]. The most popular alternatives are filter bank multicarrier modulation (FBMC), Universal Filtered Multicarrier modulation (UFMC), Filtered OFDM (f-OFDM) and generalized frequency-division multiplexing (GFDM). Different waveforms imply different baseband processing operations. Especially for sub-6 GHz spectrum bands, the coexistence of multiple numerologies and waveforms and the close interworking between 5G and current systems are likely to occur in the near future [5].

The expansion of wireless communication caused by 5G systems and services raises concerns about the inefficient use of the electromagnetic spectrum. In addition to expanding spectrum utilization to frequency bands above 6 GHz, a more efficient utilization of heavily used bands must be achieved. To tackle this issue, future baseband processor designs should support dynamic spectrum access (DSA) [6] and carrier aggregation (CA) schemes.

In summary, baseband processing infrastructures for 5G systems must be (1) flexible, to adapt their operation for different communication setups (i.e. waveforms and their parameterization); (2) scalable, to tune performance and capacity according to communication demands; (3) resource and power efficient, for cost-effectiveness and reduced environmental impact [7]; (4) forward compatible, to easily integrate the support for new services and requirements, extending system lifetime. Modern field-programmable gate arrays (FPGAs) represent an implementation platform that favors the design of systems with the characteristics mentioned. The intrinsic FPGA reconfigurability can be enhanced by means of dynamic partial reconfiguration (DPR), i.e. by reconfiguring modules of the design without halting the system. The hardware virtualization allowed by DPR enhances system flexibility, feature richness, upgradability and cost-effectiveness [8]. This chapter discusses how DPR and dynamic frequency scaling (DFS) can be combined to produce a dynamically reconfigurable baseband processing architecture for multimode, multi-waveform coexistence and dynamic spectrum aggregation.

After a brief summary of the state of the art in Section 2, the implementation of datapaths for baseband processing of three waveforms (OFDM, FBMC and UFMC) is described in Section 3. The implementation of a dynamically reconfigurable baseband modulator that combines these datapaths is described in Section 4, together with a discussion of the results. Some final remarks are presented in Section 5.

2. Summary of the state of the art

Application of DPR to baseband processing in wireless communications started with the adoption of small-scale and relatively simple functional elements such as FIR filters, constellation mappers or channel encoders [9, 10]. Possibly the first multi-waveform flexible PHY architecture was proposed by He et al. [11]. It is a software-defined radio (SDR) architecture implemented on a Xilinx Virtex-5 FPGA, which combines two reconfiguration techniques: (a) DPR to dynamically change the baseband processing mode of operation (e.g. FFT size, modulation scheme and CP length) and (b) DFS to adapt the clock frequency of the digital up-converter and the baseband processor. The design supports two waveforms (OFDM and WCDMA) and several 3G/4G standards and modes of operation. Compared with a static multimode design, the DPR-based design achieves a reduction in the number of used slices, DSP blocks (DSPs) and block RAMs (BRAMs). However, the comparison is not entirely fair, as the static design uses parallel and independent processing chains for each standard and ignores the potential savings from reusing common modules.

CoPR, an automated framework for DPR-based adaptive systems on a Xilinx Zynq device, is described in [12]. An illustrative case study is presented, where a reconfigurable multistandard baseband OFDM transmitter is designed. The design supports three standards (IEEE 802.11, IEEE 802.16 and IEEE 802.22) and contains two reconfigurable partitions (RPs): one to implement the digital modulation scheme and the other for the OFDM processing datapath. The paper only reports reconfiguration time results and does not provide figures for power consumption or the amount of resources of each RP.

An ARM-FPGA-based platform is also used in [13]. Several processes run on the ARM processor and retrieve communication environment information, which is employed by a configuration controller to reconfigure an OFDM baseband processing modulator. An OFDM transmitter supporting Wi-Fi and WiMAX is implemented on Zynq’s programmable logic (PL) with four RPs used for scrambling, interleaving, FEC encoding and IFFT. Results for resource utilization and DPR latency are discussed, together with power consumption measurements. However, the sampling period used for the measurements is of the order of magnitude of the reconfiguration times (milliseconds) and, therefore, not suitable for accurate real-time measurements at the time scale of interest.

The Zynq is also used in [14], which presents a HW/SW codesign for cognitive radio (CR) systems combining parameter reconfiguration and DPR. Only DPR latency and RP resources are reported.

Pham et al. [15] present a reconfigurable multistandard OFDM transceiver supporting IEEE 802.11, IEEE 802.16 and IEEE 802.22 on a Xilinx Virtex-6 FPGA. The modulator uses a single RP, whereas the demodulator explores a mixture of DPR-based and static multimode modules. The authors put more weight on the whole architecture and only provide data about reconfiguration times and bitstream size.

All works mentioned target 3G/4G standards and waveforms. From a system perspective, they focus primarily on the enhanced flexibility DPR can offer, with less attention paid to the global impact of this technique on the design of the hardware infrastructure. Additionally, no architecture with multiple and independent processors suitable for noncontiguous spectrum aggregation is studied.

3. Implementation of datapaths for baseband processing

This section describes the implementation of pipelined datapaths for three different waveforms (OFDM, FBMC and UFMC) and respective variants. Each variant is defined by the values assigned to the parameters of the design. The possible sets of values are sometimes called “numerologies” in the literature. In this case study, two sets of parameter values for each waveform are considered, as described in the remainder of this section and summarized in Tables 1 to 3.

| Parameter | Mode 1 | Mode 2 |
|---|---|---|
| Number of subcarriers, N (IFFT size) | 512 | 1024 |
| Length of cyclic prefix, L_CP | 40 (first slot symbol); 36 (other symbols) | 80 (first slot symbol); 72 (other symbols) |
| Number of WOLA samples, W | 4 | 6 |

Table 1.

Parameter values supported by the OFDM datapath.

| Parameter | Mode 1 | Mode 2 |
|---|---|---|
| Number of subcarriers, Nc | 512 | 1024 |
| Overlapping factor, K | 4 | 4 |
| IFFT size, K × Nc | 2048 | 4096 |

Table 2.

Parameter values supported by the FBMC datapath.

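| Parameter | Mode 1 | Mode 2 |
|---|---|---|
| Number of subcarriers, N | 512 | 1024 |
| Number of subcarriers per PRB | 12 | 12 |
| Number of active PRBs | 3 | 3 |
| IFFT size, N′ | 64 | 64 |
| Upsampling factor, N/N′ | 8 | 16 |
| Filter length, L | 37 | 73 |
| Filter type | Dolph-Chebyshev (60-dB side-lobe attenuation) | Dolph-Chebyshev (60-dB side-lobe attenuation) |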

Table 3.

Parameter values supported by the UFMC datapath.

3.1 Baseband datapath for OFDM

OFDM is the main reference among multicarrier modulation waveforms; it is used in a wide range of standards such as DSL, DVB-T, DVB-C, Wi-Fi (IEEE 802.11), WiMAX (IEEE 802.16) and 3GPP LTE. The conceptual structure of an OFDM modulator is illustrated in Figure 1.

Figure 1.

OFDM baseband modulation. GB, zero-valued guard bands.

With the exception of the inverse FFT (IFFT), the tasks required for a modulator involve only simple arithmetic, data selection and reordering. The first module is the QAM mapper. For the general M-ary QAM case, the module is simply implemented with an M:1 multiplexer: a log2(M)-bit input signal selects a complex value out of the M prestored constants that form the constellation. In the implementation used for this work, Gray mapping and average power normalization are considered in the definition of the constellation point values.
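
The lookup-table view of the QAM mapper can be sketched in software. This is a minimal illustration, assuming square M-QAM in which the upper half of the log2(M)-bit word selects the in-phase level and the lower half the quadrature level; that bit ordering and the function names (`gray_pam_levels`, `build_qam_lut`, `qam_map`) are hypothetical and not taken from the authors' design.

```python
import math


def gray_pam_levels(bits):
    """Amplitude for each Gray-coded bit pattern of a 2**bits-level PAM axis."""
    m = 1 << bits
    levels = [0] * m
    for position in range(m):                    # position 0 is the most negative amplitude
        gray = position ^ (position >> 1)        # neighbouring positions differ in one bit
        levels[gray] = 2 * position - (m - 1)    # -(m-1), ..., -1, +1, ..., +(m-1)
    return levels


def build_qam_lut(M):
    """Prestored square M-QAM constellation with Gray mapping and unit average power."""
    k = int(math.log2(M))
    assert 1 << k == M and k % 2 == 0, "this sketch only covers square QAM"
    half = k // 2
    pam = gray_pam_levels(half)
    scale = 1 / math.sqrt(2 * (M - 1) / 3)      # mean |s|^2 of the unscaled grid is 2(M-1)/3
    lut = []
    for word in range(M):
        i_bits = word >> half                    # assumed: upper bits select the in-phase level
        q_bits = word & ((1 << half) - 1)        # assumed: lower bits select the quadrature level
        lut.append(complex(pam[i_bits], pam[q_bits]) * scale)
    return lut


def qam_map(word, lut):
    """Software stand-in for the M:1 multiplexer: the log2(M)-bit word selects one constant."""
    return lut[word]
```

Because the constellation is computed once and only indexed afterwards, the hardware cost reduces to a multiplexer (or small ROM) whose contents already include the Gray labelling and the power normalization.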

After digital modulation, the subcarrier mapping module is responsible for mapping the A input active subcarriers to the central bins of an N-element array and zeroing the centre bin (the DC null subcarrier). The remaining N − A − 1 bins correspond to null subcarriers that serve as guard bands. As the IFFT DC bin is at index 0, an IFFT shift operation is performed on the N-element array. The resulting vector is then fed to the IFFT core. Here, it is assumed that the A active subcarriers include both data and pilot subcarriers and that the higher levels of the communication system provide them in their correct relative locations. The main modules required for subcarrier mapping are a double buffer and a control unit. The double buffer is implemented with a dual-port RAM, with each half storing N complex samples. This allows consecutive A-element arrays to be written and read simultaneously without data conflicts: while one half is used for input (writing), the other is used for output (reading). Read/write access to the double buffer is managed by a control unit that receives the data and writes each sample to the correct IFFT input bin. The index mapping scheme implemented by the control unit combines subcarrier mapping and IFFT shifting.
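
The combined subcarrier-mapping and IFFT-shift addressing can be expressed as a single index computation. The sketch below assumes an even number A of active subcarriers, split equally around DC and supplied lowest frequency first; `ifft_bin` and `map_symbol` are hypothetical names, and a Python list stands in for one half of the double buffer.

```python
def ifft_bin(a, A, N):
    """IFFT input bin for the a-th active subcarrier (0 = lowest frequency).

    Combines centred subcarrier mapping, the DC null and the IFFT shift in one step.
    """
    half = A // 2
    k = a - half if a < half else a - half + 1   # signed subcarrier index that skips k = 0 (DC)
    return k % N                                  # IFFT shift: negative indices wrap to the top bins


def map_symbol(active, N):
    """Write A active subcarriers into an N-bin IFFT input frame; all other bins stay zero."""
    A = len(active)
    assert A % 2 == 0 and A < N, "sketch assumes an even A split equally around DC"
    frame = [0j] * N                              # guard bands and the DC bin remain zero
    for a, value in enumerate(active):
        frame[ifft_bin(a, A, N)] = value          # write address computed by the control unit
    return frame
```

For example, with N = 8 and A = 4 the active values land in bins 6, 7, 1 and 2, which is what a centred mapping followed by an IFFT shift would produce, but without a separate shifting pass over the array.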

The IFFT module implements a Cooley-Tukey mixed-radix algorithm using a pipelined single-delay feedback (SDF) architecture as in [16]. The IFFT module has several processing stages composed of shift registers, ROM memories, complex multipliers and arithmetic blocks (called “butterflies”). Information on the internal structures of radix-2² and radix-2 butterflies can be found in [17, 18]. Apart from processing elements, the IFFT module also includes blocks for input data reordering and bit-reversed reordering of intermediate results, which are performed with RAM-based double buffers.
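
The need for bit-reversed reordering can be illustrated with a software analogue. The sketch below is a plain iterative radix-2 decimation-in-frequency IFFT, not the pipelined SDF mixed-radix architecture of [16]. It only shows that butterfly stages with twiddle factors (held in ROMs in hardware) leave the result in bit-reversed order, which a final reordering step undoes. The function names are hypothetical.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def ifft_dif_radix2(x):
    """Iterative radix-2 decimation-in-frequency IFFT with a final bit-reversal step."""
    n = len(x)
    bits = n.bit_length() - 1
    assert n > 0 and 1 << bits == n, "length must be a power of two"
    a = list(x)
    span = n // 2
    while span >= 1:                                    # one pass per butterfly stage
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(2j * cmath.pi * j / (2 * span))  # twiddle, positive sign for the inverse
                u, v = a[start + j], a[start + j + span]
                a[start + j] = u + v                     # butterfly sum
                a[start + j + span] = (u - v) * w        # butterfly difference times twiddle
        span //= 2
    # The stages leave the result in bit-reversed order: reorder and apply the 1/N scaling.
    out = [0j] * n
    for i in range(n):
        out[bit_reverse(i, bits)] = a[i] / n
    return out
```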

The IFFT module produces time-domain OFDM symbols. The next module in the datapath is responsible for cyclic prefix (CP) insertion. It receives a data array of size N (corresponding to a time-domain OFDM symbol) and stores it in memory. Then, the module reads and outputs the last L_CP + W memory positions. Extending the cyclic prefix by W samples provides the extra samples needed by the subsequent weighted overlap-and-add (WOLA) operation. After outputting the last L_CP + W memory positions, the CP insertion unit continues by reading and outputting the complete OFDM symbol from the beginning. Thus, the output of this module is an extended OFDM symbol with N + L_CP + W complex samples. Its main hardware elements are an N-element dual-port RAM and a unit for controlling write/read memory operations.
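
The read order of the CP insertion memory can be sketched as follows. `insert_cp` is a hypothetical name, and the Python list stands in for the N-element dual-port RAM; the list of read addresses mirrors what the control unit would generate.

```python
def insert_cp(symbol, l_cp, w):
    """Return the extended symbol: last l_cp + w samples, then the full N-sample symbol."""
    n = len(symbol)
    assert 0 <= l_cp + w <= n
    # Read addresses N-(L_CP+W) .. N-1 first, then 0 .. N-1 from the beginning.
    read_addresses = list(range(n - (l_cp + w), n)) + list(range(n))
    return [symbol[addr] for addr in read_addresses]
```

With the mode 1 parameters of Table 1 (N = 512, L_CP = 40 for the first slot symbol, W = 4), the extended symbol has 512 + 40 + 4 = 556 samples.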

The final module performs the WOLA operation. It can be divided into two stages: first, OFDM symbols are multiplied by a window (windowing), and then each symbol’s tail is overlapped and added with the next symbol’s head (overlap-and-add). The windowing operation is implemented using two multipliers (to handle the real and imaginary parts) and a ROM memory with prestored non-unitary raised-cosine window coefficients. In turn, the overlap-and-add operation is implemented with a finite-state machine (FSM) and arrays of registers to temporarily store each symbol’s head and tail.
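
A minimal sketch of the two WOLA stages follows. It assumes that each extended symbol ramps up over its first W samples and ramps down over its last W samples, and that the ramp is a raised cosine whose rising and falling halves sum to one where they overlap. The exact window placement and coefficients of the original design are not given here, so both are assumptions, as are the names `rc_ramp` and `wola`.

```python
import math


def rc_ramp(w):
    """Rising raised-cosine ramp of w samples; ramp[n] + ramp[w - 1 - n] == 1."""
    return [0.5 * (1 - math.cos(math.pi * (n + 1) / (w + 1))) for n in range(w)]


def wola(extended_symbols, w):
    """Window each extended symbol and add its tail onto the next symbol's head."""
    up = rc_ramp(w)
    down = up[::-1]
    out = []
    tail = [0.0] * w                                 # registers holding the previous windowed tail
    for sym in extended_symbols:
        x = list(sym)
        assert len(x) >= 2 * w
        for n in range(w):
            x[n] *= up[n]                            # windowing: ramp-up over the head
            x[len(x) - w + n] *= down[n]             # windowing: ramp-down over the tail
        head = [x[n] + tail[n] for n in range(w)]    # overlap-and-add with the stored tail
        out.extend(head + x[w:len(x) - w])           # emit everything except the new tail
        tail = x[len(x) - w:]                        # keep the tail for the next symbol
    out.extend(tail)                                 # flush the last tail
    return out
```

Each symbol contributes len − W new output samples, so the W-sample overlap does not lengthen the transmitted signal beyond the extension added in the CP insertion stage.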

3.2 Baseband datapath for FBMC

The conceptual structure of the FBMC baseband modulator implemented for this work is shown in Figure 2. The OQAM mapper consists of two stages: first, the incoming data is QAM-modulated; then, the resulting in-phase and quadrature components are decoupled and alternately transmitted on successive subcarriers and on successive transmitted symbols [19]. For instance, if a symbol carries the in-phase (I) and quadrature (Q) components with the pattern I,Q,I,Q,…, the next symbol uses the pattern Q,I,Q,I,…. The QAM mapper is implemented as for the OFDM modulator. The I/Q decoupling is efficiently performed with an FSM that alternately stores or outputs the I/Q components of a QAM symbol.
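
The I/Q staggering described above can be sketched as follows. This is an illustration of the alternating pattern only. The phase rotation factors that some OQAM formulations also apply are not modelled, and `oqam_stagger` is a hypothetical name.

```python
def oqam_stagger(qam_rows):
    """Split each complex QAM symbol over two consecutive real-valued OQAM symbols.

    qam_rows[t][k] is the QAM symbol of subcarrier k at time t. OQAM symbol 2t carries
    I, Q, I, Q, ... across subcarriers and OQAM symbol 2t + 1 carries Q, I, Q, I, ...
    """
    out = []
    for row in qam_rows:
        first = [c.real if k % 2 == 0 else c.imag for k, c in enumerate(row)]
        second = [c.imag if k % 2 == 0 else c.real for k, c in enumerate(row)]
        out.extend([first, second])
    return out
```

In hardware, the FSM emits one component immediately and holds the other in a register until the next OQAM symbol, which is what the `first`/`second` split models here.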

Figure 2.

Frequency spreading FBMC-OQAM baseband modulation.

The following datapath modules are mainly characterized by parameters K and Nc. The guard band insertion module places the OQAM symbols in the central bins of an Nc-element array. The remaining subcarriers are zero and represent frequency guard bands. The operation of this module is similar to subcarrier mapping in OFDM modulation, except that no DC null component is inserted.

The frequency spreading operation comprises upsampling by K and FIR filtering. The upsampler outputs K − 1 zero values between two incoming I/Q samples. It uses registers to store the input data and a counter to control the number of zero values at the output. For pulse shaping, a transposed-form FIR filter architecture was adopted because, unlike the direct form, it requires neither an extra input shift register nor a tree of pipelined adders to achieve high throughput. The number of filter coefficients is odd (2K − 1), and their values are symmetric, with a single centre coefficient equal to one (Table 4). The multiplications by the centre coefficient can be ignored, as they do not change the input value. However, the remaining coefficients imply non-trivial multiplications. The number of non-trivial multiplications per FIR filter can be halved by exploiting the symmetry of the coefficients. As the sub-band signal is complex-valued, two FIR filters are required to separately filter the real and imaginary parts. The IFFT modules are the same as those used in the OFDM modulator.
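
The upsampler and the symmetric transposed-form filter can be sketched together, using the K = 4 coefficients of Table 4. This is a behavioural model under stated assumptions, not the authors' RTL. Python's complex arithmetic stands in for the two real-valued filters, and the names `upsample` and `SymmetricTransposedFir` are hypothetical.

```python
import math

# Table 4 coefficients for K = 4: H0 (centre), H1 = H-1, H2 = H-2, H3 = H-3
H_K4 = [1.0, 0.971960, math.sqrt(2) / 2, 0.235147]


def upsample(samples, k):
    """Output k - 1 zeros after every input sample (upsampling by k)."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (k - 1))
    return out


class SymmetricTransposedFir:
    """Transposed-form FIR for an odd-length symmetric filter whose centre coefficient is 1."""

    def __init__(self, half):
        # half = [H0, H1, ..., H_{K-1}]; full filter is H_{K-1} .. H1 H0 H1 .. H_{K-1}
        assert len(half) >= 2 and half[0] == 1.0
        self.half = half
        self.centre = len(half) - 1
        self.length = 2 * len(half) - 1              # 2K - 1 taps
        self.regs = [0.0] * (self.length - 1)        # registers between the adders

    def step(self, x):
        # In the transposed form every tap multiplies the same input sample, so the K - 1
        # products below are shared by each coefficient and its mirror; H0 = 1 needs none.
        shared = [x] + [h * x for h in self.half[1:]]
        tap = [shared[abs(i - self.centre)] for i in range(self.length)]
        y = tap[0] + self.regs[0]
        for i in range(self.length - 2):
            self.regs[i] = tap[i + 1] + self.regs[i + 1]   # reads the not-yet-updated neighbour
        self.regs[-1] = tap[-1]
        return y
```

For K = 4 this needs three real multiplications per input sample instead of seven, which is the halving (plus the free centre tap) described above. Since the transposed form also takes the newest sample into every tap at once, no input delay line is required.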

The final operation is to overlap and add consecutive IFFT output blocks delayed by Nc/2 samples [19]. This operation uses an array of 2×K×Nc elements as temporary storage; the first half stores the current FBMC symbol, and the second half accumulates IFFT output blocks. For each IFFT output block, the whole array is shifted by Nc/2 positions and then the IFFT output block is added to the second half of the array. A direct mapping of this approach to hardware would require replicated memory structures to perform two read operations per clock cycle on the temporary array [21]. Instead, the overlap-and-add (OAA) module used here was inspired by the architecture in [22]. The main differences are that OQAM is not employed in [22] and that, in its overlap-and-add operation, consecutive IFFT output block streams are delayed by Nc samples. To continuously accumulate consecutive IFFT output blocks delayed by Nc/2, a feedback shift register of (2K − 1) × Nc/2 samples is used to align the previous IFFT block with the incoming IFFT block.
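
The role of the (2K − 1) × Nc/2-sample feedback register can be shown with a block-level model. The sketch processes a whole block at a time rather than one sample per clock, so it illustrates the data alignment, not the hardware timing; `fbmc_overlap_add` is a hypothetical name.

```python
def fbmc_overlap_add(blocks, nc):
    """Overlap-and-add IFFT output blocks of length K*Nc that start Nc/2 samples apart.

    `feedback` plays the role of the (2K - 1) * Nc/2-sample feedback register: it holds the
    part of the running sum that later blocks still contribute to.
    """
    hop = nc // 2
    block_len = len(blocks[0])
    feedback = [0j] * (block_len - hop)
    out = []
    for block in blocks:
        acc = [block[i] + (feedback[i] if i < len(feedback) else 0) for i in range(block_len)]
        out.extend(acc[:hop])        # complete: no later block overlaps these samples
        feedback = acc[hop:]         # realign the remainder with the next incoming block
    out.extend(feedback)             # flush the register after the last block
    return out
```

Each incoming block of K × Nc samples releases only Nc/2 finished samples; the remaining K × Nc − Nc/2 = (2K − 1) × Nc/2 samples must wait for later blocks, which fixes the register length.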

3.3 Baseband datapath for UFMC

UFMC, sometimes called Universal Filtered OFDM (UF-OFDM), is an OFDM-based waveform that attempts to reduce OOB emissions by time-domain filtering. The N subcarriers of each symbol are divided into B physical resource blocks (PRBs) of N/B subcarriers each. Usually, only part of the PRBs is used for transmission (active PRBs). For each active PRB, an IFFT and bandpass filtering with an FIR filter of length L are performed. Instead of the CP, a zero-valued guard interval with length L is inserted after the IFFT. Frequency-shifted versions of the FIR filter are applied to all active PRBs, and, finally, the filtered sub-bands are superimposed to form a UFMC multicarrier symbol. Chebyshev filters are normally used for bandpass filtering in UFMC [23, 24, 25].

The classic UFMC modulation scheme [26] uses an N-point IFFT and FIR filters with complex coefficients for each active sub-band. To reduce this complexity, Knopp et al. [27] combine a smaller N′-point IFFT with upsampling by N/N′. Moreover, the same real-coefficient FIR filter is used in all sub-bands, followed by frequency shifters implemented as multiplications by a complex exponential. Figure 3 illustrates the datapath structure for the UFMC modulator considered in this work.

Figure 3.

Conceptual structure of the UFMC baseband modulator.

The UFMC modulator of this work has three processing branches, one for each of the three active PRBs (Table 3). These branches share the same architecture and start with QAM mapping of the incoming data. The QAM mapper is the same as the one used in the OFDM and FBMC datapaths. The subcarrier mapping module maps the 12 PRB subcarriers to the central bins of an array with N′ = 64 elements and zeroes the remaining N′ − 12 elements. It follows the same approach as guard band insertion in the FBMC modulator: a double buffer of 2×N′ elements and read/write control engines.

UFMC performs well for short packets and sporadic burst transmission [28, 29]. Moreover, the parallel sub-band processing in UFMC requires an IFFT core per branch. Therefore, instead of the high-performance pipelined IFFT architectures adopted for the OFDM and FBMC datapaths, low-resource memory-based FFT architectures are adopted in the UFMC modulator. The memory-based architecture adopted here is detailed in [30]. The upsampler architecture and operation are similar to those used for frequency spreading in FBMC modulation. Here, the number of zeros between consecutive IFFT output samples is N/N′ − 1.

Dolph-Chebyshev FIR filters with a transposed structure are used for bandpass sub-band filtering. Again, the FIR coefficients are symmetric: there is an odd number of coefficients, and the centre coefficient is equal to one. However, the higher FIR order used in UFMC modulation requires further discussion. For a FIR filter with L coefficients, L − 1 coefficients imply non-trivial multiplications, and coefficient symmetry halves their number to (L − 1)/2. As each processing branch requires two FIR filters (for the real and imaginary parts), there are L − 1 non-trivial multiplications per branch.

In Xilinx FPGAs, non-trivial multiplications can be efficiently performed by DSP blocks. These blocks are embedded into the logic fabric in a column arrangement. Cost-optimized devices have fewer DSP blocks, so their utilization must be carefully considered. For instance, the xc7z020 device has 220 DSP blocks. Considering the modes of operation from Table 3, the overall number of non-trivial multiplications for FIR filtering, 3 × (L − 1), is 108 for mode 1 and 216 for mode 2. This represents a high DSP utilization, and the sparse distribution of these blocks throughout the logic fabric degrades the scalability of the UFMC modulator. In addition, placement and routing of the design become more difficult, which is likely to affect overall timing closure. To reduce DSP utilization, a multiplier-less architecture for FIR filters was adopted. The FIR coefficients are represented in Q1.5 format, using the canonic signed digit (CSD) system with a minimum number of non-zero digits. Non-trivial multiplications are then replaced by shifters and adders. For example, the multiplication by 0.90625 can be implemented as:

$$x \times 0.90625 = x - x \cdot 2^{-3} + x \cdot 2^{-5}$$

This strategy eliminates the use of DSP blocks in FIR filters, but increases slice utilization. However, slices are the most numerous type of resource (13,300 slices in the xc7z020 device), making this approach well-suited for the present application.
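
The offline CSD conversion and the resulting shift-and-add multiplication can be sketched as follows. This assumes integer (fixed-point) input samples and uses the standard non-adjacent-form recoding to obtain the CSD digits. `csd_digits` and `shift_add_multiply` are hypothetical names, not part of the authors' tool flow.

```python
def csd_digits(value, frac_bits):
    """Canonic signed digit (non-adjacent form) of round(value * 2**frac_bits).

    Returns (digit, shift) pairs with digit in {+1, -1}, meaning digit * 2**-shift.
    """
    n = round(value * (1 << frac_bits))
    digits = []
    position = 0
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)              # +1 if n mod 4 == 1, -1 if n mod 4 == 3
            n -= d
            digits.append((d, frac_bits - position))
        n >>= 1
        position += 1
    return digits


def shift_add_multiply(x, digits):
    """Multiply integer sample x by a CSD coefficient using only shifts and adds.

    Right shifts are arithmetic (they round toward minus infinity), as in hardware.
    """
    acc = 0
    for d, shift in digits:
        term = x >> shift if shift >= 0 else x << -shift
        acc += term if d > 0 else -term
    return acc
```

For 0.90625 in Q1.5 (integer code 29), the recoding yields the three digits +2⁰, −2⁻³ and +2⁻⁵, so each such coefficient costs two adders/subtractors and no DSP block.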

After FIR filtering, each sub-band signal is shifted to the corresponding frequency band. The frequency shift module for each branch has a ROM memory to store the complex exponential values and a complex multiplier. Finally, the filtered sub-band responses are summed to create the UFMC symbol.
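
The frequency shifter and the final summation can be sketched as follows. The sketch assumes that each sub-band is shifted by an integer number of bins on the N-sample grid, in which case the complex exponential repeats every N samples and an N-entry ROM is enough even for the longer filtered symbol. `exp_rom` and `shift_and_sum` are hypothetical names.

```python
import cmath


def exp_rom(shift_bins, n):
    """Prestored complex exponential for a shift of shift_bins bins on an n-sample grid."""
    return [cmath.exp(2j * cmath.pi * shift_bins * i / n) for i in range(n)]


def shift_and_sum(branches, shifts, n):
    """Frequency-shift each filtered sub-band (one complex multiply per sample) and add them."""
    length = len(branches[0])
    total = [0j] * length
    for signal, shift in zip(branches, shifts):
        rom = exp_rom(shift, n)
        for i, s in enumerate(signal):
            total[i] += s * rom[i % n]     # ROM address wraps every n samples
    return total
```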

4. A dynamically reconfigurable baseband modulator for 5G communication

After the preceding overview of the architecture of high-performance baseband engines for three different waveforms, this section presents the architecture of a baseband processing engine that is flexible, scalable, resource and power efficient and forward compatible. Here, DPR and DFS are combined to produce a dynamically reconfigurable baseband processing architecture for multimode, multi-waveform coexistence and dynamic spectrum aggregation. To enable the full potential of 5G, carrier aggregation should also be possible across separated frequency bands [31] (noncontiguous CA). For noncontiguous CA, a multidimensional PHY layer (and, therefore, baseband architecture) is needed, even when data aggregation is not performed in the PHY layer, but in the media access control (MAC) communication layer instead [32]. In this context, multidimensional means that the PHY layer is an array of independent processing blocks, rather than a monolithic structure.

The baseband architecture presented in this chapter features three independent modulators, whose functionality and clock frequency can be dynamically reconfigured through DPR and DFS, respectively. This setup enables the processing of multiple component carriers with different waveforms and/or baseband parameters in noncontiguous CA schemes.

A prototype of the multidimensional baseband modulator was implemented on an Avnet Zedboard equipped with a Zynq xc7z020 device. The system top level combines the designs of the previous section and can be divided into three parts. The Zedboard’s 512 MB DDR memory is used as a repository for the partial bitstreams used for DPR. The Zynq’s ARM CPU acts as the system management unit: it triggers the reconfiguration of the multidimensional baseband modulator and sets up data transfers between the DDR memory and the programmable logic. The programmable logic implements the modulators together with the infrastructure for DPR and DFS. Figure 4 shows the top-level architecture.

Figure 4.

Top-level architecture for the multidimensional and reconfigurable baseband modulator. HPx, high-performance ports; GPIO, general purpose I/O.

The proposed architecture targets the communication scenario described in [33], which combines multi-waveform coexistence with dynamic spectrum access. In this scenario, 5G communications build on the pre-existing 4G infrastructure (non-stand-alone 5G): the primary 4G-LTE communications are OFDM-based, and the secondary 5G communications opportunistically exploit vacant spectrum resources through DSA, transmitting with different waveforms (OFDM, FBMC or UFMC). The basic unit for DPR is a complete baseband datapath, and each one is implemented in a reconfigurable partition. Of the three RPs, RP1 is exclusively used for primary OFDM-based transmission. The two remaining RPs can be used for primary or secondary transmission: RP2 implements FBMC or OFDM transmission modes; RP3 implements UFMC or OFDM transmission modes. For instance, if the primary transmission requires more capacity, the three RPs can be used to independently modulate different component carriers in a noncontiguous CA scheme. If the primary transmission is less demanding, RP2 and RP3 can be used for secondary multi-waveform 5G transmission. Figure 5 illustrates a potential multi-waveform coexistence scenario by showing the combined periodograms of the OFDM, FBMC and UFMC baseband signals obtained from the implemented modulator datapaths.

Figure 5.

Periodograms for OFDM, FBMC and UFMC baseband signals.

During system initialization, the ARM CPU manages the downloading of partial bitstreams and input data files from an SD card to the DDR memory. For the purpose of validating the baseband engines, the input data is retrieved from the DDR and sent to the baseband modulator(s), and the results are stored back to the DDR (and used for validating the implementation). Each RP has an associated DMA controller to accelerate the access to the DDR.

To achieve the specialization of computation at runtime, the configuration interface adopted is the internal configuration access port (ICAP). This high-bandwidth internal interface permits the FPGA to reconfigure itself. Xilinx specifies a maximum ICAP bandwidth of 400 MB/s for a 100 MHz clock frequency and a 32-bit data width [34]. Nevertheless, the ICAP can be overclocked to further increase the reconfiguration throughput [35]. In the present work, the ICAP is overclocked to 200 MHz. To take advantage of ICAP overclocking, a dedicated DMA controller is used to accelerate the transfer of partial bitstreams to the ICAP.
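
The relation between ICAP clock, data width and reconfiguration latency is simple arithmetic, sketched below. The sketch assumes that the DMA keeps the ICAP busy on every clock cycle, so the result is a lower bound. The bitstream size in the usage note is hypothetical, since partial bitstream sizes are not given in this section, and the function names are illustrative.

```python
def icap_throughput_mb_s(clock_mhz, width_bits=32):
    """Peak ICAP throughput: one width_bits word per clock cycle (1 MB = 10**6 bytes)."""
    return clock_mhz * width_bits / 8


def reconfiguration_time_ms(bitstream_bytes, clock_mhz):
    """Lower bound on DPR latency when the ICAP accepts a word every cycle."""
    return bitstream_bytes / (icap_throughput_mb_s(clock_mhz) * 1e6) * 1e3
```

At 100 MHz this gives the 400 MB/s figure quoted above, and overclocking to 200 MHz doubles it to 800 MB/s; a hypothetical 1 MB partial bitstream would then load in about 1.25 ms.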

The implementation of DFS follows the reference design from [36]. This design uses an FSM that reads configuration parameters from a ROM and writes them to the clock management module available in the FPGA. To change the frequency of the output clocks, the input signal en must be asserted, and the desired mode of operation must be given through the mode port. The DFS controller is fed by a 100 MHz reference input clock that is used to synthesize the clock signal used for baseband processing. Its frequency (f_clkBB) can be configured to one of four values: 16.7, 33.3, 66.7 and 100 MHz. All modulator datapaths can work at 100 MHz. The other values are based on the scaling of the subcarrier spacing by 2^μ as in 5G New Radio systems [2], where μ is an integer that specifies the mode of operation. In this system, primary communications are based on the LTE OFDM numerologies (Table 1), where the subcarrier spacing (Δf) is 15 kHz. For OFDM mode 2 (Table 1), the required sampling frequency is N × Δf = 1024 × 15 kHz = 15.36 MHz. Scaling the subcarrier spacing by 2^μ with μ ∈ {1, 2} results in sampling frequencies of 2^1 × 15.36 = 30.72 MHz and 2^2 × 15.36 = 61.44 MHz. Each of these three sampling frequencies lies just below one of the lower clock settings (16.7, 33.3 and 66.7 MHz, respectively).
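
The arithmetic behind the clock settings can be sketched as follows. The selection rule (pick the slowest setting that is at least the required sample rate, assuming one sample per clock cycle) is a hypothetical illustration consistent with the numbers above, not the controller logic of [36]. The names are illustrative.

```python
LTE_SCS_HZ = 15_000                            # LTE subcarrier spacing
FCLK_OPTIONS_MHZ = [16.7, 33.3, 66.7, 100.0]   # DFS settings listed in the text


def required_sampling_mhz(n_fft, mu):
    """Sampling frequency N * (2**mu * 15 kHz), in MHz."""
    return n_fft * (2 ** mu) * LTE_SCS_HZ / 1e6


def lowest_sufficient_clock(sample_rate_mhz):
    """Hypothetical rule: slowest DFS setting that still delivers one sample per clock."""
    for f in FCLK_OPTIONS_MHZ:
        if f >= sample_rate_mhz:
            return f
    raise ValueError("sample rate exceeds the fastest clock setting")
```

For N = 1024 and μ = 0, 1, 2, the required rates of 15.36, 30.72 and 61.44 MHz map to the 16.7, 33.3 and 66.7 MHz settings, respectively.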

A general overview of the resource utilization of the prototype is presented in Table 5. The static part occupies around 32% of the slices and 5% of the BRAMs. Apart from PS-PL interconnect cores and the DMA controllers that serve the baseband modulators, the static part also implements the infrastructure for reconfiguration (DPR and DFS). The hardware required to implement DPR and DFS is below 2% of the available LUTs, FFs and BRAMs. The three RPs form the system’s reconfigurable part and occupy 52.6, 64.3 and 72.7% of the available slices, BRAMs and DSPs, respectively. Overall, the resource utilization for the complete system implementation represents a considerable share of the resources available in the xc7z020 device: 84.3% of slices, 69.7% of BRAMs and 72.7% of DSPs.

| K | H₀ | H₁ = H₋₁ | H₂ = H₋₂ | H₃ = H₋₃ |
|---|---|---|---|---|
| 2 | 1 | √2/2 | | |
| 3 | 1 | 0.911438 | 0.411438 | |
| 4 | 1 | 0.971960 | √2/2 | 0.235147 |

Table 4.

Frequency domain prototype filter coefficients [20].

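| Resource | Available | Static part (total) | Reconfig. overhead: DFS | Reconfig. overhead: DPR | RP1 | RP2 | RP3 | All RPs |
|---|---|---|---|---|---|---|---|---|
| Slice | 13,300 | 4210 | 24 | 424 | 1400 | 2400 | 3200 | 7000 |
| LUT | 53,200 | 10,700 | 75 | 938 | 5600 | 9600 | 12,800 | 28,000 |
| FF | 106,400 | 13,110 | 79 | 1292 | 11,200 | 19,200 | 25,600 | 56,000 |
| BRAM | 140 | 7.5 | 0 | 1.5 | 20 | 40 | 30 | 90 |
| DSP | 220 | 0 | 0 | 0 | 40 | 80 | 40 | 160 |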

Table 5.

Post place-and-route resource utilization for the static and reconfigurable system parts.

The resource utilization of each modulator is presented in Table 6. The results lead to a key observation: the hardware virtualization achieved with the 7000 slices, 90 BRAMs and 160 DSPs reserved by the three RPs allows the implementation of six baseband modulators, which would need 11,322 slices, 99.5 BRAMs and 106 DSPs in total. Adding these virtualized resources to the static resources exceeds the available xc7z020 slices by 17%. This is an unequivocal demonstration of the resource efficiency benefits that DPR brings to multimode baseband processors. An equivalent static multimode design could benefit from the reuse of common hardware blocks between different modulator datapaths (especially between OFDM and FBMC datapaths). However, implementing the multidimensional baseband modulator as a static multimode design would be challenging given the resource budget available on cost-optimized devices like the xc7z020. There are FPGA/SoC devices with larger area and logic density. However, using them would decrease the system’s cost-effectiveness: an FPGA with a larger chip area is more expensive and likely to consume more power [37].

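| Resource | OFDM (mode 1) | FBMC (mode 1) | UFMC (mode 1) | OFDM (mode 2) | FBMC (mode 2) | UFMC (mode 2) |
|---|---|---|---|---|---|---|
| Slice | 1015 | 1575 | 2315 | 1126 | 2210 | 3100 |
| LUT | 2829 | 5103 | 8090 | 3400 | 7876 | 11,782 |
| FF | 2107 | 2307 | 6279 | 2170 | 2284 | 9912 |
| BRAM | 7 | 19 | 11.5 | 10.5 | 40 | 11.5 |
| DSP | 14 | 21 | 18 | 14 | 21 | 18 |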

Table 6.

Post place-and-route resource utilization for each baseband modulator datapath.

Device, xc7z020; fclk= 100 MHz.

Considering the modes of operation shown in Tables 1 to 3, and that all three RPs are in use, the proposed design supports 32 combinations of baseband modulators: 2 RP1 modes × 4 RP2 modes × 4 RP3 modes. The use of DPR simplifies upgrading the system with new modes of operation, which extends its useful lifetime. The addition of modes of operation is not limited by the resources available on the FPGA device, but by the resources reserved by the RPs and the capacity to store partial bitstreams (512 MB DDR memory, in this case).
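
The count of supported combinations can be checked by enumerating the modes that each RP can host. The sketch assumes one partial bitstream per mode and per RP, so ten stored bitstreams cover all 32 system configurations; the mode labels are illustrative.

```python
from itertools import product

RP1_MODES = ["OFDM mode 1", "OFDM mode 2"]
RP2_MODES = ["OFDM mode 1", "OFDM mode 2", "FBMC mode 1", "FBMC mode 2"]
RP3_MODES = ["OFDM mode 1", "OFDM mode 2", "UFMC mode 1", "UFMC mode 2"]

# Every system configuration is one choice of mode per reconfigurable partition.
configurations = list(product(RP1_MODES, RP2_MODES, RP3_MODES))

# Assumed: one partial bitstream per (RP, mode) pair.
bitstreams_needed = len(RP1_MODES) + len(RP2_MODES) + len(RP3_MODES)
```

The number of configurations grows multiplicatively with the modes per RP, while the bitstream storage grows only additively, which is why the DDR capacity is rarely the binding limit.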

During the DPR design with the Xilinx Vivado EDA tool, the different system configurations are created from a design checkpoint that saves the floorplanning and routing of the system's static part, leaving the RPs as empty black boxes. New configurations are created by designing new circuits for these black boxes and generating the corresponding partial bitstreams. This design reusability makes the system adaptable and reduces upgrade design time.

The dynamic power consumption of each modulator datapath at each baseband clock frequency was estimated with the power analysis tool of Vivado 2015.2. The high-confidence estimates were obtained from placed-and-routed netlists and accurate node activity files. The results are presented in Table 7. The UFMC modulator modes have a higher dynamic power consumption than the FBMC and OFDM modes, mainly due to the higher resource usage and node activity of the UFMC datapaths. The power savings enabled by DFS clock frequency adaptation tend to be more pronounced for the most resource-demanding modes of operation (UFMC and FBMC). Compared to a design with the baseband clock frequency fixed at 100 MHz, reducing the clock frequency to:

  • 66.7 MHz results in dynamic power savings between 39 mW (35% reduction in OFDM mode 1) and 82 mW (51% reduction in FBMC mode 2)

  • 33.3 MHz results in dynamic power savings between 79 mW (70% reduction in OFDM mode 1) and 156 mW (67% reduction in UFMC mode 2)

  • 16.7 MHz results in dynamic power savings between 99 mW (88% reduction in OFDM mode 1) and 194 mW (83% reduction in UFMC mode 2)

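| f_clk | OFDM (mode 1) | FBMC (mode 1) | UFMC (mode 1) | OFDM (mode 2) | FBMC (mode 2) | UFMC (mode 2) |
|---|---|---|---|---|---|---|
| 100 MHz | 113 | 148 | 180 | 123 | 161 | 233 |
| 66.7 MHz | 74 | 84 | 119 | 78 | 79 | 155 |
| 33.3 MHz | 34 | 25 | 60 | 33 | 28 | 77 |
| 16.7 MHz | 14 | 8 | 30 | 10 | 10 | 39 |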

Table 7.

Dynamic power consumption estimates for the six implemented baseband modulator cores (in mW).

Device, xc7z020; analysis tool, Vivado 2015.2; post place-and-route power analysis with high confidence level; node activity derived from post place-and-route simulation.
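The savings ranges listed above can be recomputed directly from Table 7. The short Python sketch below subtracts each lower-frequency estimate from the 100 MHz estimate of the same mode and reports the smallest and largest absolute saving per frequency; the names `POWER_MW` and `savings_vs_reference` are illustrative only, and the data are the Table 7 values.

```python
# Dynamic power estimates from Table 7, in mW.
# Column order: OFDM, FBMC, UFMC (mode 1), then OFDM, FBMC, UFMC (mode 2).
MODES = ["OFDM mode 1", "FBMC mode 1", "UFMC mode 1",
         "OFDM mode 2", "FBMC mode 2", "UFMC mode 2"]
POWER_MW = {
    100.0: [113, 148, 180, 123, 161, 233],
    66.7: [74, 84, 119, 78, 79, 155],
    33.3: [34, 25, 60, 33, 28, 77],
    16.7: [14, 8, 30, 10, 10, 39],
}


def savings_vs_reference(power, ref_freq=100.0):
    """Absolute (mW) and relative savings of every mode versus ref_freq."""
    ref = power[ref_freq]
    result = {}
    for freq, values in power.items():
        if freq == ref_freq:
            continue
        result[freq] = [(mode, r - p, (r - p) / r)
                        for mode, r, p in zip(MODES, ref, values)]
    return result


for freq, rows in savings_vs_reference(POWER_MW).items():
    low = min(rows, key=lambda row: row[1])   # smallest absolute saving
    high = max(rows, key=lambda row: row[1])  # largest absolute saving
    print(f"{freq} MHz: {low[1]} mW ({low[2]:.0%}, {low[0]}) "
          f"to {high[1]} mW ({high[2]:.0%}, {high[0]})")
```

Running it reproduces the three bullet points: 39 to 82 mW at 66.7 MHz, 79 to 156 mW at 33.3 MHz and 99 to 194 mW at 16.7 MHz.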

For the defined set of baseband clock frequencies, the DFS procedure took 47 μs on average to change the clock frequency; this latency is acceptable for 5G NR communications and is well below the DPR latencies reported in Table 8.

In the multidimensional baseband modulator, the RPs are larger and contain more resources than in the individual designs, which results in larger partial bitstreams. However, the reconfiguration speed was increased through ICAP overclocking. Table 8 quantifies the DPR latency and compressed bitstream size for the worst-case scenario in each RP. The largest RP (RP3) takes up to 767 μs to be reconfigured, which corresponds to the transfer of a 596 kB bitstream to the ICAP. In all DPR latency measurements, the reconfiguration throughput was at least 790 MB/s, about 99% of the theoretical ICAP throughput of 800 MB/s (32-bit transfers at the overclocked 200 MHz ICAP clock). The DPR latency of each individual RP is below 1 ms, and the reconfiguration of all three RPs takes less than 2 ms. These latency values are within an acceptable range considering the control plane requirements from [38].

| Characteristic | RP1 | RP2 | RP3 |
|---|---|---|---|
| DPR latency | 400 μs | 677 μs | 767 μs |
| Partial bitstream size | 309 kB | 526 kB | 596 kB |

Table 8.

Measured DPR latency and size of compressed partial bitstreams for the worst-case scenarios.
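The throughput figures can be checked against Table 8 with a few lines of Python. This sketch assumes that "kB" in Table 8 means 1024 bytes and that "MB/s" means 10^6 bytes per second; with these conventions the table reproduces both the "at least 790 MB/s" and the "about 99%" statements. It also assumes the three RPs are reconfigured one after another through the ICAP, so their latencies add up. The names `TABLE8` and `dpr_throughput` are illustrative only.

```python
ICAP_WIDTH_BYTES = 4   # 32-bit ICAP transfers
ICAP_CLOCK_HZ = 200e6  # overclocked ICAP clock
THEORETICAL_BPS = ICAP_WIDTH_BYTES * ICAP_CLOCK_HZ  # 800 MB/s

# Table 8: (worst-case DPR latency in microseconds, compressed bitstream size in kB)
TABLE8 = {"RP1": (400, 309), "RP2": (677, 526), "RP3": (767, 596)}


def dpr_throughput(latency_us, size_kb, bytes_per_kb=1024):
    """Achieved reconfiguration throughput in bytes per second."""
    return size_kb * bytes_per_kb / (latency_us * 1e-6)


for rp, (latency_us, size_kb) in TABLE8.items():
    tput = dpr_throughput(latency_us, size_kb)
    print(f"{rp}: {tput / 1e6:.1f} MB/s, "
          f"{tput / THEORETICAL_BPS:.1%} of the theoretical ICAP throughput")

# Assumes sequential reconfiguration of the RPs through the ICAP.
total_us = sum(latency_us for latency_us, _ in TABLE8.values())
print(f"All three RPs reconfigured in {total_us} us")
```

All three RPs land between roughly 791 and 796 MB/s, and the summed latency of 1844 μs is consistent with the "less than 2 ms" statement.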

The ITU report [38] states that in critical, ultralow-latency scenarios, a make-before-break approach must be adopted to completely hide the control plane latency. In other words, the control plane latency must be hidden by setting up a new communication channel before breaking the current one. Under these circumstances, a high-priority communication can reserve a spare RP to adapt the transmission mode seamlessly. This scenario is illustrated in Figure 6. Assume that RP1 is currently performing baseband modulation for an ultralow-latency communication, and that this transmission must be adapted from OFDM mode 1 to OFDM mode 2 without breaking the current communication link. RP2, which is currently unused, is reconfigured to OFDM mode 2 before baseband processing at RP1 terminates. In this way, the baseband processing datapath can be modified without incurring any latency penalty due to DPR.

Figure 6.

Example of make-before-break approach to mitigate DPR latency.
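The timing logic of the make-before-break example can be sketched as a simple interval calculation. In this hypothetical model, the new datapath is ready one DPR latency after its reconfiguration starts, and any time it becomes ready after the switch instant is an interruption of the link. The switch instant is arbitrary; the latencies are the Table 8 worst cases. The in-place alternative assumes, only for comparison, that RP1 itself could host the new mode. All function and constant names are illustrative.

```python
def make_before_break_deadline(switch_time_us, spare_dpr_latency_us):
    """Latest instant at which reconfiguration of the spare RP can start."""
    return switch_time_us - spare_dpr_latency_us


def interruption_us(switch_time_us, reconfig_start_us, dpr_latency_us):
    """Link interruption if the new datapath is ready only after the switch instant."""
    ready_us = reconfig_start_us + dpr_latency_us
    return max(0.0, ready_us - switch_time_us)


SWITCH_T_US = 10_000.0  # hypothetical instant of the mode change
RP1_DPR_US = 400.0      # Table 8 worst case, RP1
RP2_DPR_US = 677.0      # Table 8 worst case, RP2

# Break-before-make (hypothetical): stop RP1 at the switch instant and
# reconfigure it in place, so the link is down for the whole DPR latency.
bbm_gap = interruption_us(SWITCH_T_US, SWITCH_T_US, RP1_DPR_US)

# Make-before-break: reconfigure the idle RP2 early enough to be ready in time.
start = make_before_break_deadline(SWITCH_T_US, RP2_DPR_US)
mbb_gap = interruption_us(SWITCH_T_US, start, RP2_DPR_US)

print(f"Start RP2 reconfiguration no later than t = {start} us")
print(f"Interruption, break-before-make: {bbm_gap} us")
print(f"Interruption, make-before-break: {mbb_gap} us")
```

Starting the spare RP's reconfiguration at least one DPR latency ahead of the switch removes the interruption entirely; starting it late exposes exactly the delay by which the deadline was missed.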


5. Conclusion

This chapter presents a reconfigurable, multidimensional baseband modulator architecture suitable for multimode, multiple-waveform coexistence and dynamic spectrum aggregation scenarios. The design combines runtime specialization of computation (through DPR) and of performance (through DFS). By featuring three independent, reconfigurable baseband modulators, the architecture can process up to three component carriers using different waveforms (OFDM, FBMC and UFMC) and/or numerologies. The total reconfigurable area of the system covers more than half of the available xc7z020 resources, and ICAP overclocking helps keep the DPR latency low enough for the analyzed scenarios. In this design, performance specialization through DFS resulted in dynamic power savings of up to 194 mW. Besides flexibility, scalability and forward compatibility, cost-effectiveness is perhaps the most relevant feature of this architecture. The results clearly demonstrate how hardware virtualization through DPR enables implementations that exceed the hardware resources available on an FPGA device. This allows the system to be implemented on small, cost-optimized devices, with immediate cost and power consumption benefits and without compromising system functionality.


Acknowledgments

This work was financed by the ERDF (European Regional Development Fund) through the Operational Programme for Competitiveness and Internationalization (COMPETE) 2020 Programme within Project POCI-01-0145-FEDER-006961 and by the National Fund through a Ph.D. Grant (PD/BD/105860/2014) from the FCT (Fundação para a Ciência e a Tecnologia) (Portuguese Foundation for Science and Technology).

References

  1. ITU-R. IMT Vision—Framework and Overall Objectives of the Future Development of IMT for 2020 and beyond. ITU-R; 2015. ITU-R M.2083-0
  2. 3GPP. NR; NR and NG-RAN Overall Description; Stage 2 (Release 15). TS 38.300 V15.3.1; 2018. Available from: http://www.3gpp.org/DynaReport/38-series.htm
  3. Andrews JG, Buzzi S, Choi W, Hanly SV, Lozano A, Soong ACK, et al. What will 5G be? IEEE Journal on Selected Areas in Communications. 2014;32(6):1065-1082
  4. Luo FL, Zhang C. Signal Processing for 5G: Algorithms and Implementations. United Kingdom: John Wiley & Sons Ltd; 2016
  5. Jue G. Exploring 5G Coexistence Scenarios Using a Flexible Hardware/Software Testbed—Application Note; 2017
  6. Zhao Q, Sadler BM. A survey of dynamic spectrum access. IEEE Signal Processing Magazine. 2007;24(3):79-89
  7. Akyildiz IF, Nie S, Lin SC, Chandrasekaran M. 5G roadmap: 10 key enabling technologies. Computer Networks. 2016;106:17-48
  8. Crockett LH, Elliot RA, Enderwitz MA, Stewart RW. The Zynq Book: Embedded Processing with the ARM Cortex-A9 on the Xilinx Zynq-7000 All Programmable SoC. Glasgow, United Kingdom: Strathclyde Academic Media; 2014
  9. Delahaye JP, Palicot J, Moy C, Leray P. Partial reconfiguration of FPGAs for dynamical reconfiguration of a software radio platform. In: 16th IST Mobile and Wireless Communications Summit. Budapest: IEEE; 2007. pp. 1-5. DOI: 10.1109/ISTMWC.2007.4299250
  10. Delorme J, Martin J, Nafkha A, Moy C, Clermidy F, Leray P, et al. A FPGA partial reconfiguration design approach for cognitive radio based on NoC architecture. In: 2008 Joint 6th International IEEE Northeast Workshop on Circuits and Systems and TAISA Conference. Montreal, QC: IEEE; 2008. pp. 355-358. DOI: 10.1109/NEWCAS.2008.4606394
  11. He K, Crockett L, Stewart R. Dynamic reconfiguration technologies based on FPGA in software defined radio system. Journal of Signal Processing Systems. 2011;69(1):75-85
  12. Vipin K, Fahmy SA. Mapping adaptive hardware systems with partial reconfiguration using CoPR for Zynq. In: 2015 NASA/ESA Conference on Adaptive Hardware and Systems (AHS). Montreal, QC: IEEE; 2015. pp. 1-8. DOI: 10.1109/AHS.2015.7231169
  13. Rihani MAF, Mroue M, Prévotet JC, Nouvel F, Mohanna Y. ARM-FPGA-based platform for reconfigurable wireless communication systems using partial reconfiguration. EURASIP Journal on Embedded Systems. 2017;2017(1):35
  14. Shreejith S, Banarjee B, Vipin K, Fahmy SA. Dynamic cognitive radios on the Xilinx Zynq Hybrid FPGA. In: Weichold M, Hamdi M, Shakir M, Abdallah M, Karagiannidis G, Ismail M, editors. Cognitive Radio Oriented Wireless Networks. CrownCom 2015. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Vol. 156. Cham: Springer; 2015. DOI: 10.1007/978-3-319-24540-9_35
  15. Pham TH, Fahmy SA, McLoughlin IV. An end-to-end multi-standard OFDM transceiver architecture using FPGA partial reconfiguration. IEEE Access. 2017;5:21002-21015
  16. Ferreira ML, Barahimi A, Ferreira JC. Reconfigurable FPGA-based FFT processor for cognitive radio applications. In: Proceedings of the Applied Reconfigurable Computing: 12th International Symposium, ARC 2016; 22–24 March 2016; Mangaratiba, RJ, Brazil: Springer International Publishing; 2016. pp. 223-232
  17. He S, Torkelson M. A new approach to pipeline FFT processor. In: Proceedings of International Conference on Parallel Processing. Honolulu, HI, USA: IEEE; 1996. pp. 766-770. DOI: 10.1109/IPPS.1996.508145
  18. Löfgren J, Nilsson P. On hardware implementation of radix 3 and radix 5 FFT kernels for LTE systems. In: 2011 NORCHIP. Lund: IEEE; 2011. pp. 1-4. DOI: 10.1109/NORCHP.2011.6126703
  19. Doré JB, Gerzaguet R, Cassiau N, Ktenas D. Waveform contenders for 5G: Description, analysis and comparison. Physical Communication. Elsevier; 2017;24:46-61. DOI: 10.1016/j.phycom.2017.05.004. ISSN: 1874-4907
  20. Bellanger M, Ruyet DL, Roviras D, Terré M, Nossek J, Baltar L, et al. FBMC physical layer: A primer. PHYDYAS Project; 2010
  21. Carvalho M. FPGA implementation of a baseband processor for FBMC transmission [MSc thesis]. Faculty of Engineering of the University of Porto; 2017
  22. Bellanger M. FS-FBMC: An alternative scheme for filter bank based multicarrier transmission. In: 2012 5th International Symposium on Communications, Control and Signal Processing. Rome: IEEE; 2012. pp. 1-4
  23. Wang X, Wild T, Schaich F, dos Santos AF. Universal filtered multi-carrier with leakage-based filter optimization. In: European Wireless 2014; 20th European Wireless Conference. Barcelona, Spain: VDE; 2014. pp. 1-5
  24. Jafri AR, Majid J, Zhang L, Imran MA, Najam-ul-Islam M. FPGA implementation of UFMC based baseband transmitter: Case study for LTE 10MHz channelization. Wireless Communications and Mobile Computing. Hindawi; 2018;2018:1-12. Article ID: 2139794. DOI: 10.1155/2018/2139794
  25. Nadal J, Nour CA, Baghdadi A. Flexible hardware platform for demonstrating new 5G waveform candidates. In: 2017 29th International Conference on Microelectronics (ICM). Beirut: IEEE; 2017. pp. 1-4. DOI: 10.1109/ICM.2017.8268851
  26. Vakilian V, Wild T, Schaich F, ten Brink S, Frigon J. Universal-filtered multi-carrier technique for wireless systems beyond LTE. In: 2013 IEEE Globecom Workshops (GC Wkshps). Atlanta, GA: IEEE; 2013. pp. 223-228. DOI: 10.1109/GLOCOMW.2013.6824990
  27. Knopp R, Kaltenberger F, Vitiello C, Luise M. Universal filtered multicarrier for machine type communications in 5G. In: Proceedings of EUCNC 2016, European Conference on Networks and Communications; 2016. Available from: http://www.eurecom.fr/publication/4910. Unpublished material provided by EURECOM
  28. Schaich F, Wild T, Chen Y. Waveform contenders for 5G—Suitability for short packet and low latency transmissions. In: 2014 IEEE 79th Vehicular Technology Conference (VTC Spring). Seoul: IEEE; 2014. pp. 1-5. DOI: 10.1109/VTCSpring.2014.7023145
  29. Parvez I, Rahmati A, Guvenc I, Sarwat AI, Dai H. A survey on low latency towards 5G: RAN, core network and caching solutions. IEEE Communications Surveys & Tutorials. 2018;20(4):3098-3130
  30. Lopes Ferreira M, Canas Ferreira J. An FPGA-oriented baseband modulator architecture for 4G/5G communication scenarios. Electronics. 2019;8(1):1-19
  31. Bhushan N, Ji T, Koymen O, Smee J, Soriaga J, Subramanian S, et al. Industry perspective—5G air interface system design principles. IEEE Wireless Communications. 2017;24(5):6-8
  32. Yuan G, Zhang X, Wang W, Yang Y. Carrier aggregation for LTE-advanced mobile communication systems. IEEE Communications Magazine. 2010;48(2):88-93
  33. Kaltenberger F, Knopp R, Vitiello C, Danneberg M, Festag A. Experimental analysis of 5G candidate waveforms and their coexistence with 4G systems. 2015. Available from: http://www.eurecom.fr/fr/publication/4725/download/cm-publi-4725.pdf. Unpublished material provided by EURECOM
  34. UG909 - Vivado Design Suite User Guide: Partial Reconfiguration; 2015
  35. Claus C, Ahmed R, Altenried F, Stechele W. Towards rapid dynamic partial reconfiguration in video-based driver assistance systems. In: Sirisuk P, Morgan F, El-Ghazawi T, Amano H, editors. Applied Reconfigurable Computing: Architectures, Tools and Applications. Springer: Berlin Heidelberg; 2010. pp. 55-67
  36. Tatsukawa J. XAPP888 - MMCM and PLL Dynamic Reconfiguration; V1.7. Xilinx Inc.; April 2017
  37. Vipin K, Fahmy SA. FPGA dynamic and partial reconfiguration: A survey of architectures, methods, and applications. ACM Computing Surveys. 2018;51(4):1-39
  38. ITU-R. Minimum Requirements Related to Technical Performance for IMT-2020 Radio Interface(s). ITU-R; 2017. M.2410-0. Available from: https://www.itu.int/pub/R-REP-M.2410-2017
