Parameter values supported by the OFDM datapath.
The fifth-generation (5G) revolution represents more than a mere performance enhancement of previous generations: it will deeply transform the way humans and/or machines interact, enabling a heterogeneous expansion in the number of use cases and services. Crucial to the realization of this revolution is the design of hardware components characterized by high degrees of flexibility, versatility and resource/power efficiency. This chapter proposes a field-programmable gate array (FPGA)-oriented baseband processing architecture suitable for fast-changing communication environments such as 4G/5G waveform coexistence, noncontiguous carrier aggregation (CA) or centralized cloud radio access network (C-RAN) processing. The proposed architecture supports three 5G waveform candidates and is shown to be upgradable, resource-efficient and cost-effective. Through hardware virtualization, enabled by dynamic partial reconfiguration (DPR), the design space exploration of our architecture exceeds the hardware resources available on the Zynq xc7z020 device. Moreover, dynamic frequency scaling (DFS) enables the runtime adjustment of processing throughput and power reductions by up to 88%. The combined resource overhead for DPR and DFS is very low, and the reconfiguration latency stays two orders of magnitude below the control plane latency requirements proposed for 5G communications.
- reconfigurable computing
- dynamic partial reconfiguration
- baseband processing
- waveform coexistence
- carrier aggregation
The fifth-generation (5G) cellular network technology will have a tremendous impact on society by optimizing existing telecommunication services and applications and enabling solutions in new application fields, such as transportation, education or medical science. The scope of the anticipated changes is clear from the three main types of 5G use cases and services defined by the International Telecommunication Union (ITU): enhanced mobile broadband (eMBB), ultra-reliable and low-latency communications (URLLC) and massive machine-type communications (mMTC) . Therefore, the handling of the physical layer (PHY) for 5G systems will be far more complex than in the current generation.
The expansion of wireless communication caused by 5G systems and services raises concerns about the inefficient use of the electromagnetic spectrum. In addition, to expand spectrum utilization to frequency bands above 6 GHz, a more efficient spectral utilization of heavily used bands must be achieved. To tackle this issue, future baseband processor designs should support
In summary, baseband processing infrastructures for 5G systems must be (1)
After a brief summary of the state of the art in Section 2, the implementation of datapaths for baseband processing of three waveforms (OFDM, FBMC and UFMC) is described in Section 3. The implementation of a dynamically reconfigurable baseband modulator that combines these datapaths is described in Section 4, together with a discussion of the results. Some final remarks are presented in Section 5.
2. Summary of the state of the art
Application of DPR to baseband processing in wireless communications started with the adoption of small-scale and relatively simple functional elements such as FIR filters, constellation mappers or channel encoders [9, 10]. Possibly the first multi-waveform flexible PHY architecture was proposed by He et al. . It is a software-defined radio (SDR) architecture implemented on a Xilinx Virtex-5 FPGA, which combines two reconfiguration techniques: (a) DPR to dynamically change the baseband processing mode of operation (e.g. FFT size, modulation scheme and CP length) and (b) DFS to adapt the clock frequency of the digital up-converter and the baseband processor. The design supports two waveforms (OFDM and WCDMA) and several 3G/4G standards and modes of operation. Compared with a static multimode design, the DPR-based design achieves a reduction in the number of used slices, DSP blocks (DSPs) and block RAMs (BRAMs). However, the comparison is not accurate as the static design uses parallel and independent processing chains for each standard, ignoring potential optimization from the reutilization of common modules.
CoPR, an automated framework for DPR-based adaptive systems on a Xilinx Zynq device, is described in . An illustrative case study is presented, where a reconfigurable multistandard baseband OFDM transmitter is designed. The design supports three standards (IEEE 802.11, IEEE 802.16 and IEEE 802.22) and contains two reconfigurable partitions (RPs): one to implement the digital modulation scheme and the other for the OFDM processing datapath. The paper only reports reconfiguration time results and does not provide figures for power consumption or the amount of resources of each RP.
An ARM-FPGA-based platform is also used in . Several processes run on the ARM processor and retrieve communication environment information, which is employed by a configuration controller to reconfigure an OFDM baseband processing modulator. An OFDM transmitter supporting Wi-Fi and WiMAX is implemented on Zynq’s programmable logic (PL) with four RPs used for scrambling, interleaving, FEC encoding and IFFT. Results for resource utilization and DPR latency are discussed, together with power consumption measurements. However, the sampling period used for the measurements is of the order of magnitude of the reconfiguration times (milliseconds) and, therefore, not suitable for accurate real-time measurements at the time scale of interest.
The Zynq is also used in , which presents a HW/SW codesign for CR systems combining parameter reconfiguration and DPR. Only DPR latency and RP resources are reported.
Pham et al.  present a reconfigurable multistandard OFDM transceiver supporting IEEE 802.11, IEEE 802.16 and IEEE 802.22 on a Xilinx Virtex-6 FPGA. The modulator uses a single RP, whereas the demodulator explores a mixture of DPR-based and static multimode modules. The authors put more weight on the whole architecture and only provide data about reconfiguration times and bitstream size.
All works mentioned target 3G/4G standards and waveforms. From a system perspective, they focus primarily on the enhanced flexibility DPR can offer, with less attention paid to the global impact of this technique on the design of the hardware infrastructure. Additionally, no architecture with multiple and independent processors suitable for noncontiguous spectrum aggregation is studied.
3. Implementation of datapaths for baseband processing
This section describes the implementation of pipelined datapaths for three different waveforms (OFDM, FBMC and UFMC) and respective variants. Each variant is defined by the values assigned to the parameters of the design. The possible sets of values are sometimes called “numerologies” in the literature. In this case study, two sets of parameter values for each waveform are considered, as described in the remainder of this section and summarized in Tables 1–3.
|Parameter||Mode 1||Mode 2|
|# subcarriers, ||512||1024|
|length of cyclic prefix, ||40 (1st slot symbol)||80 (1st slot symbol)|
|36 (other symbols)||72 (other symbols)|
|# WOLA samples, ||4||6|
|Parameter||Mode 1||Mode 2|
|# subcarriers, Nc||512||1024|
|Overlapping factor, ||4||4|
|IFFT size, K Nc||2048||4096|
|Parameter||Mode 1||Mode 2|
|# subcarriers, ||512||1024|
|# subcarriers per PRB||12||12|
|# active PRBs||3||3|
|IFFT size, ||64||64|
|Upsampling factor, ||8||16|
|Filter length, ||37||73|
|Filter type||Dolph-Chebyshev (60-dB side lobe attenuation)|
3.1 Baseband datapath for OFDM
OFDM is the main reference among multicarrier modulation waveforms; it is used in a wide range of standards such as DSL, DVB-T, DVB-C, Wi-Fi (IEEE 802.11), WiMAX (IEEE 802.16) and 3GPP LTE. The conceptual structure of an OFDM modulator is illustrated in Figure 1.
With exception of the inverse FFT (IFFT), the tasks required for a modulator involve only simple arithmetic, data selection and reordering. The first module is the QAM mapper. For a general
After digital modulation, the
The IFFT module implements a Cooley-Tukey Mixed-Radix algorithm using a pipelined single-delay feedback architecture as in . The IFFT module has several processing stages which are comprised of shift registers, ROM memories, complex multipliers and arithmetic blocks (called “butterflies”). Information on the internal structures of Radix-22 and Radix-2 butterflies can be found in [17, 18]. Apart from processing elements, the IFFT module also includes blocks for input data reordering and bit-reversed reordering of intermediate results, which are performed with RAM-based double buffers.
The IFFT module produces time-domain OFDM symbols. The next module in the datapath is responsible for
The final module performs the
3.2 Baseband datapath for FBMC
The conceptual structure of the FBMC baseband modulator implemented for this work is shown in Figure 2. The
The following datapath modules are mainly characterized by parameters and . The
The frequency spreading operation comprises
The final operation is to
3.3 Baseband datapath for UFMC
UFMC, sometimes called Universal Filtered OFDM (UF-OFDM), is an OFDM-based waveform that attempts to reduce OOB emissions by time-domain filtering. The
The classic UFMC modulation scheme  uses an
The UFMC modulator of this work has three processing branches, one to process each active PRB (B = 3). These branches share the same architecture and start with QAM mapping of the incoming data. The QAM mapper is equal to the one used in the OFDM and FBMC datapaths. The
UFMC performs well for short-packet lengths and sporadic burst transmission [28, 29]. Moreover, the parallel sub-band processing in UFMC requires an IFFT core per branch. Therefore, instead of the high-performance pipelined IFFT architectures adopted for the OFDM and FBMC datapaths, low-resource memory-based FFT architectures are adopted in the UFMC modulator. The memory-based architecture adopted here is detailed in . The
Dolph-Chebyshev FIR filters with a transpose structure are used for bandpass sub-band filtering. Again, the FIR coefficients are symmetric: there are an odd number of symmetric coefficients, and the centre coefficient is equal to one. However, the higher FIR order used in UFMC modulation requires further discussion. Considering an
In Xilinx FPGAs, non-trivial multiplications can be efficiently performed by DSP blocks. These blocks are embedded into the logic fabric in a column arrangement. Cost-optimized devices have a smaller amount of DSP blocks, and their utilization should be carefully considered. For instance, the xc7z020 device has 220 DSP blocks. Considering the modes of operation from Table 4, the overall amount of non-trivial multiplications for FIR filtering () is 108 for mode 1 and 216 for mode 2. This represents a high DSP utilization, and the sparse distribution of these types of blocks throughout the logic fabric degrades the scalability of the UFMC modulator. In addition, placement and routing of the design is more difficult and likely to affect the overall timing closure. To reduce DSP utilization, a multiplier-less architecture for FIR filters was adopted. The FIR coefficients are represented in Q1.5 format, using the Canonic Signed Digit (CSD) system with minimum non-zero bits. Then, non-trivial multiplications are substituted by shifters and adders. For example, the multiplication by 0.90625 can be implemented as:
This strategy eliminates the use of DSP blocks in FIR filters, but increases slice utilization. However, slices are the most numerous type of resource (13,300 slices in the xc7z020 device), making this approach well-suited for the present application.
After FIR filtering, each sub-band signal is shifted to the corresponding frequency band. The
4. A dynamically reconfigurable baseband modulator for 5G communication
After the preceding overview of the architecture of high-performance baseband engines for three different waveforms, this section presents the architecture of a baseband processing engine that is
The baseband architecture presented in this chapter features three independent modulators, whose functionality and clock frequency can be dynamically reconfigured through DPR and DFS, respectively. This setup enables the processing of multiple component carriers with different waveforms and/or baseband parameters in noncontiguous CA schemes.
A prototype of the multidimensional baseband modulator was implemented on an Avnet Zedboard equipped with a Zynq xc7z020 device. The system top level combines features from the designs of the previous section and can be divided into three parts. The Zedboard’s 512 MB DDR memory is used as a repository for the partial bitstreams used for DPR. The Zynq’s ARM CPU act as the system management unit: it is responsible for triggering the reconfiguration of the multidimensional baseband modulator and setting up data transfers between the DDR memory and the modulators implemented in the programmable logic together with the infrastructure for DPR and DFS. Figure 4 shows the top-level architecture.
The proposed architecture targets the communication scenario described in , which combines
During system initialization, the ARM CPU manages the downloading of partial bitstreams and input data files from an SD card to the DDR memory. For the purpose of validating the baseband engines, the input data is retrieved from the DDR and sent to the baseband modulator(s), and the results are stored back to the DDR (and used for validating the implementation). Each RP has an associated DMA controller to accelerate the access to the DDR.
To achieve the specialization of computation at runtime, the configuration interface adopted is the ICAP. This high-bandwidth internal interface permits the FPGA to reconfigure itself. Xilinx sets the maximum ICAP bandwidth at 400, for a 100 clock frequency and 32-bit data width . Nevertheless, the ICAP can be overclocked to further enhance the reconfiguration throughput . In the present work, the ICAP is overclocked at 200 MHz. To take advantage of ICAP overclocking, a dedicated DMA controller is used to accelerate the transfer of partial bitstreams to the ICAP.
The implementation of DFS follows the reference design from . This design considers an FSM that reads configuration parameters from a ROM and writes them to the clock management module available in the FPGA. To change the frequency of the output clocks, the input signal
A general overview of the resource utilization of the prototype is presented in Table 5. The static part occupies around 32 and 5% of the slices and BRAMs, respectively. Apart from PS-PL interconnect cores and DMA controllers to accelerate the baseband modulators, the static part also implements the infrastructure for reconfiguration (DPR and DFS). The hardware required to implement DPR and DFS is below 2% of the available LUTs, FFs and BRAMs. The three RPs form the system’s reconfigurable part and occupy 52.6, 64.3 and 72.7% of the available slices, BRAMs and DSPs, respectively. Overall, the resource utilization for the complete system implementation represents a considerable share of the resources available in the xc7z020 device: 84.3% of slices, 69.7% of BRAMs and 72.7% of DSPs.
|Resource||Available||Static part (total)||Reconfig. overhead||RP1||RP2||RP3||All RPs|
The resource utilization of each modulator is presented in Table 6. The results lead to a key observation: the hardware virtualization achieved with the 7000 slices, 90 BRAMs and 160 DSPs reserved by the three RPs allows the implementation of six baseband modulators, which would need 11,322 slices, 99.5 BRAMs and 106 DSPs in total. Adding these
|Resource||Mode 1||Mode 2|
Considering the modes of operation shown in Tables 1–3, and that all 3 RPs are in use, the proposed design supports 32 combinations of baseband modulators:
During the DPR design with the Xilinx Vivado EDA tool, the different system configurations are created from a design checkpoint that saves the floorplanning and routing of the system’s static part, leaving the RPs as empty
The dynamic power consumption for each modulator datapath and baseband clock frequency was estimated with the power analysis tool from Vivado 2015.2. The high-confidence estimates were performed using placed and routed netlists and accurate node activity files. The results are presented in Table 7. The UFMC modulator modes have a higher dynamic power consumption compared to FBMC and OFDM. This is mainly due to the higher resource usage and node activity of UFMC datapaths. The clock frequency adaptation allowed by DFS results in power savings that tend to be more evident for the most resource-demanding modes of operation (UFMC and FBMC). Compared to a design with baseband clock frequency fixed at 100 MHz, the clock frequency adaptation to:
66.7 MHz results in dynamic power savings between 39 mW (35% reduction in OFDM mode 1) and 82 mW (51% reduction in FBMC mode 2)
33.3 MHz results in dynamic power savings between 79 mW (70% reduction in OFDM mode 1) and 156 mW (67% reduction in UFMC mode 2)
16.7 MHz results in dynamic power savings between 99 mW (88% reduction in OFDM mode 1) and 194 mW (83% reduction in UFMC mode 2)
|Mode 1||Mode 2|
For the set of baseband clock frequencies defined, the DFS procedure took on average 47 μs to modify the clock frequency, a latency which is acceptable in 5G NR communications.
In the multidimensional baseband modulator, the area and amount of RP resources are higher than in the individual designs, resulting in larger bitstream sizes. However, the reconfiguration speed was increased through ICAP overclocking. Table 8 quantifies the DPR latency and compressed bitstream size for the worst-case scenario in each RP. The largest RP () takes up to 767 μs to be reconfigured, corresponding to the transfer of a 596 kB bitstream to the ICAP. In all DPR latency measurements, the reconfiguration throughput was at least 790 MB/s. This value is about 99% of the theoretical ICAP throughput, considering 32-bit transfers and overclocking at 200 MHz. In general, the DPR latency for each individual RP is below 1 ms, while the overall reconfiguration of the three RPs takes less than 2 ms. These latency values are within an acceptable range considering the control plane requirements from .
|DPR latency||400 μs||677 μs||767 μs|
|Partial bitstream size||309 kB||526 kB||596 kB|
The ITU report  states that in critical, ultralow-latency scenarios, a
This chapter presents a reconfigurable, multidimensional baseband modulator architecture suitable for multimode, multiple waveform coexistence and dynamic spectrum aggregation scenarios. The design combines the runtime specialization of computation and performance. By featuring three independent and reconfigurable baseband modulators, the architecture allows the processing of up to three component carriers using different waveforms (OFDM, FBMC and UFMC) and/or numerologies. The total reconfigurable area of the system covers more than half the available xc7z020 resources; the ICAP overclocking contributes to maintain the DPR latency low enough for the analyzed scenarios. In this design, the performance specialization through DFS resulted in dynamic power savings of up to 194 mW. Besides flexibility, scalability and forward compatibility,
This work was financed by the ERDF (European Regional Development Fund) through the Operational Programme for Competitiveness and Internationalization (COMPETE) 2020 Programme within Project POCI-01-0145-FEDER-006961 and by the National Fund through a Ph.D. Grant (PD/BD/105860/2014) from the FCT (Fundação para a Ciência e a Tecnologia) (Portuguese Foundation for Science and Technology).