Flexible Baseband Modulator Architecture for Multi-Waveform 5G Communications

The fifth-generation (5G) revolution represents more than a mere performance enhancement of previous generations: it will deeply transform the way humans and/or machines interact, enabling a heterogeneous expansion in the number of use cases and services. Crucial to the realization of this revolution is the design of hardware components characterized by high degrees of flexibility, versatility and resource/power efficiency. This chapter proposes a field-programmable gate array (FPGA)-oriented baseband processing architecture suitable for fast-changing communication environments such as 4G/5G waveform coexistence, noncontiguous carrier aggregation (CA) or centralized cloud radio access network (C-RAN) processing. The proposed architecture supports three 5G waveform candidates and is shown to be upgradable, resource-efficient and cost-effective. Through hardware virtualization, enabled by dynamic partial reconfiguration (DPR), the design space exploration of our architecture exceeds the hardware resources available on the Zynq xc7z020 device. Moreover, dynamic frequency scaling (DFS) enables the runtime adjustment of processing throughput and power reductions by up to 88%. The combined resource overhead for DPR and DFS is very low, and the reconfiguration latency stays two orders of magnitude below the control plane latency requirements proposed for 5G communications.


Introduction
The fifth-generation (5G) cellular network technology will have a tremendous impact on society by optimizing existing telecommunication services and applications and enabling solutions in new application fields, such as transportation, education or medical science. The scope of the anticipated changes is clear from the three main types of 5G use cases and services defined by the International Telecommunication Union (ITU): enhanced mobile broadband (eMBB), ultrareliable and low-latency communications (URLLC) and massive machine-type communications (mMTC) [1]. Therefore, the handling of the physical layer (PHY) for 5G systems will be far more complex than in the current generation.
Orthogonal frequency-division multiplexing (OFDM) is the preferred waveform in 4G standards, and 3GPP Release 15 [2] recently defined it as the multiple access scheme for the 5G New Radio (NR) PHY, especially due to its high frequency selectivity, flexibility, efficient hardware implementation by FFT/IFFT modules, and good multiple-input multiple-output (MIMO) compatibility [3]. However, the spectrum of OFDM symbols presents large side lobes that cause high out-of-band (OOB) emissions. Moreover, the interference between adjacent time-domain symbols is mitigated by adding redundancy to each symbol, which reduces spectral efficiency. Together, these characteristics may make 5G requirements in certain communication scenarios hard to achieve, which has led to the proposal of other waveforms [4]. The most popular ones are filter bank multicarrier modulation (FBMC), universal filtered multicarrier modulation (UFMC), filtered OFDM (f-OFDM) and generalized frequency-division multiplexing (GFDM). Different waveforms imply different baseband processing operations. Especially for sub-6 GHz spectrum bands, the coexistence of multiple numerologies and waveforms and the close interworking between 5G and current systems are likely to occur in the near future [5].
The expansion of wireless communication caused by 5G systems and services raises concerns about the inefficient use of the electromagnetic spectrum. In addition to expanding spectrum utilization to frequency bands above 6 GHz, a more efficient spectral utilization of heavily used bands must be achieved. To tackle this issue, future baseband processor designs should support dynamic spectrum access (DSA) [6] and carrier aggregation (CA) schemes.
In summary, baseband processing infrastructures for 5G systems must be (1) flexible, to adapt their operation for different communication setups (i.e. waveforms and their parameterization); (2) scalable, to tune performance and capacity according to communication demands; (3) resource and power efficient, for cost-effectiveness and reduced environmental impact [7]; and (4) forward compatible, to easily integrate the support for new services and requirements, extending system lifetime. Modern field-programmable gate arrays (FPGAs) represent an implementation platform that favors the design of systems with the characteristics mentioned. The intrinsic FPGA reconfigurability can be enhanced by means of dynamic partial reconfiguration (DPR), i.e. by reconfiguring modules of the design without halting the system. The hardware virtualization allowed by DPR enhances system flexibility, feature wealth, upgradability and cost-effectiveness [8]. This chapter discusses how DPR and dynamic frequency scaling (DFS) can be combined to produce a dynamically reconfigurable baseband processing architecture for multimode, multi-waveform coexistence and dynamic spectrum aggregation.
After a brief summary of the state of the art in Section 2, the implementation of datapaths for baseband processing of three waveforms (OFDM, FBMC and UFMC) is described in Section 3. The implementation of a dynamically reconfigurable baseband modulator that combines these datapaths is described in Section 4, together with a discussion of the results. Some final remarks are presented in Section 5.

Summary of the state of the art
Application of DPR to baseband processing in wireless communications started with the adoption of small-scale and relatively simple functional elements such as FIR filters, constellation mappers or channel encoders [9,10]. Possibly the first multi-waveform flexible PHY architecture was proposed by He et al. [11]. It is a software-defined radio (SDR) architecture implemented on a Xilinx Virtex-5 FPGA, which combines two reconfiguration techniques: (a) DPR to dynamically change the baseband processing mode of operation (e.g. FFT size, modulation scheme and CP length) and (b) DFS to adapt the clock frequency of the digital up-converter and the baseband processor. The design supports two waveforms (OFDM and WCDMA) and several 3G/4G standards and modes of operation. Compared with a static multimode design, the DPR-based design achieves a reduction in the number of used slices, DSP blocks (DSPs) and block RAMs (BRAMs). However, the comparison is not accurate as the static design uses parallel and independent processing chains for each standard, ignoring potential optimization from the reutilization of common modules.
CoPR, an automated framework for DPR-based adaptive systems on a Xilinx Zynq device, is described in [12]. An illustrative case study is presented, where a reconfigurable multistandard baseband OFDM transmitter is designed. The design supports three standards (IEEE 802.11, IEEE 802.16 and IEEE 802.22) and contains two reconfigurable partitions (RPs): one to implement the digital modulation scheme and the other for the OFDM processing datapath. The paper only reports reconfiguration time results and does not provide figures for power consumption or the amount of resources of each RP.
An ARM-FPGA-based platform is also used in [13]. Several processes run on the ARM processor and retrieve communication environment information, which is employed by a configuration controller to reconfigure an OFDM baseband processing modulator. An OFDM transmitter supporting Wi-Fi and WiMAX is implemented on Zynq's programmable logic (PL) with four RPs used for scrambling, interleaving, FEC encoding and IFFT. Results for resource utilization and DPR latency are discussed, together with power consumption measurements. However, the sampling period used for the measurements is of the order of magnitude of the reconfiguration times (milliseconds) and, therefore, not suitable for accurate real-time measurements at the time scale of interest.
The Zynq is also used in [14], which presents a HW/SW codesign for cognitive radio (CR) systems combining parameter reconfiguration and DPR. Only DPR latency and RP resources are reported.
Pham et al. [15] present a reconfigurable multistandard OFDM transceiver supporting IEEE 802.11, IEEE 802.16 and IEEE 802.22 on a Xilinx Virtex-6 FPGA. The modulator uses a single RP, whereas the demodulator explores a mixture of DPR-based and static multimode modules. The authors put more weight on the whole architecture and only provide data about reconfiguration times and bitstream size.
All works mentioned target 3G/4G standards and waveforms. From a system perspective, they focus primarily on the enhanced flexibility DPR can offer, with less attention paid to the global impact of this technique on the design of the hardware infrastructure. Additionally, no architecture with multiple and independent processors suitable for noncontiguous spectrum aggregation is studied.

Implementation of datapaths for baseband processing
This section describes the implementation of pipelined datapaths for three different waveforms (OFDM, FBMC and UFMC) and respective variants. Each variant is defined by the values assigned to the parameters of the design. The possible sets of values are sometimes called "numerologies" in the literature. In this case study, two sets of parameter values for each waveform are considered, as described in the remainder of this section and summarized in Tables 1-3.

Baseband datapath for OFDM
OFDM is the main reference among multicarrier modulation waveforms; it is used in a wide range of standards such as DSL, DVB-T, DVB-C, Wi-Fi (IEEE 802.11), WiMAX (IEEE 802.16) and 3GPP LTE. The conceptual structure of an OFDM modulator is illustrated in Figure 1.
Table 3. Parameter values supported by the UFMC datapath (excerpt): filter length L, 37 or 73; filter type, Dolph-Chebyshev (60-dB side lobe attenuation).
With the exception of the inverse FFT (IFFT), the tasks required for a modulator involve only simple arithmetic, data selection and reordering. The first module is the QAM mapper. For a general M-ary QAM case, the module is simply implemented with an M:1 multiplexer: a log2(M)-bit input signal selects a complex value out of the M prestored constants that form the constellation. In the implementation used for this work, Gray mapping and average power normalization are considered in the definition of the constellation point values.
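As a rough software illustration of the lookup-table view of the QAM mapper, the sketch below builds a Gray-mapped, unit-average-power 16-QAM table. The choice of M = 16, the bit-to-level assignment and all function names are assumptions made for illustration; the chapter does not specify them.

```python
import math

# Assumed per-axis Gray code: adjacent amplitude levels differ in exactly one bit.
GRAY_2BIT_TO_LEVEL = {0b00: -3, 0b01: -1, 0b11: 1, 0b10: 3}


def build_16qam_lut():
    """Return the 16 prestored constellation points, indexed by the 4-bit input word.

    The upper two bits select the in-phase level, the lower two bits the quadrature level.
    """
    # The mean of |x|^2 over the unscaled grid {-3,-1,1,3}^2 is 10.
    scale = 1.0 / math.sqrt(10.0)
    lut = []
    for word in range(16):
        i_level = GRAY_2BIT_TO_LEVEL[word >> 2]
        q_level = GRAY_2BIT_TO_LEVEL[word & 0b11]
        lut.append(complex(i_level, q_level) * scale)
    return lut


def qam_map(words, lut):
    """Behave like an M:1 multiplexer: each input word selects one stored constant."""
    return [lut[w] for w in words]


LUT_16QAM = build_16qam_lut()
```

In hardware the table would simply be a small ROM or a multiplexer over constants; the normalization is folded into the stored values, so no run-time scaling is needed.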
After digital modulation, the subcarrier mapping module is responsible for mapping the A input active subcarriers to the central bins of an N-element array and zeroing the centre bin (the DC null subcarrier). The remaining N − A − 1 bins correspond to null subcarriers that serve as guard bands. As the IFFT DC bin is at index 0, an IFFT shift operation is performed on the N-element array. The resulting vector is then fed to the IFFT core. Here, it is assumed that the A active subcarriers include both data and pilot subcarriers and that the higher levels of the communication system provide them in their correct relative locations. The main modules required for subcarrier mapping are a double buffer and a control unit. The double buffer is implemented with a dual-port RAM, with each half storing N complex samples. This allows for simultaneous reading and writing of consecutive A-element arrays without any data conflicts: while one buffer is used for input (writing), the other is used for output (reading). The read/write access to the double buffer is managed by a control unit that receives the data and maps it to the correct IFFT input bin. The index mapping scheme implemented by the control unit combines subcarrier mapping and IFFT shifting.
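The index mapping described above can be sketched in software as follows. This is only a behavioural model under stated assumptions: the function name is hypothetical, the active subcarriers are assumed to be split evenly on both sides of the DC bin, and the double buffer and control unit of the hardware are not modelled.

```python
def map_subcarriers(active, n_fft):
    """Place the active subcarriers around a nulled DC bin, then apply an IFFT shift.

    `active` holds A samples (A even, A < n_fft); the result has n_fft entries
    with the DC bin at index 0, as expected by an IFFT core.
    """
    a = len(active)
    if a % 2 or a >= n_fft:
        raise ValueError("expects an even number of active subcarriers, fewer than n_fft")
    centred = [0] * n_fft          # guard bins and DC bin stay at zero
    dc = n_fft // 2                # centre bin of the array
    half = a // 2
    centred[dc - half:dc] = active[:half]          # subcarriers below DC
    centred[dc + 1:dc + 1 + half] = active[half:]  # subcarriers above DC
    # IFFT shift: rotate the array so that the centre (DC) bin moves to index 0.
    return centred[dc:] + centred[:dc]
```

For example, with N = 8 and four active subcarriers, the centred array `[0, 0, a0, a1, 0, a2, a3, 0]` becomes `[0, a2, a3, 0, 0, 0, a0, a1]` after the shift. A hardware control unit can produce the same result by computing the shifted write address directly, which is how the two steps can be combined.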
The IFFT module implements a Cooley-Tukey mixed-radix algorithm using a pipelined single-delay feedback architecture as in [16]. The IFFT module has several processing stages, which are composed of shift registers, ROM memories, complex multipliers and arithmetic blocks (called "butterflies"). Information on the internal structures of Radix-2² and Radix-2 butterflies can be found in [17,18]. Apart from processing elements, the IFFT module also includes blocks for input data reordering and bit-reversed reordering of intermediate results, which are performed with RAM-based double buffers.
The IFFT module produces time-domain OFDM symbols. The next module in the datapath is responsible for cyclic prefix (CP) insertion. It receives a data array of size N (corresponding to a time-domain OFDM symbol) and stores it in memory. Then, the module starts to read and output the last L_CP + W memory positions. The cyclic prefix extension by W samples allows for the following weighted overlap and add (WOLA) operation. After outputting the last L_CP + W memory positions, the CP insertion unit continues by reading and outputting the complete OFDM symbol from the beginning. Thus, the output of this module is an extended OFDM symbol with N + L_CP + W complex samples. Its main hardware elements are an N-element dual-port RAM and a unit for controlling write/read memory operations.
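The read-out order of the CP insertion unit can be mirrored by a few lines of Python. The function and parameter names are illustrative assumptions; the sketch models only the sample ordering, not the dual-port RAM or its control logic.

```python
def insert_cyclic_prefix(symbol, l_cp, w):
    """Return the extended symbol: the last (l_cp + w) samples, then the full symbol."""
    n = len(symbol)
    ext = l_cp + w
    if not 0 <= ext <= n:
        raise ValueError("prefix extension must be between 0 and the symbol length")
    prefix = list(symbol[n - ext:])   # tail of the stored symbol is read out first
    return prefix + list(symbol)      # then the complete symbol from the beginning
```

The output length is N + L_CP + W, which matches the size of the extended symbol stated in the text.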
The final module performs the WOLA operation. It can be divided into two stages: first, OFDM symbols are multiplied by a window (windowing), and then each symbol's tail is overlapped and added with the next symbol's head (overlap-and-add). The windowing operation is implemented using two multipliers (to handle the real and imaginary parts) and a ROM with prestored non-unitary raised-cosine window coefficients. In turn, the overlap-and-add operation is implemented with a finite-state machine (FSM) and arrays of registers to temporarily store each symbol's head and tail.

Baseband datapath for FBMC
The conceptual structure of the FBMC baseband modulator implemented for this work is shown in Figure 2. The OQAM mapper consists of two stages: first, the incoming data is QAM-modulated; then, the resulting in-phase and quadrature components are decoupled and alternately transmitted on successive subcarriers and on successive transmitted symbols [19]. For instance, if a symbol carries the in-phase (I) and quadrature (Q) components with the pattern I, Q, I, Q, ..., the next symbol will use the pattern Q, I, Q, I, .... The QAM mapper is implemented as for the OFDM modulator. The I/Q decoupling is efficiently performed with an FSM that alternately stores or outputs the I/Q components of a QAM symbol.
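The alternating I/Q pattern can be illustrated with a small behavioural sketch. It only models which component of each QAM symbol goes to which of the two successive OQAM symbols; the function name is hypothetical, and any phase rotation factors applied in a complete OQAM scheme are deliberately left out.

```python
def oqam_decouple(qam_symbols):
    """Split one row of complex QAM symbols into two real-valued OQAM symbols.

    The first output follows the pattern I, Q, I, Q, ... across subcarriers;
    the second output carries the complementary components (Q, I, Q, I, ...).
    Each entry is a (label, value) pair so the pattern is easy to inspect.
    """
    first, second = [], []
    for k, c in enumerate(qam_symbols):
        if k % 2 == 0:
            first.append(("I", c.real))
            second.append(("Q", c.imag))
        else:
            first.append(("Q", c.imag))
            second.append(("I", c.real))
    return first, second
```

Every QAM symbol contributes exactly one component to each output, which is why an FSM that stores one component while emitting the other is sufficient in hardware.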
The following datapath modules are mainly characterized by parameters K and N_c. The guard band insertion module places the OQAM symbols in the central bins of an N_c-element array. The remaining subcarriers are zero and represent frequency guard bands. The operation of this module is similar to subcarrier mapping in OFDM modulation, except that no DC null component is inserted.
The frequency spreading operation comprises upsampling by K and FIR filtering. The upsampler outputs K − 1 zero values between two incoming I/Q samples. It uses registers to store the input data and a counter to control the number of zero values at the output. For pulse shaping, a FIR filter architecture with a transpose structure was adopted because, unlike the direct FIR model, it does not require an extra input shift register, nor a tree of pipelined adders to achieve high throughput. The number of filter coefficients is odd (2 × K − 1), and their values are symmetric with a single centre coefficient equal to one (Table 4). The multiplications by the centre coefficient can be ignored, as they do not affect the input value. However, the remaining coefficients imply non-trivial multiplications. The number of non-trivial multiplications per FIR filter can be halved by exploiting the symmetry of the coefficients. As the sub-band signal is complex-valued, two FIR filters are required to separately filter the real and imaginary parts. The IFFT modules are the same as those used in the OFDM modulator.
The final operation is to overlap-and-add consecutive IFFT output stream blocks delayed by N_c/2 samples [19]. This operation uses an array of 2 × K × N_c elements as temporary storage; the first half stores the current FBMC symbol, and the second half accumulates IFFT output blocks. For each IFFT output block, the whole array is shifted by N_c/2 positions and then the IFFT output block is added to the second half of the array. A direct mapping of this approach to a hardware implementation would require the use of replicated memory structures to perform two read operations per clock cycle on the temporary array [21]. Instead, the overlap-and-add (OAA) module used was inspired by the architecture used in [22]. The main difference is that OQAM is not employed in [22] and, for the overlap-and-add operation, the consecutive IFFT output block streams are delayed by N_c. To continuously accumulate consecutive IFFT output blocks delayed by N_c/2, a feedback shift register of (2 × K − 1) × N_c/2 samples is used to align the previous IFFT block with the incoming IFFT block.
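A generic software reference for overlapping and adding blocks with a fixed delay is sketched below. It is not the feedback shift register architecture of the hardware module, only a plain model of the arithmetic it has to reproduce; the function name and the list-based representation are assumptions. For the FBMC datapath, the hop would correspond to N_c/2.

```python
def overlap_and_add(blocks, hop):
    """Sum equally sized blocks, each delayed by `hop` samples relative to the previous one."""
    if not blocks:
        return []
    block_len = len(blocks[0])
    out = [0] * (hop * (len(blocks) - 1) + block_len)
    for n, block in enumerate(blocks):
        if len(block) != block_len:
            raise ValueError("all blocks must have the same length")
        for i, sample in enumerate(block):
            out[n * hop + i] += sample   # accumulate into the delayed position
    return out
```

With three all-ones blocks of length 4 and a hop of 2, the regions where two blocks overlap accumulate to 2 while the leading and trailing edges stay at 1.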

Baseband datapath for UFMC
UFMC, sometimes called Universal Filtered OFDM (UF-OFDM), is an OFDM-based waveform that attempts to reduce OOB emissions by time-domain filtering. The N subcarriers of each symbol are divided into B physical resource blocks (PRBs) of N/B subcarriers each. Usually, only part of the PRBs is used for transmission (active PRBs). For each active PRB, IFFT and bandpass L-order FIR filtering are performed. Instead of the CP, a zero-valued guard interval with length L is inserted after the IFFT. Frequency-shifted versions of the FIR filter are applied to all active PRBs, and, finally, the filtered sub-bands are superimposed to form a UFMC multicarrier symbol. Chebyshev filters are normally used for bandpass filtering in UFMC [23-25].
The classic UFMC modulation scheme [26] uses an N-point IFFT and FIR filters with complex coefficients for each active sub-band. To reduce this increased complexity, Knopp et al. [27] combine a smaller N′-point IFFT with N/N′ upsampling. Moreover, the same real-coefficient FIR filter is used in all sub-bands, followed by frequency shifters implemented as multiplications by a complex exponential. Figure 3 illustrates the datapath structure for the UFMC modulator considered in this work.
The UFMC modulator of this work has three processing branches, one to process each active PRB (B = 3). These branches share the same architecture and start with QAM mapping of the incoming data. The QAM mapper is equal to the one used in the OFDM and FBMC datapaths. The subcarrier mapping module maps the 12 PRB subcarriers to the central bins of an array with N′ (64) elements and zeroes the remaining N′ − 12 elements. It follows the same approach as subcarrier mapping in the FBMC modulator: a double buffer of 2 × N′ elements and read/write control engines. UFMC performs well for short packet lengths and sporadic burst transmission [28,29]. Moreover, the parallel sub-band processing in UFMC requires an IFFT core per branch. Therefore, instead of the high-performance pipelined IFFT architectures adopted for the OFDM and FBMC datapaths, low-resource memory-based FFT architectures are adopted in the UFMC modulator. The memory-based architecture adopted here is detailed in [30]. The upsampler architecture and operation are similar to those used for frequency spreading in FBMC modulation. Here, the number of zeros between consecutive IFFT output samples is (N/N′) − 1. Dolph-Chebyshev FIR filters with a transpose structure are used for bandpass sub-band filtering. Again, the FIR coefficients are symmetric: there is an odd number of coefficients, and the centre coefficient is equal to one. However, the higher FIR order used in UFMC modulation requires further discussion. Considering an L-order FIR filter, L − 1 coefficients imply non-trivial multiplications, a number that can be halved to (L − 1)/2 due to coefficient symmetry. As each processing branch requires two FIR filters (for the real and imaginary parts), there are L − 1 non-trivial multiplications per branch.
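The multiplication count above is simple enough to check with a short calculation. The helper below is a hypothetical name introduced for illustration; it assumes an odd number of symmetric coefficients with a unit centre coefficient and one filter each for the real and imaginary parts, as described in the text.

```python
def nontrivial_multiplications(num_coeffs, branches=1):
    """Count non-trivial multiplications for symmetric FIR filtering of a complex signal.

    num_coeffs: odd number of coefficients; the centre one equals 1 and is skipped.
    branches:   number of parallel processing branches (one per active PRB).
    """
    if num_coeffs < 1 or num_coeffs % 2 == 0:
        raise ValueError("expects an odd, positive number of coefficients")
    per_filter = (num_coeffs - 1) // 2   # symmetry halves the remaining coefficients
    per_branch = 2 * per_filter          # one filter for I, one for Q
    return per_branch * branches


# Filter lengths listed for the two UFMC modes, with three active PRBs.
for length in (37, 73):
    print(length, nontrivial_multiplications(length, branches=3))
```

For the filter lengths 37 and 73 and three branches, the totals are 108 and 216 multiplications, respectively.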
In Xilinx FPGAs, non-trivial multiplications can be efficiently performed by DSP blocks. These blocks are embedded into the logic fabric in a column arrangement. Cost-optimized devices have fewer DSP blocks, and their utilization should be carefully considered. For instance, the xc7z020 device has 220 DSP blocks. Considering the modes of operation from Table 4, the overall number of non-trivial multiplications for FIR filtering (3 × (L − 1)) is 108 for mode 1 and 216 for mode 2. This represents a high DSP utilization, and the sparse distribution of these types of blocks throughout the logic fabric degrades the scalability of the UFMC modulator. In addition, placement and routing of the design is more difficult and likely to affect the overall timing closure. To reduce DSP utilization, a multiplier-less architecture for FIR filters was adopted. The FIR coefficients are represented in Q1.5 format, using the Canonic Signed Digit (CSD) system with minimum non-zero bits. Then, non-trivial multiplications are substituted by shifters and adders. For example, since 0.90625 = 1 − 2^−3 + 2^−5, the multiplication of a sample x by 0.90625 can be implemented as x − (x ≫ 3) + (x ≫ 5), where ≫ denotes an arithmetic right shift.
As discussed in the Introduction, baseband processing infrastructures for 5G systems must be flexible, scalable, resource and power efficient, and forward compatible. Here, DPR and DFS are combined to produce a dynamically reconfigurable baseband processing architecture for multimode, multi-waveform coexistence and dynamic spectrum aggregation. To enable the full potential of 5G, carrier aggregation should also be possible across separated frequency bands [31] (noncontiguous CA). For noncontiguous CA, a multidimensional PHY layer (and, therefore, baseband architecture) is needed, even when data aggregation is not performed in the PHY layer, but in the media access control (MAC) communication layer instead [32]. In this context, multidimensional means that the PHY layer is an array of independent processing blocks, rather than a monolithic structure.
The baseband architecture presented in this chapter features three independent modulators, whose functionality and clock frequency can be dynamically reconfigured through DPR and DFS, respectively. This setup enables the processing of multiple component carriers with different waveforms and/or baseband parameters in noncontiguous CA schemes.
A prototype of the multidimensional baseband modulator was implemented on an Avnet Zedboard equipped with a Zynq xc7z020 device. The system top level combines features from the designs of the previous section and can be divided into three parts. The Zedboard's 512 MB DDR memory is used as a repository for the partial bitstreams used for DPR. The Zynq's ARM CPU acts as the system management unit: it is responsible for triggering the reconfiguration of the multidimensional baseband modulator and setting up data transfers between the DDR memory and the modulators, which are implemented in the programmable logic together with the infrastructure for DPR and DFS. Figure 4 shows the top-level architecture.
The proposed architecture targets the communication scenario described in [33], which combines multi-waveform coexistence with dynamic spectrum access. In this scenario, 5G communications build on the pre-existing 4G infrastructure (non-stand-alone 5G): the primary 4G-LTE communications are OFDM-based, and the secondary 5G communications opportunistically exploit vacant spectrum resources through DSA, transmitting with different waveforms (OFDM, FBMC or UFMC). The basic unit for DPR is a complete baseband datapath, and each one is implemented in a reconfigurable partition. Of the three RPs, RP 1 is exclusively used for primary OFDM-based transmission. The two remaining RPs can be used for primary or secondary transmission: RP 2 implements FBMC or OFDM transmission modes; RP 3 implements UFMC or OFDM transmission modes. For instance, if the primary transmission requires more capacity, the three RPs can be used to independently modulate different component carriers in a noncontiguous CA scheme. If the primary transmission is not so demanding, RP 2 and RP 3 can be used for secondary multi-waveform 5G transmission. Figure 5 illustrates a potential multi-waveform coexistence scenario by showing the combined periodograms of the OFDM, FBMC and UFMC baseband signals obtained from the implemented modulator datapaths.
During system initialization, the ARM CPU manages the downloading of partial bitstreams and input data files from an SD card to the DDR memory. For the purpose of validating the baseband engines, the input data is retrieved from the DDR and sent to the baseband modulator(s), and the results are stored back to the DDR (and used for validating the implementation). Each RP has an associated DMA controller to accelerate the access to the DDR.
To achieve the specialization of computation at runtime, the configuration interface adopted is the ICAP. This high-bandwidth internal interface permits the FPGA to reconfigure itself. Xilinx sets the maximum ICAP bandwidth at 400 MB/s, for a 100 MHz clock frequency and 32-bit data width [34]. Nevertheless, the ICAP can be overclocked to further enhance the reconfiguration throughput [35]. In the present work, the ICAP is overclocked at 200 MHz. To take advantage of ICAP overclocking, a dedicated DMA controller is used to accelerate the transfer of partial bitstreams to the ICAP.
The implementation of DFS follows the reference design from [36]. This design considers an FSM that reads configuration parameters from a ROM and writes them to the clock management module available in the FPGA. To change the frequency of the output clocks, the input signal en must be enabled, and the desired mode of operation should be given through the mode port. The DFS controller is fed by a 100 MHz reference input clock that is used to synthesize the clock signal used for baseband processing. Its frequency (f_clkBB) can be configured to one of four values: 16.7, 33.3, 66.7 and 100 MHz. All modulator datapaths can work at 100 MHz. The other values are based on the scaling of subcarrier spacing by 2^μ as in 5G New Radio systems [2], where μ is an integer that specifies the mode of operation. In this system, primary communications are based on the LTE OFDM numerologies.
A general overview of the resource utilization of the prototype is presented in Table 5. The static part occupies around 32% and 5% of the slices and BRAMs, respectively. Apart from PS-PL interconnect cores and DMA controllers to accelerate the baseband modulators, the static part also implements the infrastructure for reconfiguration (DPR and DFS). The hardware required to implement DPR and DFS is below 2% of the available LUTs, FFs and BRAMs. The three RPs form the system's reconfigurable part and occupy 52.6, 64.3 and 72.7% of the available slices, BRAMs and DSPs, respectively. Overall, the resource utilization for the complete system implementation represents a considerable share of the resources available in the xc7z020 device: 84.3% of slices, 69.7% of BRAMs and 72.7% of DSPs.
The resource utilization of each modulator is presented in a dedicated table (device, xc7z020; f_clk = 100 MHz). The resources reserved by the three RPs (including 90 BRAMs and 160 DSPs) allow the implementation of six baseband modulators, which would need 11,322 slices, 99.5 BRAMs and 106 DSPs in total. Adding these virtualized resources to the static resources exceeds the available xc7z020 slices by 17%. This demonstrates the resource efficiency benefits that DPR brings to multimode baseband processors. An equivalent static multimode design could benefit from the reuse of common hardware blocks between different modulator datapaths (especially between OFDM and FBMC datapaths). However, implementing the multidimensional baseband modulator as a static multimode design would be challenging given the resource budget available on cost-optimized devices like the xc7z020. There are FPGA/SoC devices with larger area and logic density. However, using them would decrease the system's cost-effectiveness: an FPGA with a larger chip area is more expensive and likely to consume more power [37]. Considering the modes of operation shown in Tables 1-3, and that all three RPs are in use, the proposed design supports 32 combinations of baseband modulators: 2 RP 1 modes × 4 RP 2 modes × 4 RP 3 modes. The use of DPR simplifies system upgrade with new modes of operation in order to extend the system's useful lifetime. The addition of modes of operation is not limited by the available resources on the FPGA device, but instead by the resources reserved by the RPs and the capacity to store partial bitstreams (512 MB DDR memory, in this case).
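The count of 32 configurations can be reproduced by enumerating the modes available in each partition. The dictionary below encodes the allocation described earlier (RP 1 restricted to OFDM, RP 2 offering FBMC or OFDM, RP 3 offering UFMC or OFDM, two modes per waveform); the mode labels and function name are illustrative, not identifiers from the actual design.

```python
from itertools import product

# Modes that each reconfigurable partition can host (two modes per waveform).
RP_MODES = {
    "RP1": ["OFDM mode 1", "OFDM mode 2"],
    "RP2": ["FBMC mode 1", "FBMC mode 2", "OFDM mode 1", "OFDM mode 2"],
    "RP3": ["UFMC mode 1", "UFMC mode 2", "OFDM mode 1", "OFDM mode 2"],
}


def all_configurations(rp_modes):
    """Enumerate every system configuration as a dict {partition: mode}."""
    names = list(rp_modes)
    combos = product(*(rp_modes[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


CONFIGS = all_configurations(RP_MODES)
print(len(CONFIGS))  # 2 * 4 * 4 configurations
```

Adding a new mode to a partition only lengthens the corresponding list, which mirrors how a new partial bitstream extends the design space without touching the static part.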
During the DPR design with the Xilinx Vivado EDA tool, the different system configurations are created from a design checkpoint that saves the floorplanning and routing of the system's static part, leaving the RPs as empty black boxes. New configurations can be created by designing new circuit configurations for these black boxes and generating the corresponding partial bitstreams. This design reusability makes the system adaptable and reduces the upgrade design time.
The dynamic power consumption for each modulator datapath and baseband clock frequency was estimated with the power analysis tool from Vivado 2015.2. The high-confidence estimates were performed using placed and routed netlists and accurate node activity files. The results are presented in Table 7. The UFMC modulator modes have a higher dynamic power consumption compared to FBMC and OFDM. This is mainly due to the higher resource usage and node activity of UFMC datapaths. The clock frequency adaptation allowed by DFS results in power savings that tend to be more evident for the most resource-demanding modes of operation (UFMC and FBMC). Compared to a design with baseband clock frequency fixed at 100 MHz, the clock frequency adaptation to:
• 66.7 MHz results in dynamic power savings between 39 mW (35% reduction in OFDM mode 1) and 82 mW (51% reduction in FBMC mode 2)
• 33.3 MHz results in dynamic power savings between 79 mW (70% reduction in OFDM mode 1) and 156 mW (67% reduction in UFMC mode 2)
• 16.7 MHz results in dynamic power savings between 99 mW (88% reduction in OFDM mode 1) and 194 mW (83% reduction in UFMC mode 2)
Table 7. Dynamic power consumption estimates for the six implemented baseband modulator cores. Device, xc7z020; analysis tool, Vivado 2015.2; post place-and-route power analysis with high confidence level; node activity derived from post place-and-route simulation.
For the set of baseband clock frequencies defined, the DFS procedure took on average 47 μs to modify the clock frequency, a latency which is acceptable in 5G NR communications.
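As a rough sanity check on the frequency-dependent savings, one can compare them with the textbook first-order relation in which dynamic power scales linearly with clock frequency at fixed supply voltage and switching activity. The sketch below encodes only that simplified assumption; it is not the Vivado power model, and the function name is hypothetical.

```python
def expected_dynamic_saving(f_mhz, f_ref_mhz=100.0):
    """Fractional dynamic-power saving predicted by P_dyn proportional to f.

    Assumes constant supply voltage and switching activity per clock cycle.
    """
    if not 0 < f_mhz <= f_ref_mhz:
        raise ValueError("frequency must be positive and not above the reference")
    return 1.0 - f_mhz / f_ref_mhz


for f in (66.7, 33.3, 16.7):
    saving = 100 * expected_dynamic_saving(f)
    print(f"{f:5.1f} MHz: about {saving:.0f}% expected saving")
```

Under this simple model the expected savings are roughly 33%, 67% and 83% for the three reduced frequencies. Deviations in the reported estimates are to be expected, since tool-based power analysis accounts for per-node activity and clocking resources that a linear model ignores.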
In the multidimensional baseband modulator, the area and amount of RP resources are higher than in the individual designs, resulting in larger bitstream sizes. However, the reconfiguration speed was increased through ICAP overclocking. Table 8 quantifies the DPR latency and compressed bitstream size for the worst-case scenario in each RP. The largest RP (RP 3) takes up to 767 μs to be reconfigured, corresponding to the transfer of a 596 kB bitstream to the ICAP. In all DPR latency measurements, the reconfiguration throughput was at least 790 MB/s. This value is about 99% of the theoretical ICAP throughput, considering 32-bit transfers and overclocking at 200 MHz. In general, the DPR latency for each individual RP is below 1 ms, while the overall reconfiguration of the three RPs takes less than 2 ms. These latency values are within an acceptable range considering the control plane requirements from [38].
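The throughput figures quoted above follow from a one-word-per-cycle model of the ICAP, which the short calculation below makes explicit. The function names are hypothetical, 1 MB is taken as 10^6 bytes, and the interpretation of 1 kB as 1024 bytes in the example is an assumption.

```python
def icap_throughput_mb_s(clock_mhz, width_bits=32):
    """Theoretical ICAP throughput in MB/s, assuming one data word per clock cycle."""
    return clock_mhz * width_bits / 8


def min_reconfiguration_time_us(bitstream_bytes, throughput_mb_s):
    """Lower bound on the transfer time in microseconds at the given throughput."""
    return bitstream_bytes / throughput_mb_s   # bytes / (bytes per microsecond)


nominal = icap_throughput_mb_s(100)        # 32-bit words at 100 MHz
overclocked = icap_throughput_mb_s(200)    # 32-bit words at 200 MHz
efficiency = 790 / overclocked             # measured minimum vs theoretical maximum
bound_us = min_reconfiguration_time_us(596 * 1024, overclocked)
print(nominal, overclocked, round(efficiency, 4), round(bound_us, 1))
```

The model gives 400 MB/s at 100 MHz and 800 MB/s at 200 MHz, so a measured 790 MB/s corresponds to about 99% of the theoretical maximum. The resulting lower bound for a 596 kB bitstream lies slightly below the measured 767 μs, whichever definition of kB is used.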
The ITU report [38] states that in critical, ultralow-latency scenarios, a make-before-break approach must be adopted to completely mitigate the control plane latency. In other words, the control plane latency must be hidden by setting up a new communication channel before breaking the current one. Under these circumstances, a high-priority communication can reserve a spare RP to seamlessly adapt the transmission mode. This scenario is exemplified in Figure 6. Let us assume that RP 1 is currently performing baseband modulation for an ultralow-latency communication. This transmission needs to be adapted from OFDM mode 1 to OFDM mode 2, without breaking the current communication link. RP 2 is currently unused and is reconfigured to OFDM mode 2 before baseband processing at RP 1 terminates. In this way, the baseband processing datapath can be modified without incurring any latency penalty due to DPR.
Table 8. Measured DPR latency and size of compressed partial bitstreams for the worst-case scenarios.

Figure 6.
Example of make-before-break approach to mitigate DPR latency.

Conclusion
This chapter presents a reconfigurable, multidimensional baseband modulator architecture suitable for multimode, multi-waveform coexistence and dynamic spectrum aggregation scenarios. The design combines the runtime specialization of computation and performance. By featuring three independent and reconfigurable baseband modulators, the architecture allows the processing of up to three component carriers using different waveforms (OFDM, FBMC and UFMC) and/or numerologies. The total reconfigurable area of the system covers more than half the available xc7z020 resources; the ICAP overclocking helps keep the DPR latency low enough for the analyzed scenarios. In this design, the performance specialization through DFS resulted in dynamic power savings of up to 194 mW. Besides flexibility, scalability and forward compatibility, cost-effectiveness is perhaps the most relevant feature of this architecture. The results demonstrate how hardware virtualization through DPR enables implementations that exceed the hardware resources available on an FPGA device. This allows for system implementations on small-form-factor, cost-optimized devices with immediate cost and power consumption benefits and without compromising system functionality.