Read power consumption in various SA .
The design of low-power SRAM cell becomes a necessity in today's FPGAs, because SRAM is a critical component in FPGA design and consumes a large fraction of the total power. The present chapter provides an overview of various factors responsible for power consumption in FPGA and discusses the design techniques of low-power SRAM-based FPGA at system level, device level, and architecture levels. Finally, the chapter proposes a data-aware dynamic SRAM cell to control the power consumption in the cell. Stack effect has been adopted in the design to reduce the leakage current. The various peripheral circuits like address decoder circuit, write/read enable circuits, and sense amplifier have been modified to implement a power-efficient SRAM-based FPGA.
- static power
- dynamic power
- leakage current
- SRAM cell
- subthreshold cell
- data-aware SRAM cell
Field programmable gate array (FPGA) is prefabricated integrated circuit (IC), which contains programmable gate matrix to implement logic functions and interconnect resources to connect the logic functions and I/O blocks. These interconnect resources can be electrically programmed by the user to implement any digital circuits and systems. Due to faster time to market, lower cost, and flexibility, FPGA prefers over ASIC (application‐specific IC) design although it has disadvantages like larger size, slower speed, and larger power consumption. Due to the flexibility of FPGA, it is possible to partially program any portion of the FPGA depending on the requirement even when the rest of an FPGA is still running. Computer‐aided design (CAD) tools and architecture are the two important technologies, which differentiate FPGAs. First memory‐based programming FPGAs were introduced in 1986 by Xilinx Inc., San Jose, CA .
The programmable term in FPGA only reflects that any new function can be implemented on the chip even after its fabrication. Programmability/reconfigurability of an FPGA is based on an underlying programming technology, which can cause a change in behavior of a prefabricated chip. The main programming technologies used in FPGAs are static random memory (SRAM), flash memory, and antifuse [2–5].
The SRAM‐based FPGAs provide ideal prototyping medium and are widely used to integrate FPGAs in an embedded system [6–8] due to the use of standard CMOS technologies, higher performance, and reprogrammability. However, the larger static power consumption in SRAM cell limits the use of SRAM‐based FPGAs in portable embedded system compared to flash‐based FPGAs [9, 10]. The other concern related to SRAM‐based FPGA is its volatile nature. Although the dynamic power management and duty‐cycling techniques [11, 12] have been used to save static power during idle mode of FPGA, these techniques are not very effective due to the energy consumption associated with the resulting reconfiguration process. Due to large load capacitance and high access rate, SRAM cells are responsible for consuming significant portion of the total power of the design. Thus, SRAM power consumption is an important consideration for designers to find the balance between the performance and the overall power consumption. The speed of the SRAM cell in FPGA is not a critical factor because it does not affect the operating speed of the circuit implemented in FPGA as mentioned in ref. .
In this chapter, we investigate the various factors responsible for power consumption in SRAM‐based FPGAs and review the different techniques proposed in the literature to save the power. We will also consider the static and dynamic power in the conventional 6T SRAM cell and its architecture. Various design techniques, presented in the literature, to reduce power consumption in SRAM cell will be reviewed in detail with their merits and demerits. A data‐aware power‐efficient SRAM cell will be discussed to save power and to optimize the stability.
2. SRAM‐based FPGAs
SRAM cells are the basic cells used for SRAM‐based FPGA. These cells are scattered throughout the design in form of an array and mainly used to program: (1) the routing interconnects of FPGAs and (2) configurable logic blocks (CLBs) that are used to implement logic functions. SRAM‐based programming technology has become the dominant approach for FPGAs because of its reprogrammability and the use of standard CMOS process technology, which results in larger package density and higher speed. Due to the volatile nature of SRAM technology, SRAM‐based FPGAs lose their configured data whenever power supply is switched off and need to be reprogrammed every time when the power supply is turned on. Hence, almost every system using SRAM‐based FPGAs contains an additional nonvolatile memory such as flash programmable read only memory (PROM) or EEPROM to store the configuration data and load it into the SRAM‐based FPGA whenever power is on. In many applications, a complex programmable logic device (CPLD) is used in addition to the external configuration memory to perform the vital functions of the system necessary at power‐up. The first static memory‐based FPGA (commonly called an SRAM‐based FPGA) was proposed by Wahlstrom in 1967 . This architecture is allowed for both logic and interconnection configuration using a stream of configuration bits. From a practical standpoint, an SRAM cell can be programmed indefinite number of times. Dedicated circuitry on the FPGA initializes all the SRAM bits on power up and configures the bits with a user‐supplied configuration. No special processing steps are needed in SRAM cells unlike other programming technologies. Although static memory offers the most flexible approach for device programmability, it imposes a significant area penalty per programmable switch compared to ROM implementations.
3. Power consumption in SRAM‐based FPGAs
In the recent years, the traditional FPGA research area has shifted from speed and area overhead issues to design of power‐efficient FPGAs due to increased applications of FPGA in portable and nonportable devices. In portable devices power saving is required to enhance the battery life time, whereas in nonmobile devices power saving decides the cost, performance, and reliability of the device. The main sources of power consumption in FPGA are static and dynamic power [10, 12, 15, 16].
Static power is consumed when device/system is idle and leakage current flows in the system. The various leakage currents in OFF transistor are subthreshold leakage current, gate‐induced drain leakage, junction leakage current, and direct tunneling current [17–19].
Dynamic power consumption is due to the switching activity of the transistors in normal operational mode. The dynamic power consumption depends on the parasitic capacitance, power supply, switching activity, and frequency of operation and mathematically expressed as :
where CL is the load capacitance, Vdd is the power supply, f is the frequency of operation, and η is the switching activity.
FPGA design consumes larger static power than the ASIC design due to excessive leakage currents [21–23], which is due to more number of transistors per logic. Other components, which are responsible for larger power consumption, are circuits used to provide flexibility to FPGA, number of configuration bits, lookup‐tables (LUTs), and presence of large number of programmable switches.
4. Techniques adopted to reduce power consumption in SRAM‐based FPGA
4.1. Leakage power reduction
The important method to control the leakage current in the system is to switch off the transistors, which are not being used at that time. This can be achieved by using the dual threshold voltage transistor FPGA routing design [24–26]. In this technique, high threshold voltage is applied to one subset of multiplexer transistors and low threshold voltage to the rest of the transistors. High threshold voltage controls the leakage current effectively on the cost of performance degradation. This technique increases the complexity at router level. By allowing body‐bias effect, the threshold voltage of a multiplexer transistor, which is not a part of the selected path, can be raised . This method increases the fabrication complexity and cost. The leakage current can also be controlled by applying negative bias voltage on the gate of the OFF multiplexer transistor, which results in drastic drop in subthreshold current on the cost of hardware burden .
Stack effect is another effective method to reduce the leakage current in any circuit [29–31]. Stack effect means two series connected OFF transistors in the same path. These two OFF transistors offer a high resistive path to the current flow. To utilize this concept in the FPGA design, researchers [32, 33] have introduced an extra configuration SRAM cells (redundant cells) to allow multiple OFF transistors on unselected path. Due to redundant cell approach, the unselected path contains two OFF transistors, which limits the subthreshold current along the unselected path.
Calhoun et al.  have proposed the creation of fine‐grained “sleep region” to control the leakage current in the system. With this technique, it becomes possible to put unused LUTs and flip‐flops to sleep mode independently. Gayasen et al.  have proposed coarse‐grained sleep strategy. In this technique, the entire region of the FPGA is partitioned into logic blocks so that each region can be put into sleep mode independently whenever it is not used.
Several methods have been proposed by researchers to save the leakage/static power consumption in FPGA design at the architectural level [36–39]. Tran et al.  have proposed low‐power FPGA architecture based on fine‐grained Vdd control scheme, called micro‐Vdd‐hopping. They have grouped four CLB into one block to share the Vdd. In the micro‐Vdd‐hopping scheme, Vdd of each block is varied between high and low Vdd to save power consumption without scarifying performance. In their design, they have introduced a level shifter and incorporated zigzag power‐gating scheme to control the sneak leakage path problem. They have experimentally observed that the dynamic power can be reduced by 86% when the required speed is half of the highest speed. They have simulated their proposed designed at 90 nm technology and observed that 95% static power saving on the cost of 2% area overhead. In zigzag power gating scheme wake up time is smaller than other gating technique because the INVs and 2‐NAND are always in between Vdd and Vss during standby mode. Since they have off‐off stacking structure, leakage current is suppressed by an order of magnitude even if the overdrive voltage is zero.
Srinivasan et al.  have proposed a technique to reduce the leakage current of interconnect fabric. They have put every multiplexer in its least‐leakage state by setting its undriven inputs to desired values with a circuit‐level modification in the routing multiplexer. The main advantage of this technique is that it has negligible impact on the performance of the design and has small area penalty.
In their research paper, Hasan et al.  have reduced the leakage current in the multiplexer‐based interconnect matrix by controlling the inputs of unused FPGA routing multiplexers. The simulation results on different sizes and topologies of routing multiplexers show that the minimum leakage vector varies significantly at 22 nm compared to the 65 nm nodes because of higher gate leakage current and output stage loading effects. Their proposed technique reduces the static power significantly without imposing any area overhead because most of the routing multiplexers are unused in an FPGA.
A directional coarse‐grained power‐gated FPGA switch box and power gating aware routing algorithm was proposed by Hoo et al.  to address the leakage current concern in FPGA. After considering the trade‐offs among different PG designs, authors have considered: (1) A novel directional coarse‐grained power‐gated FPGA switch box. (2) A power‐aware routing algorithm to leverage on new PG architecture. In their proposed architecture, multiple buffers in each direction of the switch box are power gated independently of the buffers in the other directions. Due to the homogeneous structure of the switch box, proper sizing of the sleep transistors is not an issue. To maximize the leakage reduction of the coarse‐grained PG architecture, they have also adopted the routing algorithm. They have proposed a new cost function for the VPR routing algorithm to support the new routing architecture.
4.2. Dynamic power reduction
Dynamic power is consumed during normal operation when switch toggles. It depends on the frequency of the operation, load capacitance, and square of power supply as clear from Eq. (1). The total dynamic power consumed by a device is given by the sum of the dynamic power of each resource. Due to the programmability of the FPGA, the dynamic power is design dependent. The important contributors for dynamic power are effective parasitic capacitance of the resources, resource utilization, and switching activity of the resources . The effective capacitance of the resources come from parasitic capacitance of interconnect wires and transistors. The dynamic power of the device can be reduced by addressing each of the parameters in Eq. (1) effectively. Various methods have been proposed by researchers to handle the dynamic power consummation [37, 45–47]. The general adopted methods are using clock scheme, reducing toggling activity of the logic, reducing RAM and I/O powers.
Since faster switching logic consumes more dynamic power than the slower switching logic, it is required to partition the clock so that the fast clock should be assigned to those portions of the logic which require a fast clock and slow clock should be assign to those which can be run at a slower speed. This way the switching activity of various logics can be controlled to save the overall dynamic power [9, 10, 15].
Dynamic voltage scaling is another power‐saving design technique because supply voltage significantly impacts power efficiency. The power supply scaling technique can be utilized in the design of power‐efficient FPGA by considering devices like tunnel‐FET, FinFET, etc. [48–51] because these devices can operate at ultra‐low voltage.
The dual or multi‐Vdd techniques [52–54] are other important methods to save the dynamic power. In dual Vdd scheme, the noncritical delay circuit is connected with low power supply, whereas delay‐critical circuit is powered by high voltage. This concept is also applied in the FPGA design [55–57]. In heterogeneous architecture, some logic blocks are fixed to operate at high power supply and some logic blocks (not limited by speed) are fixed to operate at low voltage. This heterogeneous scheme helps only in small power saving due to the rigidity of the fixed fabric and loss associated with the mandatory use of low‐Vdd in certain cases. The dual Vdd technique cannot be applied to the interconnect wires which is the main source of power consumption. To overcome this problem, Li et al.  have proposed Vdd programmability technique to reduce power consumption of interconnect wire. They have selectively applied low‐Vdd to interconnect circuits such as routing and connection switches. The Vdd selection for different applications is obtained by programmable dual‐Vdd technique to both logic blocks and interconnect. On average, they observed a total of 50-55% power is reduction.
Although voltage scaling is the best way to reduce the power consumption in FPGA array, one has to scarify the performance of the circuit. To improve the power efficiency of FPGAs without scarifying performance, Li et al.  have explored the different supply voltage (Vdd) levels option. According to the authors, a predefined dual‐Vdd FPGA fabric, in general, cannot achieve better power performance trade‐off than the Vdd scaling because the predefined dual‐Vdd fabric is not flexible enough for a variety of applications. To address this issue they have introduced the field programmability for the Vdd level by proposing three types of logic blocks: H‐block, L‐block, and a p‐block as shown in Figure 1. H‐block and L‐block are connected to supply voltages VDDH and VDDL, respectively. H‐block provides higher speed due to high supply voltage whereas L‐block has reduced power consumption at the cost of the increased delay. They have implemented P‐block by inserting PMOS transistors (called power switches) between the power supply rails and the logic block. The configuration bits were used to control the switching behavior of these switches so that an appropriate supply voltage can be chosen for the P‐block. To avoid the short circuit current, they have introduced a level converter in between VDDH and VDDL.
Selective power‐down is another method to save power in FPGA. This technique (known as power gating) refers to shut down the power supply of certain portions of a chip which are not performing any task for a long time to save the static power considerably. This can be achieved by implementing a multisupply strategy in which the power grid of some blocks is decorrelated from others in order to allow for selective shutdown. Sleep modes within the FPGA architecture can also be deployed to selectively reduce the power supply of those blocks, which are not in use [60, 61].
Power consumption in interconnect dominates dynamic power in FPGAs [62–64] due to the interconnect structure, which consist of prefabricated wire segments. Each segment is attached with used and unused switches. Wire lengths in FPGAs are generally longer than in ASICs due to the larger area consumed by SRAM cells and circuitry. The larger power consumption in interconnect in FPGA makes it high‐level target for power optimization. Anderson et al.  have presented a novel FPGA routing switch design to reduce the leakage and dynamic power consumption. The switch can be programmed to operate in any one of the mode: high speed, low speed, or sleep mode. In high‐speed mode, power and performance characteristics are similar to those of current FPGA routing switches. Low‐power mode offers reduced leakage and dynamic power on the cost of degraded performance. Sleep mode, which is suitable for unused switches, reduces the static power drastically. Three key observations (which hold for majority of Xilinx Spartan‐3 commercial FPGA and are specific to FPGA interconnect) were made, namely (1) routing switch inputs are tolerant to “weak‐1” signals, (2) there exists sufficient timing slack in typical FPGA designs to allow a considerable fraction of routing switches to be slowed down, without impacting the overall design performance, and (3) most routing switches simply feed other routing switches, authors have proposed the design of new switch as shown in Figure 2. The designed switch includes parallel combination of NMOS and PMOS sleep transistors which can operate in three different modes as follows: In high‐speed mode, the PMOS is turned ON which results in full rail‐to‐rail swing of output. The gate terminal of NMOS is left at Vdd in high‐speed mode. During 0–1 logic transition the virtual Vdd may temporarily drop below Vdd - VTH, causing the NMOS to leave cut‐off and assist with charging the switch's output load. In low‐power mode, the PMOS is turned OFF and NMOS is turned ON. The buffer is powered by the reduced voltage, VVD ≈ Vdd – VTH.
Clock‐gating is an effective and most widely used method to reduce the dynamic power. This technique is based on the principle that only active portion of the system should be connected to the clock tree and others should not be served by the clock tree. A logic circuit must be included in the design for the selection of which portions are clocked and which portions are blocked. This reduces switching activity which results in dynamic power saving. The clock gating can be applied at the chip level as well as at the design level. The gating technique has been successfully used in ASICs, but it is not very effective in SRAM‐based FPGAs because a large component of power consumption in FPGA is due to the switching activities of the clock signals along the routing switches. For this reason, researchers investigated the possibility of modifying the way a circuit is mapped on the FPGA array by acting on the synthesis, technology mapping, or placement and routing algorithms [66, 67]. Since clock is distributed in the chip through the global FPGA routing network, the placement of clock loads has a considerable impact on clock wire usage. Clock load placement should be done in such a way that one should get lower clock capacitance, which results in lower dynamic power consumption.
Placement and routing (P&R) on the chip also affects the dynamic power consumption because it decides the total parasitic capacitance in the design. To minimize the parasitic capacitance, it is essential to optimize the P&R strategy. It is always advisable to place two connected functional instances closer because it will reduce the interconnect wire‐length which in turn can reduce the capacitive loading of the net and lead to dynamic power reduction. The modern FPGA development software typically supports power‐driven layout to automatically accomplish this task. Power‐driven layout tools examine connection between functional instances for optimization [68–70]. Power‐analysis tools are used to further optimize the power saving. Power‐analysis tools examine each subcomponent in a design hierarchy to highlight power consumption. Careful examination of this information and subsequent manipulation of the design can result in significant power savings.
Reducing the power supply of I/O can save up to 80% dynamic power. The switching activity of I/O can be controlled by using techniques like time multiplexing, minimum I/O count design portioning [71–73], and reducing I/O drive strength/slew rates. A considerable amount of dynamic power can be saved by adopting differential I/O standards and resistively terminated I/O standards for highest toggling frequency and single‐ended I/O standard for low toggling frequency.
Tsang et al.  have studied the effectiveness of employing precomputation in reducing dynamic power consumption in commercial off‐the‐shelf (COTS) FPGAs. Precomputation is a high‐level logic optimization technique that lowers power consumption of a design by disabling part of the circuit based on a few relatively simple precomputation conditions. With careful design considerations and increased logic utilization, its associated power consumption can be reduced by disabling much larger part of the design with negligible increase in resource overhead.
5. SRAM power reduction
The design of low power and high performance SRAM cell becomes a necessity in today's FPGAs because SRAM is a critical component in FPGA design. Although SRAM‐based FPGA acquires larger area on the chip but still one of the most useful SRAM‐based structure is the lookup table (LUT).
SRAM‐based FPGAs such as those manufactured by Xilinx and Altera comprise the largest fraction of the overall market. These FPGAs utilize SRAM for routing and programmability, typically through the use of LUTs and multiplexers. Due to the large number of cells within SRAM FPGA interconnects, a considerable leakage current (of order of milliamps) flows at standby . However, leakage current increases as process geometry shrinks which further exacerbates the power problem. The dynamic power consumption in cell is a serious threat because of large parasitic capacitance (due to longer metallic bitline) which results in larger charging/discharging activity at the bitline. Study on the leakage current and dynamic power in Xilinx Spartan‐3 FPGA  (Figure 3) and Xilinx Virtex‐4  (Figure 4) show that the major contributor for power consumption in FPGA is configurable SRAM; hence, the new design technique becomes essential to increase the lifetime of the battery. Several techniques have been proposed in the literature [81–85] to address the power consumption problem in SRAM cell. It is worth to disable the SRAM devices that are temporarily unused. This technique will avoid the power consumption by unused components. A system controller can deactivate the device when it is not required in the current operation, or put the device in its sleep mode when that device will not be accessed for an extended period of time. Implementing such a system controller in FPGA reduces the overall switching activity of the system. As discussed by Tuan et al. , the data of the configurable SRAM cell alter only when FPGA is configured. FPGA is configured only when power supply is turned on. Therefore, it is necessary to control the leakage current in the cell during idle phase to save the overall power.
Wang et al.  have proposed the design of an ultra‐low voltage 9T SRAM cell. Their designed cell consists of a 6T SRAM part (for write operation) and a dedicated read port. The read port comprises three NMOS transistors for realizing equalized bitline leakage and improving bitline sensing margin in a single‐ended read bitline (RBL). The write access paths and the data storage latch are implemented with HVT devices for leakage reduction while the read port employs LVT devices for better performance. Their test chip shows an improvement of 40% in energy efficiency with the minimum energy per operation of 2.07 pJ at 0.4 V. This design increases the fabrication complexity due to the use of LVT and HVT transistors.
Although much research has been done in order to design a power‐efficient SRAM circuit, still interest in power‐efficient cell design at the architecture level continues to increase due to the occupation of considerable fraction of total area on chip by configurable SRAM cells and circuitry in the FPGA design. Ye et al.  have observed that more than 40% of the total FPGA's logic block area is occupied by SRAM cells. Such huge area overhead results in larger wire length, which leads in larger parasitic capacitance at load. This increased capacitance increases the dynamic power consumption. The most widely used and well accepted SRAM cell is 6T cell  (as shown in Figure 5) due to its symmetric structure and larger data storage capacity. The cell has two cross‐coupled inverters which form latch to keep the programmed data intact. Two pass transistors are used to transfer the data from bitline to cell node (write operation) or cell node to bitline (read operation). The actual control of the FPGA is handled by the Q and Qbar outputs. The main drawbacks of the conventional 6T cell are: poor stability, large power consumption, and degraded performance.
5.1. Subthreshold SRAM cell
Subthreshold operation is achieved when the device is allowed to operate at power supply (Vdd) lower than its threshold voltage. Using this concept, researchers [89–94] have proposed the subthreshold SRAM cells to reduce the overall power consumption in the cell. Teman et al.  have designed a robust, low‐voltage SRAM bit cell with reduced 5 transistors compared to the standard 6T circuit. Their designed cell can operate at voltage as low as 400 mV in a commercial 40 nm CMOS process. At this supply voltage, the proposed bit cell provides 6σ stability and an average static power reduction of 21× compared to the 6T cell. The main drawback of the circuit is its extra processing complexity due to HVT and SVT transistors.
Calhoun et al.  have proposed 10T subthreshold bit cell (Figure 6). Transistors M1 through M6 forms conventional 6T cell except that the source of M3 and M6 tie to a virtual supply voltage rail (VVDD). The proposed cell has distinct read and write ports to improve the stability of the cell. Eliminating the read SNM problem allows this bitcell to operate at half of the Vdd of a 6T cell while retaining the same 6σ stability. Transistors M7–M10 are used to remove the read SNM problem by buffering the stored data during read operation. M10 is mainly included in the cell to control the leakage current. Their experimental results show that the proposed cell saves 2.5× and 3.8× leakage power at Vdd = 0.6 V and Vdd = 0.4 V at room temperature. This saving is more aggressive (60×) when power supply is scaled down to 0.3 V.
A design of 10T SRAM is proposed by Jiangzheng et al.  by employing voltage lowering techniques to effectively control the leakage current in the cell after allowing cell to operate in subthreshold region. The proposed circuit generates a subthreshold read pulse for transferring the data out of the SRAM. The floating write bitlines minimizes write bitline leakage on the cost of degraded stability. Short read bitlines improve read speed and suppress read power on the cost of area overhead.
Kushwah et al.  have proposed a single‐ended dynamic feedback control 8T static RAM (SRAM) cell to enhance the static noise margin (SNM) for ultralow power supply. It achieves write SNM of 1.4× and 1.28× as that of isoarea 6T and read‐decoupled 8T (RD‐8T), respectively at 300 mV. The standard deviation of write SNM for 8T cell is reduced to 0.4× and 0.56× as that for 6T and RD‐8T, respectively. The proposed 8T consumes about 0.6× less write power and 0.48× less read power than 6T cell.
5.2. Data‐aware power‐efficient SRAM cell
The main drawbacks of subthreshold cells are poor stability and degraded performance. Besides the cell leakage, the bitline leakage is another dominating factor for power consumption. The overall bitline power consumption is data dependent. Many data‐aware cells have been reported in the literature to control the bitline power consumption [98–102]. Chiu et al.  have proposed 8T single‐ended subthreshold SRAM with cross‐point data‐aware write operation. In the circuit write operation is performed by traditional write circuit as in 6T cell, whereas 2T stacked read buffer is used for read operation. Due to stack read circuit, leakage current is controlled and stability is improved. The data‐aware cross‐point write operation improves the writeability. The main drawback of the circuit is large voltage swing on bitline during write operation.
A 130 mV SRAM with expanded write and read margins for subthreshold applications was proposed by Chang et al.  to reduce the voltage swing on the respective bitlines during write operation. They have used two separate signals SCR and SCL to perform write operation. The proper selected value of these two signals controls the write power consumption after reducing the discharging activity at the bitline. The isolated read circuit improves the stability of the cell on the cost of large parasitic capacitance and resource burden due to two extra signals.
Singh et al.  have designed a data aware dynamic 9T SRAM cell to reduce the bitline power consumption. The dynamic nature of the cell flips the data faster at the bitline so that the average discharging activity is reduced. The cell contains nine transistors with isolated read and writes circuits. The write operation is performed using write signal WS. The value of write signal is chosen based on the write operation. The simulation results predicted the 47% lower write power consumption compared to the 6T. They also observed that power saving varies from 42.45 to 61.3% when no peripheral devices are included in the array during hold mode because of lower leakage current from write bitlines and lower discharging activity at RBL. The cell imposes hardware and wiring burden due to extra signal.
The bit‐interleaving‐enabled 8T SRAM architecture is proposed by Wen et al. . The proposed cell features shared data‐aware write structure and utterly eliminates the half‐select disturbance. In their proposed design, shared write and separated read behaviors are implemented by activating horizontal cells and vertical bitlines instead of enabling blocks. They also proposed a reference‐based sense amplifier (SA) to coordinate the column‐selection array to further optimize the area efficiency. The proposed SRAM operates at a frequency of 125 kHz and consumes a total power of 5.1 μW.
5.3. Data‐dependent‐write‐assist dynamic (DDWAD) SRAM cell
Recently, we have designed a power‐efficient SRAM cell  by utilizing dynamic data aware concept for write operation and stack effect to control the read leakage current. The architecture of the cell is shown in Figure 7(a). The designed cell has distinct read and write ports with single bitline to improve the overall stability of the cell. To flip the data at the storage node faster without waiting bitline BL to charge/discharge completely we have introduced a write signal WS and broken the latch of the cell (since WL = high). To control the leakage current in read circuit during write operation and hold mode, stack technique is (three series connected OFF transistors in read path) used on the cost of increased delay. The write signal (WS) has been generated according to the data to be stored at Q and Qbar with the help of circuit as shown in Figure 7(b) . During read and hold mode, WS maintains its previous value and latch nature of the cell is restored to keep the stored data intact. The proposed cell and other cells were simulated at layout level using Cadence 6.1 CMOS design rules for 65 nm technology. The large write power saving (Figure 8) is due to no discharging activity at the bitline BL due to high resistive path (NM1 Turns OFF because WS = 0 (write 1 operation)). Similarly, for WS = high, OFF transistor PM1 does not allow any current to flow between Vdd and ground. This causes low voltage at the storage node Q. In both write operations, a small voltage drops at BL results in considerable dynamic power saving. Due to OFF transistors NM4 and NM6 (since RWL = 0 during write operation) in the read path, the leakage current through RBL is restricted.
Due to the forbidden discharging of precharged RBL during read 1 operation and stack effect in read path, a considerable power saving is achieved compared to the conventional 6T cell (Figure 9). In hold mode, WS maintains its value due to internal latch. The static power consumption in the proposed cell is lower than the 6T cell and other proposed cells in the literature irrespective of the power supply (Figure 10). The lower static power in the proposed cell  is due to lower leakage current through write bitline BL and stack effect in read circuit. During simulation, we observed that the proposed cell shows a nominal variation in static power consumption with temperature, which reflects the robustness of the cell against temperature. The data at the storage nodes maintained strongly at their respective values for power supply range of 300 mV ≤ Vddmin ≤ 400 mV. The proposed cell shows larger immunity toward the statistical variation due to signal WS as discussed in our published paper in detail .
Although the proposed cell imposes area overhead compared to the conventional 6T cell, it is not a serious threat in FPGA implementation because of lower leakage current through bitline, more number of cells can be connected on a single bitline in the array.
In SRAM‐based FPGA memory accesses are performed with a designed clock and series of interface circuits like row/column decoder, write/read enabled circuit, etc. These peripheral circuits consume a considerable power in the chip. To implement an array using the proposed cell, we have adopted the hierarchical design approach in which instead of giving individual signals (WS, WL, and RWL) to each cell, global signal circuits are used . The main advantage of using the hierarchical design is the use of shorter wires within local blocks, which reduces parasitic capacitances. In this approach, at one time only one block address can be activated which saves considerable power. Each global signal is connected to corresponding local signal through NMOS pass transistor to save the area. The column‐based approach is adopted in which signal WS is routed parallel to write bitline BL. To avoid the column half selected disturbance in the array due to toggle of the signal WS during write operation, we proposed a circuit as shown in Figure 7(b) .
5.4. Proposed decoder circuits and sense amplifier
The most important signals that affect the power dissipation in SRAM memory are the address lines, read and write enable circuits, block select, and sense amplifier. To address these concerns, we have designed new architectures for these circuits to reduce the power consumptions. The detail about these circuits is available in our published work [108, 109].
The proposed column decoder circuit is shown in Figure 11 , where CLj represents the address of the columns to be selected (j is an integer number). The architecture of the other decoder circuits is explained in Ref. . Since the proposed decoder is implemented without using NAND gates as in the conventional decoder, the number of transistors is reduced to 546 compared to 1939 in the conventional decoder . The reduced number of transistor results in lower parasitic capacitance, which leads to approximately 76% power saving . The proposed WL driver consumes lower power compared to other designs due to the compactness of the circuit.
As we know most of the current will be dissipated in the SRAM cell by sense amplifier. To address this issue we have also designed a single‐ended sense amplifier . The proposed SA (sense amplifier) reduces the power consumption by controlling the leakage current during evaluation/precharge mode. The circuit can be used even at higher temperature with minimum power consumption. The working of the circuit is explained in detail in Ref. .
Table 1 gives the comparison of read power consumption in various sense amplifiers. The main reason for lower power consumption in the proposed circuit is due to lower average current during evaluation mode, small voltage drops on RBL, and lower leakage current compared to other circuits [110, 111]. During hold mode, power consumption in the proposed circuit is lower than the other circuits [110, 111] due to gating effect.
We have implemented 32Kb SRAM array using the proposed cell and proposed decoder circuits/sense amplifier. The simulation results were compared with ref.  array. The results were encouraging in terms of power consumption as seen in Figures 12 and 13, respectively. The lower hold power obtained in the implemented cache is due to write signal WS and stack effect (read path).
The overall reduction in dynamic and static power in the proposed cell, decoder, and sense amplifier make them an ideal choice for the implementation of power‐efficient and reliable SRAM‐based FPGA.
The various issues related with the power consumption in FPGA have been discussed in detail with solutions/techniques as presented in the literature. Power gating/clock gating, dual threshold/multithreshold voltage, programmable Vdd, etc. are the important and well‐accepted methods to control the static and dynamic power consumption in the SRAM‐based FPGA. SRAM is the basic component used in the implementation of SRAM‐based FPGA and occupies larger area in the chip and consumes considerable amount of static/dynamic power. The power consumption in the cell can be reduced by reducing the bitline length, designing compact peripheral circuits, or improving the cell at the architecture level. Researchers have proposed subthreshold SRAM cell to reduce the power consumption but it degrades the reliability of the cell. To address dynamic power and static power consumption in the cell, a data aware cell is proposed with isolated write and read ports. Both operations are performed on single bitline. Power‐efficient peripheral circuits like write/read decoder, address decoder circuit, and sense amplifier were also presented in the chapter to realize the SRAM array. The proposed cell and implemented array consume lower overall power due to lower discharging activity at BL and leakage current control due to stack effect. The area overhead in the proposed cell is not a serious threat in the implementation of array because of lower bitline leakage more number of cells can be connected on the same bitline.