Open access peer-reviewed chapter

Design of Low-Cost Reliable and Fault-Tolerant 32-Bit One Instruction Core for Multi-Core Systems

Written By

Shashikiran Venkatesha and Ranjani Parthasarathi

Submitted: 18 January 2022 Reviewed: 23 January 2022 Published: 20 March 2022

DOI: 10.5772/intechopen.102823

From the Edited Volume

Quality Control - An Anthology of Cases

Edited by Leo D. Kounis


Abstract

Billions of transistors on a chip have led to the integration of many cores, which in turn has brought challenges such as increased power dissipation, thermal dissipation, occurrence of faults in circuits, and reliability issues. Existing approaches explore redundancy-based solutions for fault tolerance at the core level, thread level, micro-architectural level, and software level. Core-level techniques that improve the lifetime reliability of multi-core systems with asymmetric cores (large and small cores) have gained momentum and focus among a large number of researchers. Based on the above implications, a multi-core system using one instruction cores (MCS-OIC) is proposed in this chapter. The MCS-OIC is an asymmetric multi-core architecture with MIPS cores as the conventional cores and OICs as warm standby redundant cores. The OIC executes only one instruction, named 'subleq – subtract if less than or equal to zero'. When one of the functional units (i.e., the ALU) of any conventional core fails, the opcode of the instruction is sent to the OIC. The OIC decodes the instruction opcode and emulates the faulty instruction by repeated execution of the 'subleq' instruction, thus providing fault tolerance. To evaluate the idea, the OIC is synthesized on ASIC and FPGA platforms. Performance implications of OICs at the instruction and application levels are evaluated, and yield is estimated for various configurations of the multi-core system using OICs.

Keywords

  • fault tolerance
  • reliability
  • one instruction core
  • multi-core
  • yield

1. Introduction

Researchers have predicted about an eight percent increase in soft-error rate per logic state bit in each technology generation [1]. According to the International Technology Roadmap for Semiconductors (ITRS) 2005 and 2011, reduction in dynamic power, increase in resilience to faults, and heterogeneity in computing architecture pose challenges for researchers. According to the International Roadmap for Devices and Systems (IRDS) 2017, device scaling will touch its physical limits, with failure rates reaching one failure per hour, as shown in Figure 1. The soft error rate (SER) is the rate at which a device or system encounters or is predicted to encounter soft errors per unit of time; it is typically expressed as failures-in-time (FIT). It can be seen from Figure 1 [2, 3, 4] that, at the 16 nm process node, a chip with 100 cores could encounter one failure every hour due to soft errors.

Figure 1.

SERs at various technology nodes.

This decrease in process node size and increase in integration density, as seen in Figure 1, have the following effects.

  1. The number of cores per chip has increased, and with it the size of the last-level cache (LLC). For example, NVIDIA's GT200 architecture GPU did not have an L2 cache, while the Fermi, Kepler, and Maxwell GPUs have 768 KB, 1536 KB, and 2048 KB LLCs respectively [5]. Similarly, Intel's 22 nm Ivytown processor has a 37.5 MB static random-access memory (SRAM) LLC (Rusu 2014) [6] and the 32 nm Itanium processor had a 32 MB SRAM LLC (Zyuban 2013) [7]. The larger cache sizes have led to an exponential increase in SER.

  2. Low-swing interconnect circuits are being used in CMOS transmission systems. They have proved to be an energy-efficient signalling option compared to conventional full-swing interconnect circuits. However, incorrect sampling of the signals in low-swing interconnect circuits, together with interference and noise sources, can induce transient voltages on wires or internal receiver nodes, resulting in an incorrect value being stored at the receiver output latch [8].

This scenario can be envisaged as a "fault wall". To surmount the fault wall, reliability has been identified as a primary parameter for future multi-core processor design [9, 10]. Similarly, ITRS 2005 and 2011 have also identified increased resilience to faults as a major challenge for researchers. Hence, a number of researchers have started focusing on fault resilience and reliability enhancement in multi-core processors. This chapter focuses on providing fault tolerance solutions for processor cores in multi-core systems.


2. Motivation

As seen in Figure 1, the total FIT per chip increases as the number of cores per chip increases. In order to accommodate a higher number of cores per chip, (1) the total FIT per chip has to be kept constant, and (2) the SER per core needs to be reduced. In present-day processor cores, the frontend of the core comprises the decode queue, instruction translation lookaside buffer, and latches. The backend of the core comprises the arithmetic logic unit, register files, data translation lookaside buffer, reorder buffer, memory order buffer, and issue queue. The SER contributions of the backend and the frontend of the core are 74.48% and 25.22% respectively. In present processor cores, latches are hardened [11, 12], and caches and large memory arrays are protected using error-correcting codes (ECC) [13, 14]. The SER from the backend of the processor is higher than that from the frontend and is mainly due to the arithmetic logic unit. The FIT of the arithmetic logic unit of the processor core has started reaching levels that require robust fault mitigation approaches in present and future processors. Hence, addressing the reliability of the core (the arithmetic logic unit in the backend) is significant in improving the reliability of the multi-core system [15, 16]. Conventional approaches to handling soft errors consume more power and area. Hence, this chapter focuses on a heterogeneous model with low-cost ("low cost" denotes the low power and small area of OICs) fault-tolerant cores to improve the reliability of multi-core systems.

2.1 Chapter contributions

Contributions of the chapter are briefly presented below.

  1. The microarchitecture consisting of the control and data path for the OIC is designed. Four modes of operation of the 32-bit OIC, namely (a) baseline mode, (b) DMR mode, (c) TMR mode, and (d) TMR with self-checking subtractor (TMR + SCS) mode, are introduced.

  2. The microarchitecture of the 32-bit OIC and a multi-core system integrated with the 32-bit OIC are implemented using Verilog HDL. The design is synthesized in Cadence Encounter (R) RTL Compiler RC14.28 –V14.20 (Cadence Design Systems 2004) using the TSMC 90 nm technology library (tcbn90lphptc 150).

  3. Dynamic power, area, critical path and leakage power for four modes of OIC are estimated and compared.

  4. Dynamic power and area of OIC and URISC++ are compared.

  5. Area and power are estimated for multi-core system consisting of 32-bit OIC.

  6. The OIC is synthesized using Quartus Prime targeting the Cyclone IVE (Intel, Santa Clara, CA) device EP4CE115FE29C7. The numbers of logical elements and registers are estimated.

  7. Number of logical elements and registers in OIC and URISC++ are compared.

  8. Using Weibull distribution, the reliability for the four modes of OIC are evaluated and compared.

  9. Using Weibull distribution, the reliability for OIC and URISC++ are evaluated and compared.

  10. Performance overhead at instruction level and application level is estimated.

  11. Yield analysis for proposed multi-core system with OICs is presented.

2.2 Chapter organization

The remaining portion of the chapter is organized as follows: the section titled "3. An overview on 32-bit one instruction core" presents (a) an outline of the 32-bit OIC, (b) the one instruction set of the OIC, (c) the modes of operation of the OIC, (d) the microarchitecture of the OIC, (e) the microarchitecture of a multi-core system consisting of an OIC, and (f) the instruction execution flow in the multi-core system using one instruction cores (MCS-OIC). The section titled "4. Experimental results and discussion" presents power, area, register, and logical element estimates for the OIC, and power and area estimates for the MCS-OIC. The section titled "5. Performance implications in MCS-OIC" presents performance implications at the instruction and application levels. The section titled "6. Yield analysis for MCS-OIC" presents yield estimates for the proposed MCS-OIC. The section titled "7. Reliability analysis of 32-bit OIC" presents the reliability modelling of the OIC and its estimates in the different operational modes. The conclusion of the chapter is presented in the section titled "8. Conclusion", and the relevant references are cited in the section titled "References".


3. An overview on 32-bit one instruction core

A 32-bit OIC [17] is designed to provide fault tolerance for the 32-bit integer instructions of the conventional MIPS cores in a multi-core system. The OIC is an integer processor. The terms "32-bit OIC" and "OIC" are used interchangeably in this chapter. The OIC executes only one instruction, namely, "subleq – subtract if less than or equal to zero". The OIC has three conventional subtractors and an additional self-checking subtractor. A conventional core that detects a fault in one of its functional units (i.e., the ALU) sends the opcode with operands to the OIC. In this chapter, the OIC is designed to support the instruction set of the 32-bit MIPS core; however, it can be designed to support the 32-bit x86/ARM instruction sets by making the necessary changes in the instruction decoder. The OIC emulates the instruction by repeatedly executing the subleq instruction in a predetermined manner. There are four modes of operation in the OIC: (a) baseline mode, (b) DMR mode, (c) TMR mode, and (d) TMR + self-checking subtractor (SCS), or TMR + SCS, mode. TMR + SCS is the "high resilience mode" of the OIC. Baseline mode is invoked only when soft error detection and correction alone are required.

3.1 One instruction set

“Subleq – subtract if less than or equal” is the only instruction executed by the OIC. The syntactic construct of the subleq instruction is given below.

Subleq A, B, C; Mem [B] = Mem [B] – Mem [A]

; If (Mem [B] ≤ 0) go to C;

It is interpreted as: "subtract the value at memory location A from the value at memory location B; store the result at memory location B; if the value at memory location B is less than or equal to zero, then jump to C." The subleq instruction is Turing complete: the instruction set of a core or processor is said to be Turing complete if, in principle, it can perform any calculation that any other programmable computer can. As an illustration, the equivalent synthesized subleq instructions for the ADD, INC, MOV, DEC, and RSB (reverse subtract) instructions are given in Table 1.

ADD a, a, b: 1. Subleq a, z, 2; 2. Subleq z, b, 3; 3. Subleq z, z, 4; 4. ret
INC a: 1. Subleq One, z, 2; 2. Subleq z, a, 3; 3. Subleq z, z, 4; 4. ret
MOV a, b: 1. Subleq a, a, 2; 2. Subleq b, z, 3; 3. Subleq z, a, 4; 4. Subleq z, z, 5; 5. ret
RSB b, a, b: 1. Subleq a, b, 2; 2. ret
DEC a: 1. Subleq one, a; 2. ret

Table 1.

Sequence of synthesized subleq instructions.
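To make the synthesis in Table 1 concrete, the short Python sketch below emulates ADD with the three subleq steps of Table 1. It is an illustrative software model of the emulation scheme, not the OIC's RTL; the dictionary-based memory and the scratch cell named z are assumptions of the sketch.

# Minimal subleq model (Python sketch; the OIC itself is hardware, not software).
def subleq(mem, a, b):
    """Mem[b] = Mem[b] - Mem[a]; report whether the branch to C would be taken."""
    mem[b] -= mem[a]
    return mem[b] <= 0  # branch condition of subleq

def emulate_add(mem, a, b, z="z"):
    """ADD synthesized as in Table 1: leaves Mem[a] + Mem[b] in Mem[b]."""
    subleq(mem, a, z)  # step 1: z = -Mem[a]
    subleq(mem, z, b)  # step 2: Mem[b] = Mem[b] + Mem[a]
    subleq(mem, z, z)  # step 3: z = 0 again, ready for the next emulation

mem = {"a": 3, "b": 4, "z": 0}
emulate_add(mem, "a", "b")
assert mem == {"a": 3, "b": 7, "z": 0}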

3.2 Modes of operation

The OIC operates in four modes, as mentioned above: (a) baseline mode, (b) DMR mode, (c) TMR mode, and (d) TMR + self-checking subtractor (SCS), or TMR + SCS, mode.

  1. Baseline mode: In this mode, only the self-checking subtractor is operational. The results from the subtractor are verified by the self-checker. If the results differ, the subtraction operation is repeated to correct transient faults; transient faults are thus detected and corrected in this mode. If the results do not match again, a permanent fault is detected.

  2. DMR mode: In this mode, only two subtractors are operational. The results of the two subtractors are compared using a comparator. If the results differ, the subtraction operation is repeated to correct transient faults, which are thereby detected and corrected in this mode. If one of the two subtractors fails, a permanent fault is detected, and the OIC switches to baseline mode.

  3. TMR mode: In this mode, all three subtractors are operational. The results from the three subtractors are compared using three comparators. The voter checks the results from the comparators and performs majority voting. To correct transient faults, the operations are repeated. In this mode, the results from the redundant subtractors are fed back on special interconnects to the inputs of the multiplexers. If any one subtractor fails, the faulty subtractor is disabled and the OIC switches to DMR mode. It is assumed that two subtractors do not fail simultaneously. The occurrence of one permanent fault is detected and tolerated in this mode.

  4. TMR + SCS mode: TMR + SCS mode is the initial mode of operation of the OIC. In this mode, all three subtractors and the SCS are operational, and both permanent and transient faults are detected and corrected. The results of the three subtractors and the SCS are compared. If the results differ, the entire operation is repeated to correct transient faults. If the results continue to differ, the OIC switches to TMR mode. (A behavioural sketch of this retry-then-degrade policy follows the list.)
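The four modes amount to a small degradation state machine: retry once to mask a transient fault, and treat a persisting mismatch as a permanent fault that triggers a mode switch. The following Python sketch models that policy under the chapter's assumptions; the single-retry count and the function names are illustrative, and this is not the control-path RTL.

# Behavioural sketch of the OIC mode-degradation policy (assumed single retry).
DEGRADE = {"TMR+SCS": "TMR", "TMR": "DMR", "DMR": "BASELINE"}

def execute_step(compute, mode):
    """Run one emulated subtraction step; 'compute' returns one result per
    active subtractor. Retry once on mismatch (transient fault); on a second
    mismatch, degrade to the next mode (permanent fault)."""
    results = compute()
    if len(set(results)) == 1:
        return results[0], mode          # all redundant results agree
    results = compute()                  # repeat to mask a transient fault
    if len(set(results)) == 1:
        return results[0], mode          # transient fault corrected
    next_mode = DEGRADE.get(mode)        # persistent mismatch: degrade
    if next_mode is None:
        raise RuntimeError("permanent fault in baseline mode")
    majority = max(set(results), key=results.count)  # majority vote where possible
    return majority, next_mode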

3.3 Micro-architecture of OIC

The micro-architecture of the OIC is given in Figure 2 and can be divided into two parts: the control unit and the data-path unit. The control unit consists of a 12-bit program counter (PC), an instruction decoder, a 12-bit control word register, and the control word memory. The control memory is safeguarded by (12, 4) Hamming codes [18]; all single-bit errors are detected and corrected by the Hamming codes. The data-path unit consists of four multiplexers, one demultiplexer, three subtractors, one self-checking subtractor (SCS), three comparators, and one voter unit. Normally, the register files occupy a large die area in a core and are exposed to high-energy particles. In the spheres of replication, the register files also have high access latency and power overhead due to their protection by ECC. The OIC does not have large register files that could propagate transient faults or soft errors to other subsystems; it uses very few registers. Once the operands from the faulty core are admitted, they are stored in these registers. The results computed by the subtractors are compared and fed back on separate interconnect lines to the respective multiplexers. Intermediate results are not stored in the registers.

Figure 2.

Control unit and data path unit of 32-bit OIC.

3.4 Microarchitecture and instruction execution flow in MCS-OIC

A multi-core system comprising one 32-bit MIPS core and one 32-bit OIC, occupying the upper half and lower half of the micro-architecture respectively, is shown in Figure 3. The MIPS core is a five-stage pipelined scalar processor. Instruction fetch (IF), instruction decode (ID), execution (EXE), memory access (MEM), and write back (WB) are the five stages of the MIPS pipeline. IF/ID, ID/EXE, EXE/MEM, and MEM/WB are the pipeline registers. PC is the program counter, and LMD, Imm, A, B, IR, NPC, Aluoutput, and Cond are temporary registers that hold state values between the clock cycles of one instruction. The fault detection logic (FDL) detects faults in all arithmetic instructions (except logical instructions) by concurrently executing them. The results of ID/EXE.Aluoutput and the FDL are compared to detect a fault. If a fault is found, the pipeline is stalled. The IF/ID.opcode (in IR) and the operands ID/EXE.A and ID/EXE.B are transferred to the OIC as shown in Figure 4. The IF/ID.opcode is decoded and, concurrently, the ID/EXE.A and ID/EXE.B values are loaded into the OIC registers (X and Y). The OIC.PC is initialized and, simultaneously, the first control word from the control memory is loaded into the control word register. During every clock cycle, the control bits from the control word register are sent to the selection lines of the multiplexers that control the input lines to the subtractors, and a subtraction is performed to emulate the instruction sent from the MIPS core. Finally, the computed result is loaded into MEM/WB.Aluoutput and MIPS pipeline operation resumes. The sequence of events from fault detection to the result being loaded into the MEM/WB.Aluoutput register of the MIPS core is shown in Figure 4.

Figure 3.

Multi-core system consisting of one 32-bit MIPS core and one 32-bit OIC.

Figure 4.

Sequence of events from fault detection to loading of results into the MEM/WB.Aluoutput register of the MIPS core.
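The sequence in Figure 4 can be condensed into a small control-flow sketch. The Python below is a behavioural illustration only: cycle-level timing is not modelled, and the oic.emulate interface is a hypothetical stand-in for the opcode/operand transfer described above.

# Hedged sketch of the MIPS-to-OIC handshake of Figure 4 (control flow only).
def execute_stage(alu_result, fdl_result, opcode, op_a, op_b, oic):
    if alu_result == fdl_result:
        return alu_result                 # no fault: pipeline continues normally
    # Mismatch: stall the pipeline and hand the instruction to the OIC
    # (opcode plus ID/EXE.A and ID/EXE.B; three transfer cycles per Section 5).
    result = oic.emulate(opcode, op_a, op_b)  # repeated subleq steps on the OIC
    return result                         # written to MEM/WB.Aluoutput; resume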


4. Experimental results and discussion

The micro-architecture of the OIC is implemented using Verilog HDL and synthesized on ASIC and FPGA platforms to estimate the hardware parameters (area, critical path delay, leakage power, dynamic power) and the numbers of logical elements and registers respectively. Section 4.1 compares the area, power, registers, and number of logical elements of the OIC with the URISC approach proposed in [19] and URISC++ proposed in [20]. Notably, URISC/URISC++ also implement a one instruction set. URISC/URISC++ is a co-processor for TigerMIPS that emulates instructions through the execution of the subleq instruction. TigerMIPS performs static code insertion on both control flow and data flow invariants so as to detect faults by performing repeated executions of subleq within the co-processor. A comparative analysis of the hardware parameters for the different modes of the OIC is given in Section 4.2.

ASIC simulation: The OIC shown in Figure 2 and the multi-core system shown in Figure 3 have been implemented using Verilog HDL and then synthesized in Cadence Encounter (R) RTL Compiler RC14.28 –V14.20 (Cadence Design Systems 2004) using the TSMC 90 nm technology library (tcbn90lphptc 150). The area, power (dynamic, leakage, net, internal), and critical path delay are estimated for the OIC and tabulated in Table 2.

Block name | Area (μm²) | Leakage power (nW) | Internal (nW) | Net (nW) | Dynamic power (nW) | Critical path delay (ps)
Control path | 590 | 39.87 | 79,498.48 | 21,881.40 | 101,379.88 | –
Control path + data path | 8122 | 704.08 | 1,051,631.88 | 346,487.45 | 1,398,115.34 | 8608
Sub blocks:
Subtractor | 581 | 67.98 | 41,676.83 | 6711 | 48,387.83 | –
Comparator | 615 | 67.04 | 42,457.83 | 9954.38 | 52,411.44 | –

Table 2.

Implementation results of the 32-bit OIC using 90 nm TSMC technology.

FPGA synthesis: The OIC is synthesized using Quartus Prime targeting the Cyclone IVE device EP4CE115FE29C7, and the results are illustrated in Tables 3 and 4.

(A) Blocks | Logical elements | Dedicated registers
OIC (TMR + SCS) | 530 | 160
Subtractor (1) | 33 | –
Comparator (1) | 43 | –

(B) Modes | Logical elements
Baseline | 100
DMR | 303
TMR | 486

Table 3.

FPGA synthesis results for OIC.

Cores | Logical elements | Dedicated registers
OIC | 530 | 160
URISC | 15,019 | 5232
URISC++ | 15,081 | 5233

Table 4.

FPGA synthesis results comparison.

Leakage power and dynamic power: The power dissipation shown in Table 2 is understood as the sum of dynamic power and static power (or cell leakage). Static power is consumed when gates are not switching; it is caused by current flowing through transistors when they are turned off and is proportional to the size of the circuit. Dynamic power is the sum of net switching power and cell internal power. The net switching power is the power dissipated in the interconnects and the gate capacitance. The cell internal power is the power consumed within a cell by charging and discharging the cell's internal capacitances. The total power is the sum of the dynamic power and the leakage power.
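These relationships can be checked directly against the "control path + data path" row of Table 2; the short Python sketch below does so (values in nW, with agreement up to rounding in the published figures).

# Power identities from the text, checked on the control + data path row of Table 2.
internal, net, leakage = 1_051_631.88, 346_487.45, 704.08  # nW
dynamic = internal + net        # ~1,398,119 nW, matching Table 2 up to rounding
total = dynamic + leakage
print(f"dynamic ~ {dynamic / 1e6:.2f} mW, total ~ {total / 1e6:.2f} mW")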

Multi2sim (version 5.0): Multi2sim supports emulation of 32-bit MIPS/ARM binaries and simulation of 32-bit x86 architectures. It performs both functional and timing simulation. The performance loss is estimated for compute-intensive and memory-intensive micro-benchmarks using the Multi2sim simulator. The performance losses for the micro-benchmarks listed in Table 6 are illustrated in Figures 6–11.

4.1 Comparative analysis: power, area, registers and logical elements

With a critical path delay of 8608 ps, the operating frequency of the circuit is 115 MHz with a power supply of 1.2 V. The OIC is a low power core consuming 1.39 mW, with a die area of 8122 μm². The die area of the conventional MIPS core is 98,558 μm², which is 14.2× larger than the OIC core. The MIPS core consumes a total power of 1.153 W while the 32-bit OIC consumes 1.39 mW, a difference of about three orders of magnitude. The registers in the OIC are the PC and the temporary registers that hold the operands; they are not designed and managed as a register file. Tables 3 and 4 provide the register and logical element counts for the OIC and URISC++. The number of logical elements in the OIC is 3.51% and 3.52% of the logical elements in URISC and URISC++ respectively, and the number of registers in the OIC is 3.05% of that in URISC++. URISC++ adds 62 logical elements and one additional register to the architecture of URISC. The logical elements in URISC++ consume 6.6 mW. URISC++ has 650 registers, or 14.3% of the registers in TigerMIPS, organized as two large register files. URISC++ altogether consumes 1.96 W. Thus, the OIC consumes far less power than URISC++.

4.2 Comparative analysis: four modes of OIC

The critical path delay, area, dynamic power, and leakage power for the four modes of the OIC, namely baseline, DMR, TMR, and TMR + SCS, are normalized to the baseline mode and shown in Figure 5. The area overhead of TMR + SCS mode over the baseline is 68.43%, that of TMR mode is 65.37%, and that of DMR mode is 51.4%. The comparators and subtractors occupy 22.71% and 28.6% of the TMR + SCS mode area respectively. The size of the voter is negligible in TMR + SCS and TMR modes. A 10% increase in critical path delay is noticed from the baseline to TMR + SCS mode. The critical path runs from the subtractor input to the comparator and then to the voter, passing through the select logic, and ends at an input line. The delay does not differ much between TMR mode and TMR + SCS mode.

Figure 5.

(a) Area, (b) critical path delay, (c) leakage power and (d) dynamic power (y-axis—normalized values to baseline).

Both the dynamic power and the leakage power of TMR mode and DMR mode increase significantly due to the redundant subtractors and comparators that are not present in the baseline. The dynamic power overhead of TMR mode and DMR mode is 60% and 73% over the baseline, and 75% for TMR + SCS mode. The static or leakage power is proportional to the size of the circuit: the leakage power of TMR + SCS mode is 76% more than the baseline, while those of the TMR and DMR modes are 72% and 50% more respectively. From Table 3, which depicts the FPGA synthesis results, it is observed that the numbers of logical elements in TMR mode and DMR mode are 79% and 66% more than the baseline. From Tables 2 and 3, it is observed that TMR mode with the additional self-checking subtractor (i.e., TMR + SCS mode) costs more than the baseline, but the TMR + SCS OIC will still be a suitable fault-tolerant core for a low power embedded system.

4.3 Power and area estimation for MCS-OIC

The area and power of the micro-architecture of the multi-core system (one MIPS core with one OIC) shown in Figure 3 are estimated using ASIC simulation. The multi-core system occupies a total area of 306,283 μm² and consumes a total power of 1.1554 W. The FDL occupies an area of 6203 μm², which is 2% of the total area of the system. The OIC occupies an area of 8122 μm², which is 2.6% of the total area. The FDL consumes 1.2 mW and the OIC consumes 1.4 mW, which are negligible compared to the total power. Redundancy-based core-level fault mitigation techniques such as Slipstream [21], the dynamic core coupling (DCC) approach proposed in [22], configurable isolation [23], and Reunion, a fingerprinting technique proposed by Smolens et al. [24], have nearly 100% area overhead and an obviously larger power overhead.


5. Performance implications in MCS-OIC

For every instruction emulated on the OIC, an additional three clock cycles are needed to transfer the opcode and operands, and two clock cycles are needed to resume the pipeline in the MIPS processor. The two terms defined below capture the latency incurred in instruction execution and are used in the following subsection.

5.1 Performance overhead at instruction level

Definitions: (a) The instruction execution time by emulation (IETE) is defined as the number of cycles needed to execute the instruction on the OIC. (b) The total execution time (TET) is defined as the sum of the IETE and the time (in clock cycles) to transfer the opcode and operands (from MIPS to OIC) and the result (from OIC to MIPS). In other words, it is the time in clock cycles between the pipeline stall and the resumption of the pipeline. The TET and IETE for instructions are tabulated in Table 5.

Instruction | IETE | TET | Clock cycles in MIPS/LEON 2FT/3FT
ADD | 4 | 9 | 1
MOV | 5 | 10 | 1
INC | 4 | 9 | 1
DEC | 1 | 5 | 1
SUB | 1 | 5 | 1
MUL | 7 (per iteration) | Min 12 | 6
DIV | 5 (per iteration) | Min 10 | 34

Table 5.

IETE and TET for instructions.
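The TET column follows from the fixed handshake cost stated above: three cycles to ship the opcode and operands plus two cycles to resume the pipeline, on top of the IETE. A minimal sketch of this model is given below; it reproduces the ADD, MOV, INC, MUL, and DIV rows of Table 5 (the DEC and SUB rows do not follow the five-cycle pattern and are left out of the checks).

# TET = IETE + transfer (3 cycles) + pipeline resume (2 cycles), per Section 5.
TRANSFER, RESUME = 3, 2

def tet(iete):
    return iete + TRANSFER + RESUME

assert tet(4) == 9    # ADD and INC rows of Table 5
assert tet(5) == 10   # MOV row, and the minimum TET for DIV (5 per iteration)
assert tet(7) == 12   # minimum TET for MUL (7 per iteration)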

5.2 Performance overhead at application level

In the previous section, the performance loss at the instruction level caused by transferring operands to the OIC and results back to the host core was discussed. This causes a cumulative loss in application performance, which is discussed in this section. The OIC supports the 32-bit ISA of the MIPS R3000/4000 processor operating at a frequency of 250 MHz, while the OIC itself operates at 115 MHz, thereby incurring a performance loss while emulating instructions from a faulty functional unit in the MIPS core. Multi2sim, a cycle-accurate simulator, together with a cross compiler (mips-linux-gnu-gcc/mips-unknown-gnu-gcc), is used to estimate the simulated execution time for a set of micro-benchmarks. Only the emulation of arithmetic instructions on the OIC is considered for estimating the performance loss, as they constitute nearly 60% of the total instructions in integer application programs. The compute-intensive and memory-intensive micro-benchmark programs considered are listed in Table 6.

S. no | Micro-benchmark | CPU/memory intensive | Input form | Input size
1 | Matrix multiplication (single/multithreaded) | CPU | Matrix | [10 × 10], [100 × 100], [1000 × 1000] elements
2 | Binary search (single/multithreaded) | Memory | Array | 3000, 30,000, 300,000 elements
3 | Sieve of Eratosthenes | CPU | Array | 1000, 10,000, 100,000 prime number limit
4 | CPU scheduling | CPU | Array | 1000, 10,000, 100,000 processes
5 | Quicksort (recursion) | Memory | Array | Sorted 100, 1000, 10,000 elements for worst case analysis
6 | Radix sort | Memory | Array | 1000, 10,000, 100,000 elements

Table 6.

CPU intensive and memory intensive micro-benchmarks.

5.2.1 Memory intensive and CPU intensive micro-benchmarks

The performance losses for the memory-intensive micro-benchmarks, namely binary search, quicksort (using recursion), and radix sort, are given in Figures 6–8 respectively. The performance losses for the CPU-intensive micro-benchmarks, namely matrix multiplication, CPU scheduling, and sieve of Eratosthenes, are given in Figures 9–11 respectively. The baseline indicates the simulated execution time of the micro-benchmarks with no arithmetic instructions emulated on the OIC. The performance loss is quantified for each micro-benchmark with respect to the simulated execution time of the baseline (with varying input data sets/sizes).

Figure 6.

Performance overhead in binary search by emulating ADD using subleq instruction.

Figure 7.

Performance overhead in Quicksort by emulating ADD using subleq instruction.

Figure 8.

Performance overhead in Radix sort by emulating ADD and DIV using subleq instruction.

Figure 9.

Performance overhead in matrix multiplication by emulating ADD and MUL using subleq instruction.

Figure 10.

Performance overhead in CPU scheduling by emulating ADD and SUB using subleq instruction.

Figure 11.

Performance overhead in Sieve of Eratosthenes by emulating MUL and ADD using subleq instruction.

As shown in Figure 6, binary search with emulation of ADD instructions incurs performance losses of 1.77×, 3.59×, and 4.59× for input sizes of 3000, 30,000, and 300,000 respectively, when compared to the baselines. A significant proportion of the ADD instructions is associated with incrementing or decrementing counters and effective addresses. The OIC does not fetch operands or store results directly to main memory, and main memory latency is not taken into account in the performance loss estimation. The number of ADD instructions executed in the algorithmic phase of the program does not increase exponentially with the input data set; hence, the performance loss impact is minimal in the algorithmic phase and higher during fetching and storing of the input data sets. For multithreaded binary search, a multi-core setup consisting of two cores, core-0 and core-1, each with a single thread, is used to estimate the performance loss. The performance loss is similar to that of single-threaded binary search, because the majority of the ADD instructions are associated with LOAD and STORE instructions.

Quicksort (with emulation of the ADD instruction), implemented using recursion on sorted data elements (worst case analysis), incurs performance losses of 3.85×, 6.31×, and 6.99× for data sizes of 100, 1000, and 10,000 respectively, as shown in Figure 7. For the best case analysis of quicksort with 10,000 elements, the performance loss reduces to 1.008×. Due to recursion, the majority of ADD instructions are associated with LOAD/STORE instructions. In radix sort, ADD instructions occur more often than DIV instructions; since it is a memory-intensive method of sorting, a large number of ADD instructions is used to increment counters and constants associated with LOAD/STORE instructions. The performance loss due to emulation of ADD instructions in radix sort is 2.45×, 4.79×, and 5.96× for input sizes of 1000, 10,000, and 100,000, as shown in Figure 8. For DIV instructions, the performance loss is 1.4×, 2.05×, and 2.37× for the same input sizes.

As shown in Figure 9, matrix multiplication with emulation of ADD and MUL instructions executing in the algorithmic phase of the program, incurs a performance loss of <1.56×, 4.09×, 4.0×> (for ADD) and <1.632×, 7.62×, 7.99×> (for MUL), for input matrix sizes of 10 × 10, 100 × 100, and 1000 × 1000 respectively. In CPU scheduling, ADD and SUB instructions emulation incur a performance loss of <2.45×, 4.79×, 5.96×> and <1.4×, 2.05×, 2.3×> for input data set of 1000, 10,000 and 100,000 processes respectively as shown in Figure 10. In sieve of Eratosthenes, emulation of MUL and ADD instructions incur a performance loss of <1.89×, 5.03×, 7.63×> and <1.48×, 2.9×, 3.8×> for input data set size of 1000, 10,000 and 100,000 respectively as shown in Figure 11.

For multithreaded matrix multiplication, a multi-core configuration consisting of two cores, core-0 and core-1, each with a single thread, is considered. The ADD and MUL instructions of core-0 and core-1 are emulated on a single OIC due to failures in their adder and multiplier units respectively. The performance loss is estimated as 2.04×, 10.07×, and 10.99× for matrix sizes of 10 × 10, 100 × 100, and 1000 × 1000 respectively, as shown in Figure 9. Since simultaneous access to the single OIC from two cores is not permitted, the performance loss includes the waiting time between subsequent ADD and MUL instructions emanating from core-0 and core-1; the waiting time alone accounts for more than 45% of the performance loss. In this configuration of two MIPS cores with a single OIC, the OIC bears the brunt of functional unit failures in both cores. An additional OIC would bring down the performance loss by 1.5× (for a matrix size of 10 × 10) and 7× (for matrix sizes of 100 × 100/1000 × 1000) and eliminate the need for instructions to wait for execution on the OIC. On a 1:1 or 1:N basis, i.e., one MIPS core with one or more OICs, the design can scale to 100 MIPS cores with 100 or more OICs.

It may be noticed that the performance loss does not vary when the OIC changes mode from TMR + SCS to TMR, or from TMR to DMR, as the number of instructions executed remains the same.


6. Yield analysis for MCS-OIC

This section examines the effect of the fault tolerance provided in MCS-OIC on yield. As discussed in the section presenting the design of the OIC, it is assumed that two subtractors do not fail simultaneously. In the TMR + SCS, TMR, and DMR modes, the OIC repeats the instruction execution if the results differ, in order to mask transient faults. The spatial and temporal redundancy against permanent and transient faults makes the OIC defect tolerant. The arithmetic logic unit in MIPS is protected by the functional support provided by the OIC; the remaining portions of MIPS are hardened and protected by ECC. The die yield for the proposed configurations of MCS-OIC is estimated using the equations presented below.

6.1 Terms and parameters

  1. Original die: It is the die consisting of MIPS cores only.

  2. Fault tolerant die: It is the die consisting of MIPS cores and OICs.

  3. Regular dies per wafer: It is the number of original dies per wafer, estimated using Eq. (1).

    \[ \text{Regular dies per wafer} = \frac{\pi\,(\text{diameter}/2)^{2}}{\text{Area}} - \frac{\pi \times \text{diameter}}{\sqrt{2 \times \text{Area}}} \tag{1} \]

    where diameter refers to the diameter of the wafer and Area refers to the area of the die.

  4. Die yield: Ignoring full wafer damage, the yield for a single die is approximated using the negative binomial approximation given in Eq. (2).

    \[ \text{Die yield} = \left(1 + \frac{\text{defect density} \times \text{Area}}{cp}\right)^{-cp} \tag{2} \]

    where cp denotes the cluster parameter (manufacturing complexity) and defect density denotes the number of defects per unit area.

  5. Regular working dies per wafer: It is the die yield times the regular dies per wafer, estimated using Eq. (3).

    \[ \text{Regular working dies per wafer} = \left(1 + \frac{\text{defect density} \times \text{Area}}{cp}\right)^{-cp} \times \left[ \frac{\pi\,(\text{diameter}/2)^{2}}{\text{Area}} - \frac{\pi \times \text{diameter}}{\sqrt{2 \times \text{Area}}} \right] \tag{3} \]

  6. Regular fault tolerant dies per wafer:

    The area of the fault tolerant die is expressed as the sum of the area of the original die and the area of the OICs. If the area of the OICs is expressed as δ (0 < δ < 1) times the area of the original design, then (1 + δ) × Area of the original design denotes the area of the fault tolerant die. By substituting (1 + δ) × Area into Eq. (1), the number of regular fault tolerant dies per wafer can be estimated, as given in Eq. (4).

    \[ \text{Regular fault tolerant dies per wafer} = \frac{\pi\,(\text{diameter}/2)^{2}}{(1+\delta)\,\text{Area}} - \frac{\pi \times \text{diameter}}{\sqrt{2\,(1+\delta)\,\text{Area}}} \tag{4} \]

  7. Regular working fault tolerant dies per wafer: It is the die yield times the regular fault tolerant dies per wafer, estimated using Eq. (5).

\[ \text{Regular working fault tolerant dies per wafer} = \left(1 + \frac{\text{defect density} \times (1+\delta)\,\text{Area}}{cp}\right)^{-cp} \times \left[ \frac{\pi\,(\text{diameter}/2)^{2}}{(1+\delta)\,\text{Area}} - \frac{\pi \times \text{diameter}}{\sqrt{2\,(1+\delta)\,\text{Area}}} \right] \tag{5} \]
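A minimal Python sketch of Eqs. (1)–(5) is given below. Wafer diameter and die area must share consistent units (e.g., mm and mm²) and defect density must be expressed per the same area unit; the parameter values in the example call are illustrative, not the exact ones behind Tables 7–10.

import math

def dies_per_wafer(diameter, area):
    """Eq. (1): gross dies on a circular wafer minus the edge-loss term."""
    return math.pi * (diameter / 2) ** 2 / area - math.pi * diameter / math.sqrt(2 * area)

def die_yield(defect_density, area, cp):
    """Eq. (2): negative binomial yield approximation."""
    return (1 + defect_density * area / cp) ** (-cp)

def working_ft_dies(diameter, area, delta, defect_density, cp):
    """Eq. (5): yield times fault tolerant dies, with die area grown by (1 + delta)."""
    ft_area = (1 + delta) * area
    return die_yield(defect_density, ft_area, cp) * dies_per_wafer(diameter, ft_area)

# Illustrative call: 300 mm wafer, 0.3 mm^2 die, 2% OIC area overhead, cp = 4.0.
print(working_ft_dies(diameter=300, area=0.3, delta=0.02, defect_density=0.001, cp=4.0))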

6.2 Parametric evaluation and discussion

The die yields for the original die and the fault tolerant die, estimated for one MIPS core with one/two/four OICs, two MIPS cores with one/two/four OICs, four MIPS cores with one/two/four OICs, and eight MIPS cores with one/two/four/six OICs, are tabulated in Tables 7–10 respectively. The defect density is varied over 9.5, 5.0, and 1.0, and the wafer diameter over 300 mm, 200 mm, and 100 mm, to estimate the die yields of the original die and the fault tolerant die. The cluster parameter cp is fixed at 4.0. The die yields of the original die at defect densities 1.0, 5.0, and 9.5 are 0.9971, 0.9855, and 0.9727 respectively. The die yields for three fault tolerant dies, each consisting of one MIPS core, the first with one OIC, the second with two OICs, and the third with four OICs, on a 300 mm wafer with defect density 1.0 are <0.9970/0.9969/0.9967> respectively, as shown in Table 7, which is slightly less than the yield of the original die. The average of the differences between the yield of the original die and the fault tolerant dies at defect density 1.0 is 0.0002, a negligible value. Similarly, the averages of the differences at defect densities 5.0 and 9.5 are 0.0009 and 0.0017 respectively. It is observed that an increase in the defect density decreases the yield.

Wafer diameter | 100 mm | 200 mm | 300 mm
Defect density | 9.5 / 5.0 / 1.0 | 9.5 / 5.0 / 1.0 | 9.5 / 5.0 / 1.0
Number of regular dies per wafer | 26,489 / 26,489 / 26,489 | 106,781 / 106,781 / 106,781 | 240,876 / 240,876 / 240,876
Die yield of original die | 0.9727 / 0.9855 / 0.9971 | 0.9727 / 0.9855 / 0.9971 | 0.9727 / 0.9855 / 0.9971
Number of regular working dies per wafer | 25,767 / 26,106 / 26,412 | 103,870 / 105,237 / 106,470 | 234,309 / 237,391 / 240,174
Number of regular fault tolerant dies per wafer (one OIC) | 25,768 / 25,768 / 25,768 | 103,884 / 103,884 / 103,884 | 234,347 / 234,347 / 234,347
Number of regular fault tolerant dies per wafer (two OICs) | 25,084 / 25,084 / 25,084 | 101,139 / 101,139 / 101,139 | 228,163 / 228,163 / 228,163
Number of regular fault tolerant dies per wafer (four OICs) | 23,820 / 23,820 / 23,820 | 96,061 / 96,061 / 96,061 | 216,723 / 216,723 / 216,723
Die yield of fault tolerant die (one OIC) | 0.9719 / 0.9851 / 0.9970 | 0.9719 / 0.9851 / 0.9970 | 0.9719 / 0.9851 / 0.9970
Die yield of fault tolerant die (two OICs) | 0.9712 / 0.9847 / 0.9969 | 0.9712 / 0.9847 / 0.9969 | 0.9712 / 0.9847 / 0.9969
Die yield of fault tolerant die (four OICs) | 0.9697 / 0.9839 / 0.9967 | 0.9697 / 0.9839 / 0.9967 | 0.9697 / 0.9839 / 0.9967
Number of regular working fault tolerant dies per wafer (one OIC) | 25,046 / 25,385 / 25,691 | 100,974 / 102,340 / 103,573 | 227,784 / 230,864 / 233,645
Number of regular working fault tolerant dies per wafer (two OICs) | 24,363 / 24,701 / 25,007 | 98,231 / 99,595 / 100,828 | 221,603 / 224,681 / 227,461
Number of regular working fault tolerant dies per wafer (four OICs) | 23,100 / 23,437 / 23,743 | 93,157 / 94,518 / 95,750 | 210,170 / 213,243 / 216,021

Table 7.

Die yield for fault tolerant die consisting of one MIPS core with one/two/four OICs.

Wafer diameter | 100 mm | 200 mm | 300 mm
Defect density | 9.5 / 5.0 / 1.0 | 9.5 / 5.0 / 1.0 | 9.5 / 5.0 / 1.0
Number of regular dies per wafer | 13,159 / 13,159 / 13,159 | 53,220 / 53,220 / 53,220 | 120,182 / 120,182 / 120,182
Die yield of original die | 0.9463 / 0.9713 / 0.9942 | 0.9463 / 0.9713 / 0.9942 | 0.9463 / 0.9713 / 0.9942
Number of regular working dies per wafer | 12,454 / 12,782 / 13,082 | 50,368 / 51,694 / 52,911 | 113,740 / 116,736 / 119,484
Number of regular fault tolerant dies per wafer (one OIC) | 12,977 / 12,977 / 12,977 | 52,487 / 52,487 / 52,487 | 118,529 / 118,529 / 118,529
Number of regular fault tolerant dies per wafer (two OICs) | 12,494 / 12,494 / 12,494 | 50,544 / 50,544 / 50,544 | 114,150 / 114,150 / 114,150
Number of regular fault tolerant dies per wafer (four OICs) | 12,459 / 12,459 / 12,459 | 50,403 / 50,403 / 50,403 | 113,833 / 113,833 / 113,833
Die yield of fault tolerant die (one OIC) | 0.9456 / 0.9709 / 0.9941 | 0.9456 / 0.9709 / 0.9941 | 0.9456 / 0.9709 / 0.9941
Die yield of fault tolerant die (two OICs) | 0.9449 / 0.9705 / 0.9940 | 0.9449 / 0.9705 / 0.9940 | 0.9449 / 0.9705 / 0.9940
Die yield of fault tolerant die (four OICs) | 0.9428 / 0.9693 / 0.9937 | 0.9428 / 0.9693 / 0.9937 | 0.9428 / 0.9693 / 0.9937
Number of regular working fault tolerant dies per wafer (one OIC) | 12,272 / 12,600 / 12,900 | 49,636 / 50,962 / 52,177 | 112,091 / 115,085 / 117,830
Number of regular working fault tolerant dies per wafer (two OICs) | 12,095 / 12,423 / 12,723 | 48,924 / 50,249 / 51,464 | 110,486 / 113,478 / 116,222
Number of regular working fault tolerant dies per wafer (four OICs) | 11,592 / 11,919 / 12,219 | 46,900 / 48,222 / 49,435 | 105,923 / 108,908 / 111,650

Table 8.

Die yield for fault tolerant die consisting of two MIPS cores with one/two/four OICs.

Wafer diameter | 100 mm | 200 mm | 300 mm
Defect density | 9.5 / 5.0 / 1.0 | 9.5 / 5.0 / 1.0 | 9.5 / 5.0 / 1.0
Number of regular dies per wafer | 6519 / 6519 / 6519 | 26,489 / 26,489 / 26,489 | 59,910 / 59,910 / 59,910
Die yield of original die | 0.8963 / 0.9436 / 0.9884 | 0.8963 / 0.9436 / 0.9884 | 0.8963 / 0.9436 / 0.9884
Number of regular working dies per wafer | 5843 / 6152 / 6444 | 23,744 / 24,997 / 26,182 | 53,700 / 56,536 / 59,216
Number of regular fault tolerant dies per wafer (one OIC) | 6474 / 6474 / 6474 | 26,305 / 26,305 / 26,305 | 59,495 / 59,495 / 59,495
Number of regular fault tolerant dies per wafer (two OICs) | 6428 / 6428 / 6428 | 26,124 / 26,124 / 26,124 | 59,085 / 59,085 / 59,085
Number of regular fault tolerant dies per wafer (four OICs) | 6340 / 6340 / 6340 | 25,768 / 25,768 / 25,768 | 58,282 / 58,282 / 58,282
Die yield of fault tolerant die (one OIC) | 0.8956 / 0.9433 / 0.9883 | 0.8956 / 0.9433 / 0.9883 | 0.8956 / 0.9433 / 0.9883
Die yield of fault tolerant die (two OICs) | 0.8949 / 0.9429 / 0.9882 | 0.8949 / 0.9429 / 0.9882 | 0.8949 / 0.9429 / 0.9882
Die yield of fault tolerant die (four OICs) | 0.8929 / 0.9417 / 0.9880 | 0.8929 / 0.9417 / 0.9880 | 0.8929 / 0.9417 / 0.9880
Number of regular working fault tolerant dies per wafer (one OIC) | 5798 / 6106 / 6398 | 23,561 / 24,814 / 25,998 | 53,288 / 56,122 / 58,800
Number of regular working fault tolerant dies per wafer (two OICs) | 5753 / 6062 / 6353 | 23,381 / 24,633 / 25,817 | 52,881 / 55,713 / 58,391
Number of regular working fault tolerant dies per wafer (four OICs) | 5623 / 5930 / 6221 | 22,855 / 24,104 / 25,286 | 51,694 / 54,520 / 57,195

Table 9.

Die yield for fault tolerant die consisting of four MIPS cores with one/two/four OICs.

Wafer diameter | 100 mm | 200 mm | 300 mm
Defect density | 9.5 / 5.0 / 1.0 | 9.5 / 5.0 / 1.0 | 9.5 / 5.0 / 1.0
Number of regular dies per wafer | 3217 / 3217 / 3217 | 13,159 / 13,159 / 13,159 | 29,827 / 29,827 / 29,827
Die yield of original die | 0.8057 / 0.8912 / 0.9770 | 0.8057 / 0.8912 / 0.9770 | 0.8057 / 0.8912 / 0.9770
Number of regular working dies per wafer | 2592 / 2867 / 3143 | 10,603 / 11,728 / 12,856 | 24,034 / 26,584 / 29,141
Number of regular fault tolerant dies per wafer (one OIC) | 3205 / 3205 / 3205 | 13,113 / 13,113 / 13,113 | 29,723 / 29,723 / 29,723
Number of regular fault tolerant dies per wafer (two OICs) | 3194 / 3194 / 3194 | 13,068 / 13,068 / 13,068 | 29,620 / 29,620 / 29,620
Number of regular fault tolerant dies per wafer (four OICs) | 3172 / 3172 / 3172 | 12,977 / 12,977 / 12,977 | 29,415 / 29,415 / 29,415
Number of regular fault tolerant dies per wafer (six OICs) | 3150 / 3150 / 3150 | 12,888 / 12,888 / 12,888 | 29,214 / 29,214 / 29,214
Die yield of fault tolerant die (one OIC) | 0.8051 / 0.8909 / 0.9769 | 0.8051 / 0.8909 / 0.9769 | 0.8051 / 0.8909 / 0.9769
Die yield of fault tolerant die (two OICs) | 0.8040 / 0.8902 / 0.9767 | 0.8040 / 0.8902 / 0.9767 | 0.8040 / 0.8902 / 0.9767
Die yield of fault tolerant die (four OICs) | 0.8028 / 0.8895 / 0.9765 | 0.8028 / 0.8895 / 0.9765 | 0.8028 / 0.8895 / 0.9765
Die yield of fault tolerant die (six OICs) | 0.8016 / 0.8888 / 0.9764 | 0.8016 / 0.8888 / 0.9764 | 0.8016 / 0.8888 / 0.9764
Number of regular working fault tolerant dies per wafer (one OIC) | 2581 / 2856 / 3131 | 10,559 / 11,683 / 12,810 | 23,933 / 26,481 / 29,037
Number of regular working fault tolerant dies per wafer (two OICs) | 2559 / 2833 / 3109 | 10,470 / 11,592 / 12,719 | 23,732 / 26,277 / 28,831
Number of regular working fault tolerant dies per wafer (four OICs) | 2537 / 2811 / 3087 | 10,382 / 11,503 / 12,629 | 23,535 / 26,075 / 28,628
Number of regular working fault tolerant dies per wafer (six OICs) | 2516 / 2790 / 3065 | 10,296 / 11,415 / 12,541 | 23,340 / 25,877 / 28,428

Table 10.

Die yield for fault tolerant die consisting of eight MIPS cores with one/two/four/six OICs.

The die yields of the fault tolerant dies, each consisting of two MIPS cores with <one/two/four> OICs, at defect density 1.0 are <0.9941, 0.9940, 0.9937> respectively, as shown in Table 8. The die yields of the original die at defect densities 1.0, 5.0, and 9.5 are 0.9942, 0.9713, and 0.9463, slightly higher than the yields of the fault tolerant dies. The average of the differences between the yield of the original die and the fault tolerant dies is 0.00026 at defect density 1.0, increasing to 0.0009 and 0.0018 at defect densities 5.0 and 9.5 respectively.

The die yields of the original die at defect densities 1.0, 5.0, and 9.5 are 0.9884, 0.9436, and 0.8963 respectively. From Table 9, the die yields of the fault tolerant dies, each consisting of four MIPS cores with <one/two/four> OICs, at defect density 1.0 are <0.9883, 0.9882, 0.9880> respectively. It is observed that the average of the differences between the yields of the original die and the fault tolerant dies at varying defect densities is similar to the alternatives discussed above.

From Table 10, the die yields of the fault tolerant dies, each consisting of eight MIPS cores with <one/two/four/six> OICs, at defect density 1.0 are <0.9769, 0.9767, 0.9765, 0.9764> respectively. The die yields of the original die at defect densities 1.0, 5.0, and 9.5 are 0.9770, 0.8912, and 0.8057 respectively. The average of the differences between the original die and the fault tolerant dies at defect density 9.5 is 0.0031, the highest among the averages. From these data, it is inferred that larger chips with increasing redundancy widen the gap between the yields of the original and fault tolerant dies. Thus, a trade-off exists between die yield and the fault tolerance provided by the design alternatives discussed above, whose redundancy ranges between 2% and 11%.


7. Reliability analysis of 32-bit OIC

In order to assess the endurance of the four modes of the OIC, their reliability is evaluated and compared. The reliability, denoted by R(t), is defined as the probability of survival at least until time t; it is estimated using the Weibull distribution as follows:

\[ R(t) = P(T > t) = e^{-(\lambda t)^{\beta}} \tag{6} \]

where β is the shape parameter, T denotes the lifetime, and λ denotes the failure rate of a component. Defect-induced faults occur in the early stage of the lifetime, while wear-out-induced faults increase at the tail end of the lifetime. β < 1 is used to model infant mortality, a period of growing reliability and decreasing failure rate. When β = 1, the R(t) of the Weibull distribution and the exponential distribution are identical. β > 1 is used to model wear out and the end of useful life, where the failure rate is increasing. The initial failure rate is computed using the failure rate formula:

\[ \lambda = \left(C_{1}\,\pi_{T}\,\pi_{V} + C_{2}\,\pi_{E}\right)\pi_{Q}\,\pi_{L} \tag{7} \]

here, C1 and C2 are the complexity factors, and πT, πV, πE, πQ, and πL are the temperature, voltage stress, environment, quality, and learning factors respectively. The failure rate λ is assumed to be a function of the number of logical elements in the micro-architectural components.

The reliabilities of the four modes of the OIC, given in Eqs. (8)–(11), are expressed in terms of Rselect logic(t), Rsub(t), Rsubsc(t), Rcomp(t), and Rvoter(t), which denote the reliabilities of the select logic, subtractor, SCS, comparator, and voter logic respectively.

TMR + SCS mode reliability is expressed as:

\[ R_{TMR+SCS}(t) = R_{subsc}(t)\, R_{select\ logic}(t)\, R_{comp}(t)\, R_{voter}(t) \sum_{i=2}^{4} \binom{4}{i} R_{sub}(t)^{i} \left(1 - R_{sub}(t)\right)^{4-i} \tag{8} \]

TMR mode reliability is expressed as:

\[ R_{TMR}(t) = R_{select\ logic}(t)\, R_{comp}(t)\, R_{voter}(t) \sum_{i=2}^{3} \binom{3}{i} R_{sub}(t)^{i} \left(1 - R_{sub}(t)\right)^{3-i} \tag{9} \]

DMR mode reliability is expressed as:

\[ R_{DMR}(t) = R_{select\ logic}(t)\, R_{comp}(t) \sum_{i=1}^{2} \binom{2}{i} R_{sub}(t)^{i} \left(1 - R_{sub}(t)\right)^{2-i} \tag{10} \]

Baseline mode reliability is expressed as:

\[ R_{baseline}(t) = R_{select\ logic}(t)\, R_{subsc}(t) \tag{11} \]
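Eqs. (6)–(11) can be evaluated numerically. The sketch below implements Eqs. (6) and (9) in Python; the λ values are assumptions scaled from the logical-element counts of Table 3, since the exact failure-rate constants are not reproduced here.

import math
from math import comb

def weibull_r(lam, t, beta):
    """Eq. (6): R(t) = exp(-(lambda * t)^beta)."""
    return math.exp(-((lam * t) ** beta))

def r_tmr(r_sub, r_sel, r_comp, r_voter):
    """Eq. (9): TMR survives while at least 2 of 3 subtractors work."""
    vote = sum(comb(3, i) * r_sub**i * (1 - r_sub) ** (3 - i) for i in (2, 3))
    return r_sel * r_comp * r_voter * vote

# Illustrative lambdas proportional to logical elements (33 per subtractor, Table 3).
LAM_PER_LE = 1.2e-10          # assumed scaling constant, per element-hour
r_sub = weibull_r(33 * LAM_PER_LE, t=120_000, beta=1.2)
print(r_tmr(r_sub, r_sel=0.999, r_comp=0.99, r_voter=0.999))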

The reliabilities of the TMR + SCS, TMR, DMR, and baseline modes are plotted in Figure 12 for β = 0.9 and 1.0 (the defect-induced fault phase) and in Figure 13 for β = 1.1 and 1.2 (the wear-out-induced fault phase). λ is taken as a function of the number of logical elements given in Table 3.

Figure 12.

Reliability vs. time for (a) β = 0.9 and (b) β = 1.0.

Figure 13.

Reliability vs. time for (a) β = 1.1 and (b) β = 1.2.

In all these cases, TMR + SCS mode is observed to have better failure tolerance than all the other modes. For β = 1.2, the reliabilities of TMR mode and DMR mode are less than that of TMR + SCS mode during the interval 3 × 10⁴ to 15 × 10⁴ hours, as illustrated in Figure 13. The reliability of TMR mode declines far below those of the DMR and baseline modes in the wear-out-induced fault phase because, once the reliability of a single component falls below 0.5, the redundancy in TMR no longer has any merit. In Table 11, the reliability of the subtractor goes below 0.5 at t = 180,000 h (20.5 years) and the reliability gap between TMR and DMR widens, which endorses the above argument.

t (h) | R (subtractor) | R (comparator) | R (TMR) | R (DMR)
120,000 (13.7 years) | 0.6256 | 0.6948 | 0.16931 | 0.2112
150,000 (17.12 years) | 0.5417 | 0.6226 | 0.09060 | 0.1272
180,000 (20.5 years) | 0.4663 | 0.5545 | 0.04633 | 0.07706

Table 11.

Reliabilities of components in OIC for β = 1.2.

7.1 Comparative analysis: OIC and URISC/URISC++

In this section, the reliability of the OIC is compared with that of URISC++. The Weibull reliability function, with λ as a function of the number of logical elements, is used to estimate the reliability of URISC/URISC++. The numbers of logical elements in the OIC and URISC++ are given in Table 4. In the defect-induced fault phase (β = 0.9 and β = 1.0), a drastic fall in URISC++ reliability is observed, as shown in Figures 14 and 15: the OIC maintains a reliability of 0.96, unlike URISC++, whose reliability reaches 0.87 after 210,000 hours. In the wear-out-induced fault phase, the reliability gap between the 32-bit OIC and URISC++ widens for β = 1.1 (Figure 16) after 60,000 hours (6.84 years). For β = 1.2, the reliability of the OIC eventually falls below that of URISC++ because the reliability of a single component falls below 0.5 after 23.4 years, as shown in Figure 17, and the redundancy in the OIC has no merit thereafter.

Figure 14.

β = 0.9 reliability vs. time (hours).

Figure 15.

β = 1.0 reliability vs. time (hours).

Figure 16.

β = 1.1 reliability vs. time (hours).

Figure 17.

β = 1.2 Reliability vs. time (hours).

Advertisement

8. Conclusion

  1. Power, area, and total power for the OIC and its contender URISC++ are evaluated. The OIC consumes less power and area than its contender. The register count in the OIC is significantly lower than in URISC++. It is observed that the two large register files in URISC++ consume more power, unlike the OIC, which does not maintain register files.

  2. The performance overheads at the instruction and application levels are evaluated. Based on the analysis in Section 5, performance loss is incurred in compute-intensive and memory-intensive micro-benchmarks mainly due to the MUL and DIV instructions in the programs. However, the performance loss will not be high in programs with the right mix of arithmetic instructions.

  3. In the 1:1 configuration of the multi-core system with OICs, i.e., one conventional core with one OIC, all emulation requests from the conventional core are handled by the OIC. In the 2:1 configuration (two cores and one OIC), simultaneous failures in the two conventional cores result in a higher performance loss for the application executing on the system. This performance loss can be reduced by augmenting the multi-core configuration with an additional OIC. That is, the 1:1 model proves to be a viable solution with minimal performance loss, which is validated by the simulation results presented in this chapter. On a 1:1 or 1:N basis, i.e., one MIPS core with one or more OICs, the design can scale to 100 MIPS cores with 100 or more OICs. Hence, the MCS-OIC model is a scalable design alternative.

  4. As expected, it is observed from the reliability analysis of the OIC that an increase in the number of subtractors results in higher reliability. In other words, replication of functional units significantly improves the reliability of the OIC. Hence, TMR + SCS mode has higher reliability than the other modes.

  5. The yield of the fault tolerant die is slightly less than that of the original die for all the design alternatives of MCS-OIC. It is inferred that larger chips with increasing redundancy widen the gap between the yields of the original and fault tolerant dies. Thus, a trade-off exists between die yield and the fault tolerance provided by the design alternatives discussed above, whose redundancy ranges between 2% and 11%.

  6. The reliabilities of the OIC and URISC++ are evaluated and compared. The evaluation results indicate that the OIC is more reliable than URISC++ in both the defect-induced phase and the wear-out-induced phase. This can be attributed to the level of redundancy being significantly lower in URISC++ than in the OIC.

References

  1. Borkar S. Designing reliable systems from unreliable components: The challenges of transistor variability and degradation. IEEE Micro. 2005;25(6):10-16
  2. Shivakumar P, Kistler M, Keckler SW, Burger D, Alvisi L. Modeling the effect of technology trends on the soft error rate of combinational logic. In: Proceedings of International Conference on Dependable Systems and Networks. IEEE Xplore; 2002. pp. 389-398. DOI: 10.1109/DSN.2002.1028924
  3. Feng S, Gupta S, Ansari A, Mahlke S. Shoestring: Probabilistic soft error reliability on the cheap. ACM SIGARCH Computer Architecture News. 2010;38(1):385-396
  4. Li T, Ambrose JA, Ragel R, Parameswaran S. Processor design for soft errors: Challenges and state of the art. ACM Computing Surveys. 2016;49(3):1-44
  5. Mittal S. A survey of techniques for managing and leveraging caches in GPUs. Journal of Circuits, Systems, and Computers. 2014;23(08):1430002
  6. Rusu S, Muljono H, Ayers D, Tam S, Chen W, Martin A, et al. 5.4 Ivytown: A 22 nm 15-core enterprise Xeon® processor family. In: 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). IEEE Xplore; 2014. pp. 102-103. DOI: 10.1109/ISSCC.2014.6757356
  7. Zyuban V, Taylor SA, Christensen B, Hall AR, Gonzalez CJ, Friedrich J, et al. IBM POWER7+ design for higher frequency at fixed power. IBM Journal of Research and Development. 2013;57(6):1-1
  8. Postman J, Chiang P. A survey addressing on-chip interconnect: Energy and reliability considerations. International Scholarly Research Notices. 2012;2012:1-9. Article ID: 916259. DOI: 10.5402/2012/916259
  9. Nassif SR, Mehta N, Cao Y. A resilience roadmap. In: 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010). IEEE Xplore; 2010. pp. 1011-1016. DOI: 10.1109/DATE.2010.5456958
  10. Karnik T, Tschanz J, Borkar N, Howard J, Vangal S, De V, et al. Resiliency for many-core system on a chip. In: 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE Xplore; 2014. pp. 388-389. DOI: 10.1109/ASPDAC.2014.6742921
  11. Gaisler J. A portable and fault-tolerant microprocessor based on the SPARC v8 architecture. In: Proceedings International Conference on Dependable Systems and Networks. IEEE Xplore; 2002. pp. 409-415. DOI: 10.1109/DSN.2002.1028926
  12. Lin S, Kim YB, Lombardi F. Design and performance evaluation of radiation hardened latches for nanoscale CMOS. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2010;19(7):1315-1319
  13. Slayman CW. Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations. IEEE Transactions on Device and Materials Reliability. 2005;5(3):397-404
  14. Pomeranz I, Vijaykumar TN. FaultHound: Value-locality-based soft-fault tolerance. In: Proceedings of the 42nd Annual International Symposium on Computer Architecture. ACM Digital Library; 2015. pp. 668-681. DOI: 10.1145/2749469.2750372
  15. Meaney PJ, Swaney SB, Sanda PN, Spainhower L. IBM z990 soft error detection and recovery. IEEE Transactions on Device and Materials Reliability. 2005;5(3):419-427
  16. Stackhouse B, Bhimji S, Bostak C, Bradley D, Cherkauer B, Desai J, et al. A 65 nm 2-billion transistor quad-core Itanium processor. IEEE Journal of Solid-State Circuits. 2008;44(1):18-31
  17. Venkatesha S, Parthasarathi R. 32-Bit one instruction core: A low-cost, reliable, and fault-tolerant core for multicore systems. Journal of Testing and Evaluation. 2019;47(6):3941-3962. DOI: 10.1520/JTE20180492. ISSN 0090-3973
  18. Hamming RW. Error detecting and error correcting codes. The Bell System Technical Journal. 1950;29(2):147-160
  19. Rajendiran A, Ananthanarayanan S, Patel HD, Tripunitara MV, Garg S. Reliable computing with ultra-reduced instruction set co-processors. In: DAC Design Automation Conference 2012. ACM Digital Library; 2012. pp. 697-702. DOI: 10.1145/2228360.2228485
  20. Ananthanarayan S, Garg S, Patel HD. Low-cost permanent fault detection using ultra-reduced instruction set co-processors. In: 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE Xplore; 2013. pp. 933-938. DOI: 10.7873/DATE.2013.196
  21. Sundaramoorthy K, Purser Z, Rotenberg E. Slipstream processors: Improving both performance and fault tolerance. ACM SIGPLAN Notices. 2000;35(11):257-268
  22. LaFrieda C, Ipek E, Martinez JF, Manohar R. Utilizing dynamically coupled cores to form a resilient chip multiprocessor. In: 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07). IEEE Xplore; 2007. pp. 317-326. DOI: 10.1109/DSN.2007.100
  23. Aggarwal N, Ranganathan P, Jouppi NP, Smith JE. Configurable isolation: Building high availability systems with commodity multi-core processors. ACM SIGARCH Computer Architecture News. 2007;35(2):470-481
  24. Smolens JC, Gold BT, Falsafi B, Hoe JC. Reunion: Complexity-effective multicore redundancy. In: 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06). IEEE Xplore; 2006. pp. 223-234. DOI: 10.1109/MICRO.2006.42
