Mitigating State-Drift in Memristor Crossbar Arrays for Vector Matrix Multiplication

In this chapter, we review recent progress on resistance-drift mitigation techniques for resistive switching memory devices (specifically memristors) and the impact of drift on accuracy in deep neural network applications. In the first section of the chapter, we examine the importance of soft errors and their detrimental impact on the performance of memristor-based vector-matrix multiplication (VMM) platforms, especially the memristance state-drift induced by long-term recurring inference operations with sub-threshold stress voltages, and we briefly review currently developed state-drift mitigation methods. In the next section, we discuss an adaptive inference technique with low hardware overhead that mitigates memristance drift in memristive VMM platforms by using optimization techniques to adjust the inference voltage characteristics associated with different network layers. We also present simulation results and the performance improvements achieved by applying the proposed inference technique, considering non-idealities, for various deep network applications on memristor crossbar arrays. This chapter suggests that a simple, low-overhead inference technique can revive the functionality, enhance the performance, and significantly increase the lifetime of memristor-based VMM arrays, an important step toward making this technology a mainstream player in future in-memory computing platforms.


Introduction
Designing specialized hardware accelerators has recently been a topic of great interest due to the rapid growth of machine learning and artificial intelligence [1][2][3][4][5]. Despite recent advances in complementary metal-oxide-semiconductor (CMOS) based machine learning chips that efficiently implement vector-matrix multiplication (VMM) operations, these systems are limited by the off-chip memory bottleneck [6]. To this end, co-locating memory and processing using non-volatile resistive switching memory technologies has been considered as a solution [7][8][9][10][11][12][13][14]. One such device technology is the memristor, which has risen as a promising high-speed, low-power computational alternative to traditional CMOS hardware [15][16][17][18]. Memristors can be placed in a crossbar structure to perform highly parallel multiply-accumulate (MAC) operations efficiently using Ohm's law [19][20][21][22]. Through heavy parallelization using Kirchhoff's laws, a memristor crossbar can perform MAC operations in O(1) time [23]. Crossbars are most commonly used to perform VMM by mapping an n-by-m matrix onto the memristors' conductance range, applying an input voltage vector to the rows, and then reading the currents from the appropriate columns [24]. In addition to neuromorphic applications of memristor crossbars [25][26][27][28], crossbar VMM has been shown to be capable of performing a range of tasks including image processing [29], physical unclonable functions (PUFs) [30][31][32], optimization problems [33][34][35], sparse coding [36], and solving partial differential equations [37]. There has also been much research on implementing deep neural network (DNN) accelerators using memristor crossbars, focusing on device-, algorithm-, and system-level contributions [18][19][20][38].
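As a concrete illustration of crossbar VMM, the following sketch computes the column currents I_j = Σ_i V_i G_ij that a crossbar would produce in one parallel step; the linear weight-to-conductance mapping and the conductance range are hypothetical choices for illustration.

```python
import numpy as np

def crossbar_vmm(G, v):
    """Ideal crossbar VMM via Ohm's and Kirchhoff's laws.

    G : (n, m) conductance matrix in siemens, one memristor per crosspoint.
    v : (n,) input voltage vector applied to the rows, in volts.
    Returns the (m,) vector of column currents I_j = sum_i v_i * G_ij.
    """
    return v @ G

def map_weights(W, g_min=1e-6, g_max=1e-4):
    """Linearly map a weight matrix onto a conductance range [g_min, g_max]
    (a hypothetical mapping; real systems use differential pairs etc.)."""
    w_min, w_max = W.min(), W.max()
    return g_min + (W - w_min) / (w_max - w_min) * (g_max - g_min)

W = np.array([[0.2, -0.5], [1.0, 0.3], [-0.7, 0.9]])  # 3x2 weight matrix
G = map_weights(W)
v = np.array([0.1, 0.2, 0.05])  # sub-threshold read voltages per row
I = crossbar_vmm(G, v)          # one O(1) analog MAC step per column
```

The entire matrix-vector product happens in a single read cycle, which is where the crossbar's speed advantage over digital MAC units comes from.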
While there has been significant progress in memristor crossbar-based computational devices, many major challenges in robustness and computational accuracy still hinder the technology [20,39,40]. Several studies have focused on mitigating the impact of non-idealities on the performance of memristor crossbar systems; these reliability improvement techniques mainly target process variations [41,42], hard faults [43,44], signal distortion [45], and memristance drift [46][47][48]. They can be broadly categorized into (i) retraining to compensate for the error, (ii) mapping techniques, (iii) closed-loop training, and (iv) error-correction coding.
Here, we focus on the memristance drift effect and the mitigation techniques that avoid the impact of this issue on memristive DNN systems. Typically, memristor crossbars perform two major operations: write and read. In the write operation, a voltage above the switching voltage of the memristor [49] is applied repeatedly until the resistance of the memristor is sufficiently close to the target resistance. During the read operation, a voltage lower than the switching voltage is applied to the memristor and the resulting current is measured. The read operation is used extensively in the inference stage of many ex-situ and in-situ algorithms, including artificial neural networks (ANNs). Ideally, the resistance of the memristor should not change at all during a read, but in practice there is often a very small change in the memristor state after a read operation. This phenomenon is known as memristance drift [50,51]. Over many read operations, these small changes in the resistance of the memristors in a crossbar add up to have a significant impact on computational accuracy. Memristance drift occurs in different resistive switching memory technologies, but its behavior differs between them. In phase change memory (PCM) devices, memristance drift occurs even when no voltage is applied over the memory cell, as the amorphous (high-resistance) state of the device changes over time [52]. This issue becomes more severe as the resistance of the amorphous state increases, and it dramatically impacts PCM-based system performance in the presence of high cycle-to-cycle and device-to-device variations. In memristor technology, as discussed before, repetitive VMM operations result in a memristance drift phenomenon that becomes worse as the number of inference operations increases.
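The accumulation of read-induced drift can be illustrated with a toy simulation; the per-read fractional drift rates below are hypothetical, chosen only to mimic the strong SET/RESET asymmetry discussed later in this chapter.

```python
import numpy as np

# Hypothetical per-read fractional drift rates (not device-measured values).
SET_RATE, RESET_RATE = 5e-3, 1e-5

def read(resistance, v_read):
    """One read operation: returns the read current and the slightly
    drifted resistance. Positive pulses drift the state toward SET
    (lower resistance); negative pulses drift toward RESET."""
    i = v_read / resistance
    if v_read > 0:
        drifted = resistance * (1.0 - SET_RATE)
    else:
        drifted = resistance * (1.0 + RESET_RATE)
    return i, drifted

r = 100e3  # 100 kOhm initial state
for _ in range(100):
    _, r = read(r, 0.1)   # 100 repeated sub-threshold reads, SET direction
drop = 1 - r / 100e3      # fractional resistance decrease after 100 reads
```

Even though each individual read disturbs the state by a fraction of a percent, a hundred reads compound into a double-digit percentage change, which is the essence of the memristance drift problem.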
Existing solutions to this problem, including periodic weight reprogramming and feedback designs, suffer from high computational overhead and limited long-term effectiveness. For instance, HfOx RRAM testing results have demonstrated a dynamic BL-bias circuit for preventing memristor state disturbance during read operations [53]. Error-correction coding (ECC) is used in [51] to reduce write latency by up to 70%. However, these techniques are not sufficient in themselves to enhance the performance of computational memristor crossbars, in which 2-3 bits of memristance drift can cause significant performance degradation.
Recently, a few more effective memristance-drift mitigation techniques have been proposed [54][55][56]. This chapter summarizes the approach and results of a closed-loop feedback system technique [54] and an inline calibration approach [55] before taking a more in-depth look into an adaptive inference technique (AIDX) [56] that optimizes the inference voltage pulse amplitude and width. In addition, the power and chip area overheads of these three techniques are briefly compared.

Memristance drift and its modeling
In general, memristor models can be separated into physics-based and behavior-based simulations depending on their modeling characteristics and general purpose. Physics-based models typically attempt to simulate memristors at the molecular level by considering the material characteristics of the active memristor layers and mathematically modeling the ion drift between these materials. While physics-based models accurately capture memristance drift, they are generally computationally expensive and limited in scope to the detailed analysis of single-memristor behavior. As such, memristor crossbar arrays are modeled using behavior-based models. Behavior-based memristor models are much simpler than physics-based models and use experimental fitting parameters to match the behavior of different types of memristors. Current-voltage plots are one of the most common ways to quickly visualize short-term memristor behavior, and many behavior-based models such as VTEAM [57] are built to agree with these plots. Over the course of a voltage sweep, there is not enough time for the memristor's state to change noticeably under the threshold voltage; as such, the long-term consequences of memristance drift are not captured in these models. Many popular behavior-based models, such as VTEAM [57] and TEAM [58], utilize current and voltage thresholds to partition memristor behavior into high and low voltage/current regimes. Generally, these threshold models approximate the subthreshold state change as zero and thus do not consider the long-term effects of memristance drift. Other behavior-based models, such as the nonlinear ion drift model [59] and the Simmons tunnel barrier model [60], do not utilize a threshold and instead model both high- and low-voltage memristor behavior using the same equations and fitting parameters.
Without a threshold, these models lack the flexibility to accurately capture the minute changes of memristance drift without sacrificing the accuracy of their higher-voltage switching behavior. Recently, there have been attempts to extend popular behavior-based models to more accurately simulate memristance drift. For instance, [56] added a subthreshold modeling equation and fitting parameters to extend the VTEAM model. However, the modeling of memristance drift in behavior-based models is still in its infancy due to the lack of experimental data on long-term memristor behavior under low-voltage pulses.
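A minimal sketch of such an extension, assuming a VTEAM-style threshold model with a hypothetical subthreshold drift term k_sub (a pure threshold model corresponds to k_sub = 0; all fitting parameters here are illustrative, not fitted to any device):

```python
def dwdt(w, v, v_on=-0.3, v_off=0.3, k_on=-1.0, k_off=1.0,
         a_on=3, a_off=3, k_sub=1e-4):
    """State derivative of a VTEAM-style threshold model, extended with a
    small subthreshold drift term. State dependence is omitted for
    brevity; k_sub is a hypothetical fitting parameter."""
    if v <= v_on:
        return k_on * (v / v_on - 1) ** a_on    # fast RESET-side switching
    if v >= v_off:
        return k_off * (v / v_off - 1) ** a_off  # fast SET-side switching
    # Subthreshold regime: tiny but nonzero drift proportional to voltage.
    # The classic threshold models return exactly 0 here.
    return k_sub * v

# Below threshold, the classic model (k_sub = 0) predicts zero state change.
assert dwdt(0.5, 0.1, k_sub=0.0) == 0.0

# The extended model accumulates drift over many 1 ms read pulses.
w = 0.5
for _ in range(10000):
    w += dwdt(w, 0.1) * 1e-3  # Euler step, 1 ms read pulse at 0.1 V
```

The state change per read is negligible on an I-V plot, yet after ten thousand reads the accumulated drift is no longer zero, which is exactly the long-term behavior the threshold approximation hides.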

Impact of state-drift on crossbar VMM
Memristance drift in crossbar arrays can be summarized as the buildup of small unintended changes in memristor state over many low-voltage read operations [56]. During a crossbar VMM operation, the ideal output current I_j of the j-th column can be modeled simply as:

I_j = Σ_{i=1}^{N} V_i G_ij (1)

Here V_i represents the voltage applied to the i-th row of the crossbar and G_ij is the conductance of the memristor in the i-th row and j-th column. For a given VMM operation, state drift can be represented as a small change in conductance δG. The non-ideal output current I′_j is then given as:

I′_j = Σ_{i=1}^{N} V_i (G_ij + ΔG_ij) (2)

where ΔG_ij represents the accumulation of memristance drift caused by all previous VMM operations; this drift skews the distribution of weights within the crossbar (Figure 1a). It can be seen from Eqs. (1) and (2) that the difference between I′_j and I_j scales with the read voltage of each row V and the number of rows N. Over a long period, the buildup of drift in the crossbar weights will lead to significant error (Figure 1b). The speed of memristance drift varies depending on the application and memristor characteristics. Even after a short period of 1 second, a 0.1 V signal can cause a memristor to deviate around 2% from its initial state [61]. A thorough analysis of memristance drift speed with respect to initial state and drift direction is given in [54]. In the SET direction, [54] found that memristors' resistance decreased by 77.07%, 62.07%, 56.28% and 8.81% after 100 read operations with initial resistances of 200 kΩ, 100 kΩ, 80 kΩ, and 15 kΩ, respectively. Drift in the RESET direction was much slower, at only approximately 0%, 1.17 × 10⁻⁴%, 0.018% and 16.43% increase in resistance for the same initial states. From the analysis in [54], memristance drift speed is shown to be greatly impacted by the initial state and the direction of switching. A heatmap of conductance highlights the impact of initial state on the buildup of memristance drift over time (Figure 2a).
Here, the bottom row represents the bias of a neural network and is initialized near the high-conductance state, while the rest of the memristors, representing the weights, are initialized close to the low-conductance state. Due to this conductance initialization, all memristors except the bottom row experience an aggregate memristance drift in the positive direction, while the bottom row experiences memristance drift in the negative direction. Multiple studies have shown that memristance drift causes significant performance degradation in various applications after long-term use. In [56], across ten baseline ML tasks from the Proben1 datasets [63], memristance drift caused an average classification accuracy decrease of 26%, 42%, and 51% after 500, 2000, and 10000 inference operations, respectively. [56] also analyzed the effects of memristance drift on convolutional neural networks with the CIFAR10 image classification dataset [64]. Ten different CNN architectures were tested, with a relatively consistent accuracy degradation of 29%, 59%, and 72% after 500, 2000, and 10000 inference operations, respectively. Memristor drift ranges from upwards of 10% deviation from the programmed value to upwards of 30% deviation at 10000 inference steps, as shown in Figure 2a. To clarify, the memristance drift speed remains the same for each network in Figure 2b regardless of the number of hidden layers. However, each additional hidden layer accumulates increasing amounts of memristance drift-related error from the previous layer, causing deeper neural networks' accuracy to degrade more quickly than networks with fewer hidden layers (Figure 2b).

Figure 2. (a) … [62]. (b) Heatmap of a typical memristor crossbar weight mapping, with the bottom row as the bias and the rest of the heatmap representing the weight matrix. Over time, the memristance drift direction between the bias and weights diverges due to differences in bias and weight initialization. Figure reprinted from [56].

In [55], the classification accuracy on the MNIST handwritten digits dataset [62] decreases by approximately 2.5-4% across four independent trials due to the cycle-to-cycle variation of memristance drift. In [54], the classification accuracy degradation on MNIST was tested with memristance drift in the SET and RESET directions separately. In the SET direction, the classification accuracy dropped from 91.91% to 60% after only 100 inference operations. In the RESET direction, degradation was slower: accuracy dropped to 60% after approximately 300 inference operations.
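The linear scaling of the drift-induced current error with read voltage and row count, noted above for Eqs. (1) and (2), can be checked in a few lines; the uniform per-device drift dg is a simplifying assumption for illustration.

```python
import numpy as np

def column_error(n, v_read, dg=1e-6):
    """Column-current error |I'_j - I_j| = |sum_i v_i * dG_ij| under the
    simplifying assumption that every memristor in the column has drifted
    by the same amount dg (in siemens)."""
    v = np.full(n, v_read)
    return abs(np.sum(v * dg))

# The accumulated error grows linearly with both row count and read voltage.
e_small = column_error(64, 0.1)
e_rows = column_error(128, 0.1)  # doubling the rows doubles the error
e_volt = column_error(64, 0.2)   # doubling the voltage doubles the error
```

This is why large crossbars driven at higher read voltages are hit hardest: the same per-device drift produces a proportionally larger column-current error.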

ICE: inline calibration
Given the significant negative impact of memristance drift on crossbar performance, a few works have proposed meaningful solutions to the memristance drift problem. When a memristor crossbar's performance drops below acceptable levels due to memristance drift, the memristors are rewritten to their intended states. For a given application, this recalibration is usually done periodically to ensure high crossbar accuracy over long time intervals. Since rewriting a memristor to a specific state can be up to 100 times slower than an inference operation [65], frequent crossbar calibrations can significantly bottleneck crossbar throughput. In [55], the authors propose an inline calibration method (ICE) that utilizes "interrupt-and-benchmark (I&B)" operations to track the crossbar computational error in order to predict the time period before the next calibration operation. In addition, the time period between two I&B operations is dynamically optimized to minimize the time overhead of the inline calibration method on crossbar performance. As defined in the paper, I&B operations interrupt regular crossbar operation in order to evaluate crossbar computational error on a set of benchmark data [55]. Between each recalibration of the memristor crossbar, there exists a theoretical maximum number of inference operations, sup n_r, before the next recalibration must be done due to performance degradation. The optimization goal of ICE is to make the actual number of inference operations n_r between two calibration operations approach sup n_r. This is not a trivial task because sup n_r can vary significantly between calibration operations due to changes in memristance drift speed from cycle-to-cycle variations and other factors. ICE approximates sup n_r by applying polynomial fitting to the crossbar error data collected during I&B operations.
To reduce the time overhead of the inline calibration method, ICE seeks to minimize the number of I&B operations k between each recalibration of the crossbar. ICE uses its polynomial fitting function to predict the benchmark computation error of the next I&B operation. If the absolute difference between the predicted crossbar error and the actual I&B error is below some threshold, the time until the next I&B operation is doubled, up to some maximum time interval. Otherwise, ICE resets the time interval between successive I&B operations to its default value. For testing, ICE adopts a TiOx-based memristor device model from [61]. This model extends the bulk TiO2 model with a focus on behavior-based process-variation analysis of memristors. The results presented in [55] are measured in terms of calibration efficiency, defined as:

γ = (n_r / sup n_r) × 100%

To quantify the time overhead caused by I&B operations, the parameter Δ is defined as:

Δ = (k × n_IB / n_r) × 100%

Here, k is the average number of I&B operations between crossbar recalibrations and n_IB is the number of inference operations required per I&B operation. These criteria were evaluated across four baseline tasks (HMAX, KMeans, Sobel, and MNIST [62]) and compared to a baseline of constant-period crossbar recalibrations. With second-degree polynomial fitting, ICE achieves an average calibration efficiency γ of 91.18%, a 21.77% improvement over the baseline. This improvement in calibration efficiency comes at the cost of a time overhead Δ of only 0.439%. No information on the voltage range or power consumption of ICE is presented in [55]; however, the authors mention that future work could include a detailed power analysis of ICE using a SPICE-based memristor model.
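The I&B scheduling loop described above can be sketched as follows; the second-degree polynomial fit follows the description in [55], while the error threshold, interval bounds, and function names are hypothetical.

```python
import numpy as np

def next_interval(errors, times, t_now, interval, threshold=0.01,
                  t_max=1024, t_default=1):
    """One step of an ICE-style I&B scheduler (a sketch). Fits a 2nd-degree
    polynomial to past I&B error samples, compares its prediction at t_now
    with the newest measurement, and doubles or resets the I&B interval.

    errors, times : past I&B error samples; the last entry is the newest.
    threshold, t_max, t_default : hypothetical tuning constants.
    """
    coeffs = np.polyfit(times[:-1], errors[:-1], deg=2)  # fit on history
    predicted = np.polyval(coeffs, t_now)
    if abs(predicted - errors[-1]) < threshold:
        return min(2 * interval, t_max)  # prediction good: probe less often
    return t_default                     # prediction poor: probe often

# Smoothly growing error: the quadratic fit predicts well, so the
# interval between I&B operations is doubled.
times = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
errors = 1e-4 * times**2
nxt = next_interval(errors, times, t_now=40.0, interval=8)
```

When the drift behaves predictably, the scheduler backs off exponentially and I&B overhead shrinks; an abrupt change in drift speed resets it to frequent probing, which is how ICE keeps n_r close to sup n_r at low cost.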

Closed-loop feedback circuit
Traditionally, the data flow of a memristor crossbar-based neural network follows a linear pipeline in a conventional open-loop system. While these open-loop systems serve as a simple and efficient pipeline for VMM operations, they cannot effectively manage the effects of memristance drift. In [54], a closed-loop circuit is proposed to mitigate memristance drift by adaptively adjusting the direction of current in each memristor using a feedback controller. Mean squared error (MSE) is used to measure the degradation caused by memristance drift. Specifically, the difference in MSE (ΔMSE) between the ideal crossbar and the current state is used as a metric to inform the feedback controller. For each inference operation, a weight compensation algorithm is run to minimize the memristance drift speed. The second feature of the closed-loop design introduced in [54] is the usage of an "arrogant principle", which assumes that the prediction made by the crossbar system is always correct. This principle allows the system to use its own output as the label when determining the direction of compensation in the weight compensation algorithm. The effectiveness of this assumption hinges on the ideal accuracy of the crossbar-mapped neural network: with a high initial accuracy, the "expectation of recognition accuracy probability with respect to time" remains close to its upper bound, keeping the rate of degradation of the feedback controller low. Naively, this closed-loop design would require an additional compensation pulse for each inference operation, which would halve the throughput of the crossbar system. To address this issue, [54] combines the compensation pulse with regular inference operations by manipulating the k-th inference operation into compensating for the memristance drift caused by the (k-1)-th inference. The feedback controller determines the recall direction of the (k-1)-th inference and then adjusts the direction of the k-th inference pulse accordingly.
By integrating the k-th compensation pulse into the (k+1)-th inference pulse, there is no need to sacrifice crossbar throughput to implement the proposed system. For testing, [54] adopts the dynamic model of the TaOx memristor from [66] with a low resistive state of 1 kΩ and a high resistive state of 1 MΩ. The performance of single- and two-layer neural networks was tested on the MNIST dataset. The effectiveness of the closed-loop design was measured as the number of inference operations before the crossbar accuracy drops below 70%. Compared to a baseline crossbar system, the proposed closed-loop design increases the number of recall operations by 1897 operations for the single-layer network and 1590 operations for the two-layer network. This increase corresponds to a 13.84× lifetime extension for the single-layer network and a 13.95× lifetime extension for the two-layer network. For power estimation, [54] uses a square recall voltage pulse of 0.3 V for 100 ns. The proposed design had an estimated power consumption of 1.1196 mW for a single-layer neural network, which increases to 6.7367 mW for a two-layer implementation.
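The idea of folding the compensation pulse into the next inference pulse can be illustrated with a toy one-memristor model; the per-pulse drift rate is hypothetical, and a real controller would infer the compensation direction from ΔMSE rather than simple alternation.

```python
# Hypothetical fractional state change per inference pulse.
DRIFT_PER_PULSE = 1e-3

def infer(state, direction):
    """Apply one inference pulse; +1 drifts the state up, -1 drifts it
    down. The computational result of the pulse is unaffected here."""
    return state * (1.0 + direction * DRIFT_PER_PULSE)

def closed_loop(state, n_pulses):
    """Toy feedback controller: flip each pulse's direction so that it
    undoes the previous pulse's drift (the 'arrogant principle' supplies
    the label needed to pick the compensation direction)."""
    direction = +1
    for _ in range(n_pulses):
        state = infer(state, direction)
        direction = -direction  # next pulse compensates this one's drift
    return state

open_loop = 1.0
for _ in range(1000):
    open_loop = infer(open_loop, +1)  # uncompensated drift accumulates
closed = closed_loop(1.0, 1000)       # drift largely cancels pairwise
```

After 1000 pulses the open-loop state has drifted far from its programmed value while the closed-loop state stays within a fraction of a percent, without any extra pulses being spent on compensation.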

Adaptive inference scheme for memristance drift mitigation
In [56], the authors proposed an adaptive inference scheme called AIDX that optimizes the amplitude and duration of inference voltage pulses in order to minimize the speed of memristance drift. AIDX formulates memristance drift as an optimization problem and seeks to minimize the memristance drift error, defined as the increase in mean squared error (MSE) from the initially programmed crossbar after a set number of inference operations. With G*_ij the target conductance and G_ij the initially programmed conductance of the memristor in the i-th row and j-th column of an N × M crossbar, the initial MSE can be modeled as:

E_0 = (1/NM) Σ_{i,j} (G*_ij − G_ij)²

Similarly, the MSE after k inference operations is given as:

E_k = (1/NM) Σ_{i,j} (G*_ij − G_ij − ΔG^k_ij)²

where ΔG^k_ij is the memristance drift accumulated in the memristor in the i-th row and j-th column over k inference operations. The additional error due to memristance drift after k operations is calculated as E_Drift = E_k − E_0. The naive approach of simply choosing the minimum allowable voltage amplitude and duration may seem logical because the speed of memristance drift scales with amplitude and duration. However, this naive approach would still result in significant memristance drift because of the vast differences in state drift speed between the SET and RESET directions [54]. As such, AIDX focuses on balancing the aggregate drift in the SET and RESET directions for a given application by optimizing the ratio of SET to RESET voltage pulse amplitude A and duration D (Figure 3a). The minimization of memristance drift with respect to voltage pulse amplitude and duration is formalized as:

(A*, D*) = argmin_{A,D} E_Drift(A, D)

Here, A and D are vectors representing the ratio of SET to RESET voltage pulse amplitude and duration, respectively, for each row of the memristor crossbar. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [67] is used to tackle this optimization problem. Since the gradient of the memristance drift error cannot be evaluated directly, the gradients used in the BFGS algorithm are numerically approximated using function evaluations.
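The optimization loop can be sketched on a toy drift-error function; [56] uses BFGS, while this sketch uses plain gradient descent with the same numerically approximated gradients, and the per-row drift speeds s_set and s_reset are hypothetical.

```python
import numpy as np

# Toy stand-in for E_Drift(A, D): the drift error is minimized when the
# per-row SET/RESET amplitude ratio A and duration ratio D balance the
# asymmetric (hypothetical) drift speeds s_set and s_reset of each row.
s_set, s_reset = np.array([2.0, 4.0]), np.array([1.0, 1.0])

def e_drift(x):
    A, D = x[:2], x[2:]
    net = s_set - A * D * s_reset  # net per-row drift after one SET/RESET cycle
    return float(np.sum(net**2))

def num_grad(f, x, eps=1e-6):
    """Central-difference gradient, since E_Drift has no analytic gradient."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# [56] uses BFGS; gradient descent with the same numerical gradients is
# enough to illustrate the optimization loop.
x = np.ones(4)  # initial amplitude and duration ratios, all 1
for _ in range(2000):
    x -= 0.01 * num_grad(e_drift, x)
A_opt, D_opt = x[:2], x[2:]
```

At the optimum, the products A·D per row match the SET/RESET drift-speed imbalance, i.e. the aggregate drift in the two directions cancels, which is the balancing principle behind AIDX.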
Under some circumstances, it is possible for the optimized values of A and D to be too large or too small to be properly implemented on a crossbar. Extreme values of A and D are often caused by skewed memristor characteristics, where memristance drift speed in one direction is much faster than in the other, or by a data distribution heavily skewed toward one recall direction. To address this issue, AIDX randomly inverts the direction of inference for an input vector x with probability a in order to compensate for imbalanced memristor characteristics and a skewed data distribution (Figure 3b). The probability of input inversion is optimized to minimize E_Drift before the optimization of the A and D vectors, ensuring a relatively balanced memristance drift speed in the SET and RESET directions and thus preventing extreme final values of A and D.
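The input-inversion trick is output-preserving, because negating the input and re-negating the result leaves the VMM unchanged while reversing the direction of current (and hence drift) through every device. A sketch, with the function name and probability handling as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def aidx_infer(G, x, a):
    """AIDX-style inference sketch: with probability a the input vector is
    inverted before the VMM and the output re-inverted afterwards, so the
    numerical result is unchanged while the drift direction is flipped."""
    if rng.random() < a:
        return -((-x) @ G)  # inverted recall; output flipped back
    return x @ G            # normal recall

G = np.array([[1e-5, 2e-5], [3e-5, 4e-5]])  # toy 2x2 conductance matrix
x = np.array([0.1, 0.2])
out = aidx_infer(G, x, a=1.0)  # force inversion: output is still x @ G
```

Tuning a therefore trades nothing in accuracy; it only redistributes drift between the SET and RESET directions.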
The general usage of AIDX in both preprocessing and inference is described in Figure 3c. To ensure optimal performance, optimization is done in three scenarios: optimizing over the pulse amplitude ratio, optimizing over the pulse duration ratio, and optimizing over both parameters simultaneously. The best set of parameters is then chosen according to the lowest evaluated E_Drift. If the parameters A and D are too extreme, the optimization of the input inversion probability a is performed. Applying AIDX during inference is almost identical to a normal crossbar VMM operation, except that if the input was inverted, the output vector must be re-inverted to recover the intended output. When applying AIDX to deep neural networks, it can be inefficient to optimize over all layers simultaneously due to the large number of parameters. Instead, AIDX applies the BFGS algorithm to each layer separately in forward-pass order to reduce optimization time (Figure 4a). AIDX used an extended VTEAM model fitted to real TiOx-based memristor device data. The following non-idealities were considered for testing: 15% memristor programming error, and 15% random Gaussian noise added to the high/low conductance states and to the alpha/k parameters of the extended VTEAM model for device-to-device variation. In addition, 200 Ω of source resistance, 20 Ω of line resistance, and sneak paths were considered. AIDX was applied to a memristor system operating with a voltage range of −0.2 V to 0.2 V; depending on the pulse amplitude optimization, AIDX's inference pulse can vary within this range. On average, there is a 2% reduction in passive power consumption within the crossbar compared to the baseline system when using AIDX. A simple test of AIDX effectiveness is done by applying random positive and negative voltage pulses to memristors with and without AIDX.
After 10000 inference operations, the baseline memristors deviated by a maximum of 1.9% from their initial values, while AIDX had a maximum deviation of only 0.17% (Figure 4b). When tested on the 10 benchmark tasks from the Proben1 dataset [63], the average classification accuracy degradation with AIDX is approximately 4%, 7%, and 8% after 500, 2000, and 10000 inference operations (Figure 5a). On average, AIDX reduced accuracy degradation by 42% compared to the baseline test after 10000 inference operations. When testing AIDX on 10 CNN architectures using the CIFAR10 dataset [64], the classification accuracy decreased by an average of 4%, 7%, and 8% after 500, 2000, and 10000 inference operations (Figure 5b). This accuracy degradation corresponds to a 22%, 35%, and 43% improvement over the baseline, respectively. In addition, AIDX was also applied to image reconstruction by training a simple 3-layer autoencoder on the MNIST dataset [62]. The average mean squared error of the baseline autoencoder was 0.033, 0.068, and 0.129 after 500, 2000, and 10000 inference operations, respectively. With AIDX, the average mean squared error drops to 0.015, 0.021, and 0.028 after 500, 2000, and 10000 inference operations, an improvement of 53.0%, 69.0%, and 78.6% over the baseline (Figure 6).

Overhead analysis
The three methods for mitigating memristance drift discussed in this chapter all induce small overheads in terms of power consumption and chip area. Time overhead is not discussed in this section because none of the three mitigation methods causes a significant change in crossbar throughput. Power overhead is defined here as the additional power consumption induced in the memristor crossbar and peripheral circuits by the proposed memristance drift solutions. For the sake of consistency, the estimates of peripheral power consumption from [54] are used for comparison. While power consumption is not disclosed in [55], the power overhead of [56] is 1.19% while [54] has a power overhead of 1.61%. Area overhead is defined consistently with [56] as the additional on-chip area required by a memristance drift mitigation method for peripherals, external circuitry, and other components. Since [55,56] do not include any additional on-chip circuitry, these two methods have no chip area overhead, while the closed-loop circuits proposed in [54] require an additional 2.34% of chip area. On the other hand, both [55,56] require solving an optimization problem before implementing their mitigation technique. However, considering that the optimization procedure only needs to be performed once per application, these solutions still promise great scalability for long-term memristor crossbar usage.

Conclusions
In summary, this chapter first discusses memristor crossbar modeling and the current lack of attention to modeling subthreshold memristor behavior. The next section reviews how the speed of memristance drift is impacted by recall voltage amplitude, memristor characteristics, crossbar size, and the number of inference operations since the last write operation. In addition, memristance drift is shown to cause severe accuracy degradation across multiple datasets and tasks such as MNIST and CIFAR10. The second half of this chapter is dedicated to overviewing three different approaches for memristance drift mitigation. First, an inline calibration approach [55] and a closed-loop feedback system [54] are summarized. Then, a more in-depth look is taken into an adaptive inference scheme [56] that optimizes the ratio of SET to RESET voltage pulse amplitude and width to minimize memristance drift speed. The final section briefly compares the power and chip area overheads of these three memristance drift mitigation techniques. Hopefully, this chapter can bring much-needed attention to the study of memristance drift and the development of drift mitigation techniques.
