Memristor Neural Network Design

Neural networks, a powerful class of learning models, have achieved remarkable results. However, current implementations of neural networks on Von Neumann computing systems suffer from the memory wall and communication bottleneck problems, which are attributed to the scaling down of Complementary Metal Oxide Semiconductor (CMOS) technology and to the communication gap. The memristor, a two-terminal nanoscale solid-state nonvolatile resistive switching device, can provide energy-efficient neuromorphic computing through its synaptic behavior. The crossbar architecture can be used to perform neural computations because of its high density and parallel computation. Thus, neural networks based on memristor crossbars are expected to perform better in real-world applications. In this chapter, the design of different neural network architectures based on memristors is introduced, including spiking neural networks, multilayer neural networks, convolutional neural networks, and recurrent neural networks. For each kind of neural network, a brief introduction, the architecture, the computing circuits, and the training algorithm are presented through examples. The potential applications and prospects of memristor-based neural network systems are also discussed.


Introduction
Neural networks, composed of multiple processing layers, have achieved remarkable results, such as AlphaGo, DNC, and WaveNet. However, conventional neural networks based on Von Neumann systems face many challenges [1]. In a Von Neumann computing system, the processor and the external memory are separated and connected by a bus shared between data and program memory, as shown in Figure 1. This is the so-called Von Neumann bottleneck. In such a system, a single processor has to simulate many neurons and the synapses between them. In addition, the bottleneck leads to energy-hungry data communication when neuron states are updated and synapse states are retrieved, and when a large-scale neural network is simulated, the number of messages among processors explodes [2]. These defects make neural networks based on Von Neumann computing systems power-hungry, low in density, and slow. To overcome these defects, novel nanodevices and computing architectures are needed. The memristor crossbar is considered the most promising candidate to solve these problems [3]: it is a high-density, power-efficient computing-in-memory architecture. Thus, this chapter presents different design paradigms of memristor-based neural networks, including spiking neural networks (SNNs), multilayer neural networks (MNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs).

Memristor neural networks

Memristor
The memristor was conceived by Leon Chua in 1971 from the symmetry of circuit theory [4] and was physically realized by HP Labs in 2008 [5]. It is a nanoscale two-terminal nonvolatile device whose I-V curve is a pinched hysteresis loop (a Lissajous-like figure). Mathematically, the memristor can be modeled as follows, taking the HP memristor as an example [6]:

$$i(t) = \frac{1}{R_{ON}\,w(t) + R_{OFF}\,\big(1 - w(t)\big)}\;v(t) = G\big(\varphi(t)\big)\,v(t)$$

Here, w(t) stands for the normalized position of the conduction front between the oxygen-vacancy-rich and oxygen-vacancy-poor regions, and it ranges from 0 to 1; R_ON and R_OFF are the device resistances at w = 1 and w = 0, respectively; and G(φ(t)) is the conductance, which depends on the flux φ(t). The conductance of a memristor changes continuously when control pulses are applied to it. When a negative pulse is applied, the oxygen vacancies move toward the vacancy-rich region, which decreases the conductance, and vice versa. This behavior resembles that of a biological synapse, so a memristor can emulate synaptic dynamics.
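The HP model can be explored numerically. The Python sketch below assumes the linear ion-drift form dw/dt = K_DRIFT·i(t). The values of `R_ON`, `R_OFF` and `K_DRIFT` are hypothetical and not taken from the chapter, and `apply_pulse` is an illustrative helper. It only checks the qualitative claim that a positive pulse raises the conductance and a negative pulse lowers it.

```python
# Hypothetical device parameters (not from the chapter): on/off resistances and a
# normalized drift constant for the assumed linear ion-drift law dw/dt = K_DRIFT * i(t).
R_ON, R_OFF = 100.0, 16e3   # ohms
K_DRIFT = 1e4               # 1/(A*s)

def conductance(w):
    """G(w) = 1 / (R_ON*w + R_OFF*(1 - w)) for the normalized state w in [0, 1]."""
    return 1.0 / (R_ON * w + R_OFF * (1.0 - w))

def apply_pulse(w, v, duration, dt=1e-6):
    """Forward-Euler integration of the state w under a constant-voltage pulse."""
    for _ in range(int(round(duration / dt))):
        i = conductance(w) * v                          # i(t) = G v(t)
        w = min(1.0, max(0.0, w + K_DRIFT * i * dt))    # drift, clipped to [0, 1]
    return w

w0 = 0.5
w_pos = apply_pulse(w0, +1.0, 1e-3)   # positive pulse: w and G increase
w_neg = apply_pulse(w0, -1.0, 1e-3)   # negative pulse: w and G decrease
print(conductance(w_neg), conductance(w0), conductance(w_pos))
```

Real devices show nonlinear drift and window effects near w = 0 and w = 1. The clipping here is only the simplest stand-in for those effects.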

Memristor merits
As the fourth fundamental circuit element, the memristor offers many advantages for computing compared with conventional systems such as CPUs and GPUs. First, it is a two-terminal nonvolatile device, which results in low power consumption [7]. Second, it is compatible with CMOS and can be integrated at higher density [4]. Third, it is nanoscale, so it switches fast [8]. These characteristics make the memristor a promising candidate for neuromorphic computing. In recent years, many researchers have performed experiments on neural networks that use memristors as synapses and neurons.

Memristor as synapse
The human brain can perform complex tasks such as unstructured data classification and image recognition. In the brain, excitatory and inhibitory postsynaptic potentials are delivered from a presynaptic neuron to a postsynaptic neuron through chemical and electrical signals at synapses, which drives the change of synaptic weight, as shown in Figure 2. The synaptic weight is precisely adjusted by the ionic flow through the neurons. In neural networks, this mechanism can be emulated by memristors, and there are many examples of memristors used as synapses. In this section, we use an SNN as an example to explain how a memristor is used as a synapse.
As shown in Figure 3, a memristor acts as a synapse between two CMOS neurons, which act as the presynaptic and postsynaptic neurons, respectively. The input signal of the presynaptic neuron reaches the postsynaptic neuron through the synapse. When a presynaptic spike is triggered before a postsynaptic spike, a positive voltage is effectively applied across the memristor and the synaptic weight increases. In the opposite case the weight decreases. This is expressed as [6]

$$\Delta t = t_{post} - t_{pre}, \qquad \Delta w \begin{cases} > 0, & \Delta t > 0 \\ < 0, & \Delta t < 0 \end{cases}$$

where t_pre (t_post) is the time at which the presynaptic (postsynaptic) neuron spikes, and Δt is the difference between the two spike times. That is, when Δt > 0 the synaptic weight increases, and when Δt < 0 it decreases.

Memristor as neuron
In biology, the cell membrane separates intracellular and extracellular ions, and electrochemical mechanisms keep the potential across the membrane in balance. When excitatory and inhibitory postsynaptic potentials arrive through the dendrites of the neuron, this balance is disturbed. When the membrane potential surpasses a threshold, the neuron fires. Emulating these neuronal mechanisms, including maintaining the potential balance, the instantaneous mechanism, and the process of neurotransmission, is the key to implementing a biologically plausible neuromorphic computing system [9].
When a memristor acts as a neuron, its conductance does not need to change continuously. Instead, it must show accumulative behavior. Each applied pulse changes the conductance state of the memristor, and once sufficient pulses have been applied, the neuron fires.
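The accumulative behavior can be illustrated with a minimal integrate-and-fire sketch. It assumes a hypothetical device in which each pulse moves the conductance a fixed fraction of the way toward `g_max`. The neuron fires and resets once the conductance crosses `g_th`. The class name, parameters and values are all illustrative and do not describe a specific device.

```python
class MemristiveNeuron:
    """Hypothetical accumulative memristor neuron: pulses raise G; crossing g_th fires and resets."""

    def __init__(self, g_min=1e-6, g_max=1e-4, g_th=5e-5, step=0.2):
        self.g_min, self.g_max, self.g_th, self.step = g_min, g_max, g_th, step
        self.g = g_min  # start in the high-resistance (resting) state

    def pulse(self, amplitude=1.0):
        """Apply one input pulse; return True if the neuron fires."""
        # saturating accumulation: move a fraction of the remaining range toward g_max
        self.g += self.step * amplitude * (self.g_max - self.g)
        if self.g >= self.g_th:
            self.g = self.g_min  # reset the device after firing
            return True
        return False

neuron = MemristiveNeuron()
fired = [neuron.pulse() for _ in range(8)]
print(fired)
```

With these assumed values the neuron fires on every fourth pulse. The state is stored in the device conductance rather than in a capacitor.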

Memristor crossbar
A memristor crossbar consists of two perpendicular nanowire layers, which act as the top and bottom electrodes, respectively. The memristive material is placed between the two nanowire layers, so a memristor is formed at each crosspoint [11]. The schematic diagram of a memristor crossbar is shown in Figure 4.
The memristor crossbar is suitable for implementing large-scale neural networks. First, it has high density: crossbars can be stacked vertically, every crosspoint is a memristor, and memristors are nonvolatile, nanoscale, and multistate. Second, it has low power consumption, because the crossbar integrates memory and computation [10] and the memristor is a nonvolatile device with a low operating voltage. Because of these advantages, the crossbar has been applied in a wide range of neural networks. In neural networks, a memristor crossbar supports three operations: read, write, and training.
In this section, we use an example to illustrate the read, write, and training operations of a memristor crossbar.
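The Python sketch below makes the computing-in-memory point concrete with an ideal crossbar. The bottom wires are held at virtual ground. Ohm's law gives each device current G[i][j]·V[i], and Kirchhoff's current law sums these currents along each column, so a single read computes a vector-matrix product. The sketch assumes ideal wires, no sneak paths and made-up conductances. `crossbar_vmm` is a hypothetical helper.

```python
def crossbar_vmm(G, V):
    """Column currents of an ideal memristor crossbar.

    G[i][j]: conductance (S) of the device at top wire i, bottom wire j.
    V[i]:    voltage on top wire i; bottom wires are held at virtual ground.
    Returns I[j] = sum_i G[i][j] * V[i]  (Ohm's law per device, KCL per column).
    """
    rows, cols = len(G), len(G[0])
    if len(V) != rows:
        raise ValueError("one voltage per top wire is required")
    return [sum(G[i][j] * V[i] for i in range(rows)) for j in range(cols)]

G = [[1e-4, 2e-4],
     [3e-4, 4e-4],
     [5e-4, 6e-4]]    # hypothetical 3 x 2 conductance matrix
V = [0.1, 0.2, 0.3]   # read voltages (V)
print(crossbar_vmm(G, V))
```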

Read operation
In a memristor crossbar, the conductance of a single memristor can be read individually. As shown in Figure 5, assume that we read memristor m_ij, located at the crosspoint of the i-th top wire and the j-th bottom wire. A voltage V is applied to the i-th top wire, and all other top and bottom wires are grounded. In this situation, the current i collected on the j-th bottom wire flows only through m_ij. According to Ohm's law, the resistance of m_ij is M = V/i, and its conductance is i/V [11].
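The read scheme above can be modeled under ideal assumptions (no wire resistance, no sneak currents). All cells on top wire i are biased, but bottom wire j collects only the current through m_ij. `read_cell` and the conductance values are hypothetical.

```python
def read_cell(G, i, j, v_read=0.2):
    """Ideal read of memristor m_ij: V on top wire i, every other wire grounded.

    Only cells on row i are biased, and bottom wire j collects only the current
    through m_ij, so G_ij = I_j / V (equivalently, resistance M = V / I_j).
    """
    V = [v_read if r == i else 0.0 for r in range(len(G))]   # top-wire voltages
    I_j = sum(G[r][j] * V[r] for r in range(len(G)))         # current sensed on bottom wire j
    return I_j / v_read

G = [[1e-4, 2e-4],
     [3e-4, 4e-4],
     [5e-4, 6e-4]]   # hypothetical stored conductances (S)
g = read_cell(G, 1, 0)
print(g, 1.0 / g)    # conductance (S) and resistance (ohm) of the selected cell
```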

Write operation
As with the read operation, the conductance of memristor m_ij can be written individually. Write pulses of different amplitude and duration are applied directly to the target memristor. The i-th top wire is driven at voltage V, the j-th bottom wire is grounded, and all other top and bottom wires are held at V/2. Then only m_ij sees the full voltage V, which is above the switching threshold and changes the conductance of the target memristor. The other memristors on the i-th top wire or on the j-th bottom wire see only V/2, which is below the threshold. All remaining memristors see 0 V, so their conductances do not change [12].
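The V/2 write scheme can be checked by computing the voltage across every cell. The sketch assumes a hypothetical switching threshold `V_TH` and a write voltage with V_TH < V < 2·V_TH. Under that condition only the selected cell reaches the threshold, and half-selected cells see V/2. `half_bias_voltages` is an illustrative helper.

```python
def half_bias_voltages(rows, cols, i, j, v_write):
    """Voltage across every cell under the V/2 write scheme.

    Top wire i is driven at v_write, bottom wire j is grounded, and every
    other wire sits at v_write / 2. Cell (r, c) sees top[r] - bottom[c].
    """
    top = [v_write if r == i else v_write / 2 for r in range(rows)]
    bottom = [0.0 if c == j else v_write / 2 for c in range(cols)]
    return [[top[r] - bottom[c] for c in range(cols)] for r in range(rows)]

V_TH = 1.0   # hypothetical switching threshold (V)
V_W = 1.5    # write voltage: above V_TH, while V_W / 2 = 0.75 V stays below it
drop = half_bias_voltages(3, 3, 1, 2, V_W)
switched = [(r, c) for r in range(3) for c in range(3) if abs(drop[r][c]) >= V_TH]
print(switched)   # only the target cell (1, 2)
```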

Training operation
Based on the read and write operations, memristor neural networks can be trained to implement practical tasks. We use a single-layer neural network to explain the training process. As shown in Figure 6, the relationship between the input vector U and the output vector Y can be expressed as [12]

$$Y_{n\times 1} = W_{n\times m}\,U_{m\times 1}$$

Here, the weight matrix W_{n×m} represents the synaptic strengths between the two neuron groups, and these strengths are stored as the conductances of the corresponding memristors. To train a memristor crossbar, we start from a set of training data. The training data are applied, and the synaptic weight matrix W is updated repeatedly until the difference between the output y and the target output y* is minimized. In each iteration, W is adjusted along the negative gradient of the output error |y - y*| as [12]

$$\Delta w_{ij} = \mu\,\big(y_i^{*} - y_i\big)\,u_j$$

Here, w_ij is the synaptic weight in W connecting neurons i and j, Δw_ij is the change of w_ij in each update, and μ is the training rate.
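The update rule can be tried in software before it is mapped onto conductance changes. The sketch below trains Y = W U with Δw_ij = μ(y*_i - y_i)u_j on synthetic data generated by a hypothetical target matrix `W_true`. In hardware, each Δw_ij would be realized as a write pulse on the corresponding memristor. The function name and data are illustrative.

```python
import random

def train_delta_rule(samples, n, m, mu=0.1, epochs=200):
    """Single-layer training: y = W u, then w_ij += mu * (y*_i - y_i) * u_j per sample."""
    W = [[0.0] * m for _ in range(n)]
    for _ in range(epochs):
        for u, y_star in samples:
            y = [sum(W[i][j] * u[j] for j in range(m)) for i in range(n)]
            for i in range(n):
                err = y_star[i] - y[i]
                for j in range(m):
                    W[i][j] += mu * err * u[j]   # one conductance update per weight
    return W

random.seed(0)
W_true = [[0.5, -0.2, 0.1],
          [0.3, 0.8, -0.6]]   # hypothetical target mapping (n = 2, m = 3)
samples = []
for _ in range(20):
    u = [random.uniform(-1, 1) for _ in range(3)]
    samples.append((u, [sum(W_true[i][j] * u[j] for j in range(3)) for i in range(2)]))
W = train_delta_rule(samples, n=2, m=3)
print(W)
```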

Design of memristor neural networks
This section discusses different design paradigms for memristor-based neural networks, including spiking neural networks (SNN), multilayer neural networks (MNN), convolutional neural networks (CNN), and recurrent neural networks (RNN). The discussion of each network is divided into five subsections: the concept, the architecture, the algorithm, the circuits, and the instance.

SNN concept
A spiking neural network (SNN) is a network of neurons that exchange information using spikes [13], that is, a neural network based on individual spikes [14]. The SNN is a brain-like architecture. Signals in SNNs use pulse coding rather than rate coding, which allows information to be multiplexed as frequency and amplitude. In some electronic SNNs, spikes have a waveform similar to that of biological spikes, but in most electronic systems spikes are much simpler and are represented by a square digital pulse [13]. In an SNN, the presence and timing of individual spikes are the means of communication and neural computation. The underlying biological idea is that the more intense the input, the earlier the spike is transmitted. Hence, a network of spiking neurons can be designed with n input neurons N_i whose firing times are determined by some external mechanism [14].
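The idea that a more intense input spikes earlier can be shown with time-to-first-spike encoding. The sketch assumes a hypothetical linear mapping from intensity to latency over a window `t_max`. Zero input never fires. This is one common choice and is not the encoding used in the cited works.

```python
def latency_encode(intensities, t_max=10.0, x_max=1.0):
    """Time-to-first-spike coding with an assumed linear mapping:
    stronger input -> earlier spike; non-positive input -> no spike (None)."""
    times = []
    for x in intensities:
        if x <= 0:
            times.append(None)
        else:
            x = min(x, x_max)
            times.append(t_max * (1.0 - x / x_max))
    return times

print(latency_encode([1.0, 0.5, 0.1, 0.0]))
```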

SNN architecture
In this section, we use a three-layer network to illustrate the structure of an SNN. As shown in Figure 7, multilayer SNNs are fully connected feedforward networks: all neurons in two adjacent layers are connected. All input and output neurons emit multiple spikes, i.e., spike trains.
In this structure, each neuron is described by a model. The spike response model describes the response of both the sending and the receiving neuron to a spike. In this model, spikes of the sending neuron are transmitted from the presynaptic neurons via synapses to the postsynaptic neurons. As the spikes arrive, a postsynaptic potential accumulates in the receiving neuron. The internal state of the neuron is defined as the sum of the postsynaptic potentials induced by all input spikes, weighted by the synapses that transmit them.
Suppose a neuron has N input synapses, and the i-th synapse transmits G_i spikes. The arrival time of each spike is denoted as $t_i^{(g)}$, $g = 1, \dots, G_i$. The time of the most recent output spike of the neuron prior to the current time t (> 0) is $t^{(fr)}$. Then the internal state of the postsynaptic neuron is expressed as

$$x(t) = \sum_{i=1}^{N} w_i \sum_{g:\; t^{(fr)} < t_i^{(g)} < t} \varepsilon\big(t - t_i^{(g)}\big)$$

where w_i is the weight of the i-th synapse. The postsynaptic potential induced by one spike is determined by the spike response function ε(t). Besides the model of the postsynaptic neuron, the SNN as a whole also has a model. For convenience, we assume that the layers are numbered backwards, starting from the output layer (layer 1) and ending at the input layer. Every two neurons in adjacent layers are connected by K synapses with different transmission delays and weights. The delay of the k-th synapse is denoted as $d^k$.
We assume that there are $N_{l+1}$ neurons in layer l + 1 and that neuron i in layer l + 1 has emitted a spike train of $F_i$ spikes with firing times $\mathcal{F}_i = \{t_i^{(1)}, \dots, t_i^{(F_i)}\}$. The spike fired at $t_i^{(g)}$ and transmitted through the k-th synapse arrives at postsynaptic neuron j in layer l at time $t_i^{(g)} + d^k$. At time t, the internal state of the j-th postsynaptic neuron in layer l can be expressed as

$$x_j(t) = \sum_{i=1}^{N_{l+1}} \sum_{k=1}^{K} w_{ij}^{k} \sum_{g:\; t_j^{(fr)} < t_i^{(g)} + d^k < t} \varepsilon\big(t - t_i^{(g)} - d^k\big)$$

where $w_{ij}^k$ is the weight of the k-th synapse between presynaptic neuron i and postsynaptic neuron j, and $t_j^{(fr)}$ is the time of the most recent output spike of neuron j prior to the current time t [15].
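The multi-synapse internal state can be evaluated directly. The sketch below assumes an alpha-shaped kernel ε(s) = (s/τ)·exp(1 - s/τ) for s > 0, which is a common choice. The chapter does not specify the kernel, so this choice, τ, and the helper names are assumptions. Only spikes that arrive after the neuron's last output spike are counted.

```python
import math

def epsilon(s, tau=7.0):
    """Assumed spike response kernel (alpha function); peaks at s = tau with value 1."""
    return (s / tau) * math.exp(1.0 - s / tau) if s > 0 else 0.0

def internal_state(t, spike_trains, weights, delays, t_last_fire=-math.inf):
    """x_j(t) = sum_i sum_k w_ij^k * sum_g eps(t - t_i^(g) - d^k).

    spike_trains[i]: firing times of presynaptic neuron i
    weights[i][k]:   weight of the k-th synapse from neuron i to neuron j
    delays[k]:       transmission delay d^k of the k-th synapse
    Only arrivals after neuron j's most recent output spike are counted.
    """
    x = 0.0
    for i, train in enumerate(spike_trains):
        for k, d in enumerate(delays):
            for t_f in train:
                arrival = t_f + d
                if t_last_fire < arrival < t:
                    x += weights[i][k] * epsilon(t - arrival)
    return x

# one presynaptic spike at t = 0 reaching neuron j through two delayed synapses
print(internal_state(8.0, [[0.0]], [[0.5, 0.5]], [1.0, 3.0]))
```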

SNN algorithm
Spike-timing-dependent plasticity (STDP) is a mechanism that changes synaptic strength according to the precise timing of individual pre- and/or postsynaptic spikes. As illustrated in Section 2, the sign of the difference between the pre- and postsynaptic spike times determines whether the synaptic weight increases or decreases. STDP learning in biology is inherently asynchronous and online: incremental synaptic updates occur while neurons and synapses transmit spikes and perform computations. Experimentally, the change in synaptic strength is a function of the relative timing between the arrival of a presynaptic spike and the generation of a postsynaptic spike, as shown in Figure 8.
Although the data are stochastic, an underlying interpolated function can be inferred, as shown in Figure 9.

Figure 9. Ideal STDP update function used in computational models of STDP synaptic learning [13].
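An idealized STDP function like the one in Figure 9 is often modeled with exponential windows. The sketch assumes that pair-based exponential form. The amplitudes `a_plus` and `a_minus` and the time constants are hypothetical values, not values read from the figure. The function reproduces the sign convention of Section 2: potentiation for Δt > 0 and depression for Δt < 0.

```python
import math

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Idealized pair-based STDP window (assumed exponential form and parameters).

    dt = t_post - t_pre in ms. Pre-before-post (dt > 0) potentiates,
    post-before-pre (dt < 0) depresses, and both effects decay with |dt|.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

for dt in (-40, -10, -1, 1, 10, 40):
    print(dt, stdp_dw(dt))
```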

SNN circuits
An SNN with three layers of neurons and two fully connected inter-layer meshes of memristors is shown in Figure 10. The neuron layers are fabricated with CMOS devices, and the inter-layer memristor meshes are built from nanowires on top of the CMOS substrate [16]. In Figure 10, triangles represent neuron somas, with the flat side as the input (dendrites) and the sharp side as the output (axon). Dark rectangles are memristors, and each one represents a synaptic junction. Each neuron controls the voltage at its input and output nodes.
In this SNN circuit, the CMOS spiking neurons work essentially like conventional integrate-and-fire neurons, but they use the proposed spike shape and a specific spike back-propagation scheme. The total current into a receiving neuron follows Ohm's law: it is the sum, over the connected synapses, of each synapse's conductance g multiplied by the voltage drop across it. Training the SNN requires an external circuit, which prepares the input signals and measures the output signals.

SNN instances
The switching rate of a memristor tends to grow exponentially with voltage in both polarities, and many mathematical formulations can be used to model it. Here, we use a voltage-controlled device as a synapse, and its synaptic weight is represented by the memristor conductance g.
The device characteristic is "sinh-like" in the voltage V_mem across it. Its conductance obeys

$$\frac{dg}{dt} = A\,\sinh\big(B\,V_{mem}\big) \qquad (10)$$

where A and B are parameters that depend on the memristor material, thickness, size, and fabrication method.

Figure 10. Memristor crossbar based SNNs paradigm [13].
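Eq. (10) can be integrated numerically to see its two key properties. The response is odd in V_mem, so equal pulses of opposite polarity give opposite changes. It is also strongly superlinear in amplitude. The sketch uses the values A = 2 and B = 4 quoted later in this section. Their units are not given, so time and conductance are treated as normalized. `delta_g` and `pulse` are illustrative helpers.

```python
import math

A, B = 2.0, 4.0   # values quoted in the text; time and conductance units left normalized

def delta_g(v_samples, dt):
    """Conductance change from integrating dg/dt = A*sinh(B*V_mem) over sampled voltages (forward Euler)."""
    return sum(A * math.sinh(B * v) * dt for v in v_samples)

def pulse(v, n=100):
    """Rectangular pulse of amplitude v lasting n samples."""
    return [v] * n

DT = 0.01
for v in (0.1, 0.25, 0.5):
    print(v, delta_g(pulse(v), DT), delta_g(pulse(-v), DT))
```

Doubling the amplitude from 0.25 to 0.5 roughly triples the change, because sinh(2)/sinh(1) ≈ 3.1. This superlinearity is why overlapping pre- and postsynaptic spikes have a much larger effect than either spike alone.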
In this section, we verify the STDP properties of the memristor-based synapses. Figure 11 shows the proposed spike shape, which is similar to biological spikes. Figure 12 shows the STDP curves produced by the proposed spike shape. In Figure 12, the vertical axis shows the average change of memristor conductance, and the horizontal axis represents the difference Δt between the pre- and postsynaptic spike timings. The default spike parameters are $V^{\pm} = 0.45$ V, $t_{ail}^{+} = 11$ ms, and $t_{ail}^{-} = 0.3$ ms. The results are provided for memristors with $V_{th}^{\pm} \approx \pm 0.5$ V. The values of parameters A and B are 2 and 4, respectively [18,19].

Figure 11. Proposed spike shape used for processing and learning purposes [17].

Figure 12. Simulated curve using the proposed spike shape [17].

MNN concepts
Multilayer neural networks, also known as multilayer perceptrons, are the quintessential deep networks. Their advantage over the single-layer perceptron is that they overcome its main weakness: a single-layer perceptron cannot classify data that are not linearly separable. In large-scale learning tasks, MNNs can perform impressively well and produce state-of-the-art results when massive computational power is available [20,21]. Learning in MNNs relies on continuously updating large matrices of synaptic weights by local rules [22,23]. The back-propagation (BP) algorithm is a common local learning algorithm and is widely used to train MNNs.

MNN architecture
In the MNN architecture, neurons in adjacent layers are fully connected, there are no connections between neurons in the same layer, and no connections skip layers. As a quintessential deep network, a multilayer neural network consists of an input layer, a hidden layer, and an output layer. The MNN is an evolution of the single-layer perceptron. Figure 13 shows a two-layer neural network.
Here x_1 and x_2 represent the input signals, w denotes the weights between layers, and Y is the output value. For the two-layer neural network shown above, the input signals are $x_1, \dots, x_j, \dots, x_n$ (n is the number of input neurons), and b denotes a bias. The signal passed from the input layer to the hidden layer is

$$N_{11} = f\big(x_1 w_{11} + x_2 w_{21} + b\big)$$

and the output is

$$Y_1 = f\big(N_{11} w_{11}' + N_{12} w_{21}' + b'\big)$$

where f is an activation function and the primes mark the hidden-to-output weights and bias.
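To show why the hidden layer matters, the sketch below evaluates the forward pass above for the 2-2-1 topology of Figure 13. It uses hypothetical hand-picked weights. One hidden neuron behaves like OR, the other like NAND, and the output neuron like AND, which together give XOR, a function that is not linearly separable. The weights are illustrative and are not trained values from the cited experiment.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical hand-picked weights for a 2-2-1 network (Figure 13 topology) that solves XOR.
W1 = [[20.0, -20.0],   # W1[i][h]: input x_{i+1} -> hidden neuron N_{1,h+1}
      [20.0, -20.0]]
B1 = [-10.0, 30.0]     # hidden biases: N11 acts like OR, N12 like NAND
W2 = [20.0, 20.0]      # hidden -> output weights (w'_11, w'_21)
B2 = -30.0             # output bias: output acts like AND(OR, NAND) = XOR

def forward(x1, x2):
    n11 = sigmoid(x1 * W1[0][0] + x2 * W1[1][0] + B1[0])
    n12 = sigmoid(x1 * W1[0][1] + x2 * W1[1][1] + B1[1])
    return sigmoid(n11 * W2[0] + n12 * W2[1] + B2)

print([round(forward(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
```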

MNN algorithm
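In this section, we give a short sketch of the back-propagation technique [25,23]. The actual output value of the neural network is denoted by $y_j$ and the ideal (target) value by $t_j$, and the mean square error can be used as the error function:

$$\varepsilon_{MSE} = \sum_j \big(y_j - t_j\big)^2 \qquad (11)$$

Figure 13. Logic scheme of the implemented neural network with two inputs, two hidden and one output neurons [24].

Let $w_{ij}$ represent the weight between two layers of neurons, where neurons of the previous layer are indexed with i and neurons of the next layer with j. Let $x_i$ be the output of neuron i and $s_j = \sum_i w_{ij} x_i$ the weighted input of neuron j. The derivative of the error follows from the chain rule:

$$\frac{\partial \varepsilon_{MSE}}{\partial w_{ij}} = \frac{\partial \varepsilon_{MSE}}{\partial y_j}\,\frac{\partial y_j}{\partial s_j}\,\frac{\partial s_j}{\partial w_{ij}}$$

Moreover, it is assumed that the multilayer neural network uses the sigmoid $f(s) = 1/(1 + e^{-s})$ as its nonlinear activation function, whose derivative is $f'(s) = f(s)\big(1 - f(s)\big)$. For an output neuron, this gives

$$\frac{\partial \varepsilon_{MSE}}{\partial w_{ij}} = 2\big(y_j - t_j\big)\,y_j\big(1 - y_j\big)\,x_i$$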
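The derivation can be checked numerically. The sketch implements ε_MSE from Eq. (11) and its back-propagated gradient for the 2-2-1 sigmoid network of Figure 13, then runs plain gradient descent on XOR. The parameter layout, learning rate and step count are assumptions made for illustration. The hidden-layer terms propagate the output error back through w'_h and the sigmoid derivative f(1 - f).

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def unpack(p):
    # assumed layout: p = [W1[0][0], W1[0][1], W1[1][0], W1[1][1], b1_0, b1_1, w2_0, w2_1, b2]
    return [[p[0], p[1]], [p[2], p[3]]], [p[4], p[5]], [p[6], p[7]], p[8]

def loss_and_grad(p, data):
    """eps_MSE = sum (y - t)^2 over the data set, and its gradient by back-propagation."""
    W1, b1, W2, b2 = unpack(p)
    loss, g = 0.0, [0.0] * 9
    for x, t in data:
        n = [sigmoid(x[0] * W1[0][h] + x[1] * W1[1][h] + b1[h]) for h in range(2)]
        y = sigmoid(n[0] * W2[0] + n[1] * W2[1] + b2)
        loss += (y - t) ** 2
        d_out = 2.0 * (y - t) * y * (1.0 - y)           # d eps / d s_out, using f' = f(1 - f)
        for h in range(2):
            g[6 + h] += d_out * n[h]                    # hidden -> output weight
            d_h = d_out * W2[h] * n[h] * (1.0 - n[h])   # error propagated back to hidden h
            g[0 + h] += d_h * x[0]                      # W1[0][h]
            g[2 + h] += d_h * x[1]                      # W1[1][h]
            g[4 + h] += d_h                             # hidden bias
        g[8] += d_out                                   # output bias
    return loss, g

def train(p, data, lr=0.1, steps=200):
    for _ in range(steps):
        _, g = loss_and_grad(p, data)
        p = [pk - lr * gk for pk, gk in zip(p, g)]
    return p

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
random.seed(1)
p0 = [random.uniform(-1, 1) for _ in range(9)]
p1 = train(p0, XOR)
print(loss_and_grad(p0, XOR)[0], loss_and_grad(p1, XOR)[0])
```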

MNN circuits
In this section, we give an example of a memristor implementation of a two-layer neural network, shown in Figure 14.
In hybrid-circuit neural networks [26][27][28], memristors are integrated into crossbar circuits to implement density-critical analog weights (i.e., synapses). In this scheme, each artificial synapse is represented by memristors, so the synaptic weight is given by memristor conductance. For the multilayer neural network described above, each weight is represented by two memristors, so that the crossbar can easily represent both "excitatory" and "inhibitory" synapses. The memristors of the hidden layer are arranged in an 8 × 1 grid array, as shown in Figure 14. Each weight is W = G⁺ - G⁻, where G⁺ and G⁻ are the effective conductances of the two memristors. In the simplest case, a neuron output x is encoded by a voltage V, and a synaptic weight w by a memristor conductance G. With the neuron's input virtually grounded, the current is given by Ohm's law as the product of the presynaptic voltage V and the corresponding conductance G.
The memristor crossbar is combined with CMOS circuitry, which implements the neuron functionality and other peripheral functions. The artificial neuron body (soma) was implemented by an op-amp based differential adder and a voltage divider with a MOSFET controlled by the output of the summator [24]. This element executes the basic neuron functions of information processing: summation and thresholding. The differential summator computes $y = \sum_i w_i x_i$, which is required to separate different classes of input combinations. Here y is the output voltage of the summator, and $x_i$ and $w_i$ are the i-th input voltage and the corresponding weight, respectively.
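The differential-pair encoding W = G⁺ - G⁻ and the summator can be sketched in software. The code assumes a hypothetical programmable conductance range [G_MIN, G_MAX]. It scales the weights so that the largest |w| uses the full range and leaves the unused device of each pair at G_MIN. It then computes y = Σ(G⁺ - G⁻)x / scale followed by a hard threshold. The helper names, range and threshold are illustrative.

```python
G_MIN, G_MAX = 1e-6, 1e-4   # hypothetical programmable conductance range (S)

def weights_to_pairs(weights):
    """Map signed weights to (G+, G-) pairs with G+ - G- = w * scale.

    The largest |w| uses the full range; the unused device of each pair stays at G_MIN.
    """
    w_abs_max = max(abs(w) for w in weights) or 1.0   # avoid dividing by zero
    scale = (G_MAX - G_MIN) / w_abs_max               # siemens per unit weight
    pairs = [(G_MIN + w * scale, G_MIN) if w >= 0 else (G_MIN, G_MIN - w * scale)
             for w in weights]
    return pairs, scale

def neuron(x_volts, pairs, scale, threshold=0.0):
    """Differential summation y = sum_i (G+_i - G-_i) x_i / scale, then a hard threshold."""
    y = sum((gp - gm) * x for (gp, gm), x in zip(pairs, x_volts)) / scale
    return y, (1 if y > threshold else 0)

pairs, scale = weights_to_pairs([0.5, -1.0, 0.25])
print(neuron([0.2, 0.1, 0.4], pairs, scale))   # y close to 0.1, output 1
```

Using the difference of two non-negative conductances is what lets physically positive devices represent both excitatory and inhibitory weights.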

MNN instance
In all the experiments, images from the MNIST data set were used to train and test the two-layer neural network, with a batch size of 100 to speed up computation [28]. The initial weights were drawn randomly from a uniform distribution. During the experiment, the learning rate was adjusted depending on the training-set error, and it remained roughly constant at a level close to 0.0035.

CNN concepts
Convolutional neural networks take inspiration from biological neuroscience. A classical CNN architecture was first proposed by LeCun et al. [29,30]. As a kind of deep learning network, CNNs have been used in several powerful applications in pattern recognition and classification, such as face recognition [31], traffic sign recognition [32], and object recognition [33]. Recently, CNNs have achieved state-of-the-art image classification accuracy and can classify more than a million images into 1000 different classes [29,34,35]. In traditional neural networks such as fully connected NNs, each neuron is connected to all neurons of the previous layer through a large number of synapses. CNNs instead benefit from weight sharing, which reduces the number of parameters to be trained [29]. CNNs are inspired by the structure of the visual cortex, where neurons are sensitive to small subregions of the input space called receptive fields, and they thus exploit the strong spatially local correlation present in images [35]. Because CNNs exploit the spatial structure of input images, they have significantly fewer parameters than a fully connected network of similar size and are better suited to visual document tasks than other NN topologies such as fully connected NNs [36].
Software implementations of CNNs, which require power-hungry CPUs/GPUs to perform convolution operations, are the state of the art in pattern recognition applications. Despite their high performance, CNN-based methods rely on computationally expensive sums of multiplications, which demand much more computation and memory than traditional classification methods. This hinders their integration into portable devices. As a result, most CNN-based algorithms and methods have to run on servers with plenty of resources [37].
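The parameter saving from weight sharing can be made concrete with simple arithmetic. The sketch uses the sizes of the first convolution layer described later in this section: a 28 × 28 input and six 5 × 5 kernels producing 24 × 24 × 6 outputs. It compares that layer with a fully connected layer that produces the same output size. Both helper functions are illustrative.

```python
def conv_params(in_ch, out_ch, k):
    """Shared kernels: one k x k x in_ch kernel plus one bias per output map."""
    return out_ch * (k * k * in_ch + 1)

def dense_params(n_in, n_out):
    """Fully connected: one weight per (input, output) pair plus one bias per output."""
    return n_out * (n_in + 1)

conv = conv_params(1, 6, 5)                  # six 5x5 kernels on a 28x28 image
dense = dense_params(28 * 28, 24 * 24 * 6)   # same 24x24x6 output without weight sharing
print(conv, dense)
```

The convolution layer needs 156 parameters. The fully connected layer needs 2,712,960, which is more than 17,000 times as many.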

CNN architectures
The overall architecture of a typical CNN consists of two main parts: the feature extractor and the classifier [38,39]. The feature extractor is composed of two types of layers, convolutional layers and pooling layers. A series of convolution and pooling layers is stacked and followed by fully connected layers (Figure 15).
In the feature extraction layers, each layer receives its input from the immediately preceding layer [39,40]. CNNs are often used for image processing and recognition tasks. The image signal passes through the input layer of the CNN and then enters the convolution layer for the convolution operation, which can be expressed as [37]

$$g(x, y, z) = \sum_{i} \sum_{u=0}^{C-1} \sum_{v=0}^{C-1} f(x+u,\, y+v,\, i)\; C_z(u, v, i)$$

where $\vec{f}$ and $\vec{g}$ represent the input and output feature maps, respectively, in the form of 3D matrices; $\vec{C_z}$ is one convolution kernel of size C × C; and i is the channel index of the convolution kernel and the input feature map.

Figure 15. CNN block diagram.
This operation can extract different features of the input image when different convolution kernels are used [29]. The input image is combined with the kernel through dot products, and a nonlinear transformation produces the final output feature map. Subsampling follows. A nonlinear neuron is attached after each convolution kernel, and pooling is then performed after the nonlinear neurons to reduce the amount of data and preserve local invariance. A typical pooling unit computes the maximum of a local patch of units in one feature map (or in a few feature maps) [41]. Fully connected layers are the final layers of the CNN, in which all neurons are fully connected by weights [37]. A feedforward neural network is usually used as the classifier because it has been shown to provide the best performance [42,43].
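The convolution formula and max pooling can be written directly in plain Python. The sketch stores feature maps channel-first as f[i][row][col], which is a layout choice, and computes a "valid" output. It assumes non-overlapping 2 × 2 max pooling. `conv_valid` and `max_pool` are illustrative names.

```python
def conv_valid(f, kernel):
    """g(x, y) = sum_i sum_u sum_v f[i][x+u][y+v] * kernel[i][u][v]  ('valid' output).

    f[i][row][col] is the input feature map with the channel index i first;
    kernel[i][u][v] is one C x C kernel C_z (one kernel -> one output map z).
    """
    channels, H, W = len(f), len(f[0]), len(f[0][0])
    C = len(kernel[0])
    return [[sum(f[i][x + u][y + v] * kernel[i][u][v]
                 for i in range(channels) for u in range(C) for v in range(C))
             for y in range(W - C + 1)]
            for x in range(H - C + 1)]

def max_pool(fmap, n=2):
    """Non-overlapping n x n max pooling of one feature map (incomplete edge blocks are dropped)."""
    H, W = len(fmap), len(fmap[0])
    return [[max(fmap[r + a][c + b] for a in range(n) for b in range(n))
             for c in range(0, W - n + 1, n)]
            for r in range(0, H - n + 1, n)]

image = [[[1.0] * 28 for _ in range(28)]]     # one-channel 28 x 28 input
kernel = [[[1.0] * 5 for _ in range(5)]]      # one 5 x 5 kernel
fmap = conv_valid(image, kernel)              # 24 x 24 feature map
pooled = max_pool(fmap)                       # 12 x 12 after 2 x 2 pooling
print(len(fmap), len(fmap[0]), len(pooled), len(pooled[0]))
```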

CNN algorithm
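In this section, the backpropagation learning algorithm for CNNs is introduced [36]. The input of a convolution layer is the previous layer's feature maps. Each output feature map is generated by learnable kernels and an activation function, and it may combine convolutions with multiple input maps. In general,

$$x_j^{\ell} = f\Big(\sum_{i \in M_j} x_i^{\ell-1} * k_{ij}^{\ell} + b_j^{\ell}\Big)$$

where $M_j$ is the selection of input maps, $k_{ij}^{\ell}$ is the kernel, and $b_j^{\ell}$ is the additive bias. When convolutional layer $\ell$ is followed by a subsampling layer $\ell + 1$, the same computation is repeated for each map j in the convolutional layer, pairing it with the corresponding map in the subsampling layer:

$$\delta_j^{\ell} = \beta_j^{\ell+1}\Big(f'(u_j^{\ell}) \circ \mathrm{up}\big(\delta_j^{\ell+1}\big)\Big)$$

where $u_j^{\ell}$ is the pre-activation input of map j and ∘ denotes element-wise multiplication. Here up(·) denotes an upsampling operation that tiles each pixel of the input n times horizontally and vertically, where n is the subsampling factor of the subsampling layer. One possible way to implement this function efficiently is the Kronecker product, $\mathrm{up}(x) \equiv x \otimes 1_{n\times n}$. Since the sensitivities of a given map are known, the bias gradient can be computed immediately by summing over all the entries of $\delta_j^{\ell}$:

$$\frac{\partial E}{\partial b_j} = \sum_{u,v} \big(\delta_j^{\ell}\big)_{uv}$$

Finally, the gradients of the kernel weights are computed using backpropagation. Because a kernel weight is shared, its gradient is summed over all the connections that use this weight:

$$\frac{\partial E}{\partial k_{ij}^{\ell}} = \sum_{u,v} \big(\delta_j^{\ell}\big)_{uv} \big(p_i^{\ell-1}\big)_{uv}$$

where $(p_i^{\ell-1})_{uv}$ is the patch of $x_i^{\ell-1}$ that was multiplied element-wise by $k_{ij}^{\ell}$ during convolution to compute element (u, v) of the output map.

A subsampling layer produces downsampled versions of the input maps. If there are N input maps, there are exactly N output maps, although the output maps are smaller. More formally,

$$x_j^{\ell} = f\big(\beta_j^{\ell}\,\mathrm{down}(x_j^{\ell-1}) + b_j^{\ell}\big)$$

where down(·) represents a subsampling function that sums over each distinct n-by-n block of the input image, so that the output image is n times smaller along both spatial dimensions. Each output map has a multiplicative bias β and an additive bias b. The sensitivity map of the subsampling layer is obtained from the next convolutional layer with a "full" convolution,

$$\delta_j^{\ell} = f'(u_j^{\ell}) \circ \mathrm{conv2}\big(\delta_j^{\ell+1},\, \mathrm{rot180}(k_j^{\ell+1}),\, \text{'full'}\big)$$

after which the gradients of b and β can be computed. The gradient of the additive bias is again just the sum over the elements of the sensitivity map. The multiplicative bias β involves the original downsampled map computed by the current layer during forward propagation.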
Therefore, the maps generated during forward propagation should be saved to avoid recomputing them during backpropagation. Defining $d_j^{\ell} = \mathrm{down}(x_j^{\ell-1})$, the gradient of β is

$$\frac{\partial E}{\partial \beta_j} = \sum_{u,v} \big(\delta_j^{\ell} \circ d_j^{\ell}\big)_{uv}$$

It is also advantageous to produce an output map that involves a sum over several convolutions of different input maps. Usually, the input maps combined to form a given output map are chosen by hand, but such combinations can instead be learned during training. Let $\alpha_{ij}$ be the weight given to input map i when forming output map j. Then output map j is computed by

$$x_j^{\ell} = f\Big(\sum_{i=1}^{N_{in}} \alpha_{ij}\big(x_i^{\ell-1} * k_i^{\ell}\big) + b_j^{\ell}\Big), \qquad \sum_i \alpha_{ij} = 1, \quad 0 \le \alpha_{ij} \le 1$$

These constraints can be enforced by setting the variables $\alpha_{ij}$ equal to the softmax over a set of unconstrained weights $c_{ij}$:

$$\alpha_{ij} = \frac{\exp(c_{ij})}{\sum_k \exp(c_{kj})}$$

Each set of weights $c_{ij}$ for fixed j is independent of all other such sets. Therefore only the updates of a single map need to be considered, and the subscript j can be dropped. Every map is updated in the same way, only with a different index j. The derivative of $\alpha_k$ with respect to $c_i$ is the derivative of the softmax function:

$$\frac{\partial \alpha_k}{\partial c_i} = \delta_{ki}\,\alpha_i - \alpha_i\,\alpha_k$$

where $\delta_{ki}$ is the Kronecker delta.
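The derivative of the error with respect to $\alpha_i$ at layer $\ell$ is

$$\frac{\partial E}{\partial \alpha_i} = \frac{\partial E}{\partial u^{\ell}}\,\frac{\partial u^{\ell}}{\partial \alpha_i} = \sum_{u,v} \Big(\delta^{\ell} \circ \big(x_i^{\ell-1} * k_i^{\ell}\big)\Big)_{uv}$$

where $\delta^{\ell}$ is the sensitivity map corresponding to an output map with inputs u. Again, the convolution is of the "valid" type, so the result matches the size of the sensitivity map. The gradients of the error function with respect to the underlying weights $c_i$ then follow from the chain rule:

$$\frac{\partial E}{\partial c_i} = \sum_k \frac{\partial E}{\partial \alpha_k}\,\frac{\partial \alpha_k}{\partial c_i} = \alpha_i\Big(\frac{\partial E}{\partial \alpha_i} - \sum_k \frac{\partial E}{\partial \alpha_k}\,\alpha_k\Big)$$

In addition, sparseness constraints on the distribution of weights $\alpha_i$ for a given map can be imposed by adding a regularization penalty Ω(α) to the final error function, so that some weights become zero. Then only a few input maps contribute significantly to a given output map, rather than all of them. The error for a single pattern n can be written as

$$\tilde{E}^n = E^n + \lambda \sum_{i,j} \big|\alpha_{ij}\big|$$

and the contribution of the regularization term to the gradient for the weights $c_i$ is derived below. The user-defined parameter λ controls the trade-off between fitting the network to the training data and keeping the weights in the regularization term small according to the 1-norm. Again, only the weights $\alpha_i$ for a given output map need to be considered, and the subscript j can be dropped. First,

$$\frac{\partial \Omega}{\partial \alpha_i} = \lambda\,\mathrm{sign}(\alpha_i)$$

Combining this result with the softmax derivative gives the contribution

$$\frac{\partial \Omega}{\partial c_i} = \sum_k \frac{\partial \Omega}{\partial \alpha_k}\,\frac{\partial \alpha_k}{\partial c_i} = \lambda\Big(\big|\alpha_i\big| - \alpha_i \sum_k \big|\alpha_k\big|\Big)$$

The final gradients for the weights $c_i$ under the penalized error function are

$$\frac{\partial \tilde{E}^n}{\partial c_i} = \frac{\partial E^n}{\partial c_i} + \frac{\partial \Omega}{\partial c_i}$$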
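Several building blocks of this derivation can be written in a few lines of plain Python: the up(·) and down(·) operators, the convolution-layer sensitivity, the bias gradient, and the softmax chain rule for the map-combination weights c_i. The sketch assumes square maps whose sizes are divisible by n. It also takes ∂E/∂α_i as given, for example from the valid-convolution sum above. All function names are illustrative.

```python
import math

def up(x, n):
    """up(x) = x ⊗ 1_{n x n}: tile every entry of x into an n x n block."""
    return [[x[r // n][c // n] for c in range(len(x[0]) * n)] for r in range(len(x) * n)]

def down(x, n):
    """down(x): sum over each distinct n-by-n block (sizes assumed divisible by n)."""
    return [[sum(x[r + a][c + b] for a in range(n) for b in range(n))
             for c in range(0, len(x[0]), n)]
            for r in range(0, len(x), n)]

def conv_sensitivity(fprime_u, delta_next, beta_next, n):
    """delta_j^l = beta_j^{l+1} * (f'(u_j^l) ∘ up(delta_j^{l+1})), element-wise."""
    return [[beta_next * fp * d for fp, d in zip(fp_row, d_row)]
            for fp_row, d_row in zip(fprime_u, up(delta_next, n))]

def bias_grad(delta):
    """dE/db_j: the sum of all entries of the sensitivity map."""
    return sum(sum(row) for row in delta)

def softmax(c):
    m = max(c)                                  # shift for numerical stability
    e = [math.exp(ci - m) for ci in c]
    s = sum(e)
    return [ei / s for ei in e]

def grad_c(c, dE_dalpha):
    """dE/dc_i = alpha_i * (dE/dalpha_i - sum_k dE/dalpha_k * alpha_k), with alpha = softmax(c)."""
    a = softmax(c)
    dot = sum(g * ak for g, ak in zip(dE_dalpha, a))
    return [ai * (gi - dot) for ai, gi in zip(a, dE_dalpha)]

delta = conv_sensitivity([[1.0] * 4 for _ in range(4)], [[1.0, 2.0], [3.0, 4.0]], 0.5, 2)
print(delta, bias_grad(delta))
```

The components of `grad_c` always sum to zero. This is expected, because adding the same constant to every c_i leaves the softmax weights α unchanged.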

CNN circuits
This part introduces the construction and operation of the memristor neural network circuit. First, we show how a single column of a memristor crossbar can perform a convolution operation. Pooling can be seen as a simpler convolution operation [39]. The circuit diagram of one crossbar column performing the convolution operation is shown in Figure 16.
Each crosspoint of the circuit is a memristor that represents a synapse. The kernel (k) is represented by conductance values (G) in the crossbar circuit. Some extra manipulation is needed. The kernel matrix is split into two parallel columns to express its positive and negative values, and the kernel values are converted to conductance values ($\delta^{\pm}$) [39] that fall within the bounded range of a memristor crossbar. The op-amp circuit scales the output voltage and implements the sigmoid activation function.
Convolution in the memristor crossbar is computed in the same way as matrix convolution: it is essentially a dot product between the kernel matrix and the input matrix. The first step is the multiplication of the voltage V and the conductance $G = R^{-1}$ [29], which follows Ohm's law ($I = V \cdot G$). The second step uses Kirchhoff's current law (KCL), which states that the current flowing out of a node equals the sum of the currents flowing into it. Based on KCL, a novel computing architecture implements the dot product [29]. The lower part of the op-amp circuit then applies the activation function. As a result, each neuron of the hidden and output layers implements $f\big(\sum_i (G_i^{+} - G_i^{-})\,V_i\big)$, where f is an activation function. Figure 17 shows the flow chart of the CNN image identification system.
where L is the number of layers of the CNN recognition system, the input layer (L ¼ 1) holds a testing set of 500 MNIST images, whose size of data set is 28 Â 28. L ¼ 2 is the first convolution layer [39].
The signal size from the front input layer is 28 Â 28. In this layer, an input image will be convolved with six different 5 Â 5 size kernels on the memristor crossbar. According with the front description, each column is the kernel value of 5 Â 5. And the 2D kernel was broken into two arrays in the memristor crossbar to easily account for negative values in the kernel arrays. The total number of a column in the crossbar structure is 2 Â 25 þ 1, in which "1" is the value of bias. Since we are using a memristor crossbar to perform the convolution operations, we can generate all six of these output maps in parallel. So, the crossbar circuit exist six parallel columns in a row. Therefore, this layer requires a 51 Â 6 memristor crossbars.
So far, we have obtained the memristor crossbar structure, which simulates the synapses and stores the kernel values. The circuit that performs the first convolution operation is shown in Figure 18.
Each image contains 784 pixels, but the image is applied 25 pixels at a time, and each 25-pixel section generates a single output value in each column. After the convolution kernels are applied, a 24 × 24 × 6 data array is generated by the memristor crossbar and passed to the next layer. In each column of the crossbar, memristors simulate the synapses of the neural network, and the circuit simulates a neuron: it sums the products of the inputs and the kernel values and applies the activation function. By Ohm's law and Kirchhoff's current law, each output value is the inner product of the input values and the kernel values. Once the inputs are applied, the results appear at the outputs of the memristor and op-amp circuits. When all six 24 × 24 feature maps have been obtained, the first convolution operation is finished, and its output becomes the input to the pooling operation in the next layer.
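The 25-pixels-at-a-time procedure can be simulated as a sequence of crossbar reads. The sketch below is a mathematical model, not a circuit simulation: it assumes the positive and negative conductances have already been combined into signed weights, omits the activation function, and, as is usual for CNNs, computes a cross-correlation (the kernel is not flipped). The function name conv_via_crossbar is hypothetical.

```python
import numpy as np


def conv_via_crossbar(image, kernels, biases):
    """Valid 2-D convolution computed one crossbar read per image patch.

    image   : (H, W) input, e.g. a 28 x 28 MNIST digit
    kernels : (n_k, k, k) kernels, e.g. six 5 x 5 kernels
    biases  : (n_k,) bias values (the extra '+1' row of each column)
    """
    n_k, k, _ = kernels.shape
    H, W = image.shape
    out_h, out_w = H - k + 1, W - k + 1
    # Weight matrix: one row per kernel element plus a bias row; one column per kernel.
    weights = np.vstack([kernels.reshape(n_k, k * k).T, biases[None, :]])
    out = np.empty((out_h, out_w, n_k))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + k, c:c + k].reshape(-1)  # 25 input 'voltages'
            v = np.append(patch, 1.0)                    # constant 1 drives the bias row
            out[r, c, :] = v @ weights                   # all n_k columns at once
    return out
```

With a 28 × 28 image and six 5 × 5 kernels, the result has shape (24, 24, 6), matching the data array described above.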
Step 2. First smoothing layer (l = 3) Following the first convolution layer, a smoothing operation is performed on the six generated feature maps. Pooling can be seen as a simpler convolution operation in which the same kernel is applied to each feature map. As in the convolution process, each column of the crossbar represents a kernel, so a single column of the crossbar contains 2 × 4 + 1 = 9 memristors, and six columns allow all six feature maps to be processed in parallel. Therefore, this layer requires a 6 × (2 × 4 + 1) memristor crossbar. Unlike in the convolution layers, the conductances corresponding to negative kernel elements are meaningless in this layer, because all components of the pooling kernel are positive. Figure 19 shows the circuit that performs the pooling operation on the 6 × 24 × 24 feature maps produced by the convolution layer.
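The smoothing step can be sketched in the same way. The chapter states only that all pooling kernel entries are positive, so the 2 × 2 averaging kernel (each weight 1/4) and the stride-1, no-padding window below are assumptions:

```python
import numpy as np


def smooth(feature_map, k=2):
    """k x k averaging with stride 1, computed like one crossbar column.

    Every kernel entry is positive (1 / k^2), so only the positive half of
    the column would carry programmed conductances.
    """
    kernel = np.full(k * k, 1.0 / (k * k))
    h, w = feature_map.shape
    out = np.empty((h - k + 1, w - k + 1))
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            out[r, c] = feature_map[r:r + k, c:c + k].reshape(-1) @ kernel
    return out
```

Applied to a 24 × 24 map, this produces a 23 × 23 smoothed map, which the subsampling step then reduces further.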
In the pooling operation, the six feature maps obtained from the convolution layer are applied to their corresponding columns, producing another set of six feature maps.
A subsampling operation follows each smoothing crossbar, reducing each feature map by a factor of 2 in each dimension. This could be implemented by placing a single-bit counter on the memory array where the output of the smoothing operation is stored. The memory address would update only on every other sample, so all unwanted data would be overwritten during the smoothing step.
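The counter-based subsampling idea can be simulated in software. The sketch below is a hypothetical model of the memory write logic, not the authors' circuit: a single-bit counter toggles on every sample (and another on every row), and the write address advances only when the counter flips to 1, so each odd-indexed sample lands in the slot that the next even-indexed sample overwrites. One spare slot per dimension absorbs the last odd sample when a dimension is even.

```python
import numpy as np


def store_subsampled(smoothed):
    """Simulate writing a smoothed map into memory with single-bit counters.

    Samples arrive in row-major order. The column address advances only on
    every other sample, and the row address only on every other row, so
    only even-indexed samples and rows survive in memory.
    """
    h, w = smoothed.shape
    mem = np.zeros((h // 2 + 1, w // 2 + 1))   # one spare slot per dimension
    row_addr, row_bit = 0, 0
    for r in range(h):
        col_addr, col_bit = 0, 0
        for c in range(w):
            mem[row_addr, col_addr] = smoothed[r, c]
            col_bit ^= 1              # toggle the single-bit column counter
            if col_bit:               # advance the address on every other sample
                col_addr += 1
        row_bit ^= 1                  # toggle the single-bit row counter
        if row_bit:
            row_addr += 1
    return mem[: (h + 1) // 2, : (w + 1) // 2]
```

For a 23 × 23 smoothed map (2 × 2 stride-1 smoothing of a 24 × 24 map, as assumed earlier), this yields the 12 × 12 maps used by the second convolution layer, identical to keeping every other sample.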
Step 3. Second convolutional layer (l = 4) The pooling and subsampling operations are followed by the second convolution operation. Unlike the first, the inputs of the second convolution layer are six 12 × 12 feature maps, and it produces 12 outputs instead of six. Because the number and size of its inputs and outputs differ, the structure of the second layer is distinctly different from that of the first. The circuit design of the second convolution layer is shown in Figure 20. Each column combines the convolution processes over all six input feature maps, and 12 different kernels are applied in parallel.
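In the second layer, each output value sums contributions from all six input maps, so one column stores the weights for all six maps. A sketch under the assumption of 5 × 5 kernels (inferred from the 12 × 12 inputs and the 4 × 4 maps after the next pooling stage; the text does not state the kernel size), again using signed weights and omitting the activation:

```python
import numpy as np


def conv_multichannel(maps, kernels, biases):
    """Valid convolution over several input maps, one crossbar read per patch.

    maps    : (C, H, W) input feature maps, e.g. six 12 x 12 maps
    kernels : (n_k, C, k, k) kernels, e.g. twelve kernels of shape (6, 5, 5)
    biases  : (n_k,) one bias per output column
    Each column holds C * k * k signed weights, so it sums the
    contributions of all C input maps in a single read.
    """
    n_k, C, k, _ = kernels.shape
    _, H, W = maps.shape
    out_h, out_w = H - k + 1, W - k + 1
    weights = kernels.reshape(n_k, C * k * k).T          # (C * k * k, n_k)
    out = np.empty((out_h, out_w, n_k))
    for r in range(out_h):
        for c in range(out_w):
            patch = maps[:, r:r + k, c:c + k].reshape(-1)  # all C maps at once
            out[r, c, :] = patch @ weights + biases
    return out
```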
Step 4. Second smoothing layer (l = 5) A second smoothing layer follows the second convolution layer to further reduce the size of the data array. The circuit in this layer is identical to that displayed in Figure 7. Because 12 feature maps are processed, 12 parallel single-column crossbars are required. This second pooling stage produces twelve 4 × 4 feature maps, which form the input to the next layer, the classification layer (l = 6).
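The feature-map sizes quoted in Steps 1 to 4 can be checked with a few lines of arithmetic. This sketch assumes stride-1 valid convolutions with 5 × 5 kernels in both convolution layers, and 2 × 2 stride-1 smoothing followed by keeping every other sample:

```python
def conv_out(n, k):
    """Stride-1 valid convolution: n x n input, k x k kernel."""
    return n - k + 1


def smooth_subsample_out(n, k=2):
    """k x k stride-1 smoothing, then keep every other sample."""
    return (conv_out(n, k) + 1) // 2


size = 28                           # layer 1: 28 x 28 input image
size = conv_out(size, 5)            # layer 2: 24 x 24 x 6
size = smooth_subsample_out(size)   # layer 3: 12 x 12 x 6
size = conv_out(size, 5)            # layer 4: 8 x 8 x 12
size = smooth_subsample_out(size)   # layer 5: 4 x 4 x 12
n_features = size * size * 12       # inputs to the classification layer
print(n_features)                   # prints 192
```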
Step 5. Classification layer Following the feature extraction operations, a fully connected layer is used to classify the feature maps. The classification layer is generally a single-layer perceptron or a multilayer neural network.
The circuit used to complete this operation is shown in Figure 21. Unlike the convolution circuits, which store a set of convolution kernel arrays, the memristor crossbar in the classification layer stores a weight matrix. The crossbar consists of 192 × 2 + 1 rows, representing 192 inputs (one for each of the 16 values in each of the 12 output maps) plus a bias, and 10 columns, representing the 10 outputs (one for each MNIST digit). The total number of memristors in this layer is therefore (2 × 192 + 1) × 10 = 3850.
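A sketch of one read of the classification crossbar. It assumes, hypothetically, that the 192 negative-weight rows are driven by inverted copies of the inputs and that the bias row is driven by a constant 1; the digit whose column collects the largest current is taken as the prediction. The function name classification_layer is illustrative.

```python
import numpy as np


def classification_layer(feature_maps, G):
    """Classify twelve 4 x 4 feature maps with a (2 * 192 + 1) x 10 crossbar.

    G rows 0-191    : conductances for positive weights
    G rows 192-383  : conductances for negative weights (inverted inputs)
    G row 384       : bias conductances (constant input of 1)
    """
    x = feature_maps.reshape(-1)              # 12 * 4 * 4 = 192 values
    v = np.concatenate([x, -x, [1.0]])        # assumed drive voltages
    currents = v @ G                          # KCL: one total current per column
    return int(np.argmax(currents)), currents
```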
Step 6. Digital storage layers A digital storage layer was placed at the output of each convolution layer. The digital storage layer reduces the amount of analog circuit error that is transmitted between layers. We chose to store an entire image between layers because any benefit gained from a systematically reduced memory size would likely be outweighed by the complexity of the required data controller [44].
Figure 19. The group of convolution circuits that is used to implement the smoothing operation.
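The chapter does not describe the digital storage circuit itself. As a hypothetical illustration, it can be modeled as an ideal n-bit ADC/DAC pair over an assumed [0, 1] output range (for example, sigmoid outputs): once stored as integer codes, a value no longer drifts while it waits for the next layer, and its reconstruction lies within half a quantization step of the analog value.

```python
import numpy as np


def to_digital(v, bits=8, v_min=0.0, v_max=1.0):
    """Ideal ADC: map analog outputs in [v_min, v_max] to integer codes."""
    levels = 2 ** bits - 1
    x = (np.clip(v, v_min, v_max) - v_min) / (v_max - v_min)
    return np.round(x * levels).astype(np.uint32)


def to_analog(codes, bits=8, v_min=0.0, v_max=1.0):
    """Ideal DAC: regenerate the values that drive the next layer."""
    levels = 2 ** bits - 1
    return v_min + codes.astype(float) / levels * (v_max - v_min)


# Store a whole 24 x 24 x 6 feature-map image between layers (hypothetical data).
analog = np.random.default_rng(0).random((24, 24, 6))
stored = to_digital(analog)    # what the storage layer would hold
restored = to_analog(stored)   # what the next layer would receive
```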

CNN instance
Simulating the CNN algorithm purely in software under these training conditions yields 92% classification accuracy, as shown in Figure 22. A second simulation tests the accuracy of the memristor-based CNN recognition system described in the previous section; with the simulated memristor crossbars, an accuracy of 91.8% was achieved.

RNN concept
Recurrent neural networks (RNNs) are the main tool for handling sequential data, which involve variable-length inputs or outputs [40]. Compared with a multilayer network, the weights in an RNN are shared across different instances of the artificial neurons, each associated with a different time step [40,42]. In addition, in recurrent neural networks the history is represented by neurons with recurrent connections, so the history length is unlimited. Recurrent networks can also learn to compress the whole history into a low-dimensional space, whereas feedforward networks compress (project) just a single word. Because recurrent networks can form short-term memory, they can better deal with position invariance [45].
Figure 20. The circuit that is used to implement the second convolution layer.

RNN architecture
The simplest RNN architecture is illustrated in Figure 23 [40]. The left of Figure 24 shows the ordinary recurrent network circuit, with weight matrices U, V, and W denoting three different kinds of connections (input-to-hidden, hidden-to-output, and hidden-to-hidden, respectively). Each circle indicates a whole vector of activations. The right of Figure 24 shows a time-unfolded flow graph, in which each node is associated with one particular time instance.
Figure 21. Circuit that is used to implement the classification layer of the CNN recognition system.
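The time-unfolded graph corresponds to the standard recurrence h_t = tanh(U x_t + W h_{t−1}), o_t = V h_t. The sketch below assumes tanh hidden units, no biases, and linear outputs; its point is that the same U, W, and V are reused at every time step.

```python
import numpy as np


def rnn_forward(xs, U, W, V, h0):
    """Run a simple RNN over a sequence, one time step at a time.

    xs : (T, n_in) input sequence
    U  : (n_hidden, n_in)     input-to-hidden weights
    W  : (n_hidden, n_hidden) hidden-to-hidden (recurrent) weights
    V  : (n_out, n_hidden)    hidden-to-output weights
    h0 : (n_hidden,)          initial hidden state
    """
    h = h0
    hidden_states, outputs = [], []
    for x in xs:                        # one iteration per time step
        h = np.tanh(U @ x + W @ h)      # the same U and W at every step
        hidden_states.append(h)
        outputs.append(V @ h)           # the same V at every step
    return np.array(hidden_states), np.array(outputs)
```

In a memristor implementation, each of U, W, and V would map onto its own crossbar, read once per time step.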

A Hopfield neural network design
The memristor-based Hopfield network (MHN) is an ideal model for cases in which a memristor-based circuit network exhibits complex switching phenomena, which are frequently encountered in applications [46]. A Hopfield network consists of a set of interconnected artificial neurons and synapses. In this case, a nine-synapse Hopfield network is realized with six memristors and three neurons. As shown in Figure 25, the artificial neuron has three inputs, and each input $N_i$ (i = 1, 2, 3) is connected to a synapse with synaptic weight $w_i$. The output of the three-input binary artificial neuron is expressed as
$\text{output} = \mathrm{sgn}\left(\sum_{i=1}^{3} w_i N_i - \theta\right)$
where θ is the neuron's threshold, and the sign function is defined as
$\mathrm{sgn}(x) = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases}$
An artificial neuron was constructed as shown in Figure 26. An operational amplifier is used to sum the inputs. The switches $S_1$, $S_2$, and $S_3$ are controlled by external signals to obtain positive or negative synaptic weights. The synaptic weights corresponding to inputs $N_1$, $N_2$, and $N_3$ are $w_1 = \pm \frac{M_1}{M_1 + R}$, $w_2 = \pm \frac{M_2}{M_2 + R}$, and $w_3 = \pm \frac{M_3}{M_3 + R}$, respectively, where $M_1$, $M_2$, and $M_3$ are the resistances of the memristors and the resistance of R is fixed at 3 MΩ. In the circuit shown in Figure 26, transmission gates $B_1$, $B_2$, and $B_3$ pass signals without modifying their polarity, while inverters $I_1$, $I_2$, and $I_3$ generate negative synaptic weights.
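A numerical sketch of this neuron, using the weight formula w = ±M/(M + R) and R = 3 MΩ from the text. The memristance values in the example are hypothetical, and sgn(0) is taken as 1 (an assumed convention). Note that |w| is always below 1, because M/(M + R) < 1 for any finite M.

```python
R_FIXED = 3e6  # fixed resistor R = 3 MOhm, as given in the text


def synaptic_weight(m, negative=False, r=R_FIXED):
    """w = +/- M / (M + R); the sign models the switch choosing between
    the transmission gate (positive) and the inverter (negative)."""
    w = m / (m + r)
    return -w if negative else w


def sgn(x):
    """Sign function; sgn(0) = 1 is an assumed convention."""
    return 1 if x >= 0 else -1


def neuron_output(inputs, memristances, negatives, theta):
    """Three-input binary neuron: sgn(sum_i w_i * N_i - theta)."""
    weights = [synaptic_weight(m, neg) for m, neg in zip(memristances, negatives)]
    return sgn(sum(w * n for w, n in zip(weights, inputs)) - theta)


# Hypothetical memristances: M = 3 MOhm gives |w| = 0.5 for every synapse.
print(neuron_output([1, 1, -1], [3e6, 3e6, 3e6], [False, False, False], 0.0))  # 1
```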
The architecture of a 3-bit MHN implemented with nine synapses is shown in Figure 27. The synaptic weight from neuron i to neuron j is denoted as $w_{i,j}$ and is mapped to the resistance of the corresponding memristor $M_{i,j}$; the values $M_{i,j}$ and $w_{i,j}$ form the resistance matrix and the weight matrix, respectively. The circuit is shown in Figure 28, and all the demonstrations below are based on this circuit. The threshold vector $T = (\theta_1, \theta_2, \theta_3)$ represents the thresholds of the artificial neurons (neurons 1, 2, and 3), and the state vector $X = (X_1, X_2, X_3)$ represents the states of the three neurons. In each updating cycle, the new states of the neurons are computed by
$X_j(t+1) = \mathrm{sgn}\left(\sum_{i=1}^{3} w_{i,j} X_i - \theta_j\right)$
where t is the number of updating cycles, X(0) is the initial state vector, and each $X_i$ on the right-hand side takes its most recently updated value. Within one updating cycle, the neuron states are updated asynchronously in the order $X_1$, $X_2$, $X_3$, in three stages defined as stages a, b, and c, respectively [46].
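The asynchronous updating cycle can be simulated as follows. This is a sketch: the loop over j plays the role of stages a, b, and c, and each stage reads the states already updated earlier in the same cycle. Any weight matrix passed in is hypothetical; the matrix of Figure 28 is not reproduced here.

```python
import numpy as np


def sgn(x):
    """Sign function; sgn(0) = 1 is an assumed convention."""
    return 1 if x >= 0 else -1


def hopfield_cycle(X, W, theta):
    """One updating cycle: neurons 1, 2, 3 updated in turn (stages a, b, c).

    W[i][j] is the weight w_{i,j} from neuron i to neuron j.
    """
    X = list(X)
    for j in range(len(X)):
        net = sum(W[i][j] * X[i] for i in range(len(X)))  # uses latest states
        X[j] = sgn(net - theta[j])
    return X


def run_until_stable(X0, W, theta, max_cycles=10):
    """Repeat updating cycles until the state vector stops changing."""
    X = list(X0)
    for t in range(max_cycles):
        X_next = hopfield_cycle(X, W, theta)
        if X_next == X:
            return X, t
        X = X_next
    return X, max_cycles
```

For instance, with a hypothetical Hebbian matrix W = np.outer(p, p) storing p = (1, −1, 1), zero thresholds, and X(0) = (1, 1, 1), the first cycle yields (1, −1, 1), which then remains stable.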

Potential applications and prospects
Hardware implementations of deep neural networks are built from neuron-synapse circuits, and future devices can make the design and fabrication of deep neural networks (NNs) more efficient. The full power of NNs has not yet been realized, but with the release of commercial chips implementing arbitrary neural networks, more efficient algorithms are likely to emerge in domains where neural networks can improve performance dramatically. Memristor-based NNs can help address many A.I. problems, such as machine translation, intelligent question answering, and game playing; in the future, they may be used in neuromorphic computation, brain-computer or computer-brain interfaces, A.I. applications on cell phones, autopilots, and environmental monitoring.

Conclusions
This chapter described different design paradigms for memristor-based neural networks. Current neural network implementations are not sufficient, but memristor-based systems offer a potential solution. The basic concepts of memristor-based implementation, such as memristor-based synapses, memristor-based neurons, and memristor crossbar-based neuromorphic computing engines, were discussed. Memristor-based neural networks, including SNNs, MNNs, CNNs, and RNNs, are feasible and efficient, and they are expected to spur the future development of A.I. It is expected that memristor-based neural networks will take the lead.