Opening the “Black Box” of Silicon Chip Design in Neuromorphic Computing

Neuromorphic computing, a bio-inspired computing architecture that transfers neuroscience to silicon chips, has the potential to achieve the same level of computation and energy efficiency as mammalian brains. Meanwhile, three-dimensional (3D) integrated circuit (IC) design with non-volatile memory crossbar arrays uniquely exposes the intrinsic vector-matrix computation and parallel computing capability required in neuromorphic computing designs. In this chapter, state-of-the-art research trends in electronic circuit design for neuromorphic computing are introduced. Furthermore, a practical bio-inspired spiking neural network with delay-feedback topology is discussed. In the endeavor to imitate how human beings process information, our fabricated spiking neural network chip is able to process analog signals directly, resulting in high energy efficiency with a small hardware implementation cost. Mimicking the neurological structure of mammalian brains, the potential of the 3D-IC implementation technique with memristive synapses is investigated. Finally, applications to chaotic time series prediction and video frame recognition are demonstrated.


Introduction
Benefiting from Moore's law, the von Neumann computing architecture, which stores data and instructions in the memory unit and processes them in the central processing unit (CPU), has served as the major computing model over the past several decades [1]. However, physical limitations of complementary metal-oxide-semiconductor (CMOS) technology and of storage capacity hinder further performance gains in classic computers; such computers can no longer double their performance every 18 months, indicating the end of Moore's prediction [2].
Recently, extracting valuable information in data-intensive applications through the von Neumann computing architecture has become computationally expensive, even with supercomputers [3]. The accumulated energy required for data processing on supercomputers raises the question of whether the augmented performance is sustainable.

Artificial neural networks
In the endeavor to imitate the nervous system within mammalian brains, artificial neural networks (ANNs) are built by employing electronic circuits to imitate biological neural networks [17]. In general, ANN methodologies adopt the biological behavior of neurons and synapses in their architecture through the so-called hidden layer. The hidden layer consists of multiple "neurons" and "synapses", which carry activation functions that control the propagation of neuron signals. Based on the connection pattern and the learning algorithm, ANN methodologies can be classified into various categories, as depicted in Figure 2.
DOI: http://dx.doi.org/10.5772/intechopen.83832
The multilayer perceptron (MLP), a representative feedforward neural network (FNN), is composed of unidirectional connections between hidden layers. The MLP has become the quintessential ANN model owing to its ease of implementation [18]. However, its major design challenge is that the runtime, as well as the training and learning accuracy of the system, is strongly affected by the number of neurons and hidden layers. As neural information evolved into much more sophisticated mixed-signal forms, the disadvantages of the MLP became apparent when such a network is deployed for temporal-spatial information processing tasks [19]. Recurrent neural networks (RNNs) successfully capture temporal-spatial characteristics within their hidden layer and closely mimic the working mechanism of biological neurons and synapses. However, their major design challenge is that all weights within the network need to be trained, which dramatically increases the computational complexity. In the early 2000s, reservoir computing, an emerging computing paradigm, exploited the dynamic behavior of conventional RNNs and simplified their training mechanism [20]. Within the reservoir layer, synaptic connections are constructed by a layer of nonlinear neurons with fixed, untrained weights. In reservoir computing, the complexity of the training process is significantly reduced because only the output weights need to be trained; thereby, higher computational efficiency can be achieved.
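The idea that only the output weights are trained can be illustrated with a minimal software sketch. The code below is not the chapter's hardware; it assumes a small echo-state-style reservoir with a `tanh` activation, and the function names, reservoir size, spectral radius, ridge parameter and toy task (reproducing the input delayed by one step) are all illustrative choices.

```python
import numpy as np

def run_reservoir(u, n_neurons=50, spectral_radius=0.9, seed=0):
    """Drive a fixed random reservoir with the input sequence u."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=n_neurons)                 # fixed, untrained
    w_res = rng.uniform(-0.5, 0.5, size=(n_neurons, n_neurons))   # fixed, untrained
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
    x = np.zeros(n_neurons)
    states = np.empty((len(u), n_neurons))
    for t, u_t in enumerate(u):
        x = np.tanh(w_res @ x + w_in * u_t)   # nonlinear neuron update
        states[t] = x
    return states

def train_readout(states, target, ridge=1e-6):
    """Ridge regression on the reservoir states: the only trained weights."""
    a = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(a, states.T @ target)

# Toy task (assumed for illustration): reproduce the input delayed by one step.
rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, size=500)
target = np.roll(u, 1)
states = run_reservoir(u)
w_out = train_readout(states[50:], target[50:])   # skip a warm-up transient
pred = states[50:] @ w_out
mse = float(np.mean((pred - target[50:]) ** 2))
```

Training reduces to solving one linear system for `w_out`, which is why the training cost is so much lower than for a fully trained RNN.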
Conventional reservoir computing has been developed extensively over the past decade to simplify the training of RNNs and has proven its benefits across multifaceted applications [21][22][23][24]; however, the computational accuracy of the system still depends strongly on the number of neurons within the reservoir layer. These enormous numbers of neurons significantly hinder the hardware development of reservoir computing. In [25], it was shown that a computing architecture is capable of exhibiting rich dynamic behaviors during operation when a delay is introduced into the system. Benefiting from this embedded delay property, the training mechanism and the computing architecture of conventional reservoir computing have conceptually evolved into time delay reservoir (TDR) computing [26]. In TDR computing, the reservoir layer is built from only one nonlinear neuron with a feedback loop. In this context, time-series input data can be processed by taking advantage of the feedback signal to form a short-term memory; thereby, higher computational efficiency and accuracy can be achieved.
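A single nonlinear neuron with a delayed feedback loop can be sketched in discrete time as follows. This is a simplified software model under stated assumptions, not the fabricated circuit: the delay loop is represented by `n_virtual` stored values, the nonlinearity is `tanh`, the input is multiplied by a fixed random ±1 mask, and the gains are hypothetical values chosen only for illustration.

```python
import numpy as np

def tdr_states(u, n_virtual=20, feedback_gain=0.5, input_gain=0.5, seed=0):
    """Simulate one nonlinear node whose output is fed back after a full delay."""
    rng = np.random.default_rng(seed)
    mask = rng.choice([-1.0, 1.0], size=n_virtual)   # fixed random input mask
    loop = np.zeros(n_virtual)                       # values stored in the delay loop
    states = np.empty((len(u), n_virtual))
    for t, u_t in enumerate(u):
        # Each slot mixes the masked input with the value that entered the
        # loop one full delay period earlier (the short-term memory).
        loop = np.tanh(input_gain * mask * u_t + feedback_gain * loop)
        states[t] = loop
    return states

# An input pulse at t = 0 is still visible in later states, but it fades.
pulse = np.zeros(5)
pulse[0] = 1.0
pulse_states = tdr_states(pulse)
```

The rows of `pulse_states` play the role of the many neurons of a conventional reservoir, and a linear readout can be trained on them in the same way.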

Spiking information processing
In many brain-inspired neuromorphic computing systems, the interface between modules is often influenced by the signal propagation. The major design challenge in neuromorphic computing is the difficulty of adapting raw analog signals into a data pattern suitable for neuronal activities. Before examining the architecture of our fabricated spiking neural network chip, this section discusses a temporal encoding scheme realized through analog IC design techniques.

CMOS neuron models
Over the past few decades, biological neurons have been investigated extensively in the field of neuroscience [27][28][29][30][31][32]. In general, the dendrite, the soma, the axon and the synapse are the four major elements of a biological neuron [33]. Within a nervous system, dendrites collect and transmit neural signals to the soma, while the soma acts as the CPU that carries out the nonlinear transformation. Moreover, signals are processed and transmitted in the form of a nerve impulse, also known as a spike [34]. During operation, an output spike is formed when the input stimulus surpasses the threshold level, which is referred to as the firing process. Figure 3 demonstrates a typical firing and resting operation in a biological neuron. The axon and the synapses then transmit the spike patterns to other neurons.
The leaky integrate-and-fire (LIF) neuron model plays an important role in neuron design for converting raw analog signals into spikes [35]. Figure 4 depicts the analog electronic circuit model of a LIF neuron. The input excitation, $I_{ex}$, can be expressed as
$$I_{ex} = C_m \frac{dV_m(t)}{dt} + I_{leak}, \qquad (1)$$
where $C_m$ is the membrane capacitance, $V_m(t)$ represents the voltage potential across the membrane capacitor over time, and $I_{leak}$ is the leakage current. During operation, raw analog signals are first converted into an excitation current, which charges the membrane capacitor. When the voltage potential across the membrane capacitor surpasses the threshold level, the circuit fires a spike as its output. Once the firing process is complete, the membrane capacitor is reset to its initial state until the next firing cycle takes place. The LIF neuron is capable of performing both firing and resetting operations, closely mimicking the biological behavior of neurons.
From Eq. (1), it can be observed that the integration time over the membrane capacitor can be regulated by the excitation and leakage currents. Such a relation can be depicted by a simple resistor model, so that Eq. (1) can be rewritten as
$$I_{ex} = C_m \frac{dV_m(t)}{dt} + \frac{V_m(t)}{R_{leak}}, \qquad (2)$$
where $V_m / R_{leak}$ determines the amount of leakage current. Thereby, for a constant excitation current and a membrane capacitor that starts from its reset state, the voltage potential across the membrane capacitor can be determined as
$$V_m(t) = I_{ex} R_{leak} \left(1 - e^{-t/(R_{leak} C_m)}\right). \qquad (3)$$
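The charge, fire and reset cycle of the LIF neuron described above can be checked with a short numerical sketch. It integrates the resistor-model relation, in which the excitation current equals the capacitor charging current plus the leakage $V_m / R_{leak}$, with a forward-Euler step. The function name and all component values (1 pF, 1 GΩ, 0.5 V threshold) are assumptions chosen for illustration and are not taken from the fabricated chip.

```python
def simulate_lif(i_ex, c_m=1e-12, r_leak=1e9, v_th=0.5, dt=1e-6, t_end=5e-3):
    """Return spike times (s) of a leaky integrate-and-fire neuron.

    Forward-Euler integration of C_m * dV/dt = I_ex - V / R_leak.
    All default values are illustrative assumptions.
    """
    v = 0.0                      # membrane voltage, starts from the reset state
    spike_times = []
    n_steps = int(round(t_end / dt))
    for step in range(n_steps):
        v += dt * (i_ex - v / r_leak) / c_m   # charge minus leakage
        if v >= v_th:                         # threshold crossed: fire
            spike_times.append((step + 1) * dt)
            v = 0.0                           # reset the membrane capacitor
    return spike_times

weak = simulate_lif(0.4e-9)      # steady state 0.4 V stays below threshold
medium = simulate_lif(1e-9)
strong = simulate_lif(2e-9)
```

With these assumed values, a larger excitation current reaches the threshold sooner and therefore produces more spikes in the same window, while a current whose steady-state voltage $I_{ex} R_{leak}$ stays below the threshold never fires.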

Neural codes
A neural code is used to characterize raw analog signals as neural responses. In general, there are two distinct classes of neural codes. One class converts analog signals into a spike train in which only the number of spikes matters, known as the rate code. The other class converts analog signals into a temporal response structure [36] in which the time intervals matter, known as the temporal code. Figure 5 demonstrates the major differences between the rate code and the temporal code. In the rate code, analog signals are encoded into the firing rate within a sampling period, as shown in Figure 5a. In terms of implementation complexity, the rate encoding scheme is easier to realize with electronic circuits than the temporal encoding scheme; however, small variations of an analog signal in the temporal response structure are neglected, which makes the rate code inherently ambiguous in real-time computation [36]. In [37], researchers found that neural information depends not only on the spatial structure but also on the temporal structure. The time-to-first-spike (TTFS) latency code [38][39][40] is one of the simplest temporal encoding schemes. As demonstrated in Figure 5b, in a TTFS latency code, analog signals are encoded into the time interval between the starting point of the sampling period and the generated spike. However, the encoding error would be large if the system performs abnormally.
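The difference between the two classes can be made concrete with a toy sketch. The mappings below are assumptions for illustration only: the input is taken to be normalized to the range 0 to 1, the rate code maps it linearly to a spike count, and the TTFS code maps it linearly to a latency, with a stronger input firing earlier. The function names and the `max_rate` value are hypothetical.

```python
def rate_code(x, window=1.0, max_rate=100.0):
    """Encode x in [0, 1] as evenly spaced spikes; only the count matters."""
    n_spikes = int(round(x * max_rate * window))
    return [(k + 1) * window / (n_spikes + 1) for k in range(n_spikes)]

def ttfs_code(x, window=1.0):
    """Encode x in [0, 1] as one spike; a stronger input fires earlier."""
    return [(1.0 - x) * window]

# Two slightly different inputs collapse to the same spike count in the
# rate code, but remain distinguishable in the TTFS latency.
count_a = len(rate_code(0.501))
count_b = len(rate_code(0.503))
latency_a = ttfs_code(0.501)[0]
latency_b = ttfs_code(0.503)[0]
```

Here `count_a` equals `count_b`, which is the ambiguity of the rate code for small signal variations, whereas `latency_a` and `latency_b` differ.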
The inter-spike-interval (ISI) code is another branch of the temporal code, in which the encoded analog signals depend on the internal time correlation between spikes [41,42], as illustrated in Figure 5c. In general, the ISI temporal encoder converts all analog signals into several inter-spike intervals, allowing each spike to serve as a reference frame for the others. The ISI code is therefore capable of carrying more information within a sampling period than the TTFS latency code. Figure 6a shows the simplified functional diagram of the ISI temporal encoder. The ISI temporal encoder employs an iterative architecture such that each LIF neuron operates in a separate clock period. The signal regulation layer is built from a current mirror array that duplicates the input excitation current for each LIF neuron; the neuron pool, together with the signal integration layer, provides the iterative characteristic. Our ISI temporal encoder chip was fabricated in the standard GlobalFoundries (GF) 180 nm CMOS technology, as depicted in Figure 6b.
The number of spikes in an ISI code as discussed in [32] is directly proportional to the number of neurons. Although this linear relation is desirable, its hardware implementation remains challenging. On the other hand, an exponential relation would increase the number of spikes and thus carry more information with the same number of neurons. With the iteratively structured ISI temporal encoder, the number of generated spikes, $S_N$, is a function of the total number of neurons, $N$ (Eq. (4)). From Eq. (4), it can be observed that, even with the same number of neurons, the ISI temporal encoder is capable of producing more spikes than [35]; thereby, the ISI temporal encoder is able to carry more information. The iterative structure greatly reduces the power consumption, since fewer neurons are needed to produce the same number of spikes.
In this iteratively structured design, the ISI temporal encoder samples the original analog signal without using A/D and D/A conversions and converts analog signals into several inter-spike intervals, each of which can be expressed in a simplified form. In the IC implementation, the membrane capacitor is fixed, so $V_i$ is a constant; the variable $A_i$ is therefore defined in terms of the excitation current, with $\beta$ as an arbitrary design parameter. The general expression of each inter-spike interval is illustrated in Figure 7.

CMOS nervous system design
With respect to the analog design of the neural code, our spiking neural network chip adopts the ISI temporal encoding scheme as its signal pre-processing module and a reservoir computing module with delay topology as its processing element. Our spiking neural network, named the analog delayed feedback reservoir (DFR) system, can be considered a simplification of conventional reservoir computing. By employing the delayed feedback structure within the system, our analog DFR system possesses the functionality of high-dimensional projection and short-term dynamic memory, whereby the behavior of a biological neuron is achieved. Figure 8 demonstrates the architecture of our analog DFR system, as published in [43,44]. During operation, the high-dimensional projection within the reservoir layer, as illustrated in Figure 9, is the key mechanism for separating input patterns into different categories [26]. For instance, in a low-dimensional space, two different objects cannot be linearly separated by a single cut-off line, as shown in Figure 9a. However, by projecting input patterns onto a higher-dimensional space, from two-dimensional to three-dimensional, the separability of the system changes accordingly. As demonstrated in Figure 9b, the same objects are linearly separated by a single cut-off plane without changing their original xy position. Our analog DFR chip was fabricated in the GF 130 nm CMOS technology, as demonstrated in Figure 10.
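The effect of projecting onto a higher-dimensional space can be illustrated with the classic XOR arrangement of four points, which no single line separates in two dimensions. In the sketch below, the third coordinate $z = (x - y)^2$ and the threshold of 0.5 are assumptions chosen for illustration; they stand in for the nonlinear transformation of the reservoir and are not the chip's actual mapping.

```python
# Four points in an XOR arrangement: no single line separates the classes in 2D.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
labels = [0, 0, 1, 1]

def lift(x, y):
    """Add a nonlinear third coordinate while keeping the xy position."""
    return (x, y, (x - y) ** 2)

def classify(point3d, threshold=0.5):
    """Separate the classes with the horizontal plane z = threshold."""
    return 1 if point3d[2] > threshold else 0

lifted = [lift(x, y) for x, y in points]
predicted = [classify(p) for p in lifted]
```

After lifting, the plane $z = 0.5$ separates the two classes, so `predicted` matches `labels`, and each point keeps its original x and y coordinates.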

Architecture of analog DFR system
In our analog DFR system, the dynamic behavior can be controlled by changing the total delay time within the feedback loop. Along the feedback loop, the total delay time, $T$, is divided among $N$ intermediate neurons with an identical delay time constant, $\tau_{delay}$, such that
$$T = N \cdot \tau_{delay}.$$
In a conventional reservoir computing system, represented by the echo state network (ESN), the memory within the reservoir layer fades in time because of the way the neurons are sparsely connected; such fading memory limits the computational performance [20]. With the delay-feedback topology embedded, our analog DFR system not only reduces the implementation complexity but also overcomes the drawback of the fading memory limitation. Such functionality enables the knowledge transfer processing technique, allowing new incoming input data to carry information from its previous states, as depicted in Figure 11. The $N$th output, $S_N$, can be expressed in terms of the nonlinear transformation of the input signal, $f[\cdot]$; the current and previous input patterns, $I_p(x)$ and $I_{p-1}(x)$, respectively; and the finite gain, $A_v$, of the gain regulator within the reservoir layer.

Delay characteristic
Along the feedback loop, the delay time constant, $\tau_{delay}$, is controlled by the integration time over the membrane capacitor. In general, the mathematical model of a delay time constant is given by the product of a resistance and a capacitance. In the LIF delay neuron, the input impedance, $R_{in}$, is equivalent to $V_m / I_{ex}$; thus, the delay time constant can be simplified as
$$\tau_{delay} = R_{in} C_m = \frac{V_m}{I_{ex}} C_m.$$
The feedback loop is constructed from multiple LIF neurons, as illustrated in Figure 12. To enable the propagation of the spiking signal, the output spike train from the previous neuron is used as the clock signal that triggers the following neuron.

Dynamic behavior
In general, a phase portrait is used to visualize how the solutions of a delay system behave. In this experiment, measured phase portraits are plotted from two signals in the feedback loop, one of which is recorded with a time delay, as shown in Figure 13. As the total delay time within the feedback loop increases, the dynamic behavior of the system changes accordingly. As plotted in Figure 13b, the delayed signal repeatedly traces its initial path when the total delay time within the feedback loop stays around 1 μs, indicating periodic behavior. When the total delay time within the feedback loop increases to 1.4 μs, as shown in Figure 13d, the delayed signal diverges from its initial path but still tracks its equilibrium point, indicating edge-of-chaos behavior.
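The construction of such a phase portrait, plotting a signal against a delayed copy of itself, can be sketched in a few lines. The example uses a synthetic sine wave rather than measured data, and the helper name `delay_embedding`, the 100-sample period and the 25-sample delay are assumptions for illustration. A purely periodic signal retraces one closed curve, which is the signature described for the periodic regime.

```python
import numpy as np

def delay_embedding(signal, delay_samples):
    """Pair each sample x(t) with its delayed copy x(t - delay)."""
    x = np.asarray(signal, dtype=float)
    if not 0 < delay_samples < len(x):
        raise ValueError("delay_samples must be between 1 and len(signal) - 1")
    return x[delay_samples:], x[:-delay_samples]

# Synthetic periodic signal: 100 samples per period, delayed by a quarter period.
n = np.arange(400)
sine = np.sin(2 * np.pi * n / 100)
current, delayed = delay_embedding(sine, delay_samples=25)

# Every (current, delayed) pair lies on the unit circle, so the trajectory
# repeatedly traces the same closed path.
radius = np.sqrt(current ** 2 + delayed ** 2)
```

Plotting `current` against `delayed` (for example with a plotting library of choice) gives the portrait; a signal that left the closed curve while staying near it would correspond to the behavior reported for longer delays.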

Three-dimensional neuromorphic computing
To closely mimic the functionalities of mammalian brains, electronic neurons and synapses in neural network designs need to be constructed in a network configuration, which demands extremely high data communication bandwidth between neurons and a high degree of network connectivity [45,46]. However, these requirements are not achievable through the traditional von Neumann architecture or the two-dimensional (2D) IC design methodology. Recently, a novel 3D neuromorphic computing system that stacks neurons and synapses vertically has been proposed as a promising solution with lower power consumption, higher data transfer rate, higher network degree, and smaller design area [47,48]. Two 3D integration techniques can be used in the hardware implementation of neuromorphic computing: (1) through-silicon via (TSV) 3D-IC and (2) monolithic 3D-IC. A well-known 3D integration technique is to use TSVs as vertical connections to bond two wafers. In this structure, the large capacitance formed by the TSVs can be used to build the membrane capacitor required for the neuron firing behavior [49][50][51]. Unlike the TSV 3D-IC technique, which uses separate fabrication processes, the monolithic 3D-IC technique is capable of integrating multiple layers of devices on a single wafer; thus, it can provide a smaller design area with lower power consumption [52,53].

Memristor
In neural network designs, the electronic circuit model of synapses can be implemented by an emerging non-volatile device, namely the memristor, which is a class of the resistive random-access memory (RRAM). In general, the memristor device is constructed in a metal-insulator-metal (MIM) structure, as illustrated in Figure 14a. The resistance of a memristor device can be gradually changed between its low resistance state and high resistance state as the voltage across the memristor device changes.
Memristors are typically fabricated in a 2D crossbar structure [54], which can be further extended to 3D space, as illustrated in Figure 14c and d, respectively.
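The vector-matrix computation that a crossbar performs follows directly from Ohm's law and Kirchhoff's current law: each column current is the sum of the row voltages weighted by the conductances of the devices in that column. The sketch below models an ideal crossbar only; wire resistance, sneak paths and device nonlinearity are ignored, and the function name and the conductance and voltage values are hypothetical.

```python
import numpy as np

def crossbar_output_currents(voltages, conductances):
    """Ideal crossbar: I_j = sum_i V_i * G_ij (rows are inputs, columns outputs)."""
    v = np.asarray(voltages, dtype=float)        # volts, one per row
    g = np.asarray(conductances, dtype=float)    # siemens, shape (rows, columns)
    return v @ g                                 # amperes, one per column

# Assumed 2x2 example: conductances in siemens, input voltages in volts.
g = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
v = [0.1, 0.2]
currents = crossbar_output_currents(v, g)   # column currents: 0.7 uA and 1.0 uA
```

All column currents are produced at once by applying the row voltages, which is the parallelism that makes the crossbar attractive for synaptic weight arrays.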

Memristor-based 3D neuromorphic computing
In the field of ANN designs, a novel 3D neural network architecture that combines memristors and the monolithic 3D-IC technique has been proposed [55]. In this structure, neurons and the memristor-based synaptic array are stacked vertically, as demonstrated in Figure 15 [48]. As a non-volatile device, RRAM is capable of saving static power with a small implementation area while maintaining its weight value. With the monolithic 3D-IC technique, memristor-based 3D neuromorphic computing can potentially reduce the length of the critical path by 3× [56], increase the scalability [52], decrease the power consumption by 50%, and reduce the die area by 35% [57].

Chaotic time series prediction
To evaluate the precision of our analog DFR system, a chaotic time series prediction benchmark, the tenth-order nonlinear autoregressive moving average system (NARMA10), is carried out. The benchmark is governed by the following equation:
$$O(t+1) = 0.3\,O(t) + 0.05\,O(t)\sum_{i=0}^{9} O(t-i) + 1.5\,D(t-9)\,D(t) + 0.1, \qquad (14)$$
where $D(t)$ is the random input signal at time $t$, and $O(t)$ is the output signal. In this experiment, 10,000 sampling points were generated through Eq. (14) for the training and testing phases: 6000 samples were used for training, and the remaining samples were used for testing. In the training phase, the output weights were trained by minimizing the deviation between the target and predicted outputs. Both the training and testing errors were evaluated through the normalized root mean square error (NRMSE), which can be defined as
$$\mathrm{NRMSE} = \sqrt{\frac{1}{N}\,\frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sigma_{\hat{y}}^{2}}}, \qquad (15)$$
where $y_i$ is the predicted output, $\hat{y}_i$ is the target output, $N$ is the number of samples, and $\sigma_{\hat{y}}^{2}$ is the variance of the target output. Experimental results of the predicted output signals against the target outputs for our analog DFR computing system are plotted in Figure 16. From the experimental results, the training and testing errors are found to be 8.49% and 6.83%, respectively.
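The benchmark data and the error metric can be sketched in software as follows. The recurrence used is the commonly cited form of NARMA10, and drawing the random input uniformly from 0 to 0.5 is the usual convention for this benchmark; both are assumptions here, since the chapter states only that the input is random. The function names, the seed and the zero initial outputs are likewise illustrative.

```python
import numpy as np

def narma10(n_samples, seed=0):
    """Generate a NARMA10 input/output pair (commonly used form, assumed here)."""
    rng = np.random.default_rng(seed)
    d = rng.uniform(0.0, 0.5, size=n_samples)   # random input D(t), assumed range
    o = np.zeros(n_samples)                     # output O(t), zero initial history
    for t in range(9, n_samples - 1):
        o[t + 1] = (0.3 * o[t]
                    + 0.05 * o[t] * np.sum(o[t - 9:t + 1])   # sum of last 10 outputs
                    + 1.5 * d[t - 9] * d[t]
                    + 0.1)
    return d, o

def nrmse(predicted, target):
    """Root mean square error normalized by the variance of the target."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((predicted - target) ** 2) / np.var(target)))

# Split used in the experiment: 10,000 samples, 6000 for training, rest for testing.
d, o = narma10(10_000)
d_train, o_train = d[:6000], o[:6000]
d_test, o_test = d[6000:], o[6000:]
```

As a sanity check on the metric, a predictor that always outputs the mean of the target has an NRMSE of exactly 1, so useful predictions must score below 1.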

Video frame recognition
In this task, video frame recognition is chosen to examine the performance of our analog DFR system. In this experiment, 48 images, which comprise three different persons with various face angles, were drawn from the Head Pose Image dataset [58], as demonstrated in Figure 17a. Twenty images were used for training, while another 24 images were used for testing. In the training phase, the face angle changes from 0 to 75° horizontally. In the testing phase, the rotational angle of the face follows the training phase but with an additional 15° applied vertically.
As illustrated in Section 4.3, our fabricated analog DFR chip is capable of operating in the edge-of-chaos region as the delay changes. To demonstrate the importance of the delay, our model was evaluated with several delay time constants. As depicted in Figure 18, the recognition rate changes with the delay time. For instance, the recognition rate stays above 98% when the system operates in the edge-of-chaos regime ($T = 20$ ms) with 10% or less salt-and-pepper noise. As the noise level approaches 50%, the recognition rate still stays above 93%. However, if the dynamic behavior of the system deviates from the edge-of-chaos regime, the recognition rate drops significantly.
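For readers who wish to reproduce a comparable noise test in software, salt-and-pepper noise at a given level can be injected as sketched below. The chapter does not describe how its noise was generated, so everything here is an assumption: the image is taken to be a grayscale array scaled to the range 0 to 1, the noise level is interpreted as the fraction of corrupted pixels, and corrupted pixels become white or black with equal probability.

```python
import numpy as np

def add_salt_and_pepper(image, noise_level, seed=0):
    """Set roughly a fraction noise_level of pixels to 1.0 (salt) or 0.0 (pepper)."""
    rng = np.random.default_rng(seed)
    noisy = np.array(image, dtype=float, copy=True)
    corrupt = rng.random(noisy.shape) < noise_level   # which pixels to corrupt
    salt = rng.random(noisy.shape) < 0.5              # white or black, equally likely
    noisy[corrupt & salt] = 1.0
    noisy[corrupt & ~salt] = 0.0
    return noisy

# Assumed stand-in for a video frame: a uniform mid-gray 100x100 image.
frame = np.full((100, 100), 0.5)
noisy_frame = add_salt_and_pepper(frame, noise_level=0.10)
changed_fraction = float(np.mean(noisy_frame != frame))
```

On the mid-gray test frame, `changed_fraction` comes out close to the requested noise level, which gives a quick check before applying the function to real frames.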

Conclusions
In this chapter, the design aspects of our analog DFR system, together with the analog electronic circuit model of a biological neuron, are discussed. By mimicking how human beings process information, our analog DFR system adopts the spiking temporal information processing technique and a nonlinear activation function to project input patterns onto higher-dimensional spaces. From the measurement results, our analog DFR system demonstrates rich dynamic behaviors, closely mimicking biological neurons with the delay property. By naturally performing these neuron-like operations, our analog DFR system is capable of nonlinearly projecting input patterns onto higher-dimensional spaces for classification while operating in the edge-of-chaos region with merely 526 μW of power consumption. Experimental results on chaotic time series prediction and video frame recognition demonstrate high recognition accuracy even with noise, making our analog DFR system a candidate for low-power intelligence applications.
© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.