States of the control unit in OSORT module.
This chapter presents a novel hardware architecture for correlation-based spike detection and unsupervised clustering. The architecture is able to utilize the information extracted from the results of spike clustering for efficient spike detection. The architecture supports the fast computation for the normalized correlation and OSORT operations. The normalized correlation is used for template matching for accurate spike detection. The OSORT algorithm is adopted for unsupervised classification of the detected spikes. The mean of spikes of each cluster produced by the OSORT algorithm is used as the templates for subsequent detection. The architecture adopts postnormalization technique for reducing the area costs. Modified OSORT operations are also proposed for facilitating unsupervised clustering by hardware. The proposed architecture is implemented by field programmable gate array (FPGA) for performance evaluation. In addition to attaining high detection and classification accuracy for spike sorting, experimental results reveal that the proposed architecture is an efficient design providing low area cost and high throughput for real-time offline spike sorting applications.
- spike sorting
- spike detection
- spike clustering
- field programmable gate array
- brain machine interface
There is an increasing demand in data-acquisition systems for neurophysiology to record simultaneously from many channels over long time periods . These experiments accumulate large amounts of data, which would be processed by spike sorting systems  for analyzing the activities of neurons. Atypical spike sorting system usually involves complicated spike detection and classification operations for separating spikes from background noise and clustering the detected spikes. Large amount of spike-trains would impose heavy computation load for a software spike sorting system, resulting in long processing time.
One approach to reduce the computation time is to implement a spike sorting system by hardware. A number of spike sorting systems based on field programmable gate array (FPGA)  have been proposed. In , a spike classification architecture using probabilistic neural networks  is proposed. Although high speed-up over its software counterpart is observed, the system does not provide spike detection. In addition, the architecture requires the user to input the number of clusters. Therefore, it does not support a fully unsupervised hardware classification. Similarly, the architecture proposed in  also focuses on the classification. The generalized Hebbian algorithm (GHA)  is implemented by FPGA in this work for feature extraction. The features produced by the architecture are then clustered by the k-means algorithm. The architecture does not include the hardware implementation of the k-means algorithm. In addition, the number of clusters still needs to be prespecified in the k-means algorithm. The architecture in  is able to carry out the feature extraction and clustering in hardware. In the architecture, the feature extraction and clustering are based on GHA and fuzzy c-means  algorithm, respectively. Therefore, similar to the architecture in , fully unsupervised classification is difficult for the architecture in  because the number of clusters still needs to be known beforehand.
An alternative FPGA-based hardware architecture  is to adopt OSORT algorithm  for spike-sorting. The OSORT algorithm is able to perform clustering without the prior knowledge of the number of clusters. Unsupervised classification therefore can be carried out. The architecture also includes a spike detection circuit based on the nonlinear energy operators (NEOs) , which is an energy-based detection algorithm. Similar to other energy-based algorithms [13, 14], the NEO algorithm is simple and effective. However, although the energy-based algorithms can operate in conjunction with a number of automatic threshold algorithms [12–14], proper selection of threshold values for these algorithms may still be difficult when noise becomes large. Therefore, their performance may deteriorate rapidly as noise energy increases.
The template-based spike detection algorithm may be suited for the detection of spikes from the source sequences with high noise level. A matched filter [15, 16] is a typical technique based on templates. To detect the presence of the templates, the filter correlates known templates with the input spike trains. This operation can also be viewed as a likelihood ratio detection (LRT) . A drawback of the matched filter is that the templates are required. Although the adaptive generation of templates is possible , only a single template is produced for spike trains. However, because spike trains in general are formed from two or more neurons, a single template may not be sufficient for the detection of all the spikes generated by the neurons. The template-based algorithm presented by  has been found to be effective for the detection of noisy spikes. It adopts the OSORT algorithm for automatic template generation. Normalized correlation operations then carry out the spike detection using the templates produced by the OSORT algorithm. Nevertheless, in the algorithm, both the template generation and correlation computation have high computational complexities. The software implementation of the algorithm may not be suitable for fast analysis of spike trains.
The objective of this chapter is to present a novel FPGA-based hardware architecture for efficient spike detection and clustering. The architecture supports both the normalized correlation operations for spike detection and the OSORT operations for spike clustering. Moreover, the clustering results produced by the OSORT algorithm are used as the templates for spike detection. The architecture is the hardware implementation of the algorithm in . It then has the advantages of accurate spike detection, fully unsupervised spike clustering, and fast computation.
We have implemented a spike sorting system on a network-on-chip (NOC) platform for the evaluation of the proposed architecture. The platform is based on FPGA. It consists of a soft-core processor  and the proposed architecture. The proposed spike sorting architecture is used as a hardware accelerator of the soft-core processor for spike sorting. The simulator developed in  is adopted to generate extracellular recordings. In this paper, comparisons with the existing software and hardware implementations are made. Experimental results show that the proposed architecture attains a high speed-up over its software counterpart for spike sorting. It also has a lower area cost over existing hardware architectures. Experimental results reveal that the proposed architecture is an effective alternative for real-time spike sorting with accurate detection and clustering.
2. The algorithm
Before presenting the hardware architectures for spike sorting, we first review the spike detection and clustering algorithms adopted by this work. Detailed discussions of these algorithms can be found in .
2.1. Normalized correlation
Consider a spike train
The normalized output at
This is the inner product of segment
Our normalized correlation operations are based on
The normalized correlation operations shown in Eq. (2) may have high computational complexities. Although the template
2.2. OSORT algorithm
The OSORT algorithm is an unsupervised template-based clustering algorithm for spike sorting. It does not require feature extraction, and the number of clusters
Given a current detected spike
2.3. Normalized correlation and OSORT algorithm for spike sorting
By combining the normalized correlation with the OSORT algorithm, an effective spike sorting system for both spike detection and classification can be realized. The system is a feedback system capable of automatic template generation for spike detection and unsupervised clustering for the classification. The block diagram of the system is revealed in Figure 2.
Initially, the clusters and templates produced by the OSORT are not available. As a result, it may be difficult to carry out the normalized correlation for spike detection. One way to solve this problem is to use only the block energy for the detection of spikes. The detected spikes are then clustered by the OSORT algorithm for the generation of initial templates.
After the templates become available, the spike detection is then based on the normalized correlation
The hardware architecture for implementing the spike sorting system is depicted in Figure 3. The architecture contains two modules and one controller. The first module of the architecture, termed normalized correlator module, is responsible for the spike detection. It is capable of performing both the GLRT and normalized correlation operations. The second module, termed OSORT module, carries out the unsupervised OSORT spike sorting. The global controller coordinates the operations of these two modules. The architecture and detailed operations of the normalized correlator module and OSORT module are presented in the following two sections.
3. Architecture of the normalized correlator module
The block diagram of the normalized correlator module is revealed in Figure 4. The module supports the filtering, block energy computation, correlation computation, detection, and buffering. The filtering operation is the preprocessing step for the spike detection. It reduces the DC offset and noises. The objective of the block energy computation is to find ||
3.1. Filter unit and block energy computation unit
The filter unit is a hardware implementation of a band-pass Butterworth filter. The filter unit contains multipliers, shift registers, and adders. The architecture is a simple realization of the direct form I of IIR filters. To implement the block energy computation unit, we first note that a basic approach may involve
Consequently, the calculation of ||
3.2. Correlator unit
The goal of the unit is to carry out the normalized correlation
Figure 6 shows the architecture of the correlator unit for the case of two templates. The circuit consists of 2
3.3. Threshold unit
Although the operations of the unit can be easily accomplished by a simple comparison circuit, the detection accuracy may be further improved by taking the detection results of the neighboring blocks into consideration. Because the neighboring blocks are overlapping, they may be similar. As a result, the normalized correlation values of the neighboring blocks may also be similar. Therefore, it is likely that an occurrence of a single spike may result in the issues of multiple hits.
To solve this problem, when the normalized correlation value of a block is above the threshold, a hit is not immediately declared. The architecture will then examine the normalized correlation values of the previous blocks. A hit would actually be issued only if
3.4. Switch buffer
The goal of switch buffer is to store the detected spikes for subsequent clustering operations. As shown in Figure 8, there are two buffers (denoted as Buffer x and Buffer y) in the circuit. When one of the buffers stores the detected spikes, the other provides the detected spikes to the OSORT module for clustering operations. The switch controller in the circuit is responsible for the determination of the buffer to store the detected spikes. The flowchart of the operations of the switch controller is shown in Figure 9. From the flowchart, it can be observed that the controller assigns the detected spikes to a buffer in accordance with the availability of that buffer. A buffer is available when it has empty cells for storing new detected spikes, and is not currently providing spikes to the OSORT module.
4. The architecture of OSORT module
The OSORT module contains buffers, distance computation unit, mean updating unit, comparator, and controller, as shown in Figure 10. The centroid and the size of each cluster are stored in the buffers. The distance computation unit and mean updating unit are responsible for squared distance computation and the updating of centroid of the clusters, respectively. The control unit coordinates different components of the OSORT module for carrying out the unsupervised clustering operations.
There are three buffers in the OSORT module, which are denoted by Buffer 1, Buffer 2, and Buffer 3, respectively. Buffer 1 holds an input spike detected by the correlators. Buffers 2 and 3 contain the mean value and size of each cluster, respectively. Buffer 1 is a simple
4.2. Distance computation unit and mean updating unit
Because buffers in the memory unit can be accessed one sample at a time, the circuits in the distance computation unit and mean updating unit provide only sample-wise computations. This is beneficial for reducing the area costs. There are two cases when the distance computation unit needs to be activated. In the first case, a new spike is arrived. To find the cluster for the new spike, the squared distance computation is required. In the second case, it is desired to merge two clusters. The squared distance calculation is needed for finding the closest clusters. The distance computation unit takes the samples fetched from Buffer 1 and Buffer 2 as inputs. The unit finds the squared distance between the spikes stored in Buffer 1 and the mean value of a cluster selected in Buffer 2. Upon the completion of the squared distance computation, the comparison of the new squared distance with the current minimum distance stored in a register of the unit is carried out. If the new squared distance is smaller than the current minimum distance, then it becomes the new current minimum distance. The same squared distance computation and comparison operations will be repeated for until all the mean values in Buffer 2 are searched. This scheme is useful for finding the best matching mean value stored in Buffer 2 to the spike stored in Buffer 1.
The mean updating unit is activated after a new spike is assigned to a cluster, or after two clusters are merged. In these cases, the mean of the updated cluster needs to be computed. Note that the clusters to be updated are determined by the distance computation unit. The mean updating unit is only responsible for the computation of the new mean of the updated clusters. The circuit takes waveforms stored in Buffer 1 and Buffer 2, and the cluster size stored in Buffer 3 as inputs. The updated mean is the weighted sum of the waveforms obtained from Buffer 1 and Buffer 2, as shown in Figure 12. The weights are determined from the cluster sizes from Buffer 3. The updated results are then stored back to Buffer 2 and Buffer 3.
4.3. Control unit
The control unit activates components of the memory unit and cluster computation unit for the unsupervised clustering. The states of the control unit are summarized in Table 1. In addition to the tasks carried out by each state, the activated circuit components associated with each state are also included in the table. Figure 13 shows the flowchart of the OSORT algorithm in terms of the states defined in Table 1.
|State 1||Buffer 1|
Distance computation unit
|Fetch a new spike to Buffer 1|
|State 2||Buffers 1, 2||Find the best matching cluster to the spike in Buffer 1|
|State 3||Buffers 1, 2, 3||Creating a new cluster|
|State 4||Buffers 1, 2, 3|
Mean updating unit
|Update the mean and size of a cluster|
|State 5||Buffers 2, 3||Remove a cluster|
The flowchart in Figure 13 is consistent with that shown in Figure 1. Although the combinations of the states shown in Table 1 are able to implement the OSORT algorithm, additional modifications may still be desirable to facilitate the hardware implementation. One modification implemented in the controller is to handle the cases when the current number of clusters reaches the upper limit
5. Global controller and NOC
As depicted in Figure 3, the global controller in the proposed spike sorting system coordinates the operations of the normalized correlator module and OSORT module. The major goal of the global controller is to fetch detected spikes from the normalized correlator module and deliver them to the OSORT module. When a buffer in the switch buffer of the normalized correlator module becomes full, the global controller starts to fetch the detected spikes one at a time from the buffer to the OSORT module.
The fetching operations are repeated until all the spikes stored in the buffer are fetched. At this time, the buffer becomes available again for storing the new detected spikes, as shown in Figure 14. The proposed hardware spike sorting system is configured as a user component in a NOC system, which is designed by the QSYS platform. In addition to the proposed system, we see from Figure 15 that the NOC contains the NIOS II processor, a DMA controller, an on-chip RAM, and a hardware timer.
The raw spike trains are stored in the on-chip RAM. The DMA controller is responsible for delivering the spike trains to the proposed hardware system without the intervention of the NIOS II processor. The DMA controller is able to halt the delivery of spike trains automatically when both buffers in the switch buffer of the normalized correlator module are unavailable and/or full. The hardware timer is used to measure the computation speed of the proposed system. The NIOS II processor integrates different components in the NOC. It activates the DMA controllers for the delivery of spike trains. After that, the processor collects the spike sorting results from the OSORT modules of the proposed spike sorting system. The processor is also able to read the information provided by the hardware timer for the measurement of the computation speed of the proposed circuit.
6. Experimental results
This section presents some experimental results of the proposed architecture. The extracellular recordings for the experiments are based on the simulator developed in , where the ground truth about spiking activity can be accessed. Each spike has length 2.67 ms. The sampling rate for the spike recording is 24,000 samples/s. Therefore, there are 64 samples (i.e.,
The performance of the proposed architecture for spike detection is first evaluated. The performance evaluation is based on the true positive rate (TPR) and false alarm rate (FAR). The TPR of a detection algorithm is defined as the number of true spikes detected by the algorithm divided by the total number of true spikes. The FAR of a detection algorithm is the number of silent segments, which are falsely detected as spikes by the algorithm, divided by the total number of the segments detected by the algorithm. The TPR and FAR of various detection algorithms are included in Table 2. In the experiments, the spike trains are from two neurons. Therefore, there are two templates (i.e.,
Because the normalized correlation is effective for detecting real spikes and ignoring silent segments, we can observe from Table 2 that the proposed architecture has superior performance over the other algorithms. We use the example shown in Figure 16 to further demonstrate this fact. In the example, a noisy spike train with SNR = −3 dB is used for the spike detection. Figure 16 reveals the normalized correlation values ,
Next we evaluate the area complexities. Because adders, multipliers, dividers, comparators, and registers are the basic building blocks of the proposed architecture, the area complexities are separated into five types: the number of adders, multipliers, dividers, comparators, and registers. Tables 3 and 4 show the area complexities of the normalized correlator and OSORT modules, respectively. It can be observed from Table 3 that, in the normalized correlator module, the correlator unit and switch buffer have larger area complexities. The number of adders, multipliers, and registers grows with the block dimension
|Filter unit||Block energy|
|Correlator||Thresholding unit||Switch buffer||Subtotal|
The proposed architecture has been implemented by FPGA for performance measurement. The target FPGA device for the hardware implementation is Altera STRATIX IV EP4SGX230. The design platform for the experiments is the Altera QUARTUS II with QSYS. Table 5 shows the hardware utilization of the proposed architecture. There are four different FPGA hardware resources considered: adaptive look-up tables (ALUTs), dedicated logic registers, block memory bits, and DSP blocks. The DSP blocks are dedicated to the implementations of adders, multipliers, dividers, and comparators. The ALUTs, dedicated logic registers, and block memory bits can be used for the implementation of registers, as well as adders, multipliers, dividers, and comparators. It can be observed from Table 5 that the consumption of DSP blocks of normalized correlator is higher than that of the OSORT module. This is because the normalized correlator requires more number of arithmetic operators. There are 182,400 ALUTs, 182,400 dedicated logic registers, 1288 DSP blocks, and 14,625,792 block memory bits in the target FPGA device. It can be observed from Table 5 that only limited hardware resources are consumed by the proposed circuit.
|Normalized correlator module||13,966||29,733||0||532|
In addition to consuming low hardware resources, the proposed architecture is able to provide high throughput. Table 6 reveals the throughput of the proposed architecture for various clock rates and switch buffer size
7. Concluding remarks
The proposed architecture has been found to be effective for real-time spike sorting. It features high accuracy, low hardware resource consumption, and high throughput. The combination of the normalized correlation and OSORT algorithm is beneficial for accurate spike detection with high TPR and low FAR even for low SNR values. The postnormalization approach adopted by the normalized correlator circuit is also able to reduce the area costs for the normalization operations. In addition, the switch buffer in the correlation circuit can effectively coordinate the operations of spike detection and classification for achieving high throughput. Experimental results reveal that the proposed architecture achieves TPR = 82.71% and FAR = 1.06% for SNR = −3 dB. The ALUT consumption is only 21.57% for the FPGA device STRUTIX IV EP4SGX230. The throughput is 25.04 Msamples/sec for the clock rate 100 MHz. All these facts demonstrate the effectiveness of the proposed architecture.
The authors would like to acknowledge the financial support of the Ministry of Science andTechnology, Taiwan, under grant MOST 105-2221-E-003-011-MY2.