
"Reinforcement Learning", book edited by Cornelius Weber, Mark Elshaw and Norbert Michael Mayer, ISBN 978-3-902613-14-1, Published: January 1, 2008 under CC BY-NC-SA 3.0 license. © The Author(s).

Chapter 6

Interaction Between the Spatio-Temporal Learning Rule (Non Hebbian) and Hebbian in Single Cells: A Cellular Mechanism of Reinforcement Learning

By Minoru Tsukada
DOI: 10.5772/5277



1. Long-term potentiation (LTP), long-term depression (LTD) and the Hebbian-type learning rule

Hebb (1949) proposed that a synaptic connection is strengthened only if the pre- and postsynaptic elements are activated simultaneously (Fig. 1). Experimentally, long-term potentiation (LTP) and long-term depression (LTD) are generally considered to be the cellular basis of learning and memory. Bliss & Lømo (1973) first found that high-frequency electrical stimulation ("tetanus," 100-500 Hz) of the perforant path effectively produced LTP in the dentate area of the hippocampus of anesthetized rabbits. Recently, a series of experiments provided direct empirical evidence for Hebb's proposal (Markram et al., 1997; Magee & Johnston, 1997; Zhang et al., 1998; Feldman, 2000; Boettiger & Doupe, 2001; Sjostrom, 2001; Froemke & Dan, 2002). These reports showed that synaptic modification can be induced by repetitive pairing of EPSPs and back-propagating action potentials (BAPs). Presynaptic spiking within tens of milliseconds before postsynaptic spiking induced LTP, whereas the reverse order resulted in LTD. This spike-timing-dependent LTP/LTD has been confirmed in pyramidal cell pairs in hippocampal cultures, which showed an asymmetric profile of LTP and LTD as a function of the relative timing between EPSPs and BAPs (Debanne & Thompson, 1998; Bi & Poo, 1998).
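The asymmetric spike-timing window described above is often modeled with two exponentials, one for each order of pre- and postsynaptic spikes. The Python sketch below assumes that common form. The amplitudes and the 20 ms time constants are illustrative assumptions, not values from the cited experiments.

```python
import math

# Illustrative parameters: assumptions for this sketch, not fitted values.
A_PLUS, A_MINUS = 1.0, 1.0        # peak LTP and LTD amplitudes (arbitrary units)
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # decay constants of the two lobes (ms)


def stdp_window(dt_ms: float) -> float:
    """Weight change for a single pre/post pairing, with dt_ms = t_post - t_pre.

    dt_ms > 0 (pre before post) gives LTP; dt_ms < 0 (post before pre) gives
    LTD; both decay exponentially with |dt_ms|.
    """
    if dt_ms > 0:
        return A_PLUS * math.exp(-dt_ms / TAU_PLUS)
    if dt_ms < 0:
        return -A_MINUS * math.exp(dt_ms / TAU_MINUS)
    return 0.0  # exact coincidence: left at zero in this sketch (conventions vary)


for dt in (-40.0, -10.0, 10.0, 40.0):
    print(f"t_post - t_pre = {dt:+.0f} ms -> dw = {stdp_window(dt):+.3f}")
```

The sign of the change depends only on spike order, and its size falls off with the interval. This is the asymmetric profile reported for hippocampal cultures.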

Figure 1. The Hebbian Learning Rule.

The location dependence of synaptic modification along the dendritic tree was examined in the CA1 area of rat hippocampal slices (Tsukada et al., 2005). Pairs of electrical pulses were used to stimulate the Schaffer-commissural collaterals (SC) and the stratum oriens (SO). We then estimated the profiles of LTP and LTD at layer-specific locations from the proximal to the distal region of the stratum radiatum (SR).

Figure 2 shows the optical imaging results of LTP and LTD induced at a series of relative spike timings (τ). The widest and strongest LTP was observed when the stimuli were simultaneous (τ = 0 ms). LTP decreased rapidly in space and time as the absolute value of the relative timing increased to 15 ms on either side. Accordingly, LTP was induced when back-propagating spikes (Stim B) were applied within a window of 15 ms before or after the onset of Stim A, whereas LTD was induced on both sides at |τ| = 20 ms. Outside the 50 ms time window, synaptic modification disappeared. Together, these LTP and LTD results form a globally symmetric spike-timing window resembling a "Mexican hat" function.
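The symmetric window described above can be approximated by a difference of Gaussians, a standard "Mexican hat" form. The Python sketch below uses that form with hand-picked widths and amplitudes, which are not fitted to the optical-imaging data. They were chosen so that the curve is positive within about 15 ms, negative at 20 ms and close to zero by 50 ms.

```python
import math

# Hand-picked difference-of-Gaussians parameters (assumptions for illustration,
# not fitted to the optical-imaging data).
A_LTP, SIGMA_LTP = 1.0, 10.0  # narrow potentiating center (ms)
A_LTD, SIGMA_LTD = 0.4, 20.0  # broader depressing surround (ms)


def mexican_hat(tau_ms: float) -> float:
    """Symmetric weight change as a function of relative timing tau (ms)."""
    center = A_LTP * math.exp(-tau_ms**2 / (2 * SIGMA_LTP**2))
    surround = A_LTD * math.exp(-tau_ms**2 / (2 * SIGMA_LTD**2))
    return center - surround


# With these values the curve changes sign near |tau| = 15.6 ms, is negative
# at |tau| = 20 ms, and at 50 ms its magnitude is about 3% of the peak.
for tau in (0.0, 10.0, 15.0, 20.0, 50.0):
    print(f"|tau| = {tau:4.0f} ms -> dw = {mexican_hat(tau):+.3f}")
```

The function depends only on τ², so it is symmetric in the sign of τ. This matches the globally symmetric window observed in the proximal SR.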

Figure 2. Input-Output Timing Dependent LTP/LTD.

Figure 3. Layer-specific profiles of LTP and LTD.

We tested the location dependence of synaptic modification along the dendritic tree. A symmetric window was obtained in the proximal region of the SR, which receives projections from GABAergic interneurons. An asymmetric window was obtained in the distal region of the SR, which receives no such projection (Fig. 3).

The region-specific profiles of LTP/LTD depend on whether the SR layer receives inhibitory projections. Two factors underlie the symmetric profile: the afterhyperpolarization of spikes and the region-specific projection of inhibitory interneurons, which together organize lateral inhibition with respect to τ. The afterhyperpolarization of spikes alone organizes the asymmetric profile. The symmetric profile, with its sharp window for τ, works as a coincidence detector between the input of the CA3 Schaffer collaterals and the output of CA1 pyramidal cells. Its time window corresponds to the interval of a gamma cycle, under the assumption that sequence information is processed in a time scheme of several gamma cycles (local) within a theta cycle (global) (Lisman, 1989; Aihara et al., 2000). The asymmetric profile, on the other hand, has a broad time window after τ = 0 ms and can integrate sequence information (temporal summation) or code phase information. This difference between the distal and proximal regions of the SR was also seen in temporal-pattern-dependent LTP measured by optical imaging of the CA1 area in hippocampal slices (Aihara et al., 1997; Aihara et al., 2005). The sensitivity of LTP to the temporal pattern is even higher in the distal region than in the proximal region (Aihara et al., 2005). These results suggest that memory processing in CA1 pyramidal cells depends on where synapses are located along the dendrites.

2. Spatio-temporal learning rule (non-Hebbian)

The spatiotemporal learning rule (STLR), proposed by Tsukada et al. (1996) as a non-Hebbian rule, consists of two defining factors: (a) cooperative plasticity (input-input timing coincidence) without a postsynaptic spike and (b) temporal summation (Fig. 4).

Figure 4. Spatio-Temporal Learning Rule (STLR).

Figure 5. (a) Temporal Pattern Stimuli. (b) Temporal Sequence Pattern-Dependent LTP/LTD: Effects of Markov Chain Stimulus.

Figure 6. Input-Input Timing-Dependent LTP/LTD.

Neurophysiological evidence for temporal summation was obtained by applying temporal pattern stimuli (Markov chain stimuli) to the Schaffer collaterals of CA3 (Tsukada et al., 1994; 1996; Aihara et al., 2000) (Fig. 5a,b). Cooperative plasticity (Fig. 6) was measured with two stimulating electrodes on the Schaffer-commissural collaterals (SC). First, electrode A was stimulated at 2 s intervals, which alone produced no change in the synapse. Electrode B was then stimulated at relative timings from -50 ms to +50 ms with respect to electrode A. Simultaneous stimulation by both electrodes (relative timing τ = 0) produced very large plasticity. A 10 ms shift in τ produced almost no change, and a further 10 ms shift produced LTD instead of LTP. At a shift of 50 ms, the response returned to baseline. These data show a Mexican-hat-shaped time window for the relative timing τ, spanning τ = ±50 ms. The coincidence of spike timing in paired stimulation of the CA3 Schaffer collaterals played a crucial role in inducing associative LTP (Tsukada et al., 2007). However, it remained to be clarified whether this associative LTP is independent of back-propagating action potentials (BAPs).

Local dendritic depolarization at synaptic sites alone, such as that produced by theta-burst stimulation, can induce homosynaptic LTP in the conditioning pathway. This holds when associative pairing protocols are applied to the Schaffer collaterals, even in the absence of BAPs (i.e., in the presence of low-concentration TTX) (Golding et al., 2002). Robust homosynaptic LTP is observed in the conditioning pathway both in the absence and in the presence of low-concentration TTX (Fig. 7a; Tsukada et al., 2007). These results suggest that the present pairing protocol can induce homosynaptic LTP even when activation of dendritic Na+ channels is inhibited.

Figure 7. (a) and (b) Input-input timing-dependent LTP can be induced independently of back-propagating action potentials (BAPs).

In the same preparation, however, low-concentration TTX reduced the magnitude of heterosynaptically induced LTP associated with conditioning bursts, although a considerable amount of this LTP was preserved (Fig. 7b). Thus, both homosynaptic and heterosynaptic associative LTP can be induced when BAPs are inhibited, even in the absence of a cell spike. If two inputs arrive synchronously at dendritic synapses of a CA1 pyramidal cell, the synapses are strengthened and a functional connection is organized on the dendrite. If the two inputs are asynchronous, the connection is weakened. This scheme is illustrated in Fig. 8. The functional connection or disconnection depends on input-input timing-dependent LTP (cooperative plasticity). This is unlike the Hebbian learning rule, which requires coactivity of the pre- and postsynaptic cells. Nevertheless, the magnitude of LTP is also influenced by BAPs. From these experimental results, it can be concluded that the two learning rules, STLR and HEBB, coexist in single pyramidal neurons of the hippocampal CA1 area.
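The input-input (cooperative) plasticity described above can be sketched as a pair-based rule that uses only the spike times of the two presynaptic pathways. The Python example below is a hypothetical illustration, not the published STLR. It assumes a difference-of-Gaussians window shaped loosely after the stimulation results: large LTP at τ = 0, little change near 10 ms, LTD near 20 ms and negligible change by 50 ms. The window is summed over all A-B spike pairs, and no postsynaptic spike appears anywhere in the computation.

```python
import math
from typing import Sequence


def input_input_window(tau_ms: float) -> float:
    """Hypothetical input-input window: difference of Gaussians (widths 5 and 12 ms)."""
    return math.exp(-tau_ms**2 / (2 * 5.0**2)) - 0.2 * math.exp(-tau_ms**2 / (2 * 12.0**2))


def cooperative_change(spikes_a: Sequence[float], spikes_b: Sequence[float],
                       max_lag_ms: float = 50.0) -> float:
    """Sum the window over every A-B spike pair whose lag is within max_lag_ms.

    Only presynaptic spike times are used; there is no postsynaptic term,
    which is the contrast with a Hebbian rule.
    """
    total = 0.0
    for ta in spikes_a:
        for tb in spikes_b:
            lag = tb - ta
            if abs(lag) <= max_lag_ms:
                total += input_input_window(lag)
    return total


pathway_a = [0.0, 2000.0, 4000.0]  # electrode A stimulated at 2 s intervals
synchronous = cooperative_change(pathway_a, pathway_a)                       # B coincides with A
asynchronous = cooperative_change(pathway_a, [t + 20.0 for t in pathway_a])  # B lags A by 20 ms
print(f"synchronous: {synchronous:+.3f}, asynchronous (20 ms lag): {asynchronous:+.3f}")
```

Synchronous pairing yields a net positive change (potentiation), and a 20 ms lag yields a net negative change (depression). This is the connection/disconnection scheme of Fig. 8, reached without reference to somatic firing.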

STLR (non-Hebbian) incorporates two dynamic processes: a fast one (10 to 30 ms) and a slow one (150 to 250 ms). The fast process works as a time window that detects spatial coincidence among the various inputs projected onto the weight space of the hippocampal CA1 dendrites. The slow process works as a temporal integrator of a sequence of events. A previous study fitted model parameters to physiological data on the time scale of LTP (Aihara et al., 2000). It identified the decay constant of the fast process as 17 ms, which matches the period of the hippocampal gamma oscillation. The decay constant of the slow process is 169 ms, which corresponds to the theta rhythm. This suggests that cell assemblies are synchronized at two time scales in the hippocampal-cortical memory system, which is closely related to the memory formation of spatio-temporal context.
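A short calculation shows how differently the two time constants behave. The Python sketch below assumes that each input leaves a trace decaying as exp(-t/τ) and evaluates that trace for τ = 17 ms and τ = 169 ms. It also converts each τ into the frequency whose period equals τ, which is the correspondence the text draws. The helper names are hypothetical.

```python
import math

# Decay constants of the two STLR processes (Aihara et al., 2000).
TAU_FAST_MS = 17.0
TAU_SLOW_MS = 169.0


def trace(t_ms: float, tau_ms: float) -> float:
    """Fraction of an input's effect remaining t_ms after it arrives,
    assuming a simple exponential decay exp(-t / tau)."""
    return math.exp(-t_ms / tau_ms)


def equivalent_frequency_hz(tau_ms: float) -> float:
    """Frequency whose period equals tau."""
    return 1000.0 / tau_ms


for name, tau in (("fast", TAU_FAST_MS), ("slow", TAU_SLOW_MS)):
    print(f"{name}: tau = {tau:.0f} ms -> {equivalent_frequency_hz(tau):.1f} Hz")

# A second input 20 ms later still sees ~31% of the fast trace and ~89% of the
# slow one; 200 ms later the fast trace is gone (<1e-5) while ~31% of the slow
# trace remains, so only the slow process can link events across a theta cycle.
for dt in (20.0, 200.0):
    print(f"dt = {dt:.0f} ms: fast {trace(dt, TAU_FAST_MS):.3g}, slow {trace(dt, TAU_SLOW_MS):.3g}")
```

The 17 ms constant corresponds to about 58.8 Hz, within the gamma band, and the 169 ms constant to about 5.9 Hz, within the theta band.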

Figure 8. A schematic presentation of synaptic potentiation or depression by the synchronous or asynchronous inputs.

3. The functional differences between STLR and HEBB

The two rules were applied to a single-layer feed-forward network with random connections (Fig. 9a). Their abilities to separate spatiotemporal patterns were compared with those of other rules, including the Hebbian learning rule and its extensions (Tsukada & Pan, 2005). Each element of an input pattern is connected to each neuron through a separate weight $w_{ij}$ ($i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, N$). The potential of each neuron depends both on a weighted sum of simultaneously presented inputs (spatial summation) and on inputs that arrived in the recent past (temporal summation).

These functions are expressed by the following equations.

Spatial summation:

$$p_i(t_n) = \sum_{j=1}^{N} w_{ij}(t_n)\, x_j(t_n) \tag{1}$$

Temporal summation:

$$y_i(t_n) = \sum_{m=0}^{n} p_i(t_m)\, e^{-\lambda_1 (t_n - t_m)} \tag{2}$$

Figure 9. (a) All-Connected One-Layer Neural Network. (b) The Spatio-Temporal Pattern of Inputs.

The output of the neuron is:

$$r_i(t_n) = f\left(y_i(t_n) - \theta_1\right) \tag{3}$$

where $x_1, x_2, \ldots, x_N$ are the inputs to the neurons, $x_j(t_n)$ is the $j$-th input at time $t_n$, and $w_{ij}(t_n)$ is the synaptic weight from input $j$ to neuron $i$ at time $t_n$. $y_i(t_n)$ is the potential of neuron $i$ at time $t_n$, $r_i(t_n)$ is its output, $\lambda_1$ is the decay constant of temporal summation, and $\theta_1$ is the threshold. The output function of the neurons is defined as:

$$f(u) = \begin{cases} 1 & u > 0 \\ 0 & u \le 0 \end{cases} \tag{4}$$

Each spatiotemporal pattern used in this simulation consists of five spatial frames, A1, A2, A3, A4 and A5 (Fig. 9b).

Each frame consists of N = 120 elements, each randomly set to "1" or "0". The total number of "1"s is kept constant across spatial patterns: in this simulation, half of the elements in each frame are "1" and the other half are "0". Unless otherwise specified, the Hamming distance (HD) between any two spatial patterns is 8 bits; in some cases it is 2 or 24 bits, as noted. All permutations of four spatial patterns were taken, giving 24 spatiotemporal patterns as the training set, and the last frame of every spatiotemporal pattern is the same (A5). During learning, each neural network learned the 24 spatiotemporal patterns of the training set from the same initial conditions. After learning, a test pattern identical to a learned pattern was applied to the networks to obtain an output pattern. For each learning rule, the neuronal threshold θ1 was set so that about half of the elements in the output pattern were "1". The HDs between output patterns were then compared across learning rules. The averaged HD, commonly used to compare the ability to discriminate spatiotemporal patterns, is defined as:

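$$\text{averaged HD} = \frac{\sum \left(\text{number of pairs} \times \text{HD of these pairs}\right)}{\sum \text{number of pairs}}$$

where both sums run over all HD values in the histogram of output-pattern pairs.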
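The pattern set and the averaged HD can be reproduced in outline. The Python sketch below is a hypothetical construction, not the chapter's procedure. It guarantees 60 "1"s per frame and a pairwise HD of exactly 8 by flipping disjoint sets of positions in one random base frame. As a result, it only supports HDs that are multiples of 4, which excludes the 2-bit case. It then builds the 24 training sequences and computes the averaged HD from a histogram of pairwise HDs. Here the measure is applied to the input frames as a check; in the chapter it is applied to the networks' output patterns.

```python
import itertools
import random
from collections import Counter
from typing import List, Sequence

N = 120  # elements per spatial frame


def make_frames(n_frames: int = 5, hd: int = 8, seed: int = 0) -> List[List[int]]:
    """Hypothetical construction: frames with N/2 ones and pairwise HD `hd`.

    Start from one random base frame; for each frame, turn hd/4 ones into
    zeros and hd/4 zeros into ones at positions no other frame uses.
    """
    assert hd % 4 == 0, "this construction only yields multiples of 4"
    k = hd // 4
    assert n_frames * k <= N // 2, "not enough positions for disjoint flips"
    rng = random.Random(seed)
    base = [1] * (N // 2) + [0] * (N // 2)
    rng.shuffle(base)
    ones = [i for i, v in enumerate(base) if v == 1]
    zeros = [i for i, v in enumerate(base) if v == 0]
    rng.shuffle(ones)
    rng.shuffle(zeros)
    frames = []
    for fr in range(n_frames):
        frame = base[:]
        for i in ones[fr * k:(fr + 1) * k]:
            frame[i] = 0
        for i in zeros[fr * k:(fr + 1) * k]:
            frame[i] = 1
        frames.append(frame)
    return frames


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def averaged_hd(patterns: Sequence[Sequence[int]]) -> float:
    """Sum(count * HD) / Sum(count) over the histogram of pairwise HDs."""
    hist = Counter(hamming(a, b) for a, b in itertools.combinations(patterns, 2))
    return sum(count * d for d, count in hist.items()) / sum(hist.values())


frames = make_frames()
A1, A2, A3, A4, A5 = frames
# Training set: every ordering of A1..A4, each followed by the shared last frame A5.
training_set = [list(p) + [A5] for p in itertools.permutations([A1, A2, A3, A4])]
print(len(training_set), averaged_hd(frames))
```

Because the histogram weights each HD by its pair count, this averaged HD equals the plain mean HD over all pairs.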

Three learning rules were used to train single-layer neural network models on each of the 24 spatiotemporal input patterns, with every network starting from the same initial condition. The separation of the output patterns produced by the trained networks was analyzed by their Hamming distances (Fig. 10a). HEBB produced the same output pattern, with a Hamming distance of zero, for all of the different spatiotemporal input patterns (Fig. 10a). This shows that the Hebbian learning rule cannot discriminate these spatiotemporal input patterns. The covariant Hebbian rule showed a slight improvement in pattern separation (Fig. 10a). The spatiotemporal learning rule was the most efficient at discriminating spatiotemporal pattern sequences (Fig. 10a). The novel features of this rule are the induction of cooperative plasticity without a postsynaptic spike and its dependence on the time history of the input sequence. Under the Hebbian rule, connections strengthen only if the pre- and postsynaptic elements are activated simultaneously. The rule therefore tends to map all spatiotemporal input patterns with identical firing rates onto one output pattern. HEBB thus has a natural tendency to draw similar firing patterns toward a single representative one; in short, it performs "pattern completion." In contrast, the spatiotemporal rule produces a different output pattern for each input pattern. The spatiotemporal learning rule therefore excels at pattern separation, whereas the Hebbian rule excels at pattern completion. Finally, the network trained by the spatiotemporal learning rule produced the widest bimodal distribution of Hamming distances (Fig. 10b), confirming that it is the most efficient at pattern separation.

Figure 10. (a) Output Pattern Distribution. (b) The Effect of Input Timing-Coincidence on Output Distribution.

The two factors responsible for this high efficiency in pattern separation are spatial coincidence and temporal summation. A network trained by the learning rule without the spatial coincidence factor produced a unimodal distribution. We can therefore conclude that the long-range mode of the bimodal HD histogram (Fig. 10b) is generated by the spatial coincidence factor, while the short-range mode is generated by spatiotemporal summation. Thus, the pattern-separation ability of the network can be improved by introducing two factors, spatiotemporal summation and spatial coincidence, of which the latter is more important.

4. Interaction of both rules in a dendrite-soma system

Figure 11. (a) The Change in Synaptic Weight according to the Hebbian Learning Rule. (b) The Change in Synaptic Weight according to the Spatio-temporal Learning Rule. (c) The Function of Local (Dendrite)-Global (Soma) Interaction and the Role of Back-Propagating Spikes (BAPs).

Extending these simulation results suggests that the same phenomenon occurs in the dendrite-soma system of a single pyramidal cell in the hippocampal CA1 area, which has many independent local dendrites. This system includes spine structures, NMDA receptors, and sodium and calcium channels. The pyramidal cell integrates all of these local dendritic functions. The spatiotemporal learning rule and the Hebbian rule coexist in single pyramidal neurons of the hippocampal CA1 area (Tsukada et al., 2007). The Hebbian rule leads to pattern completion, and the spatiotemporal learning rule leads to pattern separation.

Schematic illustrations are shown in Fig. 11a-c. HEBB leads to pattern completion (Fig. 11a). In contrast, STLR leads to pattern separation (Fig. 11b). In the spatiotemporal learning rule, synaptic weight changes are determined bottom-up, by the level of synchrony among input neurons and its temporal summation. In the Hebbian rule, by contrast, the soma fires by integrating local dendritic potentials or in response to top-down information such as environmental sensitivity, awareness and consciousness. The coexistence of the spatiotemporal learning rule (local information) and the Hebbian rule (global information) at the neuronal level may support a dynamic process that repeats until the internal model fits the external environment (Fig. 11c). This dendrite-soma interaction in pyramidal neurons of the hippocampal CA1 area can play an important role in forming the context of policy, reward and value in reinforcement learning.

5. Mechanisms of reinforcement learning in single cells

The role of soma spiking in relation to top-down information raises a number of interesting computational predictions. Hippocampal theta, which is driven by the medial septum, is one candidate source of top-down information (Buzsaki et al., 1983). Theta-frequency stimulation of adult rat hippocampal synapses can induce LTP (Thomas et al., 1998). Another candidate is extrinsic modulation by acetylcholine, serotonin, norepinephrine and dopamine. These broadly diffusing transmitters can alter neuronal throughput and BAPs (so-called "meta-plasticity").

References

1 - T. Aihara, M. Tsukada, M. C. Crair, S. Shinomoto, 1997. Stimulus-dependent induction of long-term potentiation in CA1 area of the hippocampus: experiment and model. Hippocampus 7, 416-426.
2 - T. Aihara, M. Tsukada, H. Matsuda, 2000. Two dynamic processes for the induction of long-term potentiation in hippocampal CA1 neurons. Biol. Cybern. 82, 189-195.
3 - T. Aihara, Y. Kobayashi, M. Tsukada, 2005. Spatiotemporal visualization of long-term potentiation and depression in the hippocampal CA1 area. Hippocampus 15, 68-78.
4 - G. Bi, M. Poo, 1998. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci. 18, 10464-10472.
5 - T. P. Bliss, T. Lømo, 1973. Long-lasting potentiation of synaptic transmission in the dentate area of the anesthetized rabbit following stimulation of the perforant path. J. Physiol. 232, 331-356.
6 - C. A. Boettiger, A. J. Doupe, 2001. Developmentally restricted synaptic plasticity in a songbird nucleus required for song learning. Neuron 31, 809-818.
7 - G. Buzsaki, L. Leung, C. H. Vanderwolf, 1983. Cellular bases of hippocampal EEG in the behaving rat. Brain Res. Rev. 6, 169-171.
8 - D. Debanne, S. M. Thompson, 1998. Associative long-term depression in the hippocampus in vitro. Hippocampus 6, 9-16.
9 - D. E. Feldman, 2000. Timing-based LTP and LTD at vertical inputs to layer II/III pyramidal cells in rat barrel cortex. Neuron 27, 45-56.
10 - R. C. Froemke, Y. Dan, 2002. Spike-timing-dependent synaptic modification induced by natural spike trains. Nature 416, 433-438.
11 - N. L. Golding, N. P. Staff, N. Spruston, 2002. Dendritic spikes as a mechanism for cooperative long-term potentiation. Nature 418 (6895), 326-331.
12 - D. O. Hebb, 1949. The Organization of Behavior. John Wiley, New York.
13 - J. Lisman, 1989. A mechanism for Hebb and the anti-Hebb processes underlying learning and memory. Proc. Natl. Acad. Sci. USA 86, 9574-9578.
14 - J. C. Magee, D. Johnston, 1997. A synaptically controlled, associative signal for Hebbian plasticity in hippocampal neurons. Science 275 (5297), 209-213.
15 - H. Markram, J. Lubke, M. Frotscher, B. Sakmann, 1997. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275, 213-215.
16 - P. J. Sjostrom, 2001. Rate, timing, and cooperativity jointly determine cortical synaptic plasticity. Neuron 32, 1149-1164.
17 - M. J. Thomas, A. M. Watabe, T. D. Moody, M. Makhinson, T. J. O'Dell, 1998. Postsynaptic complex spike bursting enables the induction of LTP by theta frequency synaptic stimulation. J. Neurosci. 18 (18), 7118-7126.
18 - M. Tsukada, T. Aihara, M. Mizuro, H. Kato, K. Ito, 1994. Temporal pattern sensitivity of long-term potentiation in hippocampal CA1 neurons. Biol. Cybern. 70, 495-503.
19 - M. Tsukada, T. Aihara, H. Saito, H. Kato, 1996. Hippocampal LTP depends on spatial and temporal correlation of inputs. Neural Networks 9, 1357-1365.
20 - M. Tsukada, X. Pan, 2005. The spatiotemporal learning rule and its efficiency in separating spatiotemporal patterns. Biol. Cybern. 92, 139-146.
21 - M. Tsukada, T. Aihara, Y. Kobayashi, H. Shimazaki, 2005. Spatial analysis of spike-timing-dependent LTP and LTD in the CA1 area of hippocampal slices using optical imaging. Hippocampus 15, 104-109.
22 - M. Tsukada, Y. Yamazaki, H. Kojima, 2007. Interaction between the spatio-temporal learning rule (STLR) and Hebb type (HEBB) in single pyramidal cells in the hippocampal CA1 area. Cognitive Neurodynamics 1, 157-167.
23 - L. I. Zhang, H. W. Tao, C. E. Holt, W. A. Harris, M. Poo, 1998. A critical window for cooperation and competition among developing retinotectal synapses. Nature 395, 37-44.