Spatio-temporal credit assignment in population learning

Friedrich, Johannes; Urbanczik, Robert; Senn, Walter (2010). Spatio-temporal credit assignment in population learning. In: 7th Forum of European Neuroscience. Amsterdam, Netherlands. July 3-7, 2010.

Official URL: http://fens2010.neurosciences.asso.fr/abstracts/R4...

Learning by reinforcement is important in shaping animal behavior, but behavioral decision making is likely to involve the integration of many synaptic events in space and time. Using a single reinforcement signal to modulate synaptic plasticity therefore raises a twofold problem: different synapses will have contributed differently to the behavioral decision and, even for one and the same synapse, releases at different times may have had different effects.
Here we present a plasticity rule which solves this spatio-temporal credit assignment problem in a population of spiking neurons. The learning rule is spike-time dependent and maximizes the expected reward by following its stochastic gradient. Synaptic plasticity is modulated not only by the reward but also by a population feedback signal. While this additional signal solves the spatial component of the problem, the temporal one is solved by means of synaptic eligibility traces. In contrast to temporal-difference-based approaches to reinforcement learning, our rule is explicit with regard to the assumed biophysical mechanisms: neurotransmitter concentrations determine plasticity, and learning occurs fully online.
Further, it works even if the task to be learned is non-Markovian, i.e. when reinforcement is not determined by the current state of the system but may also depend on past events.
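The ingredients named in this paragraph (a global reward, a population feedback signal, and per-synapse eligibility traces) can be illustrated with a generic sketch. The abstract does not state the functional form of the rule, so everything here is an assumption made for illustration: the multiplicative combination of the three factors, the exponential trace decay, and the names update_eligibility, weight_change, post_factor, tau and lr. It is not the authors' learning rule.

```python
import math

def update_eligibility(trace, pre, post_factor, dt=1.0, tau=50.0):
    """Decay every per-synapse trace, then add the product of a
    postsynaptic term and the presynaptic activity (assumed form).

    trace: list of rows, one row per neuron, one entry per synapse.
    pre: presynaptic activity, one value per synapse.
    post_factor: one value per neuron.
    """
    decay = math.exp(-dt / tau)
    return [
        [decay * e + post_factor[i] * pre[j] for j, e in enumerate(row)]
        for i, row in enumerate(trace)
    ]

def weight_change(trace, reward, feedback, lr=0.01):
    """Assumed three-factor update: global reward times per-neuron
    feedback times per-synapse eligibility trace."""
    return [
        [lr * reward * feedback[i] * e for e in row]
        for i, row in enumerate(trace)
    ]

# Two neurons with three synapses each (hypothetical toy values).
trace = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
pre = [1.0, 0.0, 1.0]
post_factor = [0.5, -0.5]
trace = update_eligibility(trace, pre, post_factor)
dw = weight_change(trace, reward=1.0, feedback=[1.0, -1.0], lr=0.1)
```

In this sketch the trace carries the temporal part of the credit (which past releases still count), while the per-neuron feedback carries the spatial part (which neurons' contributions agree with the population outcome). The update is computed only from quantities available at the current time step, in the spirit of online learning.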
The performance of the model is assessed by studying three non-Markovian tasks. In the first task the reward is delayed beyond the last action, with unrelated stimuli and actions appearing in between. The second involves an action sequence which is itself extended in time, and reward is only delivered at the last action, as is the case in any type of board game. The third is the inspection game that has been studied in neuroeconomics. It has only a mixed Nash equilibrium and exemplifies that the model also copes with stochastic reward delivery and the learning of mixed strategies.
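To make the notion of a mixed Nash equilibrium concrete, the following sketch computes it for a 2x2 game from the indifference conditions: each player randomizes so that the other player's two actions yield the same expected payoff. The payoff values below are hypothetical (the abstract gives none), as are the function name mixed_equilibrium_2x2 and the labelling of the players as employee and employer.

```python
def mixed_equilibrium_2x2(A, B):
    """Interior mixed equilibrium of a 2x2 game without pure equilibria.

    A[i][j]: payoff to the row player, B[i][j]: payoff to the column
    player, when row plays i and column plays j.
    Returns (p, q): probability that row plays action 0 and
    probability that column plays action 0.
    """
    # p makes the column player indifferent between its two actions.
    p = (B[1][1] - B[1][0]) / (B[0][0] - B[0][1] - B[1][0] + B[1][1])
    # q makes the row player indifferent between its two actions.
    q = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    return p, q

# Hypothetical inspection-game payoffs.
# Rows: employee works (0) or shirks (1).
# Columns: employer inspects (0) or does not inspect (1).
A = [[3, 3], [0, 4]]      # employee
B = [[-1, 1], [-2, -4]]   # employer
p_work, q_inspect = mixed_equilibrium_2x2(A, B)
```

With these assumed payoffs the employee works with probability 0.5 and the employer inspects with probability 0.25. No pair of pure actions is stable, so a learner has to settle on action probabilities rather than on a single action.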

Item Type:	Conference or Workshop Item (Poster)
Division/Institute:	04 Faculty of Medicine > Pre-clinic Human Medicine > Institute of Physiology
UniBE Contributor:	Friedrich, Johannes; Urbanczik, Robert; Senn, Walter
Subjects:	600 Technology > 610 Medicine & health
Language:	English
Submitter:	Factscience Import
Date Deposited:	04 Oct 2013 14:11
Last Modified:	05 Dec 2022 14:01
URI:	https://boris.unibe.ch/id/eprint/1914 (FactScience: 203995)
