Author:
Senn Walter,Pfister Jean-Pascal
Reference25 articles.
1. Baxter J, Bartlett P (2001) Infinite-horizon policy-gradient estimation. J Artif Intell Res 15:319–350
2. Daw ND, O’Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans. Nature 441:876–879
3. Dayan P, Niv Y (2008) Reinforcement learning: the good, the bad and the ugly. Curr Opin Neurobiol 18:185–196
4. Di Castro D, Volkinshtein S, Meir R (2009) Temporal difference based actor critic learning: convergence and neural implementation. In: Advances in neural information processing systems, vol 21. MIT Press, Cambridge, MA, pp 385–392
5. Fiete IR, Seung HS (2006) Gradient learning in spiking neural networks by dynamic perturbation of conductances. Phys Rev Lett 97:048104