What’s Next if Reward is Enough? Insights for AGI from Animal Reinforcement Learning
Affiliation:
1. University of Michigan, 500 S State St, Ann Arbor, MI 48109, USA
Abstract
There has been considerable recent interest in the "Reward is Enough" hypothesis: the idea that agents can develop general intelligence even with simple reward functions, provided the environment they operate in is sufficiently complex. While this is an interesting framework for approaching the AGI problem, it also raises new questions: what kind of RL algorithm should the agent use? What should the reward function look like? How can the agent quickly generalize its learning to new tasks? This paper looks to animal reinforcement learning, both individual and social, to address these questions and more. It evaluates existing computational models and neural substrates of Pavlovian conditioning, reward-based action selection, intrinsic motivation, attention-based task representations, social learning, and meta-learning in animals, and discusses how insights from these findings can influence the development of animal-level AGI within an RL framework.
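As a concrete illustration of the kind of computational model of Pavlovian conditioning the abstract refers to, the classic Rescorla-Wagner rule updates an associative strength in proportion to a prediction error. The sketch below is illustrative only and not taken from the paper; the parameter values are assumed for demonstration.

```python
# Minimal Rescorla-Wagner sketch of Pavlovian conditioning (illustrative;
# parameter values are assumed, not from the paper).
# V is the associative strength of a conditioned stimulus (CS).
alpha, beta = 0.3, 1.0   # CS salience and US learning-rate parameters
lam = 1.0                # asymptote of learning set by the unconditioned stimulus (US)

V = 0.0
history = []
for trial in range(20):          # repeated CS-US pairings
    V += alpha * beta * (lam - V)  # prediction-error (delta) update
    history.append(V)

# V rises toward lam, reproducing the negatively accelerated
# acquisition curves seen in conditioning experiments.
```

The same prediction-error term reappears, in temporal-difference form, in the reward-based action-selection models the paper surveys.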
Publisher
Walter de Gruyter GmbH