Combining imitation and deep reinforcement learning to human-level performance on a virtual foraging task

Published: 2023-09-15
ISSN: 1059-7123
Container-title: Adaptive Behavior
Language: en

Author:

Giammarino Vittorio¹, Dunne Matthew F²³⁴, Moore Kylie N²³⁴, Hasselmo Michael E⁴, Stern Chantal E²³, Paschalidis Ioannis Ch¹⁵⁶

Affiliation:

1. Division of Systems Engineering, Boston University, Boston, MA, USA

2. Cognitive Neuroimaging Center, Boston University, Boston, MA, USA

3. Graduate Program for Neuroscience, Boston University, Boston, MA, USA

4. Center for Systems Neuroscience, Boston University, Boston, MA, USA

5. Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA

6. Department of Biomedical Engineering, Boston University, Boston, MA, USA

Abstract

We develop a framework to learn bio-inspired foraging policies using human data. We conduct an experiment in which humans are virtually immersed in an open-field foraging environment and are trained to collect as much reward as possible. A Markov Decision Process (MDP) framework is introduced to model the human decision dynamics. Then, Imitation Learning (IL) based on maximum likelihood estimation is used to train Neural Networks (NN) that map observed states to human decisions. The results show that passive imitation substantially underperforms humans. We further refine the human-inspired policies via Reinforcement Learning (RL) using the on-policy Proximal Policy Optimization (PPO) algorithm, which shows better stability than other algorithms and can steadily improve the policies pre-trained with IL. We show that the combination of IL and RL matches human performance and that the artificial agents trained with our approach can quickly adapt to a shift in the reward distribution. Finally, we show that good performance and robustness to reward distribution shift strongly depend on combining allocentric information with an egocentric representation of the environment.
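The imitation step described in the abstract (fitting a policy to observed state-decision pairs by maximum likelihood) can be illustrated with a minimal sketch. This is not the authors' implementation: a linear softmax policy stands in for the neural network, a synthetic rule-based demonstrator stands in for the human data, and every name below (`demonstrator`, `fit_policy`, the four-action layout) is a hypothetical choice made for illustration only.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def action_probs(W, state):
    # W[a] is the weight vector of action a (linear softmax policy, an assumption)
    return softmax([sum(w * s for w, s in zip(row, state)) for row in W])

def mean_nll(W, states, actions):
    # Average negative log-likelihood of the demonstrated actions
    return -sum(math.log(action_probs(W, s)[a])
                for s, a in zip(states, actions)) / len(states)

def fit_policy(states, actions, n_actions, lr=0.5, steps=200):
    # Maximum likelihood estimation by plain gradient descent on the mean NLL
    dim, n = len(states[0]), len(states)
    W = [[0.0] * dim for _ in range(n_actions)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n_actions)]
        for s, a in zip(states, actions):
            p = action_probs(W, s)
            for k in range(n_actions):
                coeff = p[k] - (1.0 if k == a else 0.0)
                for j in range(dim):
                    grad[k][j] += coeff * s[j] / n
        for k in range(n_actions):
            for j in range(dim):
                W[k][j] -= lr * grad[k][j]
    return W

def demonstrator(state):
    # Hypothetical stand-in for a human: step along the larger offset to a reward.
    # Actions: 0 = up, 1 = down, 2 = left, 3 = right
    dx, dy = state
    if abs(dx) > abs(dy):
        return 3 if dx > 0 else 2
    return 0 if dy > 0 else 1

rng = random.Random(0)
states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(300)]
actions = [demonstrator(s) for s in states]

initial_nll = mean_nll([[0.0, 0.0] for _ in range(4)], states, actions)
W = fit_policy(states, actions, n_actions=4)
final_nll = mean_nll(W, states, actions)

def greedy(W, state):
    p = action_probs(W, state)
    return max(range(len(p)), key=lambda k: p[k])

accuracy = sum(greedy(W, s) == a for s, a in zip(states, actions)) / len(states)
print(f"NLL before: {initial_nll:.3f}, after: {final_nll:.3f}, accuracy: {accuracy:.2f}")
```

In the pipeline the abstract describes, a policy fitted this way would serve only as the starting point that is then refined with PPO; that reinforcement learning stage is not sketched here.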

Funder

NSF

Office of Naval Research Global

ONR

NIH

Publisher

SAGE Publications

Subject

Behavioral Neuroscience,Experimental and Cognitive Psychology

Link

http://journals.sagepub.com/doi/pdf/10.1177/10597123231201655

References: 51 articles.

1. Autonomous Helicopter Aerobatics through Apprenticeship Learning

2. Apprenticeship learning via inverse reinforcement learning

3. Egocentric boundary vector tuning of the retrosplenial cortex

4. Reinforcement Learning, Fast and Slow

Cited by 1 article.

1. Hierarchical control over foraging behavior by anterior cingulate cortex. Neuroscience & Biobehavioral Reviews, 2024-05.