Abstract
When faced with a novel situation, humans often spend substantial periods of time contemplating possible futures. For such planning to be rational, the benefits to behavior must compensate for the time spent thinking. Here we capture these features of human behavior by developing a neural network model where planning itself is controlled by prefrontal cortex. This model consists of a meta-reinforcement learning agent augmented with the ability to plan by sampling imagined action sequences from its own policy, which we call ‘rollouts’. The agent learns to plan when planning is beneficial, explaining empirical variability in human thinking times. Additionally, the patterns of policy rollouts employed by the artificial agent closely resemble patterns of rodent hippocampal replays recently recorded during spatial navigation. Our work provides a new theory of how the brain could implement planning through prefrontal-hippocampal interactions, where hippocampal replays are triggered by, and adaptively affect, prefrontal dynamics.
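The core mechanism described above can be illustrated with a minimal sketch: a recurrent policy whose action space includes an extra "plan" action, which, when chosen, samples an imagined action sequence ("rollout") from the agent's own policy in a copy of the environment and feeds it back as input on the next step, at the cost of a time step. This is not the authors' implementation; the environment (TinyMaze), class and parameter names, and the absence of a training loop are illustrative assumptions to show the planning-by-rollout loop only.

```python
import torch
import torch.nn as nn

class TinyMaze:
    """Toy 1-D maze standing in for a world model: states 0..n-1, reward at the last state."""
    def __init__(self, n=6, state=0):
        self.n, self.state = n, state

    def observe(self):
        obs = torch.zeros(1, self.n)
        obs[0, self.state] = 1.0
        return obs

    def step(self, action):  # 0 = left, 1 = right
        self.state = max(0, min(self.n - 1, self.state + (1 if action == 1 else -1)))
        return 1.0 if self.state == self.n - 1 else 0.0

    def copy(self):
        return TinyMaze(self.n, self.state)

class RolloutAgent(nn.Module):
    """GRU policy with one extra 'plan' action that triggers an imagined rollout
    sampled from the agent's own policy (a sketch of the mechanism in the abstract)."""
    def __init__(self, obs_dim, n_actions=2, hidden=32, rollout_len=4):
        super().__init__()
        self.n_actions, self.rollout_len = n_actions, rollout_len
        # input = current observation + feedback from the last imagined rollout
        self.rnn = nn.GRUCell(obs_dim + rollout_len, hidden)
        self.policy = nn.Linear(hidden, n_actions + 1)  # last logit = 'plan'
        self.value = nn.Linear(hidden, 1)

    def forward(self, obs, feedback, h):
        h = self.rnn(torch.cat([obs, feedback], dim=-1), h)
        return self.policy(h), self.value(h), h

    def imagine(self, env, h):
        """Roll out the current policy in a copied environment model."""
        sim, fb = env.copy(), torch.zeros(1, self.rollout_len)
        for t in range(self.rollout_len):
            logits, _, h = self(sim.observe(), fb, h)
            a = torch.distributions.Categorical(logits=logits[:, :self.n_actions]).sample()
            sim.step(a.item())
            fb[0, t] = float(a.item())
        return fb  # imagined action sequence, fed back as input on the next step

# One episode: planning consumes a time step, so in the full model the agent must
# learn (via meta-RL, omitted here) when a rollout is worth its temporal cost.
env, agent = TinyMaze(), RolloutAgent(obs_dim=6)
h, fb, steps = torch.zeros(1, 32), torch.zeros(1, 4), 0
while steps < 20:
    logits, _, h = agent(env.observe(), fb, h)
    a = torch.distributions.Categorical(logits=logits).sample().item()
    if a == agent.n_actions:          # 'plan': think instead of acting
        fb = agent.imagine(env, h)
    else:                             # act in the real environment
        r, fb = env.step(a), torch.zeros(1, 4)
        if r > 0:
            break
    steps += 1
```

In this sketch the rollout feedback is simply the imagined action sequence; the paper's account additionally ties such rollouts to hippocampal replay and lets them adaptively reshape prefrontal (recurrent) dynamics, which here is reduced to feeding the rollout back through the GRU input.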
Publisher
Cold Spring Harbor Laboratory
Cited by 7 articles.