Abstract
Animals can adapt their preferences for different types of reward according to physiological state, such as hunger or thirst. To describe this ability, we propose a simple extension of the temporal difference model that learns multiple values of each state according to different reward dimensions, such as food or water. By weighting these learned values according to current needs, behaviour can be flexibly adapted to present demands. Our model predicts that different dopamine neurons should be selective for different reward dimensions. We reanalysed data from primate dopamine neurons and observed that, in addition to subjective value, dopamine neurons encode a gradient of reward dimensions: some neurons respond most to food rewards, while others respond more to fluids. Moreover, our model reproduces the instant generalization to a new physiological state seen in dopamine responses and in behaviour. Our results demonstrate how a simple neural circuit can flexibly optimize behaviour according to an animal's needs.
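The core idea described above (one learned value per reward dimension, combined by need-dependent weights) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `td_update` and `combined_value`, the two-cue task, the learning rate, the discount factor, and the need weights are all assumptions chosen for illustration.

```python
# Minimal sketch of temporal difference learning with one value table per
# reward dimension. All names and parameters here are illustrative assumptions.

def td_update(values, state, reward_vec, next_state=None, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update separately in each reward dimension.

    values[d][s] is the learned value of state s in dimension d.
    reward_vec[d] is the reward received in dimension d.
    next_state=None marks a terminal transition.
    Returns the list of per-dimension prediction errors.
    """
    deltas = []
    for d, r in enumerate(reward_vec):
        next_v = 0.0 if next_state is None else values[d][next_state]
        delta = r + gamma * next_v - values[d][state]
        values[d][state] += alpha * delta
        deltas.append(delta)
    return deltas


def combined_value(values, needs, state):
    """Weight the per-dimension values of a state by the current needs."""
    return sum(w * v_d[state] for w, v_d in zip(needs, values))


# Hypothetical task: cue 0 is followed by food, cue 1 by water.
# Dimension 0 is food, dimension 1 is water.
rewards = {0: [1.0, 0.0], 1: [0.0, 1.0]}
values = [[0.0, 0.0], [0.0, 0.0]]  # values[dimension][cue]

for _ in range(200):
    for cue, reward_vec in rewards.items():
        td_update(values, cue, reward_vec)

# Assumed need weights (food, water) for two physiological states.
hungry = [1.0, 0.2]
thirsty = [0.2, 1.0]

# Preferences change with the weights alone; no further learning is needed.
pref_hungry = max((0, 1), key=lambda cue: combined_value(values, hungry, cue))
pref_thirsty = max((0, 1), key=lambda cue: combined_value(values, thirsty, cue))
```

In this sketch, switching from the `hungry` to the `thirsty` weights changes which cue has the higher combined value without any additional updates, which is one way to picture the instant generalization to a new physiological state. The per-dimension errors returned by `td_update` are a loose analogue of dimension-selective prediction errors, under the assumptions stated above.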
Publisher
Cold Spring Harbor Laboratory
Cited by
3 articles.