Affiliation:
1. Department of Psychology & Rutgers Center for Cognitive Sciences, Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8020
2. Department of Psychology, Utah State University, Logan, UT 84322-2810
Abstract
Reinforcement learning inspires much theorizing in neuroscience, cognitive science, machine learning, and AI. A central question concerns the conditions that produce the perception of a contingency between an action and reinforcement—the assignment-of-credit problem. Contemporary models of associative and reinforcement learning do not leverage the temporal metrics of experience (the measured intervals between events). Our information-theoretic approach formalizes contingency as time-scale invariant temporal mutual information. It predicts that learning may proceed rapidly even with extremely long action–reinforcer delays. We show that rats can learn an action after a single reinforcement, even with a 16-min delay between the action and reinforcement (15-fold longer than any delay previously shown to support such learning). By leveraging metric temporal information, our solution obviates the need for windows of associability, exponentially decaying eligibility traces, microstimuli, or distributions over Bayesian belief states. Its three equations have no free parameters; they predict one-shot learning without iterative simulation.
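The time-scale invariance claim can be illustrated with a minimal sketch. Here, the information an action conveys about reinforcement timing is taken as the log ratio of the contextual inter-reinforcement interval to the action–reinforcer delay; because only the ratio of the two intervals matters, rescaling all durations by a common factor leaves the quantity unchanged. (The function name and this specific ratio form are illustrative assumptions, not the paper's three equations.)

```python
import math

def temporal_information(context_interval: float, action_delay: float) -> float:
    """Illustrative bits-per-event measure of action-reinforcer contingency:
    log2 of the ratio of the contextual inter-reinforcement interval to the
    action-reinforcer delay. Time-scale invariant: multiplying both intervals
    by the same constant leaves the result unchanged."""
    return math.log2(context_interval / action_delay)

# A 16-min delay is informative when contextual reinforcement is rare:
# e.g., one reinforcement per 480 min of context vs. a 16-min delay.
bits = temporal_information(480.0, 16.0)

# Scale invariance: stretching all durations 100-fold changes nothing.
assert temporal_information(48000.0, 1600.0) == bits
```

On this sketch, long absolute delays do not preclude learning; what matters is how short the action–reinforcer delay is relative to the contextual interval, consistent with the prediction of rapid learning at a 16-min delay.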
Funder
HHS | National Institutes of Health
Publisher
Proceedings of the National Academy of Sciences
Cited by: 1 article.