Abstract
Multiarm bandit problems have been used to model the selection of competing scientific theories by boundedly rational agents. In this article, I define a variable-arm bandit problem, which allows the set of scientific theories to vary over time. I show that Roth-Erev reinforcement learning, which solves multiarm bandit problems in the limit, cannot solve this problem in a reasonable time. However, social learning via preferential attachment combined with individual reinforcement learning, which discounts the past, does.
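To make the learning rule named in the abstract concrete, here is a minimal, hypothetical Python sketch of Roth-Erev reinforcement learning with a forgetting (discount) parameter on a bandit whose arm set can grow over time. The class and parameter names (RothErevAgent, discount, initial_propensity), the multiplicative forgetting scheme, and the toy payoff rates are illustrative assumptions, not the paper's exact model; the social-learning component via preferential attachment is not sketched here.

```python
import random

class RothErevAgent:
    """Roth-Erev reinforcement learner with optional discounting (forgetting).

    Hypothetical sketch: names, initial-propensity choice, and the discounting
    scheme are illustrative assumptions, not the paper's specification.
    """

    def __init__(self, initial_propensity=1.0, discount=0.0):
        self.initial_propensity = initial_propensity
        self.discount = discount          # 0.0 recovers classic Roth-Erev
        self.propensities = {}            # arm id -> accumulated propensity

    def add_arm(self, arm):
        # New theories (arms) can appear over time in the variable-arm setting.
        self.propensities.setdefault(arm, self.initial_propensity)

    def choose(self):
        # Pick an arm with probability proportional to its propensity.
        arms = list(self.propensities)
        weights = [self.propensities[a] for a in arms]
        return random.choices(arms, weights=weights, k=1)[0]

    def update(self, arm, payoff):
        # Discount (forget) all past propensities, then reinforce the chosen arm.
        for a in self.propensities:
            self.propensities[a] *= (1.0 - self.discount)
        self.propensities[arm] += payoff


# Toy run: a new, better arm becomes available midway through learning.
if __name__ == "__main__":
    random.seed(0)
    payoff_rates = {"theory_A": 0.4}      # hypothetical success probabilities
    agent = RothErevAgent(discount=0.05)  # discounting lets the agent adapt
    agent.add_arm("theory_A")
    for t in range(2000):
        if t == 500:                      # a new theory enters the arm set
            payoff_rates["theory_B"] = 0.7
            agent.add_arm("theory_B")
        arm = agent.choose()
        payoff = 1.0 if random.random() < payoff_rates[arm] else 0.0
        agent.update(arm, payoff)
    print(agent.propensities)
```

With discount set to 0.0 the agent's propensities accumulate without bound and a late-arriving better arm is rarely explored; a positive discount keeps recent payoffs influential, which illustrates why the abstract pairs discounting with adaptation to a changing set of theories.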
Publisher
Cambridge University Press (CUP)
Subject
History and Philosophy of Science, Philosophy, History
Cited by 13 articles.
1. Where Does the Future of an Intellectual Structure of the EAQ Corpus Lie? A Response to Hallinger et al.'s Empirical Reflection;Educational Administration Quarterly;2023-10-03
2. Social network come strumento educativo;Persone, Energie, Futuro. Infinityhub: la guida interstellare per una nuova dimensione dell’energia;2023-07-05
3. Structure-sensitive testimonial norms;European Journal for Philosophy of Science;2021-07-29
4. Agent-Based Models of Dual-Use Research Restrictions;The British Journal for the Philosophy of Science;2021-06-01
5. Signaling in an Unknown World;Erkenntnis;2021-04-11