Affiliation:
1. Sauder School of Business, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada
Abstract
A New Method for Dynamic Learning and Doing
For a large class of learning-and-doing problems, two processes are intertwined in the analysis: a forward process that updates the decision maker’s belief or estimate of the unknown parameter, and a backward process that computes the expected future values. The mainstream literature focuses on the former process. In contrast, in “Dynamic Learning and Decision Making via Basis Weight Vectors,” Hao Zhang proposes a new method based on pure backward induction on the continuation values created by feasible continuation policies. When the unknown parameter is a continuous variable, the method represents each continuation-value function by a vector of weights placed on a set of basis functions. The weight vectors that are potentially useful for the optimal solution can be found backward in time exactly (for very small problems) or approximately (for larger problems). A simulation study demonstrates that an approximation algorithm based on this method outperforms some popular algorithms in the linear contextual bandit literature when the learning horizon is short.
Publisher
Institute for Operations Research and the Management Sciences (INFORMS)
Subject
Management Science and Operations Research, Computer Science Applications
Cited by
2 articles.