Affiliation:
1. Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027
Abstract
A fundamental yet notoriously difficult problem in operations management is the periodic inventory control problem under positive lead time and lost sales. In recent years, there has been growing interest in the setting where the demand distribution is not known a priori and must be learned from observations made during the decision-making process. In "Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management," Agrawal and Jia present a reinforcement learning algorithm that uses the observed outcomes of past decisions to implicitly learn the underlying dynamics and adaptively improve the decision-making strategy over time. They show that, compared with the best base-stock policy, their algorithm achieves a regret bound that is optimal in the time horizon and scales linearly with the lead time of the inventory ordering process. Furthermore, they demonstrate that their approach is not restricted to the inventory problem: it can be applied in an almost black-box manner to more general reinforcement learning problems with convex cost functions.
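For readers unfamiliar with the setting, the sketch below simulates the periodic-review, lost-sales inventory system with positive lead time that the abstract describes, evaluated under a fixed base-stock policy (the benchmark class against which the paper's regret is measured). This is an illustrative simulation of the standard system dynamics, not the authors' learning algorithm; the function name, demand distribution, and cost parameters are assumptions made for the example.

import random

def simulate_base_stock(S, lead_time, demand_sampler, horizon,
                        holding_cost, lost_sales_penalty, seed=0):
    # Periodic-review, lost-sales inventory system under a base-stock
    # (order-up-to-S) policy. Orders placed now arrive after `lead_time`
    # periods; demand exceeding on-hand stock is lost, not backlogged.
    rng = random.Random(seed)
    on_hand = S                    # stock physically on hand
    pipeline = [0] * lead_time     # outstanding orders; pipeline[0] arrives next
    total_cost = 0.0
    for _ in range(horizon):
        # 1. Receive the order placed lead_time periods ago.
        if lead_time > 0:
            on_hand += pipeline.pop(0)
        # 2. Order up to S based on inventory position (on hand + in transit).
        order = max(0, S - (on_hand + sum(pipeline)))
        if lead_time > 0:
            pipeline.append(order)
        else:
            on_hand += order       # zero lead time: order arrives immediately
        # 3. Demand is realized; unmet demand is lost.
        demand = demand_sampler(rng)
        lost = max(0, demand - on_hand)
        on_hand = max(0, on_hand - demand)
        # 4. Charge holding cost on leftovers and a penalty on lost sales.
        total_cost += holding_cost * on_hand + lost_sales_penalty * lost
    return total_cost / horizon

# Example: uniform demand on {0, ..., 10}, lead time of 2 periods.
avg_cost = simulate_base_stock(S=14, lead_time=2,
                               demand_sampler=lambda r: r.randint(0, 10),
                               horizon=100_000,
                               holding_cost=1.0, lost_sales_penalty=4.0)
print(f"average per-period cost: {avg_cost:.3f}")

A learning algorithm in this setting must choose the base-stock level using only such observed costs, without knowing the demand distribution; the paper's contribution, per the abstract, is doing so with regret that is optimal in the horizon and linear in the lead time.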
Publisher
Institute for Operations Research and the Management Sciences (INFORMS)
Subject
Management Science and Operations Research, Computer Science Applications
Cited by
10 articles.