Affiliation:
1. Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027
Abstract
A fundamental yet notoriously difficult problem in operations management is the periodic inventory control problem under positive lead time and lost sales. In recent years, there has been growing interest in the setting where the demand distribution is not known a priori and must be learned from observations made during the decision-making process. In "Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management," Agrawal and Jia present a reinforcement learning algorithm that uses the observed outcomes of past decisions to implicitly learn the underlying dynamics and adaptively improve the decision-making strategy over time. They show that, compared with the best base-stock policy, their algorithm achieves a regret bound that is optimal in the time horizon and scales linearly with the lead time of the inventory ordering process. Furthermore, they demonstrate that their approach is not restricted to the inventory problem: it can be applied in an almost black-box manner to more general reinforcement learning problems with convex cost functions.
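For readers unfamiliar with the setting, the sketch below simulates the periodic-review, lost-sales inventory system with positive lead time that the abstract describes, evaluated under a fixed base-stock policy (the benchmark class against which the paper's regret is measured). This is an illustrative simulation of the standard system dynamics, not the authors' learning algorithm; the function name, demand distribution, and cost parameters are assumptions made for the example.

import random

def simulate_base_stock(S, lead_time, demand_sampler, horizon,
                        holding_cost, lost_sales_penalty, seed=0):
    # Periodic-review, lost-sales inventory system under a base-stock
    # (order-up-to-S) policy. Orders placed now arrive after `lead_time`
    # periods; demand exceeding on-hand stock is lost, not backlogged.
    rng = random.Random(seed)
    on_hand = S                    # stock physically on hand
    pipeline = [0] * lead_time     # outstanding orders; pipeline[0] arrives next
    total_cost = 0.0
    for _ in range(horizon):
        # 1. Receive the order placed lead_time periods ago.
        if lead_time > 0:
            on_hand += pipeline.pop(0)
        # 2. Order up to S based on inventory position (on hand + in transit).
        order = max(0, S - (on_hand + sum(pipeline)))
        if lead_time > 0:
            pipeline.append(order)
        else:
            on_hand += order       # zero lead time: order arrives immediately
        # 3. Demand is realized; unmet demand is lost.
        demand = demand_sampler(rng)
        lost = max(0, demand - on_hand)
        on_hand = max(0, on_hand - demand)
        # 4. Charge holding cost on leftovers and a penalty on lost sales.
        total_cost += holding_cost * on_hand + lost_sales_penalty * lost
    return total_cost / horizon

# Example: uniform demand on {0, ..., 10}, lead time of 2 periods.
avg_cost = simulate_base_stock(S=14, lead_time=2,
                               demand_sampler=lambda r: r.randint(0, 10),
                               horizon=100_000,
                               holding_cost=1.0, lost_sales_penalty=4.0)
print(f"average per-period cost: {avg_cost:.3f}")

A learning algorithm in this setting must choose the base-stock level using only such observed costs, without knowing the demand distribution; the paper's contribution, per the abstract, is doing so with regret that is optimal in the horizon and linear in the lead time.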
Publisher
Institute for Operations Research and the Management Sciences (INFORMS)
Subject
Management Science and Operations Research, Computer Science Applications
Cited by
10 articles.