Variance Regularization in Sequential Bayesian Optimization

Published:2020-08 Issue:3 Volume:45 Page:966-992
ISSN:0364-765X
Container-title:Mathematics of Operations Research
language:en
Short-container-title:Mathematics of OR

Author:

Kim Michael Jong¹

Affiliation:

1. Sauder School of Business, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada

Abstract

Sequential Bayesian optimization constitutes an important and broad class of problems where model parameters are not known a priori but need to be learned over time using Bayesian updating. It is known that the solution to these problems can in principle be obtained by solving the Bayesian dynamic programming (BDP) equation. Although the BDP equation can be solved in certain special cases (for example, when posteriors have low-dimensional representations), solving this equation in general is computationally intractable and remains an open problem. A second unresolved issue with the BDP equation lies in its (rather generic) interpretation. Beyond the standard narrative of balancing immediate versus future costs (an interpretation common to all dynamic programs, with or without learning), the BDP equation does not provide much insight into the underlying mechanism by which sequential Bayesian optimization trades off between learning (exploration) and optimization (exploitation), the distinguishing feature of this problem class. The goal of this paper is to develop good approximations (with error bounds) to the BDP equation that help address the issues of computation and interpretation. To this end, we show how the BDP equation can be represented as a tractable single-stage optimization problem that trades off between a myopic term and a “variance regularization” term that measures the total solution variability over the remaining planning horizon. Intuitively, the myopic term can be regarded as a pure exploitation objective that ignores the impact of future learning, whereas the variance regularization term captures a pure exploration objective that only puts value on solutions that resolve statistical uncertainty. We develop quantitative error bounds for this representation and prove that the error tends to zero like o(n^{-1}) almost surely in the number of stages n, which, as a corollary, establishes strong consistency of the approximate solution.
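
As an illustration of the "special cases" the abstract mentions, the sketch below solves a Bayesian dynamic program exactly for a toy problem in which the posterior has a two-number representation. It is not the paper's formulation or its approximation. The setup is an assumption made for illustration: a known arm with mean reward `known_mean`, an unknown Bernoulli arm with a Beta(a, b) prior, and rewards to maximize rather than costs to minimize. The function names `bdp_value` and `commit_value` are hypothetical.

```python
from functools import lru_cache


def bdp_value(stages: int, a: int, b: int, known_mean: float) -> float:
    """Bayes-optimal expected total reward over `stages` remaining decisions.

    Toy assumption: the unknown arm pays 1 with probability theta, where
    theta has a Beta(a, b) posterior, so the state is just the pair (a, b).
    """

    @lru_cache(maxsize=None)
    def value(k: int, a: int, b: int) -> float:
        if k == 0:
            return 0.0
        mean = a / (a + b)  # posterior mean of the unknown success probability
        # Known arm: nothing is observed, so the posterior stays at (a, b).
        safe = known_mean + value(k - 1, a, b)
        # Unknown arm: a success moves the posterior to (a + 1, b),
        # a failure moves it to (a, b + 1).
        risky = (mean * (1.0 + value(k - 1, a + 1, b))
                 + (1.0 - mean) * value(k - 1, a, b + 1))
        return max(safe, risky)

    return value(stages, a, b)


def commit_value(stages: int, a: int, b: int, known_mean: float) -> float:
    """Value of committing to the arm that looks best now and never switching."""
    return stages * max(known_mean, a / (a + b))


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, bdp_value(n, 1, 1, 0.5), commit_value(n, 1, 1, 0.5))
```

The gap between `bdp_value` and `commit_value` is the value of being able to learn and adapt in this toy problem. The recursion is tractable here only because the state is two integers; with posteriors that lack such a compact representation, the state space is the difficulty the abstract describes.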

Publisher

Institute for Operations Research and the Management Sciences (INFORMS)

Subject

Management Science and Operations Research, Computer Science Applications, General Mathematics

References: 42 articles.

1. Bayesian Dynamic Pricing in Queueing Systems with Unknown Delay Cost Characteristics

2. Machine Learning and Portfolio Optimization

3. Thompson Sampling for Stochastic Control: The Continuous Parameter Case

4. Optimal Consumption and Portfolio Decisions with Partially Observed Real Prices

Cited by 4 articles.

1. Data-Driven Clustering and Feature-Based Retail Electricity Pricing with Smart Meters;Operations Research;2024-09-03

2. Learning Manipulation Through Information Dissemination;Operations Research;2022-11

3. Optimal Control of Partially Observable Semi-Markovian Failing Systems: An Analysis Using a Phase Methodology;Operations Research;2021-07

4. (Global) Optimization: Historical notes and recent developments;EURO Journal on Computational Optimization;2021