Estimating Objective Weights of Pareto-Optimal Policies for Multi-Objective Sequential Decision-Making

Published: 2024-03-20 Issue: 2 Volume: 28 Pages: 393-402
ISSN: 1883-8014
Container-title: Journal of Advanced Computational Intelligence and Intelligent Informatics
Language: en
Short-container-title: JACIII

Author:

Ikenaga Akiko¹, Arai Sachiyo¹

Affiliation:

1. Graduate School of Science and Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan

Abstract

Sequential decision-making under multiple objective functions involves two problems: exhaustively searching for Pareto-optimal policies, and selecting a policy from the resulting set of Pareto-optimal policies based on the decision-maker’s preferences. This paper focuses on the latter problem. To select a policy that reflects the decision-maker’s preferences, it is necessary to order these policies, which is problematic because the decision-maker’s preferences are generally tacit knowledge. Furthermore, it is difficult to order them quantitatively. For this reason, conventional methods have mainly elicited preferences through dialogue with decision-makers and through one-to-one comparisons. In contrast, this paper proposes a method based on inverse reinforcement learning to estimate the weight of each objective from the decision-making sequence. The estimated weights can be used to quantitatively evaluate the Pareto-optimal policies from the viewpoint of the decision-maker’s preferences. We applied the proposed method to a multi-objective reinforcement learning benchmark problem and verified its effectiveness as a method for eliciting the weight of each objective function.
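
The abstract says that estimated weights can be used to evaluate Pareto-optimal policies quantitatively. One common way to do this is linear scalarization, that is, scoring each policy by the weighted sum of its per-objective expected returns. The sketch below illustrates that idea only; it is an assumption and not necessarily the evaluation used in the paper. The function name `rank_policies`, the return vectors, and the weights are all hypothetical values chosen for illustration.

```python
def rank_policies(returns, weights):
    """Rank policies by the weighted sum of their per-objective returns.

    returns: list of return vectors, one per policy (hypothetical data).
    weights: non-negative objective weights, e.g. as estimated from
             a decision-maker's behavior (hypothetical values here).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Normalize so the weights sum to 1; this does not change the ranking.
    norm = [w / total for w in weights]
    # Linear scalarization: one scalar score per policy.
    scores = [sum(w * r for w, r in zip(norm, vec)) for vec in returns]
    # Indices of policies, best score first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical two-objective returns (e.g. reward gained, time penalty)
# for three mutually non-dominated policies.
pareto_returns = [[1.0, -1.0], [5.0, -5.0], [10.0, -12.0]]

# A decision-maker who mostly values the first objective.
order_a, scores_a = rank_policies(pareto_returns, [0.8, 0.2])

# A decision-maker who mostly values the second objective.
order_b, scores_b = rank_policies(pareto_returns, [0.3, 0.7])

print(order_a, order_b)
```

With these illustrative numbers, the two weight vectors produce opposite orderings of the same three policies, which is why the choice among Pareto-optimal policies depends on knowing the decision-maker's weights.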

Publisher

Fuji Technology Press Ltd.

References: 14 articles.

1. D. M. Roijers, P. Vamplew, S. Whiteson, and R. Dazeley, “A Survey of Multi-Objective Sequential Decision-Making,” J. of Artificial Intelligence Research, Vol.48, Issue 1, pp. 67-113, 2013.

2. L. Barrett and S. Narayanan, “Learning All Optimal Policies with Multiple Criteria,” Proc. of the 25th Int. Conf. on Machine Learning, pp. 41-47, 2008. https://doi.org/10.1145/1390156.1390162

3. C. Liu, X. Xu, and D. Hu, “Multiobjective Reinforcement Learning: A Comprehensive Overview,” IEEE Trans. on Systems, Man, and Cybernetics: Systems, Vol.45, Issue 3, pp. 385-398, 2014. https://doi.org/10.1109/TSMC.2014.2358639

4. K. Van Moffaert and A. Nowé, “Multi-Objective Reinforcement Learning Using Sets of Pareto Dominating Policies,” The J. of Machine Learning Research, Vol.15, Issue 1, pp. 3483-3512, 2014.

5. S. Guo, S. Sanner, and E. V. Bonilla, “Gaussian Process Preference Elicitation,” Advances in Neural Information Processing Systems (NIPS’2010), Vol.23, pp. 262-270, 2010.