Estimation of Different Reward Functions Latent in Trajectory Data

Published: 2024-03-20 Issue: 2 Volume: 28 Page: 403-412
ISSN: 1883-8014
Container-title: Journal of Advanced Computational Intelligence and Intelligent Informatics
Language: en
Short-container-title: JACIII

Author:

Saito Masaharu¹, Arai Sachiyo¹

Affiliation:

1. Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan

Abstract

In recent years, inverse reinforcement learning has attracted attention as a method for estimating the intention of actions using the trajectories of various action-taking agents, including human flow data. In the context of reinforcement learning, “intention” refers to a reward function. Conventional inverse reinforcement learning assumes that all trajectories are generated from policies learned under a single reward function. However, it is natural to assume that people in a human flow act according to multiple policies. In this study, we introduce an expectation-maximization algorithm to inverse reinforcement learning, and propose a method to estimate different reward functions from the trajectories of human flow. The effectiveness of the proposed method was evaluated through a computer experiment based on human flow data collected from subjects around airport gates.
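The abstract does not spell out the algorithm, so the following is only a generic sketch of how an expectation-maximization loop can assign trajectories to several candidate reward functions. It is not the authors' implementation. It assumes that some inverse reinforcement learning routine has already produced `log_liks[i][k]`, the log-likelihood of trajectory `i` under the policy for reward function `k`. The function names `e_step` and `update_weights` are hypothetical, and the reward-function update of the M-step is deliberately left out.

```python
import math


def e_step(log_liks, weights):
    """Return responsibilities resp[i][k] = P(reward k | trajectory i).

    log_liks[i][k] is an assumed input: the log-likelihood of trajectory i
    under reward function k. weights[k] is the mixing weight of reward k
    and must be strictly positive.
    """
    resp = []
    for row in log_liks:
        # Unnormalized log-posterior of each reward function.
        scores = [math.log(w) + ll for w, ll in zip(weights, row)]
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        resp.append([e / total for e in exps])
    return resp


def update_weights(resp):
    """Mixing-weight part of the M-step: average responsibility per reward."""
    n = len(resp)
    k = len(resp[0])
    return [sum(row[j] for row in resp) / n for j in range(k)]


# Toy input: three trajectories, two candidate reward functions.
log_liks = [[-1.0, -5.0], [-6.0, -2.0], [-1.5, -4.0]]
weights = [0.5, 0.5]
resp = e_step(log_liks, weights)
new_weights = update_weights(resp)
```

In a full method, each iteration would also re-estimate every reward function from the trajectories weighted by their responsibilities, and the loop would repeat until the assignments stabilize.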

Publisher

Fuji Technology Press Ltd.

References (20 articles)

1. K. M. Kitani, B. D. Ziebart, J. A. Bagnell, and M. Hebert, “Activity forecasting,” European Conf. on Computer Vision, pp. 201-214, 2012. https://doi.org/10.1007/978-3-642-33765-9_15

2. S. Yamaguchi, H. Naoki, M. Ikeda, Y. Tsukada, S. Nakano, I. Mori, and S. Ishii, “Identification of animal behavioral strategies by inverse reinforcement learning,” PLoS Computational Biology, Vol.14, No.5, Article No.e1006122, 2018. https://doi.org/10.1371/journal.pcbi.1006122

3. T. Hirakawa, T. Yamashita, T. Tamaki, H. Fujiyoshi, Y. Umezu, I. Takeuchi, S. Matsumoto, and K. Yoda, “Can AI predict animal movements? Filling gaps in animal trajectories using inverse reinforcement learning,” Ecosphere, Vol.9, No.10, Article No.e02447, 2018. https://doi.org/10.1002/ecs2.2447

4. M. Babes, V. Marivate, K. Subramanian, and M. L. Littman, “Apprenticeship learning about multiple intentions,” Proc. of the 28th Int. Conf. on Machine Learning (ICML-11), pp. 897-904, 2011.

5. S. Russell, “Learning agents for uncertain environments,” Proc. of the 11th Annual Conf. on Computational Learning Theory, pp. 101-103, 1998. https://doi.org/10.1145/279943.279964