Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulations

Published:2020-09-02 Issue:9 Volume:8 Page:1479
ISSN:2227-7390
Container-title:Mathematics
language:en
Short-container-title:Mathematics

Author:

Martinez-Gil Francisco, Lozano Miguel, García-Fernández Ignacio, Romero Pau, Serra Dolors, Sebastián Rafael

Abstract

Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and authoring the behavior is difficult and unsafe because modifying a single value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents. This rules out direct manipulation of the learned value function as a method to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces in the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with the behaviors learned using a classic Reinforcement Learning algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors fit the real trajectories significantly better.

Publisher

MDPI AG

Subject

General Mathematics,Engineering (miscellaneous),Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/8/9/1479/pdf

References: 34 articles.

1. Reinforcement learning in robotics: A survey

2. Optimization of global production scheduling with deep reinforcement learning

3. A Deep Learning Algorithm for the Max-Cut Problem Based on Pointer Network Structure with Supervised Learning and Reinforcement Learning Strategies

4. A Comparison of Evolutionary and Tree-Based Approaches for Game Feature Validation in Real-Time Strategy Games with a Novel Metric

Cited by 9 articles.

1. Extended floor field model for dynamic route changes;2023 14th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI);2023-07-08

2. Modeling Crossing Behaviors of E-Bikes at Intersection With Deep Maximum Entropy Inverse Reinforcement Learning Using Drone-Based Video Data;IEEE Transactions on Intelligent Transportation Systems;2023-06

3. Review of Pedestrian Trajectory Prediction Methods: Comparing Deep Learning and Knowledge-Based Approaches;IEEE Transactions on Intelligent Transportation Systems;2022-12

4. Deep Reinforcement Learning for Multi-agent Simulation using a partial floor field cutout;2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI);2022-07

5. Multiagent modeling of pedestrian-vehicle conflicts using Adversarial Inverse Reinforcement Learning;Transportmetrica A: Transport Science;2022-04-28