Abstract
Imitation learning from observation describes policy learning in a way similar to human learning: an agent's policy is trained by observing an expert performing a task. Although many state-only imitation learning approaches are based on adversarial imitation learning, one main drawback is that adversarial training is often unstable and lacks a reliable convergence estimator. If the true environment reward is unknown and cannot be used to select the best-performing model, this can result in poor real-world policy performance. We propose a non-adversarial learning-from-observation approach, together with an interpretable convergence and performance metric. Our training objective minimizes the Kullback-Leibler divergence (KLD) between the state-transition trajectories of the policy and the expert, which can be optimized in a non-adversarial fashion. Such methods demonstrate improved robustness when learned density models guide the optimization. We further improve sample efficiency by rewriting the KLD minimization as a Soft Actor-Critic objective with a modified reward that uses additional density models estimating the environment's forward and backward dynamics. Finally, we evaluate the effectiveness of our approach on well-known continuous control environments and show state-of-the-art performance, while providing a reliable performance estimator, compared to several recent learning-from-observation methods.
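The core idea described above, replacing an adversarial discriminator with a log-density ratio between learned state-transition models, can be illustrated with a minimal sketch. The class and function names below are hypothetical, and the toy Gaussian density stands in for the learned forward/backward dynamics models; this is an illustration of the density-ratio reward idea under those assumptions, not the paper's implementation.

```python
import numpy as np


class GaussianDensity:
    """Toy conditional density model p(s' | s): a fixed-variance Gaussian
    around a linear prediction. Stands in for the learned dynamics density
    models described in the abstract."""

    def __init__(self, weight, sigma=1.0):
        self.weight = weight  # linear dynamics coefficient (illustrative)
        self.sigma = sigma

    def log_prob(self, s, s_next):
        # Gaussian log-density of s_next given the predicted mean weight * s.
        d = s_next - self.weight * s
        dim = len(d)
        return (-0.5 * np.sum(d * d) / self.sigma**2
                - 0.5 * dim * np.log(2 * np.pi * self.sigma**2))


def modified_reward(expert_model, policy_model, s, s_next):
    """Surrogate reward: the log-density ratio between the expert's and the
    policy's state-transition models. Maximizing its expectation corresponds
    to minimizing the KLD between the two transition distributions, so it can
    be plugged into a standard off-policy RL objective as the reward signal."""
    return expert_model.log_prob(s, s_next) - policy_model.log_prob(s, s_next)


# Transitions that the expert model explains well receive positive reward;
# transitions typical of the (inferior) policy model receive negative reward.
expert = GaussianDensity(weight=0.9)
policy = GaussianDensity(weight=0.1)
s = np.array([1.0, -1.0])
r_expert_like = modified_reward(expert, policy, s, 0.9 * s)
r_policy_like = modified_reward(expert, policy, s, 0.1 * s)
```

Because the reward is an explicit log-density ratio rather than a discriminator output, its running estimate also serves as the interpretable convergence signal the abstract refers to: it approaches zero as the policy's transition distribution matches the expert's.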
Publisher
Springer Science and Business Media LLC