A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning

Published: 2019-01-22
ISSN: 2516-2314
Container-title: EasyChair Preprints

Author:

Lee Jonathan, Laskey Michael, Tanwani Ajay Kumar, Aswani Anil, Goldberg Ken

Abstract

On-policy imitation learning algorithms such as Dagger evolve a robot control policy by executing it, measuring performance (loss), obtaining corrective feedback from a supervisor, and generating the next policy. As the loss between iterations can vary unpredictably, a fundamental question is under what conditions this process will eventually achieve a converged policy. If one assumes the underlying trajectory distribution is static (stationary), it is possible to prove convergence for Dagger. Cheng and Boots (2018) consider the more realistic model for robotics where the underlying trajectory distribution, which is a function of the policy, is dynamic and show that it is possible to prove convergence when a condition on the rate of change of the trajectory distributions is satisfied. In this paper, we reframe that result using dynamic regret theory from the field of Online Optimization to prove convergence to locally optimal policies for Dagger, Imitation Gradient, and Multiple Imitation Gradient. These results inspire a new algorithm, Adaptive On-Policy Regularization (AOR), that ensures the conditions for convergence. We present simulation results with cart-pole balancing and walker locomotion benchmarks that suggest AOR can significantly decrease dynamic regret and chattering. To our knowledge, this is the first application of dynamic regret theory to imitation learning.
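The distinction the abstract draws between a static and a dynamic setting can be illustrated with a toy calculation. Static regret compares the learner's cumulative loss against the single best fixed parameter in hindsight, while dynamic regret compares it against the best parameter for each round separately. The sketch below is not taken from the paper: the one-dimensional quadratic losses, the function names, and the example sequence are all assumptions chosen only to make the two quantities easy to compute by hand.

```python
from typing import Sequence


def quadratic_loss(theta: float, center: float) -> float:
    """Toy per-round loss f_n(theta) = (theta - c_n)^2 (illustrative assumption)."""
    return (theta - center) ** 2


def incurred_loss(thetas: Sequence[float], centers: Sequence[float]) -> float:
    """Total loss of the parameters actually played, one per round."""
    return sum(quadratic_loss(t, c) for t, c in zip(thetas, centers))


def static_regret(thetas: Sequence[float], centers: Sequence[float]) -> float:
    """Incurred loss minus the loss of the best single fixed parameter in hindsight."""
    # For a sum of these quadratics, the best fixed parameter is the mean of the centers.
    best_fixed = sum(centers) / len(centers)
    comparator = sum(quadratic_loss(best_fixed, c) for c in centers)
    return incurred_loss(thetas, centers) - comparator


def dynamic_regret(thetas: Sequence[float], centers: Sequence[float]) -> float:
    """Incurred loss minus the sum of the per-round minimum losses."""
    # Each quadratic is minimized at theta = c_n, where its value is 0.
    comparator = sum(quadratic_loss(c, c) for c in centers)
    return incurred_loss(thetas, centers) - comparator


# Hypothetical sequence: the per-round optimum drifts, and the learner lags one step behind.
centers = [0.0, 1.0, 2.0]
thetas = [0.0, 0.0, 1.0]
print(static_regret(thetas, centers), dynamic_regret(thetas, centers))
```

In this made-up sequence the learner matches the best fixed parameter in hindsight, so its static regret is zero, yet it still pays for lagging behind the drifting per-round optimum, which only the dynamic regret captures.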

Publisher

EasyChair

Cited by 43 articles.

1. Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR; 2022 IEEE Spoken Language Technology Workshop (SLT); 2023-01-09

2. UIT-HWDB: Using Transferring Method to Construct A Novel Benchmark for Evaluating Unconstrained Handwriting Image Recognition in Vietnamese; 2022 RIVF International Conference on Computing and Communication Technologies (RIVF); 2022-12-20

3. Low-Rank Decomposition for Rate-Adaptive Deep Joint Source-Channel Coding; 2022 IEEE 8th International Conference on Computer and Communications (ICCC); 2022-12-09

4. Gradient Guided Sampling Method for Imbalanced Learning; 2022 4th International Conference on Control and Robotics (ICCR); 2022-12-02

5. Improving Graph Neural Network with Learnable Permutation Pooling; 2022 IEEE International Conference on Data Mining Workshops (ICDMW); 2022-11