Behavior policy learning: Learning multi-stage tasks via solution sketches and model-based controllers

Published: 2022-10-12 Volume: 9
ISSN: 2296-9144
Container-title: Frontiers in Robotics and AI
Short-container-title: Front. Robot. AI

Author:

Tsinganos Konstantinos, Chatzilygeroudis Konstantinos, Hadjivelichkov Denis, Komninos Theodoros, Dermatas Evangelos, Kanoulas Dimitrios

Abstract

Multi-stage tasks are a challenge for reinforcement learning methods and require either specific task knowledge (e.g., task segmentation) or a large amount of interaction time to be learned. In this paper, we propose Behavior Policy Learning (BPL), which combines 1) only a few solution sketches, that is, demonstrations that contain only the states and not the actions, 2) model-based controllers, and 3) simulations to effectively solve multi-stage tasks without strong knowledge about the underlying task. Our main intuition is that solution sketches alone can provide strong data for learning a high-level trajectory by imitation, and that model-based controllers can be used to follow this trajectory (we call it a behavior) effectively. Finally, we utilize robotic simulations to further improve the policy and make it robust in a Sim2Real style. We evaluate our method in simulation with a robotic manipulator that has to perform two tasks with variations: 1) grasp a box and place it in a basket, and 2) re-place a book on a different level within a bookcase. We also validate the Sim2Real capabilities of our method by performing real-world experiments and realistic simulated experiments where the objects are tracked through an RGB-D camera for the first task.

Publisher

Frontiers Media SA

Subject

Artificial Intelligence, Computer Science Applications

Reference: 46 articles.

1. An invitation to imitation; Bagnell, 2015

2. Neural dynamic policies for end-to-end sensorimotor learning; Bahl, 2020

3. Robot programming by demonstration; Billard, 2008

4. Using parameterized black-box priors to scale up model-based policy search for robotics; Chatzilygeroudis, 2018

5. Black-box data-efficient policy search for robotics; Chatzilygeroudis, 2017

Cited by 2 articles.

1. Evolving Dynamic Locomotion Policies in Minutes; 2023 14th International Conference on Information, Intelligence, Systems & Applications (IISA); 2023-07-10

2. Effective Skill Learning via Autonomous Goal Representation Learning; 2023 14th International Conference on Information, Intelligence, Systems & Applications (IISA); 2023-07-10