Reinforcement learning of motor skills using Policy Search and human corrective advice

Author:

Celemin, Carlos (1, 2); Maeda, Guilherme (3, 4); Ruiz-del-Solar, Javier (1); Peters, Jan (5); Kober, Jens (2)

Affiliation:

1. Department of Electrical Engineering & Advanced Mining Technology Center, University of Chile, Chile

2. Cognitive Robotics Department, Delft University of Technology, Netherlands

3. Preferred Networks, Inc., Japan

4. Department of Brain Robot Interface, ATR Computational Neuroscience Lab, Japan

5. Intelligent Autonomous Systems lab, Technische Universität Darmstadt, Germany

Abstract

Robot learning problems are limited by physical constraints, which make learning successful policies for complex motor skills on real systems infeasible. Some reinforcement learning methods, like Policy Search, offer stable convergence toward locally optimal solutions, whereas interactive machine learning or learning-from-demonstration methods allow fast transfer of human knowledge to the agents. However, most methods require expert demonstrations. In this work, we propose the use of human corrective advice in the action domain for learning motor trajectories. Additionally, we combine this human feedback with reward functions in a Policy Search learning scheme. Using both sources of information speeds up the learning process: the intuitive knowledge of the human teacher can be easily transferred to the agent, while the Policy Search method, guided by the cost/reward function, supervises the process and reduces the influence of occasional wrong human corrections. This interactive approach has been validated for learning movement primitives with simulated arms with several degrees of freedom in reaching via-point movements, and with real robots in tasks such as “writing characters” and the ball-in-a-cup game. Compared with standard reinforcement learning without human advice, the results show that the proposed method not only converges to higher rewards when learning movement primitives, but also speeds up learning by a factor of 4 to 40, depending on the task.
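The central idea, feeding human corrections into a reward-driven Policy Search loop so that the reward can filter out wrong advice, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' algorithm: the Gaussian basis-function trajectory, the via-point reward, the simulated teacher (`teacher_advice`, wrong 20% of the time), the correction step size, and the reward-weighted averaging update are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical 1-D via-point task (an assumption for illustration): the
# trajectory should pass y = 1.0 at t = 0.5 and end at y = 0.0 at t = 1.0.
VIA_POINTS = [(0.5, 1.0), (1.0, 0.0)]
N_BASIS = 10


def basis(t, n=N_BASIS, var=0.02):
    """Normalized Gaussian basis functions over phase t in [0, 1]."""
    centers = [k / (n - 1) for k in range(n)]
    acts = [math.exp(-((t - c) ** 2) / (2 * var)) for c in centers]
    total = sum(acts)
    return [a / total for a in acts]


def trajectory_at(w, t):
    """Position of the parameterized trajectory at phase t."""
    return sum(wk * pk for wk, pk in zip(w, basis(t)))


def reward(w):
    """Hypothetical reward: negative squared via-point error."""
    return -sum((trajectory_at(w, t) - y) ** 2 for t, y in VIA_POINTS)


def teacher_advice(w, rng, error_rate=0.2, tolerance=0.05):
    """Simulated human teacher (assumption): inspects the worst via-point and
    answers +1 ("higher") or -1 ("lower"), giving the wrong sign with
    probability error_rate. Returns None when the error looks acceptable."""
    t, y = max(VIA_POINTS, key=lambda p: abs(p[1] - trajectory_at(w, p[0])))
    err = y - trajectory_at(w, t)
    if abs(err) < tolerance:
        return None
    h = 1 if err > 0 else -1
    if rng.random() < error_rate:
        h = -h  # occasional wrong correction
    return t, h


def apply_correction(w, t, h, step=0.3):
    """Move the trajectory at phase t by exactly step * h by distributing the
    shift over the weights in proportion to their basis activation."""
    phi = basis(t)
    norm = sum(p * p for p in phi)
    return [wk + step * h * pk / norm for wk, pk in zip(w, phi)]


def policy_search(iterations=60, rollouts=10, sigma=0.1, beta=20.0,
                  use_teacher=True, seed=0):
    """Episodic reward-weighted averaging; a human-corrected parameter vector
    is injected as one extra candidate per iteration."""
    rng = random.Random(seed)
    mean = [0.0] * N_BASIS
    for _ in range(iterations):
        candidates = [[m + rng.gauss(0.0, sigma) for m in mean]
                      for _ in range(rollouts)]
        advice = teacher_advice(mean, rng) if use_teacher else None
        if advice is not None:
            candidates.append(apply_correction(mean, *advice))
        rewards = [reward(c) for c in candidates]
        # Exponentiated, shifted rewards: a wrong correction usually lowers
        # the candidate's reward, so it tends to get little weight.
        best = max(rewards)
        weights = [math.exp(beta * (r - best)) for r in rewards]
        total = sum(weights)
        mean = [sum(wt * c[k] for wt, c in zip(weights, candidates)) / total
                for k in range(N_BASIS)]
    return mean, reward(mean)


if __name__ == "__main__":
    for flag in (False, True):
        _, final = policy_search(use_teacher=flag)
        print(f"use_teacher={flag}: final reward {final:.4f}")
```

The design choice here is to treat each correction as just another candidate rather than overwriting the policy, so the reward decides how much it counts. Running both settings shows how the two signals interact in this toy, but its numbers say nothing about the speed-ups reported for the real tasks.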

Funder

FP7 Information and Communication Technologies

Fondo Nacional de Desarrollo Científico y Tecnológico

Comisión Nacional de Investigación Científica y Tecnológica

Publisher

SAGE Publications

Subject

Applied Mathematics, Artificial Intelligence, Electrical and Electronic Engineering, Mechanical Engineering, Modeling and Simulation, Software
