Author:
Kuroda Seiya, ,Miyazaki Kazuteru,Kobayashi Hiroaki, ,
Abstract
During a long-term reinforcement learning task, the efficiency of learning is heavily degraded because the probabilistic actions of an agent often cause the task to fail, which makes it difficult to reach the goal and receive a reward. To address this problem, a fixed mode state is proposed in this paper. If the agent acquires an adequate reward, a normal state is switched to a fixed mode state. In this mode, the agent selects an action using a greedy strategy, i.e., it selects the highest weight action deterministically. First, this paper combines Online Profit Sharing reinforcement learning with the Penalty Avoiding Rational Policy Making algorithm, then introduces fixed mode states in it. The target task is then formulated, i.e., learning the modified waist trajectory of dynamically stable walking task based on the static stable walking of a biped robot. Finally, we present our simulation results and discuss the effectiveness of the proposed method.
Publisher
Fuji Technology Press Ltd.
Subject
Artificial Intelligence,Computer Vision and Pattern Recognition,Human-Computer Interaction
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献