A New Improved Penalty Avoiding Rational Policy Making Algorithm for Keepaway with Continuous State Spaces

Published:2009-11-20 Issue:6 Volume:13 Page:675-682
ISSN:1883-8014
Container-title:Journal of Advanced Computational Intelligence and Intelligent Informatics
language:en
Short-container-title:JACIII

Author:

Watanabe Takuji, Miyazaki Kazuteru, Kobayashi Hiroaki

Abstract

The penalty avoiding rational policy making algorithm (PARP) [1], previously improved to save memory and cope with uncertainty as IPARP [2], requires that states be discretized in real environments with continuous state spaces, using function approximation or some other method. In particular, a method that discretizes states using basis functions is known for PARP [3]. However, because this method creates a new basis function from the current input and its next observation, an unsuitable basis function may be generated in some asynchronous multiagent environments. We therefore propose using uniform basis functions whose range is estimated before learning. We show the effectiveness of our proposal using a soccer game task called “Keepaway.”
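As a rough illustration of discretizing a continuous state with uniform basis functions whose extent is fixed before learning, the Python sketch below lays equal-width cells over a range estimated from samples collected in advance. It is not the authors' implementation: the function names `estimate_range` and `discretize`, the axis-aligned grid, and the min/max range estimate are all assumptions made only for this example.

```python
import math


def estimate_range(samples, cells_per_dim):
    """Estimate, before learning, the lower bound and cell width of each dimension.

    Assumption: the range is taken from the min/max of pre-collected samples.
    """
    dims = len(samples[0])
    lows = [min(s[d] for s in samples) for d in range(dims)]
    highs = [max(s[d] for s in samples) for d in range(dims)]
    widths = []
    for d in range(dims):
        width = (highs[d] - lows[d]) / cells_per_dim
        # Guard against a degenerate dimension where all samples are equal.
        widths.append(width if width > 0 else 1.0)
    return lows, widths


def discretize(state, lows, widths, cells_per_dim):
    """Map a continuous state to the index tuple of the uniform cell containing it."""
    index = []
    for x, low, width in zip(state, lows, widths):
        i = int(math.floor((x - low) / width))
        # Clamp states that fall outside the estimated range to the border cells.
        index.append(min(max(i, 0), cells_per_dim - 1))
    return tuple(index)


# Hypothetical two-dimensional observations gathered before learning.
samples = [(0.0, 10.0), (4.0, 20.0), (2.0, 15.0)]
lows, widths = estimate_range(samples, cells_per_dim=4)
cell = discretize((2.5, 12.4), lows, widths, cells_per_dim=4)
```

Because every cell has the same fixed extent, the mapping from observation to discrete state does not depend on the order in which observations arrive, which is the property a uniform scheme is meant to provide in this sketch.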

Publisher

Fuji Technology Press Ltd.

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Human-Computer Interaction

Reference: 6 articles.

1. K. Miyazaki and S. Kobayashi, “Reinforcement Learning for Penalty Avoiding Policy Making,” Proc. of the 2000 IEEE Int. Conf. on Systems, Man and Cybernetics, pp. 206-211, 2000.

2. K. Miyazaki, T. Namatame, T. Kojima, and H. Kobayashi, “Improvement of the Penalty Avoiding Rational Policy Making Algorithm to Real World Robotics,” Proc. of the 13th Int. Conf. on Advanced Robotics, pp. 1183-1188, 2007.

3. K. Miyazaki and S. Kobayashi, “A Reinforcement Learning System for Penalty Avoiding in Continuous State Spaces,” J. of Advanced Computational Intelligence and Intelligent Informatics, Vol.11, No.6, pp. 668-676, 2007.

4. P. Stone, R. S. Sutton, and G. Kuhlmann, “Reinforcement Learning toward RoboCup Soccer Keepaway,” Adaptive Behavior, Vol.13, No.3, pp. 165-188, 2005.

5. S. Arai and N. Tanaka, “Experimental Analysis of Reward Design for Continuing Task in Multiagent Domains - RoboCup Soccer Keepaway -,” Trans. of the Japanese Society for Artificial Intelligence, Vol.21, No.6, pp. 537-546, 2006 (in Japanese).

Cited by 10 articles.

1. Proposal and Evaluation of an Indirect Reward Assignment Method for Reinforcement Learning by Profit Sharing Method;Advances in Intelligent Systems and Computing;2018-11-09

2. Proposal of PSwithEFP and its Evaluation in Multi-Agent Reinforcement Learning;Journal of Advanced Computational Intelligence and Intelligent Informatics;2017-09-20

3. Exploitation-Oriented Learning with Deep Learning – Introducing Profit Sharing to a Deep Q-Network –;Journal of Advanced Computational Intelligence and Intelligent Informatics;2017-09-20

4. Proposal of an Action Selection Strategy with Expected Failure Probability and Its Evaluation in Multi-agent Reinforcement Learning;Multi-Agent Systems and Agreement Technologies;2017

5. Proposal of a Propagation Algorithm of the Expected Failure Probability and the Effectiveness on Multi-agent Environments;IEEJ Transactions on Electronics, Information and Systems;2016