Reinforcement Learning for POMDP Environments Using State Representation with Reservoir Computing

Published: 2022-07-20  Volume: 26  Issue: 4  Pages: 562-569
ISSN: 1883-8014
Journal: Journal of Advanced Computational Intelligence and Intelligent Informatics
Language: English
Journal abbreviation: JACIII

Authors:

Yamashita Kodai, Hamagami Tomoki

Abstract

One of the challenges in reinforcement learning is dealing with partially observable Markov decision processes (POMDPs). In a POMDP, an agent cannot observe the true state of the environment and may perceive different states as the same. Our proposed method uses the agent’s time-series information to deal with this imperfect perception problem. In particular, the proposed method uses reservoir computing to transform the time series of observations into a nonlinear state. The echo state network (ESN), a typical model of reservoir computing, transforms raw observations into reservoir states. The proposed method, named dual ESNs reinforcement learning, uses two ESNs: one specialized for observation information and one for action information. The experimental results show the effectiveness of the proposed method in environments where imperfect perception problems occur.
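The state-construction idea described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: discrete observations and actions are one-hot encoded, each reservoir is a plain tanh ESN without a leak rate, the action ESN is fed the previous action, and the two reservoir states are simply concatenated to form the agent's state. Instead of the usual spectral-radius scaling, the recurrent weights are rescaled by their maximum absolute row sum, a conservative substitute that guarantees a contractive update. All names (`EchoStateNetwork`, `DualESNState`, `encode`), sizes, and seeds are hypothetical.

```python
# Hypothetical sketch of a dual-ESN state encoder; not the paper's code.
import math
import random


class EchoStateNetwork:
    """Minimal ESN reservoir without leak rate: x(t+1) = tanh(W_in u(t) + W x(t))."""

    def __init__(self, n_inputs, n_reservoir, scale=0.9, seed=0):
        rng = random.Random(seed)
        self.n_inputs = n_inputs
        self.n_reservoir = n_reservoir
        # Fixed random input weights; in reservoir computing these are never trained.
        self.w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                     for _ in range(n_reservoir)]
        # Fixed random recurrent weights, rescaled so the largest absolute row sum
        # (infinity norm) equals `scale` < 1. This makes the update a contraction,
        # a conservative sufficient condition for the echo state property.
        w = [[rng.uniform(-1.0, 1.0) for _ in range(n_reservoir)]
             for _ in range(n_reservoir)]
        norm = max(sum(abs(v) for v in row) for row in w)
        self.w = [[v * scale / norm for v in row] for row in w]
        self.state = [0.0] * n_reservoir

    def reset(self):
        """Clear the reservoir memory at the start of an episode."""
        self.state = [0.0] * self.n_reservoir

    def step(self, u):
        """Feed one input vector and return a copy of the new reservoir state."""
        new_state = []
        for i in range(self.n_reservoir):
            pre = sum(self.w_in[i][j] * u[j] for j in range(self.n_inputs))
            pre += sum(self.w[i][k] * self.state[k] for k in range(self.n_reservoir))
            new_state.append(math.tanh(pre))
        self.state = new_state
        return list(new_state)


def one_hot(index, size):
    """Encode a discrete observation or action as a one-hot vector."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


class DualESNState:
    """Hypothetical encoder: one ESN for observations, one for actions.
    The RL state is assumed to be the concatenation of both reservoir states."""

    def __init__(self, n_obs, n_actions, n_res_obs=20, n_res_act=10):
        self.n_obs = n_obs
        self.n_actions = n_actions
        self.obs_esn = EchoStateNetwork(n_obs, n_res_obs, seed=1)
        self.act_esn = EchoStateNetwork(n_actions, n_res_act, seed=2)

    def reset(self):
        self.obs_esn.reset()
        self.act_esn.reset()

    def update(self, observation, previous_action):
        x_obs = self.obs_esn.step(one_hot(observation, self.n_obs))
        x_act = self.act_esn.step(one_hot(previous_action, self.n_actions))
        return x_obs + x_act  # list concatenation, length n_res_obs + n_res_act


def encode(history, encoder):
    """Run a whole (observation, previous_action) history through the encoder."""
    encoder.reset()
    state = []
    for observation, previous_action in history:
        state = encoder.update(observation, previous_action)
    return state


# Two histories that end with the same observation (3) and previous action (0):
# a memoryless agent would see identical inputs, but the reservoir states differ.
enc = DualESNState(n_obs=4, n_actions=2)
s1 = encode([(0, 0), (1, 1), (3, 0)], enc)
s2 = encode([(2, 1), (1, 0), (3, 0)], enc)
gap = max(abs(a - b) for a, b in zip(s1, s2))
print(f"state size = {len(s1)}, max component gap = {gap:.4f}")
```

Because the reservoirs retain traces of earlier steps, the two aliased histories map to different 30-dimensional states. In this sketch only a readout on top of the state (for example, a linear Q-function) would be trained; the reservoir weights stay fixed, which is what distinguishes reservoir computing from training a recurrent network such as an LSTM [4] end to end.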

Publisher

Fuji Technology Press Ltd.

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Human-Computer Interaction

References (22 articles)

1. S. Gu, E. Holly, T. Lillicrap, and S. Levine, “Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates,” Proc. of IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 3389-3396, 2017.

2. D. Isele, R. Rahimi, A. Cosgun, K. Subramanian, and K. Fujimura, “Navigating occluded intersections with autonomous vehicles using deep reinforcement learning,” Proc. of IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 2034-2039, 2018.

3. S. Kapturowski, G. Ostrovski, W. Dabney, J. Quan, and R. Munos, “Recurrent experience replay in distributed reinforcement learning,” Proc. of Int. Conf. on Learning Representations (ICLR), 2019.

4. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, Vol.9, No.8, pp. 1735-1780, 1997.

5. H. Jaeger, “The ‘echo state’ approach to analysing and training recurrent neural networks – with an erratum note,” German Nat. Res. Center for Inf. Technol., Vol.148, No.34, Article No.13, 2001.

Cited by 1 article.

1. “Bayesian Network-Based Probabilistic Constraints for Safe Autonomous Driving in Occlusion Environments,” 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), 2023-09-24.