Finding optimal memoryless policies of POMDPs under the expected average reward criterion

Published: 2011-06 Issue: 3 Volume: 211 Page: 556-567
ISSN: 0377-2217
Container-title: European Journal of Operational Research
Language: en

Author:

Li Yanjie, Yin Baoqun, Xi Hongsheng

Publisher

Elsevier BV

Subject

Information Systems and Management, Management Science and Operations Research, Modeling and Simulation, General Computer Science, Industrial and Manufacturing Engineering

References: 31 articles.

1. Neuronlike adaptive elements that can solve difficult learning control problems;Barto;IEEE Transactions on Systems, Man and Cybernetics,1983

2. Infinite-horizon policy gradient estimation;Baxter;Journal of Artificial Intelligence Research,2001

3. Experiments with infinite-horizon policy-gradient estimation;Baxter;Journal of Artificial Intelligence Research,2001

4. Bounded policy iteration for decentralized POMDPs;Bernstein, Hansen, Zilberstein;Proceedings of the 19th International Joint Conference on Artificial Intelligence, pp. 1287–1292,2005

5. Neuro-Dynamic Programming;Bertsekas,1996

Cited by 10 articles.

1. Future memories are not needed for large classes of POMDPs;Operations Research Letters;2023-05

2. Integer Programming on the Junction Tree Polytope for Influence Diagrams;INFORMS Journal on Optimization;2020-07

3. Mean-payoff Optimization in Continuous-time Markov Chains with Parametric Alarms;ACM Transactions on Modeling and Computer Simulation;2019-10-31

4. Online Reinforcement Learning of X-Haul Content Delivery Mode in Fog Radio Access Networks;IEEE Signal Processing Letters;2019-10

5. Observation-Based Optimization for POMDPs With Continuous State, Observation, and Action Spaces;IEEE Transactions on Automatic Control;2019-05