Experiments with Infinite-Horizon, Policy-Gradient Estimation

Published: 2001-11-01 Volume: 15 Pages: 351-381
ISSN: 1076-9757
Container-title: Journal of Artificial Intelligence Research
Short-container-title: jair

Author:

Baxter J., Bartlett P. L., Weaver L.

Abstract

In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter beta, which has a natural interpretation in terms of bias-variance trade-off, it requires no knowledge of the underlying state, and it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of (Baxter & Bartlett, this volume) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.
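The role of beta in the abstract can be illustrated with a minimal sketch of a GPOMDP-style estimator: an eligibility trace of policy log-gradients is discounted by beta, and the gradient estimate is the running average of reward times trace. The toy environment below (two actions, a single uninformative observation, next state equal to the chosen action with probability 0.9, reward 1 in state 1), the softmax policy, and the names `softmax` and `gpomdp` are all assumptions made for illustration; none of them come from the paper's experiments.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of action preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def gpomdp(theta, beta, steps, seed=0):
    """Estimate the average-reward gradient for a hypothetical two-action toy.

    theta : two action preferences (softmax policy, no state information used)
    beta  : trace discount in [0, 1); larger values trade variance for less bias
    steps : number of simulated time steps
    """
    rng = random.Random(seed)
    n = len(theta)
    z = [0.0] * n      # eligibility trace of log-policy gradients
    delta = [0.0] * n  # running average of reward * trace
    for t in range(steps):
        probs = softmax(theta)
        u = 0 if rng.random() < probs[0] else 1
        # Gradient of log softmax probability of the chosen action.
        grad_log = [(1.0 if k == u else 0.0) - probs[k] for k in range(n)]
        # Toy dynamics (assumed): the next state matches the action w.p. 0.9.
        state = u if rng.random() < 0.9 else 1 - u
        reward = 1.0 if state == 1 else 0.0
        # Discount the trace, then fold reward * trace into the running mean.
        z = [beta * zk + gk for zk, gk in zip(z, grad_log)]
        delta = [dk + (reward * zk - dk) / (t + 1) for dk, zk in zip(delta, z)]
    return delta


if __name__ == "__main__":
    print(gpomdp([0.0, 0.0], beta=0.5, steps=50000))
```

In this toy the average reward is 0.1 + 0.8 p1, where p1 is the probability of action 1, so the true gradient at theta = (0, 0) is (-0.2, 0.2) and the estimate should land close to it. Because the reward here depends only on the most recent action, beta affects the variance of the estimate but not its bias; in problems with longer-range dependence, a small beta would introduce bias.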

Publisher

AI Access Foundation

Subject

Artificial Intelligence

Cited by 65 articles.

1. Analyzing changes in land use practices and livelihoods in the Bhungroo irrigation technology Volta basin piloted sites, West Mamprusi District, Ghana;Heliyon;2023-04

2. Data-Efficient Deep Reinforcement Learning with Convolution-Based State Encoder Networks;International Journal of Robotics and Automation;2021

3. A Deep Reinforcement Learning Approach to Traffic Signal Control With Temporal Traffic Pattern Mining;IEEE Transactions on Intelligent Transportation Systems;2021

4. Does Lifelong Learning Affect Mobile Robot Evolution?;Recent Advances in Soft Computing and Cybernetics;2021

5. Decentralized Computation Offloading in IoT Fog Computing System With Energy Harvesting: A Dec-POMDP Approach;IEEE Internet of Things Journal;2020-06