A Counterexample to Temporal Differences Learning

Journal: Neural Computation
Published: 1995-03
Volume: 7, Issue: 2, Pages: 270-279
ISSN: 0899-7667
Language: English

Author:

Dimitri P. Bertsekas¹

Affiliation:

1. Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA

Abstract

Sutton's TD(λ) method aims to provide a representation of the cost function in an absorbing Markov chain with transition costs. A simple example is given where the representation obtained depends on λ. For λ = 1 the representation is optimal with respect to a least-squares error criterion, but as λ decreases toward 0 the representation becomes progressively worse and, in some cases, very poor. The example suggests a need to understand better the circumstances under which TD(0) and Q-learning obtain satisfactory neural-network-based compact representations of the cost function. A variation of TD(0) is also given, which performs better on the example.
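The abstract's central claim, that the limit reached by TD(λ) with a compact (here, linear) approximation depends on λ and coincides with the least-squares fit only at λ = 1, can be illustrated with a small sketch. Everything below is a hypothetical setup chosen for illustration, not the paper's own example or its TD(0) variant: a deterministic chain n → n-1 → ... → 1 → 0 with 0 absorbing, transition costs g(i) = 1 for i ≥ 2 and g(1) = -(n-1), a single feature φ(i) = i so that the approximation is r·i, and offline (episode-level) updates so the fixed point has a closed form.

```python
"""TD(lambda) with a one-parameter linear approximation on a deterministic
absorbing chain. Hypothetical example for illustration, not the paper's setup."""


def make_chain(n):
    """Hypothetical costs: leaving state i >= 2 costs 1, leaving state 1 costs -(n - 1).
    State 0 is absorbing with zero cost-to-go."""
    costs = {i: 1.0 for i in range(2, n + 1)}
    costs[1] = -(n - 1.0)
    return costs


def true_cost_to_go(costs, i):
    # J(i) = sum of transition costs along the path i -> i-1 -> ... -> 0
    return sum(costs[k] for k in range(1, i + 1))


def phi(i):
    # Single hypothetical feature: the state index, so J~(i, r) = r * i and J~(0, r) = 0.
    return float(i)


def td_lambda_episode(costs, n, r, lam, alpha):
    """One offline TD(lambda) sweep over the episode n -> 0: updates are
    accumulated with r held fixed, then applied once at the end."""
    z = 0.0        # eligibility trace
    delta_r = 0.0  # accumulated update
    for x in range(n, 0, -1):
        z = lam * z + phi(x)
        d = costs[x] + r * phi(x - 1) - r * phi(x)  # temporal difference
        delta_r += z * d
    return r + alpha * delta_r


def td_lambda_limit(costs, n, lam):
    """Fixed point of the offline update: solve sum_t z_t * d_t(r) = 0 for r.
    Each d_t is affine in r, so the solution is -num / den."""
    z, num, den = 0.0, 0.0, 0.0
    for x in range(n, 0, -1):
        z = lam * z + phi(x)
        num += z * costs[x]
        den += z * (phi(x - 1) - phi(x))
    return -num / den


def least_squares_fit(costs, n):
    # r minimizing sum_i (J(i) - r * i)^2 over the states 1..n visited per episode
    num = sum(phi(i) * true_cost_to_go(costs, i) for i in range(1, n + 1))
    den = sum(phi(i) ** 2 for i in range(1, n + 1))
    return num / den


if __name__ == "__main__":
    n = 10
    costs = make_chain(n)
    print("least-squares r:", round(least_squares_fit(costs, n), 4))
    for lam in (1.0, 0.7, 0.3, 0.0):
        r = 0.0
        for _ in range(2000):
            r = td_lambda_episode(costs, n, r, lam, alpha=1e-3)
        print(f"lambda={lam}: iterated r={r:.4f}, "
              f"fixed point={td_lambda_limit(costs, n, lam):.4f}")
```

With λ = 1 the temporal differences from any visited state onward telescope to J(i) - r·i, so the fixed-point condition becomes the normal equation of the least-squares fit. With λ = 0 and these hypothetical costs, the fixed point is r = (n-1)/(n+1) > 0, whereas the true cost-to-go J(i) = i - n is never positive and the least-squares fit is r = (1-n)/(2n+1) < 0 (for n = 10: 9/11 versus -3/7).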

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/neco.1995.7.2.270

References (5 articles)

1. The convergence of TD(λ) for general λ

2. On the Convergence of the LMS Algorithm with Adaptive Learning Rate for Linear Feedforward Networks

3. Learning to predict by the methods of temporal differences

4. Practical issues in temporal difference learning

Cited by 22 articles.

1. Feature-based aggregation and deep reinforcement learning: a survey and some new implementations. IEEE/CAA Journal of Automatica Sinica, 2019-01.

2. Convergence of a Q-learning Variant for Continuous States and Actions. Journal of Artificial Intelligence Research, 2014-04-29.

3. Lambda-Policy Iteration: A Review and a New Implementation. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, 2013-02-07.

4. Least-Squares Methods for Policy Iteration. Adaptation, Learning, and Optimization, 2012.

5. A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications. Journal of Control Theory and Applications, 2011-07-19.