Kalman Temporal Differences

Published:2010-10-29 Volume:39 Page:483-532
ISSN:1076-9757
Container-title:Journal of Artificial Intelligence Research
Short-container-title:jair

Author:

Geist M., Pietquin O.

Abstract

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest this last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample-efficiency, non-linear approximation, non-stationarity handling and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDP) which produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), solving stochastic MDP, is described. Convergence is analyzed for special cases for both deterministic and stochastic transitions. Related algorithms are experimented on classical benchmarks. They compare favorably to the state of the art while exhibiting the announced features.

Publisher

AI Access Foundation

Subject

Artificial Intelligence

Cited by 33 articles.

1. Bayesian reinforcement learning: A basic overview;Neurobiology of Learning and Memory;2024-05

2. A probabilistic successor representation for context-dependent learning;Psychological Review;2023-05-11

3. A probabilistic successor representation for context-dependent prediction;2022-06-04

4. Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation;Sensors;2022-02-11

5. AKF-SR: Adaptive Kalman filtering-based successor representation;Neurocomputing;2022-01