Gaussian Based Non-linear Function Approximation for Reinforcement Learning

Published: 2021-04-20 Issue: 3 Volume: 2
ISSN: 2662-995X
Container-title: SN Computer Science
Language: en
Short-container-title: SN COMPUT. SCI.

Author:

Haider Abbas, Hawe Glenn, Wang Hui, Scotney Bryan

Abstract

Reinforcement learning (RL) problems with continuous states and discrete actions (CSDA) can be found in classic examples such as Cart Pole and Puck World, as well as real-world applications such as Market Making. Solutions to CSDA problems typically involve a function approximation (FA) of the mapping from states to actions, which can be linear or non-linear. Linear FAs such as tile-coding (Sutton and Barto in Reinforcement learning, 2nd ed, 2009) suffer from state information loss due to state discretization, whilst non-linear FAs such as DQN (Mnih et al. in Playing atari with deep reinforcement learning, https://arxiv.org/abs/1312.5602, 2013) are practically infeasible in infinitely large state spaces due to their cubic time complexity ($O(n^3)$). In this paper, we propose a novel, general solution to CSDA problems, called Gaussian distribution based non-linear function approximation (GBNLFA). Experimentation on three CSDA RL problems (Cart Pole, Puck World, Market Making) demonstrates the superiority of GBNLFA over state-of-the-art FAs, namely tile-coding and DQN. In particular, GBNLFA resolves the state information loss problem of linear FAs and provides an asymptotically faster algorithm ($O(n)$) than linear FAs ($O(n^2)$) and neural network based non-linear FAs ($O(n^3)$).

Funder

Ulster University (GB) VCRS

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s42979-021-00642-4.pdf

References (29 articles)

1. Anschel O, Baram N, Shimkin N. Averaged-dqn: variance reduction and stabilization for deep reinforcement learning. In: Proceedings of the 34th international conference on machine learning, PMLR, vol. 70. 2017. p. 176–85.

2. Avellaneda M, Stoikov S. High-frequency trading in a limit order book. Quant Finance. 2008;8(3):217–24.

3. Bertsekas DP, Tsitsiklis JN. Neuro-dynamic programming. Nashua: Athena Scientific; 1996.

4. Davies S. Multidimensional triangulation and interpolation for reinforcement learning. https://scottdavies.net/nips96.pdf. 1997.

5. Geist M, Pietquin O, Fricout G. Kalman temporal differences: the deterministic case. In: 2009 IEEE symposium on adaptive dynamic programming and reinforcement learning. 2009.

Cited by 5 articles.

1. Reinforcement learning for addressing the cold-user problem in recommender systems;Knowledge-Based Systems;2024-06

2. Non-linear Reward Deep Q Networks for Smooth Action in a Car Game;Springer Proceedings in Mathematics & Statistics;2024

3. Function approximation reinforcement learning of energy management with the fuzzy REINFORCE for fuel cell hybrid electric vehicles;Energy and AI;2023-07

4. Predictive Market Making via Machine Learning;Operations Research Forum;2022-01-26

5. Reinforcement Learning Approaches to Optimal Market Making;Mathematics;2021-10-22