Affiliation:
1. CMAP, École Polytechnique, Palaiseau, France
2. Abu Dhabi Investment Authority, Abu Dhabi, United Arab Emirates
3. Imperial College London, London, UK
Abstract
This paper shows how to use results from statistical learning theory and stochastic algorithms to better understand the convergence of Reinforcement Learning (RL) once it is formulated as a fixed point problem. This understanding can be used to propose improvements to RL learning rates. First, our analysis shows that the classical asymptotic convergence rate O(1/√N) is pessimistic and can be replaced by O((log(N)/N)^β) with β ∈ [1/2, 1], where N is the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate used in RL. We decompose our policy into two interacting levels: the inner and the outer level. In the inner level, we present the PASS algorithm (for "PAst Sign Search") which, based on a predefined sequence of learning rates, constructs a new sequence for which the error decreases faster. The convergence of PASS is proved and error bounds are established. In the outer level, we propose an optimal methodology for the selection of the predefined sequence. Third, we show empirically that our selection methodology for the learning rate significantly outperforms standard algorithms used in RL for the following three applications: the estimation of a drift, the optimal placement of limit orders, and the optimal execution of a large number of shares.
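The core idea behind a past-sign search can be illustrated on the first application mentioned above, drift estimation. The sketch below is an illustrative reconstruction, not the authors' exact PASS algorithm: it runs the stochastic fixed-point iteration x ← x − γ(x − y) and, when two consecutive increments share a sign (steady progress toward the fixed point), keeps the current, larger learning rate, while a sign flip (oscillation around the fixed point) advances to the next, smaller rate in a predefined sequence. The function and parameter names are hypothetical.

```python
import numpy as np

def pass_like_mean_estimation(samples, rates):
    """Sign-based learning-rate adaptation (illustrative sketch only).

    Estimates the mean of noisy samples via x <- x - rate * (x - y).
    The rate only advances along the predefined decreasing sequence
    `rates` when the update increment changes sign, i.e. when the
    iterate starts oscillating around the fixed point.
    """
    x = 0.0
    k = 0            # index into the predefined rate sequence
    prev_inc = 0.0   # previous update increment
    for y in samples:
        inc = -rates[k] * (x - y)  # stochastic update increment
        x += inc
        if inc * prev_inc < 0 and k < len(rates) - 1:
            k += 1   # sign flip: move to the next, smaller rate
        prev_inc = inc
    return x

# Hypothetical usage: recover a drift of 1.5 from noisy observations.
rng = np.random.default_rng(0)
samples = 1.5 + rng.normal(0.0, 1.0, size=5000)
rates = [1.0 / (n + 1) ** 0.6 for n in range(1000)]  # predefined sequence
est = pass_like_mean_estimation(samples, rates)
```

Keeping the larger rate while increments point in one direction is what lets the error decrease faster than under a fixed, purely iteration-indexed schedule: the rate is only reduced once the noise, rather than the bias, dominates the updates.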
Subject
Applied Mathematics, Economics and Econometrics, Social Sciences (miscellaneous), Finance, Accounting
Cited by
1 article.