Asymptotics of Reinforcement Learning with Neural Networks

Published: 2021-11-16
ISSN: 1946-5238
Journal: Stochastic Systems
Language: en

Author:

Sirignano Justin¹,², Spiliopoulos Konstantinos³

Affiliation:

1. Mathematics, University of Oxford, Oxfordshire, Oxford OX1 2JD, United Kingdom;

2. Department of Industrial & Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801;

3. Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215

Abstract

We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution that is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on independent and identically distributed data with stochastic gradient descent under the widely used Xavier initialization.
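The setting described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It trains a single-hidden-layer network with a semi-gradient Q-learning update on a tiny two-state, two-action problem. Everything specific here is an assumption made for illustration: the toy dynamics and rewards, the tanh activation, the standard normal initialization, the 1/sqrt(N) output normalization standing in for the Xavier-style scaling, the alpha/N learning rate, and the names `q_value`, `one_hot`, and `train`.

```python
import math
import random


def q_value(c, w, feat):
    """Q(x, a) = N^(-1/2) * sum_i c_i * tanh(w_i . feat) (assumed form)."""
    total = 0.0
    for ci, wi in zip(c, w):
        total += ci * math.tanh(sum(wj * fj for wj, fj in zip(wi, feat)))
    return total / math.sqrt(len(c))


def one_hot(state, action, n_actions=2, dim=4):
    """Encode a (state, action) pair as a one-hot feature vector."""
    feat = [0.0] * dim
    feat[state * n_actions + action] = 1.0
    return feat


def train(n_hidden=50, steps=2000, gamma=0.9, alpha=0.5, seed=0):
    """Semi-gradient Q-learning on a hypothetical 2-state, 2-action problem."""
    rng = random.Random(seed)
    # Hidden-unit parameters drawn i.i.d. from a standard normal (assumption).
    c = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    w = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(n_hidden)]
    lr = alpha / n_hidden              # learning rate shrinks with width
    scale = 1.0 / math.sqrt(n_hidden)  # output normalization
    state = 0
    for _ in range(steps):
        action = rng.randrange(2)      # uniform random exploration
        next_state = action            # toy dynamics: the action picks the next state
        reward = 1.0 if next_state == 1 else 0.0
        feat = one_hot(state, action)
        # Temporal-difference target; no gradient is taken through it.
        target = reward + gamma * max(
            q_value(c, w, one_hot(next_state, b)) for b in range(2)
        )
        td_error = target - q_value(c, w, feat)
        for i in range(n_hidden):
            h = math.tanh(sum(wj * fj for wj, fj in zip(w[i], feat)))
            grad_c = h * scale
            grad_pre = c[i] * (1.0 - h * h) * scale  # uses c[i] before its update
            c[i] += lr * td_error * grad_c
            for j in range(4):
                w[i][j] += lr * td_error * grad_pre * feat[j]
        state = next_state
    return {
        (s, a): q_value(c, w, one_hot(s, a)) for s in range(2) for a in range(2)
    }


if __name__ == "__main__":
    for key, value in sorted(train().items()):
        print(key, round(value, 3))
```

With these scalings each update moves the network output by an amount of order alpha/N, so N training steps correspond to roughly one unit of time in a continuous-time limit. The sketch only shows the form of the update and makes no claim about how quickly it converges.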

Publisher

Institute for Operations Research and the Management Sciences (INFORMS)

Subject

Management Science and Operations Research; Statistics, Probability and Uncertainty; Modelling and Simulation; Statistics and Probability
