1. Natural actor–critic algorithms
2. Doubly robust off-policy actor-critic: Convergence and optimality;xu;Proc ICML,2021
3. Finite-sample analysis of off-policy natural actor-critic with linear function approximation;chen;arXiv:2105.12540,2021
4. Policy gradient methods for reinforcement learning with function approximation;sutton;Advances in Neural Information Processing Systems,1999
5. A natural policy gradient;kakade;Advances in Neural Information Processing Systems,2001