1. Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, et al. 2019. Solving Rubik's Cube with a robot hand. arXiv:1910.07113.
2. Yoshua Bengio. 2007. Adaptive importance sampling to accelerate training of a neural probabilistic language model. IEEE Transactions on Neural Networks.
3. Yash Chandak, Georgios Theocharous, James E. Kostas, Scott M. Jordan, and Philip S. Thomas. 2019. Learning action representations for reinforcement learning. International Conference on Machine Learning, 1565–1582.
4. Top-K off-policy correction for a REINFORCE recommender system.
5. User response models to improve a REINFORCE recommender system.