1. Training language models to follow instructions with human feedback;Ouyang;arXiv preprint,2022
2. Augmenting reinforcement learning with human feedback;Knox;ICML 2011 Workshop on New Developments in Imitation Learning (July 2011),2011
3. Proximal policy optimization algorithms;Schulman;arXiv preprint,2017