1. Wang, Y., Zhong, W., Li, L., Mi, F., Zeng, X., Huang, W., Shang, L., Jiang, X., and Liu, Q. (2023). Aligning Large Language Models with Human: A Survey. arXiv.
2. Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst.
3. Glaese, A., McAleese, N., Trębacz, M., Aslanides, J., Firoiu, V., Ewalds, T., Rauh, M., Weidinger, L., Chadwick, M., and Thacker, P. (2022). Improving alignment of dialogue agents via targeted human judgements. arXiv.
4. Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., and McKinnon, C. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv.
5. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2018, April 30–May 3). Towards Deep Learning Models Resistant to Adversarial Attacks. Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada.