1. A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), 6000–6010, Curran Associates Inc., (Red Hook, NY, USA) (2017).
2. J. Devlin, M.-W. Chang, K. Lee, et al., “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Eds., 4171–4186, Association for Computational Linguistics, (Minneapolis, Minnesota) (2019).
3. A. Radford, K. Narasimhan, T. Salimans, et al., “Improving language understanding by generative pre-training,” OpenAI Technical Report (2018).
4. OpenAI, “GPT-4 technical report,” arXiv preprint arXiv:2303.08774 (2023).
5. Y. Wang, Z. Yu, Z. Zeng, et al., “PandaLM: An automatic evaluation benchmark for LLM instruction tuning optimization,” (2024).