1. Bang Y, Cahyawijaya S, Lee N et al. A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. In Proc. the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, Nov. 2023, pp.675–718. DOI: https://doi.org/10.18653/v1/2023.ijcnlp-main.45.
2. Zhao W X, Zhou K, Li J Y et al. A survey of large language models. arXiv: 2303.18223, 2023. https://arxiv.org/abs/2303.18223, May 2024.
3. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I. Attention is all you need. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.6000–6010.
4. Kaplan J, McCandlish S, Henighan T, Brown T B, Chess B, Child R, Gray S, Radford A, Wu J, Amodei D. Scaling laws for neural language models. arXiv: 2001.08361, 2020. https://arxiv.org/abs/2001.08361, May 2024.
5. Xue F Z, Fu Y, Zhou W C S, Zheng Z W, You Y. To repeat or not to repeat: Insights from scaling LLM under token-crisis. arXiv: 2305.13230, 2023. https://arxiv.org/abs/2305.13230, May 2024.