1. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS ’17, Long Beach, California, USA, 2017, pp. 6000–6010.
2. J. Kaplan, S. McCandlish, T. Henighan, T.B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, D. Amodei, Scaling laws for neural language models, 2020, arXiv:2001.08361.
3. T.B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, in: Proceedings of the 34th International Conference on Neural Information Processing Systems, Vol. 159, NIPS ’20, Vancouver, BC, Canada, 2020, pp. 1877–1901.
4. T. Le Scao, A. Fan, C. Akiki, et al., BLOOM: A 176B-parameter open-access multilingual language model, 2022, arXiv:2211.05100.
5. J. Achiam, S. Adler, S. Agarwal, et al., GPT-4 technical report, 2023, arXiv:2303.08774.