1. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E.H., Le, Q.V., and Zhou, D. (2022, November 28–December 9). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA.
2. Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T.B., Song, D., and Erlingsson, U. (2021, August 11–13). Extracting Training Data from Large Language Models. Proceedings of the USENIX Security Symposium, Virtual Event.
3. Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., and Catanzaro, B. (2019). Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv.
4. Tirumala, K., Markosyan, A., Zettlemoyer, L., and Aghajanyan, A. (2022). Memorization without overfitting: Analyzing the training dynamics of large language models. Adv. Neural Inf. Process. Syst.
5. Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2021). LoRA: Low-rank adaptation of large language models. arXiv.