1. Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, and Sonal Gupta. 2021. Muppet: Massive Multi-task Representations with Pre-Finetuning. arXiv: 2101.11038 [cs.CL]
2. Loubna Ben Allal, Raymond Li, Denis Kocetkov, et al. 2023. SantaCoder: don't reach for the stars! arXiv: 2301.03988 [cs.SE]
3. Rohan Anil, Andrew M. Dai, Orhan Firat, et al. 2023. PaLM 2 Technical Report. arXiv: 2305.10403 [cs.CL]
4. Anthropic. 2023. Model Card and Evaluations for Claude Models. https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf
5. Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, and Donald Metzler. 2022. ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning. arXiv: 2111.10952 [cs.CL]