1. Attention is all you need;Vaswani;Adv. Neural Inf. Process. Syst.,2017
2. Improving language understanding by generative pre-training;Radford;OpenAI Tech. Rep.,2018
3. Language models are few-shot learners;Brown;Adv. Neural Inf. Process. Syst.,2020
4. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity;Fedus;J. Mach. Learn. Res.,2022
5. Grandmaster level in StarCraft II using multi-agent reinforcement learning;Vinyals;Nature,2019