1. Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.: GLUE: a multi-task benchmark and analysis platform for natural language understanding. In: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 353–355 (2018). https://aclanthology.org/W18-5446
2. Wang, B.: Mesh-Transformer-JAX: model-parallel implementation of transformer language model with JAX (2021)
3. Wang, B., Komatsuzaki, A.: GPT-J-6B: a 6 billion parameter autoregressive language model (2021)
4. Brown, T., et al.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems 33 (NeurIPS 2020), pp. 1877–1901 (2020)
5. Tay, Y., et al.: UL2: unifying language learning paradigms. In: The Eleventh International Conference on Learning Representations (ICLR) (2023)