1. Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. 2022. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. arXiv:2204.06745 [cs.CL]
2. Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
3. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs.CL]
4. Philipp Ennen, Po-Chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yen-Chen Wu, Yin-Hsiang Liao, Chin-Tung Lin, Da-Shan Shiu, and Wei-Yun Ma. 2023. Extending the Pre-Training of BLOOM for Improved Support of Traditional Chinese: Models, Methods and Results. arXiv:2303.04715 [cs.CL]
5. Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the 34th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 70), Doina Precup and Yee Whye Teh (Eds.). PMLR, 1126–1135. https://proceedings.mlr.press/v70/finn17a.html