1. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. Preprint. Posted online October 11, 2018. arXiv:1810.04805v2. https://doi.org/10.48550/arXiv.1810.04805
2. Brown TB, Mann B, Ryder N, et al. Language models are few-shot learners. Preprint. Posted online May 28, 2020. arXiv:2005.14165v4. https://doi.org/10.48550/arXiv.2005.14165
3. Chowdhery A, Narang S, Devlin J, et al. PaLM: scaling language modeling with pathways. Preprint. Posted online April 5, 2022. arXiv:2204.02311v5. https://doi.org/10.48550/arXiv.2204.02311
4. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: transformers for image recognition at scale. Preprint. Posted online October 22, 2020. arXiv:2010.11929v2. https://doi.org/10.48550/arXiv.2010.11929
5. Kirillov A, Mintun E, Ravi N, et al. Segment anything. Preprint. Posted online April 5, 2023. arXiv:2304.02643v1. https://doi.org/10.48550/arXiv.2304.02643