1. BERT. (2023a). BERT: How to handle long documents. Salt Data Labs. Retrieved August 14, 2023, from https://www.saltdatalabs.com/blog/bert-how-to-handle-long-documents?rq=bert
2. BERT. (2023b). BERT (language model) - Wikipedia. Retrieved August 14, 2023, from https://en.wikipedia.org/wiki/BERT_(language_model)
3. BERT. (2023c). BERT large model (uncased). Hugging Face. Retrieved August 14, 2023, from https://huggingface.co/bert-large-uncased
4. Borealis AI. (n.d.). Tutorial #14: Transformers I: Introduction. Borealis AI. Retrieved August 14, 2023, from https://www.borealisai.com/research-blogs/tutorial-14-transformers-i-introduction/#Multiple_heads
5. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., et al. (2020). Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.), Advances in Neural Information Processing Systems (Vol. 33, pp. 1877–1901). Curran Associates, Inc.