1. Jo Kristian Bergum. 2021. Pretrained Transformer Language Models for Search - part 4. https://blog.vespa.ai/pretrained-transformer-language-models-for-search-part-4/
2. Kaj Bostrom and Greg Durrett. 2020. Byte Pair Encoding is Suboptimal for Language Model Pretraining. In Findings of EMNLP.
3. Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In Proceedings of ICLR.
4. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT. 4171--4186.
5. Thibault Formal, Benjamin Piwowarski, and Stéphane Clinchant. 2021. A White Box Analysis of ColBERT. In Proceedings of ECIR.