1. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin;NAACL HLT 2019 - 2019 Conf North Am Chapter Assoc Comput Linguist Hum Lang Technol - Proc Conf,2019
2. GPT-3: Its Nature, Scope, Limits, and Consequences
3. Distributed representations of words and phrases and their compositionality;Mikolov;Adv Neural Inf Process Syst,2013
4. BERT rediscovers the classical NLP pipeline;Tenney;ACL 2019 - 57th Annu Meet Assoc Comput Linguist Proc Conf,2020
5. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models;Turc,2019