1. Deep residual learning for image recognition;He
2. LoRA: Low-rank adaptation of large language models;Hu
3. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin
4. Language models are few-shot learners;Brown
5. Robust speech recognition via large-scale weak supervision;Radford