1. Deep residual learning for image recognition;he;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),2016
2. Top-down neural attention by excitation backprop;zhang;Proceedings of the European Conference on Computer Vision (ECCV),2016
3. BERT: Pre-training of deep bidirectional transformers for language understanding;devlin;Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL),2019
4. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks;lu;Advances in Neural Information Processing Systems,2019
5. End-to-end object detection with transformers;carion;Proceedings of the European Conference on Computer Vision (ECCV),2020