1. BERT: Pre-training of deep bidirectional transformers for language understanding;devlin;Proc of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL),2019
2. Unified vision-language pre-training for image captioning and vqa;luowei,2019
3. Dependency-Based Self-Attention for Transformer NMT
4. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks;lu;NeurIPS,2019
5. Learning to Parse and Translate Improves Neural Machine Translation