1. Bottom-up and top-down attention for image captioning and visual question answering;P Anderson;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),2018
2. UNITER: universal image-text representation learning;Y Chen;Proceedings of the 16th European Conference on Computer Vision (ECCV),2020
3. Dual encoding for zero-example video retrieval;J Dong;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),2019
4. VSE++: Improving visual-semantic embeddings with hard negatives;F Faghri;Proceedings of the British Machine Vision Conference (BMVC),2018