1. P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. 2018. Bottom-up and top-down attention for image captioning and VQA. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6077--6086.
2. Localizing Moments in Video with Natural Language
3. S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. 2015. VQA: Visual question answering. In IEEE International Conference on Computer Vision. 2425--2433.
4. Yalong Bai, Jianlong Fu, Tiejun Zhao, and Tao Mei. 2018. Deep Attention Neural Tensor Network for Visual Question Answering. In European Conference on Computer Vision. 20--35.
5. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval