1. Hedi Ben-younes, Remi Cadene, Matthieu Cord, and Nicolas Thome. 2017. MUTAN: Multimodal Tucker Fusion for Visual Question Answering. In Proceedings of the IEEE International Conference on Computer Vision. 2631--2639.
2. End-to-End Object Detection with Transformers
3. Localizing Natural Language in Videos
4. Z Chen, L Ma, W Luo, and KKY Wong. 2019b. Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video. In Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.
5. Visual Grounding via Accumulated Attention