Funder
National Natural Science Foundation of China
References (40 articles)
1. Label-attention transformer with geometrically coherent objects for image captioning;Dubey;Inform. Sci.,2023
2. Towards local visual modeling for image captioning;Ma;Pattern Recognit.,2023
3. C. Jing, Y. Jia, Y. Wu, X. Liu, Q. Wu, Maintaining reasoning consistency in compositional visual question answering, in: Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
4. Encoder-decoder cycle for visual question answering based on perception-action cycle;Mohamud;Pattern Recognit.,2023
5. L. Yang, Y. Xu, C. Yuan, W. Liu, B. Li, W. Hu, Improving visual grounding with visual-linguistic verification and iterative reasoning, in: Proceedings of Conference on Computer Vision and Pattern Recognition, 2020.