BagFormer: Better cross-modal retrieval via bag-wise interaction
-
Published:2024-10
Issue:
Volume:136
Page:108912
-
ISSN:0952-1976
-
Container-title:Engineering Applications of Artificial Intelligence
-
language:en
-
Short-container-title:Engineering Applications of Artificial Intelligence
Author:
Hou Haowen, Yan Xiaopeng, Zhang Yigeng