Author:
Feng Wenjun, Lin Dazhen, Cao Donglin
Publisher:
Springer Nature Singapore
References: 33 articles.
1. Carion, N.: Lecture Notes in Computer Science (2020)
2. Cornia, M., Stefanini, M., Baraldi, L., Cucchiara, R.: Meshed-memory transformer for image captioning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10578–10587 (2020)
3. Diao, H., Zhang, Y., Ma, L., Lu, H.: Similarity reasoning and filtration for image-text matching. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 1218–1226 (2021)
4. Faghri, F., Fleet, D.J., Kiros, J.R., Fidler, S.: VSE++: improving visual-semantic embeddings with hard negatives. arXiv preprint arXiv:1707.05612 (2017)
5. Feng, F., Zhang, J., He, X., Zhang, H., Chua, T.S.: Empowering language understanding with counterfactual reasoning. arXiv preprint arXiv:2106.03046 (2021)