Publisher: Springer Nature Switzerland