Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

Published:2016-10-22 Issue:1 Volume:123 Page:74-93
ISSN:0920-5691
Container-title:International Journal of Computer Vision
Language:en
Short-container-title:Int J Comput Vis

Author:

Plummer Bryan A., Wang Liwei, Cervantes Chris M., Caicedo Juan C., Hockenmaier Julia, Lazebnik Svetlana

Funder

National Science Foundation

Alfred P. Sloan Foundation

Xerox Foundation

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Software

Link

http://link.springer.com/content/pdf/10.1007/s11263-016-0965-7.pdf

References: 64 articles.

1. Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual question answering. In ICCV.

2. Chen, X., & Zitnick, C. L. (2015). Mind's eye: A recurrent visual representation for image caption generation. In CVPR.

3. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR.

4. Devlin, J., Cheng, H., Fang, H., Gupta, S., Deng, L., He, X., Zweig, G., & Mitchell, M. (2015). Language models for image captioning: The quirks and what works. In ACL.

5. Dodge, J., Goyal, A., Han, X., Mensch, A., Mitchell, M., Stratos, K., Yamaguchi, K., Choi, Y., Daumé III, H., Berg, A. C., & Berg, T. L. (2012). Detecting visual text. In NAACL.

Cited by 124 articles.

1. Attention-based image captioning for structural health assessment of apartment buildings;Automation in Construction;2024-11

2. A novel binary quantizer for variational autoencoder-based image compressor;International Journal of Computers and Applications;2024-07-23

3. Decoupling Classification and Localization of CLIP;2024 IEEE International Conference on Multimedia and Expo Workshops (ICMEW);2024-07-15

4. Improving Accuracy and Generalizability via Multi-Modal Large Language Models Collaboration;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

5. Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models;Proceedings of the 2024 International Conference on Multimedia Retrieval;2024-05-30