1. VQA: Visual Question Answering
2. DenseCap: Fully Convolutional Localization Networks for Dense Captioning
3. Every Picture Tells a Story: Generating Sentences from Images
4. Y. Yang, C. Teo, H. Daumé III, and Y. Aloimonos, “Corpus-guided sentence generation of natural images,” In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 444-454, 2011.
5. S. Li, G. Kulkarni, T. Berg, A. Berg, and Y. Choi, “Composing simple image descriptions using web-scale n-grams,” In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pp. 220-228, 2011.