1. Ordonez, Vicente and Kulkarni, Girish and Berg, Tamara L. (2011) “Im2text: Describing images using 1 million captioned photographs.” Advances in Neural Information Processing Systems 1143–1151.
2. Li, Siming and Kulkarni, Girish and Berg, Tamara L. and Berg, Alexander C. and Choi, Yejin. (2011) “Composing simple image descriptions using web-scale n-grams.” Proceedings of the Fifteenth Conference on Computational Natural Language Learning 220–228.
3. Yang, Yezhou and Teo, Ching Lik and Daumé III, Hal and Aloimonos, Yiannis. (2011) “Corpus-guided sentence generation of natural images.” Proceedings of the Conference on Empirical Methods in Natural Language Processing 444–454.
4. Mikolov, Tomas and Chen, Kai and Corrado, Greg S. and Dean, Jeffrey. (2013) “Efficient estimation of word representations in vector space.” arXiv:1301.3781.
5. Karpathy, Andrej and Fei-Fei, Li. (2015) “Deep visual-semantic alignments for generating image descriptions.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 3128–3137.