1. Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D.M., Jordan, M.I.: Matching words and pictures. Journal of Machine Learning Research 3, 1107–1135 (2003)
2. Feng, S., Manmatha, R., Lavrenko, V.: Multiple Bernoulli relevance models for image and video annotation. In: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1002–1009 (2004)
3. Kulkarni, G., Premraj, V., Dhar, S., Li, S., Choi, Y., Berg, A.C., Berg, T.L.: Baby talk: Understanding and generating simple image descriptions. In: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1601–1608 (2011)
4. Siddiquie, B., Gupta, A.: Beyond active noun tagging: Modeling contextual interactions for multi-class active learning. In: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
5. Quattoni, A., Collins, M., Darrell, T.: Learning visual representations using images with captions. In: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (2007)