1. Chen, N., Prasanna, V.K.: A bag-of-semantics model for image clustering. Vis. Comput. 29(11), 1221–1229 (2013)
2. Kulkarni, G., Premraj, V., Ordonez, V., Dhar, S., Li, S., Choi, Y., et al.: BabyTalk: understanding and generating simple image descriptions. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 35, pp. 1601–1608. IEEE Computer Society (2011)
3. Yao, B.Z., Yang, X., Lin, L., Lee, M.W., Zhu, S.C.: I2T: image parsing to text description. Proc. IEEE 98(8), 1485–1508 (2010)
4. Donahue, J., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2625–2634 (2015)
5. Guadarrama, S., Krishnamoorthy, N., Malkarnenkar, G., Venugopalan, S., Mooney, R., Darrell, T., et al.: YouTube2Text: recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In: IEEE International Conference on Computer Vision, pp. 2712–2719. IEEE (2014)