Affiliation:
1. Microsoft Research Asia, China
Abstract
Although the computer vision and machine learning communities have studied it for years, image annotation is still far from practical. In this chapter, the authors propose a modeless approach to image annotation that investigates how effective a data-driven method can be, and suggest annotating an uncaptioned image by mining its search results. The authors collected 2.4 million images with their surrounding texts from a few photo forum Web sites as the database supporting this data-driven approach. The entire process contains three steps: (1) a search process that discovers visually and semantically similar search results; (2) a mining process that discovers salient terms from the textual descriptions of the search results; and (3) an annotation rejection process that filters noisy terms yielded by step 2. To ensure real-time annotation, two key techniques are leveraged: the high-dimensional visual features of images are mapped into hash codes, and the approach is implemented as a distributed system in which the search and mining processes are provided as Web services. In a typical case, the entire process finishes in less than 1 second. Since no training dataset is required, the proposed approach enables annotation with an unlimited vocabulary, and is highly scalable and robust to outliers. Experimental results on real Web images show the effectiveness and efficiency of the proposed algorithm.
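The three steps and the hash-code mapping described above can be illustrated with a minimal sketch. The abstract does not specify the hashing scheme, the mining method, or the rejection rule, so everything below is an assumption made for illustration: sign-of-random-projection hashing stands in for the feature-to-hash mapping, a Hamming-distance scan stands in for the search step, document frequency of terms stands in for mining, and a support threshold stands in for annotation rejection. All function and parameter names are hypothetical.

```python
import random
from collections import Counter


def make_projections(dim, bits, seed=0):
    """Draw one random Gaussian direction per hash bit (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def hash_code(feature, projections):
    """Map a feature vector to an integer whose bits are projection signs."""
    code = 0
    for row in projections:
        dot = sum(w * x for w, x in zip(row, feature))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")


def search(query_code, database, max_distance):
    """Step 1: return surrounding texts of images with nearby hash codes."""
    return [text for code, text in database
            if hamming(query_code, code) <= max_distance]


def mine_terms(texts):
    """Step 2: count in how many retrieved texts each term appears."""
    counts = Counter()
    for text in texts:
        counts.update(set(text.lower().split()))
    return counts


def annotate(feature, database, projections, max_distance=2, min_support=0.5):
    """Steps 1-3: search, mine, then reject terms with low support."""
    texts = search(hash_code(feature, projections), database, max_distance)
    if not texts:
        return []
    counts = mine_terms(texts)
    # Step 3: keep only terms found in at least min_support of the results.
    return sorted(t for t, c in counts.items() if c / len(texts) >= min_support)


if __name__ == "__main__":
    proj = make_projections(dim=3, bits=16, seed=1)
    # Toy database of (hash code, surrounding text) pairs.
    db = [
        (hash_code([2.0, 4.0, 6.0], proj), "sunset beach"),
        (hash_code([1.0, 2.0, 3.0], proj), "sunset sky"),
        (hash_code([-1.0, -2.0, -3.0], proj), "cat indoor"),
    ]
    print(annotate([1.0, 2.0, 3.0], db, proj, max_distance=0, min_support=0.6))
```

The linear scan in `search` is only for readability. With a database of millions of images, the hash codes would more plausibly serve as keys in an index, which is presumably what makes the sub-second response time reported above feasible.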