The Curious Layperson: Fine-Grained Image Recognition Without Expert Labels

Published: 2023-09-13; Volume: 132; Issue: 2; Pages: 537–554
ISSN: 0920-5691
Container-title: International Journal of Computer Vision
Language: en
Short-container-title: Int J Comput Vis

Authors:

Choudhury Subhabrata, Laina Iro, Rupprecht Christian, Vedaldi Andrea

Abstract

Most of us are not experts in specific fields, such as ornithology. Nonetheless, we do have general image and language understanding capabilities that we use to match what we see to expert resources. This allows us to expand our knowledge and perform novel tasks without ad-hoc external supervision. In contrast, machines have a much harder time consulting expert-curated knowledge bases unless trained specifically with that knowledge in mind. Thus, in this paper we consider a new problem: fine-grained image recognition without expert annotations, which we address by leveraging the vast knowledge available in web encyclopedias. First, we learn a model to describe the visual appearance of objects using non-expert image descriptions. We then train a fine-grained textual similarity model that matches image descriptions with documents on a sentence-level basis. We evaluate the method on two datasets (CUB-200 and Oxford-102 Flowers) and compare with several strong baselines and the state of the art in cross-modal retrieval. Code is available at: https://github.com/subhc/clever.
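The abstract describes matching image descriptions against encyclopedia documents on a sentence-level basis. The sketch below illustrates only that scoring idea, under stated assumptions: the paper trains a learned textual similarity model, whereas here a plain bag-of-words cosine similarity stands in for it; the aggregation rule (average, over description sentences, of the best-matching document sentence) and all function names (`bow`, `cosine`, `document_score`, `classify`) are hypothetical and are not taken from the authors' code.

```python
import math
from collections import Counter

PUNCT = ".,;:!?"

def bow(sentence: str) -> Counter:
    # Hypothetical stand-in for a learned sentence encoder: lowercased bag of words.
    tokens = (w.strip(PUNCT).lower() for w in sentence.split())
    return Counter(t for t in tokens if t)

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def document_score(description: list[str], document: list[str]) -> float:
    # Assumed aggregation: for each description sentence, take its best-matching
    # document sentence, then average those maxima over the description.
    if not description or not document:
        return 0.0
    doc_vecs = [bow(s) for s in document]
    best = [max(cosine(bow(d), v) for v in doc_vecs) for d in description]
    return sum(best) / len(best)

def classify(description: list[str], documents: dict[str, list[str]]) -> str:
    # Predict the class whose (sentence-split) document best matches the description.
    return max(documents, key=lambda c: document_score(description, documents[c]))

# Toy usage with made-up sentences (not data from CUB-200 or Oxford-102 Flowers).
docs = {
    "cardinal": ["The male is bright red with a black face mask.",
                 "It has a thick orange bill."],
    "blue jay": ["The plumage is blue on top and white below.",
                 "It has a crest on its head."],
}
desc = ["This bird is red all over.", "It has a short thick orange bill."]
print(classify(desc, docs))  # cardinal
```

Scoring at the sentence level lets a single distinctive visual detail (here, the bill) match the one encyclopedia sentence that mentions it, rather than being diluted across the whole document.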

Funder

Facebook

European Research Council

Engineering and Physical Sciences Research Council

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Software

Link

https://link.springer.com/content/pdf/10.1007/s11263-023-01885-9.pdf

References (102 articles)

1. Agirre, E., Cer, D., Diab, M., & Gonzalez-Agirre, A. (2012). SemEval-2012 task 6: A pilot on semantic textual similarity. In SEM 2012, pp. 385–393.

2. Agirre, E., Cer, D., Diab, M., Gonzalez-Agirre, A., & Guo, W. (2013). SEM 2013 shared task: Semantic textual similarity. In SEM 2013, pp. 32–43.

3. Akata, Z., Perronnin, F., Harchaoui, Z., & Schmid, C. (2015). Label-embedding for image classification. TPAMI, 38(7), 1425–1438.

4. Andrew, G., Arora, R., Bilmes, J., & Livescu, K. (2013). Deep canonical correlation analysis. In ICML 2013, pp. 1247–1255. PMLR.

5. Asano, Y. M., Rupprecht, C., & Vedaldi, A. (2020). Self-labelling via simultaneous clustering and representation learning. In ICLR 2020.

Cited by 3 articles.

1. Pattern-Expandable Image Copy Detection; International Journal of Computer Vision; 2024-06-22

2. Tomato ripeness detection based on image recognition; International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024); 2024-06-13

3. Waffling around for Performance: Visual Classification with Random Words and Broad Concepts; 2023 IEEE/CVF International Conference on Computer Vision (ICCV); 2023-10-01