SeeSaw: Interactive Ad-hoc Search Over Image Databases

Author:

Moll, Oscar (1); Favela, Manuel (2); Madden, Samuel (1); Gadepally, Vijay (3); Cafarella, Michael (1)

Affiliation:

1. MIT CSAIL, Cambridge, MA, USA

2. MIT, Cambridge, MA, USA

3. MIT Lincoln Laboratory, Cambridge, MA, USA

Abstract

As image datasets become ubiquitous, the problem of ad-hoc searches over image data is increasingly important. Many high-level data tasks in machine learning, such as constructing datasets for training and testing object detectors, involve finding ad-hoc objects or scenes within large image datasets as a key sub-problem. New foundational visual-semantic embeddings trained on massive web datasets, such as Contrastive Language-Image Pre-Training (CLIP), can help users start searches on their own data, but we find there is a long tail of queries where these models fall short in practice. SeeSaw is a system for interactive ad-hoc searches on image datasets that integrates state-of-the-art embeddings like CLIP with user feedback in the form of box annotations to help users quickly locate images of interest in their data, even in the long tail of harder queries. One key challenge for SeeSaw is that, in practice, many sensible approaches to incorporating feedback into future results, including state-of-the-art active-learning algorithms, can worsen results compared to introducing no feedback, partly due to CLIP's high average performance. Therefore, SeeSaw includes several algorithms that empirically result in larger and more consistent improvements. We compare SeeSaw's accuracy both to using CLIP alone and to a state-of-the-art active-learning baseline, and find that SeeSaw consistently helps improve results for users across four datasets and more than a thousand queries. SeeSaw increases Average Precision (AP) on search tasks by an average of 0.08 on a wide benchmark (from a base of 0.72), and by 0.27 on a subset of more difficult queries where CLIP alone performs poorly.
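The abstract reports its results as Average Precision (AP). As a reading aid, the sketch below computes AP for a single ranked result list using the common textbook definition: the mean of precision-at-rank taken at each rank where a relevant image appears. The function name `average_precision` and the toy relevance lists are illustrative assumptions, and this record does not say which AP variant the paper uses. The sketch also assumes that every relevant image appears somewhere in the ranked list.

```python
def average_precision(relevance):
    """Average Precision for one ranked list.

    `relevance` holds 0/1 flags in ranked order (1 = relevant result).
    Assumption: all relevant items appear in the list, so the number of
    hits found equals the total number of relevant items.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant results seen so far / rank.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical rankings: relevant images near the top score higher.
good_ranking = [1, 0, 1, 0]   # (1/1 + 2/3) / 2
poor_ranking = [0, 0, 1, 1]   # (1/3 + 2/4) / 2
print(average_precision(good_ranking))
print(average_precision(poor_ranking))
```

Under this definition, a feedback method that moves relevant images toward the top of the ranking raises AP, which is the kind of improvement the abstract quantifies.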

Funder

United States Air Force Research Laboratory and the Department of the Air Force Artificial Intelligence Accelerator

Publisher

Association for Computing Machinery (ACM)


Cited by 1 article.

1. Survey of vector database management systems; The VLDB Journal; 2024-07-15
