Affiliation:
1. The Johns Hopkins University, USA
Abstract
Speech interfaces, such as personal assistants and screen readers, read image captions to users. Typically, however, only one caption is available per image, which may not be adequate for all situations (e.g., browsing large quantities of images). Long captions provide a deeper understanding of an image but require more time to listen to, whereas shorter captions may not allow for such thorough comprehension yet have the advantage of being faster to consume. We explore how to effectively collect both thumbnail captions—succinct image descriptions meant to be consumed quickly—and comprehensive captions—which allow individuals to understand visual content in greater detail. We consider text-based instructions and time-constrained methods to collect descriptions at these two levels of detail and find that a time-constrained method is the most effective for collecting thumbnail captions while preserving caption accuracy. Additionally, we verify that caption authors using this time-constrained method are still able to focus on the most important regions of an image by tracking their eye gaze. We evaluate our collected captions along human-rated axes—correctness, fluency, amount of detail, and mentions of important concepts—and discuss the potential for model-based metrics to perform large-scale automatic evaluations in the future.
Funder
Malone Center for Engineering in Healthcare
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Human-Computer Interaction
Cited by
1 article.