Leveraging Human Attention in Novel Object Captioning

Published: 2021-08
Container-title: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

Author:

Chen Xianyu¹, Jiang Ming¹, Zhao Qi¹

Affiliation:

1. Department of Computer Science and Engineering, University of Minnesota

Abstract

Image captioning models depend on training with paired image-text corpora, which poses various challenges in describing images containing novel objects absent from the training data. While previous novel object captioning methods rely on external image taggers or object detectors to describe novel objects, we present the Attention-based Novel Object Captioner (ANOC) that complements novel object captioners with human attention features that characterize generally important information independent of tasks. It introduces a gating mechanism that adaptively incorporates human attention with self-learned machine attention, with a Constrained Self-Critical Sequence Training method to address the exposure bias while maintaining constraints of novel object descriptions. Extensive experiments conducted on the nocaps and Held-Out COCO datasets demonstrate that our method considerably outperforms the state-of-the-art novel object captioners. Our source code is available at https://github.com/chenxy99/ANOC.
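The gating idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that): it assumes a single scalar gate, a softmax over machine attention scores, and a non-negative human saliency vector over the same image regions. The function and parameter names (`gated_attention`, `machine_scores`, `human_saliency`, `gate_logit`) are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_attention(machine_scores, human_saliency, gate_logit):
    """Blend machine attention with human attention via a scalar gate.

    Assumption (illustrative only): the gate is one learned scalar logit.
    A real model would likely compute the gate from its hidden state.
    """
    # Machine attention: normalize raw region scores into a distribution.
    machine_attn = softmax(np.asarray(machine_scores, dtype=float))
    # Human attention: normalize saliency values into a distribution.
    saliency = np.asarray(human_saliency, dtype=float)
    human_attn = saliency / saliency.sum()
    # Gate in (0, 1): near 1 favors human attention, near 0 favors machine.
    g = sigmoid(gate_logit)
    return g * human_attn + (1.0 - g) * machine_attn


if __name__ == "__main__":
    scores = np.array([2.0, 0.5, -1.0, 0.0])   # hypothetical region scores
    saliency = np.array([0.1, 0.6, 0.2, 0.1])  # hypothetical saliency values
    print(gated_attention(scores, saliency, gate_logit=0.0))
```

Because the output is a convex combination of two distributions, it remains a valid attention distribution for any gate value.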

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 7 articles.

1. Every Problem, Every Step, All in Focus: Learning to Solve Vision-Language Problems With Integrated Attention; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2024-07

2. Auxiliary feature extractor and dual attention-based image captioning; Signal, Image and Video Processing; 2024-02-19

3. RCA-NOC: Relative Contrastive Alignment for Novel Object Captioning; 2023 IEEE/CVF International Conference on Computer Vision (ICCV); 2023-10-01

4. Novel Object Captioning with Semantic Match from External Knowledge; Applied Sciences; 2023-07-04

5. Hybrid attention network for image captioning; Displays; 2022-07