Image Difference Captioning with Pre-training and Contrastive Learning

Published: 2022-06-28 Issue: 3 Volume: 36 Page: 3108-3116
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Yao Linli, Wang Weiying, Jin Qin

Abstract

The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images in natural language. The major challenges of this task lie in two aspects: 1) fine-grained visual differences, which require learning a stronger association between vision and language, and 2) the high cost of manual annotation, which leads to limited supervised data. To address these challenges, we propose a new modeling framework following the pre-training and fine-tuning paradigm. Specifically, we design three self-supervised tasks and contrastive learning strategies to align visual differences and text descriptions at a fine-grained level. Moreover, we propose a data expansion strategy that uses extra cross-task supervision, such as data for fine-grained image classification, to alleviate the shortage of supervised IDC data. Extensive experiments on two IDC benchmark datasets, CLEVR-Change and Birds-to-Words, demonstrate the effectiveness of the proposed framework. The code and models will be released at https://github.com/yaolinli/IDC.
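As a rough illustration of the kind of contrastive objective the abstract refers to, the sketch below computes a generic symmetric InfoNCE loss over a batch of paired "visual difference" and "text description" feature vectors. It is not the authors' implementation: the abstract does not specify the loss, so the function names, the temperature value of 0.07, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity_matrix(diff_feats, text_feats):
    """Entry [i][j] is the cosine similarity of difference i and text j."""
    d = [l2_normalize(v) for v in diff_feats]
    t = [l2_normalize(v) for v in text_feats]
    return [[sum(a * b for a, b in zip(dv, tv)) for tv in t] for dv in d]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(diff_feats, text_feats, temperature=0.07):
    """Generic symmetric InfoNCE loss (assumed form, not taken from the paper).

    Matched pairs share an index; every other item in the batch acts as a
    negative. The loss averages the difference-to-text and text-to-difference
    directions.
    """
    sim = cosine_similarity_matrix(diff_feats, text_feats)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # transpose
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy, hand-made features: two image-pair differences and their captions.
diffs = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_info_nce(diffs, aligned_texts)
loss_swapped = symmetric_info_nce(diffs, swapped_texts)
```

With these toy inputs the aligned pairing yields a lower loss than the swapped pairing, which is the behavior a contrastive alignment objective is meant to reward.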

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 13 articles.

1. M3ixup: A multi-modal data augmentation approach for image captioning; Pattern Recognition; 2025-02

2. Multi-Grained Representation Aggregating Transformer with Gating Cycle for Change Captioning; ACM Transactions on Multimedia Computing, Communications, and Applications; 2024-09-12

3. SMART: Syntax-Calibrated Multi-Aspect Relation Transformer for Change Captioning; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2024-07

4. Expert Insight-Enhanced Follow-Up Chest X-ray Summary Generation; Lecture Notes in Computer Science; 2024

5. CLIP-Driven Distinctive Interactive Transformer for Image Difference Captioning; 2023 5th International Conference on Frontiers Technology of Information and Computer (ICFTIC); 2023-11-17