Abstract
Current deep learning-based image captioning systems have been shown to store practical knowledge in their parameters and to achieve competitive performance on public datasets. Nevertheless, their ability to access and precisely manipulate this acquired knowledge remains limited. Moreover, providing evidence for their decisions and updating their memorized information are also important yet underexplored. Towards this goal, we introduce a memory-augmented method that extends an existing image captioning model by incorporating additional explicit knowledge from a memory bank. Relevant knowledge is recalled according to the similarity distance in the embedding space of the history context, and the memory bank can be conveniently constructed from any matched image-text set, e.g., the previous training data. When this non-parametric memory-augmented method is incorporated into various captioning baselines, the performance of the resulting captioners improves consistently on the evaluation benchmark. More encouragingly, extensive experiments demonstrate that our approach can efficiently adapt to larger training datasets simply by transferring the memory bank, without any additional training.
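The retrieval step described in the abstract can be illustrated with a small sketch. The abstract does not say how memory keys, values, or scores are defined, so the following assumes a kNN-LM-style design: each memory entry pairs a hypothetical embedding of a caption prefix (the history context) with the word that followed it; the k nearest entries by squared L2 distance are turned into a word distribution with a softmax over negative distances; and that distribution is mixed with the captioner's own prediction using a hypothetical weight `lam`. The names `MemoryBank` and `interpolate` and the toy 2-D embeddings are illustrative only, not the authors' implementation.

```python
import math
from collections import defaultdict


class MemoryBank:
    """Hypothetical key-value memory: keys are context embeddings, values are next words."""

    def __init__(self):
        self.keys = []    # embeddings of caption prefixes (history context), assumed
        self.values = []  # word that followed each prefix in the source captions

    def add(self, key, word):
        """Store one (context embedding, next word) pair; no training involved."""
        self.keys.append(list(key))
        self.values.append(word)

    def retrieve(self, query, k=4, temperature=1.0):
        """Return a word distribution built from the k nearest stored contexts."""
        # Squared L2 distance from the query context to every stored key (linear scan).
        dists = [sum((q - x) ** 2 for q, x in zip(query, key)) for key in self.keys]
        nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
        # Softmax over negative distances, shifted by the smallest distance so that
        # exp() does not underflow to zero when every neighbour is far away.
        d_min = dists[nearest[0]] if nearest else 0.0
        weights = [math.exp(-(dists[i] - d_min) / temperature) for i in nearest]
        total = sum(weights)
        probs = defaultdict(float)
        for i, w in zip(nearest, weights):
            probs[self.values[i]] += w / total
        return dict(probs)


def interpolate(model_probs, memory_probs, lam=0.5):
    """Mix the captioner's next-word distribution with the memory distribution."""
    words = set(model_probs) | set(memory_probs)
    return {
        w: (1 - lam) * model_probs.get(w, 0.0) + lam * memory_probs.get(w, 0.0)
        for w in words
    }


if __name__ == "__main__":
    bank = MemoryBank()
    # Toy 2-D embeddings standing in for encoded caption prefixes.
    bank.add([0.0, 0.0], "dog")
    bank.add([0.1, 0.0], "dog")
    bank.add([5.0, 5.0], "cat")

    memory = bank.retrieve([0.05, 0.0], k=2)
    print(memory)  # {'dog': 1.0}

    mixed = interpolate({"dog": 0.2, "cat": 0.8}, memory, lam=0.5)
    print(round(mixed["dog"], 3), round(mixed["cat"], 3))  # 0.6 0.4
```

Because the bank in this sketch is just a list of entries, adapting to a larger image-text set amounts to calling `add` for the new pairs while the model weights stay fixed, which mirrors the abstract's claim of transferring the memory bank without retraining. A practical system would likely replace the linear scan with an approximate nearest-neighbour index.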
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
7 articles.
1. Dual-adaptive interactive transformer with textual and visual context for image captioning; Expert Systems with Applications; 2024-06
2. Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation; 2023 IEEE/CVF International Conference on Computer Vision (ICCV); 2023-10-01
3. End-to-End Non-Autoregressive Image Captioning; ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2023-06-04
4. An integration of Pseudo Anomalies and Memory Augmented Autoencoder for Video Anomaly Detection; The 11th International Symposium on Information and Communication Technology; 2022-12
5. Efficient Modeling of Future Context for Image Captioning; Proceedings of the 30th ACM International Conference on Multimedia; 2022-10-10