Image-Captioning Model Compression-Reference-Cited by-同舟云学术

Image-Captioning Model Compression

Published:2022-02-04 Issue:3 Volume:12 Page:1638
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Atliha Viktar,Šešok Dmitrij

Abstract

Image captioning is a very important task, which is on the edge between natural language processing (NLP) and computer vision (CV). The current quality of the captioning models allows them to be used for practical tasks, but they require both large computational power and considerable storage space. Despite the practical importance of the image-captioning problem, only a few papers have investigated model size compression in order to prepare them for use on mobile devices. Furthermore, these works usually only investigate decoder compression in a typical encoder–decoder architecture, while the encoder traditionally occupies most of the space. We applied the most efficient model-compression techniques such as architectural changes, pruning and quantization to several state-of-the-art image-captioning architectures. As a result, all of these models were compressed by no less than 91% in terms of memory (including encoder), but lost no more than 2% and 4.5% in metrics such as CIDEr and SPICE, respectively. At the same time, the best model showed results of 127.4 CIDEr and 21.4 SPICE, with a size equal to only 34.8 MB, which sets a strong baseline for compression problems for image-captioning models, and could be used for practical applications.

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Link

https://www.mdpi.com/2076-3417/12/3/1638/pdf

Reference61 articles.

1. Staniūtė, R., and Šešok, D. (2019). A Systematic Literature Review on Image Captioning. Appl. Sci., 9.

2. Zafar, B., Ashraf, R., Ali, N., Iqbal, M.K., Sajid, M., Dar, S.H., and Ratyal, N.I. (2018). A novel discriminating and relative global spatial image representation with applications in CBIR. Appl. Sci., 8.

3. Region-based image retrieval in the compressed domain using shape-adaptive DCT;Multimed. Tools Appl.,2016

4. Vinyals, O., Toshev, A., Bengio, S., and Erhan, D. (2015, January 7–12). Show and tell: A neural image caption generator. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.

5. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., and Bengio, Y. (2015, January 6–11). Show, attend and tell: Neural image caption generation with visual attention. Proceedings of the International Conference on Machine Learning, Lille, France.

Cited by 4 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Lightweight Image Captioning Model Based on Knowledge Distillation;MultiMedia Modeling;2024

2. Automatic image captioning in Thai for house defect using a deep learning-based approach;Advances in Computational Intelligence;2023-12-29

3. Image Understanding by Captioning with Differentiable Architecture Search;Proceedings of the 30th ACM International Conference on Multimedia;2022-10-10

4. Text Augmentation for Compressed Image Captioning Models;2022 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream);2022-04-21