Deep Learning for Video Captioning: A Review

Published: 2019-08
Container-title: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

Author:

Chen Shaoxiang¹, Yao Ting², Jiang Yu-Gang¹,³

Affiliation:

1. Shanghai Key Lab of Intelligent Info. Processing, School of Computer Science, Fudan University, China

2. JD AI Research, China

3. Jilian Technology Group (Video++), Shanghai, China

Abstract

Deep learning has recently achieved great success in solving specific artificial intelligence problems, and substantial progress has been made in Computer Vision (CV) and Natural Language Processing (NLP). As a connection between the two worlds of vision and language, video captioning is the task of producing a natural-language utterance (usually a sentence) that describes the visual content of a video. The task naturally decomposes into two sub-tasks. One is to encode a video through a thorough understanding of its content and learn a visual representation. The other is caption generation, which decodes the learned representation into a sentence, word by word. In this survey, we first formulate the problem of video captioning, then review state-of-the-art methods categorized by their emphasis on vision or language, followed by a summary of standard datasets and representative approaches. Finally, we highlight the challenges in this task that are not yet fully understood and present future research directions.
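The two-stage decomposition described in the abstract (encode the video into a representation, then decode a caption from it word by word) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each frame is already a fixed-length feature vector, that the encoder is simple temporal mean pooling, and that the decoder is a toy linear scorer with hand-set weights run with greedy search. The names `encode_video`, `greedy_decode`, `WORD_EMB` and `OUT_WEIGHTS` are hypothetical and do not come from the survey; practical systems learn these components, for example with CNN frame features and recurrent or Transformer decoders.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy tables (hand-set, NOT learned). Dimensions loosely
# correspond to the words "a", "man", "runs", "<eos>".
# WORD_EMB maps the previously emitted word to an embedding vector.
WORD_EMB: Dict[str, Vector] = {
    "<bos>": [1.0, 0.0, 0.0, 0.0],
    "a": [0.0, 1.0, 0.0, 0.0],
    "man": [0.0, 0.0, 1.0, 0.0],
    "runs": [0.0, 0.0, 0.0, 1.0],
}
# OUT_WEIGHTS maps each candidate next word to an output weight vector.
OUT_WEIGHTS: Dict[str, Vector] = {
    "a": [1.0, 0.0, 0.0, 0.0],
    "man": [0.0, 1.0, 0.0, 0.0],
    "runs": [0.0, 0.0, 1.0, 0.0],
    "<eos>": [0.0, 0.0, 0.0, 1.0],
}


def encode_video(frame_features: Sequence[Vector]) -> Vector:
    """Sub-task 1 (assumed encoder): average per-frame features over time."""
    if not frame_features:
        raise ValueError("a video needs at least one frame")
    dim = len(frame_features[0])
    total = [0.0] * dim
    for frame in frame_features:
        if len(frame) != dim:
            raise ValueError("all frames must share one feature dimension")
        for i, value in enumerate(frame):
            total[i] += value
    return [value / len(frame_features) for value in total]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def greedy_decode(
    video_vec: Vector,
    word_emb: Dict[str, Vector],
    out_weights: Dict[str, Vector],
    max_len: int = 10,
) -> List[str]:
    """Sub-task 2 (assumed decoder): emit one word per step, greedily."""
    caption: List[str] = []
    prev = "<bos>"
    for _ in range(max_len):
        # Toy "state": the video representation plus the previous word's embedding.
        state = [v + e for v, e in zip(video_vec, word_emb[prev])]
        # Score every candidate word and keep only the highest-scoring one.
        best = max(out_weights, key=lambda w: dot(out_weights[w], state))
        if best == "<eos>":
            break  # the end-of-sentence token terminates the caption
        caption.append(best)
        prev = best
    return caption


if __name__ == "__main__":
    frames = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    video_vec = encode_video(frames)  # [0.5, 0.5, 0.5, 0.0]
    print(" ".join(greedy_decode(video_vec, WORD_EMB, OUT_WEIGHTS)))  # prints "a man runs"
```

Greedy search keeps only the single best word at each step; beam search, which keeps several partial captions in parallel, is a common alternative but is omitted here for brevity.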

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 16 articles.

1. Towards Human-Interactive Controllable Video Captioning with Efficient Modeling; Mathematics; 2024-06-30

2. Local feature-based video captioning with multiple classifier and CARU-attention; IET Image Processing; 2024-04-17

3. Multi-level video captioning method based on semantic space; Multimedia Tools and Applications; 2024-02-08

4. Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods; Engineering Reports; 2023-10-12

5. Collaborative three-stream transformers for video captioning; Computer Vision and Image Understanding; 2023-10