Accelerated masked transformer for dense video captioning

Author:

Yu Zhou, Han Nanjia

Funder

National Natural Science Foundation of China

Publisher

Elsevier BV

Subject

Artificial Intelligence, Cognitive Neuroscience, Computer Science Applications

References: 30 articles.

1. J.L. Ba, J.R. Kiros, G.E. Hinton, Layer normalization, 2016. arXiv preprint arXiv:1607.06450.

2. ActivityNet: a large-scale video benchmark for human activity understanding;Caba Heilbron,2015

3. Temporal deformable convolutional encoder-decoder networks for video captioning;Chen,2019

4. Deep residual learning for image recognition;He;IEEE Conference on Computer Vision and Pattern Recognition (CVPR),2016

5. Bilinear attention networks;Kim,2018

Cited by 14 articles.

1. Effective Video Summarization by Extracting Parameter-Free Motion Attention;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-05-16

2. Video captioning – a survey;Multimedia Tools and Applications;2024-04-09

3. Dense Video Captioning Based on Memory Enhanced Attention and Guided Learning;2023 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML);2023-11-03