L-STAP: Learned Spatio-Temporal Adaptive Pooling for Video Captioning

Published: 2019
Container-title: Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery (AI4TV '19)

Authors:

Danny Francis¹, Benoit Huet¹

Affiliation:

1. EURECOM, Biot, France

Funders:

ANR

European Union

Publisher:

ACM Press

References (40 articles):

1. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 265–283.

2. Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6077–6086.

3. Joao Carreira and Andrew Zisserman. 2017. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6299–6308.

4. David L. Chen and William B. Dolan. 2011. Collecting highly parallel data for paraphrase evaluation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, 190–200.

5. Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).

Cited by 3 articles:

1. Att-BiL-SL: Attention-Based Bi-LSTM and Sequential LSTM for Describing Video in the Textual Formation; Applied Sciences; 2021-12-29

2. AI4TV 2019; Proceedings of the 27th ACM International Conference on Multimedia; 2019-10-15

3. Image and Video Captioning Using Deep Architectures; Multi-faceted Deep Learning; 2012-02-24