Authors:
Wu Yong, Tian Jinyu, Liu HuiJun, Tang Yuanyan
Funder:
National Natural Science Foundation of China
References (67 articles):
1. Wang et al. Watch, listen, and describe: globally and locally aligned cross-modal attentions for video captioning. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics. 2018.
2. Gella, Spandana; Lewis, Mike; Rohrbach, Marcus. A dataset for telling the stories of social media videos. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. pp. 968-974. URL https://aclanthology.org/D18-1117.
3. Krishna, Ranjay, et al. Dense-captioning events in videos. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. pp. 706-715.
4. Zhou, Luowei; Xu, Chenliang; Corso, Jason. Towards automatic learning of procedures from web instructional videos. In: Proceedings of the AAAI Conference on Artificial Intelligence 32(1). 2018. doi: 10.1609/aaai.v32i1.12342.
5. Regneri et al. Grounding action descriptions in videos. Transactions of the Association for Computational Linguistics. 2013.