1. Sequence to sequence - video to text;Venugopalan,2015
2. Video paragraph captioning using hierarchical recurrent neural networks;Yu,2016
3. Video description generation incorporating spatio-temporal features and a soft-attention mechanism;Yao,2015
4. Attention-based multimodal fusion for video description;Hori,2017
5. Spatio-temporal attention models for grounded video captioning;Zanfir,2016