1. Antol (2015). VQA: Visual question answering.
2. Chen (2017). Movie fill in the blank with adaptive temporal attention and description update.
3. Cooijmans (2016). Recurrent batch normalization. CoRR.
4. Gao (2017). Video captioning with attention-based LSTM and semantic consistency. IEEE Trans. Multimed.
5. Guadarrama (2013). YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition.