1. DevNet: a deep event network for multimedia event detection and evidence recounting;Gan,2015
2. End-to-end learning of motion representation for video understanding;Fan,2018
3. Weakly supervised dense event captioning in videos;Duan;arXiv preprint arXiv:1812.03849,2018
4. Recognizing fine-grained and composite activities using hand-centric features and script data;Rohrbach;Int. J. Comput. Vis. (IJCV),2016
5. On space-time interest points;Laptev;Int. J. Comput. Vis. (IJCV),2005