1. Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, and Bryan Russell. 2017. Localizing moments in video with natural language. In Proceedings of the ICCV.
2. Junwen Chen, Wentao Bao, and Yu Kong. 2020. Activity-driven weakly supervised spatio-temporal grounding from untrimmed videos. In Proceedings of the ACM MM.
3. Jie Chen, Zhiheng Li, Jiebo Luo, and Chenliang Xu. 2020. Learning a weakly supervised video actor-action segmentation model with a wise selection. In Proceedings of the CVPR.
4. Kan Chen, Jiyang Gao, and Ram Nevatia. 2018. Knowledge aided consistency for weakly supervised phrase grounding. In Proceedings of the CVPR.
5. Peihao Chen et al. 2019. Relation attention for temporal action localization. IEEE Trans. Multimedia.