Funder
National Natural Science Foundation of China