1. TALL: Temporal activity localization via language query;J Gao;Proceedings of the IEEE/CVF International Conference on Computer Vision,2017
2. Localizing moments in video with natural language;L Hendricks;Proceedings of the IEEE/CVF International Conference on Computer Vision,2017
3. Where does it exist: Spatio-temporal video grounding for multi-form sentences;Z Zhang;Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,2020
4. Human-centric spatio-temporal video grounding with visual transformers;Z Tang;IEEE Transactions on Circuits and Systems for Video Technology,2021
5. Hierarchical attention based spatial-temporal graph-to-sequence learning for grounded video description;K Shen;Proceedings of the International Joint Conference on Artificial Intelligence,2020