1. End-to-end concept word detection for video captioning, retrieval, and question answering;Yu;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2017
2. A joint sequence fusion model for video question answering and retrieval;Yu;Proceedings of the European Conference on Computer Vision (ECCV),2018
3. MDMMT: Multidomain multimodal transformer for video retrieval;Dzabraev;Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,2021
4. Quo vadis, action recognition? A new model and the Kinetics dataset;Carreira;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2017
5. T-C3D: Temporal convolutional 3D network for real-time action recognition;Liu;Proceedings of the AAAI Conference on Artificial Intelligence,2018