1. ViViT: A Video Vision Transformer
2. Ensemble knowledge distillation for learning improved and efficient networks;Asif,2019
3. SoundNet: Learning sound representations from unlabeled video;Aytar;Advances in neural information processing systems,2016
4. A comparison of TV-L1 optical flow solvers on GPU;Bao;GTC Posters,2014
5. Is space-time attention all you need for video understanding?;Bertasius