1. ViViT: A Video Vision Transformer
2. Is Space-Time Attention All You Need for Video Understanding?; Bertasius
3. TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition; Bishay, 2019
4. Few-Shot Video Classification via Temporal Alignment
5. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset