1. X3D: Expanding architectures for efficient video recognition;feichtenhofer;CVPR,2020
2. What do position embeddings learn? An empirical study of pre-trained language model positional encoding;wang;ArXiv Preprint,2022
3. Multiscale Vision Transformers
4. UniDual: A unified model for image and video understanding;wang;ArXiv Preprint,2019
5. SlowFast Networks for Video Recognition