1. UniFormerV2: Spatiotemporal learning by arming image ViTs with video UniFormer;Li,2022
2. An image is worth 16x16 words: Transformers for image recognition at scale;Dosovitskiy,2020
3. Measurement Reliability and Validity
4. A comprehensive study of deep video action recognition;Zhu,2020
5. On Space-Time Interest Points