1. SlowFast networks for video recognition;Feichtenhofer,2019
2. ViViT: A video vision transformer;Arnab,2021
3. Mask R-CNN;He,2017
4. Social scene understanding: End-to-end multi-person action localization and collective activity recognition;Bagautdinov,2017
5. A hierarchical deep temporal model for group activity recognition;Ibrahim,2016