1. Multiscale vision transformers;fan;ICCV,2021
2. Is an object-centric video representation beneficial for transfer?;zhang;ACCV,2022
3. Savi++: Towards end-to-end object-centric learning from real-world videos;elsayed;NeurIPS,2022
4. A-ViT: Adaptive Tokens for Efficient Vision Transformer
5. Masked autoencoders as spatiotemporal learners;feichtenhofer;NeurIPS,2022