1. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
2. Linformer: Self-attention with linear complexity;Wang;arXiv preprint arXiv:2006.04989,2020
3. Temporal segment networks: Towards good practices for deep action recognition;Wang;ECCV,2016
4. MaX-DeepLab: End-to-end panoptic segmentation with mask transformers;Wang;arXiv preprint arXiv 2012,2020
5. End-to-end video instance segmentation with transformers;Wang;arXiv preprint arXiv:2011.14858,2020