1. Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, and Cordelia Schmid. 2021. ViViT: A Video Vision Transformer. In International Conference on Computer Vision. IEEE, 6816--6826.
2. Gedas Bertasius, Heng Wang, and Lorenzo Torresani. 2021. Is Space-Time Attention All You Need for Video Understanding? In International Conference on Machine Learning. PMLR, 813--824.
3. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
4. Audio-driven Talking Video Frame Restoration
5. Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey A. Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetic, Dustin Tran, Thomas Kipf, Mario Lucic, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, and Neil Houlsby. 2023. Scaling Vision Transformers to 22 Billion Parameters. CoRR, Vol. abs/2302.05442 (2023), 1--21.