1. ETC: Encoding long and structured inputs in transformers;Ainslie,2020
2. Character-level language modeling with deeper self-attention;Al-Rfou,2019
3. ViViT: A video vision transformer;Arnab,2021
4. Layer normalization;Ba,2016
5. ReZero is all you need: Fast convergence at large depth;Bachlechner,2020