1. BERT loses patience: Fast and robust inference with early exit;Zhou;Adv. Neural Inf. Process. Syst.,2020
2. Attention is not all you need: Pure attention loses rank doubly exponentially with depth;Dong;Proc. Int. Conf. Mach. Learn.,2021
3. Improve vision transformers training by suppressing over-smoothing;Gong;arXiv preprint,2021
4. Anti-oversmoothing in deep vision transformers via the Fourier domain analysis: From theory to practice;Wang;Proc. Int. Conf. Learn. Represent.,2022
5. Shallow-deep networks: Understanding and mitigating network overthinking;Kaya;Proc. Int. Conf. Mach. Learn.,2019