1. Training data-efficient image transformers & distillation through attention;Touvron;ICML,2021
2. ViTCoD: Vision transformer acceleration via dedicated algorithm and accelerator co-design;You;ArXiv,2022
3. ViTALiTy: Unifying low-rank and sparse approximation for vision transformer acceleration with a linear Taylor attention;Dass;ArXiv,2022
4. Gaussian error linear units (GELUs);Hendrycks;ArXiv,2016