1. Gaussian Error Linear Units (GELUs);hendrycks;arXiv preprint arXiv:1606.08415,2016
2. Long Short-Term Memory
3. An Image is Worth 16x16 Words, What is a Video Worth?;sharir;arXiv preprint arXiv:2103.13111,2021
4. Stand-Alone Self-Attention in Vision Models;ramachandran;arXiv preprint arXiv:1906.05190,2019
5. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale;dosovitskiy;International Conference on Learning Representations,2020