1. Tanti M, Gatt A, Camilleri KP (2017) What is the role of recurrent neural networks (RNNs) in an image caption generator? In: Alonso JM, Bugarín A, Reiter E (eds) Proceedings of the 10th international conference on natural language generation, INLG 2017, Santiago de Compostela, Spain, September 4-7, 2017. Association for Computational Linguistics, pp 51–60. https://doi.org/10.18653/v1/W17-3506
2. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
3. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2021) An image is worth 16x16 words: transformers for image recognition at scale. In: 9th International conference on learning representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https://openreview.net/forum?id=YicbFdNTTy
4. Tay Y, Dehghani M, Bahri D, Metzler D (2020) Efficient transformers: a survey. ACM Comput Surv (CSUR)
5. Kitaev N, Kaiser Ł, Levskaya A (2020) Reformer: the efficient transformer. In: 8th International conference on learning representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https://openreview.net/forum?id=rkgNKkHtvB