1. Attention is all you need;Vaswani;IEEE Ind. Appl. Mag.,2002
2. Transformer-based online speech recognition with decoder-end adaptive computation steps;Li,2021
3. Neural speech synthesis with transformer network;Li,2019
4. A. Dosovitskiy et al., “An image is worth 16×16 words: transformers for image recognition at scale,” arXiv:2010.11929v2, 2021.
5. A. Kolesnikov et al., “Big Transfer (BiT): general visual representation learning,” arXiv:1912.11370v3, vol. 12350 LNCS, pp. 491–507, 2020, 10.1007/978-3-030-58558-7_29.