1. Generating long sequences with sparse transformers;child;arXiv:1904.10509,2019
2. Longformer: The long-document transformer;beltagy;arXiv:2004.05150,2020
3. Image transformer;parmar;Proc Int Conf Mach Learn,2018
4. TVT: Transferable vision transformer for unsupervised domain adaptation;yang;arXiv:2108.05988,2021
5. Are wider nets better given the same number of parameters?;golubeva;arXiv:2010.14495,2020