1. Flamingo: a visual language model for few-shot learning; Advances in Neural Information Processing Systems, 2022
2. Bachlechner, T., Majumder, B. P., Mao, H., Cottrell, G., and McAuley, J. (2021). ReZero is all you need: Fast convergence at large depth. In Uncertainty in Artificial Intelligence, pages 1352–1361. PMLR.
3. MolGPT: molecular generation using a transformer-decoder model; Journal of Chemical Information and Modeling, 2021
4. BEiT: BERT pre-training of image transformers; arXiv preprint arXiv, 2021
5. Investigating expressiveness of transformer in spectral domain for graphs; arXiv preprint arXiv, 2022