1. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
2. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
3. Hasan Genc et al. 2021. Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration. In 2021 58th ACM/IEEE Design Automation Conference (DAC).
4. Gene H Golub and Charles F Van Loan. 2013. Matrix computations. JHU Press.
5. Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. 2021. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021).