1. Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020).
2. Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to answer open-domain questions. arXiv preprint arXiv:1704.00051 (2017).
3. Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. 2019. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509 (2019).
4. Misha Denil, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, and Nando de Freitas. 2014. Modelling, visualising and summarising documents with a single convolutional neural network. arXiv preprint arXiv:1406.3830 (2014).
5. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).