1. Iz Beltagy, Matthew E Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020).
2. Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. 2019. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509 (2019).
3. Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
4. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
5. Emil Julius Gumbel. 1954. Statistical theory of extreme values and some practical applications: a series of lectures. Vol. 33. US Government Printing Office.