1. Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, and Llion Jones. 2018. Character-level language modeling with deeper self-attention. arXiv preprint arXiv:1808.04444 (2018).
2. Alexei Baevski and Michael Auli. 2018. Adaptive input representations for neural language modeling. arXiv preprint arXiv:1809.10853 (2018).
3. Thierry Bertin-Mahieux, Daniel PW Ellis, Brian Whitman, and Paul Lamere. 2011. The million song dataset. (2011).
4. Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020).
5. Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. 2019. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509 (2019).