1. A Divide-and-Conquer Approach to the Summarization of Long Documents
2. Hierarchical Learning for Generation with Long Source Sequences;Rohde;Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
3. Big Bird: Transformers for Longer Sequences;Zaheer;Google Research
4. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding;Devlin;Google AI Language