1. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling;zhou;ArXiv Preprint,2016
2. Improved semantic representations from tree-structured long short-term memory networks
3. BERT: Pre-training of deep bidirectional transformers for language understanding;devlin;ArXiv Preprint,2018
4. Attention is all you need;vaswani;Advances in neural information processing systems,2017
5. A structured self-attentive sentence embedding;lin;ArXiv Preprint,2017