1. Towards Neural Machine Translation with Latent Tree Attention
2. Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.
3. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
4. Caiwen Ding, Siyu Liao, Yanzhi Wang, Zhe Li, et al. 2017. CirCNN: Accelerating and compressing deep neural networks using block-circulant weight matrices. In MICRO. ACM, 395--408.
5. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8), 1735--1780.