1. Ashish Vaswani et al. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, 5998–6008. Retrieved from http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
2. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. Retrieved from http://arxiv.org/abs/1810.04805.
3. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems. MIT Press, 91–99.
4. Yi Sun, Ding Liang, Xiaogang Wang, and Xiaoou Tang. 2015. DeepID3: Face recognition with very deep neural networks. Retrieved from http://arxiv.org/abs/1502.00873.
5. Barret Zoph and Quoc V. Le. 2016. Neural architecture search with reinforcement learning. Retrieved from http://arxiv.org/abs/1611.01578.