1. Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd international conference on learning representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015
2. Bast H, Haussmann E (2015) More accurate question answering on Freebase. In: Bailey J, Moffat A, Aggarwal C C, de Rijke M, Kumar R, Murdock V, Sellis T K, Yu J X (eds), Proceedings of the 24th ACM international conference on information and knowledge management, CIKM 2015, Melbourne, VIC, Australia, October 19-23, 2015, pp 1431–1440. ACM
3. Bhojanapalli S, Yun C, Rawat A S, Reddi S J, Kumar S (2020) Low-rank bottleneck in multi-head attention models. In: Proceedings of the 37th international conference on machine learning, ICML 2020, virtual event, 13-18 July 2020, volume 119 of Proceedings of Machine Learning Research. PMLR, pp 864–873
4. Brown T B, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D M, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. CoRR, abs/2005.14165
5. Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805