1. A. Mnih and Y.W. Teh, “A fast and simple algorithm for training neural probabilistic language models,” Proc. 29th Int. Conf. Mach. Learn. (ICML 2012) 2, 1751–1758 (2012).
2. A. Mnih and G. Hinton, “A scalable hierarchical distributed language model,” Adv. Neural Inf. Process. Syst. 21, 1–8 (2008).
3. G. Salton, A. Wong, and C.S. Yang, “A vector space model for automatic indexing,” Commun. ACM 18(11), 613–620 (1975).
4. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Adv. Neural Inf. Process. Syst. 30, 5999–6009 (2017).
5. M.Z. Alaya, S. Bussy, S. Gaïffas, and A. Guilloux, “Binarsity: A penalization for one-hot encoded features in linear supervised learning,” J. Mach. Learn. Res. 20, 1–33 (2019).