1. Airoldi, E.M. and Bischof, J.M. (2016). Improving and evaluating topic models and other models of text. J. Am. Stat. Assoc., 111, 1381–1403.
2. Arora, S., Li, Y., Liang, Y., Ma, T. and Risteski, A. (2016). A latent variable model approach to PMI-based word embeddings. Trans. Assoc. Comput. Linguist., 4, 385–399. https://doi.org/10.1162/tacl_a_00106
3. Bengio, Y., Ducharme, R., Vincent, P. and Janvin, C. (2003). A neural probabilistic language model. J. Mach. Learn. Res., 3, 1137–1155.
4. Brown, P.F., Della Pietra, S.A., Della Pietra, V.J., Lai, J.C. and Mercer, R.L. (1992a). An estimate of an upper bound for the entropy of English. Comput. Linguist., 18, 31–40.
5. Brown, P.F., Della Pietra, V.J., deSouza, P.V., Lai, J.C. and Mercer, R.L. (1992b). Class-based n-gram models of natural language. Comput. Linguist., 18, 467–479.