1. Callan, J., Hoy, M., Yoo, C., Zhao, L.: ClueWeb09 data set (2009)
2. Dinh, L., Krueger, D., Bengio, Y.: NICE: non-linear independent components estimation. arXiv preprint arXiv:1410.8516 (2014)
3. Ethayarajh, K.: How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. arXiv preprint arXiv:1909.00512 (2019)
4. Friedman, J.H.: Exploratory projection pursuit. J. Am. Stat. Assoc. 82(397), 249–266 (1987)
5. Gao, J., He, D., Tan, X., Qin, T., Wang, L., Liu, T.Y.: Representation degeneration problem in training natural language generation models. arXiv preprint arXiv:1907.12009 (2019)