1. Banerjee, A., Dhillon, I., Ghosh, J., Sra, S.: Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res. 6, 1345–1382 (2005)
2. Charniak, E.: Statistical Language Learning. MIT Press, Cambridge (1996)
3. Clark, K., Khandelwal, U., Levy, O., Manning, C.D.: What does BERT look at? An analysis of BERT’s attention. arXiv:1906.04341 [cs] (June 2019)
4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 [cs] (2019)
5. Glushchenko, A.: Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence) (2019)