1. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 2019, pp. 4171–4186.
2. Li, Generalized self-supervised contrastive learning with Bregman divergence for image recognition, Pattern Recognit. Lett. (2023).
3. Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, F.J. Huang, A tutorial on energy-based learning, in: Predicting Structured Data, MIT Press, 2006.
4. J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. U.S.A. 79 (8) (1982) 2554–2558.
5. D. Krotov, J.J. Hopfield, Dense associative memory for pattern recognition, in: Advances in Neural Information Processing Systems 29, 2016.