1. Arthur, D., & Vassilvitskii, S. (2006). K-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms.
2. Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.
3. Baevski, A., Hsu, W.-N., Xu, Q., Babu, A., Gu, J., & Auli, M. (2022). Data2vec: A general framework for self-supervised learning in speech, vision and language. In Proceedings of international conference on machine learning (pp. 1298–1312). PMLR.
4. Bao, H., Dong, L., Piao, S., & Wei, F. (2021). BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254.
5. Barbu, A., Mayo, D., Alverio, J., Luo, W., Wang, C., Gutfreund, D., Tenenbaum, J., & Katz, B. (2019). ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models. In Proceedings of advances in neural information processing systems (vol. 32).