1. Baevski, A., Hsu, W.N., Xu, Q., Babu, A., Gu, J., Auli, M.: Data2vec: a general framework for self-supervised learning in speech, vision and language. In: International Conference on Machine Learning, pp. 1298–1312. PMLR (2022)
2. Bau, D., Zhou, B., Khosla, A., Oliva, A., Torralba, A.: Network dissection: quantifying interpretability of deep visual representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6541–6549 (2017)
3. Brown, T., et al.: Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020)
4. Christian, B.: The Alignment Problem: How Can Machines Learn Human Values? Atlantic Books (2021)
5. Crabbé, J., van der Schaar, M.: Concept activation regions: a generalized framework for concept-based explanations. Adv. Neural Inf. Process. Syst. 35, 2590–2607 (2022)