1. Rishabh Agarwal, Nino Vieillard, Piotr Stanczyk, Sabela Ramos, Matthieu Geist, and Olivier Bachem. 2023. GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models. arXiv preprint arXiv:2306.13649 (2023).
2. Zeyuan Allen-Zhu and Yuanzhi Li. 2023. Towards Understanding Ensemble Knowledge Distillation and Self-Distillation in Deep Learning. In ICLR.
3. Stéphane Boucheron, Olivier Bousquet, and Gábor Lugosi. 2005. Theory of classification: A survey of some recent advances. ESAIM: probability and statistics, Vol. 9 (2005), 323--375.
4. Model compression
5. Distilling Knowledge via Knowledge Review