1. Anil, R., Pereyra, G., Passos, A., Ormándi, R., Dahl, G. E., & Hinton, G. E. (2018). Large scale distributed neural network training through online distillation. In 6th international conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30 – May 3, 2018, conference track proceedings.
2. Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
3. Chen, G., Choi, W., Yu, X., Han, T. X., & Chandraker, M. (2017). Learning efficient object detection models with knowledge distillation. In Advances in neural information processing systems 30: Annual conference on neural information processing systems 2017, December 4–9, 2017, Long Beach, CA, USA (pp. 742–751).
4. Chen, D., Mei, J., Zhang, Y., Wang, C., Wang, Z., Feng, Y., & Chen, C. (2021). Cross-layer distillation with semantic calibration. In Thirty-fifth AAAI conference on artificial intelligence (pp. 7028–7036).
5. Cheng, H., Koc, L., Harmsen, J., Shaked, T., Chandra, T., Aradhye, H., Anderson, G., Corrado, G., Chai, W., Ispir, M., Anil, R., Haque, Z., Hong, L., Jain, V., Liu, X., & Shah, H. (2016). Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems (pp. 7–10).