1. K. Alizadeh Vahid, et al., Butterfly transform: An efficient FFT based neural architecture design, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
2. H. Fan, et al., Adaptable butterfly accelerator for attention-based NNs via hardware and algorithm co-design, in: IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022.
3. G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, in: NIPS 2014 Deep Learning Workshop, 2014.
4. L. Zhang, et al., Be your own teacher: Improve the performance of convolutional neural networks via self distillation, in: IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
5. M. Dereziński, et al., Newton-LESS: Sparsification without trade-offs for the sketched Newton update, in: Advances in Neural Information Processing Systems (NeurIPS), 2021.