1. Large scale distributed neural network training through online distillation;Anil,2018
2. Conditional computation in neural networks for faster models;Bengio,2015
3. Estimating or propagating gradients through stochastic neurons for conditional computation;Bengio,2013
4. Curriculum learning
5. Bit-Mixer: Mixed-precision networks with runtime bit-width selection