1. Ron Banner, Yury Nahshan, and Daniel Soudry. 2019. Post training 4-bit quantization of convolutional networks for rapid-deployment. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems (NeurIPS’19), Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett (Eds.). 7948–7956.
2. Davis W. Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, and John V. Guttag. 2020. What is the state of neural network pruning? In Proceedings of the Machine Learning and Systems (MLSys’20), Inderjit S. Dhillon, Dimitris S. Papailiopoulos, and Vivienne Sze (Eds.).
3. Matthieu Courbariaux and Yoshua Bengio. 2016. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. arXiv:1602.02830. Retrieved from http://arxiv.org/abs/1602.02830.
4. AutoAugment: Learning Augmentation Strategies From Data
5. Runtime Deep Model Multiplexing for Reduced Latency and Energy Consumption Inference