1. Trained ternary quantization;zhu;International Conference on Learning Representations (ICLR),2017
2. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients;zhou,2016
3. Mixed precision quantization of convnets via differentiable neural architecture search;wu,2018
4. HuggingFace’s Transformers: State-of-the-art natural language processing;wolf,2019
5. HAQ: Hardware-Aware Automated Quantization With Mixed Precision