1. Han et al. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. ICLR, 2016.
2. Jacob et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. CVPR, 2018.
3. Courbariaux et al. Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1. arXiv:1602.02830, 2016.
4. Li et al. Ternary Weight Networks. arXiv:1605.04711, 2016.
5. Zhou et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. arXiv:1606.06160, 2016.