1. Prasanna S. Deep learning deployment with NVIDIA TensorRT[J]. NVIDIA Deep Learning Institute, New York, 2019.
2. Vanholder H. Efficient inference with TensorRT[C]//GPU Technology Conference. 2016.
3. Han S, Mao H, Dally W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding[J]. arXiv preprint arXiv:1510.00149, 2015.
4. Wen T, Lai S, Qian X. Preparing lessons: Improve knowledge distillation with better supervision[J]. Neurocomputing, 2021.
5. Krishnamoorthi R. Quantizing deep convolutional networks for efficient inference: A whitepaper[J]. arXiv preprint arXiv:1806.08342, 2018.