1. Devlin J, Chang M W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019, 4171–4186
2. Radford A, Narasimhan K, Salimans T, Sutskever I. Improving language understanding by generative pre-training. 2018. See https://cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf website
3. Yang Z, Dai Z, Yang Y, Carbonell J G, Salakhutdinov R, Le Q. XLNet: generalized autoregressive pretraining for language understanding. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 517
4. Gou J, Yu B, Maybank S J, Tao D. Knowledge distillation: a survey. International Journal of Computer Vision, 2021, 129(6): 1789–1819
5. Laskaridis S, Kouris A, Lane N D. Adaptive inference through early-exit networks: design, challenges and directions. In: Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning. 2021, 1–6