EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference

Authors:

Thierry Tambe (1), Coleman Hooper (2), Lillian Pentecost (1), Tianyu Jia (2), En-Yu Yang (1), Marco Donato (3), Victor Sanh (4), Paul Whatmough (5), Alexander M. Rush (6), David Brooks (1), Gu-Yeon Wei (2)

Affiliations:

1. Harvard University, United States of America

2. Harvard University

3. Tufts University

4. Hugging Face

5. Arm Research / Harvard, United States of America

6. Cornell University

Funders:

Semiconductor Research Corporation JUMP ADA

National Science Foundation

Publisher:

ACM

References (88 articles):

1. Catapult High-Level Synthesis. https://www.mentor.com/hls-lp/catapult-high-level-synthesis. Accessed Oct. 1, 2020.

2. Jetson TX2 Module. https://developer.nvidia.com/embedded/jetson-tx2. Accessed Oct. 1, 2020.

3. T. Ajayi, S. Kamineni, Y. Cherivirala, M. Fayazi, K. Kwon, M. Saligane, S. Gupta, C. Chen, D. Sylvester, D. Dreslinski, B. Calhoun, and D. Wentzloff. 2020. An Open-source Framework for Autonomous SoC Design with Analog Block Generation. In 2020 IFIP/IEEE 28th International Conference on Very Large Scale Integration (VLSI-SoC).

Cited by (38 articles):

1. ToEx: Accelerating Generation Stage of Transformer-Based Language Models via Token-Adaptive Early Exit. IEEE Transactions on Computers, September 2024.

2. MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), June 29, 2024.

3. LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), June 29, 2024.

4. Latency-aware service placement for GenAI at the edge. In Disruptive Technologies in Information Sciences VIII, June 6, 2024.

5. EINS: Edge-Cloud Deep Model Inference with Network-Efficiency Schedule in Serverless. In 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), May 8, 2024.
