EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference

Authors:

Thierry Tambe (1), Coleman Hooper (2), Lillian Pentecost (1), Tianyu Jia (2), En-Yu Yang (1), Marco Donato (3), Victor Sanh (4), Paul Whatmough (5), Alexander M. Rush (6), David Brooks (1), Gu-Yeon Wei (2)

Affiliations:

1. Harvard University, United States of America

2. Harvard University

3. Tufts University

4. Hugging Face

5. Arm Research / Harvard, United States of America

6. Cornell University

Funders:

Semiconductor Research Corporation JUMP ADA

National Science Foundation

Publisher:

ACM

References (88 articles):

1. Catapult High-Level Synthesis. https://www.mentor.com/hls-lp/catapult-high-level-synthesis. Accessed Oct. 1, 2020.

2. Jetson TX2 Module. https://developer.nvidia.com/embedded/jetson-tx2. Accessed Oct. 1, 2020.

3. T. Ajayi, S. Kamineni, Y. Cherivirala, M. Fayazi, K. Kwon, M. Saligane, S. Gupta, C. Chen, D. Sylvester, D. Dreslinski, B. Calhoun, and D. Wentzloff. 2020. An Open-source Framework for Autonomous SoC Design with Analog Block Generation. In 2020 IFIP/IEEE 28th International Conference on Very Large Scale Integration (VLSI-SoC).

Cited by (38 articles):

1. ToEx: Accelerating Generation Stage of Transformer-Based Language Models via Token-Adaptive Early Exit. IEEE Transactions on Computers, September 2024.

2. MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), June 29, 2024.

3. LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), June 29, 2024.

4. Latency-aware service placement for GenAI at the edge. In Disruptive Technologies in Information Sciences VIII, June 6, 2024.

5. EINS: Edge-Cloud Deep Model Inference with Network-Efficiency Schedule in Serverless. In 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), May 8, 2024.
