CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture

Author:

Zhuang Jinming (1), Lau Jason (2), Ye Hanchen (3), Yang Zhuoping (1), Du Yubo (1), Lo Jack (4), Denolf Kristof (5), Neuendorffer Stephen (4), Jones Alex (1), Hu Jingtong (1), Chen Deming (3), Cong Jason (2), Zhou Peipei (1)

Affiliation:

1. University of Pittsburgh, Pittsburgh, PA, USA

2. University of California, Los Angeles, Los Angeles, CA, USA

3. University of Illinois at Urbana-Champaign, Urbana, IL, USA

4. Advanced Micro Devices Inc, San Jose, CA, USA

5. Advanced Micro Devices Inc, Longmont, CO, USA

Funder

National Science Foundation

Publisher

ACM

Reference: 48 articles.

1. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

2. Neural Collaborative Filtering

3. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.

4. Yu Emma Wang, Gu-Yeon Wei, and David Brooks. Benchmarking TPU, GPU, and CPU platforms for deep learning. arXiv preprint arXiv:1907.10701, 2019.

5. In-Datacenter Performance Analysis of a Tensor Processing Unit

Cited by 12 articles.

1. FiberFlex: Real-time FPGA-based Intelligent & Distributed Fiber Sensor System for Pedestrian Recognition;ACM Transactions on Reconfigurable Technology and Systems;2024-08-28

2. EA4RCA: Efficient AIE accelerator design framework for regular Communication-Avoiding Algorithm;ACM Transactions on Architecture and Code Optimization;2024-07-15

3. TaPaS Co-AIE: An Open-Source Framework for Streaming-Based Heterogeneous Acceleration Using AMD AI Engines;2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW);2024-05-27

4. Accelerating ML Workloads using GPU Tensor Cores: The Good, the Bad, and the Ugly;Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering;2024-05-07

5. Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs;2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM);2024-05-05
