BaGuaLu

Published: 2022-03-28
Container-title: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Author:

Ma Zixuan¹, He Jiaao¹, Qiu Jiezhong², Cao Huanqi¹, Wang Yuanwei¹, Sun Zhenbo¹, Zheng Liyan¹, Wang Haojie¹, Tang Shizhi¹, Zheng Tianyu³, Lin Junyang⁴, Feng Guanyu¹, Huang Zeqiang³, Gao Jie³, Zeng Aohan², Zhang Jianwei⁴, Zhong Runxin¹, Shi Tianhui¹, Liu Sha³, Zheng Weimin¹, Tang Jie⁵, Yang Hongxia⁴, Liu Xin³, Zhai Jidong¹, Chen Wenguang¹

Affiliation:

1. Tsinghua University

2. Tsinghua University and Beijing Academy of Artificial Intelligence

3. Zhejiang Lab

4. Alibaba Group

5. Tsinghua University and Beijing Academy of Artificial Intelligence

Funder

National Key R&D Program of China

National Natural Science Foundation of China

NSFC for Distinguished Young Scholar

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3503221.3508417

References: 33 articles.

1. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL]

2. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).

3. William Fedus, Barret Zoph, and Noam Shazeer. 2021. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. arXiv:2101.03961 [cs.LG]

4. The Sunway TaihuLight supercomputer: system and applications

5. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Identity Mappings in Deep Residual Networks. In ECCV 2016. 630-645.

Cited by 22 articles.

1. Enabling High-Performance Physical Based Rendering on New Sunway Supercomputer;2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2024-05-27

2. Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules;IEEE INFOCOM 2024 - IEEE Conference on Computer Communications;2024-05-20

3. ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling;Proceedings of the Nineteenth European Conference on Computer Systems;2024-04-22

4. SWattention: designing fast and memory-efficient attention for a new Sunway Supercomputer;The Journal of Supercomputing;2024-03-11

5. Distance is the spice, but not the whole enchilada: Country-pair psychic distance stimuli and country fixed effects in a deep learning implementation of the trade flow model;International Business Review;2024-02