Evaluation of pre-training large language models on leadership-class supercomputers-Reference-Cited by-同舟云学术

Evaluation of pre-training large language models on leadership-class supercomputers

Published:2023-06-16 Issue:18 Volume:79 Page:20747-20768
ISSN:0920-8542
Container-title:The Journal of Supercomputing
language:en
Short-container-title:J Supercomput

Author:

Yin Junqi,Dash Sajal,Gounley John,Wang Feiyi,Tourassi Georgia

Funder

U.S. Department of Energy

Publisher

Springer Science and Business Media LLC

Subject

Hardware and Architecture,Information Systems,Theoretical Computer Science,Software

Link

https://link.springer.com/content/pdf/10.1007/s11227-023-05479-7.pdf

Reference38 articles.

1. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30

2. Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

3. Radford A, Narasimhan K (2018) Improving language understanding by generative pre-training

4. Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, Bernstein MS, Bohg J, Bosselut A, Brunskill E, Brynjolfsson E, Buch S, Card D, Castellon R, Chatterji NS, Chen AS, Creel K, Davis JQ, Demszky D, Donahue C, Doumbouya M, Durmus E, Ermon S, Etchemendy J, Ethayarajh K, Fei-Fei L, Finn C, Gale T, Gillespie L, Goel K, Goodman ND, Grossman S, Guha N, Hashimoto T, Henderson P, Hewitt J, Ho DE, Hong J, Hsu K, Huang J, Icard T, Jain S, Jurafsky D, Kalluri P, Karamcheti S, Keeling G, Khani F, Khattab O, Koh PW, Krass MS, Krishna R, Kuditipudi R, et al (2021) On the opportunities and risks of foundation models. CoRR abs/2108.07258. https://arxiv.org/abs/2108.07258

5. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D: Language models are few-shot learners. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (2020) (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901. Curran Associates, Inc., virtual. https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf

Cited by 8 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. A Perspective on Scalable AI on High-Performance Computing and Leadership Class Supercomputing Facilities [Industrial and Governmental Activities];IEEE Computational Intelligence Magazine;2024-08

2. Exploring Vision Transformers on the Frontier Supercomputer for Remote Sensing and Geoscientific Applications;IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium;2024-07-07

3. Towards Maps of Disease Progression: Biomedical Large Language Model Latent Spaces For Representing Disease Phenotypes And Pseudotime;2024-06-16

4. Comparative Study of Large Language Model Architectures on Frontier;2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2024-05-27

5. Toward a Holistic Performance Evaluation of Large Language Models Across Diverse AI Accelerators;2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW);2024-05-27