Affiliation:
1. Department of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea
Abstract
In this study, we introduce a training algorithm designed to overcome the GPU memory limitations of a single DGX-A100 system. By incorporating the CPU and main memory into the training process and applying a division-and-parallelization strategy, the algorithm increases both the maximum trainable language-model size and the batch size. In addition, we develop a comprehensive management system that systematically controls the training process and resource usage while enabling the asynchronous deployment of tasks. Finally, we propose a scheduling technique, integrated into the management system, that promotes efficient task scheduling in a complex, heterogeneous training environment. These advances enable researchers to train larger models with larger batch sizes even under limited GPU memory.
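The core offloading idea in the abstract — holding model state in host (CPU) memory and streaming only an active partition into the limited device memory — can be sketched in a simplified form. The function name, the scalar "layers", and the partition size below are illustrative assumptions, not the paper's actual implementation; a real system would transfer GPU tensors rather than Python values.

```python
def partitioned_forward(layer_weights, x, device_budget=2):
    """Toy sketch of partitioned offloading: all layer weights live in
    'host memory' (the input list), and at most `device_budget` layers
    are resident 'on device' at any time."""
    for start in range(0, len(layer_weights), device_budget):
        # Simulated host->device transfer of one partition.
        active = list(layer_weights[start:start + device_budget])
        for w in active:
            x = max(w * x, 0.0)  # scalar stand-in for a layer + ReLU
        # Partition goes out of scope here, freeing 'device' memory.
    return x

# Example: three 'layers' processed in partitions of two.
result = partitioned_forward([2.0, 0.5, 3.0], 1.0)  # 1*2 -> 2*0.5 -> 1*3 = 3.0
```

The memory ceiling is set by `device_budget` rather than by the total number of layers, which is what lets the trainable model grow beyond device capacity at the cost of extra host-device transfers.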
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science