Flexible silicon photonic architecture for accelerating distributed deep learning

Published: 2024-01-09 Issue: 2 Volume: 16 Page: A157
ISSN: 1943-0620
Container-title: Journal of Optical Communications and Networking
Language: en
Short-container-title: J. Opt. Commun. Netw.

Author:

Wu Zhenguo, Yuan Dai Liang, Wang Yuyang, Wang Songli, Bergman Keren

Abstract

The increasing size and complexity of deep learning (DL) models have led to the wide adoption of distributed training methods in datacenters (DCs) and high-performance computing (HPC) systems. However, communication among distributed computing units (CUs) has emerged as a major bottleneck in the training process. In this study, we propose Flex-SiPAC, a flexible silicon photonic accelerated compute cluster designed to accelerate multi-tenant distributed DL training workloads. Flex-SiPAC takes a co-design approach that combines a silicon photonic hardware platform with a tailored collective algorithm, optimized to leverage the unique physical properties of the architecture. The hardware platform integrates a novel wavelength-reconfigurable transceiver design and a micro-resonator-based wavelength-reconfigurable switch, enabling the system to achieve flexible bandwidth steering in the wavelength domain. The collective algorithm is designed to support reconfigurable topologies, enabling efficient all-reduce communications that are commonly used in DL training. The feasibility of the Flex-SiPAC architecture is demonstrated through two testbed experiments. First, an optical testbed experiment demonstrates the flexible routing of wavelengths by shuffling an array of input wavelengths using a custom-designed spatial-wavelength selective switch. Second, a four-GPU testbed running two DL workloads shows a 23% improvement in job completion time compared to a similarly sized leaf-spine topology. We further evaluate Flex-SiPAC using large-scale simulations, which show that Flex-SiPAC is able to reduce the communication time by 26% to 29% compared to state-of-the-art compute clusters under representative collective operations.
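For readers unfamiliar with the all-reduce collective mentioned in the abstract, the following is a minimal sketch of the textbook ring all-reduce (a reduce-scatter phase followed by an all-gather phase), simulated in memory. It is not the Flex-SiPAC collective algorithm, which this record does not describe. The function name `ring_all_reduce`, the list-of-lists representation of worker buffers, and the requirement that the buffer length be divisible by the worker count are all assumptions made for illustration.

```python
from typing import List


def ring_all_reduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over n workers on a logical ring.

    Illustrative only: a generic ring all-reduce, not the algorithm
    from the paper. buffers[r] is the local vector held by worker r.
    """
    n = len(buffers)
    if n == 0:
        return []
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all workers must hold buffers of equal length")
    if length % n != 0:
        # Simplifying assumption of this sketch: equal-sized chunks.
        raise ValueError("buffer length must be divisible by worker count")

    chunk = length // n
    data = [list(b) for b in buffers]  # work on copies

    def span(c: int) -> range:
        """Index range covered by chunk c."""
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. In step s, worker r sends chunk (r - s) mod n
    # to its ring neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot payloads first so all transfers in a step are concurrent.
        sends = [
            (r, (r - step) % n, [data[r][i] for i in span((r - step) % n)])
            for r in range(n)
        ]
        for r, c, payload in sends:
            dst = (r + 1) % n
            for offset, i in enumerate(span(c)):
                data[dst][i] += payload[offset]

    # Worker r now holds the fully reduced chunk (r + 1) mod n.
    # Phase 2: all-gather. In step s, worker r forwards chunk (r + 1 - s) mod n,
    # and the neighbour overwrites its stale copy with the reduced values.
    for step in range(n - 1):
        sends = [
            (r, (r + 1 - step) % n, [data[r][i] for i in span((r + 1 - step) % n)])
            for r in range(n)
        ]
        for r, c, payload in sends:
            dst = (r + 1) % n
            for offset, i in enumerate(span(c)):
                data[dst][i] = payload[offset]

    return data
```

In this scheme each worker sends 2(n - 1) chunks, that is 2(n - 1)/n of its buffer size in total, which is why the available link bandwidth between neighbours largely determines the collective's duration.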

Funder

Advanced Research Projects Agency - Energy

National Security Agency

Center for Ubiquitous Connectivity

Semiconductor Research Corporation

Defense Advanced Research Projects Agency

Publisher

Optica Publishing Group

Subject

Computer Networks and Communications
