Network-accelerated distributed machine learning for multi-tenant settings

Author:

Viswanathan Raajay (1), Balasubramanian Arjun (2), Akella Aditya (3)

Affiliation:

1. Uber Technologies Inc

2. Amazon Web Services

3. University of Wisconsin-Madison

Funder

VMware

Google

NSF

Publisher

ACM

References: 57 articles.

1. [n.d.]. Caffe2: A New Lightweight Modular and Scalable Deep Learning Framework. https://caffe2.ai/.

2. [n.d.]. Gloo: Collective Communications Library. https://github.com/facebookincubator/gloo. Accessed: 2018-01-01.

3. [n.d.]. NVIDIA Collective Communication Library. https://github.com/NVIDIA/nccl. Accessed: 2018-01-01.

4. [n.d.]. NY Times Dataset. https://archive.ics.uci.edu/ml/machine-learning-databases/bag-of-words.

5. [n.d.]. PyTorch - Distributed communication package. http://pytorch.org/docs/master/distributed.html.

Cited by 11 articles.

1. Training Job Placement in Clusters with Statistical In-Network Aggregation;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1;2024-04-17

2. ExplSched: Maximizing Deep Learning Cluster Efficiency for Exploratory Jobs;2023 IEEE International Conference on Cluster Computing (CLUSTER);2023-10-31

3. Preemptive Switch Memory Usage to Accelerate Training Jobs with Shared In-Network Aggregation;2023 IEEE 31st International Conference on Network Protocols (ICNP);2023-10-10

4. Maximizing Aggregation Throughput for Distributed Training with Constrained In-Network Computing;ICC 2023 - IEEE International Conference on Communications;2023-05-28

5. SOAR: Minimizing Network Utilization Cost With Bounded In-Network Computing;IEEE Transactions on Network and Service Management;2023
