SmartIO

Author:

Markussen Jonas (1), Kristiansen Lars Bjørlykke (1), Halvorsen Pål (2), Kielland-Gyrud Halvor (1), Stensland Håkon Kvale (3), Griwodz Carsten (4)

Affiliation:

1. Dolphin Interconnect Solutions, Norway

2. SimulaMet, Norway

3. Simula Research Laboratory, Norway

4. University of Oslo, Norway

Abstract

The large variety of compute-heavy and data-driven applications accelerates the need for a distributed I/O solution that enables cost-effective scaling of resources between networked hosts. For example, in a cluster system, different machines may have various devices available at different times, but moving workloads to remote units over the network is often costly and introduces large overheads compared to accessing local resources. To facilitate I/O disaggregation and device sharing among hosts connected using Peripheral Component Interconnect Express (PCIe) non-transparent bridges, we present SmartIO. NVMes, GPUs, network adapters, or any other standard PCIe device may be borrowed and accessed directly, as if they were local to the remote machines. We provide capabilities beyond existing disaggregation solutions by combining traditional I/O with distributed shared-memory functionality, allowing devices to become part of the same global address space as cluster applications. Software is entirely removed from the data path, and a device can be shared simultaneously among application processes running on remote hosts. Our experimental results show that I/O devices can be shared with remote hosts, achieving native PCIe performance. Thus, compared to existing device distribution mechanisms, SmartIO provides more efficient, low-cost resource sharing, increasing the overall system performance.

Funder

Norges Forskningsråd

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 7 articles.

1. Neos: A NVMe-GPUs Direct Vector Service Buffer in User Space;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

2. Data Motion Acceleration: Chaining Cross-Domain Multi Accelerators;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02

3. Design of QoS-Aware Network Functions;Springer Theses;2024

4. DxPU: Large-scale Disaggregated GPU Pools in the Datacenter;ACM Transactions on Architecture and Code Optimization;2023-12-14

5. GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture;Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2;2023-01-27
