Affiliations:
1. Dolphin Interconnect Solutions, Norway
2. SimulaMet, Norway
3. Simula Research Laboratory, Norway
4. University of Oslo, Norway
Abstract
The large variety of compute-heavy and data-driven applications accelerates the need for a distributed I/O solution that enables cost-effective scaling of resources between networked hosts. For example, in a cluster system, different machines may have different devices available at different times, but moving workloads to remote units over the network is often costly and introduces large overheads compared to accessing local resources. To facilitate I/O disaggregation and device sharing among hosts connected using Peripheral Component Interconnect Express (PCIe) non-transparent bridges, we present SmartIO. NVMe drives, GPUs, network adapters, or any other standard PCIe device may be borrowed by remote machines and accessed directly, as if they were local to those machines. We provide capabilities beyond existing disaggregation solutions by combining traditional I/O with distributed shared-memory functionality, allowing devices to become part of the same global address space as cluster applications. Software is entirely removed from the data path, and a device can be shared simultaneously among application processes running on remote hosts. Our experimental results show that I/O devices can be shared with remote hosts while achieving native PCIe performance. Thus, compared to existing device distribution mechanisms, SmartIO provides more efficient, low-cost resource sharing, increasing the overall system performance.
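A PCIe non-transparent bridge (NTB) is what lets memory or a device on one host appear in another host's address space. An address window on the local side is translated to an address range on the remote side. The sketch below is a generic, simplified model of that offset translation. It assumes one flat window per mapping. `NtbWindow` and `translate` are hypothetical names used only for illustration, not SmartIO's API or any driver's API. Real NTB setups also involve BAR configuration, lookup tables, and IOMMU mappings, which are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NtbWindow:
    """Hypothetical model of one NTB translation window (not SmartIO's API)."""
    local_base: int   # start of the window in the local host's address space
    size: int         # window size in bytes
    remote_base: int  # remote-side address that local_base is translated to

    def contains(self, local_addr: int) -> bool:
        """True if local_addr falls inside this window."""
        return self.local_base <= local_addr < self.local_base + self.size

    def translate(self, local_addr: int) -> int:
        """Map a local address in the window to the corresponding remote address."""
        if not self.contains(local_addr):
            raise ValueError(f"address {local_addr:#x} is outside the window")
        # The NTB preserves the offset into the window; only the base changes.
        return self.remote_base + (local_addr - self.local_base)


def translate(windows: list[NtbWindow], local_addr: int) -> int:
    """Find the window covering local_addr and translate it (hypothetical helper)."""
    for window in windows:
        if window.contains(local_addr):
            return window.translate(local_addr)
    raise ValueError(f"no NTB window maps address {local_addr:#x}")


if __name__ == "__main__":
    # Illustrative values only: a 1 MiB window mapped onto a remote range.
    windows = [NtbWindow(local_base=0x8000_0000, size=0x10_0000,
                         remote_base=0x2_0000_0000)]
    print(hex(translate(windows, 0x8000_1234)))
```

The example keeps the offset into the window and swaps only the base address. That is why a load or store issued locally can reach the remote range without software in the data path once the window is configured.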
Publisher
Association for Computing Machinery (ACM)
Cited by
7 articles.
1. Neos: A NVMe-GPUs Direct Vector Service Buffer in User Space;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
2. Data Motion Acceleration: Chaining Cross-Domain Multi Accelerators;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02
3. Design of QoS-Aware Network Functions;Springer Theses;2024
4. DxPU: Large-scale Disaggregated GPU Pools in the Datacenter;ACM Transactions on Architecture and Code Optimization;2023-12-14
5. GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture;Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2;2023-01-27