Improving Storage Systems Using Machine Learning

Published: 2023-01-19 Issue: 1 Volume: 19 Page: 1-30
ISSN: 1553-3077
Container-title: ACM Transactions on Storage
Language: en
Short-container-title: ACM Trans. Storage

Author:

Akgun Ibrahim Umit¹, Aydin Ali Selman¹, Burford Andrew¹, McNeill Michael¹, Arkhangelskiy Michael¹, Zadok Erez¹

Affiliation:

1. Stony Brook University, Stony Brook, NY

Abstract

Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers resorted to exposing numerous tunable parameters to users—thus burdening users with continually optimizing their own storage systems and applications. Storage systems are usually responsible for most latency in I/O-heavy applications, so even a small latency improvement can be significant. Machine learning (ML) techniques promise to learn patterns, generalize from them, and enable optimal solutions that adapt to changing workloads. We propose that ML solutions become a first-class component in OSs and replace manual heuristics to optimize storage systems dynamically. In this article, we describe our proposed ML architecture, called KML. We developed a prototype KML architecture and applied it to two case studies: optimizing readahead and NFS read-size values. Our experiments show that KML consumes less than 4 KB of dynamic kernel memory, has a CPU overhead smaller than 0.2%, and yet can learn patterns and improve I/O throughput by as much as 2.3× and 15× for two case studies—even for complex, never-seen-before, concurrently running mixed workloads on different storage devices.

Funder

NSF

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3568429

Cited by 1 article.

1. Streaming Machine Learning for Supporting Data Prefetching in Modern Data Storage Systems; Proceedings of the First Workshop on AI for Systems; 2023-08-10