Affiliation:
1. Computer Science Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA
2. IBM T. J. Watson Research Center, Yorktown Heights, NY
Abstract
Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems efficiently are typically faced with the onerous task of rewriting an application to use explicit I/O operations (e.g., read/write). In this paper, we propose and evaluate a fully automatic technique that liberates the programmer from this task, provides high performance, and requires only minimal changes to current operating systems. In our scheme, the compiler provides the crucial information on future access patterns without burdening the programmer; the operating system supports nonbinding prefetch and release hints for managing I/O; and the operating system cooperates with a run-time layer to accelerate performance by adapting to dynamic behavior and minimizing prefetch overhead. This approach maintains the abstraction of unlimited virtual memory for the programmer, gives the compiler the flexibility to aggressively insert prefetches ahead of references, and gives the operating system the flexibility to arbitrate between the competing resource demands of multiple applications. We implemented our compiler analysis within the SUIF compiler, and used it to target implementations of our run-time and OS support on both research and commercial systems (Hurricane and IRIX 6.5, respectively). Our experimental results show large performance gains for out-of-core scientific applications on both systems: more than 50% of the I/O stall time is eliminated in most cases, translating into overall speedups of roughly twofold in many cases.
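The idea of nonbinding prefetch and release hints can be illustrated with a minimal sketch. It is not the paper's compiler output, nor the Hurricane or IRIX interface: it assumes a memory-mapped file and uses the standard `madvise` hints `MADV_WILLNEED` and `MADV_DONTNEED` as stand-ins for prefetch and release. The names `sum_bytes_with_hints`, `_advise`, `demo`, and the value of `PREFETCH_DISTANCE` are hypothetical and chosen only for illustration.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
PREFETCH_DISTANCE = 4  # pages to hint ahead; illustrative value, not from the paper


def _advise(mm, option_name, start, length):
    """Issue a nonbinding hint if the platform supports it; otherwise do nothing."""
    option = getattr(mmap, option_name, None)
    if option is not None and hasattr(mm, "madvise"):
        mm.madvise(option, start, length)


def sum_bytes_with_hints(path):
    """Sum every byte of a file, hinting pages ahead of use and releasing them after."""
    size = os.path.getsize(path)
    total = 0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for offset in range(0, size, PAGE):
            # Prefetch hint: a page we expect to touch a few iterations from now.
            ahead = offset + PREFETCH_DISTANCE * PAGE
            if ahead < size:
                _advise(mm, "MADV_WILLNEED", ahead, min(PAGE, size - ahead))
            total += sum(mm[offset:offset + PAGE])
            # Release hint: the page just consumed is not expected to be reused.
            _advise(mm, "MADV_DONTNEED", offset, min(PAGE, size - offset))
    return total


def demo():
    """Build a small temporary file and return (computed sum, expected sum)."""
    data = bytes(range(256)) * 200
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        return sum_bytes_with_hints(tmp.name), sum(data)
    finally:
        os.remove(tmp.name)
```

Because the hints are nonbinding, the result is the same whether or not the operating system acts on them; only the I/O stall time is expected to change. In the scheme the abstract describes, such calls would be inserted by the compiler rather than written by hand.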
Publisher
Association for Computing Machinery (ACM)
References: 43 articles.
1. A prefetching prototype for the parallel file systems on the Paragon
2. BAILEY, D., BARTON, J., LASINSKI, T., AND SIMON, H. 1991. The NAS parallel benchmarks. RNR-91-002.
3. A study of integrated prefetching and caching strategies
Cited by
35 articles.
1. Spidermine: Low Overhead User-Level Prefetching;Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing;2023-03-27
2. PARC: A novel OS cache manager;Software: Practice and Experience;2018-08-31
3. iFetcher: User-Level Prefetching Framework With File-System Event Monitoring for Linux;IEEE Access;2018
4. An Adaptive IO Prefetching Approach for Virtualized Data Centers;IEEE Transactions on Services Computing;2017-05-01
5. Using Locality-Enhanced Distributed Memory Cache to Accelerate Applications on High Performance Computers;2017 IEEE 3rd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing, (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS);2017-05