Affiliation:
1. Simon Fraser University, Burnaby, Canada
2. UJF, Grenoble, France
3. CNRS, Grenoble, France
4. Grenoble INP, Grenoble, France
Abstract
NUMA systems are characterized by Non-Uniform Memory Access times, where accessing data in a remote node takes longer than a local access. NUMA hardware has been built since the late 1980s, and the operating systems designed for it were optimized for access locality: they co-located memory pages with the threads that accessed them, so as to avoid the cost of remote accesses. In contrast to older systems, modern NUMA hardware has much smaller remote wire delays, and so remote access costs per se are not the main concern for performance, as we discovered in this work. Instead, congestion on memory controllers and interconnects, caused by memory traffic from data-intensive applications, hurts performance far more. Memory placement algorithms must therefore be redesigned to target traffic congestion, which requires an arsenal of techniques that go beyond optimizing locality. In this paper we describe Carrefour, an algorithm that addresses this goal. We implemented Carrefour in Linux and obtained performance improvements of up to 3.6× relative to the default kernel, as well as significant improvements compared to NUMA-aware patchsets available for Linux. Carrefour never hurts performance by more than 4% when memory placement cannot be improved. We present the design of Carrefour and the challenges of implementing it on modern hardware, and draw insights about hardware support that would help optimize system software on future NUMA systems.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
156 articles.