Traffic management

Published:2013-04-23 Issue:4 Volume:48 Page:381-394
ISSN:0362-1340
Container-title:ACM SIGPLAN Notices
language:en
Short-container-title:SIGPLAN Not.

Author:

Dashti Mohammad¹, Fedorova Alexandra¹, Funston Justin¹, Gaud Fabien¹, Lachaize Renaud², Lepers Baptiste³, Quema Vivien⁴, Roth Mark¹

Affiliation:

1. Simon Fraser University, Burnaby, Canada

2. UJF, Grenoble, France

3. CNRS, Grenoble, France

4. Grenoble INP, Grenoble, France

Abstract

NUMA systems are characterized by Non-Uniform Memory Access times, where accessing data in a remote node takes longer than a local access. NUMA hardware has been built since the late 80's, and the operating systems designed for it were optimized for access locality. They co-located memory pages with the threads that accessed them, so as to avoid the cost of remote accesses. Contrary to older systems, modern NUMA hardware has much smaller remote wire delays, and so remote access costs per se are not the main concern for performance, as we discovered in this work. Instead, congestion on memory controllers and interconnects, caused by memory traffic from data-intensive applications, hurts performance a lot more. Because of that, memory placement algorithms must be redesigned to target traffic congestion. This requires an arsenal of techniques that go beyond optimizing locality. In this paper we describe Carrefour, an algorithm that addresses this goal. We implemented Carrefour in Linux and obtained performance improvements of up to 3.6× relative to the default kernel, as well as significant improvements compared to NUMA-aware patchsets available for Linux. Carrefour never hurts performance by more than 4% when memory placement cannot be improved. We present the design of Carrefour, the challenges of implementing it on modern hardware, and draw insights about hardware support that would help optimize system software on future NUMA systems.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design,Software

Link

https://dl.acm.org/doi/pdf/10.1145/2499368.2451157

Reference: 32 articles.

1. AMD64 Technology Lightweight Profiling Specification Aug. 2010. http://support.amd.com/us/Processor_TechDocs/43724.pdf.

2. AutoNUMA: the other approach to NUMA scheduling. LWN.net Mar. 2012. http://lwn.net/Articles/488709/.

3. Handling the problems and opportunities posed by multiple on-chip memory controllers

4. The multikernel

Cited by 156 articles.

1. Global-State Aware Automatic NUMA Balancing;Proceedings of the 15th Asia-Pacific Symposium on Internetware;2024-07-24

2. AdaptMD: Balancing Space and Performance in NUMA Architectures With Adaptive Memory Deduplication;IEEE Transactions on Computers;2024-06

3. A unified hybrid memory system for scalable deep learning and big data applications;Journal of Parallel and Distributed Computing;2024-04

4. ABSS: An Adaptive Batch-Stream Scheduling Module for Dynamic Task Parallelism on Chiplet-based Multi-Chip Systems;ACM Transactions on Parallel Computing;2024-03-11

5. GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02