Staged memory scheduling

Published: 2012-09-05 Issue: 3 Volume: 40 Page: 416-427
ISSN: 0163-5964
Container-title: ACM SIGARCH Computer Architecture News
Language: en
Short-container-title: SIGARCH Comput. Archit. News

Author:

Ausavarungnirun Rachata¹, Chang Kevin Kai-Wei¹, Subramanian Lavanya¹, Loh Gabriel H.², Mutlu Onur¹

Affiliation:

1. Carnegie Mellon University

2. Advanced Micro Devices, Inc.

Abstract

When multiple processor (CPU) cores and a GPU integrated together on the same chip share the off-chip main memory, requests from the GPU can heavily interfere with requests from the CPU cores, leading to low system performance and starvation of CPU cores. Unfortunately, state-of-the-art application-aware memory scheduling algorithms are ineffective at solving this problem at low complexity due to the large amount of GPU traffic. A large and costly request buffer is needed to provide these algorithms with enough visibility across the global request stream, requiring relatively complex hardware implementations. This paper proposes a fundamentally new approach that decouples the memory controller's three primary tasks into three significantly simpler structures that together improve system performance and fairness, especially in integrated CPU-GPU systems. Our three-stage memory controller first groups requests based on row-buffer locality. This grouping allows the second stage to focus only on inter-application request scheduling. These two stages enforce high-level policies regarding performance and fairness, and therefore the last stage consists of simple per-bank FIFO queues (no further command reordering within each bank) and straightforward logic that deals only with low-level DRAM commands and timing. We evaluate the design trade-offs involved in our Staged Memory Scheduler (SMS) and compare it against three state-of-the-art memory controller designs. Our evaluations show that SMS improves CPU performance without degrading GPU frame rate beyond a generally acceptable level, while being significantly less complex to implement than previous application-aware schedulers. Furthermore, SMS can be configured by the system software to prioritize the CPU or the GPU at varying levels to address different performance needs.
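The three-stage structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class and method names (`StagedScheduler`, `enqueue`, `flush`, `schedule_batch`, `issue`) are hypothetical, the batch-closing rule (a batch ends when the same source targets a different row or when `flush` is called) is an assumption, and the stage-two policy (pick the source whose oldest ready batch is smallest) is only a placeholder for whatever inter-application policy the paper actually evaluates. DRAM command timing is not modeled.

```python
from collections import deque, namedtuple

# A memory request: which core/GPU issued it, and the bank and row it targets.
Request = namedtuple("Request", ["source", "bank", "row"])


class StagedScheduler:
    """Illustrative three-stage pipeline (assumed structure, not the paper's design)."""

    def __init__(self, num_banks):
        self.forming = {}  # stage 1: source -> batch currently being formed
        self.ready = {}    # stage 1 output: source -> deque of closed batches
        self.bank_fifos = [deque() for _ in range(num_banks)]  # stage 3

    def enqueue(self, req):
        """Stage 1: group a source's consecutive same-row requests into a batch."""
        batch = self.forming.setdefault(req.source, [])
        if batch and (batch[-1].bank, batch[-1].row) != (req.bank, req.row):
            # Row-buffer locality is broken, so close the current batch (assumption).
            self.ready.setdefault(req.source, deque()).append(batch)
            batch = self.forming[req.source] = []
        batch.append(req)

    def flush(self):
        """Close every partially formed batch (stands in for a timeout; assumption)."""
        for source, batch in list(self.forming.items()):
            if batch:
                self.ready.setdefault(source, deque()).append(batch)
                self.forming[source] = []

    def schedule_batch(self):
        """Stage 2: choose one ready batch and hand it to the per-bank FIFOs.

        Placeholder policy: the source whose oldest ready batch is smallest wins.
        """
        candidates = [s for s, q in self.ready.items() if q]
        if not candidates:
            return None
        source = min(candidates, key=lambda s: (len(self.ready[s][0]), s))
        for req in self.ready[source].popleft():
            self.bank_fifos[req.bank].append(req)
        return source

    def issue(self, bank):
        """Stage 3: issue strictly in FIFO order, with no reordering within a bank."""
        fifo = self.bank_fifos[bank]
        return fifo.popleft() if fifo else None


# Usage: three GPU requests to one row and one CPU request, all to bank 0.
sms = StagedScheduler(num_banks=2)
for src, bank, row in [("gpu", 0, 5), ("gpu", 0, 5), ("cpu", 0, 1), ("gpu", 0, 5)]:
    sms.enqueue(Request(src, bank, row))
sms.flush()
while sms.schedule_batch() is not None:
    pass

order = []
req = sms.issue(0)
while req is not None:
    order.append(req.source)
    req = sms.issue(0)
print(order)  # ['cpu', 'gpu', 'gpu', 'gpu']
```

In this sketch the GPU's three same-row requests stay together as one batch, so the second stage only has to compare whole batches across sources, and the third stage needs nothing beyond a FIFO per bank.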

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2366231.2337207

References: 37 articles.

1. Advanced Micro Devices. AMD Radeon HD 5870 Graphics. http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-5000/hd-5870.

2. DRAM Scheduling Policy for GPGPU Architectures Based on a Potential Function

3. Bobcat: AMD's Low-Power x86 Processor

4. Application-aware prioritization mechanisms for on-chip networks

Cited by 22 articles.

1. Read Disturbance in High Bandwidth Memory: A Detailed Experimental Study on HBM2 DRAM Chips;2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN);2024-06-24

2. Hybrid Refresh: Improving DRAM Performance by Handling Weak Rows Smartly;Proceedings of the 2022 International Symposium on Memory Systems;2022-10-03

3. Polynesia: Enabling High-Performance and Energy-Efficient Hybrid Transactional/Analytical Databases with Hardware/Software Co-Design;2022 IEEE 38th International Conference on Data Engineering (ICDE);2022-05

4. DR-STRaNGe: End-to-End System Design for DRAM-based True Random Number Generators;2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2022-04

5. PCCS: Processor-Centric Contention-aware Slowdown Model for Heterogeneous System-on-Chips;MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture;2021-10-17