Computation spreading

Published: 2006-10-20; Volume: 40; Issue: 5; Pages: 283-292
ISSN: 0163-5980
Journal: ACM SIGOPS Operating Systems Review
Language: en
Journal abbreviation: SIGOPS Oper. Syst. Rev.

Authors:

Koushik Chakraborty¹, Philip M. Wells¹, Gurindar S. Sohi¹

Affiliation:

1. University of Wisconsin, Madison

Abstract

In canonical parallel processing, the operating system (OS) assigns a processing core to a single thread from a multithreaded server application. Since different threads from the same application often carry out similar computation, albeit at different times, we observe extensive code reuse among different processors, causing redundancy (e.g., in our server workloads, 45-65% of all instruction blocks are accessed by all processors). Moreover, largely independent fragments of computation compete for the same private resources, causing destructive interference. Together, this redundancy and interference lead to poor utilization of private microarchitecture resources such as caches and branch predictors.

We present Computation Spreading (CSP), which employs hardware migration to distribute a thread's dissimilar fragments of computation across the multiple processing cores of a chip multiprocessor (CMP), while grouping similar computation fragments from different threads together. This paper focuses on a specific example of CSP for OS-intensive server applications: separating application-level (user) computation from the OS calls it makes.

When performing CSP, each core becomes temporally specialized to execute certain computation fragments, and the same core is repeatedly used for such fragments. We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2 load misses by 0-19%, and branch mispredictions by 9-25%.
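The redundancy argument can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' simulator or either of their two assignment policies: the trace format, the function names (`canonical_core`, `csp_core`, `toy_trace`), and the rule that splits cores into a "user half" and an "OS half" are all hypothetical. The metric simply counts distinct instruction blocks each core touches and sums them across cores, ignoring cache capacity, conflicts, and the cost of hardware migration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical fragment format: (thread_id, kind, instruction_blocks),
# where kind is "user" (application code) or "os" (system-call code).
Fragment = Tuple[int, str, List[str]]
Policy = Callable[[int, str, int], int]


def canonical_core(thread_id: int, kind: str, n_cores: int) -> int:
    """Canonical assignment: every fragment of a thread runs on the thread's core."""
    return thread_id % n_cores


def csp_core(thread_id: int, kind: str, n_cores: int) -> int:
    """Hypothetical CSP-style split: the first half of the cores run user
    fragments, the remaining cores run OS fragments; threads are spread
    round-robin within each group."""
    if n_cores < 2:
        raise ValueError("a user/OS split needs at least two cores")
    half = n_cores // 2
    if kind == "user":
        return thread_id % half
    return half + thread_id % (n_cores - half)


def toy_trace(n_threads: int) -> List[Fragment]:
    """Hypothetical trace: all threads run the same user code and the same OS code."""
    user_code = ["u0", "u1", "u2"]
    os_code = ["k0", "k1", "k2", "k3"]
    trace: List[Fragment] = []
    for t in range(n_threads):
        trace.append((t, "user", user_code))
        trace.append((t, "os", os_code))
    return trace


def per_core_footprint(trace: List[Fragment], n_cores: int,
                       policy: Policy) -> Dict[int, Set[str]]:
    """Map each core to the set of distinct instruction blocks it executes."""
    footprint: Dict[int, Set[str]] = defaultdict(set)
    for thread_id, kind, blocks in trace:
        footprint[policy(thread_id, kind, n_cores)].update(blocks)
    return footprint


def replicated_blocks(footprint: Dict[int, Set[str]]) -> int:
    """Total block copies held across all private caches (no capacity limit)."""
    return sum(len(blocks) for blocks in footprint.values())


if __name__ == "__main__":
    trace = toy_trace(4)
    for name, policy in (("canonical", canonical_core), ("csp", csp_core)):
        print(name, replicated_blocks(per_core_footprint(trace, 4, policy)))
```

With four threads on four cores, the canonical mapping places all seven blocks in every core (28 copies), while the user/OS split keeps only the three user blocks on each user core and the four OS blocks on each OS core (14 copies). The toy deliberately exaggerates sharing (every block is touched by every thread); it shows only why grouping similar fragments reduces replication, not the miss reductions reported above.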

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1168917.1168893

Cited by 2 articles.

1. Asynchronous Abstract Machines. Proceedings of the 9th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS '19), 2019.

2. Bibliography. Embedded Multi-Core Systems, 2013-07-18.