Enabling scalability and performance in a large scale CMP environment

Published:2007-06 Issue:3 Volume:41 Page:73-86
ISSN:0163-5980
Container-title:ACM SIGOPS Operating Systems Review
language:en
Short-container-title:SIGOPS Oper. Syst. Rev.

Author:

Saha Bratin¹,Adl-Tabatabai Ali-Reza¹,Ghuloum Anwar¹,Rajagopalan Mohan¹,Hudson Richard L.¹,Petersen Leaf¹,Menon Vijay¹,Murphy Brian¹,Shpeisman Tatiana¹,Sprangle Eric¹,Rohillah Anwar¹,Carmean Doug¹,Fang Jesse¹

Affiliation:

1. Intel Corporation

Abstract

Hardware trends suggest that large-scale CMP architectures, with tens to hundreds of processing cores on a single piece of silicon, are imminent within the next decade. While existing CMP machines have traditionally been handled in the same way as SMPs, this magnitude of parallelism introduces several fundamental challenges at the architectural level and this, in turn, translates to novel challenges in the design of the software stack for these platforms. This paper presents the "Many Core Run Time" (McRT), a software prototype of an integrated language runtime that was designed to explore configurations of the software stack for enabling performance and scalability on large-scale CMP platforms. This paper presents the architecture of McRT and discusses our experiences with the system, including experimental evaluation that led to several interesting, non-intuitive findings, providing key insights about the structure of the system stack at this scale. A key contribution of this paper is to demonstrate how McRT enables near-linear improvements in performance and scalability for desktop workloads such as the popular XviD encoder and a set of RMS (recognition, mining, and synthesis) applications. Another key contribution of this work is its use of McRT to explore non-traditional system configurations such as a light-weight executive in which McRT runs on "bare metal" and replaces the traditional OS. Such configurations are becoming an increasingly attractive alternative to leverage heterogeneous computing units as seen in today's CPU-GPU configurations.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1272998.1273006

Reference: 52 articles.

1. First-class user-level threads

2. B. Lewis and D. J. Berg. "Multithreaded Programming with Pthreads." Prentice Hall, 1998.

3. Next Generation POSIX Threading. http://www-124.ibm.com/pthreads/

4. U. Drepper and I. Molnar. The native POSIX thread library for Linux. Jan 2003. http://people.redhat.com/drepper/nptl-design.pdf

5. D. Vianney. Hyper-Threading speeds Linux. Jan 2003. http://www-128.ibm.com/developerworks/linux/library/l-htl/

Cited by 14 articles.

1. Role of Big Data in Internet of Things Networks;Research Anthology on Big Data Analytics, Architectures, and Applications;2022

2. Role of Big Data in Internet of Things Networks;Advances in Data Mining and Database Management;2019

3. Nosv: A lightweight nested-virtualization VMM for hosting high performance computing on cloud;Journal of Systems and Software;2017-02

4. ElCore: Dynamic elastic resource management and discovery for future large-scale manycore enabled distributed systems;Microprocessors and Microsystems;2016-10

5. Performance implications of dynamic memory allocators on transactional memory systems;Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming;2015-01-24