Enabling scalability and performance in a large scale CMP environment

Author:

Saha Bratin1, Adl-Tabatabai Ali-Reza1, Ghuloum Anwar1, Rajagopalan Mohan1, Hudson Richard L.1, Petersen Leaf1, Menon Vijay1, Murphy Brian1, Shpeisman Tatiana1, Sprangle Eric1, Rohillah Anwar1, Carmean Doug1, Fang Jesse1

Affiliation:

1. Intel Corporation

Abstract

Hardware trends suggest that large-scale CMP architectures, with tens to hundreds of processing cores on a single piece of silicon, are imminent within the next decade. While existing CMP machines have traditionally been handled in the same way as SMPs, this magnitude of parallelism introduces several fundamental challenges at the architectural level, which in turn translate to novel challenges in the design of the software stack for these platforms. This paper presents the "Many Core Run Time" (McRT), a software prototype of an integrated language runtime designed to explore configurations of the software stack for enabling performance and scalability on large-scale CMP platforms. The paper describes the architecture of McRT and discusses our experiences with the system, including an experimental evaluation that led to several interesting, non-intuitive findings and provides key insights about the structure of the system stack at this scale. A key contribution of this paper is to demonstrate how McRT enables near-linear improvements in performance and scalability for desktop workloads such as the popular XviD encoder and a set of RMS (recognition, mining, and synthesis) applications. Another key contribution of this work is its use of McRT to explore non-traditional system configurations such as a light-weight executive in which McRT runs on "bare metal" and replaces the traditional OS. Such configurations are becoming an increasingly attractive way to leverage heterogeneous computing units, as seen in today's CPU-GPU configurations.

Publisher

Association for Computing Machinery (ACM)

References (52 articles)

1. First-class user-level threads

2. B. Lewis and D. J. Berg. "Multithreaded Programming with Pthreads." Prentice Hall, 1998.

3. Next Generation POSIX Threading. http://www-124.ibm.com/pthreads/

4. U. Drepper and I. Molnar. The native POSIX thread library for Linux. Jan 2003. http://people.redhat.com/drepper/nptl-design.pdf

5. D. Vianney. Hyper-Threading speeds Linux. Jan 2003. http://www-128.ibm.com/developerworks/linux/library/l-htl/

Cited by 14 articles.

1. Role of Big Data in Internet of Things Networks;Research Anthology on Big Data Analytics, Architectures, and Applications;2022

2. Role of Big Data in Internet of Things Networks;Advances in Data Mining and Database Management;2019

3. Nosv: A lightweight nested-virtualization VMM for hosting high performance computing on cloud;Journal of Systems and Software;2017-02

4. ElCore: Dynamic elastic resource management and discovery for future large-scale manycore enabled distributed systems;Microprocessors and Microsystems;2016-10

5. Performance implications of dynamic memory allocators on transactional memory systems;Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming;2015-01-24
