Enhancing computation-to-core assignment with physical location information

Author:

Kislal Orhan1,Kotra Jagadish1,Tang Xulong1,Kandemir Mahmut Taylan1,Jung Myoungsoo2

Affiliation:

1. Pennsylvania State University, USA

2. Yonsei University, South Korea

Abstract

Going beyond a certain number of cores in modern architectures requires an on-chip network more scalable than conventional buses. However, employing an on-chip network in a manycore system (to improve scalability) makes the latencies of the data accesses issued by a core non-uniform. This non-uniformity can play a significant role in shaping the overall application performance. This work presents a novel compiler strategy which involves exposing architecture information to the compiler to enable an optimized computation-to-core mapping. Specifically, we propose a compiler-guided scheme that takes into account the relative positions of (and distances between) cores, last-level caches (LLCs) and memory controllers (MCs) in a manycore system, and generates a mapping of computations to cores with the goal of minimizing the on-chip network traffic. The experimental data collected using a set of 21 multi-threaded applications reveal that, on an average, our approach reduces the on-chip network latency in a 6×6 manycore system by 38.4% in the case of private LLCs, and 43.8% in the case of shared LLCs. These improvements translate to the corresponding execution time improvements of 10.9% and 12.7% for the private LLC and shared LLC based systems, respectively.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design,Software

Reference67 articles.

1. 2007. Intel teralops research chip. goo.gl/lewCk7. 2007. Intel teralops research chip. goo.gl/lewCk7.

2. 2009. Intel Single-cloud chip. goo.gl/RSJjfg. 2009. Intel Single-cloud chip. goo.gl/RSJjfg.

3. 2012. minighost. https://mantevo.org/default.php. 2012. minighost. https://mantevo.org/default.php.

4. 2012. The Architecture and Performance of the TILE-Gx Processor Family. http://www.tilera.com/products/processors/TILE-Gx_Family. 2012. The Architecture and Performance of the TILE-Gx Processor Family. http://www.tilera.com/products/processors/TILE-Gx_Family.

5. 2013. CORAL Benchmarks. htps://asc.llnl.gov/CORAL-benchmarks/ 2013. CORAL Benchmarks. htps://asc.llnl.gov/CORAL-benchmarks/

Cited by 1 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Distance-in-time versus distance-in-space;Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation;2021-06-18

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3