Modeling the impact of permanent faults in caches

Author:

Sánchez Daniel (1), Sazeides Yiannakis (2), Cebrián Juan M. (3), García José M. (3), Aragón Juan L. (3)

Affiliation:

1. University of Murcia, Murcia, Spain

2. University of Cyprus, Nicosia, Cyprus

3. University of Murcia, Murcia, Spain

Abstract

The traditional performance and cost benefits we have enjoyed for decades from technology scaling are challenged by several critical constraints, including reliability. Increases in static and dynamic variations are raising the probability of parametric and wear-out failures and are elevating reliability to a prime design constraint. In particular, the SRAM cells used to build caches, which dominate processor area, are usually minimum-sized and therefore more prone to failure. It is thus of paramount importance to develop effective methodologies that facilitate the exploration of reliability techniques for caches. To this end, we present an analytical model that, for a given cache configuration, address trace, and random probability of permanent cell failure, determines the exact expected miss rate and its standard deviation when blocks with faulty bits are disabled. What distinguishes our model is that it is fully analytical and avoids the use of fault maps, yet it is both exact and simpler than previous approaches. The analytical model is used to produce expected miss-rate trends for future technology nodes for both uncorrelated and clustered faults. Some of the key findings based on the proposed model are: (i) block disabling has a negligible impact on the expected miss rate unless the probability of failure is equal to or greater than 2.6e-4; (ii) the fault-map methodology can accurately calculate the expected miss rate as long as 1,000 to 10,000 fault maps are used; and (iii) the expected miss rate for parallel applications increases with the number of threads and, for a given probability of failure, is more pronounced than for sequential execution.
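As a rough illustration of the block-disabling setting described in the abstract, the following Python sketch computes the probability that a block is disabled and the distribution of usable ways in a set. It is not the authors' model. It assumes independent, identically distributed cell failures (the uncorrelated case), an LRU-managed set-associative cache, and 512 data bits per block with tag bits ignored. The function names and the per-associativity miss rates are hypothetical placeholders, not values from the paper.

```python
from math import comb
from typing import List, Sequence


def block_fault_probability(p_cell: float, bits_per_block: int) -> float:
    """Probability that a block has at least one faulty cell (and is disabled),
    assuming each cell fails independently with probability p_cell."""
    return 1.0 - (1.0 - p_cell) ** bits_per_block


def usable_ways_distribution(p_block: float, ways: int) -> List[float]:
    """Binomial distribution of the number of non-faulty ways in one set.
    Entry k is the probability that exactly k of the `ways` blocks are usable."""
    return [
        comb(ways, k) * (1.0 - p_block) ** k * p_block ** (ways - k)
        for k in range(ways + 1)
    ]


def expected_miss_rate(miss_rate_by_ways: Sequence[float],
                       dist: Sequence[float]) -> float:
    """Weight the miss rate observed with k usable ways per set by the
    probability of having k usable ways. miss_rate_by_ways[k] is an input
    that would come from a trace analysis; index 0 should be 1.0."""
    return sum(m * q for m, q in zip(miss_rate_by_ways, dist))


if __name__ == "__main__":
    p_cell = 2.6e-4          # threshold value quoted in the abstract
    bits_per_block = 512     # assumption: 64-byte block, tag bits ignored
    ways = 8                 # assumption: 8-way set-associative cache

    p_block = block_fault_probability(p_cell, bits_per_block)
    dist = usable_ways_distribution(p_block, ways)

    # Hypothetical miss rates for 0..8 usable ways (illustrative only).
    miss_by_ways = [1.0, 0.30, 0.20, 0.15, 0.12, 0.11, 0.105, 0.102, 0.10]

    print(f"P(block disabled) = {p_block:.4f}")
    print(f"Expected miss rate = {expected_miss_rate(miss_by_ways, dist):.4f}")
```

Under these assumptions no fault maps need to be sampled, because the distribution of usable ways follows directly from the cell failure probability. The sketch covers only the expectation and does not address the standard deviation or clustered faults mentioned in the abstract.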

Funder

Seventh Framework Programme

Spanish MEC and European Commission FEDER funds

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Cited by 6 articles.

1. Modeling Remapping Based Fault Tolerance Techniques for Chip Multiprocessor Cache with Design Space Exploration;Journal of Electronic Testing;2020-02

2. Latency Aware Fault Tolerant Cache in Multicore Using Dynamic Remapping Clusters;2019 IEEE 28th Asian Test Symposium (ATS);2019-12

3. A fault-tolerant last level cache for CMPs operating at ultra-low voltage;Journal of Parallel and Distributed Computing;2019-03

4. Modeling & Analysis of Redundancy Based Fault Tolerance for Permanent Faults in Chip Multiprocessor Cache;2018 31st International Conference on VLSI Design and 2018 17th International Conference on Embedded Systems (VLSID);2018-01

5. Performance Analysis of Disability Based Fault Tolerance Techniques for Permanent Faults in Chip Multiprocessors;Communications in Computer and Information Science;2017
