Abstract
With the advancement of technology scaling, multi-/many-core platforms are receiving more attention in embedded systems because of ever-increasing performance and power-efficiency requirements. This feature-size scaling, along with architectural innovations, has dramatically increased the rate of manufacturing defects and physical faults. As a result, in addition to providing high parallelism, such hardware platforms have introduced increasing unreliability into the system. Such systems need to be well designed to ensure long-term and application-specific reliability, especially in mixed-criticality systems, where incorrect execution of applications may have catastrophic consequences. However, the optimal allocation of applications/tasks on multi-/many-core platforms is an increasingly complex problem. Therefore, reliability-aware resource management is crucial for meeting application-specific Quality-of-Service (QoS) requirements while optimizing other system-level performance goals. This article presents a survey of recent works that focus on reliability-aware resource management in multi-/many-core systems. We first present an overview of reliability in electronic systems, the associated fault models, and the various system models used in related research. Then, we present recently published articles, focusing primarily on aspects such as application-specific reliability optimization, mixed-criticality awareness, and hardware resource heterogeneity. To underscore the differences among the techniques, we classify them based on their design space exploration. Finally, we briefly discuss upcoming trends and open challenges in reliability-aware resource management for future research.
Subject
Electrical and Electronic Engineering
Cited by
18 articles.
1. A Survey of MPSoC Management toward Self-Awareness;Micromachines;2024-04-26
2. Towards Fault Tolerance and Resilience in the Sequential Codelet Model;Communications in Computer and Information Science;2024
3. Preliminaries and Related Work;Quality-of-Service Aware Design and Management of Embedded Mixed-Criticality Systems;2023-07-24
4. Introduction;Quality-of-Service Aware Design and Management of Embedded Mixed-Criticality Systems;2023-07-24
5. Leveraging Adaptive Redundancy in Multi-Core Processors for Realizing Adaptive Fault Tolerance in Mixed-Criticality Systems;2023 12th Mediterranean Conference on Embedded Computing (MECO);2023-06-06