Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space

Author:

Rogenmoser Michael1,Tortorella Yvan2,Rossi Davide2,Conti Francesco2,Benini Luca3

Affiliation:

1. ETH Zürich, Switzerland

2. Università di Bologna, Italy

3. ETH Zürich, Switzerland and Università di Bologna, Italy

Abstract

Space Cyber-Physical Systems (S-CPS) such as spacecraft and satellites strongly rely on the reliability of onboard computers to guarantee the success of their missions. Relying solely on radiation-hardened technologies is extremely expensive, and developing inflexible architectural and microarchitectural modifications to introduce modular redundancy within a system leads to significant area increase and performance degradation. To mitigate the overheads of traditional radiation hardening and modular redundancy approaches, we present a novel Hybrid Modular Redundancy (HMR) approach, a redundancy scheme that features a cluster of RISC-V processors with a flexible on-demand dual-core and triple-core lockstep grouping of computing cores with runtime split-lock capabilities. Further, we propose two recovery approaches, software-based and hardware-based, trading off performance and area overhead. Running at 430MHz, our fault-tolerant cluster achieves up to 1160MOPS on a matrix multiplication benchmark when configured in non-redundant mode and 617 and 414 MOPS in dual and triple mode, respectively. A software-based recovery in triple mode requires 363 clock cycles and occupies 0.612 mm 2 , representing a 1.3% area overhead over a non-redundant 12-core RISC-V cluster. As a high-performance alternative, a new hardware-based method provides rapid fault recovery in just 24 clock cycles and occupies 0.660 mm 2 , namely ∼ 9.4% area overhead over the baseline non-redundant RISC-V cluster. The cluster is also enhanced with split-lock capabilities to enter one of the available redundant modes with minimum performance loss, allowing execution of a mission-critical portion of code when in independent mode, or a performance section when in a reliability mode, with <400 clock cycles overhead for entry and exit. The proposed system is the first to integrate these functionalities on an open-source RISC-V-based compute device, enabling finely tunable reliability vs. performance trade-offs.

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence,Control and Optimization,Computer Networks and Communications,Hardware and Architecture,Human-Computer Interaction

Reference59 articles.

1. Software-only based Diverse Redundancy for ASIL-D Automotive Applications on Embedded HPC Platforms

2. Radiation hardness of FDSOI and FinFET technologies

3. Development of a NOEL-V RISC-V SoC Targeting Space Applications

4. LEON Processor Devices for Space Missions: First 20 Years of LEON in Space

5. ARM. 2011. Cortex-R5 Technical Reference Manual. https://developer.arm.com/documentation/ddi0460/d Revision: r1p2. ARM. 2011. Cortex-R5 Technical Reference Manual. https://developer.arm.com/documentation/ddi0460/d Revision: r1p2.

Cited by 1 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3