Affiliation:
1. Princeton University, Princeton, NJ
2. Intel Corporation, Hudson, MA
Abstract
Traditional fault-tolerance techniques typically use resources ineffectively because they cannot adapt to the changing reliability and performance demands of a system. This paper proposes software-controlled fault tolerance, a concept that allows designers and users to tailor performance and reliability to each situation. Several software-controllable fault-detection techniques are then presented: SWIFT, a software-only technique, and CRAFT, a suite of hybrid hardware/software techniques. Finally, the paper introduces PROFiT, a technique that adjusts the level of protection and performance at fine granularities through software control. When coupled with software-controllable techniques like SWIFT and CRAFT, PROFiT offers attractive and novel reliability options.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by
62 articles.