Affiliation:
1. Duke University and IBM Systems and Technology Group, Durham, NC
2. Duke University, Durham, NC
Abstract
We develop a microprocessor design that tolerates hard faults, including fabrication defects and in-field faults, by leveraging existing microprocessor redundancy. To do this, we must: detect and correct errors, diagnose hard faults at the field deconfigurable unit (FDU) granularity, and deconfigure FDUs with hard faults. In our reliable microprocessor design, we use DIVA dynamic verification to detect and correct errors. Our new scheme for diagnosing hard faults tracks instructions' core structure occupancy from decode until commit. If a DIVA checker detects an error in an instruction, it increments a small saturating error counter for every FDU used by that instruction, including that DIVA checker. A hard fault in an FDU quickly leads to an above-threshold error counter for that FDU and thus diagnoses the fault. For deconfiguration, we use previously developed schemes for functional units and buffers and present a scheme for deconfiguring DIVA checkers. Experimental results show that our reliable microprocessor quickly and accurately diagnoses each hard fault that is injected and continues to function, albeit with somewhat degraded performance.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture,Information Systems,Software
Reference38 articles.
1. Software Optimization Guide for AMD64 Processors;AMD.;Publication 25112, Rev.,2005
2. SimpleScalar: an infrastructure for computer system modeling
3. Static electromigration analysis for on-chip signal interconnects
4. The microarchitecture of the Intel Pentium 4 processor on 90nm technology;Boggs D.;Intel Technology Journal,2004
Cited by
14 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献