Software-based permanent fault recovery techniques using inherent hardware redundancy

2007


Abstract (summary)

Recent advances in deep submicron (DSM) technology have adversely affected the long-term lifetime reliability of semiconductor devices. According to the reliability report from the International Technology Roadmap for Semiconductors (ITRS), smaller feature sizes and higher power densities make DSM devices more susceptible to wear-out failures. As a consequence, permanent faults are more likely to occur in DSM devices at runtime. To ensure system reliability and availability, fault-tolerant techniques must be applied to overcome these runtime permanent faults. For systems requiring non-stop computation, full duplication of system hardware components is usually required, which incurs a high hardware cost. For systems that allow a short period of downtime, however, low-cost software techniques that take advantage of the inherent hardware redundancy of computing devices, such as Field Programmable Gate Arrays (FPGAs) and Very Long Instruction Word (VLIW) processors, can potentially be applied as an intermediate fault recovery step. These techniques reconfigure the computation on a faulty device to keep the system operating until the faulty device can be replaced.

To maintain correct computation on a faulty device, operations originally assigned to faulty resources must be moved to fault-free device resources. This process requires two phases: a testing phase to locate faults and a recovery phase to eliminate the usage of faulty resources in the computation. In this dissertation, we present software techniques that address specific testing and recovery challenges for FPGAs and VLIW processors.

For FPGAs, we focus on testing for and recovering from path delay faults. A path delay fault occurs when a permanent fault causes the delay of at least one critical path to exceed the maximum allowable system delay. To locate paths with delay faults, a built-in self-test (BIST) approach is presented that evaluates all combinations of signal transitions along critical paths. To recover from path delay faults, a timing-driven incremental router is used to reroute the paths affected by the faults. To speed up fault recovery, information from the initial design route is used to guide the rerouting process. Since many embedded systems have limited local computational resources, a network-based recovery system has been developed. A server with greater computational capacity performs the FPGA fault recovery and sends the results back to the affected client, completing the recovery process. Experiments on the recovery system have shown that the incremental router provides a speedup of up to 12x compared with a commercial incremental flow.
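
The definition of a path delay fault given above can be illustrated with a minimal Python sketch. This is not the dissertation's BIST hardware or router. The function names, path names, and delay values are hypothetical, chosen only to show what "all combinations of signal transitions" and "exceeds the maximum allowable delay" mean in practice.

```python
from itertools import product


def transition_pairs(num_inputs):
    """Yield every (initial, final) input-vector pair in which at least one input toggles.

    Hypothetical helper: a software stand-in for the stimulus a delay test applies.
    """
    vectors = list(product((0, 1), repeat=num_inputs))
    for initial in vectors:
        for final in vectors:
            if initial != final:
                yield initial, final


def find_delay_faults(measured_delay_ns, max_delay_ns):
    """Return the names of paths whose measured delay exceeds the allowable delay."""
    return sorted(
        name for name, delay in measured_delay_ns.items() if delay > max_delay_ns
    )


# Illustrative values only; not taken from the dissertation.
pairs = list(transition_pairs(2))
delays = {"path_a": 9.2, "path_b": 10.7, "path_c": 8.1}
faulty = find_delay_faults(delays, max_delay_ns=10.0)
```

In this sketch only the paths reported in `faulty` would be handed to a rerouting step, which mirrors the incremental idea of leaving unaffected routes in place.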

For VLIW processors, we focus on recovering from permanent faults in registers. To maintain VLIW functionality after detecting faulty registers, programs must be recompiled to assign variables to fault-free registers. One issue with recompilation is possible performance loss due to increased register requirements. To address this problem, a register pressure control technique is presented to reduce register requirements. To demonstrate its advantages, the technique has been integrated into an academic VLIW compiler. Experimental results have shown that the technique improves performance by 14% compared with an academic VLIW flow.
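
The effect of faulty registers on recompilation can be sketched as follows. This is a simple greedy interference-graph allocator used as a stand-in, not the register pressure control technique presented in the dissertation. The function name, variable names, and register numbering are assumptions made for illustration.

```python
def assign_registers(interference, num_registers, faulty):
    """Greedily map variables to fault-free registers.

    interference: dict mapping each variable to the set of variables live at the same time.
    Returns (assignment, spilled), where spilled lists variables left without a register.
    """
    usable = [r for r in range(num_registers) if r not in faulty]
    assignment, spilled = {}, []
    # Visit the most constrained variables first (highest interference degree).
    for var in sorted(interference, key=lambda v: (-len(interference[v]), v)):
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in usable if r not in taken]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)
    return assignment, spilled


# Hypothetical interference graph: a, b, and c are mutually live; d overlaps only a.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
healthy_assignment, healthy_spills = assign_registers(graph, 4, faulty=set())
degraded_assignment, degraded_spills = assign_registers(graph, 4, faulty={0, 1})
```

With all four registers available nothing is spilled. With registers 0 and 1 marked faulty, one variable no longer fits, which is the kind of performance loss that reducing register requirements is meant to avoid.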

Indexing (details)

Subject: Electrical engineering; Computer science
Classification: 0544: Electrical engineering; 0984: Computer science
Identifier / keyword: Applied sciences; Fault recovery; Hardware redundancy; Software
Title: Software-based permanent fault recovery techniques using inherent hardware redundancy
Author: Xu, Weifeng
Source: DAI-B 68/11, Dissertation Abstracts International
Place of publication: Ann Arbor
Country of publication: United States
Advisor: Tessier, Russell
Committee member: Fu, Kevin; Menon, Premachandran; Wolf, Tilman
University/institution: University of Massachusetts Amherst
Department: Electrical & Computer Engineering
University location: United States -- Massachusetts
Source type: Dissertations & Theses
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.