Online low-cost defect tolerance solutions for microprocessor designs
One of the major driving forces of the semiconductor industry is the continuous scaling of silicon process technology. Over the last four decades, the move to a new silicon technology every few years has delivered smaller and faster transistors that made possible the development of high-performance microprocessors. This technological achievement fueled the widespread adoption of microprocessor-based products in applications that touch every aspect of our lives. However, many device experts warn that continued scaling of transistors to smaller dimensions will inevitably result in silicon technologies that are much less reliable than current ones. Microprocessors manufactured in future silicon technologies will likely experience failures due to silicon defects. In the absence of any viable alternative technology, the future success of the semiconductor industry will depend on cost-effective mechanisms that tolerate silicon defects while the microprocessor is in operation.
This thesis focuses on the development of defect-tolerance techniques that provide low-cost mechanisms to protect a microprocessor from silicon defects. These novel solutions represent a new way of thinking in the field of defect-tolerant design. In particular, traditional approaches to defect-tolerant design saddle a system with redundant components that continuously verify computation. In contrast, the proposed BulletProof approach provides low-cost periodic hardware checking. Furthermore, to lower the cost of hardware checking, the ACE framework shifts the silicon defect detection process from hardware to software. This thesis also makes the case that the hardware resources of the ACE framework can be used for other applications, which adds value and eases its adoption in future-generation microprocessors. Finally, this thesis presents CrashTest, a novel FPGA-based framework used to assess the threats and the reliability requirements of a microprocessor.
Altogether, the solutions presented in this thesis provide a cost-effective defect-tolerance framework that makes possible the development of reliable microprocessors using unreliable silicon technologies. This enables the continuation of silicon scaling into smaller but possibly less reliable transistors, a key requirement for the development of the next generation of microprocessors and the extension of microprocessor-based products into new applications.