Overcoming Hard-Faults in High-Performance Microprocessors.

Details

Author: Ansari, Amin
Degree: Doctorate
Year: 2011
Advisor: Mahlke, Scott
Institution: University of Michigan
ISBN: 9781124896892
CBH: 3476321
Country: USA
Language: English
FileSize: 7225200
Pages: 168

Abstract

As device density grows, each transistor gets smaller and more fragile, leading to an overall higher susceptibility to hard-faults. These hard-faults result in permanent silicon defects and impact the manufacturing yield, performance, and lifetime of semiconductor devices. In this thesis, we propose comprehensive, low-cost solutions to tackle reliability problems in high-performance microprocessors. These microprocessors mainly consist of on-chip caches and the core pipeline. We first present two flexible cache architectures, ZerehCache and Archipelago, to protect regular SRAM structures against high failure rates. ZerehCache virtually reorganizes the cache data array using a permutation network to provide higher degrees of freedom for spare allocation. In order to study the impact of fault patterns on the redundancy requirements in a cache, we propose a methodology to model the collision patterns in caches as a graph problem. Given this model, a graph coloring scheme is employed to minimize the amount of additional redundancy required for protecting the cache. Archipelago targets failures in the near-threshold region. It resizes the cache to provide redundancy for repairing faulty cells. Furthermore, a near-optimal minimum clique covering configuration algorithm is introduced to minimize the cache capacity loss. With proper solutions in place for caches, a robust and heterogeneous core coupling execution scheme, Necromancer, is presented to protect the general core area against hard-faults. Although a faulty core cannot be trusted, we observe that for most defects, execution traces on a defective core coarsely resemble those of fault-free executions. Necromancer exploits a functionally dead core to improve system throughput by supplying hints regarding high-level program behavior. We partition the cores into multiple groups. Each group shares a lightweight core that can be substantially accelerated. However, due to the presence of defects, a perfect data or instruction stream cannot be provided by the dead core. This necessitates employing a low-cost recovery mechanism and generic hints that are more resilient to local abnormalities.
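
The graph formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual algorithm: the fault-map layout, the rule that two cache lines collide when they share a faulty chunk position, and the names `build_collision_graph` and `greedy_coloring` are all assumptions made for illustration. Under those assumptions, lines that receive the same color never collide and could share one spare line, so the number of colors approximates the redundancy required.

```python
from itertools import combinations


def build_collision_graph(fault_maps):
    """Build an undirected graph from hypothetical per-line fault maps.

    fault_maps maps a line id to the set of faulty chunk positions in that
    line. Assumption: two lines collide (get an edge) when they have a
    fault at the same position, so they cannot share one spare line.
    """
    graph = {line: set() for line in fault_maps}
    for a, b in combinations(fault_maps, 2):
        if fault_maps[a] & fault_maps[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_coloring(graph):
    """Greedy coloring, highest-degree nodes first (a heuristic, not optimal)."""
    colors = {}
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {colors[nb] for nb in graph[node] if nb in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


# Hypothetical fault maps: only lines containing at least one fault appear.
fault_maps = {
    "line0": {1},
    "line1": {1, 4},
    "line2": {4},
    "line3": {7},
}

graph = build_collision_graph(fault_maps)
colors = greedy_coloring(graph)
spares_needed = len(set(colors.values()))
print(colors, spares_needed)
```

Greedy coloring is used here only because it is short; it gives an upper bound on the number of colors, whereas the scheme described in the abstract aims to minimize it.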
