Design, Implementation, and Reliability Evaluation of a High-Reliability 8051
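Abstract
Over the past 20 years, as computer technology has come into widespread use, many applications have required computers to run stably and reliably over long periods, so the reliability of the microprocessor, the core of a computer system, has drawn wide attention. Radiation and electromagnetic interference are currently the main causes of microprocessor failure, and the impact of the single event effects they induce on microprocessor reliability is a focus of current research on high-reliability microprocessor design.
     Among single event effects, a single event upset (SEU) does not damage the logic circuit, but it can change the state of signals in the circuit, disrupting circuit operation and causing faults. SEUs are incidental, sudden, and random, and they have therefore become the main target of protection when high-reliability microprocessors are hardened against single event effects.
     An SEU can cause different faults in the functional units of a microprocessor and lead to different processor failures. Because the functional units operate on different principles, different reliability enhancement techniques are applied to them.
     This thesis first analyzes the environments in which single event effects arise and the mechanisms that produce them. It then discusses their impact on microprocessors, in particular on sequential and combinational circuits.
     Registers in a microprocessor are prone to faults when hit by an SEU, and triple modular redundancy (TMR) can be used to harden them. In conventional TMR, however, the three redundant registers may sample a faulty value at the same instant, which leaves the register in error. This thesis applies an enhanced space-time TMR technique to harden registers, improving the reliability of sequential circuits while also increasing the fault tolerance of combinational circuits. The enhanced technique combines temporal and spatial redundancy: it builds on ordinary space-time TMR, which is used to harden non-feedback circuits, and adds double-edge-triggered registers for hardening feedback circuits.
     For the microprocessor's ALU, a Berger code checker was added to HR8051 to monitor its operations. The checker uses the functional mapping relationships of the arithmetic operations to detect whether an operation has gone wrong. An EDAC (error detection and correction) unit was added to the memory and the register file to detect and correct errors during reads and writes. Control flow checking together with context saving and restoration is used to harden the control unit, and a safe state machine is used to harden the state machine that controls MDU operations.
     Fault injection was performed on the hardened microprocessor HR8051 to analyze its behavior in the presence of faults and the effectiveness of each reliability enhancement technique. The results show that when the fault duration does not exceed the phase difference among the three clocks, space-time TMR masks SEUs on combinational logic and clock lines well. They also show that moderately increasing the phase difference improves the effectiveness of space-time TMR up to an optimum, beyond which the hardening effect declines. When the fault duration exceeds the phase difference, two of the clocks sample the faulty value, which in feedback circuits leads to a long-lasting fault state.
     Finally, the thesis describes how the SystemVerilog assertion mechanism is applied to fault checking and combined with fault injection to examine, at the system level, the effect of the reliability enhancement techniques on circuit reliability. Using the fault injection results and the concrete implementation of HR8051, a Markov analysis, under certain assumptions and simplifications about the behavior of HR8051 under SEUs, is used to analyze the reliability of a hot-standby system.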
In recent years, the use of computer systems has increased rapidly, and most applications require them to work steadily and reliably. This has raised serious concern about the reliability of the microprocessor, the heart of the computer system. Radiation and electromagnetic interference are two typical causes of microprocessor faults, and the Single Event Effects (SEE) they induce are the focus of current high-reliability microprocessor design techniques.
     A Single Event Upset (SEU), one kind of SEE, does not damage the processor's circuitry, but it can change the logic state of the circuit, so that the circuit works incorrectly and failures follow. SEUs are transient and occur randomly, so they have become the main concern of SEE mitigation in high-reliability microprocessor design. An SEU can cause different kinds of faults and lead to microprocessor malfunction. The functional units of a microprocessor operate on different principles, so different reliability-improving techniques are used to harden them.
     First, this paper analyzes the environment in which single event effects arise and the mechanisms that produce them, and then how they affect a microprocessor, especially its sequential and combinational logic circuits.
     Registers in a microprocessor malfunction easily when hit by an SEU. Triple Modular Redundancy (TMR) can improve their reliability, but in traditional TMR the three registers may sample the same faulty value at the same time, leaving the register in error. Enhanced ST-TMR (EST-TMR), which extends the Space-Time TMR (ST-TMR) technique with double-edge-triggered registers, is used to improve the fault tolerance of both combinational and sequential logic circuits.
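The voting idea behind TMR, and the weakness of the traditional scheme described above, can be illustrated with a minimal Python sketch. It is a behavioral model only, not the HR8051 design: the names `majority_vote`, `flip_bit`, and `GOOD` and the 8-bit register value are assumptions made for illustration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three redundant register copies."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model an SEU as a single flipped bit."""
    return word ^ (1 << bit)


GOOD = 0x5A  # hypothetical correct register value

# Case 1: an SEU corrupts one stored copy; the voter outvotes it.
masked = majority_vote(GOOD, flip_bit(GOOD, 3), GOOD)

# Case 2: a transient on the shared data input is sampled by all three
# copies on the same clock edge, so the voter sees three identical wrong
# values and cannot mask the fault.
glitched_input = flip_bit(GOOD, 3)
unmasked = majority_vote(glitched_input, glitched_input, glitched_input)

print(hex(masked), hex(unmasked))
```

The second case is the one that motivates adding temporal redundancy: the three copies must sample at different instants for the vote to help against a transient on shared logic.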
     For the ALU, a Berger code detector is added to the microprocessor to monitor its operation. The detector uses the functional mapping relationships of the arithmetic and logic operations to detect errors during operation. An EDAC error detector and corrector is implemented to improve the reliability of the register file and memory during reads and writes. Control flow checking and context saving and restoring are used to harden the control unit, and a safe state machine is added to harden the state register that controls the Multiply and Division Unit (MDU).
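As an illustration of how a Berger code check can follow an arithmetic operation, the sketch below uses one commonly described relationship for addition. The Berger check symbol of a word is its number of 0 bits, and for an n-bit ripple-carry add the check symbol of the sum can be predicted from the operands' check symbols and the carries. Whether HR8051 uses exactly this relationship is not stated in the abstract; the function names and the 8-bit width are assumptions.

```python
def berger_check(value: int, width: int) -> int:
    """Berger check symbol: the number of 0 bits in a width-bit word."""
    return width - bin(value & ((1 << width) - 1)).count("1")


def ripple_add(x: int, y: int, cin: int, width: int):
    """Ripple-carry add. Returns (sum, carries); bit i of `carries` is
    the carry out of bit position i."""
    s, carries, c = 0, 0, cin
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        s |= (xi ^ yi ^ c) << i
        c = (xi & yi) | (xi & c) | (yi & c)
        carries |= c << i
    return s, carries


def predicted_sum_check(x: int, y: int, cin: int, carries: int, width: int) -> int:
    """Predict the sum's check symbol without looking at the sum:
    B(S) = B(X) + B(Y) - n - cin + ones(carries) + cout."""
    cout = (carries >> (width - 1)) & 1
    return (berger_check(x, width) + berger_check(y, width)
            - width - cin + bin(carries).count("1") + cout)


def add_is_consistent(x, y, cin, observed_sum, carries, width=8) -> bool:
    """True when the observed sum agrees with the predicted check symbol."""
    return berger_check(observed_sum, width) == predicted_sum_check(
        x, y, cin, carries, width)


s, carries = ripple_add(0x3C, 0x5A, 0, 8)
print(add_is_consistent(0x3C, 0x5A, 0, s, carries))         # fault-free result
print(add_is_consistent(0x3C, 0x5A, 0, s ^ 0x04, carries))  # one flipped sum bit
```

A single flipped bit in the sum always changes its count of 0 bits, so the comparison fails and the fault is flagged. The sketch only detects errors in the sum output; it does not correct them and does not model faults inside the carry chain.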
     Fault injection is applied to measure how effective the reliability-improving techniques are and how faults affect microprocessor operation. The results indicate that when the fault duration is no longer than the phase difference among the three clocks, enhanced ST-TMR masks almost all SEUs in combinational logic and on clock lines. Slightly enlarging the phase difference improves the effect, but only up to an optimum value; beyond it, the effect degrades. When the fault duration is longer than the phase difference, two registers sample the faulty value, which keeps a feedback circuit in a fault state for a long time.
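The relationship between fault duration and clock phase difference can be seen in a deliberately simplified timing model, sketched below. It assumes three registers that sample the same signal at times 0, d, and 2d (d being the phase difference), a single transient that inverts the signal over a half-open interval, and integer time units. These choices and the function names are assumptions; the model does not cover feedback paths, double-edge-triggered registers, or the optimum phase difference reported above.

```python
def voted_sample(correct_bit: int, glitch_start: int,
                 glitch_width: int, phase_diff: int) -> int:
    """Sample one bit with three phase-shifted clocks and vote 2-of-3.

    The transient inverts the signal during
    [glitch_start, glitch_start + glitch_width).
    """
    samples = []
    for t in (0, phase_diff, 2 * phase_diff):
        hit = glitch_start <= t < glitch_start + glitch_width
        samples.append(correct_bit ^ 1 if hit else correct_bit)
    return 1 if sum(samples) >= 2 else 0


def always_masked(glitch_width: int, phase_diff: int) -> bool:
    """Check every glitch start time that could overlap a sampling instant."""
    starts = range(-glitch_width, 2 * phase_diff + 1)
    return all(voted_sample(1, s, glitch_width, phase_diff) == 1
               for s in starts)


print(always_masked(glitch_width=4, phase_diff=5))  # shorter than d
print(always_masked(glitch_width=5, phase_diff=5))  # equal to d
print(always_masked(glitch_width=6, phase_diff=5))  # longer than d
```

In this model a transient no longer than the phase difference can cover at most one sampling instant, so the vote always recovers the correct bit; a longer transient can cover two instants and defeat the voter.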
     Finally, this paper applies SystemVerilog assertions to fault checking and combines them with fault injection to check the effects of the reliability-improving techniques on the circuits. A Markov analysis, based on the fault injection results and the implementation of HR8051 and on some simplifying assumptions about its behavior under SEUs, is used to analyze the reliability of a hot backup system.
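To show what a Markov reliability analysis of a hot backup system can look like, the sketch below uses a textbook three-state model rather than the thesis's own: two identical units with constant failure rate `lam`, a probability `coverage` that a failure is detected and the backup takes over, and no repair. The states are "both units good", "one unit good", and "system failed". All parameter values and function names are hypothetical.

```python
import math


def hot_standby_reliability(lam: float, coverage: float, t: float) -> float:
    """Closed-form R(t) = P(both good) + P(one good) for the model above."""
    p_both = math.exp(-2 * lam * t)
    p_one = 2 * coverage * (math.exp(-lam * t) - math.exp(-2 * lam * t))
    return p_both + p_one


def hot_standby_reliability_numeric(lam: float, coverage: float, t: float,
                                    steps: int = 100_000) -> float:
    """Integrate the same Markov state equations with explicit Euler steps:
    dP2/dt = -2*lam*P2
    dP1/dt =  2*lam*coverage*P2 - lam*P1
    """
    dt = t / steps
    p_both, p_one = 1.0, 0.0
    for _ in range(steps):
        d_both = -2 * lam * p_both
        d_one = 2 * lam * coverage * p_both - lam * p_one
        p_both += d_both * dt
        p_one += d_one * dt
    return p_both + p_one


lam, coverage, t = 1e-3, 0.95, 1000.0  # hypothetical values
print(hot_standby_reliability(lam, coverage, t))
print(hot_standby_reliability_numeric(lam, coverage, t))
print(math.exp(-lam * t))  # a single unit without backup, for comparison
```

In a study like the one described, the coverage parameter is the kind of quantity that fault injection results could supply, since it reflects how often an injected fault is detected and handled.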
