Research on the Design and Verification Techniques of Dissimilar Fault-Tolerant Computer Systems
Abstract
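Fault-tolerant computers come in two forms: similar redundancy and dissimilar redundancy. A similar-redundancy computer copies one version of the software onto each of several identical computers. Because every channel runs a copy of the same version, and faults in that version are hard to avoid, replicating the software also replicates its faults. Under some triggering condition, all redundant channels may then produce the same erroneous result.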
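     NVFT is the first dissimilar-redundancy fault-tolerant computer system in China, developed under the author's direction. Drawing on this engineering background, this thesis presents the first systematic study in China of the techniques and topics involved in a dissimilar-redundancy fault-tolerant computer (NVFT) built on the principle of design diversity. The work covers: system design, hardware design, and software design of the dissimilar-redundancy computer; its synchronous/asynchronous design and communication design; and the design of the cross-check points between software versions, the cross-check vectors, and the monitoring and voting algorithms. Finally, a reliability analysis of NVFT is carried out, and the fault injector and flight tests, both of which are very important in verifying and testing fault-tolerant systems, are described.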
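     The basic principle of design diversity adopted in NVFT is to set up several separate design teams that, working from the same specification, independently design, develop, test, and verify the hardware and software of each redundant channel.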
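     Design diversity is generally realized in one of two ways: N-version programming or recovery blocks. Building on the multi-computer fault-tolerant architecture, this thesis proposes a new NR fault-tolerant computer architecture, analyzes it, and makes a first attempt at applying it in NVFT.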
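The abstract does not spell out how the NR architecture is organized. As an illustration only, the sketch below shows one way the two standard techniques can be combined: N versions are run and majority-voted, and if no acceptable majority emerges, the individual results are checked against an acceptance test in recovery-block fashion. The function `nr_execute` and its parameters are hypothetical names, not the thesis's design.

```python
from collections import Counter


def nr_execute(versions, acceptance_test, x):
    """Hypothetical hybrid of N-version voting and a recovery-block fallback.

    versions: independently written callables implementing the same spec.
    acceptance_test: callable (input, result) -> bool, a plausibility check.
    Results must be hashable so they can be counted.
    """
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            continue  # a crashed version contributes no result

    # N-version step: accept a strict majority that also passes the test.
    if results:
        value, count = Counter(results).most_common(1)[0]
        if count * 2 > len(versions) and acceptance_test(x, value):
            return value

    # Recovery-block step: take the first result that passes the test.
    for result in results:
        if acceptance_test(x, result):
            return result
    raise RuntimeError("no version produced an acceptable result")
```

With three versions of an absolute-value routine, one of them faulty, the majority path returns the correct value; if the versions all disagree, the acceptance test decides.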
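     The key to specification design is ensuring its correctness and completeness. The specification for a dissimilar-redundancy computer must state explicitly the requirements that diverse programming depends on, such as the cross-check points and cross-check vectors.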
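     A dissimilar fault-tolerant system can run synchronously or asynchronously. This thesis analyzes the characteristics of both modes in detail and proposes a time-tag mechanism that compares and monitors input variables across channels under asynchronous operation.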
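The details of the time-tag mechanism are not given in the abstract. A minimal sketch of the general idea, assuming each channel stamps its input samples against a common time base: two samples are compared only when their tags are close enough, so that skew between asynchronous channels is not mistaken for a fault. `TaggedSample`, `compare_inputs`, `max_skew`, and `tolerance` are hypothetical names and parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedSample:
    channel: str
    time_tag: float  # seconds on a shared time base (an assumption)
    value: float


def compare_inputs(a, b, max_skew=0.005, tolerance=0.5):
    """Compare two channels' input samples under asynchronous operation.

    Returns "skip" when the samples were taken too far apart in time to be
    comparable, otherwise "agree" or "disagree" depending on whether the
    values differ by more than the tolerance.
    """
    if abs(a.time_tag - b.time_tag) > max_skew:
        return "skip"
    if abs(a.value - b.value) <= tolerance:
        return "agree"
    return "disagree"
```

A monitor built this way would typically count consecutive "disagree" results before declaring a channel's input faulty; that persistence logic is left out here.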
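     Cross-check voting is one of the key design elements of a dissimilar-redundancy computer system, for example the placement of cross-check points and the definition of voting vectors. The cross-check voting design is embodied in the flight-control software we developed.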
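The voting algorithm used in NVFT is not described in the abstract. As a sketch of what voting on a cross-check vector can look like, the example below uses mid-value selection, a common choice for numeric flight-control signals: at a cross-check point each channel submits a vector of values, the voted vector is the element-wise median, and any channel that strays from the voted value by more than a tolerance is flagged. The name `vote_vector` and the single shared tolerance are assumptions.

```python
from statistics import median


def vote_vector(vectors, tolerance):
    """Element-wise mid-value vote over the channels' cross-check vectors.

    vectors: dict mapping channel name -> list of floats, all equal length.
    Returns (voted_vector, suspects), where suspects is the sorted list of
    channels with at least one element further than `tolerance` from the
    voted value.
    """
    names = list(vectors)
    length = len(vectors[names[0]])
    if any(len(v) != length for v in vectors.values()):
        raise ValueError("all cross-check vectors must have the same length")

    voted = [median(vectors[n][i] for n in names) for i in range(length)]
    suspects = sorted(
        n for n in names
        if any(abs(vectors[n][i] - voted[i]) > tolerance for i in range(length))
    )
    return voted, suspects
```

Because dissimilar versions may legitimately produce slightly different numeric results, an inexact comparison with a tolerance is used here rather than a bit-for-bit match.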
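     Reliability and performance analysis of the dissimilar fault-tolerant system is one of the main focuses of this research. The thesis proposes analyzing the reliability of the NVFT system with fault trees and Markov processes, and derives reliability figures for NVFT from our experimental data.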
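The NVFT model and its data are not reproduced in the abstract. To show what a Markov reliability analysis involves, the sketch below solves the textbook model of a three-channel system with 2-out-of-3 voting, under simplifying assumptions that are not taken from the thesis: independent channel failures at a constant rate, no repair, and a perfect voter. The states are "three channels good", "two channels good", and "failed"; the function names are hypothetical.

```python
import math


def tmr_reliability(failure_rate, mission_time, steps=100_000):
    """Integrate the Markov state equations with forward Euler steps.

    p3: probability that all three channels are good.
    p2: probability that exactly two channels are good.
    The system works while at least two channels are good.
    """
    lam = failure_rate
    dt = mission_time / steps
    p3, p2 = 1.0, 0.0
    for _ in range(steps):
        d3 = -3 * lam * p3                 # any of three channels fails
        d2 = 3 * lam * p3 - 2 * lam * p2   # enter from p3, leave on second failure
        p3 += d3 * dt
        p2 += d2 * dt
    return p3 + p2


def tmr_closed_form(failure_rate, mission_time):
    """Closed-form solution of the same model: 3e^(-2λt) - 2e^(-3λt)."""
    lam, t = failure_rate, mission_time
    return 3 * math.exp(-2 * lam * t) - 2 * math.exp(-3 * lam * t)
```

A model of a dissimilar system would also need states and rates for correlated software faults across versions, which is where experimental data comes in.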
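     Fault injection is a powerful tool for studying fault-tolerant systems. Software fault injection is comparatively hard to implement. This thesis proposes a new approach that implements software fault injection on top of a hardware fault injector, which is of great significance for verifying fault-tolerant software.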
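The thesis's injector is hardware-based and is not described in enough detail here to reproduce. The purely software sketch below only illustrates the purpose of fault injection in verification: a fault is deliberately planted in one channel's data (here a single bit flip in a memory image) and the test checks whether the voter masks it. `flip_bit` and `majority_vote` are hypothetical helpers, and exact byte-for-byte voting is a simplification.

```python
from collections import Counter


def flip_bit(memory, byte_offset, bit):
    """Inject a single-bit fault into a mutable memory image (bytearray)."""
    if not 0 <= bit < 8:
        raise ValueError("bit must be in the range 0..7")
    memory[byte_offset] ^= 1 << bit


def majority_vote(images):
    """Return the memory image held by a strict majority, or None."""
    winner, count = Counter(bytes(img) for img in images).most_common(1)[0]
    if count * 2 <= len(images):
        return None
    return winner


# Three channels hold the same two-byte result; corrupt one of them.
channels = [bytearray(b"\x12\x34") for _ in range(3)]
flip_bit(channels[1], 0, 3)
masked = majority_vote(channels)  # the two healthy channels outvote the fault
```

Injecting the same fault into two channels at once makes the vote return the corrupted value, which mirrors the common-mode failure that dissimilar redundancy is meant to prevent.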
Fault-tolerant computers can be divided into two categories: similar redundant and dissimilar redundant computers. A similar redundant computer runs a copy of the same software on each redundant computer. Because every computer uses the same copy, and errors in that software are very hard to avoid, the same error is copied into each computer. If an error is triggered under some condition, the software in all the computers fails in the same way at the same time.
     NVFT (N Version Fault Tolerance) is the first dissimilar computer system in China; its development was directed by the author. Based on that engineering practice, this paper summarizes the principles and research behind the project. The work includes: the dissimilar redundant computer architecture, hardware and software design, synchronous and asynchronous operation, communication design, the design of software cross-check points and voting vectors, and the design of the software voting and monitoring algorithms. A software reliability analysis using our experimental data is also presented. The concepts and methods of fault insertion and flight testing of the dissimilar computer are described.
     The diversity design principle is to use multiple design teams to independently design the individual redundant computers according to the same specification. Combining the advantages of N-version programming and recovery blocks, this paper presents a new fault-tolerant structure, the NR fault-tolerant computer architecture. Its analysis and implementation are also given.
     There are several key points in developing an N-version programming software redundancy system:
     1. The system specification must be correct, complete, and state well-defined requirements. It should specify the cross-check points and cross-check vectors in detail.
     2. The fault-tolerant system can operate synchronously or asynchronously. For suppressing common noise signals, asynchronous operation is the better choice. This paper gives a time-tag mechanism to monitor input signals.
     3. In the control-law design of the project, the three design teams use different algorithms to implement the specification of the cross-check points and voting vectors.
     4. Software reliability analysis is an open-ended subject. This paper uses fault trees and Markov chains, based on real experimental data, to analyze the reliability of the NVFT system.
     5. For the verification of a fault-tolerant system, a fault inserter was developed in our project, and a new idea for software fault injection is presented. This is very important for demonstrating the system.
