Research on Software Error Detection Techniques for Space Robots
Abstract
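To cope with the harsh environment of outer space, the hardware and software of aerospace systems must be fault tolerant against the effects of radiation, and error detection is both the key step and the main difficulty in fault tolerance. In recent years, commercial off-the-shelf (COTS) devices have been used more and more widely in modern aerospace systems. Together with physical constraints such as volume and weight and with cost limits, this makes hardware-implemented error detection very difficult. Software-implemented error detection, which does not depend on hardware, stands out for its generality, low cost, resistance to technology embargoes, and natural fit with COTS devices, and it has become an active research topic.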
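     Studying error detection techniques for aerospace software against the background of a space robot is both general and representative. The space robot is a low-cost, lightweight teleoperated robot that can navigate and fly in the space environment and perform extravehicular tasks in place of astronauts. It is a typical modern aerospace system and shares the typical hardware and software features of such systems. In hardware, it has many computing nodes and makes heavy use of COTS devices that are not radiation hardened. In software, it is component based and distributed, its components come from diverse sources, and it runs a multitasking operating system. Traditional software error detection techniques cannot meet the new requirements raised by these features, so new software error detection techniques are needed.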
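     Based on the characteristics of the space robot software system, this paper first studies the basic principles and the evaluation framework of software error detection. It proposes to study software fault tolerance for the space robot at two levels, the process level and the system level, then to coordinate the two levels with a hybrid model, and finally to verify and evaluate the theory on a prototype system.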
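     Errors inside a process are detected by automatically inserting error detection and correction instructions at compile time. This avoids the cost of inserting instructions by hand and improves efficiency. Automatic insertion, however, must overcome the register allocation constraints that limit existing techniques. To this end, the paper proposes the concept of simulated registers and a method for allocating them. The core idea exploits the uneven use of registers in compiled programs: temporarily unused registers, or memory, simulate registers for the inserted detection and correction instructions. For control-flow errors and data errors, separate detection algorithms based on simulated registers are designed, and the two are then combined and optimized, which improves error detection coverage and detection effectiveness. Experiments show that the method works well and outperforms other existing methods.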
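The simulated-register idea can be pictured as a per-instruction search for a physical register that the original program is not using at that point, with a memory slot as the fallback when none is idle. The following is a minimal sketch under stated assumptions, not the allocation method of the paper: the 8-register machine, the precomputed liveness sets, and the name `allocate_simulated_registers` are all hypothetical.

```python
from typing import List, Set

# Hypothetical machine with eight general-purpose registers (assumption).
REGISTERS = [f"r{i}" for i in range(8)]


def allocate_simulated_registers(live_sets: List[Set[str]], needed: int) -> List[List[str]]:
    """For each instruction, pick `needed` locations for the checking code.

    live_sets[i] holds the registers the original program keeps live at
    instruction i. Idle registers are preferred; memory-backed slots
    ("mem0", "mem1", ...) are used when too few registers are idle.
    """
    plan = []
    for live in live_sets:
        free = [reg for reg in REGISTERS if reg not in live]
        chosen = free[:needed]
        # Fall back to memory slots for whatever could not be placed in registers.
        chosen += [f"mem{k}" for k in range(needed - len(chosen))]
        plan.append(chosen)
    return plan
```

With two live registers the checking code gets two idle registers; when every register is live it is pushed entirely into memory slots, which is slower but keeps the inserted instructions from disturbing the original register allocation.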
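     System-level error detection is approached by first studying a framework model and then studying different algorithms that implement it. The distributed component error detection model (CDEDM) proposed in this paper is a multi-layer error detection model suited to software with a complex structure: distributed, component based, and multi-process. It makes full use of idle system resources to improve system performance, and it is designed to be easy to extend and to cooperate with other techniques. Building on CDEDM, the paper proposes the distributed adaptive redundancy model (DARM). On this basis it proposes an IO error detection algorithm and an error detection algorithm that uses micro-checkpoints. The former is a non-intrusive technique that checks the order and content of IO for correctness; the latter is more flexible, allowing details such as the position and density of micro-checkpoints to be configured. Experiments show that the two can be used together with good results.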
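Checking IO for both order and content can be illustrated by comparing an observed IO trace against a reference trace, for example one produced by a redundant replica. This is only a sketch under that assumption; the record layout `(channel, payload)` and the function `check_io` are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Tuple

# One IO record: (channel name, payload bytes). Layout is an assumption.
IoRecord = Tuple[str, bytes]


def check_io(reference: List[IoRecord], observed: List[IoRecord]) -> Optional[str]:
    """Return None if the traces agree, else a description of the first mismatch."""
    for index, (ref, obs) in enumerate(zip(reference, observed)):
        if ref == obs:
            continue
        if ref[0] != obs[0]:
            # A different channel at this position means the IO sequence changed.
            return f"order error at record {index}: expected {ref[0]}, got {obs[0]}"
        # Same channel, different payload.
        return f"content error at record {index} on {ref[0]}"
    if len(reference) != len(observed):
        return f"length error: expected {len(reference)} records, got {len(observed)}"
    return None
```

Because the check only reads traces that the component already emits, it needs no change to the component itself, which is what makes this style of detection non-intrusive.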
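     Finally, the paper builds a hybrid error detection model (MSEDS) and validates it on a prototype system based on an actual space robot project. The process-level and system-level results described above are brought together in one system, so that software error detection covers a complex piece of software completely. With the help of supporting systems such as fault injection, validation was carried out under conditions simulating the actual space environment, and the prototype behaved as expected: the average fault detection rate reached 93.5%, the average rate of normal operation was 95.5%, and the average error correction rate was 95.1%, at an average cost of only 21% more execution time and about 132.1% more storage.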
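The way such figures are obtained can be sketched as a small fault injection campaign: flip one bit at a time, count how many injections the detector catches, and express overhead relative to an unprotected baseline. The detector below is a plain duplicate-and-compare check chosen for illustration, not the MSEDS mechanism, and all function names and input values are hypothetical.

```python
def flip_bit(value: int, bit: int) -> int:
    """Model a single event upset by inverting one bit of a word."""
    return value ^ (1 << bit)


def run_campaign(value: int, width: int = 32) -> float:
    """Inject every single-bit flip into one copy and return the detection rate."""
    injected = 0
    detected = 0
    for bit in range(width):
        primary = flip_bit(value, bit)  # the fault hits the primary copy
        shadow = value                  # the redundant copy is left untouched
        injected += 1
        if primary != shadow:           # duplicate-and-compare detector
            detected += 1
    return detected / injected


def relative_overhead(baseline: float, hardened: float) -> float:
    """Overhead of the hardened program as a fraction of the baseline cost."""
    return (hardened - baseline) / baseline
```

As an illustrative calculation, a hardened run that takes 121 time units against a baseline of 100 gives `relative_overhead(100.0, 121.0)` of about 0.21, that is, a 21% time overhead. A real campaign would also inject faults into code, control flow, and the redundant copies, which is why measured detection rates stay below 100%.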
To cope with the harsh environment of outer space, the hardware and software of space systems must tolerate radiation-induced faults. Fault tolerance is the basis for the other attributes of high dependability. In recent years, commercial off-the-shelf (COTS) devices have been used more and more widely in modern aerospace systems; together with constraints on size, weight, and cost, this makes hardware-implemented fault tolerance difficult. Software-implemented fault tolerance does not depend on hardware and offers generality, low cost, and a natural fit with COTS devices, so it has become a research hotspot.
     The space robot is a general and representative subject for studying error detection in highly dependable software. It can fly and navigate in outer space, and it is a typical modern aerospace system with many of the characteristic features: in hardware, large-scale use of COTS devices that are not radiation hardened; in software, a component-based, distributed design running on a modern multitasking operating system. Studying the fault tolerance of highly dependable software on a space robot is therefore representative. Traditional techniques cannot meet the requirements of these new features, so new techniques must be studied.
     Considering the characteristics of highly dependable space robot software, this paper examines software-implemented error detection techniques to establish a basic theoretical model and an evaluation framework. The software fault tolerance problem of the space robot is studied at two levels, the process level and the system level. A unified model then coordinates the two levels, and a prototype system is developed for evaluation.
     In this paper, errors inside a process are handled by adding error detection and correction instructions at compile time. To overcome the restrictions imposed by register allocation, the concept of simulated registers is proposed. The core idea exploits the unbalanced use of registers in compiled code: memory or temporarily unused registers serve as registers for the added error detection and correction instructions. Errors caused by single event upsets (SEUs) are divided into control-flow errors and data errors. Control-flow and data error detection algorithms are designed on top of the simulated register allocation algorithm, and the two are combined and optimized to expand error detection coverage and improve detection results. Experiments show that the method is superior to other available methods.
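Control-flow error detection in general can be illustrated by checking each executed transition between basic blocks against the edges of the control-flow graph. The sketch below uses a plain edge-table lookup, a deliberately simplified stand-in that is not the algorithm of this paper; the block names and the function `first_illegal_transition` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def build_legal_edges(cfg: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Flatten a successor map into the set of legal (source, target) edges."""
    return {(src, dst) for src, successors in cfg.items() for dst in successors}


def first_illegal_transition(cfg: Dict[str, List[str]], trace: List[str]) -> Optional[int]:
    """Return the trace index of the first jump that is not a CFG edge, or None."""
    legal = build_legal_edges(cfg)
    for i in range(1, len(trace)):
        if (trace[i - 1], trace[i]) not in legal:
            return i
    return None
```

In an instrumented program the "previous block" value would live in a register reserved for the checking code, which is where the availability of spare registers becomes the limiting factor.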
     System-level error detection proceeds in two steps: first the framework model is studied, then different algorithms that realize it. This paper proposes CDEDM, a model suited to distributed, component-based, multi-process software. It is a hierarchical model that overcomes a limitation of earlier software fault tolerance studies, which could not be applied to complex software systems. The model also makes full use of idle system resources to build the redundant system, which effectively improves system performance, and it has good adaptability and scalability. Based on CDEDM, the paper proposes an IO error detection algorithm and a micro-checkpoint error detection algorithm. IO error detection is a non-invasive technique that checks the order and the content of IO for correctness. Detection based on micro-checkpoints is more flexible: the location, density, and other details of the micro-checkpoints can be configured. Both perform detection asynchronously. Experiments show that the two methods combine well.
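The effect of micro-checkpoint density can be shown with two replicas that execute the same steps and compare a digest of their state every `interval` steps. This is a sketch under simplifying assumptions (integer state, synchronous comparison, two replicas) rather than the algorithm proposed in the paper; `run_with_checkpoints` and its parameters are hypothetical.

```python
import hashlib
from typing import Callable, List, Optional

Step = Callable[[int], int]


def digest(state: int) -> str:
    """Compact fingerprint of a replica's state at a checkpoint."""
    return hashlib.sha256(str(state).encode()).hexdigest()


def run_with_checkpoints(steps_a: List[Step], steps_b: List[Step],
                         initial: int, interval: int) -> Optional[int]:
    """Run two replicas in lockstep; return the step at which a checkpoint
    first reveals diverging state, or None if every checkpoint matches."""
    state_a = state_b = initial
    for index, (step_a, step_b) in enumerate(zip(steps_a, steps_b), start=1):
        state_a = step_a(state_a)
        state_b = step_b(state_b)
        # Compare only at checkpoint positions, set by the chosen density.
        if index % interval == 0 and digest(state_a) != digest(state_b):
            return index
    return None
```

If a fault corrupts the third of six steps in one replica, an interval of 3 reports it at step 3 while an interval of 2 reports it at step 4: denser checkpoints shorten detection latency at the price of more comparisons.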
     Finally, a hybrid model, MSEDS, and a prototype system are established against the background of a space robot to verify the process-level and system-level results. MSEDS brings the process-level and system-level results together in one system to achieve full error detection coverage of the software. A fault injection system helps verify the results: faults are injected into the system under conditions that simulate the actual space environment. The prototype system performs in line with expectations. The average fault detection rate is about 93.5 percent, the average rate of normal operation is about 95.5 percent, and the average error correction rate is about 95.1 percent, at a cost of about 21 percent time overhead and about 132.1 percent space overhead.
