Research on Software Architecture-Based Dynamic Configuration of Fault-Tolerant Mechanisms
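Abstract
Software-implemented fault tolerance is one of the main methods for ensuring software availability and reliability. It detects errors in functional components at runtime and restores erroneous states to normal, so that the failure of a single component does not prevent the whole software system from providing correct service to users. An inherent characteristic of fault-tolerant techniques is that they are specific to a particular software system: a fault-tolerant technique must match the system's fault assumptions, application domain, execution environment, system characteristics, and similar factors. This specificity limits how well fault-tolerant techniques can adapt to changes in environment and requirements. As a new form of software on the Internet, Internetware runs in an open, dynamic, and changing environment and is built from heterogeneous components, including third-party ones. This "openness" of the execution environment and of the constituent units gives Internetware's runtime behavior a certain degree of "variability". When fault-tolerant techniques are used in Internetware, the conflict between Internetware's variability and the limited adaptability of fault-tolerant techniques becomes prominent.
     To keep Internetware highly available and reliable when changes in the external execution environment or upgrades of internal components alter the software's fault assumptions, fault-tolerance requirements, or application-specific constraints, one feasible approach is to add fault-tolerant capability at runtime, as needed, to components that lack it, or to adjust the capability they already have (removing it or replacing it with another fault-tolerant technique). This thesis calls such adjustment the dynamic configuration of fault-tolerant mechanisms and, following this idea, studies two key problems that dynamic configuration of fault-tolerant mechanisms for Internetware must solve: (1) how to clearly separate the functional part of the software from the fault-tolerant part and characterize the relationship between them, so that dynamic configuration of fault-tolerant mechanisms becomes possible; (2) how to ensure the correctness and effectiveness of the result of dynamic configuration.
     To solve these problems, this thesis establishes a software architecture-based technical framework for the dynamic configuration of fault-tolerant mechanisms. Its main features and contributions are as follows:
     (1) It interprets the dynamic configuration of fault-tolerant mechanisms from the perspective of software architecture, and specifies each fault-tolerant mechanism applicable to Internetware as an architectural style that supports fault tolerance (a fault-tolerant style). A fault-tolerant style makes explicit the structure and behavior of a fault-tolerant mechanism and its effect on application components, and serves as the core knowledge in the dynamic configuration process. A formal model of fault-tolerant architecture is also given to support verification of dynamic configuration results.
     (2) It proposes a model checking-based method for selecting fault-tolerant styles, which solves the problem of choosing a suitable fault-tolerant style for Internetware. The basic idea is to abstract each fault-tolerant style as a computational model for model checking, and to abstract the fault-tolerance requirements and application-specific constraints as fault-tolerant properties. By automatically checking whether the computational model of each style satisfies the given properties, the method finds the fault-tolerant styles that meet the given requirements without violating the application-specific constraints.
     (3) It proposes a method for automatically generating fault-tolerant configurations based on model merging. From the dependencies among application components, the method determines the set of components affected by the dynamic configuration of fault-tolerant mechanisms; by comparing the elements of a fault-tolerant style instance with those of the application architecture, it obtains the matching between the two; and, according to this matching, it uses model transformation to merge the style instance with the application architecture automatically, generating the fault-tolerant configuration. This automated merging helps ensure that the fault-tolerant mechanism is applied correctly, and the merged result can be used directly for effectiveness verification.
     (4) It designs and implements a middleware support framework for the dynamic configuration of fault-tolerant mechanisms. In this framework, the component container is extended into a fault-tolerant "sandbox" that serves as the basic unit of fault-tolerance management. Interceptor combinations corresponding to different fault-tolerant mechanisms are added to the sandbox at runtime, and perform dynamic configuration and fault-tolerant processing under the control of the fault-tolerance management service. Through the runtime software architecture, the fault-tolerant configuration generated in the architecture planning phase can guide dynamic adjustment at the middleware layer. The framework hides fault-tolerance details from applications, provides transparent fault-tolerance support and transparent dynamic configuration of fault-tolerant mechanisms, and is well suited to current mainstream middleware.
     The JEE application ECperf serves as a running case study throughout the thesis and demonstrates the effectiveness of the main methods.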
Software-implemented fault tolerance (FT) is an effective way to achieve high availability and reliability. Tolerating a software fault takes two successive steps: error detection identifies the presence of an error, and recovery transfers the abnormal state back to a normal one. The effectiveness of a fault-tolerant mechanism depends on how well it fits its application context, including the fault assumption, application domain, and execution environment. This dependence makes fault-tolerant mechanisms inflexible when the environment or user requirements change. As a kind of Internet-scale software, Internetware runs in an open and dynamic environment and consists of various components, including third-party ones. Because of this openness, its behavior may change continuously. When fault-tolerant mechanisms are applied to Internetware, a formerly effective mechanism may well stop working once the fault assumption, fault-tolerant requirements, or application-specific constraints change because of changes in the execution environment or component upgrades.
     Reconfiguring fault-tolerant mechanisms for Internetware at runtime is a promising way to maintain high availability and reliability as conditions change. The reconfiguration includes adding a fault-tolerant mechanism to a non-fault-tolerant component, removing an existing mechanism from a component, or switching a component from one mechanism to another. This thesis focuses on this problem of dynamic reconfiguration of fault-tolerant mechanisms for Internetware.
     There are two major challenges in solving the problem. The first is to separate the software's functional parts clearly from its fault-tolerant parts and to specify the relationships between the two explicitly. Otherwise, it is hard to modify the fault-tolerant parts without harming the functional parts. The second is to ensure the correctness and effectiveness of the dynamic reconfiguration of fault-tolerant mechanisms. In this thesis, we present a Software Architecture (SA)-based approach that addresses both challenges.
     First, to describe the relationship between fault-tolerant mechanisms and application components, we specify the fault-tolerant mechanisms suitable for Internetware as special architectural styles, called fault-tolerant styles, which explicitly capture each mechanism's structural characteristics, behavioral characteristics, and interactions with application components. The available fault-tolerant styles are also classified and organized for reuse. In addition, a formal model of fault-tolerant SA is given to enable validation of the reconfiguration. The formal model covers the fault-tolerant styles and the dependencies among components, and forms the theoretical foundation of the dynamic reconfiguration of fault-tolerant mechanisms.
     Second, to select the most suitable of several fault-tolerant styles, we use model checking to obtain solid evidence. As a preprocessing step, the behavioral models of the fault-tolerant styles are automatically translated into a model checker's verification model, and the fault-tolerant requirements and application-specific constraints are specified as fault-tolerant properties. Model checking then verifies whether each candidate style satisfies the required properties. The properties and constraints that a style satisfies are the evidence for the selection.
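     The selection step described above can be illustrated with a minimal Python sketch. It does not reproduce the thesis's tool chain: instead of translating styles into a real model checker's input, it assumes each candidate style's behavior is a tiny hand-written state graph and checks two hypothetical properties by explicit reachability. The style names, state names, and property functions are all illustrative assumptions.

```python
from collections import deque

def reachable(transitions, start):
    """Return every state reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def always_recoverable(model):
    """Hypothetical FT requirement: 'normal' stays reachable from every reachable state."""
    states = reachable(model["transitions"], model["initial"])
    return all("normal" in reachable(model["transitions"], s) for s in states)

def never_fails(model):
    """Hypothetical application constraint: the 'failed' state is unreachable."""
    return "failed" not in reachable(model["transitions"], model["initial"])

# Toy behavioral models of three candidate styles (assumed, not taken from the thesis).
STYLES = {
    "no_ft": {
        "initial": "normal",
        "transitions": {"normal": ["error"], "error": ["failed"]},
    },
    "retry": {
        "initial": "normal",
        "transitions": {"normal": ["error"], "error": ["normal", "failed"]},
    },
    "primary_backup": {
        "initial": "normal",
        "transitions": {"normal": ["error"], "error": ["switching"],
                        "switching": ["normal"]},
    },
}

def select_styles(styles, properties):
    """Return the names of the styles whose model satisfies every property."""
    return sorted(name for name, model in styles.items()
                  if all(prop(model) for prop in properties))
```

     Calling `select_styles(STYLES, [always_recoverable, never_fails])` keeps only the candidates that pass both checks; the passed checks play the role of the "evidence" mentioned in the text.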
     Third, to avoid human mistakes in dynamic reconfiguration and to ease the maintainers' burden, we automatically generate the reconfiguration operations for the middleware. As the first step, the set of application components to be modified is identified automatically, with the help of the dependency information provided by SA. Then the elements of a fault-tolerant style instance are matched against those of the application by comparing the style instance with the application's architecture. Finally, a model transformation technique automatically merges the style instance with the application architecture and generates the desired fault-tolerant configuration. The merged fault-tolerant SA can be verified for its effectiveness and correctness.
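     The three steps above (scope identification, matching, merging) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the thesis's model transformation: an architecture is assumed to be a set of component names plus a set of (source, destination) connectors, a style instance is assumed to mark its protected role with the placeholder `"<protected>"`, and all component names are hypothetical.

```python
def affected_components(depends_on, target):
    """Return `target` plus every component that depends on it, directly or transitively."""
    affected = {target}
    changed = True
    while changed:
        changed = False
        for component, dependencies in depends_on.items():
            if component not in affected and affected & set(dependencies):
                affected.add(component)
                changed = True
    return affected

def merge_style(architecture, style_instance, target):
    """Merge a style instance into an architecture around `target`.

    Matching: the style's "<protected>" role is bound to `target`.
    Merging: connectors that pointed at `target` are rerouted to the style's
    entry element, and the style's own components and connectors are added.
    """
    binding = {"<protected>": target}
    entry = style_instance["entry"]
    connectors = set()
    for source, destination in architecture["connectors"]:
        connectors.add((source, entry if destination == target else destination))
    for source, destination in style_instance["connectors"]:
        connectors.add((binding.get(source, source), binding.get(destination, destination)))
    return {
        "components": set(architecture["components"]) | set(style_instance["components"]),
        "connectors": connectors,
    }

# Hypothetical application architecture and style instance.
DEPENDS_ON = {"Client": ["OrderService"], "OrderService": ["Inventory"],
              "Report": ["Billing"], "Billing": [], "Inventory": []}
ARCHITECTURE = {
    "components": {"Client", "OrderService", "Inventory", "Report", "Billing"},
    "connectors": {("Client", "OrderService"), ("OrderService", "Inventory"),
                   ("Report", "Billing")},
}
RETRY_STYLE = {
    "components": {"RetryProxy"},
    "connectors": {("RetryProxy", "<protected>")},
    "entry": "RetryProxy",
}

MERGED = merge_style(ARCHITECTURE, RETRY_STYLE, "Inventory")
```

     In this toy example, `affected_components(DEPENDS_ON, "Inventory")` yields the components whose bindings may need attention, and `MERGED` is the fault-tolerant configuration in which calls to `Inventory` pass through `RetryProxy`. Treating transitive dependents as affected is a conservative assumption made for the sketch.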
     Finally, we present a framework that supports dynamically reconfigurable FT in middleware. It consists of a fault-tolerant sandbox design and an FT management service. The fault-tolerant sandbox extends a generic component container and acts as the unit of reconfiguration. Different fault-tolerant styles are implemented as different combinations of container interceptors in the sandboxes. The sandbox supports dynamic loading and unloading of interceptors, which implements the dynamic reconfiguration of fault-tolerant mechanisms. The FT management service coordinates dynamic reconfiguration and recovery. With the help of Runtime Software Architecture, the fault-tolerant configuration generated by the SA-level planning above is mapped to a sequence of middleware operations and executed by the framework. The framework provides transparent FT and dynamically reconfigurable FT for applications, and has been applied to PKUAS and JBoss.
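     The sandbox idea can be illustrated with a minimal Python sketch of a container that loads and unloads interceptors at runtime. It is an assumption-laden toy, not the PKUAS or JBoss interceptor API: the class names `FaultTolerantSandbox` and `RetryInterceptor`, the `invoke(call_next, *args)` signature, and the flaky test component are all hypothetical.

```python
class RetryInterceptor:
    """Hypothetical interceptor: re-invokes the next handler after a failure."""

    def __init__(self, attempts=3):
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        self.attempts = attempts

    def invoke(self, call_next, *args):
        last_error = None
        for _ in range(self.attempts):
            try:
                return call_next(*args)
            except Exception as exc:  # error detection, simplified to any exception
                last_error = exc
        raise last_error


class FaultTolerantSandbox:
    """Wraps one component; interceptors can be added or removed while it runs."""

    def __init__(self, component):
        self._component = component
        self._interceptors = []

    def load(self, interceptor):
        self._interceptors.append(interceptor)

    def unload(self, interceptor_type):
        self._interceptors = [i for i in self._interceptors
                              if not isinstance(i, interceptor_type)]

    def loaded(self):
        return [type(i).__name__ for i in self._interceptors]

    @staticmethod
    def _wrap(interceptor, call_next):
        return lambda *args: interceptor.invoke(call_next, *args)

    def call(self, *args):
        # Build the chain on each call so that load/unload take effect immediately.
        handler = self._component
        for interceptor in reversed(self._interceptors):
            handler = self._wrap(interceptor, handler)
        return handler(*args)


def make_flaky(failures):
    """Test component that raises `failures` times, then doubles its argument."""
    state = {"left": failures}

    def component(x):
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient fault")
        return x * 2

    return component
```

     Adding a mechanism corresponds to `load`, removing one to `unload`, and switching to an `unload` followed by a `load`. The application component itself is never modified, which is the transparency property the text describes.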
     A JEE application, ECperf, is used as a case study throughout the thesis and shows the effectiveness of the proposed approach.
