Research on Real-Time Task Scheduling and Load-Balancing Algorithms in Fault-Tolerant Systems

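English title: Real-Time Scheduling and Load Balancing Algorithms in Fault Tolerant Systems
Author: Wang Jian
Degree: Doctoral (PhD)
Discipline: Computer Application Technology
Keywords (Chinese): fault tolerance ; real-time ; distributed ; software fault-tolerance model ; primary/backup fault-tolerance model ; scheduling algorithm ; load balancing
Keywords (English): fault-tolerant system ; real-time system ; distributed computing ; deadline mechanism ; primary-backup ; scheduling algorithm ; load-balancing algorithm
Degree year: 2009
Supervisors: Wang Shenkang ; Sun Jianling
Discipline code: 081203
Degree-granting institution: Zhejiang University
Submission date: 2009-04-01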

Abstract

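Fault-tolerant systems serve as critical control systems. They are widely used in defense, aerospace, nuclear reactor control, telecommunications, process control, medicine and other fields, and fault-tolerant computing has become an important discipline within computer science and technology. In recent years real-time and distributed applications have become widespread. As a result, a new trend has emerged in fault-tolerant systems. The system must still mask faults. It must also schedule its critical tasks so that they complete correctly and on time, and it must remain load-balanced both before and after a fault occurs. Meeting these requirements extends the use of fault-tolerant systems into real-time and distributed computing and improves resource utilization and performance.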
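     This dissertation studies real-time task scheduling and load-balancing algorithms in fault-tolerant systems in depth. Most existing real-time scheduling algorithms for fault-tolerant systems target hardware faults and rarely consider software faults at run time. Those that do target hardware faults require excessive hardware redundancy. To address these problems, the dissertation proposes two kinds of algorithm. The first is a set of partial-preemptive real-time scheduling algorithms for the software fault-tolerance model. The second is an efficient real-time scheduling algorithm for the primary/backup fault-tolerance model. No general-purpose load-balancing algorithm suitable for distributed fault-tolerant systems currently exists. The dissertation therefore also proposes a general load-balancing algorithm for the primary/backup model. It applies this algorithm to a global centralized equity crossing system running in a distributed fault-tolerant environment.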
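     In summary, the main contributions of this dissertation are as follows:
     1) For the software fault-tolerance model, two partial-preemptive scheduling algorithms, RMPPA and EDFPPA, are proposed to handle run-time software faults in hard real-time systems. They achieve scheduling performance close to that of earlier algorithms. Under certain conditions they also greatly reduce the number of preemptions and hence the system's run-time overhead.
     2) For the primary/backup fault-tolerance model, an efficient algorithm for scheduling under hardware faults in hard real-time systems, TPFTRM, is proposed. TPFTRM makes maximal use of backup overlapping and deallocation to reduce hardware redundancy. It also partitions the task set and groups the processors for scheduling. This makes the algorithm easy to understand and implement and reduces the time needed to compute a schedule.
     3) For the primary/backup fault-tolerance model, a static load-balancing algorithm, RSA, is proposed. RSA allocates the set of processes to processors according to the loads of each task's primary and backup versions. The processors therefore stay load-balanced both before and after a fault.
     4) RSA is applied to a global centralized equity crossing system built on a distributed data-partitioning model, improving its load-balancing capability.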
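Item 1) measures the benefit of partial preemption by the number of preemptions. For reference only, the sketch below is a plain unit-time simulator of fully preemptive EDF that counts preemptions and deadline misses. It is not EDFPPA or RMPPA, whose rules are not given here. It rests on several illustrative assumptions:

- execution times and periods are integers;
- all tasks are released together at time 0;
- each deadline equals its period;
- a job still unfinished at its deadline is dropped.

The function name `edf_preemptions` is hypothetical.

```python
from typing import List, Optional, Tuple


def edf_preemptions(tasks: List[Tuple[int, int]], horizon: int) -> Tuple[int, int]:
    """Simulate fully preemptive EDF over [0, horizon) in unit time slots.

    tasks: (C, T) pairs, i.e. integer WCET and period. Deadline == period and
    synchronous release at t = 0 are illustrative assumptions.
    Returns (preemptions, deadline_misses). A job still unfinished at its
    deadline counts as a miss and is dropped when the next job is released.
    """
    n = len(tasks)
    remaining = [0] * n  # work left in each task's current job
    deadline = [0] * n   # absolute deadline of each task's current job
    release = [0] * n    # release time of each task's current job
    prev: Optional[Tuple[int, int]] = None  # (task, release) of last slot's job
    preemptions = misses = 0
    for t in range(horizon):
        for i, (c, p) in enumerate(tasks):
            if t % p == 0:            # a new job of task i is released
                if remaining[i] > 0:  # the previous job missed its deadline
                    misses += 1
                remaining[i], deadline[i], release[i] = c, t + p, t
        ready = [i for i in range(n) if remaining[i] > 0]
        if not ready:
            prev = None
            continue
        # Earliest absolute deadline first; ties go to the lower task index.
        cur = min(ready, key=lambda i: (deadline[i], i))
        if prev is not None:
            j, rel = prev
            # The job that ran last slot is unfinished but loses the CPU.
            if j != cur and release[j] == rel and remaining[j] > 0:
                preemptions += 1
        remaining[cur] -= 1
        prev = (cur, release[cur])
    # Jobs whose deadline falls within the horizon but which never finished.
    misses += sum(1 for i in range(n) if remaining[i] > 0 and deadline[i] <= horizon)
    return preemptions, misses
```

For example, `edf_preemptions([(1, 3), (4, 6)], 6)` returns `(1, 0)`. The single preemption happens at t = 3. It occurs only because the deadline tie is broken in favor of task 0. A rule that let the running job keep the processor on ties would avoid it, which hints at the kind of saving that restricting preemption can bring.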
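Item 2) attributes part of TPFTRM's saving in hardware redundancy to backup overlapping. The sketch below illustrates the general idea of backup overloading at the level of processor utilization. It is not TPFTRM itself. It assumes that at most one processor fails at a time and that each task has exactly one backup on a different processor. Under these assumptions, only the backups whose primaries sit on the failed processor ever run. A processor therefore needs to reserve only the largest such group, not the sum of all the backups it hosts. Backup deallocation is not modeled, and the names are hypothetical.

```python
from typing import List, Tuple

# One replicated task: (primary processor, backup processor, backup utilization).
Replica = Tuple[int, int, float]


def backup_reservation(tasks: List[Replica], m: int) -> Tuple[List[float], List[float]]:
    """Return per-processor backup capacity (without, with) overloading.

    Without overloading, processor q reserves every backup it hosts.
    With overloading under a single-failure assumption, the backups activated
    on q when processor f fails are exactly those whose primary is on f, so q
    only needs the maximum of those per-f sums.
    """
    plain = [0.0] * m
    by_failure = [[0.0] * m for _ in range(m)]  # by_failure[q][f]
    for primary, backup, u in tasks:
        if primary == backup:
            raise ValueError("primary and backup must be on different processors")
        plain[backup] += u
        by_failure[backup][primary] += u
    overloaded = [max(row) for row in by_failure]
    return plain, overloaded
```

Consider the tasks `[(0, 2, 0.2), (1, 2, 0.3), (0, 2, 0.1)]` on three processors. Without overloading, processor 2 reserves about 0.6. With overloading it needs only about 0.3, because the backups for processors 0 and 1 are never activated together.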

Fault-tolerant systems now play safety-critical roles in many areas, such as defense, aerospace, nuclear reactor control, communications, industrial process control and medicine. Fault-tolerant computing has also become an important field of computer science. In recent years real-time and distributed computing have developed rapidly. A new trend in fault-tolerant systems is therefore to ensure two things: critical tasks are executed on time, and the load stays balanced both with and without faults. This extends fault-tolerant applications into real-time and distributed computing and improves the resource utilization and performance of fault-tolerant systems.
     Accordingly, this dissertation studies real-time scheduling and load-balancing algorithms in fault-tolerant systems. Almost all existing fault-tolerant scheduling algorithms for real-time systems are designed to handle hardware faults, and few take possible software faults into account. This dissertation therefore proposes two partial-preemptive algorithms for the deadline mechanism, which provides software fault tolerance in hard real-time systems. Moreover, none of the previously proposed fault-tolerant scheduling algorithms performs well in all cases as the upper bound on individual task load varies. The dissertation therefore proposes an efficient scheduling algorithm that extends the uniprocessor rate-monotonic (RM) algorithm to the primary-backup model to provide fault tolerance. In addition, it proposes a general load-balancing algorithm for primary-backup fault-tolerant systems. Finally, it reports on a global equity crossing fault-tolerant system as a case study demonstrating the practical benefits of the load-balancing algorithm.
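The algorithm described above extends uniprocessor RM to the primary-backup model. Partitioned RM-based schemes of this kind generally need a per-processor admission test. The sketch below shows two standard tests:

- the Liu and Layland utilization bound n(2^(1/n) - 1), which is sufficient but not necessary;
- exact response-time analysis.

It is not the dissertation's own test. Tasks are assumed to be periodic, with integer parameters and deadlines equal to their periods. The function names are illustrative.

```python
import math
from typing import List, Tuple

Task = Tuple[int, int]  # (C, T): worst-case execution time and period; deadline == T


def liu_layland_bound(n: int) -> float:
    """Sufficient RM utilization bound n * (2^(1/n) - 1) (Liu and Layland, 1973)."""
    return n * (2 ** (1.0 / n) - 1)


def rm_response_times(tasks: List[Task]) -> List[float]:
    """Worst-case response times under RM, returned in priority order.

    A shorter period means a higher priority. Each response time solves
    R = C_i + sum_{j < i} ceil(R / T_j) * C_j by fixed-point iteration;
    math.inf marks a task whose response time exceeds its period.
    """
    ordered = sorted(tasks, key=lambda task: task[1])
    result: List[float] = []
    for i, (c_i, t_i) in enumerate(ordered):
        r = c_i
        while True:
            # Interference from every higher-priority job released in [0, r).
            r_next = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j in ordered[:i])
            if r_next > t_i:
                r = math.inf
                break
            if r_next == r:
                break
            r = r_next
        result.append(r)
    return result


def rm_schedulable(tasks: List[Task]) -> bool:
    """Exact RM test: every task's response time is within its period."""
    return all(math.isfinite(r) for r in rm_response_times(tasks))
```

Consider the task set `[(1, 4), (2, 6), (3, 12)]`. Its utilization is about 0.833, above the three-task bound of about 0.780. The bound therefore cannot confirm that the set is schedulable. Yet the response times `[1, 3, 10]` all fall within their periods, so the exact test admits the set.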
     The contributions of this dissertation are summarized as follows:
     1) Two partial-preemptive scheduling algorithms, EDFPPA and RMPPA, are proposed for the deadline mechanism, which provides software fault tolerance in hard real-time periodic task systems. Extensive simulation results show that EDFPPA and RMPPA both achieve scheduling performance similar to that of well-known existing algorithms. They also cause dramatically fewer preemptions than previous algorithms, which reduces the negative effects of preemption, such as run-time overhead.
     2) An efficient scheduling algorithm, TPFTRM, is proposed for primary-backup fault-tolerant systems. Compared with previous scheduling algorithms in this area, TPFTRM makes maximal use of backup overbooking and deallocation, thereby reducing hardware redundancy. Moreover, TPFTRM introduces task-partitioning and processor-grouping techniques. These reduce the time needed to compute a schedule and make the algorithm easy to understand and implement.
     3) A load-balancing algorithm, RSA, is proposed for primary-backup fault-tolerant systems. Compared with previous work in this area, RSA achieves better load balance regardless of how many backup processes each primary process has.
     4) A global equity crossing fault-tolerant system that integrates RSA with other mechanisms is described as a case study demonstrating the practical benefits of the load-balancing algorithm.
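Item 3) says RSA keeps processors balanced before and after a failure. The sketch below is only a hypothetical greedy heuristic in that spirit, not RSA. It uses the following model:

- each process has one primary and one passive backup;
- the primary carries load w, and the backup carries a smaller load b before failover;
- if the primary's processor fails, the backup is promoted and carries w.

The heuristic works in three steps. It places each primary on the currently least-loaded processor. It places each backup where that backup would carry the least load if the primary's processor failed. It then evaluates the per-processor loads for any single failure. The one-backup-per-primary model, the function names, and the requirement of at least two processors are all assumptions.

```python
from typing import List, Tuple

Proc = Tuple[float, float]  # (w, b): primary load, passive-backup load before failover


def greedy_allocate(procs: List[Proc], m: int) -> List[Tuple[int, int]]:
    """Return (primary processor, backup processor) for each process; needs m >= 2."""
    pre = [0.0] * m                        # load with no failures
    delta = [[0.0] * m for _ in range(m)]  # delta[f][q]: extra load on q if f fails
    alloc: List[Tuple[int, int]] = [(0, 0)] * len(procs)
    # Place heavy processes first (the stable sort keeps input order on ties).
    for i in sorted(range(len(procs)), key=lambda k: procs[k][0], reverse=True):
        w, b = procs[i]
        p = min(range(m), key=lambda k: pre[k])
        # Backup host: k would carry pre[k] + delta[p][k] + w if p failed;
        # w is the same for every k, so minimize pre[k] + delta[p][k].
        q = min((k for k in range(m) if k != p), key=lambda k: pre[k] + delta[p][k])
        pre[p] += w
        pre[q] += b
        delta[p][q] += w - b
        alloc[i] = (p, q)
    return alloc


def loads(alloc: List[Tuple[int, int]], procs: List[Proc], m: int,
          failed: int = -1) -> List[float]:
    """Per-processor load; `failed` names a crashed processor (-1 for none)."""
    load = [0.0] * m
    for (p, q), (w, b) in zip(alloc, procs):
        if p == failed:
            load[q] += w      # backup promoted: it now carries the full load
        else:
            load[p] += w
            if q != failed:
                load[q] += b  # backup still passive; lost if its host failed
    return load
```

Take the processes `[(0.4, 0.1), (0.4, 0.1), (0.3, 0.05), (0.3, 0.05)]` on three processors. The allocation is `[(0, 1), (2, 1), (1, 0), (2, 0)]`. The failure-free loads are about `[0.5, 0.5, 0.7]`, and `loads(..., failed=2)` gives about `[0.75, 0.8, 0.0]`. Comparing the failure-free maximum with the worst maximum over every possible `failed` value gives the two balance figures that item 3) refers to.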
