Research on Fault-Tolerant Design and Load-Balancing Algorithms for Real-Time Distributed Systems
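Abstract
It is widely accepted by researchers in both the real-time and fault-tolerance communities that real-time systems need fault-tolerance capabilities. Meanwhile, the broad application prospects of distributed processing and distributed control, the large-scale development and use of distributed software and hardware systems, and the modularity, parallelism, and autonomy of distributed systems have prompted researchers to devote more effort to distributed fault-tolerant systems.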
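     Taking an electric power SCADA system as its setting, this thesis introduces the concepts of real-time, fault-tolerant, and distributed systems and describes a distributed processing environment built on PCs and Ethernet. Following the requirements of a fault-tolerance model for real-time systems, it designs a distributed PC-based fault-tolerant system and implements a dual-machine fault-tolerant system based on Windows multithreading, justifying the design from both the serial-port and the network side. It describes in detail how the dual front-end processors and dual servers in the cluster are implemented, presents a fully peer-to-peer model based on network broadcast and a dual-network model, and designs several other fault-tolerance measures. To give the distributed real-time system a unified time, a unified time service or time server must be established; an algorithm for obtaining the time correction value is proposed to meet practical needs and is implemented in the system using Socket programming. The real-time scheduling algorithms used in the system are also discussed.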
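The abstract does not give the thesis's time-correction algorithm, so the following is only a minimal sketch of one common way to derive a correction value from a time server over a socket: a round-trip estimate that assumes the network delay is the same in both directions. The names `correction` and `query_correction`, the `TIME?` request, and the 8-byte reply format are all hypothetical and are not taken from the thesis.

```python
import socket
import struct
import time


def correction(t_send, t_server, t_recv):
    """Return the value to add to the local clock.

    Assumption: the request and the reply take equally long, so the
    server's timestamp is half a round trip old when it arrives.
    """
    round_trip = t_recv - t_send
    return t_server + round_trip / 2.0 - t_recv


def query_correction(host, port, timeout=1.0):
    """Ask a hypothetical UDP time server for its clock and return the correction.

    The server is assumed to answer any datagram with its current time
    packed as one big-endian double (seconds since the epoch).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t_send = time.time()
        sock.sendto(b"TIME?", (host, port))
        payload, _ = sock.recvfrom(8)
        t_recv = time.time()
    (t_server,) = struct.unpack("!d", payload)
    return correction(t_send, t_server, t_recv)
```

For instance, if a request leaves at local time 100.00 s, the server stamps 105.02 s, and the reply arrives at local time 100.04 s, the estimated correction is about +5.00 s. A real deployment would likely repeat the query and keep the sample with the shortest round trip, since that sample bounds the error most tightly.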
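     The thesis studies in depth the load-balancing problem in real-time distributed fault-tolerant design. It proposes a load-balancing model for the system, discusses fault tolerance and load balancing of service objects, and designs several load-balancing algorithms. For the adaptive high-availability load-balancing algorithm, the degree of redundancy is determined with a genetic algorithm. The algorithms are evaluated through simulation and testing. Load balancing ensures that the resources of the redundant system are used effectively.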
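The abstract names the load-balancing algorithms without describing them, so the sketch below is not the thesis's method. It only illustrates how fault tolerance and load balancing can be combined over redundant service replicas: each request goes to the least-loaded replica that is still alive, so failed replicas are skipped and the remaining redundancy is put to use. `Replica`, `pick_replica`, and `dispatch` are hypothetical names, and "load" is assumed to be a simple numeric measure such as queued requests.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    load: float          # assumed load metric, e.g. number of queued requests
    alive: bool = True   # cleared by some failure detector (not shown)


def pick_replica(replicas):
    """Return the live replica with the smallest load (first one wins ties)."""
    live = [r for r in replicas if r.alive]
    if not live:
        raise RuntimeError("no live replica available")
    return min(live, key=lambda r: r.load)


def dispatch(replicas, cost=1.0):
    """Assign one request of the given cost and return the chosen replica's name."""
    target = pick_replica(replicas)
    target.load += cost
    return target.name
```

With replicas A (load 2), B (load 1), and a failed C (load 0), the first request goes to B even though C looks idle, and the next goes to A once A and B are tied.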
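     The fault-tolerant design and load-balancing algorithms studied here help a real-time distributed system meet its requirements for high reliability, high availability, and real-time performance, and have broad application prospects.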
The view that real-time systems need fault tolerance has been widely accepted by researchers in the fields of real-time systems and fault tolerance. At the same time, the extensive application prospects of distributed processing and control, the large-scale development and use of a variety of distributed software and hardware systems, and the distributed system's advantages of modularity, parallelism, and autonomy also drive people to spend more energy on studying distributed fault-tolerant systems.
     Using an electric power SCADA system as the context, the thesis introduces the relevant concepts of real-time systems, fault tolerance, and distributed systems, and describes a distributed processing environment based on PCs and Ethernet. In line with the requirements of the real-time fault-tolerance model, it designs a distributed PC fault-tolerant system. The thesis presents an approach to dual-machine fault tolerance based on Windows multithreading and shows how to realize it over both the serial port and the network port. It discusses the implementation details of the duplicated front-end computers and duplicated servers in the cluster system, and presents a fully peer-to-peer model based on network broadcast as well as a dual-network model. It also designs several other fault-tolerance measures. To unify time across the distributed real-time system, a unified time service or time server must be established. The thesis provides an algorithm for obtaining the time correction value to meet practical needs and implements it in the electric power system using Socket programming. Some real-time scheduling algorithms are also discussed.
     To meet the load-balancing requirements of real-time distributed fault-tolerant design, the thesis carries out thorough research, proposes the system's load-balancing model, discusses fault tolerance and load balancing of service objects, and designs several load-balancing algorithms. A genetic algorithm is used to determine the redundancy in the adaptive high-availability load-balancing algorithm. Some of the algorithms are simulated and tested. Load balancing guarantees effective utilization of the redundant system's resources.
     The fault-tolerant design and the study of load balancing help the system meet its requirements for high reliability, high availability, and real-time performance, and have extensive application prospects.
