Research on Grid Service Reliability Modeling and Task Scheduling Optimization
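Abstract
As manufactured products grow more complex in function and structure, product design demands ever more computing and storage capacity, and a single computer can no longer meet the needs of modern product design. Grid technology lets people obtain far greater computing power and data storage capacity over the Internet and also lets multiple computers work cooperatively. With grid technology, many product design problems that once seemed impossible can be solved. However, because grid systems are complex, grids still face many reliability problems, which has kept them from truly penetrating the manufacturing sector and bringing about a major change in the industry's mode of operation.
     As an important attribute for measuring grid quality of service, grid service reliability reflects, from the user's point of view, the ability of a grid system to provide service, so how to analyze and improve it has attracted great attention from researchers at home and abroad. This dissertation combines grid fault-tolerance techniques with reliability analysis methods to study grid service reliability modeling under fault-tolerance mechanisms and task scheduling optimization that accounts for grid service reliability, as a foundational exploration toward fundamentally improving grid system reliability. First, a node failure recovery mechanism is introduced, and grid service reliability modeling that accounts for failure recovery and local task arrivals is studied. Second, on the basis of the resulting models, a task scheduling optimization method centered on grid service reliability is studied. Finally, for the resource pricing problem in grid task scheduling, the concept of grid resource compensation is proposed, yielding a resource pricing scheme under a market mechanism and thereby creating favorable conditions for grids to enter everyday production and life.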
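     The main research results of this dissertation are as follows:
     (1) Grid service reliability modeling considering failure recovery
     To improve grid service reliability, a local failure recovery mechanism is introduced and, taking the effect of software failures into account, a grid service reliability model that considers the failure recovery capability of nodes is given. To make the model more practical, resource owners are allowed to adjust the number of failure recoveries and the grid task lifetime according to the state of their resources; on this basis, grid service reliability modeling under failure recovery limits is studied. The model offers an effective way to address the low reliability of "large grid services".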
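     (2) Grid task reliability of manufacturing resources in a manufacturing grid
     Manufacturing resources are autonomous, heterogeneous, and dynamic. Besides completing tasks assigned by the manufacturing grid, they also handle the work tasks of their local administrative domain. Under a local-task-priority policy in particular, local task arrivals and failures can seriously affect whether a resource completes a grid task within the specified time. To address this, Petri net techniques are used to analyze the states of the grid task execution process. On this basis, the grid task reliability under the local-task-priority policy is obtained by Monte Carlo simulation, and the effects of factors such as the local task arrival rate and the local task execution rate on grid task reliability are analyzed, providing a basis for the grid resource management system to schedule grid tasks better.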
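     (3) Grid task scheduling optimization under the failure recovery mechanism
     On the basis of the grid service reliability model, a multi-objective task scheduling optimization problem that maximizes grid service reliability and minimizes execution cost is studied, and an ant colony algorithm is used to solve the model. To improve grid service reliability, a redundant scheduling mode for grid tasks is adopted and a redundant scheduling optimization model under a cost constraint is built. The model is solved with a genetic algorithm, for which a dedicated correction factor is designed to handle the resource constraint and keep the algorithm running properly. Simulation results verify the effectiveness of the algorithm.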
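     (4) Grid resource compensation based on market value
     Grid resource management under a market mechanism is an important means of relieving the shortage of grid resources. An in-depth analysis of why grid resources are currently scarce leads to the conclusion that grid users must not only pay a certain resource fee but also compensate the resource for the loss of local task execution it gives up by executing grid tasks. The average resource revenue under two scheduling policies is analyzed and, based on the microeconomic concept of opportunity cost, an expression for the minimum resource compensation under a grid task time limit is given; on this basis, a resource pricing model under a market mechanism is proposed. Monte Carlo simulation is used to simulate the expected income of the two task scheduling policies, yielding concrete values of the minimum compensation, and the effects of grid task characteristics and resource characteristics on the minimum compensation are analyzed. Grid resource compensation gives resource owners an incentive to join the grid, thereby attracting more resources to it.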
As manufactured products grow in functional and structural complexity, product design requires more computing and storage capacity than a single computer can provide. With the emergence of grid technology, people can not only obtain more powerful computing and data storage capacity over the Internet but also use multiple computers working together. Grid technology can tackle large-scale, difficult problems that would be infeasible to solve with the computing resources of a single organization. However, owing to the complexity of grid systems, many problems remain unsolved, grid reliability among them, so the grid has not yet been widely adopted in the manufacturing industry or brought about major changes in the industry's mode of operation.
     As an important measure of quality of service, grid service reliability reflects, from the user's point of view, a grid's capacity to provide reliable services. How to analyze and improve grid service reliability has attracted considerable research attention. This dissertation combines grid service reliability analysis with fault tolerance to study grid service reliability modeling and reliability-oriented optimization of grid task scheduling, laying groundwork for fundamentally improving grid service reliability. First, a fault recovery mechanism for grid nodes is introduced, and the modeling of grid service reliability considering fault recovery and local task arrivals is studied. Second, based on the proposed model, an optimization model of grid task scheduling is presented to maximize grid service reliability. Finally, for the crucial problem of resource pricing in grid task scheduling, a fair pricing model for a market-oriented environment is presented, which can help the grid spread into society.
     The contributions of this dissertation are summarized as follows:
     (1) Grid service reliability model with fault recovery
     To improve grid service reliability, a fault recovery mechanism in local grid nodes is introduced. Considering fault recovery and software failures, a grid service reliability model is proposed. To make fault recovery more practical, constraints on fault recovery, namely on the lifetimes of subtasks and on the number of recoveries performed, are introduced, and grid service reliability models under these constraints are developed. The proposed models offer an efficient remedy for the low reliability of time-consuming tasks.
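     The abstract does not give the model's equations, so the following is only a minimal Monte Carlo sketch of how the two recovery constraints named above (a cap on the number of recoveries and a subtask lifetime) might interact. Everything specific here is an assumption for illustration: exponentially distributed times to failure and recovery, work that resumes where it stopped after each recovery, and the hypothetical names `subtask_completes` and `estimate_reliability`.

```python
import random


def subtask_completes(work, fail_rate, repair_rate, max_recoveries, lifetime, rng):
    """Simulate one subtask run; return True if it finishes within `lifetime`."""
    clock = 0.0        # elapsed wall-clock time
    remaining = work   # processing time still needed
    recoveries = 0
    while True:
        # Assumption: failures arrive at a constant rate while the subtask runs.
        time_to_failure = rng.expovariate(fail_rate)
        if time_to_failure >= remaining:
            clock += remaining
            return clock <= lifetime
        clock += time_to_failure
        remaining -= time_to_failure  # assumption: completed work is not lost
        if recoveries >= max_recoveries or clock > lifetime:
            return False  # recovery budget or lifetime exhausted
        recoveries += 1
        clock += rng.expovariate(repair_rate)  # assumed exponential recovery time
        if clock > lifetime:
            return False


def estimate_reliability(work, fail_rate, repair_rate, max_recoveries,
                         lifetime, trials=20000, seed=0):
    """Fraction of simulated runs that finish within the lifetime."""
    rng = random.Random(seed)
    successes = sum(
        subtask_completes(work, fail_rate, repair_rate, max_recoveries, lifetime, rng)
        for _ in range(trials)
    )
    return successes / trials
```

     Under these assumptions, with no recoveries allowed and a generous lifetime the estimate should approach exp(-fail_rate * work), which gives a simple sanity check; raising `max_recoveries` shows how much reliability the recovery mechanism buys for long-running subtasks.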
     (2) Grid task reliability model in a manufacturing grid
     In a manufacturing grid, manufacturing resources are autonomous, heterogeneous, and dynamic. They handle tasks from both the manufacturing grid system and the local administrative domain. Under a local-task-priority strategy in particular, local task arrivals and failures during grid task execution strongly affect whether a grid task completes reliably within a specified time. To address this problem, a Petri-net-based state analysis of manufacturing resources is given to describe the complex process of grid task execution on those resources. Grid task reliability under the local-task-priority strategy is then obtained by Monte Carlo simulation. Furthermore, the influence of the local task arrival rate and the local task execution rate on grid task reliability is analyzed. The results can inform grid resource management and thereby improve grid task scheduling.
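     As a rough illustration of the kind of Monte Carlo estimate described above, the sketch below simulates a grid task on a single resource under local-task priority. It is not the dissertation's Petri net model. It assumes Poisson local arrivals, exponential local service times, preemptive-resume behaviour (the grid task pauses while local tasks are served and then continues), and failures that occur only while the grid task is running and are fatal. The function names and parameters are hypothetical.

```python
import random


def grid_task_succeeds(work, deadline, arrival_rate, service_rate, fail_rate, rng):
    """One simulated run: True if the grid task finishes by `deadline`."""
    clock, remaining = 0.0, work
    while clock <= deadline:
        # Exponential clocks are memoryless, so resampling after each event is valid.
        to_arrival = rng.expovariate(arrival_rate)
        to_failure = rng.expovariate(fail_rate)
        run = min(remaining, to_arrival, to_failure)
        clock += run
        remaining -= run
        if remaining <= 0:
            return clock <= deadline
        if to_failure <= to_arrival:
            return False  # assumed: a failure aborts the grid task
        # A local task arrived and preempts the grid task; serve the local
        # queue until it empties (or the deadline passes).
        queued = 1
        while queued > 0 and clock <= deadline:
            next_arrival = rng.expovariate(arrival_rate)
            next_departure = rng.expovariate(service_rate)
            if next_arrival < next_departure:
                queued += 1
                clock += next_arrival
            else:
                queued -= 1
                clock += next_departure
    return False


def estimate_grid_task_reliability(work, deadline, arrival_rate, service_rate,
                                   fail_rate, trials=20000, seed=0):
    """Fraction of simulated runs in which the grid task meets its deadline."""
    rng = random.Random(seed)
    ok = sum(
        grid_task_succeeds(work, deadline, arrival_rate, service_rate, fail_rate, rng)
        for _ in range(trials)
    )
    return ok / trials
```

     Sweeping `arrival_rate` or `service_rate` while holding the other parameters fixed reproduces, in miniature, the sensitivity analysis the paragraph mentions.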
     (3) Optimal redundant scheduling of grid tasks based on fault recovery
     Based on the proposed grid reliability model, a multi-objective task scheduling optimization model that minimizes cost and maximizes reliability is presented, and an ant colony optimization algorithm is developed to solve it. Furthermore, to improve grid service reliability, a redundant scheduling strategy for grid tasks is adopted, and an optimization model with a cost constraint is presented to maximize grid service reliability. A genetic algorithm is developed to solve it, with repair operators designed to adjust chromosomes that encode infeasible solutions so that the algorithm works properly. A numerical example demonstrates the effectiveness of the algorithm.
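     The abstract does not describe the repair operators themselves. The sketch below shows one plausible greedy repair step for a redundant-scheduling chromosome that exceeds its cost budget. It assumes a chromosome is a list of resource-index sets (one set of replicas per task), that replicas fail independently, and that every task must keep at least one replica. `service_reliability`, `total_cost`, and `repair` are hypothetical names, not the author's.

```python
from math import prod


def service_reliability(chromosome, rel):
    """Product over tasks of P(at least one replica succeeds), assuming independence."""
    return prod(1.0 - prod(1.0 - rel[j] for j in replicas) for replicas in chromosome)


def total_cost(chromosome, cost):
    """Sum of the costs of all replicas placed on resources."""
    return sum(cost[j] for replicas in chromosome for j in replicas)


def repair(chromosome, rel, cost, budget):
    """Drop replicas greedily until the cost constraint holds.

    Each step removes the replica whose removal loses the least reliability
    per unit of cost saved (costs are assumed positive). A task never loses
    its last replica, so the result may stay infeasible if the budget is too
    small for even one replica per task.
    """
    repaired = [set(r) for r in chromosome]  # leave the input untouched
    while total_cost(repaired, cost) > budget:
        current = service_reliability(repaired, rel)
        best = None  # (reliability loss per unit cost, task index, resource index)
        for t, replicas in enumerate(repaired):
            if len(replicas) <= 1:
                continue
            for j in replicas:
                trial = [set(r) for r in repaired]
                trial[t].discard(j)
                score = (current - service_reliability(trial, rel)) / cost[j]
                if best is None or score < best[0]:
                    best = (score, t, j)
        if best is None:
            break  # nothing left to remove
        repaired[best[1]].discard(best[2])
    return repaired
```

     In a genetic algorithm this step would typically run after crossover and mutation, so that only budget-feasible schedules are scored by the fitness function.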
     (4) Analysis of grid resource compensation in market-oriented environment
     Market-oriented grid resource management is an effective way to cope with the scarcity of grid resources. An in-depth analysis of this scarcity shows that grid users should pay resource owners not only for the resources consumed but also for the loss of local task execution. Based on an analysis of the expected incomes of grid resources under two priority strategies, the minimal compensation that grid users should pay to resource owners is determined using the concept of opportunity cost. On this basis, a variable price model is presented to ensure a fair market environment. To calculate the minimal compensation, an evaluation approach based on Monte Carlo simulation is given. Furthermore, the influence of the attributes of grid tasks and grid resources on the minimal compensation is studied. The research can give resource owners an incentive and attract more Internet resources to participate in the grid.
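     To make the opportunity-cost idea concrete, here is a deliberately simplified Monte Carlo sketch. The income models for the two strategies are not given in the abstract, so this version assumes that while a grid task holds the resource, arriving local tasks are turned away and their income is forgone; the minimal compensation is then taken to be the expected forgone income. The Poisson arrivals, the fixed income per local task, and both function names are illustrative assumptions.

```python
import random


def lost_local_income(grid_time, arrival_rate, income_per_local_task, rng):
    """Income forgone in one run: local tasks arriving during the grid task are rejected."""
    clock, lost = 0.0, 0.0
    while True:
        clock += rng.expovariate(arrival_rate)  # next local arrival (assumed Poisson)
        if clock > grid_time:
            return lost
        lost += income_per_local_task


def minimal_compensation(grid_time, arrival_rate, income_per_local_task,
                         trials=20000, seed=0):
    """Average forgone local income, i.e. the opportunity cost under these assumptions."""
    rng = random.Random(seed)
    total = sum(
        lost_local_income(grid_time, arrival_rate, income_per_local_task, rng)
        for _ in range(trials)
    )
    return total / trials
```

     In this toy setting the expected value is simply arrival_rate * grid_time * income_per_local_task, which makes the estimate easy to check; simulation becomes worthwhile once queueing, deadlines, or failures are added to the income model.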
