Research on Fault Tolerance and Performance of Several Typical Grid Applications
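Abstract
The rapid development of network technology and the fast growth of available bandwidth have greatly changed the way we compute, communicate, and collaborate. The emergence of computational grids promises to give ordinary users access to high-performance computing resources that is almost as simple as using electricity. With its focus on large-scale resource sharing and dynamic problem solving, the grid conceptually makes it possible to build applications more complex than today's distributed and parallel applications, including computational tasks that could not be solved before. However, for the following reasons, it is difficult to give a general solution that meets the fault-tolerance and performance requirements of all kinds of grid applications:
     The scale, heterogeneity, and dynamic nature of the grid reduce the efficiency of communication, coordination, and synchronization among grid processes, which complicates the design of efficient, reliable, and scalable grid applications;
     Grid applications largely rely on standard grid protocols to execute computing tasks on grid resources, but the "hourglass" design principle adopted by these protocols guarantees only their generality and provides no effective support for fault-tolerant computing;
     Different grid applications have different performance goals, and each application's goals depend on its own characteristics, so a general solution is hard to apply.
     Therefore, this dissertation focuses on the fault-tolerance or performance problems of the following three typical classes of grid applications. These applications strongly drive the development of grid technology in both the scientific and the commercial fields, and in a broad sense they share the same performance goals: performing coordination and synchronization efficiently, using network resources effectively, and significantly reducing network latency.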
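     Time-consuming computation: As system scale grows, fault tolerance becomes a key issue. Without the support of a fault-tolerance mechanism, grid applications can hardly complete successfully, especially time-consuming applications that need days or even months of computing time. In a loosely coupled computing environment, a fault-tolerance technique should not make frequent use of system-wide synchronization and coordination, so that the application keeps a low runtime overhead and good scalability. To simplify application construction, the fault-tolerance function must be separated from the application's logic and abstracted as a generic service available to application programmers. In addition, for time-consuming computation, scalable checkpointing and rollback recovery may save computing resources compared with task replication with voting and consensus, and is therefore preferable.
     Grid information retrieval: A grid environment contains a large number of geographically distributed data resources. These resources are usually spread over multiple administrative domains and organized in different ways. Given how widespread, heterogeneous, and loosely coupled they are, retrieving useful knowledge efficiently from a grid is much harder than information retrieval in a traditional distributed system. A grid information retrieval system therefore needs a stronger ability to reduce bandwidth consumption, lower network latency, eliminate the effects of network instability, and use grid computing resources in parallel.
     Content delivery services: As an efficient current solution for storing, maintaining, and providing access to documents, content delivery services are increasingly favored by information providers. Providers often use them to replicate or mirror content on the Internet and build content delivery networks, thereby addressing the scalability, availability, and performance problems of popular Web sites. Content delivery services give strong commercial support to the development of grid technology, and will gradually adopt grid-like infrastructure to balance server load, eliminate repeated transfers of data over the network, reduce request response time, increase system throughput, and guarantee the expected quality of service. However, the performance of a content delivery network is strongly affected by its surrogate placement policy. In a grid environment in particular, the large number of available computing resources significantly enlarges the decision space of resource allocation, which makes the surrogate placement problem of content delivery services even more important.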
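     In summary, the goal of this dissertation is to propose techniques for fault tolerance in time-consuming grid applications, for more efficient grid information retrieval, and for better content delivery service performance. Its main contributions are as follows:
     A new adaptive and scalable fault-tolerance technique based on mobile agents, MACR, is proposed. The technique separates the fault-tolerance function from the application logic and is therefore easy to integrate with grid applications. MACR exploits the mobility of mobile agents and uses them to coordinate communication and cooperation among local processes. It combines the advantages of asynchronous and synchronous checkpointing, reducing the extra overhead during normal execution and eliminating the domino effect that may occur during rollback recovery. By adjusting algorithm parameters dynamically, MACR adapts automatically to changes in system scale and network state. A mobile-agent-based hybrid checkpointing and rollback recovery algorithm is given, its correctness is rigorously proved, its performance and scalability are systematically analyzed, techniques addressing its performance and scalability problems are proposed, and its effectiveness and scalability are studied through simulation experiments.
     A layered architecture for a mobile agent system that improves grid information retrieval performance is proposed. Based on this architecture, a new genetic algorithm is proposed for planning the optimal migration of mobile agents. The algorithm ensures that a retrieval task consumes the least network resources while finishing within the completion time specified by the user. It encodes an agent migration tree by the pre-order traversal sequence of the tree nodes together with the sequence of the number of children of each node. Theoretical analysis and simulation experiments show that this encoding guarantees a one-to-one correspondence between the encoding space and the solution space, and that it offers simple encoding and decoding, relatively easy genetic operations, a good balance between exploration of the problem space and exploitation of local information, and fast convergence.
     A new surrogate placement strategy, LDASP, is proposed to improve content delivery service performance. LDASP seeks the optimal surrogate placement with the goal of minimizing system communication cost while maximizing system throughput. Unlike existing strategies for resource allocation problems in communication networks, LDASP takes the load distribution and processing capacity constraints of surrogate servers into account by modeling the request-routing mechanism of the content delivery network, so that the system has the lowest resource consumption, the highest throughput, and good load balancing. An efficient greedy algorithm and an approximate dynamic programming algorithm are proposed for a simplified LDASP problem on tree networks. Theoretical analysis and simulation experiments establish the correctness, effectiveness, and computational complexity of both.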
The rapid expansion of the Internet and the dramatic increase in available network bandwidth have qualitatively changed how the world computes, communicates, and collaborates. The emergence of computational grids provides a promising infrastructure that can potentially make high-performance computing power accessible to general users as easily and seamlessly as electricity from an electrical power grid. Because they focus on large-scale resource sharing and innovative problem solving, grids enable applications that are conceptually more complex than current parallel and distributed applications, including applications that could not be solved before. However, developing a generic solution that offers fault tolerance and high performance for the wide spectrum of grid applications is not a trivial task, for the following reasons:
     The scale, the heterogeneity, and the dynamic nature of a grid make it difficult to efficiently support grid-scale interprocess communication, coordination, and synchronization, thus significantly complicating the design of efficient, reliable, and scalable grid applications;
     Grid applications basically rely on the standard grid protocols for executing their computations on grid resources, but the hourglass design principle makes these protocols too generic to support any fault tolerance mechanisms;
     Different grid applications have distinct performance goals, inherently determined by each application's characteristics, so these goals are hard to address with a generic approach.
     Therefore, this dissertation addresses the fault-tolerance and/or high-performance issues of the following three typical grid applications, which motivate the development of grid technology in both the scientific and commercial communities. In a broad sense, these issues share the same goals: performing coordination and synchronization efficiently, utilizing network resources effectively, and markedly reducing network latency.
     Computation-intensive applications. As system scale increases, fault tolerance has become such a critical issue that, without its support, grid applications have little chance of finishing successfully, especially those that require days or even months to execute. Frequent system-wide synchronization and coordination should be avoided, since it results in high overhead and poor scalability in a loosely coupled computing environment. To simplify the construction of applications, the fault-tolerance function needs to be separated from the application logic and abstracted as a generic service open to application programmers. Also, for these computation-intensive applications, a scalable checkpointing and rollback recovery scheme may be preferable to task replication with voting and consensus, because it can save computational resources.
     Grid information retrieval. In emerging grids, vast volumes of geographically distributed data resources will be pooled together and made available for integration into applications. These data resources usually span different administrative domains and are structured in distinct ways. Given that the data resources are loosely coupled and heterogeneous, effectively retrieving useful knowledge from a grid is more difficult than retrieving it from a traditional distributed information system. Therefore, grid information retrieval systems are expected to reduce bandwidth utilization, overcome latency, tolerate network disconnection, and exploit the parallel computational resources in grids.
     Content delivery services. Information providers who seek a cost-effective solution are increasingly choosing Web content delivery services to store, maintain, and provide access to their documents. To address the scalability, availability, and performance problems of their popular Internet sites, information providers commonly use delivery services to replicate or mirror their content over the Internet, as exemplified by the prevalence of Content Delivery Networks (CDNs). This kind of service is currently a driving force behind the development of grid technology, and will eventually use a grid-like structure to alleviate server workload, eliminate redundant data traversal over the network, reduce response latency, increase system throughput, and, most importantly, deliver the desired QoS. However, CDN performance is dramatically affected by the surrogate placement policy. In a grid environment, the optimal surrogate placement problem for content delivery services becomes a more serious issue, as the vast available computing resources drastically enlarge the decision space.
     In summary, the goals of this dissertation are to develop techniques to achieve fault tolerance for computation-intensive grid applications, efficiency for grid information retrieval, and high performance for content delivery services in grids. The major contributions are summarized as follows:
     A novel, mobile agent enabled fault tolerance technique, MACR, is proposed; it can be easily integrated into grid applications through the separation of concerns. The importance of MACR is justified by our finding that no bridging mechanism exists that can integrate or reuse multiple underlying fault tolerance protocols deployed in different administrative domains without modifying them. MACR takes advantage of both the independent and the coordinated checkpointing approaches by exploiting the appealing features of mobile agent technology and using mobile agents to carry out coordination tasks for cooperating local processes. As a result, it can adapt to changes in system scale and network state and guarantee low runtime overhead, while eliminating the possibility of the undesirable domino effect. The feasibility of MACR is demonstrated by designing a hybrid checkpointing and rollback algorithm that uses mobile agents as an aid. The correctness of the algorithm is formally proved, and its performance and scalability issues are systematically discussed. Techniques are proposed to address the scalability issue and the potential performance bottleneck of the proposed algorithm, and simulations are performed to show the effectiveness and scalability of the proposed protocol and of these improvement techniques.
     A flexible layered architecture for a mobile agent system that aims to achieve high performance for grid information retrieval is presented. Based on this architecture, a new genetic algorithm is proposed to schedule the optimal migration of mobile agents, with the objective of minimizing the total communication cost subject to a user-defined limit on task completion time. The proposed genetic algorithm encodes an agent migration tree by the pre-order traversal sequence of its vertices, together with the sequence of the numbers of children of the corresponding vertices. Theoretical analysis and experimental simulations show that this encoding guarantees a one-to-one correspondence between the encoding space and the problem space, and that it has the advantages of simple encoding and decoding, ease of GA operations, a better balance between exploration and exploitation, and fast convergence to the optimal or a near-optimal solution.
     A new surrogate placement strategy, LDASP, is proposed to enhance the performance of content delivery services. LDASP aims to place surrogates in a manner that minimizes the communication cost while at the same time maximizing system throughput. This work differs from existing work on the resource allocation problem in communication networks in that it considers load distribution and processing capacity constraints on surrogates by modeling the underlying request-routing mechanism, thus guaranteeing that a CDN has minimum network resource consumption, maximum system throughput, and better load balancing among surrogates. An efficient and effective greedy algorithm and an approximate dynamic programming algorithm are developed for a simplified version of the LDASP problem in tree networks. The correctness, efficiency, and effectiveness of the proposed algorithms are systematically analyzed through formal proofs and experimental simulations.
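
The migration-tree encoding named in the second contribution (a pre-order vertex sequence paired with the number of children of each vertex) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the dissertation's implementation: the function names `encode` and `decode`, the dictionary representation of the tree, and the host names are all hypothetical, and children are assumed to be ordered.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each host maps to its ordered list of child hosts.
Tree = Dict[str, List[str]]


def encode(tree: Tree, root: str) -> Tuple[List[str], List[int]]:
    """Return the pre-order vertex sequence and the matching child counts."""
    order: List[str] = []
    counts: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        children = tree.get(node, [])
        order.append(node)
        counts.append(len(children))
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(children))
    return order, counts


def decode(order: List[str], counts: List[int]) -> Tuple[Tree, str]:
    """Rebuild the tree from the two sequences; reject inconsistent input."""
    if not order or len(order) != len(counts):
        raise ValueError("sequences must be non-empty and of equal length")
    tree: Tree = {node: [] for node in order}
    pending: List[List] = []  # entries are [vertex, children still to attach]
    for i, (node, k) in enumerate(zip(order, counts)):
        if pending:
            parent = pending[-1]
            tree[parent[0]].append(node)
            parent[1] -= 1
            if parent[1] == 0:
                pending.pop()
        elif i != 0:
            raise ValueError("sequences describe more than one tree")
        if k > 0:
            pending.append([node, k])
    if pending:
        raise ValueError("child counts exceed the vertices supplied")
    return tree, order[0]


# Example: an agent leaves "home", visits h1 (which forwards to h3 and h4) and h2.
migration_tree = {"home": ["h1", "h2"], "h1": ["h3", "h4"]}
order, counts = encode(migration_tree, "home")
rebuilt, root = decode(order, counts)
```

In this sketch every ordered tree yields exactly one pair of sequences, and `decode` recovers it. Not every arbitrary pair of sequences is a valid tree, so a genetic operator built on such an encoding would presumably need to keep the child counts consistent; how the dissertation handles this is not stated in the abstract.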
