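Research on Replica Management Strategies in Data Grids (数据网格中副本管理策略研究)

English title: Research of Replica Management in Data Grid
Author: 施晓烨 (Shi Xiaoye)
Degree level: Master's
Discipline: Computer Software and Theory
Chinese keywords: 数据网格 ; 副本创建 ; 副本选择 ; 副本定位 ; 副本一致性维护
English keywords: Data Grid ; Replica Creation ; Replica Selection ; Replica Location ; Replica Consistency Maintenance
Degree year: 2011
Advisor: 王汝传 (Wang Ruchuan)
Discipline code: 081202
Degree-granting institution: 南京邮电大学 (Nanjing University of Posts and Telecommunications)
Thesis submission date: 2011-03-01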

Abstract

The information explosion has placed unprecedented demands on data storage and access speed. Storage systems keep growing in scale and are increasingly complex to manage, which in turn poses greater challenges for storage scalability and reliability. Data grids emerged to address these problems. A data grid is a grid system whose primary resource is data: it organizes massive, dispersed, independent, and heterogeneous storage systems on the network into a reliable, secure logical whole under unified management, so as to provide users with transparent, efficient, and highly reliable services.
     Replica technology is indispensable in a data grid. It mainly comprises replica creation, replica selection, replica location, and replica consistency maintenance. The quality of replica creation directly affects the performance of the grid system, so replicas must be created on suitable nodes according to the characteristics of the environment. Once replicas have been created, selection and location mechanisms are needed to obtain the best replica. In addition, because the grid is dynamic, replica consistency maintenance is also an important part of replica management, since it directly affects the performance and correctness of replica management.
     This thesis studies the key replica technologies in data grids in the following respects. First, it improves the traditional replica creation algorithm and proposes an improved best replica creation algorithm. Second, it proposes replica location and consistency maintenance methods suited to the current environment, and presents a replica selection algorithm based on replica access cost. Finally, it implements a replica management system.

