Research on a Multi-Replica-Based Data Management and Transfer Model in Grid Environments
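Abstract
With their strong support for data sharing and collaborative work, data grids meet the needs of data-intensive tasks such as high-energy physics and climate simulation. However, node failures and abrupt network changes occur frequently in dynamic, complex grid environments, so neither the speed nor the stability of grid data transfer can be guaranteed. This has become a "bottleneck" that restricts the application of grid technology.
     Replica technology is a key technology in data grids. It creates local copies of remote data, which reduces network latency and bandwidth consumption, and it also gives rise to a mode of grid resource sharing in which multiple replicas coexist. This mode offers an opportunity to solve the transfer problem, so research on multi-replica-based data transfer has become an important route to improving the speed and stability of grid data transfer.
     This thesis aims to improve the speed and stability of data transfer in grid environments. Using the Globus Toolkit middleware, it studies how to integrate Replica technology into data transfer. The main work is as follows:
     (1) Analysis of grid data management and its Replica technology: grid data management and Replica technology are summarized, and the replica location and selection algorithms involved in the thesis are analyzed;
     (2) Study of grid data transfer mechanisms: the effects of different resource sharing modes and different transfer protocols on grid data transfer are compared and analyzed;
     (3) Experimental analysis of GridFTP transfer performance: experiments are carried out on GridFTP parallel transfer and striped transfer, and the performance analysis further demonstrates the significance of the research topic;
     (4) Proposal of the multi-replica-based data transfer model MRT and its algorithms: the MRT model is proposed, and its constituent elements and the mappings among them are defined; a region-based, multi-level replica location strategy is designed for the model; drawing on probabilistic prediction, a heuristic dynamic task allocation algorithm is designed on the basis of a heuristic algorithm; finally, the complexity of the strategy and the algorithm is analyzed;
     (5) Design and implementation of a test system for the model: the system is designed and implemented at both the overall and the module level, and the performance of the model is evaluated experimentally on the test system.
     Theoretical analysis and experimental results show that the MRT model effectively improves the speed and stability of data transfer, and the improvement is most noticeable when transferring large files.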
Data grids, with their good data sharing and collaboration capabilities, meet the demands of data-intensive tasks such as high-energy physics and climate modeling. However, because the grid environment is dynamic and complex, node failures and unexpected network changes occur frequently. As a result, the speed and stability of grid data transfer cannot be guaranteed, and this has become the "bottleneck" that restricts grid applications.
     Replica is a key technology of the data grid. It creates local copies of remote data, reduces network delay and bandwidth consumption, and at the same time forms a way of sharing grid resources in which multiple replicas coexist, which provides an opportunity to resolve transfer problems. Research on data transfer based on multiple replicas has therefore become an important approach to the problem of data transfer speed and stability in grid environments.
     The purpose of this thesis is to increase the speed and stability of data transfer in grid environments. The Globus Toolkit middleware is used, and the research focuses on combining Replica technology with data transfer. The main contributions are:
     (1) Grid data management and its replica technology are analyzed: the thesis summarizes grid data management and replica technology, and analyzes the replica location and selection algorithms involved;
     (2) Data transfer mechanisms in the grid are studied: the influence of different resource sharing modes and different transfer protocols on grid data transfer is compared and analyzed;
     (3) The transfer performance of the GridFTP protocol is analyzed experimentally: experiments are carried out on GridFTP parallel transfer and striped transfer, and the performance analysis makes the importance of the research clearer;
     (4) A data transfer model based on multiple replicas (MRT) and its algorithms are proposed: the thesis proposes the MRT model and defines its elements and the mappings between them; a locality-based, multi-level replica location strategy is designed for the model; a heuristic dynamic task allocation algorithm is designed by combining a heuristic method with a probability forecasting method; finally, the complexity of the strategy and the algorithm is analyzed;
     (5) A test system for the MRT model is designed and implemented: the system is designed and implemented at two levels, the whole system and its modules, and experiments on the model's performance are carried out on this test system.
     Theoretical analysis and experimental results show that the MRT model effectively improves the speed and stability of data transfer, especially for bulk data transfer.
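The abstract does not spell out how MRT assigns work to replicas, so the following is only a minimal sketch of the general idea behind item (4), not the thesis's algorithm. It assumes a file split into equal-sized blocks, a hypothetical `Replica` record holding a throughput forecast, an exponentially weighted moving average as a stand-in for the probability forecast, and greedy list scheduling as a stand-in for the heuristic. All names and parameters here are illustrative assumptions.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Replica:
    """Hypothetical replica server with a forecast transfer rate (MB/s)."""
    name: str
    predicted_rate: float
    alive: bool = True

    def observe(self, measured_rate: float, alpha: float = 0.5) -> None:
        # Assumed predictor: exponentially weighted moving average
        # of the measured rate and the previous forecast.
        self.predicted_rate = (
            alpha * measured_rate + (1 - alpha) * self.predicted_rate
        )


def allocate_blocks(
    num_blocks: int, block_mb: float, replicas: List[Replica]
) -> Dict[str, List[int]]:
    """Assign each block to the replica whose queue is expected to free up first."""
    live = [r for r in replicas if r.alive and r.predicted_rate > 0]
    if not live:
        raise RuntimeError("no usable replica")
    # Heap entries: (time at which the replica becomes free, tie-breaker, replica).
    heap = [(0.0, i, r) for i, r in enumerate(live)]
    heapq.heapify(heap)
    plan: Dict[str, List[int]] = {r.name: [] for r in live}
    for block in range(num_blocks):
        free_at, i, r = heapq.heappop(heap)
        plan[r.name].append(block)
        heapq.heappush(heap, (free_at + block_mb / r.predicted_rate, i, r))
    return plan


if __name__ == "__main__":
    fast = Replica("A", predicted_rate=30.0)
    slow = Replica("B", predicted_rate=10.0)
    print(allocate_blocks(8, 30.0, [fast, slow]))
    # A failed node is skipped when the remaining work is re-planned.
    slow.alive = False
    print(allocate_blocks(8, 30.0, [fast, slow]))
```

In this sketch the faster replica receives proportionally more blocks, and re-running the allocation after updating `alive` or calling `observe` gives a simple form of dynamic reassignment when a node fails or the network changes.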
