Management Mechanism of Dynamic Cloud Data Replica Based on Availability

Original title: 一种基于可用性的动态云数据副本管理机制
Authors: TAO Yong-cai (陶永才); BA Yang (巴阳); SHI Lei (石磊); WEI Lin (卫琳)
Keywords: dynamic replication management; HDFS; data replication; availability
Journal: Journal of Chinese Computer Systems (小型微型计算机系统)
Journal code: XXWX
Affiliations: School of Information Engineering, Zhengzhou University; School of Software, Zhengzhou University
Publication date: 2018-03-15
Publisher: Journal of Chinese Computer Systems (小型微型计算机系统)
Year: 2018
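Volume: 39
Funding: Supported by the Key Scientific Research Project of Higher Education Institutions of Henan Province (16A520027)
Language: Chinese
Record ID: XXWX201803016
Page count: 6
Issue: 03
CN: 21-1106/TP
Pages: 92-97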

Abstract

Replication is one of the key techniques for improving data availability in cloud storage. To provide cost-effective availability while improving the performance and load balancing of cloud storage, this paper proposes a dynamic replica management scheme called DRM (Dynamic Replica Management). DRM establishes a model of the relationship between data availability and the number of replicas, uses this model to dynamically calculate and maintain the minimum number of replicas that meets a given availability requirement, and determines replica placement based on node performance and user access characteristics. As the number of nodes changes, DRM dynamically adjusts the number of replicas so that the availability requirement continues to be met. DRM improves the performance and load balance of cloud storage while saving resource cost. DRM is implemented on HDFS (Hadoop Distributed File System), and experimental results show that it outperforms the existing HDFS replica management mechanism in cost, load balancing, and performance.
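The abstract does not state the availability model itself. As a rough illustration only, the sketch below assumes that node failures are independent and that every node is available with the same probability `p`, so a block with `r` replicas is available with probability `1 - (1 - p)**r`. The function name `min_replicas` and both parameters are hypothetical and are not taken from the paper; DRM's actual model may differ.

```python
def min_replicas(node_availability: float, required_availability: float) -> int:
    """Return the smallest replica count r with 1 - (1 - p)**r >= A.

    Assumption (not from the paper): nodes fail independently and share
    the same availability p = node_availability.
    """
    if not 0.0 < node_availability < 1.0:
        raise ValueError("node_availability must be in (0, 1)")
    if not 0.0 <= required_availability < 1.0:
        raise ValueError("required_availability must be in [0, 1)")

    replicas = 1
    # Add replicas one at a time until the availability target is met.
    while 1.0 - (1.0 - node_availability) ** replicas < required_availability:
        replicas += 1
    return replicas


if __name__ == "__main__":
    # With 90%-available nodes, a 99.5% target needs three replicas
    # under this simplified model.
    print(min_replicas(0.9, 0.995))
```

Under these assumptions, less reliable nodes require more replicas to reach the same target, which is why a scheme of this kind would recompute the replica count when the cluster changes.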

References

[1] Ghemawat S, Gobioff H, Leung S T. The Google file system[J]. ACM SIGOPS Operating Systems Review, 2003, 37(5): 29-43.
[2] Weil S A, Brandt S A, Miller E L, et al. Ceph: a scalable, high-performance distributed file system[C]. Symposium on Operating Systems Design and Implementation, USENIX Association, 2006: 307-320.
[3] Shvachko K, Kuang H, Radia S, et al. The Hadoop distributed file system[C]. In Proc. of the 26th Mass Storage System and Technologies, 2010: 1-10.
[4] Rahman R M, Barker K, Alhajj R. Replica placement strategies in data grid[J]. Journal of Grid Computing, 2008, 6(1): 103-123.
[5] Xu Jing, Yang Shou-bao, Wang Shu-ling, et al. CDRS: an adaptive cost-driven replication strategy in cloud storage[J]. Journal of the Graduate School of the Chinese Academy of Sciences, 2011, 28(6): 759-767. (in Chinese)
[6] Amazon simple storage service website[EB/OL]. http://aws.amazon.com/s3/, Jan, 2017.
[7] Melorose J, Perroy R, Careas S. Hadoop definitive guide[M]. Hadoop: The Definitive Guide. Yahoo! Press, 2015: 1-4.
[8] Chandy J A. A generalized replica placement strategy to optimize latency in a wide area distributed storage system[M]. Proceedings of the 2008 International Workshop on Data-aware Distributed Computing (DADC'08), Boston, USA, New York NY, USA, ACM, 2008: 49-54.
[9] Ding Y, Lu Y. Automatic data placement and replication in grids[C]. High Performance Computing (HiPC), 2009 International Conference on. IEEE, 2009: 30-39.
[10] Qu Y, Xiong N. RFH: a resilient, fault-tolerant and high-efficient replication algorithm for distributed cloud storage[C]. International Conference on Parallel Processing, IEEE, 2012: 520-529.
[11] Lin Wei-wei. An improved data placement strategy for Hadoop[J]. Journal of South China University of Technology, 2012, 40(1): 152-158. (in Chinese)
[12] Ibrahim I A, Wei D, Bassiouni M. Intelligent data placement mechanism for replicas distribution in cloud storage systems[C]. IEEE International Conference on Smart Cloud, IEEE Computer Society, 2016: 134-139.
[13] Wei Q, Veeravalli B, Gong B, et al. CDRM: a cost-effective dynamic replication management scheme for cloud storage cluster[C]. IEEE International Conference on Cluster Computing. IEEE, 2010: 188-196.
[14] Deng J L. Control problems of grey systems[J]. Systems & Control Letters, 1982, 1(5): 288-294.
[15] Tao Yong-cai, Zhang Ning-ning, Shi Lei, et al. Researching on dynamic management of data replicas of cloud computing in heterogeneous environments[J]. Journal of Chinese Computer Systems, 2013, 34(7): 1487-1492. (in Chinese)
[16] Lin Chang-hang, Guo Wen-zhong, Chen Huang-ning. Node-capability-aimed data distribution strategy in heterogeneous Hadoop cluster[J]. Journal of Chinese Computer Systems, 2015, 36(1): 83-88. (in Chinese)
[17] Xun Ya-ling, Zhang Ji-fu, Qin Xiao. Data placement strategy for MapReduce cluster environment[J]. Journal of Software, 2015, 26(8): 2056-2073. (in Chinese)
[18] Qu K, Meng L, Yang Y. A dynamic replica strategy based on Markov model for Hadoop distributed file system (HDFS)[C]. Cloud Computing and Intelligence Systems (CCIS), 2016 4th International Conference on. IEEE, 2016: 337-342.
