Prediction-Based Replication Factor Decision Algorithm for Hot Data in Cloud Computing
  • English title: Dynamic Replicas Strategy Based on Predicted Popularity
  • Authors: ZHANG Song (张松); DU Qing-wei (杜庆伟); SUN Jing (孙静); SUN Zhen (孙振)
  • Keywords: hot (highly popular) data; replica management; cloud computing; Hadoop; grey prediction; birth-death process
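  • Chinese journal title: JYXH
  • English journal title: Computer and Modernization
  • Affiliations: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; Unit 94860 of PLA
  • Publication date: 2015-03-09 10:02
  • Publisher: Computer and Modernization (计算机与现代化)
  • Year: 2015
  • Issue: No.234
  • Funding: National Natural Science Foundation of China (61202350)
  • Language: Chinese
  • Page: JYXH201502014
  • Page count: 6
  • CN: 02
  • ISSN: 36-1137/TP
  • Classification number: 65-69+75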
Abstract
To improve data availability and overall cluster performance, the current HDFS (Hadoop Distributed File System) uses a replica placement scheme with a fixed number of replicas. File popularity varies widely, however, and accesses to highly popular files can slow job execution. To address this problem, this paper proposes a prediction-based replication factor decision algorithm for hot data. Starting from the recent access characteristics of the data, the algorithm applies grey prediction and uses a Markov prediction model to correct the prediction deviation caused by data fluctuation and bursty access, yielding the file's future access popularity. A finite-channel service model built on the predicted value is then used to find the minimum replication factor that meets user demand. Experiments show that, compared with the existing replica management strategy and with a strategy that adjusts the replication factor based on real-time popularity, the proposed strategy effectively reduces access contention on hot data and reduces both the execution time and the network load of jobs on hot data.
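The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two main steps under stated assumptions, not the authors' implementation. It assumes the grey prediction step is a standard GM(1,1) model fitted to recent per-period access counts, and that the finite-channel service model can be approximated by an Erlang B loss system in which each replica offers a fixed number of concurrent read channels. The Markov correction step is omitted. All function and parameter names (`gm11_forecast`, `erlang_b`, `min_replicas`, `channels_per_replica`, `max_blocking`) are hypothetical.

```python
import math


def gm11_forecast(series, steps=1):
    """Fit a standard GM(1,1) grey model and forecast `steps` future values.

    Assumption: `series` holds positive access counts for recent periods.
    """
    n = len(series)
    if n < 4 or any(v <= 0 for v in series):
        raise ValueError("need at least 4 positive samples")
    # Accumulated generating operation (1-AGO): running sums of the series.
    x1, total = [], 0.0
    for v in series:
        total += v
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = series[1:]
    # Least-squares fit of the grey equation y(k) = -a * z(k) + b.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    denom = m * szz - sz * sz
    a = (sz * sy - m * szy) / denom
    b = (szz * sy - sz * szy) / denom
    if abs(a) < 1e-12:
        # Flat series: the fitted model degenerates to a constant.
        return [b] * steps
    c = series[0] - b / a

    def x1_hat(k):
        # Time response of the accumulated series (k = 0 is the first sample).
        return c * math.exp(-a * k) + b / a

    # Inverse AGO: difference consecutive accumulated predictions.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


def erlang_b(offered_load, channels):
    """Blocking probability of an M/M/c/c loss system (Erlang B recursion)."""
    blocking = 1.0
    for c in range(1, channels + 1):
        blocking = offered_load * blocking / (c + offered_load * blocking)
    return blocking


def min_replicas(arrival_rate, service_rate, channels_per_replica,
                 max_blocking, max_replicas=64):
    """Smallest replica count whose blocking probability meets the target.

    Assumption: every replica contributes `channels_per_replica` channels.
    Returns `max_replicas` if the target cannot be met within that cap.
    """
    offered_load = arrival_rate / service_rate
    for r in range(1, max_replicas + 1):
        if erlang_b(offered_load, r * channels_per_replica) <= max_blocking:
            return r
    return max_replicas
```

In this sketch, the forecast from `gm11_forecast` would be passed as `arrival_rate` to `min_replicas`. For example, an offered load of 2 with one channel per replica and a 5% blocking target needs 5 replicas under these assumptions.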
