A data random access method oriented to HDFS
  • English title: Data random access method oriented to HDFS
  • Authors: LI Qiang (李强); SUN Zhenyu (孙震宇); SUN Gongxing (孙功星)
  • Affiliations: Institute of High Energy Physics, Chinese Academy of Sciences; University of Chinese Academy of Sciences
  • Keywords: Hadoop Distributed File System; random access; permission management
  • Journal code: JSGG
  • Journal: Computer Engineering and Applications (计算机工程与应用)
  • Publication date: 2017-05-15
  • Publisher: Computer Engineering and Applications (计算机工程与应用)
  • Year: 2017
  • Issue: v.53; No.881
  • Funding: National Natural Science Foundation of China (No.11375223, No.11375221)
  • Language: Chinese
  • Page: JSGG201710001
  • Page count: 7
  • CN: 10
  • Classification number: 6-12
Abstract
To simplify the file system implementation and support streaming access to very large data sets, HDFS gives up random access to files, yet many real-world applications need it. Based on an in-depth analysis of how HDFS reads and writes data, this paper proposes a data random access method oriented to HDFS. The design adds a local data access interface to the Datanode, so that a user program can read the block files stored on a Datanode and write data into the Datanode's block storage directory. The first replica of a file is written directly by the user program to the local Datanode; the remaining replicas are generated by data replication after the first replica has been completely written. In addition, permission management is added for blocks: the file replicas stored on a Datanode are owned by the user, and when a file's permissions change in the namespace, the permissions of its blocks change accordingly. Tests show that read performance improves by about 10% and write performance by more than 20%; under high concurrency, write performance improves by up to 2.5 times.
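The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use any HDFS API: ordinary local files stand in for Datanode block files, and the names `write_at`, `read_at`, `replicate`, and `sync_block_mode` are hypothetical helpers introduced only for illustration. Namenode metadata, block checksums, and network transfer are not modelled.

```python
import os
import shutil
import stat
import tempfile


def write_at(block_path, offset, data):
    """Write `data` at byte `offset` of a local block file (random write)."""
    # Open for update if the block exists, otherwise create it.
    mode = "r+b" if os.path.exists(block_path) else "w+b"
    with open(block_path, mode) as f:
        f.seek(offset)
        f.write(data)


def read_at(block_path, offset, length):
    """Read up to `length` bytes starting at byte `offset` (random read)."""
    with open(block_path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def replicate(block_path, replica_dirs):
    """Copy the finished first replica into each other storage directory."""
    copies = []
    for directory in replica_dirs:
        os.makedirs(directory, exist_ok=True)
        dst = os.path.join(directory, os.path.basename(block_path))
        shutil.copy2(block_path, dst)  # copy2 also copies permission bits
        copies.append(dst)
    return copies


def sync_block_mode(block_paths, namespace_mode):
    """Apply the file's namespace permission bits to every block replica."""
    for path in block_paths:
        os.chmod(path, stat.S_IMODE(namespace_mode))


if __name__ == "__main__":
    # Hypothetical layout: one directory per simulated Datanode.
    root = tempfile.mkdtemp()
    first = os.path.join(root, "dn1", "blk_0001")
    os.makedirs(os.path.dirname(first))

    write_at(first, 0, b"0123456789")   # first replica written locally
    write_at(first, 4, b"XY")           # in-place update at offset 4
    others = replicate(first, [os.path.join(root, "dn2"),
                               os.path.join(root, "dn3")])
    sync_block_mode([first] + others, 0o640)
    print(read_at(others[0], 3, 4))
```

The sketch keeps the ordering described in the abstract: the first replica is completed locally before any copy is made, and a permission change is applied to every replica of the block.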
