Metadata Optimization Strategy Based on HDFS in Integrated Space-ground Network
  • Authors: WANG Kun; YANG Yang; QIU Xuesong
  • Keywords: Hadoop; HDFS; metadata management; scalability; memory-mapped file
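  • Journal code: WXDT
  • Journal: Radio Communications Technology
  • Affiliation: Collaborative Innovation Center of Trusted Cyber Communications, Beijing University of Posts and Telecommunications
  • Publication date: 2017-12-29
  • Publisher: Radio Communications Technology
  • Year: 2018
  • Issue: v.44; No.261
  • Funding: Pre-research Fund of the Collaborative Innovation Center of Trusted Cyber Communications, Beijing University of Posts and Telecommunications; Fundamental Research Funds for the Central Universities; National Key Technology R&D Program of China (2015BAI11B01)
  • Language: Chinese
  • Page: WXDT201801002
  • Page count: 5
  • CN: 01
  • ISSN: 13-1099/TN
  • Classification number: 13-17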
Abstract
The Hadoop Distributed File System (HDFS) is one of the core components of Hadoop and is widely used to store data in integrated space-ground networks. Because the volume of data that HDFS can store and manage is bounded by the memory size of the NameNode, its scalability is constrained. To address two problems in NameNode metadata management, namely the long time needed to load the file system image (FSImage) and the capacity limit imposed by memory size, the authors propose flattening the hierarchical HDFS metadata structure and moving the metadata out of memory. On this basis they design the F-HDFS architecture, which manages metadata with a Log-Structured Merge-Tree (LSM-tree) and memory-mapped files, and describe how F-HDFS manages metadata. Comparative experiments between an F-HDFS prototype and HDFS show that F-HDFS performs better than HDFS overall, provides stable and fast metadata service, and can store and manage more than 5.3 times as much data as HDFS.
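The flattening idea in the abstract can be illustrated with a toy example. The sketch below is not the F-HDFS implementation, which this record does not describe in detail. It assumes a flat namespace in which each full path is a key, and an LSM-style layout with an in-memory table that is flushed into immutable sorted runs. The class name `FlatMetadataStore`, its methods, and the `memtable_limit` parameter are all hypothetical. On-disk persistence, memory mapping, and compaction are left out.

```python
import bisect


class FlatMetadataStore:
    """Toy LSM-style metadata store keyed by full path (illustrative only)."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}   # newest writes, held in memory
        self.runs = []       # immutable sorted runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, path, meta):
        self.memtable[path] = meta
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted run; a real system would
        # write this run to disk instead of keeping it in a list.
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, path):
        if path in self.memtable:
            return self.memtable[path]
        for run in self.runs:  # the newest run that has the key wins
            i = bisect.bisect_left(run, (path,))
            if i < len(run) and run[i][0] == path:
                return run[i][1]
        return None

    def list_dir(self, directory):
        # Direct children are keys that start with "<directory>/" and
        # contain no further "/" after that prefix.
        prefix = directory.rstrip("/") + "/"
        names = set()
        for run in [sorted(self.memtable.items())] + self.runs:
            i = bisect.bisect_left(run, (prefix,))
            while i < len(run) and run[i][0].startswith(prefix):
                rest = run[i][0][len(prefix):]
                if "/" not in rest:
                    names.add(rest)
                i += 1
        return sorted(names)
```

With path-based keys, a lookup is a binary search per run rather than a walk down a directory tree, and a directory listing becomes a prefix range scan over sorted keys. Keeping only the small memtable in memory is what lets the metadata volume grow beyond RAM in designs of this kind.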
