Testing and optimization of HBase writing performance based on massive data

Chinese title: 基于海量数据的HBase写入性能测试与优化
Authors: QING Xin (青欣); WEN Wei-jun (文伟军); JIN Xing (金星); JIANG Zhen (姜镇)
Keywords: MapReduce; Hadoop; HBase; massive data
Journal code: DNZS
Journal: Computer Knowledge and Technology
Affiliation: 75837 Troops
Publication date: 2019-02-25
Publisher: Computer Knowledge and Technology (电脑知识与技术)
Year: 2019
Volume: 15
Language: Chinese
Record ID: DNZS201906005
Page count: 5
Issue: 06
CN: 34-1205/TP
Pages: 15-19

Abstract

HBase provides structured storage for large-scale data together with real-time random read and write access. However, the API that HBase provides has performance bottlenecks in areas such as bulk writing of large-scale data, and it cannot fully meet application requirements. This paper proposes a performance optimization scheme for HBase based on the MapReduce architecture and designs a distributed program to verify it. Experiments show that, under massive-data workloads, the MapReduce computing framework can exploit the computing capacity of the HBase cluster and is more practical and effective than the traditional single-threaded and multi-threaded write methods. Drawing on the performance characteristics of these three write methods, the paper also proposes a selection strategy based on the volume of data to be written.

References

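[1] Armbrust M, Fox A, Griffith R, et al. A view of cloud computing[J]. Communications of the ACM, 2010, 53(4): 50-58.
[2] Chen Quan, Deng Qianni. Cloud computing and its key techniques[J]. Journal of Computer Applications, 2009, 29(9): 2562-2567. (in Chinese)
[3] George L. HBase: the definitive guide[M]. O'Reilly Media, Incorporated, 2011.
[4] HBase[EB/OL]. https://zh.wikipedia.org/wiki/HBase, 2013-04-10.
[5] Hadoop[EB/OL]. https://zh.wikipedia.org/wiki/HBase, 2013-04-10.
[6] Chang F, Dean J, Ghemawat S, et al. Bigtable: A distributed storage system for structured data[J]. ACM Transactions on Computer Systems (TOCS), 2008, 26(2): 4.
[7] Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1): 107-113.
[8] Li Ming, Xu Guanghui, Ji Yao. Application research of the MapReduce programming model in network I/O-intensive programs[J]. Application Research of Computers, 2011, 28(9): 3372-3374. (in Chinese)
[9] http://hadoop.apache.org/docs/r0.20.0/releasenotes.html, 2011-03-13.
[10] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12314223, 2013-03-13.
[11] Moise D, Shestakov D, Gudmundsson G T, et al. Indexing and Searching 100M Images with Map-Reduce[C]. ACM International Conference on Multimedia Retrieval. 2013.
[12] Du Xiaodong. Research on HBase-based distributed query optimization in a big data environment[J]. Computer CD Software and Applications, 2014(8): 22-24. (in Chinese)
[13] Wang Haibao. Research on a data sharing model based on the Hadoop architecture[D]. Beijing University of Technology, 2013. (in Chinese)
[14] Peng Yu, Pang Jingyue, Liu Datong, et al. Big data: connotation, technical framework and prospects[J]. Journal of Electronic Measurement and Instrumentation, 2015(4): 469-482. (in Chinese)
[15] Sun Zhixin, Huang Hanxia. Research on data storage technology based on cloud computing[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2014, 34(4): 13-19. (in Chinese)
[16] Liu Xiaojing. Research and implementation of an HBase-based storage and retrieval system for massive small videos[D]. Xidian University, 2014. (in Chinese)
[17] George L. HBase: the definitive guide: random access to your planet-size data[M]. 2011.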
