Impala Fast Query Technology for Distribution Network Monitoring Big Data
  • English title: Fast query technology for monitoring big data of distribution network based on Impala
  • Authors: QU Zhi-jian; CHEN Ding-long; GONG Qi
  • Affiliations: School of Electrical and Automation Engineering, East China Jiaotong University; Zhengzhou Rail Transit Co., Ltd.
  • Keywords: big data of distribution network; distributed storage; Impala; MPP; fast query
  • Journal code: CSDL
  • Journal: Journal of Electric Power Science and Technology
  • Publication date: 2018-06-28
  • Publisher: Journal of Electric Power Science and Technology
  • Year: 2018
  • Volume/issue: v.33; No.121
  • Funding: National Natural Science Foundation of China (51267005; 51567008); Natural Science Foundation of Jiangxi Province (20161BAB206156); Jiangxi Province Outstanding Young Talents Program (20162BCB23045)
  • Language: Chinese
  • Record ID: CSDL201802021
  • Number of pages: 9
  • Issue: 02
  • CN: 43-1475/TM
  • Pages: 150-158
Abstract
To address the slow speed of interactive SQL queries over distribution network monitoring big data, the types of distribution network monitoring data are classified, and an MPP (massively parallel processing) fast query technology for monitoring big data is studied using the Impala distributed processing tool. The coordinator node parses the query plan into an execution plan tree and assigns the fragments of the plan tree to multiple slave nodes for parallel execution. Each slave node streams its intermediate results back to the coordinator node along the execution plan tree, and the query is accelerated by fully in-memory parallel execution across the multi-machine cluster. A four-machine monitoring system cluster is used as an example for data loading tests and query performance tests. The results show that, compared with a relational database, the MPP fast query technology greatly increases data loading speed. For monitoring records on the order of ten million from the distribution monitoring system of a motor car depot in Beijing, both the relational database and the Hive data warehouse need at least 94 s, whereas the MPP fast query needs only about 320 ms. Query performance improves by nearly three orders of magnitude, which greatly speeds up query processing for monitoring big data.
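The coordinator and slave-node flow described in the abstract can be illustrated with a toy scatter-gather sketch. This is not Impala and does not use any Impala API: threads stand in for slave nodes, a per-feeder average stands in for the query, and every name and record below (`RECORDS`, `partition`, `run_fragment`, `coordinator_avg_by_feeder`) is a hypothetical placeholder rather than anything taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical monitoring records: (feeder_id, voltage_kv). Not data from the paper.
RECORDS = [("F1", 10.2), ("F2", 10.5), ("F1", 10.4), ("F3", 9.9),
           ("F2", 10.1), ("F1", 10.0), ("F3", 10.3), ("F2", 10.6)]


def partition(records, n_workers):
    """Split records round-robin, standing in for the data held by each worker node."""
    return [records[i::n_workers] for i in range(n_workers)]


def run_fragment(rows):
    """Plan fragment run on one worker: partial (sum, count) per feeder, kept in memory."""
    partial = {}
    for feeder, voltage in rows:
        s, c = partial.get(feeder, (0.0, 0))
        partial[feeder] = (s + voltage, c + 1)
    return partial


def coordinator_avg_by_feeder(records, n_workers=4):
    """Coordinator: dispatch fragments in parallel and merge partial results as they arrive."""
    merged = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_fragment, part)
                   for part in partition(records, n_workers)]
        # Merge each worker's partial result as soon as it is ready,
        # instead of waiting for all workers to finish first.
        for fut in as_completed(futures):
            for feeder, (s, c) in fut.result().items():
                ms, mc = merged.get(feeder, (0.0, 0))
                merged[feeder] = (ms + s, mc + c)
    # Final step on the coordinator: turn merged (sum, count) pairs into averages.
    return {feeder: s / c for feeder, (s, c) in merged.items()}


if __name__ == "__main__":
    for feeder, avg in sorted(coordinator_avg_by_feeder(RECORDS).items()):
        print(feeder, round(avg, 3))
```

Sending partial `(sum, count)` pairs rather than raw rows keeps the data returned to the coordinator small. That is the general idea behind pushing aggregation down to the worker nodes in an MPP engine, though a real engine plans and schedules fragments very differently from this sketch.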
