Real-time processing system for automatic weather station data on Spark Streaming architecture
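  • English title: Real-time processing system for automatic weather station data on Spark Streaming architecture
  • Authors: 赵文芳 (ZHAO Wenfang); 刘旭林 (LIU Xulin)
  • English authors: ZHAO Wenfang; LIU Xulin; Information Center, Beijing Meteorological Bureau; Observation Center, Beijing Meteorological Bureau
  • Keywords: automatic weather station; Spark Streaming; stream computing; meteorological data processing; Flume
  • English keywords: Automatic Weather Station (AWS); Spark Streaming; stream computing; meteorological data processing; Flume
  • Chinese journal code: JSJY
  • English journal title: Journal of Computer Applications
  • Affiliations: Information Center, Beijing Meteorological Bureau; Observation Center, Beijing Meteorological Bureau
  • Publication date: 2018-01-10
  • Publisher: Journal of Computer Applications (计算机应用)
  • Year: 2018
  • Issue: v.38; No.329
  • Funding: Special Fund for Scientific Research in the Public Interest (Meteorology), China Meteorological Administration (201206031)
  • Language: Chinese
  • Page: JSJY201801009
  • Page count: 7
  • CN: 01
  • ISSN: 51-1307/TP
  • Classification number: 44-49+61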
Abstract
To address the problems of existing operational platforms for Automatic Weather Stations (AWS), namely delayed data processing, slow interactive response, and poor timeliness of statistics, a method based on Spark Streaming and HBase was proposed that combines a real-time computing framework with a distributed database system to process large-scale streaming data. Flume was used to collect AWS data, and Spark Streaming processed the data as a stream and stored the results in HBase. Two algorithms were designed within the Spark framework: one for streaming ingestion of AWS data into HBase, and one for real-time statistics of the extreme values of observed meteorological elements. A fast and reliable application system for real-time acquisition, processing, and statistics was then implemented on the Cloudera platform. Comparative analysis and performance monitoring verified that the system offers low latency and high throughput, runs stably, and balances load well. The experimental results show that, when Spark Streaming is used for real-time operational processing of AWS data, parallel writes to HBase, HBase-based queries, and statistics on the various elements all achieve millisecond-level response, which fully meets the application requirements for AWS data and effectively supports weather forecasting operations.
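The abstract mentions a real-time algorithm for element extreme values but gives no details. The following is a minimal plain-Python sketch of how per-station extremes could be maintained across micro-batches, in the style of a stateful streaming update (comparable in spirit to the `updateStateByKey` pattern in Spark Streaming). It is not the authors' algorithm: the record layout `(station_id, element, value)`, the function names `update_extremes` and `process_batch`, and the sample station IDs are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record layout (not taken from the paper): (station_id, element, value).
Record = Tuple[str, str, float]
# Running state kept per (station_id, element): (minimum, maximum).
Extremes = Tuple[float, float]
Key = Tuple[str, str]


def update_extremes(new_values: Iterable[float],
                    state: Optional[Extremes]) -> Optional[Extremes]:
    """Fold the values of one micro-batch into the running (min, max) for a key."""
    values = list(new_values)
    if not values:
        return state  # no data for this key in this batch; keep the old state
    batch_min, batch_max = min(values), max(values)
    if state is None:
        return (batch_min, batch_max)
    return (min(state[0], batch_min), max(state[1], batch_max))


def process_batch(batch: Iterable[Record],
                  state: Dict[Key, Extremes]) -> Dict[Key, Extremes]:
    """Group one micro-batch by (station, element) and update the state in place."""
    grouped: Dict[Key, List[float]] = {}
    for station_id, element, value in batch:
        grouped.setdefault((station_id, element), []).append(value)
    for key, values in grouped.items():
        state[key] = update_extremes(values, state.get(key))
    return state


if __name__ == "__main__":
    # Made-up sample data: two micro-batches arriving one after the other.
    running: Dict[Key, Extremes] = {}
    batches = [
        [("A1001", "temperature", 3.2), ("A1001", "temperature", 4.1)],
        [("A1001", "temperature", -0.5), ("A1002", "wind_speed", 6.0)],
    ]
    for micro_batch in batches:
        process_batch(micro_batch, running)
    print(running)
```

Because each key's state is only a `(min, max)` pair, the update cost per batch depends on the batch size rather than on the history. In a real deployment the state table would presumably live in the streaming framework or in HBase rather than in a local dictionary.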