Scheme and Practice on DPI Data Processing Based on Stream Computing
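  • English title: Scheme and Practice on DPI Data Processing Based on Stream Computing
  • Authors: 范家杰; 田熙清; 郑博
  • Authors (English): FAN Jiajie; TIAN Xiqing; ZHENG Bo; Guangdong Research Institute of China Telecom Co., Ltd.; China Telecom Co., Ltd., Guangdong Branch
  • Keywords: DPI; stream computing; data processing
  • Keywords (English): DPI; stream computing; data processing
  • Journal code (Chinese): YDTX
  • Journal name (English): Mobile Communications
  • Affiliations: Guangzhou Research Institute of China Telecom Co., Ltd.; Guangdong Branch of China Telecom Co., Ltd.
  • Publication date: 2018-01-15
  • Publisher: 移动通信 (Mobile Communications)
  • Year: 2018
  • Issue: v.42; No.455
  • Language: Chinese
  • Page: YDTX201801019
  • Page count: 7
  • CN: 01
  • ISSN: 44-1301/TN
  • Classification number: 87-93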
Abstract
How to collect and process massive volumes of deep packet inspection (DPI) data in real time is a research focus for telecom operators. The traditional MapReduce-based batch processing model struggles to meet the real-time requirements of stream computing. This paper therefore first introduces the concepts related to stream processing, then analyzes the currently popular stream computing technologies and proposes a DPI data processing scheme based on stream computing. The scheme has been applied in a real project, where it meets telecom operators' real-time data processing requirements. Finally, the application scenarios of stream processing are summarized based on this practice.
References
[1] Zikopoulos P, Eaton C. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data[M]. McGraw-Hill Osborne Media, 2011.
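[2] Chen Kang, Fu Huazheng, Chen Chong, et al. Real-time classification of user interests based on DPI[J]. Telecommunications Science, 2016, 32(12): 109-115.
[3] Sun Dawei, Zhang Guangyan, Zheng Weimin. Big data stream computing: key technologies and system instances[J]. Journal of Software, 2014, 25(4): 839-862.
[4] Dong Bin, Yang Di, Wang Zheng, et al. Application of stream computing big data technology in operators' real-time signaling processing[J]. Telecommunications Science, 2015, 31(10): 165-171.
[5] Marz N, Warren J. Big Data: Principles and best practices of scalable realtime data systems[M]. Manning, 2015.
[6] Li Xiangchi. Design and implementation of a log analysis system based on ELK and Spark Streaming[J]. Electronic Science & Technology, 2015, 2(6): 674-678.
[7] Li Sheng, Huang Yongzhong, Chen Haiyong. A survey of big data stream computing systems[J]. Journal of Information Engineering University, 2016, 17(1): 88-92.
[8] Yao Renjie. Application practice of Kafka at Vipshop[J]. Programmer, 2014(1): 110-113.
[9] Hao Xuan. Design and implementation of a distributed log collection system based on Apache Flume[J]. Software Guide, 2014(7): 110-111.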
[10] Spark. Spark Streaming Programming Guide[EB/OL]. [2017-09-14]. http://spark.apache.org/docs/latest/streaming-programming-guide.html.
[11] Storm. Apache Storm[EB/OL]. [2017-09-14]. http://storm.apache.org/index.html.
[12] Kafka Stream. Kafka Streams API[EB/OL]. [2017-09-14]. http://kafka.apache.org/documentation/streams/.
[13] Flink. Introduction to Apache Flink[EB/OL]. [2017-09-14]. https://flink.apache.org/introduction.html.
[14] PipelineDB. The Streaming SQL Database[EB/OL]. [2017-09-14]. https://www.pipelinedb.com/.
[15] Apache Flume. Apache Flume[EB/OL]. [2017-09-14]. http://flume.apache.org/index.html.
[16] Kafka Stream. ETL[EB/OL]. [2017-09-14]. https://github.com/styg/bumblebee-ETL.
[17] Kafka Stream. Kafka Manager[EB/OL]. [2017-09-14]. https://github.com/yahoo/kafka-manager.
