Abstract
Collecting and processing massive volumes of deep packet inspection (DPI) data in real time is an active research topic for telecom operators. The traditional batch-processing model based on MapReduce struggles to meet the latency requirements of stream computing. This paper first introduces the core concepts of stream processing and then analyzes the stream computing technologies in common use today. On that basis it proposes a DPI data processing scheme built on stream computing and applies it in a production project, where it meets the operator's real-time data processing requirements. Finally, the paper draws on this practical experience to summarize the application scenarios for which stream processing is suited.