Research and Implementation of Stream Cube Construction Techniques for OLAP Analysis of Network Security Incidents

Original title (Chinese): 网络安全事件OLAP分析中流数据方构建技术的研究与实现
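Author: Wang Yibing (王亦兵)
Degree level: Master's
Discipline: Computer Science and Technology
Keywords: Data Stream; Data Stream Cube; OLAP; Network Security Situation Awareness
Degree year: 2010
Supervisor: Yang Shuqiang (杨树强)
Discipline code: 081203
Degree-granting institution: National University of Defense Technology
Thesis submission date: 2010-04-01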

Abstract

The Internet is a critical national information infrastructure. Real-time monitoring and control of its network security state is key to keeping the Internet running in an orderly way, and monitoring and analyzing that state is the prerequisite for controlling it in real time. On-Line Analytical Processing (OLAP) is an important data analysis technique that can be applied to analyzing the network security state of the Internet. Efficient OLAP analysis requires the support of a data cube, but the massive volume and continuous generation of network security data make conventional data cube construction impractical, which has limited the use of OLAP in Internet network security monitoring.
     This thesis proposes a data cube construction method based on a Data Stream Management System (DSMS): the DSMS pre-computes aggregates over the Internet network security monitoring data, and the data cube is updated and maintained incrementally in real time. The main contributions are as follows:
     1. Based on an in-depth analysis of the stream characteristics of network security monitoring data, of data stream processing techniques, and of OLAP techniques, the thesis proposes a construction method for time-sliced data stream cubes (Time-Sliced Stream Cube, TSS-Cube) and analyzes its effectiveness.
     2. Because joining large dimension tables is very time-consuming during cube construction, the thesis proposes an improved table-join technique and optimizes the stream processing model; experiments verify the effectiveness of the approach.
     3. The thesis introduces the concept of a hybrid database mode (based on a DSMS and a DBMS) that lets decision makers register queries over large windows. Based on the characteristics of the hybrid database, it designs an incremental maintenance algorithm for the data stream cube, and experiments verify the algorithm's effectiveness.
     4. To meet the needs of network security monitoring, and building on the techniques above, the thesis designs and implements YH-STREAM, the data cube construction subsystem of a network security analysis system. YH-STREAM supports building time-sliced stream cubes over massive data streams and provides storage and querying based on the hybrid database. The subsystem has been deployed and is in operation.
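The abstract does not describe the algorithms in detail, so the following is only a generic sketch of the ideas behind contributions 1 and 3, under assumptions of our own rather than the thesis's design. Events are partitioned into fixed-length time slices; each slice keeps pre-aggregated counts for every cuboid (every subset of the dimensions) and is updated incrementally as each event arrives. Once more than a fixed number of slices exist, the oldest are moved from an in-memory `hot` store (standing in for the DSMS side) to an `archive` dict (standing in for DBMS tables), so a large-window query merges slices from both. The dimension names, `TimeSlicedCube`, `slice_seconds`, and `max_hot_slices` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions of a security event (assumed, not from the thesis).
DIMENSIONS = ("src_region", "attack_type", "severity")

# Every subset of DIMENSIONS is one cuboid (group-by), from the empty apex up to the base.
CUBOIDS = [c for k in range(len(DIMENSIONS) + 1) for c in combinations(DIMENSIONS, k)]


class TimeSlicedCube:
    """Sketch of a time-sliced stream cube with event counts as the only measure.

    Timestamps are integer seconds. Recent slices live in `hot` (standing in for
    the DSMS side); older slices are moved to `archive` (standing in for DBMS tables).
    """

    def __init__(self, slice_seconds, max_hot_slices):
        if slice_seconds <= 0 or max_hot_slices < 1:
            raise ValueError("slice_seconds must be > 0 and max_hot_slices >= 1")
        self.slice_seconds = slice_seconds
        self.max_hot_slices = max_hot_slices
        self.hot = {}      # slice_id -> {cuboid: {cell: count}}
        self.archive = {}  # same layout, for slices evicted from memory

    def insert(self, ts, event):
        """Incrementally fold one event into every cuboid of its time slice."""
        sid = ts // self.slice_seconds
        if sid not in self.hot and sid not in self.archive:
            self.hot[sid] = {cuboid: defaultdict(int) for cuboid in CUBOIDS}
            self._spill_oldest()
        # A late event may belong to a slice that has already been archived.
        cube = self.hot[sid] if sid in self.hot else self.archive[sid]
        for cuboid in CUBOIDS:
            cell = tuple(event[d] for d in cuboid)
            cube[cuboid][cell] += 1

    def _spill_oldest(self):
        """Move the oldest slices to the archive once too many are held in memory."""
        while len(self.hot) > self.max_hot_slices:
            oldest = min(self.hot)
            self.archive[oldest] = self.hot.pop(oldest)

    def query(self, group_by, t_start, t_end):
        """Merge one cuboid over all slices overlapping [t_start, t_end).

        The answer is slice-aligned: a partially covered slice is counted in full.
        """
        unknown = set(group_by) - set(DIMENSIONS)
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")
        cuboid = tuple(d for d in DIMENSIONS if d in group_by)
        s = self.slice_seconds
        result = defaultdict(int)
        for sid in range(t_start // s, -(-t_end // s)):  # -(-a // b) is ceil(a / b)
            cube = self.hot.get(sid, self.archive.get(sid))
            if cube is None:
                continue  # no events fell into this slice
            for cell, count in cube[cuboid].items():
                result[cell] += count
        return dict(result)


if __name__ == "__main__":
    cube = TimeSlicedCube(slice_seconds=60, max_hot_slices=2)
    events = [
        (0, {"src_region": "A", "attack_type": "scan", "severity": "low"}),
        (30, {"src_region": "B", "attack_type": "scan", "severity": "high"}),
        (70, {"src_region": "A", "attack_type": "ddos", "severity": "high"}),
        (130, {"src_region": "A", "attack_type": "scan", "severity": "low"}),
    ]
    for ts, ev in events:
        cube.insert(ts, ev)
    print(sorted(cube.archive))  # [0]: slice 0 was spilled when slice 2 opened
    print(cube.query(("attack_type",), 0, 180))  # {('scan',): 3, ('ddos',): 1}
```

Materializing all 2^d cuboids per slice keeps each update at 2^d cell increments and turns a window query into a merge over slices; with many dimensions, a practical system would materialize only selected cuboids and would persist the archive in a real database.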
