Research and Implementation of Data Cube Techniques for OLAP Analysis of Network Security

Original title (Chinese): 网络安全OLAP分析中数据立方体技术的研究与实现
Author: Yang Qingmin (杨庆民)
Degree level: Master's
Discipline: Computer Science and Technology
Keywords: Data Cube; OLAP; Stream Cube; StreamDwarf; Network Security Situation Awareness
Degree year: 2008
Advisor: Han Weihong (韩伟红)
Discipline code: 081202
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Thesis submission date: 2008-11-01

Abstract

The Internet is becoming a critical national information infrastructure, bearing on the fundamental interests of the state and of society as a whole. As Internet technology develops rapidly, malicious attacks against networked information systems are becoming increasingly distributed, large-scale, complex, and indirect. New techniques are therefore urgently needed to perceive, monitor, and analyze the security situation of large-scale networked information systems accurately and in real time. The starting point of this work is how to capture and interpret the current network security state from massive, complex monitoring data and to uncover latent trends, so as to grasp the macro-level security situation of a large-scale network.
     Online Analytical Processing (OLAP) is an important means of performing near-real-time, integrated analysis of large-scale network monitoring data. By providing fast, consistent, and interactive access to information from many possible viewpoints, OLAP lets decision makers examine data in depth and offers great analytical flexibility.
     Efficient computation of the data cube is the key to supporting OLAP analysis. Query response time can be reduced substantially only if all or part of the data cube is precomputed. The core problem of this thesis is to find scalable methods for computing a partial data cube under limited storage capacity and computing power, striking a careful balance between the cube's time and space costs and its query response performance.
     Because perceiving, monitoring, and analyzing the network security situation must be done in real time, this thesis studies OLAP over data streams. Computing a data cube over a stream is subject to even stricter time and space constraints, so partial materialization of stream cubes under bounded time and space is the focus of this work.
     The main contributions are summarized as follows:
     1. The basic concepts and model definitions of the data cube are introduced, implementation schemes are discussed, and the existing data cube computation algorithms are summarized and analyzed in depth.
     2. The characteristics of OLAP over data streams are analyzed and the design requirements of a stream cube are summarized. A multi-level tilted window model is proposed, which compresses the stream cube effectively along the time dimension under bounded time and space.
     3. A new partial materialization method for stream cubes is proposed: StreamDwarf, a multidimensional stream cube framework based on the Dwarf structure. The corresponding algorithms, including an incremental update algorithm and a query algorithm, are presented and implemented, and experimental results are reported.
     4. A network security situation analysis system is developed on the StarOLAP platform, providing multidimensional, multi-level, near-real-time integrated analysis of massive network security monitoring data.
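
The abstract names a multi-level tilted window model but does not specify it. As a rough illustration of the general idea (recent data kept at fine time granularity, older data merged into coarser slots), a minimal Python sketch might look like the following. The class name `TiltedWindow`, the level names, the capacities, and the clear-on-roll-up policy are all assumptions made for illustration, not the design used in the thesis.

```python
from collections import deque


class TiltedWindow:
    """Illustrative tilted time window for a single aggregate (a running sum).

    `levels` lists (name, capacity) pairs from finest to coarsest.
    For every level but the last, `capacity` slots roll up into one slot
    of the next level; for the last level it is the number of slots kept.
    All names and the roll-up policy are assumptions, not the thesis design.
    """

    def __init__(self, levels):
        self.levels = list(levels)
        self.slots = [deque() for _ in self.levels]

    def add(self, value, level=0):
        """Record one completed time slot at `level`, rolling up if it fills."""
        slots = self.slots[level]
        capacity = self.levels[level][1]
        slots.append(value)
        if level == len(self.levels) - 1:
            if len(slots) > capacity:
                slots.popleft()  # coarsest level: forget the oldest slot
        elif len(slots) == capacity:
            merged = sum(slots)  # compress fine slots into one coarse slot
            slots.clear()
            self.add(merged, level + 1)

    def total(self):
        """Sum over everything still retained in the window."""
        return sum(sum(s) for s in self.slots)

    def slot_count(self):
        """Number of stored slots, i.e. the memory footprint of this cell."""
        return sum(len(s) for s in self.slots)


# Hypothetical configuration: 4 fine slots -> 1 mid slot, 3 mid -> 1 coarse,
# and at most 2 coarse slots are retained.
window = TiltedWindow([("fine", 4), ("mid", 3), ("coarse", 2)])
for _ in range(30):
    window.add(1)  # one event count per fine-grained time unit
print([list(s) for s in window.slots], window.total(), window.slot_count())
```

In this toy configuration, 30 fine-grained time units are held in 5 slots, and the number of slots per cell stays bounded however long the stream runs. That bound is the kind of time-dimension compression the abstract refers to; the price is that older history can only be queried at coarse granularity.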
