Research and Implementation of the Network Security Audit System Based on Log Mining

Author: Ning Xingwang (宁兴旺)
Degree level: Master's
Discipline: Communication and Information Systems
Keywords: Log Mining ; Association Rules ; Multisource Log ; Interest Degree ; Security Audit ; Hierarchical Security System
Degree year: 2010
Supervisor: Liu Peiyu (刘培玉)
Discipline code: 081001
Degree-granting institution: Shandong Normal University
Thesis submission date: 2010-06-05

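Abstract

As computers become ubiquitous and computer technology advances rapidly, computer crime is growing day by day. Faced with this emerging form of crime, how to audit for problems and combat computer crime forcefully has become a new research topic. A network security audit system can effectively prevent the loss of technical material such as sensitive information; it can supervise staff's web browsing and thereby resist the intrusion of harmful information; and it can mine, retain, and organize logs of illegal activity, helping to combat crime committed online and by insiders.
     The quality of network security auditing directly determines whether intrusions or anomalies can be found promptly and accurately. This thesis first studies, analyzes, and compares the traditional security audit techniques in current use. Many techniques are applied in the security audit field today, and most of them rely at their core on a prior-knowledge base. The drawback of this approach is that it cannot discover the association rules present in the data and has no means of mining the knowledge hidden behind the data, so it suffers from low accuracy, slow detection, and poor adaptability. To address these problems, the main work of this thesis is as follows:
     1. Real-time collection of multisource logs is designed and implemented
     Because a single data source easily makes audit analysis inaccurate, agents are used to implement multi-point distributed collection, combining host data sources and network data sources. The MyOnEntryWritten function and the provider pattern are designed and applied to audit log collection, improving the comprehensiveness and timeliness of the audit data.
     2. A more comprehensive audit approach is given
     The approach relies mainly on mining association rules from the log database and combines this with traditional prior knowledge and mathematical statistics. It audits host operations and network communication behavior for anomalies and responds accordingly. Three rule-base pattern matching modes are applied: sequential patterns, time reasonableness, and mathematical statistics, which improves audit accuracy.
     3. The traditional association rule mining algorithm is improved
     A new data structure based on the child-sibling representation of a tree removes the traditional algorithm's need to scan the database repeatedly and so improves its efficiency. Within the support-confidence framework, the system introduces a further evaluation threshold, the interest degree, to prune useless rules and avoid generating misleading association rules, which refines the criteria for evaluating rules. When the original data set is symbolized and converted into a transaction database, the transaction database is optimized again in light of actual conditions. Together these measures handle the automatic generation and updating of audit rules well and improve audit efficiency.
     4. A hierarchical security protection system is designed and established
     By combining the MD5 hash function, process guarding, SSL, and API hooking, a hierarchical protection system is designed and implemented that runs from user name and password protection at the outermost layer to protection of the on-disk log files at the innermost layer. It guarantees the authenticity and reliability of audit results and secures the audit system itself. In addition, by monitoring operating system information such as processes and memory in real time and producing performance analyses and early-warning reports, it also protects the operating system to a certain extent.
     5. A network security audit system based on log mining is designed and implemented
     Detailed design, implementation, and experimental analysis show that log mining has a fairly clear advantage and strong adaptability when auditing LAN servers, and that the false positive rate reaches the expected level. This indicates that a network security audit system based on the improved log mining technique is feasible and can improve the efficiency and accuracy of auditing.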
With the spread of computer applications and the rapid development of computer technology, the number of computer crimes grows day by day. In the face of these emerging crimes, how to audit for them and combat them effectively has become a new task. A network security audit system can prevent the loss of technical material such as sensitive information, can monitor staff's Internet browsing and so resist the intrusion of harmful information, and can mine and retain logs to help combat criminal activity, whether it occurs online or among internal staff.
     The quality of network security auditing directly affects our ability to detect intrusions or anomalies promptly and accurately. Traditional security audit techniques are studied and compared first. The core technique currently adopted in the security audit field is the prior-knowledge base. The disadvantage of this approach is that it cannot find the association rules that exist in the data and lacks a way to mine the knowledge hidden in the data, which results in low accuracy, slow detection, and poor adaptability. To address these problems, the major research work is as follows:
     1. Real-time collection of multisource logs is designed and implemented
     Because a single data source can make audit analysis inaccurate, agents are adopted to achieve distributed data acquisition that combines host data sources with network data sources. The MyOnEntryWritten function and the provider model are designed and applied to audit log collection, making the audit data more comprehensive and closer to real time.
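As a rough illustration of combining host and network sources, the sketch below merges two time-ordered record streams into a single audit stream. It is written in Python for brevity and is not the thesis's implementation: the record layout and the names `LogRecord` and `merge_sources` are assumptions made for this example, and the sample records are invented.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class LogRecord(NamedTuple):
    """Hypothetical unified record; the field layout is an assumption."""
    timestamp: float  # seconds since an arbitrary epoch
    source: str       # "host" or "network"
    message: str


def merge_sources(*streams: Iterable[LogRecord]) -> Iterator[LogRecord]:
    """Lazily merge per-agent streams that are each already sorted by time."""
    return heapq.merge(*streams, key=lambda record: record.timestamp)


# Invented sample data standing in for what two agents might report.
host_log = [
    LogRecord(1.0, "host", "logon failure"),
    LogRecord(4.0, "host", "logon success"),
]
net_log = [
    LogRecord(2.5, "network", "SYN to port 3389"),
    LogRecord(3.0, "network", "SYN to port 445"),
]

merged = list(merge_sources(host_log, net_log))
for record in merged:
    print(record.timestamp, record.source, record.message)
```

Because `heapq.merge` is lazy, the same function would also work on generators that yield records as agents deliver them, provided each stream stays in timestamp order.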
     2. A comprehensive audit approach is given
     The approach relies mainly on mining association rules from log data and combines this with traditional prior knowledge and mathematical statistics. On this basis, the system audits host operations and network traffic behavior to find abnormal situations and respond appropriately. Three rule-base pattern matching methods are used to increase audit accuracy: sequential patterns, time reasonableness, and mathematical statistics.
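The three matching modes named above can be pictured with a minimal Python sketch. The abstract does not specify how each check is defined, so the function names, the working-hours window, and the mean-plus-k-standard-deviations threshold are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def matches_sequence(events: Sequence[str], pattern: Sequence[str]) -> bool:
    """Sequential pattern: True if `pattern` occurs in `events` in order
    (the steps need not be adjacent)."""
    remaining = iter(events)
    # `step in remaining` consumes the iterator up to the first match,
    # so later steps can only match later events.
    return all(step in remaining for step in pattern)


def time_is_reasonable(hour: int, start: int = 8, end: int = 18) -> bool:
    """Time reasonableness: True if the event falls inside the allowed hours.
    The 08:00 to 18:00 window is an assumed policy."""
    return start <= hour < end


def is_statistical_outlier(value: float, baseline: Sequence[float], k: float = 3.0) -> bool:
    """Mathematical statistics: True if `value` exceeds the baseline mean
    by more than `k` population standard deviations (assumed rule)."""
    return value > mean(baseline) + k * pstdev(baseline)


# Invented sample data.
events = ["logon_fail", "logon_fail", "port_open", "logon_ok"]
print(matches_sequence(events, ["logon_fail", "logon_ok"]))  # pattern in order
print(time_is_reasonable(3))                                  # 03:00 logon
print(is_statistical_outlier(30, [10, 12, 11, 9, 13]))        # burst of events
```

A real rule base would load the patterns, time windows, and thresholds from stored rules rather than hard-coding them.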
     3. The traditional association rule mining algorithm is improved
     A new data structure based on the child-sibling representation of a tree overcomes the traditional algorithm's need to scan the database repeatedly and thus improves its efficiency. Within the support-confidence framework, the interest degree is introduced as an additional evaluation threshold to prune useless rules and avoid generating misleading association rules, which refines the criteria for evaluating rules. The transaction database converted from the original data set is optimized again according to actual conditions. Together these measures achieve the automatic generation and updating of audit rules and improve audit efficiency.
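To make the two ideas in this item concrete, here is a minimal Python sketch under stated assumptions. The abstract does not give the thesis's tree layout or its formula for the interest degree, so the code uses a plain prefix tree stored in child-sibling form and takes lift, confidence divided by the support of the consequent, as a stand-in interest measure. All names and the sample transactions are hypothetical.

```python
class Node:
    """Prefix-tree node in child-sibling (left-child, right-sibling) form."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0       # transactions whose sorted prefix reaches this node
        self.child = None    # first child
        self.sibling = None  # next node on the same level


def insert(root, transaction):
    """Add one transaction to the tree as a sorted path of items."""
    root.count += 1
    node = root
    for item in sorted(set(transaction)):
        prev, cur = None, node.child
        while cur is not None and cur.item != item:
            prev, cur = cur, cur.sibling
        if cur is None:                 # item not yet on this level: append it
            cur = Node(item)
            if prev is None:
                node.child = cur
            else:
                prev.sibling = cur
        cur.count += 1
        node = cur


def support_count(node, itemset):
    """Count transactions below `node` containing every item of the sorted tuple."""
    if not itemset:
        return node.count
    total, cur = 0, node.child
    while cur is not None:
        if cur.item == itemset[0]:
            total += support_count(cur, itemset[1:])
        elif cur.item < itemset[0]:
            total += support_count(cur, itemset)
        # cur.item > itemset[0]: paths are sorted, so itemset[0] cannot occur deeper
        cur = cur.sibling
    return total


def rule_metrics(root, antecedent, consequent):
    """Return (support, confidence, interest) for antecedent -> consequent.
    Interest is computed as lift here, which is an assumption."""
    n = root.count
    sup_a = support_count(root, tuple(sorted(antecedent))) / n
    sup_c = support_count(root, tuple(sorted(consequent))) / n
    sup_ac = support_count(root, tuple(sorted(set(antecedent) | set(consequent)))) / n
    if sup_a == 0 or sup_c == 0:
        return sup_ac, 0.0, 0.0
    confidence = sup_ac / sup_a
    return sup_ac, confidence, confidence / sup_c


# Invented symbolized log transactions; the database is read once to build the tree.
transactions = [
    ["login_fail", "port_scan"],
    ["login_fail", "port_scan", "priv_change"],
    ["login_fail"],
    ["file_read"],
]
root = Node()
for t in transactions:
    insert(root, t)

support, confidence, interest = rule_metrics(root, ["login_fail"], ["port_scan"])
print(support, confidence, interest)
```

With lift as the measure, a value above 1 suggests the antecedent and consequent occur together more often than independence would predict, so a rule whose value sits near or below 1 is a candidate for pruning even if it passes the support and confidence thresholds.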
     4. A hierarchical security system is designed and established
     By combining the MD5 hash function, process guarding, SSL, and API hooking, a hierarchical security system is designed that runs from user name and password protection at the outermost layer to protection of the on-disk log files at the innermost layer. It ensures the authenticity and reliability of audit results and the security of the audit system itself. At the same time, by monitoring the operating system in real time, the system provides performance analysis of process and memory information together with early-warning reports, which to a certain extent also protects the operating system.
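One way a hash function can support log-file protection is by storing a digest of the log and checking it later. The sketch below shows that idea in Python with the standard `hashlib` module. It is an assumption about how MD5 might be used, not the thesis's actual scheme, and the function names are hypothetical. MD5 is used because the text names it, although it is no longer considered collision resistant.

```python
import hashlib
import hmac


def log_fingerprint(log_bytes: bytes) -> str:
    """Return the MD5 digest of the log contents as a hex string."""
    return hashlib.md5(log_bytes).hexdigest()


def is_unmodified(log_bytes: bytes, stored_fingerprint: str) -> bool:
    """Compare the current digest with the one recorded earlier."""
    return hmac.compare_digest(log_fingerprint(log_bytes), stored_fingerprint)


# Invented log contents.
original = b"2010-06-05 10:00:01 logon success user=alice\n"
recorded = log_fingerprint(original)

tampered = original.replace(b"alice", b"mallory")
print(is_unmodified(original, recorded))
print(is_unmodified(tampered, recorded))
```

The stored digest only helps if it is kept somewhere an attacker who can edit the log cannot also rewrite, which is presumably why the described system layers several protections.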
     5. A network security audit system based on log mining is designed and implemented
     Detailed design and implementation of the system, followed by experimental analysis, show that log mining has clear advantages and strong adaptive ability when auditing LAN servers, and that the false positive rate also reaches the expected level. This indicates that a network security audit system based on improved log mining is feasible and can improve the efficiency and accuracy of auditing.
