Research and Implementation of Detecting Unknown Malicious Executable Code Based on an Active Bayesian Classifier
Abstract
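Computer security is the security of the system as a whole. As hacker intrusions grow increasingly rampant, it has become clear that building secure systems purely from the standpoint of static defense is not enough. Intrusion detection is a new generation of security technology that follows traditional protective measures such as firewalls and data encryption. It identifies and responds to malicious behavior against computer and network resources. It not only detects intrusions from outside but also monitors unauthorized activity by internal users, and it intercepts and responds, by automatically configuring the defense system, before network resources are harmed. Intrusion detection supplements traditional computer security mechanisms. Its development and deployment increase the depth of protection for networks and systems, and it has become the main direction of research and development for dynamic security tools.
     The traditional view divides intrusions by their attributes into anomaly and misuse, and builds an anomaly detection model and a misuse detection model for each. In the last four or five years, new detection methods have emerged whose models apply to both anomaly and misuse, such as artificial immune methods, genetic algorithms, and data mining. Such methods are called hybrid detection. Before making a decision, hybrid detection analyzes both the normal behavior of the system and the suspicious intrusion behavior, so its judgments are more comprehensive, accurate, and reliable. It usually detects intrusions against the background of the system's normal data flow, and for that reason is also called "heuristic signature detection".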
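     Existing intrusion detection products share one shortcoming: they lack effectiveness and adaptivity. The effectiveness of an intrusion detection system can be characterized by two parameters, the detection rate and the false report rate. The false report rate can be further divided into the false alarm rate (false positives) and the miss rate (false negatives). A good intrusion detection system should have a high detection rate and a low false report rate, and the key to a low false report rate is reducing the miss rate. Adaptivity means that, in addition to detecting known intrusions, the system can detect intrusions that differ from the known ones, that is, unknown intrusions. At present, building an effective intrusion detection system is a huge knowledge-engineering effort. System builders rely on intuition and experience to choose statistical measures for anomaly detection. Experts first analyze and classify attack scenarios and system weaknesses, then write by hand the response rules used to detect violations. Because the process is manual and depends on experts, intrusion detection systems lack effectiveness and adaptivity. Applying data mining in an intrusion detection system makes it possible to extract useful information effectively from all kinds of data. Data mining is well suited to feature extraction from large volumes of historical behavior data, and an intrusion detection system can use it when building its knowledge base of intrusion patterns. Inspired by data mining, Wenke Lee developed a hybrid detector based on RIPPER. It does not build separate models for different intrusion behaviors. Instead, it first applies data mining to a large number of examples to learn what intrusion behavior is and what normal system behavior is, discovers consistent usage patterns that describe system features, and then forms a detection model that applies to both anomaly and misuse. This is a typical application of hybrid detection.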
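The effectiveness measures named above can be made concrete with a small calculation. The following is a minimal sketch, assuming the usual confusion-matrix definitions of detection rate, miss rate, and false alarm rate; the names `Confusion` and `rates` are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from evaluating a detector on labeled events (hypothetical helper)."""
    tp: int  # intrusions correctly flagged
    fn: int  # intrusions missed
    fp: int  # normal events wrongly flagged
    tn: int  # normal events correctly passed


def rates(c: Confusion) -> dict:
    """Return detection rate, miss rate, and false alarm rate.

    Assumes the standard definitions; the thesis abstract does not
    spell out its exact formulas.
    """
    intrusions = c.tp + c.fn
    normals = c.fp + c.tn
    if intrusions == 0 or normals == 0:
        raise ValueError("need at least one intrusion and one normal event")
    return {
        "detection_rate": c.tp / intrusions,    # share of intrusions caught
        "miss_rate": c.fn / intrusions,         # share of intrusions missed
        "false_alarm_rate": c.fp / normals,     # share of normal events flagged
    }


if __name__ == "__main__":
    print(rates(Confusion(tp=90, fn=10, fp=5, tn=95)))
```

Under these definitions the detection rate and the miss rate sum to one, which is one way to read the claim that lowering the miss rate is the key lever.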
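     The main work of this thesis addresses the detection of unknown malicious executable code. It constructs a naive Bayes classifier based on active learning, selects training samples by maximum-minimum entropy, analyzes the machine-code segments of files, and trains the Bayesian classifier so that it can distinguish normal code from malicious code, with the goal of detecting unknown malicious executable files. A large share of security threats comes from malicious executable code, especially unknown code. Existing virus scanners based on signature scanning cannot find such unknown malicious code. The machine code of malicious executables, however, differs from that of normal executables, so classification methods from data mining can be used to tell them apart. Compared with other data mining methods, a Bayesian network has the advantage of combining prior information with sample information, which is especially useful when samples are hard to obtain, so a Bayesian network classifier is particularly suitable for detecting unknown malicious executable code. According to how training samples are handled, classification algorithms fall into two categories: passive learning and active learning. Traditional learning algorithms for Bayesian classifiers are mostly passive. Because Bayesian network classification supports incremental learning, it is better suited to active learning, and classifying and learning the probability distribution incrementally can greatly improve efficiency. Traditional Bayesian classifier learning algorithms learn the classification parameters sequentially from the given training samples, so learning efficiency is low. The training samples must carry class labels and are generally assumed to be independent and identically distributed. An active learning algorithm, by contrast, actively selects examples from a candidate sample set and adds them to the training set in a defined way. The initial training set in active learning generally contains only a few labeled samples. The algorithm uses these labeled samples to learn a classifier, then applies a selection strategy to choose the most informative samples from the candidate set (which has no class labels), adds them to the training set, and revises the classifier parameters. Using the revised classifier, it selects the next best sample for training, and it repeats this until the candidate set is empty or some criterion is reached.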
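The active-learning loop described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis's implementation: files are represented as tuples of hexadecimal byte tokens, the class names are `"malicious"` and `"benign"`, labels come from a hypothetical `oracle` callback, and plain maximum-entropy uncertainty sampling stands in for the "maximum-minimum entropy" strategy, whose exact definition the abstract does not give.

```python
import math
from collections import Counter


class IncrementalNaiveBayes:
    """Multinomial naive Bayes whose counts are updated one sample at a time."""

    def __init__(self, classes, alpha=1.0):
        self.classes = list(classes)
        self.alpha = alpha  # Laplace smoothing constant
        self.class_counts = Counter()
        self.token_counts = {c: Counter() for c in self.classes}
        self.vocab = set()

    def update(self, tokens, label):
        """Incrementally add one labeled sample to the model."""
        self.class_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def posterior(self, tokens):
        """Return P(class | tokens) as a dict, computed in log space."""
        v = len(self.vocab) + 1  # one extra slot for unseen tokens
        n = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            total = sum(self.token_counts[c].values())
            lp = math.log((self.class_counts[c] + self.alpha)
                          / (n + self.alpha * len(self.classes)))
            for t in tokens:
                lp += math.log((self.token_counts[c][t] + self.alpha)
                               / (total + self.alpha * v))
            log_scores[c] = lp
        m = max(log_scores.values())
        exp = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, tokens):
        post = self.posterior(tokens)
        return max(post, key=post.get)


def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def active_learn(model, seed, pool, oracle, budget):
    """Train on a small labeled seed, then repeatedly query the unlabeled
    sample whose posterior is most uncertain, until the pool is empty or
    the query budget is spent. Returns the queried (tokens, label) pairs."""
    for tokens, label in seed:
        model.update(tokens, label)
    pool = list(pool)
    queried = []
    while pool and len(queried) < budget:
        idx = max(range(len(pool)),
                  key=lambda i: entropy(model.posterior(pool[i])))
        tokens = pool.pop(idx)
        label = oracle(tokens)       # assumption: a human or tool supplies the label
        model.update(tokens, label)  # incremental parameter revision
        queried.append((tokens, label))
    return queried


if __name__ == "__main__":
    # Toy data: the byte tokens and the labeling rule are made up for illustration.
    seed = [(("e8", "ff", "cc"), "malicious"), (("55", "8b", "ec"), "benign")]
    pool = [("e8", "ff", "90"), ("55", "8b", "90"), ("e8", "cc", "ff")]
    model = IncrementalNaiveBayes(["malicious", "benign"])
    active_learn(model, seed, pool,
                 oracle=lambda t: "malicious" if "e8" in t else "benign",
                 budget=2)
    print(model.predict(("e8", "ff")))
```

Because the model stores only counts, each queried sample revises the parameters without retraining from scratch, which is the incremental property the abstract credits for the efficiency gain.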
Computer security is the security of the system as a whole. As hacker intrusions grow more rampant by the day, people have found that building a security system purely in terms of static defence is not enough. Intrusion detection is a new generation of security technology that follows traditional protective measures such as "firewall" and "data encryption". It identifies and responds to hostile behavior against computer and network resources. It not only detects intrusions from outside but also monitors unauthorized activity by internal users, intercepting and responding before network resources are endangered. As a supplement to traditional computer security mechanisms, intrusion detection increases the depth of protection for networks and systems and has become the main direction of research and development for dynamic security tools.
    Traditionally, intrusion behaviors are divided by their features into anomaly and misuse, and an anomaly detection model and a misuse detection model are built for each. Some new detection methods have emerged in the past four or five years whose models perform well on both misuse detection and anomaly detection, such as artificial immune methods, genetic algorithms, and data mining. This kind of method is called hybrid detection. Before a decision is made, it analyzes the normal behavior of the system and the suspicious intrusion behavior at the same time, so its judgments are more comprehensive, accurate, and reliable. It usually detects intrusions against the background of the system's normal data flow, and so is also called "heuristic signature detection".
    Current intrusion detection products have the following shortcomings: they lack effectiveness, flexibility, and adaptivity. The effectiveness of an intrusion detection system can be characterized by two parameters, the false report rate and the detection rate. The false report rate can be divided into the false alarm rate and the miss rate. A good intrusion detection system should have a low false report rate and a high detection rate, and the key to a low false report rate is reducing the miss rate. Adaptivity means that the system can detect intrusions that differ from those already known, that is, unknown intrusions. At present, building an intrusion detection system is an enormous knowledge-engineering task. System builders rely on their intuition and experience to choose statistical measures for anomaly detection. Experts first analyze and classify attack scenarios and system weaknesses, then write by hand the response rules used to detect violations. Because this process is manual and depends on experts, current intrusion detection systems lack effectiveness, flexibility, and adaptivity. When an intrusion detection system handles massive amounts of data, data mining is well suited to extracting features from the large body of historical behavior data, and the system can adopt data mining when establishing its knowledge base of intrusion patterns. Inspired by data mining, Wenke Lee developed a hybrid detector based on RIPPER. It does not set up separate models for different intrusion behaviors. Instead, it first applies data mining to a large number of examples to learn what intrusion behavior is and what normal system behavior is, finds consistent usage patterns that describe system features, and then forms a detection model suited to both anomaly and misuse. This is a typical application of hybrid detection. The model is built with data mining methods and its data come from the primitive data source, so it has flexibility and adaptivity.
    
    
    Many of our security threats come from malicious executable code, especially unknown code. Such unknown malicious code cannot be found by the existing virus scanners th
