Research on Web Trojan Detection Technology Based on Bayesian Theory

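English title: Research on Web Trojan Bayesian-based Detection
Author: Li Wei (李炜)
Degree level: Master's
Discipline: Computer Technology
Chinese keywords: 网页木马 (web Trojan); 静态代码特征 (static code characteristics); 动态行为特征 (dynamic behavior characteristics); 贝叶斯分类 (Bayesian classification); 多项式事件模型 (multinomial event model)
English keywords: web Trojan; static code characteristics; dynamic behavior characteristics; Bayesian classification; multinomial event model
Degree year: 2011
Advisor: Han Lansheng (韩兰胜)
Discipline code: 081201
Degree-granting institution: Huazhong University of Science and Technology
Thesis submission date: 2011-05-01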

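Abstract

With the rapid development of the Internet, computer applications have penetrated every area of society. The Internet provides many services and makes life and work more convenient, but it has also made information security a major concern. Because browsers are so widely used, hackers exploit vulnerabilities in browsers and third-party software to spread web Trojans, gain system privileges, and destroy or steal user information, causing users serious losses.
     Web Trojans spread quickly and are easy to morph, so traditional signature-based detection has difficulty detecting them. Research on web Trojan detection methods is therefore necessary.
     Unlike a traditional Trojan, a web Trojan must run through a browser. Once the browser triggers the web Trojan program, it exploits vulnerabilities in the visitor's system or browser to automatically download a preconfigured Trojan server component to the visitor's computer and then execute it automatically, in order to destroy or steal information on the computer. This thesis therefore proposes, for the first time, using the multinomial event model from Bayesian theory to calculate a threat value for the program under test and to decide on that basis whether it is a web Trojan.
     The thesis uses the static code and dynamic behavior of web page programs as detection features and applies the concept of information gain to classify and filter them. The feature selection process gives particular weight to how many times each feature occurs. Building on Bayesian classification, the thesis uses a term-frequency-based multinomial event model to calculate threat values for the static code features and dynamic behavior features of an unknown web page program, compares each with its corresponding threshold, and thereby decides whether the program is a web Trojan.
     Finally, the experimental part takes the new detection model as its theoretical basis, designs a web Trojan detection system, and presents the algorithm design and implementation of part of that system. The experiments verify the feasibility of the detection model and offer a new approach to web Trojan detection.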
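To illustrate the classification step the abstract describes, here is a minimal Python sketch of a multinomial event model with a threshold test. It is not the thesis's implementation: the toy token lists, the Laplace smoothing constant `alpha`, the function names, and the definition of the threat value as a log-odds ratio between a `trojan` class and a `benign` class are all assumptions made for this example.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class.

    docs: list of token lists (for example script keywords or API call names).
    labels: one class name per document.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        counts = Counter(tok for d in class_docs for tok in d)
        total = sum(counts.values())
        log_prior[c] = math.log(len(class_docs) / len(docs))
        log_lik[c] = {
            t: math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
            for t in vocab
        }
    return log_prior, log_lik


def threat_score(doc, model, positive="trojan", negative="benign"):
    """Log-odds of the positive class; each token counts once per occurrence."""
    log_prior, log_lik = model
    score = log_prior[positive] - log_prior[negative]
    for tok, n in Counter(doc).items():
        if tok in log_lik[positive]:  # tokens unseen in training are ignored
            score += n * (log_lik[positive][tok] - log_lik[negative][tok])
    return score


# Hypothetical training data: tokens extracted from four web pages.
docs = [
    ["eval", "unescape", "iframe", "eval"],
    ["iframe", "unescape"],
    ["div", "span", "div"],
    ["span", "a"],
]
labels = ["trojan", "trojan", "benign", "benign"]
model = train_multinomial_nb(docs, labels)

THRESHOLD = 0.0  # assumed cut-off; a real system would tune this on held-out data
is_trojan = threat_score(["eval", "iframe"], model) > THRESHOLD
```

Because the model multiplies each log-likelihood ratio by the token count, a page that repeats a suspicious token scores higher than one that contains it once, which is the property that distinguishes the multinomial event model from a presence-only model. Separate models and thresholds could be trained for static code tokens and for dynamic behavior tokens.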
With the rapid development of the Internet, computer applications have penetrated every area of society. The Internet provides many services and brings convenience to our life and work, but it has also made information security an important issue. Because browsers are so widely used, hackers exploit vulnerabilities in browsers and third-party software to deliver web Trojans, obtain system privileges, and destroy or steal user information, causing users great losses.
     Web Trojans spread fast and change form easily, so traditional signature-based virus detection techniques have difficulty detecting them. A new detection method is therefore needed.
     A web Trojan differs from a traditional Trojan in that it must run through the browser. When the browser triggers the malicious web page, the web Trojan exploits vulnerabilities to download the Trojan program and execute it, in order to destroy or steal computer information. Therefore, this thesis first extracts static code characteristics and dynamic behavior characteristics as features and then calculates a threat value using the multinomial event model based on Bayesian theory. The threat value determines whether the web page program is a Trojan.
     The method takes static code characteristics and dynamic behavior characteristics as detection features, proposes a detection principle, uses the concept of information gain to filter the features, describes the extraction methods for static code features and API call sequence features, and then details the API interception technology. The thesis focuses on Bayesian classification and the multinomial event model, using the model to determine whether an unknown web program is a web Trojan.
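The feature filtering step can be sketched in Python as follows. This shows the textbook presence-based form of information gain, not the thesis's own variant, which also takes occurrence counts into account in a way the abstract does not specify. The function names and the toy token lists are assumptions for the example.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * math.log2(p)
    return result


def information_gain(docs, labels, feature):
    """Reduction in label entropy from knowing whether `feature` is present."""
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k features with the highest gain (ties broken by name)."""
    vocab = {tok for d in docs for tok in d}
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical samples: tokens extracted from four web pages.
docs = [
    ["eval", "unescape", "iframe", "eval"],
    ["iframe", "unescape"],
    ["div", "span", "div"],
    ["span", "a"],
]
labels = ["trojan", "trojan", "benign", "benign"]
top_features = select_features(docs, labels, 3)
```

In this toy data, a token that appears in every page of one class and in no page of the other separates the classes perfectly and receives the maximum gain of one bit, while a token that appears in only one page receives a lower gain.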

