Research on a Fuzzy Classification Algorithm for Behavior-Analysis-Based Anti-Trojan Detection
Abstract
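Information security has drawn growing attention as networks develop rapidly and come into wide use. The Trojan horse is a major information security problem: it is linked to the great majority of network-related political and economic cases, accounts for a large share of them in both number and severity, and that share is rising. No effective method has yet been found to fully contain the continued growth of the damage Trojans cause. Anti-Trojan research has therefore long been a focus of the information security field.
     Most current Trojan detection methods belong to the signature-based detection family. Behavior analysis can detect Trojans, viruses, and other illegitimate programs whose signatures are unknown, and it offers proactive defense, so it has become the most important technique in anti-Trojan and anti-virus research. After analyzing existing behavior-analysis-based anti-Trojan strategies, this thesis finds that most of them suffer from excessive false-positive and false-negative rates, low efficiency in use, and interactive designs that do not fit how users work. To address these shortcomings, the thesis studies the theoretical framework of anti-Trojan algorithms and analyzes the behavioral characteristics of Trojans, and on that basis establishes a set of design criteria that gives a solid basis for constructing and applying such algorithms. It then proposes and designs a fuzzy-model classification algorithm for behavior-analysis-based anti-Trojan detection, and finally demonstrates the algorithm's effectiveness experimentally. The main work is as follows: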
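     1. The core problem facing behavior-analysis-based anti-Trojan technology is identified. The thesis analyzes in depth how Trojan damage arises and describes the structure of the Trojan industry chain. It discusses mainstream Trojan detection and removal techniques and their evaluation criteria, and examines existing anti-Trojan algorithms in detail. It concludes that the immaturity of the classification algorithms used for behavior-based judgment is the core problem facing the technique.
     2. Design criteria for anti-Trojan algorithms are established, and the theoretical upper bound on detection accuracy is stated. Drawing on Dr. Frederick Cohen's theory that malicious code cannot be decided exactly, the thesis points out that under the von Neumann architecture malicious code cannot be identified with 100% accuracy in polynomial time, so anti-Trojan algorithms can in theory reach at most a local 100% accuracy bound. Having summarized the shortcomings of current anti-Trojan classification algorithms, and taking into account the characteristics of behavior analysis and of anti-Trojan work, we give three principles as the design criteria for behavior-analysis-based anti-Trojan algorithms: the algorithm should effectively mitigate the loss of efficiency caused by a growing number of feature attributes; the algorithm should adaptively converge locally to a useful accuracy within polynomial time; and the algorithm should extract feature attributes automatically.
     3. A new anti-Trojan algorithm based on a multi-layer fuzzy classification system is proposed. Fuzzy classification is a recognition method for handling patterns that are fuzzy in nature; it offers strong probabilistic reasoning ability, clear semantics, and ease of understanding. The behavioral characteristics of Trojans and legitimate programs exhibit exactly this kind of fuzziness. Based on this fuzziness and on the three design criteria, the thesis presents a new anti-Trojan algorithm based on a multi-layer fuzzy classification system. It trains its rules by adaptively adjusting each rule's confidence strength according to whether the rule's initial classification was correct, and ultimately achieves locally high-accuracy classification of Trojans.
     4. The proposed algorithm is validated experimentally. Cross-validation is used, for the first time, to address the long-standing problems of small data volume and of the authority of the test sample library in Trojan experimental data sets, which makes the test results more reliable. For the experiments, 200 legitimate programs were collected, the technical details of 200 Trojans were reviewed and studied, and 7 common behavioral features were extracted. Analysis of the results shows that the fuzzy classification algorithm reaches local 100% classification accuracy within a practical training time and achieves high-accuracy detection in the testing phase. A comparison with a Bayesian anti-Trojan classification algorithm under the same experimental conditions shows that our algorithm exceeds the Bayesian algorithm in both average and best accuracy.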
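The rule-training idea in item 3 (raise or lower a fuzzy rule's confidence depending on whether it classified a training sample correctly) can be illustrated with a minimal single-layer sketch in Python. This is not the thesis's multi-layer algorithm, which the abstract does not specify. The class name `FuzzyRuleClassifier`, the two fuzzy sets "low" and "high" per feature, the learning rates `eta_reward` and `eta_punish`, and the assumption that each behavioral feature is scaled to [0, 1] are all hypothetical choices made for illustration.

```python
from itertools import product


def memberships(x):
    """Membership degrees of x (clamped to [0, 1]) in the fuzzy sets 'low' and 'high'."""
    x = min(max(x, 0.0), 1.0)
    return (1.0 - x, x)


class FuzzyRuleClassifier:
    """Single-layer fuzzy rule base with reward/punish confidence updates (illustrative)."""

    def __init__(self, n_features, eta_reward=0.1, eta_punish=0.3):
        # One rule per combination of low (0) / high (1) labels across the features.
        self.antecedents = list(product((0, 1), repeat=n_features))
        self.labels = [0] * len(self.antecedents)   # consequent class of each rule
        self.conf = [0.0] * len(self.antecedents)   # confidence of each rule
        self.eta_reward = eta_reward
        self.eta_punish = eta_punish

    def _strength(self, rule, x):
        # Firing strength: product of the memberships named in the rule's antecedent.
        s = 1.0
        for label, value in zip(self.antecedents[rule], x):
            s *= memberships(value)[label]
        return s

    def init_rules(self, X, y):
        # Each rule takes the class with the larger total firing strength;
        # its initial confidence is the normalized margin between the two classes.
        for r in range(len(self.antecedents)):
            sums = [0.0, 0.0]
            for x, label in zip(X, y):
                sums[label] += self._strength(r, x)
            total = sums[0] + sums[1]
            self.labels[r] = 0 if sums[0] >= sums[1] else 1
            self.conf[r] = abs(sums[0] - sums[1]) / total if total > 0 else 0.0

    def _winner(self, x):
        # Single-winner inference: the rule with the largest strength * confidence.
        return max(range(len(self.antecedents)),
                   key=lambda r: self._strength(r, x) * self.conf[r])

    def predict(self, x):
        return self.labels[self._winner(x)]

    def fit(self, X, y, epochs=20):
        self.init_rules(X, y)
        for _ in range(epochs):
            errors = 0
            for x, label in zip(X, y):
                r = self._winner(x)
                if self.labels[r] == label:
                    # Correct: move the winning rule's confidence toward 1.
                    self.conf[r] += self.eta_reward * (1.0 - self.conf[r])
                else:
                    # Wrong: shrink the winning rule's confidence toward 0.
                    self.conf[r] -= self.eta_punish * self.conf[r]
                    errors += 1
            if errors == 0:   # every training sample is classified correctly
                break
        return self
```

With labels 1 for Trojan and 0 for legitimate program, `FuzzyRuleClassifier(n_features=7).fit(X, y)` would train on 7-feature samples. Note that this flat rule base grows as 2^n with the number of features, which is the kind of efficiency loss that the first design criterion asks an algorithm to mitigate.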
The rapid development and wide application of networks are raising growing concern about information security. Trojan horses are routinely associated with information crimes, whether economic, political, or both, and the significance of the problem shows in the frequency and severity of related cases. No practical solution has yet been found, and the problem has attracted concentrated effort in information security research.
     Current anti-Trojan strategies are almost all signature-based. Behavior analysis, which can detect Trojans with unknown signatures, is a proactive defense technique. Its potential to meet the future needs of information security has made it a hotspot in anti-Trojan studies. Current behavior-analysis-based anti-Trojan strategies have the following problems: high false-positive and false-negative rates, poor efficiency, and poor user-interface design. We conclude that the core problem lies in the immature classification algorithm model used to analyze and judge behaviors. Most previous studies have used existing classification algorithms that were not specifically designed for anti-Trojan work, which may cause problems. This thesis works on the design of an anti-Trojan-oriented algorithm based on behavior analysis. Our work is as follows:
     Firstly, we identify the core problem of behavior-analysis-based anti-Trojan detection. We analyze the process by which Trojans cause harm, then their classes and behavioral characteristics, and describe the industry chain built around Trojans. We then discuss the main anti-Trojan techniques and evaluation criteria. We introduce behavior analysis and point out its advantages over these peer techniques. Existing behavior-analysis-based anti-Trojan examples are examined in order to find the core problem, and we point out that the key is the immature classification algorithm model used to analyze and judge behaviors.
     Secondly, we establish design standards for anti-Trojan algorithms and point out the upper limit of detection precision. We begin with the theory that malicious code in a von Neumann system cannot be precisely identified within polynomial computation time, so theoretically there is an upper limit on the precision of detection. We then state three principles of algorithm design. First, the algorithm should automatically extract features. Second, the algorithm should limit the loss of efficiency as the number of features increases. Third, the algorithm should self-adaptively converge to a certain precision within polynomial computation time.
     Thirdly, we propose a Trojan horse detection algorithm based on behavior analysis. Fuzzy classification is a method for dealing with fuzzy patterns, that is, patterns whose domains are fuzzy even though the patterns themselves are clear. The behavioral features of both Trojans and legitimate code belong to such fuzzy patterns. The proposed algorithm builds on this fuzziness and on the three principles above. It adaptively tunes each rule's confidence value according to whether the rule's initial classification was right or wrong, in order to train the rules and finally obtain a powerful classifier for anti-Trojan detection.
     Fourthly, we validate the algorithm experimentally. To ensure the authority of the experiment and to address the limited number of samples, we use cross-validation to test the algorithm. 200 legitimate programs and 200 Trojan descriptions from Symantec Corp. were analyzed, from which 7 behavioral features were extracted for the experiment. Our results show 100% local accuracy after a practical amount of training, and high-precision classification in the testing phase. We also compared our algorithm with a Bayesian classification algorithm. Under equal conditions, our algorithm yielded better results in terms of both average and best accuracy.
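The cross-validation protocol mentioned above can be sketched as follows, in Python with only the standard library. The abstract does not state the number of folds or how the samples were split, so `k`, the shuffling seed, and the helper names `k_fold_indices`, `cross_validate`, and `train_mean_threshold` are assumptions made for illustration. The threshold classifier is a toy stand-in, not the thesis's fuzzy or Bayesian classifier.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, train, k=10, seed=0):
    """Return the test accuracy of each fold.

    `train(X_train, y_train)` must return a function mapping one sample to a label.
    """
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i, then test on fold i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        predict = train([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(X[j]) == y[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


def train_mean_threshold(X_train, y_train):
    """Toy stand-in classifier: threshold on a sample's mean feature value.

    Assumes both classes (0 and 1) are present in the training split.
    """
    def mean(values):
        return sum(values) / len(values)

    pos = [mean(x) for x, label in zip(X_train, y_train) if label == 1]
    neg = [mean(x) for x, label in zip(X_train, y_train) if label == 0]
    threshold = (mean(pos) + mean(neg)) / 2.0
    return lambda x: 1 if mean(x) > threshold else 0
```

Because every sample is used for testing exactly once and for training in the remaining rounds, a small set such as 400 samples yields k accuracy figures. Their mean and maximum correspond to the "average" and "best" accuracy that the comparison with the Bayesian classifier refers to.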
