Machine Learning-Based Efficient Malware Classification System

Original title: 基于机器学习的高效恶意软件分类系统
Authors: QU Wei (屈巍); SHI Xiao (侍啸); LI Dongbao (李东宝)
Affiliation: Software College, Shenyang Normal University (沈阳师范大学科信软件学院)
Keywords: malware; machine learning; feature selection; classification system
Journal code: SYSX
Journal: Journal of Shenyang Normal University (Natural Science Edition)
Publication date: 2018-12-15
Publisher: Journal of Shenyang Normal University (Natural Science Edition)
Year: 2018
Issue: v.36; No.124
Funding: Natural Science Foundation of the Liaoning Provincial Department of Science and Technology (20180550536)
Language: Chinese
Pages: SYSX201806013
Page count: 6
CN: 06
ISSN: 21-1534/N
Classification number: 73-78

Abstract

Modern malware commonly uses metamorphic and polymorphic techniques to evade detection, which has caused the number of malware variants to grow sharply. A major challenge for anti-malware work today is the sheer volume of data and files with potentially malicious intent that must be evaluated: researchers receive large numbers of new malware samples every day, classify them, and then extract family-based features. Traditional manual classification can no longer cope with this influx of new samples, in terms of either time cost or accuracy, and manual analysis rarely uncovers the information hidden in the samples. An efficient, automated malware classification system is therefore of great value to anti-malware research. To address these problems, the paper proposes a high-performance, high-efficiency automatic classification system based on machine learning that fuses multiple selected features. In experiments, the system achieved a low log loss of 0.0064 with an average feature extraction time of only about 6.5 s, a substantial improvement in both performance and efficiency over single-feature systems.
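The abstract reports log loss without defining it. As a rough illustration, assuming the standard multiclass logarithmic loss (the mean of -log p over samples, where p is the probability the classifier assigns to each sample's true family), the metric could be computed as sketched below. The function name, the clipping constant, and the sample predictions are hypothetical and are not taken from the paper.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class indices (one per sample).
    probs:  list of per-sample probability rows, one entry per class.
    eps:    clipping constant (an assumption, not from the paper).
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip so that a predicted probability of exactly 0 does not give log(0).
        p = min(max(row[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Hypothetical predictions for three samples over three malware families.
y_true = [0, 2, 1]
probs = [
    [0.98, 0.01, 0.01],
    [0.02, 0.03, 0.95],
    [0.05, 0.90, 0.05],
]
loss = multiclass_log_loss(y_true, probs)
print(f"log loss: {loss:.4f}")
```

Under this definition a lower value is better, and a value as small as the reported 0.0064 would mean the classifier assigns, on average, a probability very close to 1 to the correct family.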

