Abstract
With the development and spread of computer technology, the damage caused by computer viruses has become increasingly serious. The traditional N-Gram algorithm cannot extract features of different lengths, so effective features are missed and a huge feature set is generated, wasting storage space. To address these problems, an improved N-Gram-based method for automatic virus signature extraction is proposed. Building on the original N-Gram feature extraction algorithm, the method introduces variable-length N-Gram features, extracts effective features of different lengths, and generates variable-length virus signatures. Taking the correlation of feature frequencies into account, it uses feature concentration to perform directed screening of the N-Gram features and builds a data dictionary, which saves storage space. Experimental results show that, compared with an algorithm that uses only fixed-length N-Gram features, the proposed method effectively reduces the false positive rate of automatic signature extraction.
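As a rough illustration of how variable-length N-Gram extraction, concentration-based screening and a data dictionary might fit together, here is a minimal Python sketch. It is not the authors' algorithm. The concentration formula (share of virus samples containing an n-gram minus share of benign samples containing it), the length range, the threshold, the rule that drops a shorter n-gram when an already selected longer n-gram contains it with an equal or higher score, and the function names `ngrams_of`, `document_frequency` and `build_dictionary` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def ngrams_of(data: bytes, n_min: int, n_max: int) -> Set[bytes]:
    """Distinct byte n-grams of every length n_min..n_max in one sample."""
    grams: Set[bytes] = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(data) - n + 1):
            grams.add(data[i:i + n])
    return grams


def document_frequency(samples: Iterable[bytes], n_min: int, n_max: int) -> Counter:
    """Number of samples in which each n-gram occurs at least once."""
    df: Counter = Counter()
    for sample in samples:
        df.update(ngrams_of(sample, n_min, n_max))
    return df


def build_dictionary(virus: List[bytes], benign: List[bytes],
                     n_min: int = 2, n_max: int = 8,
                     threshold: float = 0.5) -> Dict[bytes, int]:
    """Hypothetical pipeline: extract, screen by concentration, build dictionary."""
    virus_df = document_frequency(virus, n_min, n_max)
    benign_df = document_frequency(benign, n_min, n_max)

    # ASSUMED "concentration": share of virus samples containing the n-gram
    # minus share of benign samples containing it (range -1..1).
    def concentration(gram: bytes) -> float:
        in_virus = virus_df[gram] / len(virus)
        in_benign = benign_df[gram] / len(benign) if benign else 0.0
        return in_virus - in_benign

    scores = {g: concentration(g) for g in virus_df}
    kept = [g for g, c in scores.items() if c >= threshold]

    # ASSUMED variable-length rule: visit longer n-grams first and skip a
    # shorter one if an already selected n-gram contains it with an equal or
    # higher score, so the longest sufficiently specific n-gram survives.
    kept.sort(key=len, reverse=True)
    selected: List[bytes] = []
    for g in kept:
        if not any(g in s and scores[s] >= scores[g] for s in selected):
            selected.append(g)

    # Data dictionary: each selected signature gets a compact integer id.
    return {g: i for i, g in enumerate(sorted(selected))}


if __name__ == "__main__":
    virus = [b"\x90\x90EVIL\x00", b"xxEVILyy"]
    benign = [b"hello world", b"EV"]
    print(build_dictionary(virus, benign, n_min=2, n_max=4, threshold=1.0))
    # → {b'EVIL': 0}
```

In the usage example, the fixed-length 2-grams `EV`, `VI` and `IL` would all be candidate features, but the sketch keeps only the single 4-byte signature `EVIL`, which covers them. This mirrors the claimed benefit of variable-length features: fewer, longer and more specific entries in the dictionary.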