Fast malicious domain name detection algorithm based on lexical features

Original title (Chinese): 基于词法特征的恶意域名快速检测算法
Authors: ZHAO Hong (赵宏); CHANG Zhaobin (常兆斌); WANG Le (王乐)
Author affiliation: School of Computer and Communication, Lanzhou University of Technology
Keywords: malicious domain name; lexical feature; detection algorithm; edit distance; real-time performance
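Journal code: JSJY
Journal: Journal of Computer Applications
Affiliation: School of Computer and Communication, Lanzhou University of Technology (兰州理工大学计算机与通信学院)
Publication date: 2018-11-05 13:02
Publisher: Journal of Computer Applications (计算机应用)
Year: 2019
Volume/issue: v.39; No.341
Funding: National Natural Science Foundation of China (51668043); CERNET Next Generation Internet Technology Innovation Project (NG1120160311, NG1120160112)
Language: Chinese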
File name: JSJY201901040
Page count: 5
Issue: 01
CN: 51-1307/TP
Pages: 233-237

Abstract

Malicious domain name attacks occur frequently on the Internet, and existing domain name detection methods lack real-time performance. To address this, a fast malicious domain name detection algorithm based on lexical features is proposed. Drawing on the characteristics of malicious domain names, the algorithm first normalizes all domain names under test by length and assigns each a weight. A clustering algorithm then divides the domain names into groups, and an improved heap sort algorithm ranks the groups by the sum of the weights within each group. In descending order of priority, the edit distance between every domain name in each group and the domain names on a blacklist is computed. Finally, malicious domain names are identified quickly from the edit distance values. Experimental results show that, compared with a detection algorithm that uses only domain name semantics and one that uses only domain name lexical features, the proposed algorithm improves accuracy by 1.7% and 2.5% respectively and detection speed by 13.9% and 6.8% respectively, giving higher accuracy and better real-time performance.
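
The pipeline summarized in the abstract (weight by length, group, rank groups by total weight, then compare against a blacklist by edit distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The weighting formula, the length-bucket grouping used in place of the clustering step, the standard-library `heapq` used in place of the improved heap sort, and the `threshold` of 2 edits are all assumptions made for illustration; every function name here is hypothetical.

```python
import heapq
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def length_weight(domain: str, max_len: int) -> float:
    """Assumed weighting: domain length scaled into (0, 1]."""
    return len(domain) / max_len


def group_by_length(domains, bucket=5):
    """Assumed stand-in for the clustering step: bucket domains by length."""
    groups = defaultdict(list)
    for d in domains:
        groups[len(d) // bucket].append(d)
    return list(groups.values())


def detect(domains, blacklist, threshold=2):
    """Return domains within `threshold` edits of any blacklisted name.

    Groups are visited in descending order of their total weight.
    """
    if not domains:
        return []
    max_len = max(len(d) for d in domains)
    heap = []
    for idx, group in enumerate(group_by_length(domains)):
        total = sum(length_weight(d, max_len) for d in group)
        # heapq is a min-heap, so the total is negated to pop the largest first;
        # idx breaks ties so the group lists themselves are never compared.
        heapq.heappush(heap, (-total, idx, group))
    flagged = []
    while heap:
        _, _, group = heapq.heappop(heap)
        for d in group:
            if any(edit_distance(d, b) <= threshold for b in blacklist):
                flagged.append(d)
    return flagged
```

With these assumptions, `detect(["bad.io", "evil-1ogin.net"], ["bad.io", "evil-login.net"])` returns `["evil-1ogin.net", "bad.io"]`: both names fall within the threshold, and the group holding the longer name has the larger total weight, so it is checked first.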
