A Malicious Domain Detection Method Based on Co-Occurrence Relations Between DNS Queries

Original title: 一种基于域名请求伴随关系的恶意域名检测方法
English title: Detecting Malicious Domains Using Co-Occurrence Relation Between DNS Query
Authors: Peng Chengwei (彭成维); Yun Xiaochun (云晓春); Zhang Yongzheng (张永铮); Li Shuhao (李书豪)
Keywords (translated from the Chinese): DNS queries; query co-occurrence; malicious domains; time-series segmentation; vectorized representation; domain classification
English keywords: DNS queries; co-occurrence; malicious domains; DNS cut; tensor representation; domain classification
Journal code: JFYZ
Journal: Journal of Computer Research and Development
Affiliations: Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Institute of Information Engineering, Chinese Academy of Sciences
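Publication date: 2019-06-15
Publisher: Journal of Computer Research and Development (计算机研究与发展)
Year: 2019
Volume: v.56
Funding: National Key Research and Development Program of China (2016YFB0801502); National Natural Science Foundation of China (U1736218)
Language: Chinese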
File name: JFYZ201906014
Page count: 12
Issue: 06
CN: 11-1777/TP
Pages: 133-144

Abstract

Malicious domains play a vital role in illicit online activities, and detecting them effectively can significantly reduce the economic losses those attacks cause. This paper proposes CoDetector, a technique that detects malicious domains by mining the latent spatio-temporal co-occurrence relationships among domains in DNS (Domain Name System) queries. We observe that DNS queries are not independent of one another but co-occur. CoDetector builds on the intuition that domains which tend to co-occur in DNS traffic are strongly associated and are likely to share the same property, i.e., to be all benign or all malicious. The method has three steps. 1) It performs coarse-grained clustering of DNS traffic based on the chronological order of the queries, so that domains co-occurring with each other fall into the same cluster. 2) It uses embedding learning to build a mapping function that projects every domain into a low-dimensional feature vector while preserving the co-occurrence relationships: domains that co-occur are mapped to similar vectors, and domains that do not co-occur are mapped to distant vectors. 3) Using the learned feature representations and a labeled dataset, it trains a classifier that is then applied to detect previously unknown malicious domains. We evaluate CoDetector on real-world DNS traffic collected from an enterprise network over two months. The experimental results show that CoDetector detects malicious domains effectively, with 91.64% precision and 96.04% recall.
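The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several simplifying assumptions: the coarse-grained clustering is approximated by cutting one host's query stream wherever the time gap exceeds a hypothetical `max_gap` threshold; raw co-occurrence count vectors stand in for the learned low-dimensional embedding; and a 1-nearest-neighbor rule over cosine similarity stands in for the trained classifier. All function names and the `.example` domains are made up for illustration.

```python
from collections import defaultdict
from math import sqrt


def cut_sessions(queries, max_gap=5.0):
    """Step 1 (assumed form): split time-ordered (timestamp, domain) queries
    into sessions wherever consecutive queries are more than max_gap apart."""
    sessions, current, last_ts = [], [], None
    for ts, domain in sorted(queries):
        if last_ts is not None and ts - last_ts > max_gap:
            sessions.append(current)
            current = []
        current.append(domain)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions


def cooccurrence_vectors(sessions):
    """Step 2 (stand-in for embedding learning): represent each domain by how
    often it shares a session with every domain, itself included."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        unique = set(session)
        for a in unique:
            for b in unique:  # b == a keeps a self-count
                counts[a][b] += 1
    return {domain: dict(row) for domain, row in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0) for key, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def classify(domain, vectors, labels):
    """Step 3 (stand-in for the trained classifier): copy the label of the
    most similar labeled domain; None if no labeled domain is similar."""
    query = vectors.get(domain, {})
    best_label, best_sim = None, 0.0
    for known, label in labels.items():
        sim = cosine(query, vectors.get(known, {}))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


# Hypothetical query log of a single host: (seconds, domain).
demo_queries = [
    (0.0, "evil-a.example"), (1.0, "evil-b.example"), (2.0, "evil-c.example"),
    (60.0, "news.example"), (61.0, "cdn.example"), (62.0, "img.example"),
    (120.0, "evil-a.example"), (121.0, "evil-c.example"),
    (200.0, "news.example"), (201.0, "img.example"),
]
demo_labels = {"evil-a.example": "malicious", "news.example": "benign"}
demo_vectors = cooccurrence_vectors(cut_sessions(demo_queries))

for name in ("evil-b.example", "cdn.example"):
    print(name, classify(name, demo_vectors, demo_labels))
```

In this toy log, the unlabeled domains inherit the label of the labeled domain they co-occur with, which is the intuition the abstract states. A learned embedding would replace the count vectors so that similarity is captured in far fewer dimensions.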

References

[1]Plohmann D,Yakdan K,Klatt M,et al.A comprehensive measurement study of domain generating malware[C]//Proc of USENIX Security Symp.Berkeley,CA:USENIX Association,2016:263-278
[2]Antonakakis M,Perdisci R,Nadji Y,et al.From throwaway traffic to bots:Detecting the rise of DGA-based malware[C]//Proc of the 21st USENIX Security Symp.Berkeley,CA:USENIX Association,2012:491-506
[3]Szurdi J,Kocso B,Cseh G,et al.The long “taile” of typosquatting domain names[C]//Proc of the 23rd USENIX Security Symp.Berkeley,CA:USENIX Association,2014:191-206
[4]Cisco.Cisco 2016 annual security report[R/OL].2016[2018-06-28].http://www.cisco.com/c/m/en_us/offers/sc04/2016-annual-security-report/index.html
[5]Antonakakis M,Perdisci R,Dagon D,et al.Building a dynamic reputation system for DNS[C]//Proc of the 19th USENIX Security Symp.Berkeley,CA:USENIX Association,2010:273-290
[6]Bilge L,Sen S,Balzarotti D,et al.Exposure:A passive DNS analysis service to detect and report malicious domains[J].ACM Transactions on Information and System Security,2014,16(4):14
[7]Antonakakis M,Perdisci R,Lee W,et al.Detecting malware domains at the upper DNS hierarchy[C]//Proc of the 20th USENIX Security Symp.Berkeley,CA:USENIX Association,2011(11):1-16
[8]Mikolov T,Sutskever I,Chen Kai,et al.Distributed representations of words and phrases and their compositionality[C]//Proc of Advances in Neural Information Processing Systems.Cambridge,MA:MIT Press,2013:3111-3119
[9]Bojanowski P,Grave E,Joulin A,et al.Enriching word vectors with subword information[J].arXiv preprint arXiv:1607.04606,2016
[10]Khalil I,Yu Ting,Guan Bei.Discovering malicious domains through passive DNS data graph analysis[C]//Proc of the 11th ACM on Asia Conf on Computer and Communications Security.New York:ACM,2016:663-674
[11]Peng Chengwei,Yun Xiaochun,Zhang Yongzheng,et al.Discovering malicious domains through alias-canonical graph[C]//Proc of the 16th IEEE Int Conf on Trust,Security and Privacy in Computing and Communications.Piscataway,NJ:IEEE,2017:225-232
[12]Manadhata P K,Yadav S,Rao P,et al.Detecting malicious domains via graph inference[G]//LNCS 8712:Proc of European Symp on Research in Computer Security.Berlin:Springer,2014:1-18
[13]Rahbarinia B,Perdisci R,Antonakakis M.Segugio:Efficient behavior-based tracking of malware-control domains in large ISP networks[C]//Proc of the 45th Annual IEEE/IFIP Int Conf on Dependable Systems and Networks(DSN).Piscataway,NJ:IEEE,2015:403-414
[14]Wang Xiaoqi,Li Qiang,Yan Guanghua,et al.Detection of covert and suspicious DNS behavior in advanced persistent threats[J].Journal of Computer Research and Development,2017,54(10):2334-2343(in Chinese)(王晓琪,李强,闫广华,等.高级持续性威胁中的隐蔽可疑DNS行为的检测[J].计算机研究与发展,2017,54(10):2334-2343)
[15]Gao Hongyu,Yegneswaran V,Jiang Jian,et al.Reexamining DNS from a global recursive resolver perspective[J].IEEE/ACM Transactions on Networking,2016,24(1):43-57
[16]Kiefer J,Wolfowitz J.Stochastic estimation of the maximum of a regression function[J].The Annals of Mathematical Statistics,1952,23(3):462-466
[17]DNS-BH.Malware domain blocklist by risk-analytics[EB/OL].[2017-01-03].http://www.malwaredomains.com
[18]OpenDNS.Phishtank[EB/OL].[2017-01-03].http://www.phishtank.com
[19]OpenPhish.Timely,accurate,relevant threat intelligence[EB/OL].[2017-01-03].https://openphish.com
[20]AbuseList.Ransomware tracker[EB/OL].[2017-01-03].https://ransomwaretracker.abuse.ch
[21]Porras P A,Saïdi H,Yegneswaran V.A foray into conficker's logic and rendezvous points[EB/OL].[2017-01-03].https://www.usenix.org/legacy/event/leet09/tech/full_papers/porras/porras_html/
[22]Google.Google safe browsing[EB/OL].[2017-10-14].https://www.google.com/transparencyreport/safebrowsing/
[23]Alexa.Alexa top 1 million[EB/OL].[2017-03-05].http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
[24]Pedregosa F,Varoquaux G,Gramfort A,et al.Scikit-learn:Machine learning in python[J].Journal of Machine Learning Research,2011(12):2825-2830
[25]Chen Tianqi,Guestrin C.XGBoost:A scalable tree boosting system[C]//Proc of the 22nd ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining.New York:ACM,2016:785-794
