Identification Method of Encrypted Traffic Based on Support Vector Machine
  • English title: Identification method of encrypted traffic based on support vector machine
  • Authors: Cheng Guang; Chen Yuxiang
  • Keywords: encrypted traffic identification; relative entropy; Monte Carlo simulation; support vector machine
  • Journal: Journal of Southeast University (Natural Science Edition) (journal code: DNDX)
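  • Affiliations: School of Computer Science and Engineering, Southeast University; Key Laboratory of Computer Network and Information Integration of Ministry of Education, Southeast University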
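  • Publication date: 2017-07-20
  • Year: 2017
  • Volume: 47
  • Issue: 04
  • Pages: 28-32 (5 pages)
  • CN: 32-1178/N
  • Record ID: DNDX201704005
  • Funding: National High Technology Research and Development Program of China (863 Program) (2015AA015603); National Natural Science Foundation of China (61602114); ZTE Research Fund; Collaborative Innovation Center of Novel Software Technology and Industrialization
  • Language: Chinese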
Abstract
Existing encrypted-traffic identification methods have difficulty distinguishing encrypted traffic from non-encrypted compressed-file traffic. An analysis of encrypted, txt, doc, jpg and compressed-file traffic on the Internet shows that methods based on information entropy effectively separate low-entropy flows from high-entropy flows. However, such methods cannot recognize non-encrypted compressed-file traffic, in which each byte appears random while the flow as a whole is only pseudo-random. The relative-entropy feature vector $\{h_0, h_1, h_2, h_3\}$ is therefore used to separate low-entropy from high-entropy flows, and the error $p_{\text{error}}$ of a Monte Carlo estimate of $\pi$ is used to distinguish locally random traffic from globally random traffic. On this basis, a support vector machine (SVM)-based method for identifying encrypted and non-encrypted traffic, SVM-ID, is proposed; its input is the feature subspace $\mathrm{SVM} = \{h_0, h_1, h_2, h_3, p_{\text{error}}\}$. In comparative experiments against the relative-entropy method, SVM-ID not only identifies encrypted traffic well but also distinguishes encrypted traffic from non-encrypted compressed-file traffic.
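The abstract names the two feature families but does not define them precisely, so the following is only a minimal sketch of how a five-dimensional SVM-ID-style input could be computed from a flow's payload bytes. Two assumptions are ours, not the paper's: (1) $h_0, \dots, h_3$ are taken to be normalized Shannon entropies ($H / H_{\max}$) of four equal, consecutive slices of the payload; (2) the Monte Carlo $\pi$ estimate follows the convention of the Fourmilab `ent` utility, where each 6-byte group gives two 24-bit coordinates and the share of points inside a quarter circle approximates $\pi/4$. All function names (`normalized_entropy`, `entropy_features`, `monte_carlo_pi_error`, `svm_id_features`) are hypothetical.

```python
import math
from collections import Counter
from typing import List


def normalized_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram divided by its attainable maximum,
    log2(min(len(data), 256)), so the result lies in [0, 1]."""
    n = len(data)
    if n < 2:
        return 0.0
    counts = Counter(data)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(min(n, 256))


def entropy_features(payload: bytes, parts: int = 4) -> List[float]:
    """HYPOTHETICAL reading of {h_0, h_1, h_2, h_3}: normalized entropy of
    `parts` equal, consecutive slices of the payload (trailing bytes dropped)."""
    size = len(payload) // parts
    return [normalized_entropy(payload[i * size:(i + 1) * size]) for i in range(parts)]


def monte_carlo_pi_error(data: bytes) -> float:
    """Relative error |pi_hat - pi| / pi of a Monte Carlo estimate of pi.

    Assumed `ent`-style construction: every 6-byte group yields two 24-bit
    big-endian coordinates (x, y); the share of points inside the quarter
    circle of radius 2**24 - 1 approximates pi / 4."""
    radius = 256 ** 3 - 1
    inside = total = 0
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big")
        y = int.from_bytes(data[i + 3:i + 6], "big")
        total += 1
        if x * x + y * y <= radius * radius:
            inside += 1
    if total == 0:
        raise ValueError("need at least 6 bytes to form one point")
    pi_hat = 4.0 * inside / total
    return abs(pi_hat - math.pi) / math.pi


def svm_id_features(payload: bytes) -> List[float]:
    """Hypothetical 5-D input {h_0, h_1, h_2, h_3, p_error} for an SVM classifier."""
    return entropy_features(payload) + [monte_carlo_pi_error(payload)]
```

For uniformly random bytes the four entropy values approach 1 and `p_error` tends toward 0, while plain text such as `b"hello world " * 1000` yields entropies near 0.36. The abstract's intuition is that flows whose bytes look random only locally will show a larger `p_error` even when their entropies are high. The classification step itself is not shown; a standard SVM implementation (for example scikit-learn's `sklearn.svm.SVC`) trained on labelled flows could consume these vectors, but the kernel and parameters used in the paper are not given here.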
