SAE-based Encrypted Traffic Identification Method

English title: SAE-based Encrypted Traffic Identification Method
Authors: WANG Pan (王攀); CHEN Xuejiao (陈雪娇)
Affiliations: School of Modern Posts, Nanjing University of Posts and Telecommunications; School of Communication, Nanjing Vocational College of Information Technology
Keywords: encrypted traffic identification; deep learning; Stacked Autoencoder (SAE); traffic classification; Multilayer Perceptron (MLP); Convolutional Neural Network (CNN)
Journal code: JSJC
Journal: Computer Engineering
Publication date: 2018-11-15
Publisher: 计算机工程 (Computer Engineering)
Year: 2018
Volume/No.: v.44; No.494
Funding: Brand Major Construction Project of Jiangsu Higher Education Institutions (PPZY2015A092)
Language: Chinese
Record ID: JSJC201811024
Page count: 9
Issue: 11
CN: 31-1289/TP
Pages: 146-153+159

Abstract

Encrypted traffic identification methods based on shallow machine learning suffer from low accuracy, and their feature extraction and selection are time-consuming and labor-intensive. To address this, this paper proposes an encrypted traffic identification method based on the Stacked Autoencoder (SAE). The method exploits the unsupervised nature of SAE and its strength in dimensionality reduction, and combines it with the supervised classification learning of a Multilayer Perceptron (MLP) to accurately identify encrypted application traffic. Because class imbalance in the sample dataset affects classification accuracy, the imbalanced dataset is processed with the SMOTE oversampling method. Experimental results show that the method outperforms the MLP-based encrypted traffic identification method on every performance metric, with precision, recall, and F1-Score all reaching 99%.
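
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified, standard-library-only variant written for illustration, not the authors' implementation. The function name `smote`, the parameters `n_synthetic`, `k` and `seed`, and the two-feature toy flows are all assumptions introduced here.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    k: number of nearest minority neighbours considered (assumed default).
    seed: fixes the random choices so the sketch is reproducible.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # from the base point to the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy example: 3 minority-class flows with 2 hypothetical features each.
minority_flows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.6]]
new_flows = smote(minority_flows, n_synthetic=4, k=2)
balanced_minority = minority_flows + new_flows
print(len(balanced_minority))  # 7
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region already occupied by that class, which is why the technique is usually preferred over simply duplicating minority samples. In practice a maintained library implementation would normally be used instead of a hand-written one.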

