SmoteAdaNL: a learning method for network traffic classification
  • Authors: Zhen Liu; Ruoyu Wang; Ming Tao
  • Keywords: Network traffic classification; Flow classification accuracy; Byte classification accuracy; Data re-sampling; Ensemble learning
  • Journal: Journal of Ambient Intelligence and Humanized Computing
  • Publication date: February 2016
  • Volume: 7
  • Issue: 1
  • Pages: 121-130
  • Full-text size: 705 KB
  • Author affiliations: Zhen Liu (1)
    Ruoyu Wang (2) (3)
    Ming Tao (3) (4)

    1. School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
    2. Information and Network Engineering and Research Center, South China University of Technology, Guangzhou, 510006, China
    3. School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
    4. School of Computer, Dongguan University of Technology, Dongguan, 523808, China
  • Journal category: Engineering
  • Journal subject: Computational Intelligence
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1868-5145
Abstract
Machine learning based network traffic classification is a critical technique for network management and has attracted much attention. Most recent research focuses on achieving high flow classification accuracy (FCA). However, because "mice" flows outnumber "elephant" flows on the Internet, the resulting classifiers are better suited to "mice" flows and have low byte classification accuracy (BCA). To address this issue, the notion of byte misclassification is first explored. Based on the finding that most misclassified bytes belong to the minority class, a novel network traffic classification method is proposed that combines data re-sampling with ensemble learning. To improve the classification accuracy of the minority class, a data re-sampling algorithm is employed to increase the number of minority class flows. Data re-sampling, however, changes the data distribution and degrades the generalization of a classifier. A boosting-style ensemble learning algorithm that takes ensemble diversity into account is therefore employed to improve generalization. Experiments on real-world traffic datasets show that the proposed method achieves over 90% BCA and 96% FCA on average, and improves BCA by about 7.15% compared with existing methods.
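
The gap between FCA and BCA described in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: FCA is the fraction of flows classified correctly, and BCA is the fraction of bytes carried by correctly classified flows. The function name, class labels, and byte counts below are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def flow_and_byte_accuracy(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    flow_bytes: Sequence[int],
) -> tuple[float, float]:
    """Return (FCA, BCA) for one set of classified flows.

    Assumed definitions (not quoted from the paper):
      FCA = correctly classified flows / all flows
      BCA = bytes in correctly classified flows / all bytes
    """
    if not (len(true_labels) == len(predicted_labels) == len(flow_bytes)):
        raise ValueError("all inputs must have the same length")
    if len(true_labels) == 0:
        raise ValueError("at least one flow is required")
    total_bytes = sum(flow_bytes)
    if total_bytes <= 0:
        raise ValueError("total byte count must be positive")

    # One boolean per flow: was the predicted class the true class?
    correct = [t == p for t, p in zip(true_labels, predicted_labels)]
    fca = sum(correct) / len(correct)
    bca = sum(b for ok, b in zip(correct, flow_bytes) if ok) / total_bytes
    return fca, bca


# Hypothetical toy data: nine small "mice" flows and one large "elephant" flow.
true_labels = ["web"] * 9 + ["p2p"]
predicted_labels = ["web"] * 10  # only the elephant flow is misclassified
flow_bytes = [1_000] * 9 + [991_000]

fca, bca = flow_and_byte_accuracy(true_labels, predicted_labels, flow_bytes)
print(f"FCA = {fca:.2%}, BCA = {bca:.2%}")  # FCA = 90.00%, BCA = 0.90%
```

In this toy case a single misclassified elephant flow leaves FCA at 90% while BCA falls below 1%, which is the kind of imbalance the proposed method targets by re-sampling minority class flows.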
