Effectiveness of Statistical Features for Early Stage Internet Traffic Identification
  • Authors: Lizhi Peng ; Bo Yang ; Yuehui Chen…
  • Keywords: Feature selection ; Early stage traffic classification ; Machine learning
  • Journal: International Journal of Parallel Programming
  • Publication year: 2016
  • Publication date: February 2016
  • Volume: 44
  • Issue: 1
  • Pages: 181-197
  • Full-text size: 810 KB
  • Author affiliations: Lizhi Peng (1)
    Bo Yang (1)
    Yuehui Chen (1)
    Zhenxiang Chen (1)

    1. Shandong Provincial Key Laboratory for Network Based Intelligent Computing, University of Jinan, Jinan, 250022, People’s Republic of China
  • Journal category: Computer Science
  • Journal subjects: Theory of Computation
    Processor Architectures
    Software Engineering, Programming and Operating Systems
  • Publisher: Springer Netherlands
  • ISSN: 1573-7640
Abstract
Accurately identifying network traffic at its early stage is very important for practical applications of traffic identification. In recent years, a growing number of studies have tried to build effective machine learning models that identify traffic from the few packets available at the early stage. Packet sizes and statistical features have proved to be effective and are widely used in early stage traffic identification. However, one important issue remains unaddressed: whether there are essential differences in effectiveness between the two kinds of features. In this paper, we evaluate the effectiveness of statistical features in comparison with packet sizes. We first extract the packet sizes of the first six packets, together with their statistical features, from three traffic data sets. We then compute the mutual information between each feature and the corresponding traffic type label to measure the effectiveness of the feature. Finally, we carry out crossover identification experiments with different feature sets using ten well-known machine learning classifiers. Our experimental results show that most classifiers achieve almost the same performance with packet sizes as with statistical features for early stage traffic identification, and that most classifiers can achieve high identification accuracy using only two statistical features.
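The two steps named in the abstract (summarizing the first six packet sizes, then scoring a feature by its mutual information with the traffic label) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of min/max/mean/standard deviation as the statistical features, and the plain empirical estimator for discrete values are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def statistical_features(packet_sizes, n=6):
    """Summarize the first n packet sizes of one flow.

    The four statistics below are a hypothetical feature set chosen
    for illustration; the abstract does not list the ones it uses.
    """
    first = packet_sizes[:n]
    return {
        "min": min(first),
        "max": max(first),
        "mean": mean(first),
        "std": pstdev(first),
    }


def mutual_information(feature_values, labels):
    """Empirical mutual information I(X; Y) in bits.

    Assumes both sequences hold discrete values; a continuous
    feature would have to be binned before calling this.
    """
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    px = Counter(feature_values)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi
```

A feature that separates the traffic types well scores high, and a feature that is independent of the label scores zero, so ranking features by this value gives one rough way to compare packet sizes against their summary statistics.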
