Effectiveness of Statistical Features for Early Stage Internet Traffic Identification

Authors: Lizhi Peng ; Bo Yang ; Yuehui Chen…
Keywords: Feature selection ; Early stage traffic classification ; Machine learning
Journal: International Journal of Parallel Programming
Publication year: 2016
Publication date: February 2016
Volume: 44
Issue: 1
Pages: 181-197
Full-text size: 810 KB
Author affiliations: Lizhi Peng (1)
Bo Yang (1)
Yuehui Chen (1)
Zhenxiang Chen (1)

1. Shandong Provincial Key Laboratory for Network Based Intelligent Computing, University of Jinan, Jinan, 250022, People’s Republic of China
Journal category: Computer Science
Journal subjects: Theory of Computation
Processor Architectures
Software Engineering, Programming and Operating Systems
Publisher: Springer Netherlands
ISSN: 1573-7640

Abstract

Accurately identifying network traffic at its early stage is very important for practical applications of traffic identification. In recent years, a growing number of studies have tried to build effective machine learning models that identify traffic from the first few packets of a flow. Packet sizes and statistical features have both proved effective and are widely used in early stage traffic identification. However, one important question has not yet been addressed: whether the two kinds of features differ substantially in effectiveness. In this paper, we evaluate the effectiveness of statistical features in comparison with packet sizes. We first extract the packet sizes of the first six packets, together with their statistical features, from three traffic data sets. We then compute the mutual information between each feature and the corresponding traffic type label to measure the effectiveness of that feature. Finally, we carry out crossover identification experiments with different feature sets using ten well-known machine learning classifiers. Our experimental results show that most classifiers achieve almost the same performance with packet sizes as with statistical features for early stage traffic identification, and that most classifiers can achieve high identification accuracy using only two statistical features.
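The feature-scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not list the exact statistical features or the mutual information estimator the authors used, so everything below is an assumption made for illustration: the feature set (minimum, maximum, mean, standard deviation), the equal-width binning in discretize, the plug-in estimator in mutual_information, and the toy flows.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def statistical_features(packet_sizes, n=6):
    """Summarise the first n packet sizes of one flow.

    The four statistics here are an assumed feature set, not the paper's.
    """
    first = packet_sizes[:n]
    return {
        "min": min(first),
        "max": max(first),
        "mean": mean(first),
        "std": pstdev(first),
    }


def discretize(values, bins=4):
    """Map continuous values to equal-width bin indices 0..bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


if __name__ == "__main__":
    # Made-up flows: (packet sizes, application label). Illustration only.
    flows = [
        ([60, 52, 1500, 1500, 1500, 1500], "bulk"),
        ([60, 52, 1480, 1500, 1500, 1460], "bulk"),
        ([60, 52, 120, 90, 140, 110], "chat"),
        ([60, 52, 100, 130, 95, 105], "chat"),
    ]
    labels = [label for _, label in flows]
    feats = [statistical_features(sizes) for sizes, _ in flows]
    for name in ("min", "max", "mean", "std"):
        binned = discretize([f[name] for f in feats])
        print(name, mutual_information(binned, labels))
```

A feature whose binned values are independent of the label scores close to zero, while a feature that separates the classes scores close to the entropy of the label, which is how a ranking of features by effectiveness would be read from such scores.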
