Toward an efficient and scalable feature selection approach for internet traffic classification

Authors: Adil Fahad^a (alharthi.adil@gmail.com); Zahir Tari^a (Zahir.tari@rmit.edu.au); Ibrahim Khalil^a (Ibrahim.khalil@rmit.edu.au); Ibrahim Habib^b (habib@ccny.cuny.edu); Hussein Alnuweiri^c (hussein.alnuweiri@qatar.tamu.edu)
Keywords: Feature selection; Metrics; Traffic classification
Journal: Computer Networks
Publication year: 2013
Publication date: 19 June 2013
Volume: 57
Issue: 9
Pages: 2040-2057

Abstract

There is significant interest in the network management and industrial security communities in identifying the "best" and most relevant features of network traffic in order to properly characterize user behaviour and predict future traffic. Eliminating redundant features is an important Machine Learning (ML) task because it improves classification accuracy and reduces the computational complexity of constructing the classifier. In practice, feature selection (FS) techniques can be used as a preprocessing step to eliminate irrelevant features and as a knowledge discovery tool to reveal the "best" features in many soft computing applications. In this paper, we investigate the advantages and disadvantages of such FS techniques using newly proposed metrics (namely goodness, stability and similarity). We continue our efforts toward developing an integrated FS technique that builds on the key strengths of existing FS techniques. A novel way is proposed to identify the "best" features efficiently and accurately by first combining the results of some well-known FS techniques to find consistent features, and then using the proposed concept of support to select the smallest set of features that covers the data optimally. An empirical study over ten high-dimensional network traffic data sets demonstrates a significant gain in accuracy and improved run-time performance of a classifier compared to the individual results produced by some well-known FS techniques.
