Feature selection with partition differentiation entropy for large-scale data sets


Authors: Fachao Li^a (lifachao@tsinghua.org.cn); Zan Zhang^b (zhangzan166@163.com, zanzhang@tju.edu.cn); Chenxia Jin^a (jinchenxia2005@126.com)
Keywords: Feature selection; Partition differentiation entropy; Attributes significance; Large-scale data sets; Uncertainty
Journal: Information Sciences
Publication year: 2016
Publication date: 1 February 2016
Volume: 329
Issue: Complete
Pages: 690-700
Full-text size: 347 K

Abstract

Feature selection, especially for large data sets, is a challenging problem in areas such as pattern recognition, machine learning, and data mining. With the development of data collection and storage technologies, data sets have grown larger than ever, making it difficult to learn from them with traditional methods. In this paper, we introduce the partition differentiation entropy, defined from the viewpoint of partitions in rough sets, to measure the significance and uncertainty of attributes, and we present a feature selection method for large-scale data sets based on this information-theoretic measure of attribute significance. Given a large-scale decision information system, the proposed method first divides it into small sub-information systems according to the decision classes. Then, by computing the partition differentiation entropy in the sub-systems, it obtains the partition differentiation entropy of an attribute subset in the original decision information system. The important features are then selected according to the value of the partition differentiation entropy. Experimental results show that the proposed method is feasible and valid.
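
The divide-by-decision-class step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the formula for the partition differentiation entropy or the rule for combining sub-system values, so the sketch uses the Shannon entropy of the partition's block sizes purely as a stand-in measure. The function names (`split_by_decision`, `partition`, `shannon_entropy`, `per_class_measure`) and the dict-per-row data layout are assumptions made for illustration.

```python
from collections import defaultdict
from math import log2


def split_by_decision(rows, decision):
    """Group rows (dicts) into sub-systems, one per decision class."""
    subsystems = defaultdict(list)
    for row in rows:
        subsystems[row[decision]].append(row)
    return dict(subsystems)


def partition(rows, attrs):
    """Indiscernibility partition: rows that agree on every attribute
    in attrs fall into the same block (blocks hold row indices)."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def shannon_entropy(blocks):
    """Stand-in measure (assumption): Shannon entropy of block sizes.
    This is NOT the paper's partition differentiation entropy."""
    n = sum(len(b) for b in blocks)
    return -sum(len(b) / n * log2(len(b) / n) for b in blocks)


def per_class_measure(rows, attrs, decision, measure=shannon_entropy):
    """Evaluate the measure on the attrs-induced partition of each
    decision-class sub-system; how these values are combined into a
    value for the whole system is left open here."""
    subsystems = split_by_decision(rows, decision)
    return {d: measure(partition(sub, attrs)) for d, sub in subsystems.items()}
```

A call such as `per_class_measure(rows, ["a", "b"], "d")`, where `rows` is a list of dicts with condition attributes `a`, `b` and decision attribute `d`, returns one value per decision class. Passing a different function as `measure` is where the paper's own entropy definition would be plugged in.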
