Online feature selection based on feature clustering ensemble technology
  • Title: Online feature selection based on feature clustering ensemble technology
  • Authors: DU Zhenglin (杜政霖); LI Yun (李云)
  • Affiliations: College of Computer, Nanjing University of Posts and Telecommunications; Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology
  • Keywords: group feature selection; clustering ensemble; streaming feature; online feature selection
  • Journal code: JSJY
  • Journal: Journal of Computer Applications (计算机应用)
  • Publication date: 2017-03-10
  • Publisher: Journal of Computer Applications (计算机应用)
  • Year: 2017
  • Volume and issue: v.37; No.319
  • Funding: Natural Science Foundation of Jiangsu Province (BK20131378, BK20140885); Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems (15206)
  • Language: Chinese
  • Article ID: JSJY201703046
  • Page count: 6
  • Issue number: 03
  • CN: 51-1307/TP
  • Pages: 260-264+299
Abstract
For a new application scenario that has both historical data and streaming features, an online feature selection algorithm based on group feature selection and streaming features is proposed. In the group feature selection stage on the historical data, the idea of clustering ensembles is introduced to compensate for the shortcomings of a single clustering algorithm: the k-means method is first run multiple times to produce a collection of clusterings, and in the integration stage a hierarchical clustering algorithm combines this collection into the final result. In the online feature selection stage on the streaming feature data, the feature groups produced by group construction are updated by examining the correlations between features, and the final feature subset is obtained through group transformation. Experimental results show that the proposed algorithm handles the online feature selection problem in this new scenario effectively and achieves good classification performance.
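The abstract gives no algorithmic detail, so the Python sketch below is only one plausible reading of the two stages and should not be taken as the authors' implementation. It assumes that each feature is represented by the vector of its values over the historical samples, that the consensus similarity is the co-association frequency across k-means runs, that the hierarchical step uses average linkage, and that a streaming feature joins an existing group when its mean absolute Pearson correlation with the group's members reaches a threshold. All function names, parameters, and the 0.8 default are hypothetical.

```python
import math
import random


def kmeans_labels(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centre by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_ensemble_groups(features, k, n_runs, n_groups, seed=0):
    """Group features: repeated k-means, then average-linkage consensus."""
    rng = random.Random(seed)
    n = len(features)
    # Co-association counts: how often features i and j share a cluster.
    together = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        labels = kmeans_labels(features, k, rng)
        for i in range(n):
            for j in range(n):
                together[i][j] += labels[i] == labels[j]

    def linkage(a, b):
        # Average of (1 - co-association frequency) over all cross pairs.
        total = sum(1 - together[i][j] / n_runs for i in a for j in b)
        return total / (len(a) * len(b))

    groups = [[i] for i in range(n)]
    while len(groups) > n_groups:
        pairs = [(x, y) for x in range(len(groups))
                 for y in range(x + 1, len(groups))]
        a, b = min(pairs, key=lambda xy: linkage(groups[xy[0]], groups[xy[1]]))
        merged = groups.pop(b)  # b > a, so index a is unaffected
        groups[a].extend(merged)
    return [sorted(g) for g in groups]


def pearson(u, v):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def place_streaming_feature(new_feature, features, groups, threshold=0.8):
    """Add an arriving feature to its most correlated group, or open a new one."""
    features.append(new_feature)
    idx = len(features) - 1
    # Score each group by mean |r| between the new feature and its members.
    scores = [sum(abs(pearson(new_feature, features[m])) for m in g) / len(g)
              for g in groups]
    best = max(range(len(groups)), key=scores.__getitem__)
    if scores[best] >= threshold:
        groups[best].append(idx)
    else:
        groups.append([idx])
    return groups
```

In this sketch, `cluster_ensemble_groups` would be called once on the historical feature vectors, and `place_streaming_feature` once per arriving feature. The final group transformation that yields the feature subset is not shown, because the abstract does not say how it is carried out.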
