Unsupervised Feature Selection Based on Node Degree Centrality (基于节点度中心性的无监督特征选择)
  • Title (English): Degree-Centrality Based Feature Selection
  • Authors (Chinese): 闫泓任; 马国帅; 钱宇
  • Authors (English): Yan Hongren; Ma Guoshuai; Qian Yuhua
  • Keywords (Chinese): 特征选择; 复杂网络; 节点度中心性; 特征相关性
  • Keywords (English): feature selection; complex network; degree centrality; feature correlation
  • Journal (Chinese): 数据采集与处理
  • Journal (English): Journal of Data Acquisition and Processing
  • Journal code: SJCJ
  • Affiliation: Institute of Big Data Science and Industry, Shanxi University (山西大学大数据科学与产业研究院)
  • Publication date: 2019-03-15
  • Year: 2019
  • Issue: v.34; No.154
  • Funding: National Natural Science Foundation of China (61672332, 61432011, U1435212)
  • Language: Chinese
  • Article ID: SJCJ201902014
  • Pages: 122-131 (10 pages)
  • CN: 32-1367/TN
Abstract
Feature selection methods pick a small number of suitable features out of thousands, making models more effective and more efficient. Considering that features in real-world high-dimensional datasets are correlated with one another, and that a complex-network structure describes the feature space globally and reasonably, this paper proposes an unsupervised feature selection method based on the degree centrality of nodes in a complex network. First, a threshold on the pairwise correlation between features determines which associations are retained; the retained associations then generate an undirected, unweighted network whose nodes are the features; finally, a degree-centrality measure identifies the most influential node set in this network, i.e., the optimal feature subset. The method adds flexibility in handling feature importance and feature redundancy. Comparative experiments against commonly used feature selection and feature extraction methods on multiple high-dimensional datasets demonstrate its effectiveness and generality.
Feature selection, by picking a small number of important features out of the feature space, helps learning algorithms perform more accurately and more efficiently on the datasets. Considering the universal existence of relevance between features in real datasets, this paper proposes an unsupervised feature selection framework in which the features correlated with each other form a network structure, and the importance of each feature is measured by the degree centrality index of a complex network. The larger the degree centrality of a feature in this network, the higher the rank of its importance. Finally, a given number of features with the highest ranks are selected. This framework allows more flexibility in handling feature importance and feature redundancy. The proposed method is compared with classical selection/extraction techniques on six high-dimensional datasets. Experiments demonstrate the advantages of the model on both continuous and discrete datasets.
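The pipeline described in the abstract (threshold pairwise feature correlations, build an undirected unweighted network over the features, rank feature-nodes by degree centrality, keep the top-ranked ones) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Pearson correlation measure, the threshold value, and all function and variable names here are assumptions.

```python
import numpy as np

def select_features_by_degree_centrality(X, k, threshold=0.5):
    """Illustrative sketch of the abstract's pipeline.

    1) Compute pairwise correlations between the columns (features) of X.
    2) Keep |corr| >= threshold as edges of an undirected, unweighted network.
    3) Rank feature-nodes by degree centrality and return the top-k indices.
    """
    corr = np.corrcoef(X, rowvar=False)          # feature-by-feature correlation matrix
    np.fill_diagonal(corr, 0.0)                  # no self-loops in the network
    adjacency = (np.abs(corr) >= threshold)      # retained associations as edges
    degree = adjacency.sum(axis=1)               # (unnormalized) degree centrality
    return np.argsort(degree)[::-1][:k]          # the k most central features

# Toy demonstration: features 0 and 1 are nearly identical, feature 2 is
# independent noise, so the correlation network links 0-1 and leaves 2 isolated.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X_demo = np.column_stack([base,
                          base + 0.01 * rng.normal(size=200),
                          rng.normal(size=200)])
selected = select_features_by_degree_centrality(X_demo, k=2)
# selected contains features 0 and 1, the two most central nodes
```

Note that ranking purely by degree rewards features with many strong correlations; the paper's framing treats such hub nodes as the most influential, and handles redundancy through the flexibility of the threshold.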
