Hypergraph-Based Sparse Feature Selection Algorithm
  • English title: Hypergraph-based sparse feature selection
  • Authors: Lei Cong; Zhong Zhi; Hu Xiaoyi; Fang Yue; Yu Hao; Zheng Wei
  • Keywords: feature selection; feature self-representation; subspace learning; hypergraph representation; low-rank constraint
  • Journal code: JSYJ
  • Journal title (English): Application Research of Computers
  • Institutions: Guangxi Key Laboratory of Multi-source Information Mining & Security, Guangxi Normal University; College of Computer & Information Engineering, Guangxi Teachers Education University
  • Online publication date: 2017-11-15
  • Journal title (Chinese): 计算机应用研究
  • Year: 2018
  • Issue: v.35, No.325
  • Funding: National Key R&D Program of China (2016YFB1000905); National Natural Science Foundation of China (61672177, 61573270); National "973" Program (2013CB329404); Guangxi Natural Science Foundation (2015GXNSFCB139011); open fund of the Guangxi Key Laboratory of Multi-source Information Mining & Security (16-A-01-01, 16-A-01-02); Guangxi Graduate Education Innovation Program (XYCSZ2017064, XYCSZ2017067, YCSW2017065)
  • Language: Chinese
  • Record ID: JSYJ201811004
  • Pages: 19-22+25 (5 pages)
  • CN: 51-1196/TP
Abstract
In real data mining applications, noises or outliers often increase the rank of a matrix. To address this, the paper proposes a hypergraph-based sparse feature selection algorithm under a low-rank constraint. The algorithm sparsely represents each feature by the other features to obtain a feature self-representation coefficient matrix, and applies a hypergraph regularizer to capture the local structure of the data, thereby embedding subspace learning into the feature selection framework. Meanwhile, an l2,p-norm penalty on both the self-representation coefficient matrix and the loss function uncovers the correlations among features and the relations among samples, which guides the algorithm toward selecting informative features and improves the predictive power of the model. Experimental results on UCI datasets show that, compared with competing methods, the proposed algorithm selects important features more effectively and achieves better classification performance and stability.
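The core self-representation step described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's full method: it solves the related objective min_W ||X - XW||_{2,1} + λ||W||_{2,1} (the p = 1 case of the l2,p-norm, as in regularized self-representation) by iteratively reweighted least squares, and ranks features by the row norms of W. The hypergraph regularizer and the low-rank constraint from the paper are omitted; the function name and parameters are illustrative.

```python
import numpy as np

def sparse_self_representation(X, lam=1.0, n_iter=30, eps=1e-8):
    """Score features by sparse self-representation:
        min_W ||X - X W||_{2,1} + lam * ||W||_{2,1}
    solved by iteratively reweighted least squares (IRLS).
    X has shape (n_samples, d_features); returns one score per feature."""
    n, d = X.shape
    gl = np.ones(n)  # per-sample IRLS weights (loss term)
    gr = np.ones(d)  # per-feature IRLS weights (penalty term)
    for _ in range(n_iter):
        # Weighted least-squares update: (X^T G_l X + lam G_r) W = X^T G_l X
        XtGX = X.T @ (gl[:, None] * X)
        W = np.linalg.solve(XtGX + lam * np.diag(gr), XtGX)
        # Reweight from the current residual and coefficient row norms
        E = X - X @ W
        gl = 1.0 / (2.0 * np.maximum(np.linalg.norm(E, axis=1), eps))
        gr = 1.0 / (2.0 * np.maximum(np.linalg.norm(W, axis=1), eps))
    # A larger row norm means the feature contributes more to
    # reconstructing the other features, i.e. it is more important.
    return np.linalg.norm(W, axis=1)
```

The l2,1 row-sparsity penalty drives whole rows of W toward zero, so features with near-zero scores can be discarded and the top-ranked ones kept for classification.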
