Abstract
Because traditional feature selection algorithms cannot capture the relationships between features, a nonlinear feature selection method is proposed. By introducing a kernel function, the method projects the original data set into a high-dimensional kernel space; performing operations in the kernel space allows the relationships between features to be taken into account. Owing to the properties of the kernel function, even when the data are projected through a Gaussian kernel into an infinite-dimensional space, the computational complexity remains small. For the regularization term, a double constraint using two norms is imposed, which not only improves the accuracy of the algorithm but also reduces the variance of the experimental results to 0.74, far smaller than that of comparable algorithms, making the algorithm more stable. The proposed algorithm is compared with 6 similar algorithms on 8 commonly used data sets, with classification accuracy measured by an SVM classifier; it achieves an improvement of at least 1.84%, at most 3.27%, and 2.75% on average.
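The evaluation protocol described above can be sketched in a few lines: compute a Gaussian (RBF) kernel matrix on the data, then score a candidate feature subset with an SVM classifier. This is a minimal illustration assuming scikit-learn and the Iris data set; the kernel-space selection step itself is replaced here by a simple variance-based ranking placeholder, not the paper's actual algorithm.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def gaussian_kernel(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2): an implicit map into an
    # infinite-dimensional space, yet only an n x n matrix is ever computed,
    # which is why the complexity stays small.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

X, y = load_iris(return_X_y=True)
K = gaussian_kernel(X, gamma=0.5)          # n x n Gaussian kernel matrix

# Placeholder ranking (per-feature variance) standing in for the
# kernel-space selection; keep the top 2 features.
ranking = np.argsort(np.var(X, axis=0))[::-1]
selected = ranking[:2]

# Score the selected subset with an SVM, as in the experiments above.
acc = cross_val_score(SVC(kernel="rbf"), X[:, selected], y, cv=5).mean()
print(round(acc, 2))
```

In the real method the ranking would come from the kernel-space optimization with the dual-norm regularizer; only the SVM scoring step is identical to the protocol in the abstract.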