Abstract
Because traditional feature selection algorithms cannot capture the relationships between features, a nonlinear feature selection method is proposed. By introducing a kernel function, the method projects the original data set into a high-dimensional kernel space; performing operations in the kernel space allows the relationships between features to be taken into account. Owing to the properties of the kernel function, even when the data are projected through a Gaussian kernel into an infinite-dimensional space, the computational complexity remains small. For the regularization term, a double constraint using two norms is imposed, which not only improves the accuracy of the algorithm but also reduces the variance of the experimental results to 0.74, far smaller than that of comparable algorithms, making the algorithm more stable. The proposed algorithm is compared with 6 similar algorithms on 8 commonly used data sets, with classification accuracy measured by an SVM classifier; it achieves an improvement of at least 1.84%, at most 3.27%, and 2.75% on average.
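The evaluation protocol described above can be sketched in a few lines: compute a Gaussian (RBF) kernel matrix on the data, then score a candidate feature subset with an SVM classifier. This is a minimal illustration assuming scikit-learn and the Iris data set; the kernel-space selection step itself is replaced here by a simple variance-based ranking placeholder, not the paper's actual algorithm.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def gaussian_kernel(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2): an implicit map into an
    # infinite-dimensional space, yet only an n x n matrix is ever computed,
    # which is why the complexity stays small.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

X, y = load_iris(return_X_y=True)
K = gaussian_kernel(X, gamma=0.5)          # n x n Gaussian kernel matrix

# Placeholder ranking (per-feature variance) standing in for the
# kernel-space selection; keep the top 2 features.
ranking = np.argsort(np.var(X, axis=0))[::-1]
selected = ranking[:2]

# Score the selected subset with an SVM, as in the experiments above.
acc = cross_val_score(SVC(kernel="rbf"), X[:, selected], y, cv=5).mean()
print(round(acc, 2))
```

In the real method the ranking would come from the kernel-space optimization with the dual-norm regularizer; only the SVM scoring step is identical to the protocol in the abstract.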