Research and Development of Feature Dimensionality Reduction Techniques
  • English title: Research and Development of Feature Dimensionality Reduction
  • Author: HUANG Xuan (黄铉)
  • Affiliation: School of Information Science and Technology, Southwest Jiaotong University
  • Keywords: dimensionality reduction; feature selection; feature extraction; research progress
  • Journal: Computer Science (计算机科学), CNKI journal code JSJA
  • Publication date: 2018-06-15
  • Year: 2018
  • Volume: 45
  • Issue: S1
  • Language: Chinese
  • Record ID: JSJA2018S1004
  • Pages: 29-34, 66 (7 pages)
  • CN: 50-1075/TP
Abstract
The quality of data features directly affects the accuracy of a model. In the field of pattern recognition, dimensionality reduction has long been a focus of research. With the arrival of the big-data era, data volumes have grown enormously and data dimensionality keeps rising; when processing high-dimensional data, traditional data mining methods degrade in performance or fail outright. Practice shows that reducing the dimensionality of the features before analysis is an effective way to avoid the "curse of dimensionality", and dimensionality reduction techniques are therefore widely applied across many fields. This paper gives a detailed description of the two classes of dimensionality reduction methods, feature extraction and feature selection, and compares their characteristics. The most representative feature selection algorithms are then summarized and analyzed in terms of the two key components of such algorithms: the subset search strategy and the evaluation criterion. Finally, starting from practical applications, research directions for feature dimensionality reduction that merit attention are discussed.
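The distinction the abstract draws between the two method families can be sketched in a few lines of Python (an illustrative example, not code from the paper): a filter-style feature *selector* ranks the original columns by a simple evaluation criterion (here, variance) and keeps a subset of them unchanged, whereas feature *extraction* would instead construct new features as combinations of all columns.

```python
# Minimal sketch of filter-style feature selection: rank columns by an
# evaluation criterion (sample variance) and keep the k highest-ranked ones.
# The data and the variance criterion are illustrative assumptions, not
# taken from the surveyed paper.

def variance(col):
    """Sample variance of one feature column."""
    m = sum(col) / len(col)
    return sum((x - m) ** 2 for x in col) / (len(col) - 1)

def select_top_k(rows, k):
    """Keep the k original features with the largest variance.

    Returns the sorted indices of the retained columns and the reduced
    data matrix restricted to those columns.
    """
    cols = list(zip(*rows))  # column-major view of the data
    ranked = sorted(range(len(cols)), key=lambda j: variance(cols[j]),
                    reverse=True)
    kept = sorted(ranked[:k])
    return kept, [[row[j] for j in kept] for row in rows]

# 4 samples x 3 features; the middle feature is nearly constant and
# carries little information, so a variance filter discards it.
data = [[1.0, 5.0, 0.1],
        [2.0, 5.1, 0.2],
        [3.0, 4.9, 0.3],
        [4.0, 5.0, 0.4]]

kept, reduced = select_top_k(data, 2)
print(kept)        # -> [0, 2]: the near-constant column 1 is dropped
print(reduced[0])  # -> [1.0, 0.1]
```

Note that the retained columns are the original measurements, which keeps the reduced data interpretable; a feature-extraction method such as PCA would instead return projections onto new axes, mixing all original columns together.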
References
[1]SHEIK A.A Survey on Evolutionary Techniques for Feature Selection[C]∥IEEE Conference on Emerging Devices and Smart Systems.Tiruchengode,India:IEEE Press,2017.
    [2]SAMINA K,TEHMINA K.A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning[C]∥Science and Information Conference.London:IEEE Press,2014:372-378.
    [3]JOLLIFFE I T.Principal component analysis[M].Berlin:Springer-Verlag,1986.
    [4]DUDA R O,HART P E,STORK D G.Pattern Classification(2nd Edition)[M].New York:Wiley-Interscience,2000:32-39.
    [5]COMON P.Independent component analysis,a new concept[J].Signal Processing,1994,36(3):287-314.
    [6]BRONSTEIN A M,BRONSTEIN M M,KIMMEL R.Generalized multidimensional scaling:a framework for isometry-invariant partial surface matching[J].Proceedings of the National Academy of Sciences of the United States of America,2006,103(5):1168-1172.
    [7]WANG J Y.Geometric structure of high-dimensional data and dimensionality reduction[M].New York:Springer Heidelberg,2011:131-147.
    [8]SCHOLKOPF B,SMOLA A,MULLER K R.Nonlinear Component Analysis as a Kernel Eigenvalue Problem[J].Neural Computation,1998,10(5):1299-1319.
    [9]MIKA S,RATSCH G,WESTON J,et al.Fisher Discriminant Analysis with Kernels[C]∥Proceedings of IEEE Workshop Neural Networks for Signal Processing.1999:41-48.
    [10]WEINBERGER K Q,SAUL L K.Unsupervised learning of image manifolds by semidefinite programming[J].International Journal of Computer Vision,2006,70(1):77-90.
    [11]TENENBAUM J B,SILVA V,LANGFORD J C.A global geometric framework for nonlinear dimensionality reduction[J].Science,2000,290(5500):2319-2323.
    [12]ROWEIS S T,SAUL L K.Nonlinear dimensionality reduction by locally linear embedding[J].Science,2000,290(5500):2323-2326.
    [13]BELKIN M.Problems of learning on manifolds[D].Chicago:The University of Chicago,2003.
    [14]HE X F,NIYOGI P.Locality preserving projections[C]∥Advances in Neural Information Processing Systems 16.Vancouver,Canada:MIT Press,2003:153.
    [15]DONOHO D L,GRIMES C.Hessian Eigenmaps:New Locally Linear Embedding Techniques for High-dimensional Data[J].Proceedings of the National Academy of Sciences of the United States of America,2003,100(10):5591-5596.
    [16]MOALLEN P,AYOUGHI S A.Removing potential flat spots on error surface of multilayer perceptron(MLP)neural networks[J].International Journal of Computer Mathematics,2011,88(1/3):21-36.
    [17]ANG J C,MIRZAL A,et al.Supervised,Unsupervised,and Semi-Supervised Feature Selection:A Review on Gene Selection[J].IEEE/ACM Transactions on Computational Biology and Bioinformatics,2016,13(5):971-989.
    [18]SUN Z H,GEORGE B,RONALD M.Object detection using feature subset selection[J].Pattern Recognition,2004,37(11):2165-2176.
    [19]CAI Z Y,YU J G,LI X P,et al.Feature selection algorithm based on kernel distance measure[J].Pattern Recognition and Artificial Intelligence,2010,23(2):235-240.
    [20]PUDIL P,NOVOVICOVA J,KITTLER J.Floating Search Methods in Feature Selection[J].Pattern Recognition Letters,1994,15(11):1119-1125.
    [21]LIU H,YU L.Toward integrating feature selection algorithms for classification and clustering[J].IEEE Transactions on Knowledge and Data Engineering,2005,17(4):491-502.
    [22]KOLLER D,SAHAMI M.Toward optimal feature selection[C]∥Thirteenth International Conference on International Conference on Machine Learning.Morgan Kaufmann Publishers Inc.,1996:284-292.
    [23]MITRA P,MURTHY C A,SANKAR K P.Unsupervised feature selection using feature similarity[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2002,24(3):301-312.
    [24]GUYON I,WESTON J,BARNHILL S,et al.Gene selection for cancer classification using support vector machines[J].Machine Learning,2002,46(1):389-422.
    [25]YANG J B,ONG C J.Feature selection for support vector regression using probabilistic prediction[C]∥16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2010:343-352.
    [26]SHEN K Q,CHONG C J,LI X P,et al.Feature selection via sensitivity analysis of SVM probabilistic outputs[J].Machine Learning,2008,70(1):1-20.
    [27]FORMAN G.An extensive empirical study of feature selection metrics for text classification[J].Journal of Machine Learning Research,2003,3:1289-1305.
    [28]NG A Y.Feature selection,L1 vs.L2 regularization,and rotational invariance[C]∥Proceedings of the Twenty-first International Conference on Machine Learning.New York:ACM,2004:78.
    [29]MANGASARIAN O L,WILD E W.Feature Selection for Nonlinear Kernel Support Vector Machines[C]∥Seventh IEEE International Conference on Data Mining-workshops.2007:231-236.
    [30]WANG L F,SHEN X T.Multi-category support vector machines,feature selection and solution path[J].Statistica Sinica,2006,16(2):617-633.
    [31]LEUNG Y,HUNG Y.A multiple-filter-multiple-wrapper approach to gene selection and microarray data classification[J].IEEE/ACM Transactions on Computational Biology and Bioinformatics,2010,7(1):108-117.
    [32]LAZAR C,TAMINAU J,MEGANCK S,et al.A survey on filter techniques for feature selection in gene expression microarray analysis[J].IEEE/ACM Transactions on computational Biology and Bioinformatics,2012,9(4):1106-1119.
    [33]SHEN Q,DIAO R,SU P.Feature Selection Ensemble[C]∥Turing.2012:289-306.
    [34]LI G Z,YANG J Y.Feature selection for ensemble learning and its application[M]∥Machine Learning in Bioinformatics.2008:135-155.
    [35]PENG Y H,WU Z Q,JIANG J M.A novel feature selection approach for biomedical data classification[J].Journal of Biomedical Informatics,2010,43(1):15-23.
    [36]ANG J C,MIRZAL A,et al.Supervised,Unsupervised,and Semi-Supervised Feature Selection:A Review on Gene Selection[J].IEEE/ACM Transactions on Computational Biology and Bioinformatics,2016,13(5):971-989.
    [37]OPITZ D W.Feature Selection for Ensembles[C]∥Proceedings of National Conference on Artificial Intelligence.Orlando,FL,1999:379-384.
    [38]ABEEL T,HELLEPUTTE T,VAN D P Y,et al.Robust biomarker identification for cancer diagnosis with ensemble feature selection methods[J].IEEE/ACM Transactions on computational Biology and Bioinformatics,2010,26(3):392-398.
    [39]WONG H S,ZHANG S,SHEN Y,et al.A New Unsupervised Feature Ranking Method for Gene Expression Data Based on Consensus Affinity[J].IEEE/ACM Transactions on Computational Biology&Bioinformatics,2012,9(4):1257-1263.
    [40]ZHANG J,HU X G,ZHANG Y H,et al.K-split Lasso:an effective feature gene selection method for tumors[J].Journal of Frontiers of Computer Science and Technology,2012,6(12):1136-1143.(in Chinese)
    [41]JIN L L,LIANG H.Deep Learning for Underwater Image Recognition in Small Sample Size Situations[C]∥IEEE Conference on Oceans.Aberdeen UK:IEEE Press,2017.
    [42]HINTON G E,SALAKHUTDINOV R R.Reducing the Dimensionality of Data with Neural Networks[J].Science,2006,313(5786):504-507.
    [43]SUN Z Y,LU C X,SHI Z Z,et al.Research and development of deep learning[J].Computer Science,2016,43(2):1-8.(in Chinese)
