Abstract
Feature selection is a crucial step in the data mining process, and a central problem it must solve is how to choose an optimal set of features. Using the Pairwise Constraint Score algorithm, experiments were performed on several different datasets to select optimal feature subsets and to compare their stability. The results show that the Pairwise Constraint Score algorithm offers an advantage over other feature selection methods and, in terms of stability, performs well across multiple datasets.
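The abstract gives no implementation details, so the following Python fragment is only a minimal sketch of the ratio form of the pairwise Constraint Score as it is commonly defined: each feature is scored by its summed squared difference over must-link pairs divided by the same sum over cannot-link pairs, and the features with the smallest scores are retained. The function name, the toy dataset, and the example constraint pairs are illustrative assumptions, not material from the experiments reported here.

    import numpy as np

    def constraint_score(X, must_link, cannot_link):
        # X: (n_samples, n_features) data matrix.
        # must_link / cannot_link: lists of (i, j) sample-index pairs known
        # to share a class / to belong to different classes.
        ml = np.asarray(must_link)
        cl = np.asarray(cannot_link)
        # Per-feature squared spread over each set of constrained pairs.
        ml_spread = ((X[ml[:, 0]] - X[ml[:, 1]]) ** 2).sum(axis=0)
        cl_spread = ((X[cl[:, 0]] - X[cl[:, 1]]) ** 2).sum(axis=0)
        # A small score means the feature keeps must-link pairs close and
        # cannot-link pairs apart, i.e. it is discriminative.
        return ml_spread / cl_spread

    # Toy usage on a random 5-feature dataset (hypothetical data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 5))
    scores = constraint_score(X,
                              must_link=[(0, 1), (2, 3)],
                              cannot_link=[(0, 4), (1, 5)])
    best_two = np.argsort(scores)[:2]  # lowest scores rank highest

Since a lower score is better, selecting a feature subset amounts to taking the k features with the smallest scores; a stability comparison like the one described above would then measure how consistent that subset stays across different datasets.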