Research on Feature Selection Algorithms Based on Information Theory
Abstract
The continual emergence of large-scale machine learning problems in image processing, information retrieval, bioinformatics, and similar fields poses severe challenges to existing feature selection and machine learning algorithms; algorithms that scale to large data sets while retaining good overall accuracy and running efficiency are urgently needed. This dissertation studies feature selection for high-dimensional data and unsupervised dynamic feature selection.
     The dissertation first reviews the fundamentals of information theory and feature selection and surveys several representative feature selection methods, among which ReliefF is widely regarded as a simple and efficient feature selection algorithm of the Filter type.
     To address the shortcomings of ReliefF, the algorithm is improved with a divergence measure from information theory, which improves the effectiveness of the selected features at the same time complexity. To compensate for the weaker effectiveness of individual evaluation and the low efficiency of subset evaluation, a two-step feature selection framework is proposed and an algorithm for removing redundant features is implemented; it preserves the effectiveness of the result while greatly reducing the time complexity compared with subset evaluation.
     Finally, an exploratory study of unsupervised feature selection is carried out: the feature set is automatically revised as unlabeled samples are added, which verifies the feasibility of this experimental approach.
The emergence of high-dimensional machine learning fields such as image processing, information retrieval, and bioinformatics poses severe challenges to existing feature selection and machine learning algorithms. This dissertation mainly studies feature selection for high-dimensional data and dynamic feature selection with unsupervised learning.
     In this dissertation, we first review the basics of information theory and feature selection, and introduce several feature selection algorithms in detail, among which the ReliefF algorithm is considered a simple and efficient method of the Filter category.
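Although the dissertation's own implementation is not reproduced here, a minimal ReliefF-style sketch in Python/NumPy may help fix ideas. It assumes numeric features scaled to [0, 1]; the iteration count `n_iters`, neighbour count `k`, and the Manhattan distance are illustrative choices rather than anything prescribed by the text.

```python
import numpy as np

def relieff(X, y, n_iters=100, k=10, seed=None):
    """Minimal ReliefF sketch for numeric features scaled to [0, 1].

    X: (n, d) feature matrix, y: (n,) class labels.
    Returns one relevance weight per feature (larger = more relevant).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))          # class prior probabilities
    w = np.zeros(d)

    for _ in range(n_iters):
        i = rng.integers(n)
        xi, ci = X[i], y[i]
        dist = np.abs(X - xi).sum(axis=1)           # Manhattan distance to every sample
        for c in classes:
            idx = np.where((y == c) & (np.arange(n) != i))[0]
            nearest = idx[np.argsort(dist[idx])[:k]]   # k nearest hits or misses
            if nearest.size == 0:
                continue
            diff = np.abs(X[nearest] - xi).mean(axis=0)
            if c == ci:
                w -= diff / n_iters                  # near hits: differing values penalise a feature
            else:
                w += prior[c] / (1 - prior[ci]) * diff / n_iters  # near misses, prior-weighted
    return w
```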
     To address the shortcomings of the ReliefF method, we improve it using the Kullback-Leibler divergence, an important concept in information theory; this improvement makes the selected features more effective at the same time complexity. To obtain a more effective feature subset in less time, we propose a new feature selection framework that works in two steps. Algorithms built on this framework remove redundant features effectively and markedly reduce the time complexity compared with subset evaluation algorithms.
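The two-step idea can be illustrated with the following hedged Python sketch, an illustration under stated assumptions rather than the dissertation's algorithm: step one ranks features by any individual relevance score (for example, the weights returned by the ReliefF sketch above), and step two greedily drops candidates that are too redundant with features already kept. Redundancy is measured here with a histogram estimate of mutual information, which is itself the Kullback-Leibler divergence between the joint distribution and the product of the marginals; the candidate count and threshold are arbitrary placeholders.

```python
import numpy as np

def mutual_information(a, b, bins=10):
    """Histogram estimate of the mutual information between two feature columns."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                                     # avoid log(0) terms
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def two_step_select(X, relevance, n_candidates=50, redundancy_threshold=0.5):
    """Step 1 (individual evaluation): keep the n_candidates best-scoring features.
    Step 2 (redundancy removal): greedily drop any candidate that is too
    redundant with a feature already kept."""
    order = np.argsort(relevance)[::-1][:n_candidates]
    kept = []
    for f in order:
        if all(mutual_information(X[:, f], X[:, g]) < redundancy_threshold for g in kept):
            kept.append(f)
    return kept
```

For instance, `two_step_select(X, relieff(X, y))` returns the indices of the retained features. Compared with wrapper-style subset evaluation, only a single ranking pass plus at most a quadratic number of pairwise checks over the shortlisted candidates is needed, which is where the reduction in time complexity comes from.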
     We also carry out an exploratory study of unsupervised feature selection, in which the feature subset is automatically revised as new unlabelled instances are added, and verify the feasibility of this approach.
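For the unsupervised, dynamic setting, the following sketch is purely illustrative: the class name, the batch-update interface, and the use of per-feature variance as the unsupervised criterion are all assumptions for demonstration, not the dissertation's method. It re-scores the features each time a batch of unlabelled samples arrives and returns the revised feature subset.

```python
import numpy as np

class DynamicFeatureSelector:
    """Sketch only: re-rank features whenever unlabelled samples arrive.

    Per-feature variance stands in for whatever unsupervised criterion is used.
    """

    def __init__(self, n_keep=20):
        self.n_keep = n_keep
        self.X = None

    def update(self, new_samples):
        new_samples = np.asarray(new_samples, dtype=float)
        self.X = new_samples if self.X is None else np.vstack([self.X, new_samples])
        scores = self.X.var(axis=0)                      # unsupervised relevance proxy
        return np.argsort(scores)[::-1][:self.n_keep]    # revised feature subset
```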