An Unsupervised Feature Selection Algorithm Based on the Degree of Feature Cooperation
Abstract
As a branch of dimensionality reduction, feature selection plays an important role in improving the accuracy of machine learning and reducing computational cost. Although it has been widely studied in the supervised setting, the task is considerably harder in unsupervised learning because class labels are unavailable. Most existing unsupervised feature selection methods pursue useful features indirectly, by eliminating noisy and redundant features, but these two kinds of features cannot always be removed simultaneously. This thesis reinterprets the basic idea of feature selection and argues that directly selecting useful features not only removes noise and redundancy at the same time, but also models the interactions between features explicitly, yielding a conceptually clearer formulation. On this basis, the thesis assumes that the whole feature space can be approximately represented by a subset of mutually complementary features, and proposes an unsupervised filter method based on the degree of feature cooperation to select one such subset directly. By defining the degree of feature cooperation, the thesis first characterizes the interaction between features and uses this notion to distinguish complementary features from the rest. Next, a framework that combines this notion with the idea of hierarchical clustering is proposed; it aims to select a feature subset of the required size whose members have maximal mutual cooperation, and treats this subset as a set of complementary features. The thesis then analyzes how the algorithm suppresses the introduction of noisy and redundant features, showing that it is essentially consistent with the earlier elimination-based approaches. In comparative experiments, the proposed method and several mainstream unsupervised feature selection methods are evaluated on nine datasets, and the results confirm the effectiveness of the proposed method. Finally, the thesis summarizes the work and outlines possible directions for improvement as well as related topics for further research.
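The abstract only sketches the method at a high level. The minimal Python sketch below illustrates the general shape of a cooperation-based greedy filter: score every pair of features by a "degree of cooperation" and grow a subset whose members cooperate most with one another. Both the cooperation measure used here (1 − |Pearson correlation|, so complementary features score high and redundant ones low) and the greedy growth step are placeholder assumptions for illustration, not the thesis's actual definitions.

```python
import numpy as np


def cooperation_matrix(X):
    """Placeholder pairwise 'degree of cooperation' between features.

    Stand-in measure: 1 - |Pearson correlation|, so weakly correlated
    (complementary) features score high and redundant ones score low.
    """
    corr = np.corrcoef(X, rowvar=False)   # d x d feature correlation matrix
    np.fill_diagonal(corr, 1.0)
    return 1.0 - np.abs(corr)


def select_cooperative_features(X, k):
    """Greedy filter: grow a subset with maximal average pairwise cooperation.

    A hypothetical simplification of the agglomerative flavour described in
    the abstract; the actual algorithm in the thesis may differ.
    """
    coop = cooperation_matrix(X)
    d = coop.shape[0]
    # Seed with the most cooperative pair (mask the diagonal first).
    i, j = np.unravel_index(np.argmax(coop - np.eye(d)), coop.shape)
    selected = [i, j]
    while len(selected) < k:
        remaining = [f for f in range(d) if f not in selected]
        # Add the feature that cooperates best, on average, with those chosen so far.
        gains = [coop[f, selected].mean() for f in remaining]
        selected.append(remaining[int(np.argmax(gains))])
    return sorted(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=200)  # near-duplicate of feature 0
    print(select_cooperative_features(X, k=4))
```

Because the near-duplicate pair (features 0 and 5) has almost zero cooperation under this proxy measure, the greedy subset will normally contain at most one of them, which is the intuition behind the claim that direct selection of complementary features also suppresses redundancy.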
