Research and Implementation of a Fuzzy-Entropy-Based Feature Selection Method
Abstract
Feature selection plays an important role in pattern classification. Whether the selected features are appropriate directly determines the accuracy of the classification result, so the feature selection method directly affects the performance and quality of the system. However, most current feature selection methods are prone to becoming trapped in local optima. In order to find a minimal feature subset quickly and accurately, and thereby classify better, this thesis explores and experiments with new feature selection methods.
     As a swarm intelligence algorithm, ant colony optimization (ACO) shares the strengths that swarm intelligence brings to global optimization. ACO is a relatively new general-purpose heuristic for solving combinatorial optimization problems; it features positive feedback, distributed computation, and self-organization, and it is a greedy heuristic search method. In practice, however, ACO tends to require long search times, so choosing a correct and effective pheromone and suitable constraints becomes very important. ACO can be divided into two basic stages: an adaptation stage and a cooperation stage. In the adaptation stage, each candidate solution continually adjusts its own structure according to the information accumulated so far. In the cooperation stage, the candidate solutions keep exchanging information with one another, producing solutions with better performance.
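The abstract does not give the concrete transition and update rules, but a standard ACO formulation, which the thesis presumably builds on, selects components and refreshes pheromone roughly as follows (here a "component" is a candidate feature):

p_i(t) = \frac{\tau_i(t)^{\alpha}\,\eta_i^{\beta}}{\sum_{j \in \mathcal{F}} \tau_j(t)^{\alpha}\,\eta_j^{\beta}},
\qquad
\tau_i(t+1) = (1-\rho)\,\tau_i(t) + \sum_{k=1}^{m} \Delta\tau_i^{(k)}

where tau_i is the pheromone on feature i, eta_i its heuristic desirability, rho the evaporation rate, Delta tau_i^(k) the deposit from ant k, and alpha, beta the weights balancing pheromone against heuristic information. These are the standard ACO symbols, not quantities defined in this abstract.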
     Entropy describes the degree of uncertainty of a probability distribution. Transplanting the concept of entropy into fuzzy set theory yields fuzzy entropy. Fuzzy entropy describes the degree of fuzziness of a fuzzy set: the fuzzier a fuzzy set is, the larger its fuzzy entropy, and vice versa. Entropy measures the uncertainty of a probability distribution: the greater the uncertainty of a set, the more uniform its distribution and the larger its information content. In this thesis, fuzzy entropy is used as the pheromone in the ant colony optimization algorithm, helping the colony find the important feature subset quickly and accurately. First, the entire data set is taken as the input of the ACO algorithm; a group of features is then selected at random as the initial feature subset; the ants then move according to the rules, changing the pheromone of each feature; and finally the target feature subset is obtained.
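The abstract does not say which fuzzy entropy measure is adopted; the most common choice, the De Luca-Termini measure for a fuzzy set A with membership degrees mu_A(x_i) over n elements, is:

H(A) = -\frac{1}{n} \sum_{i=1}^{n} \Big[ \mu_A(x_i)\,\ln \mu_A(x_i) + \big(1 - \mu_A(x_i)\big)\,\ln\big(1 - \mu_A(x_i)\big) \Big]

H(A) vanishes for a crisp set (every mu_A(x_i) is 0 or 1) and is maximal when every mu_A(x_i) = 1/2, which matches the statement above that a fuzzier set has a larger fuzzy entropy.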
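The workflow in the last paragraph can be sketched as a short program. This is a minimal illustration only: the membership construction (min-max normalisation), the fixed subset size, the update constants, and the score_fn wrapper are assumptions made for the sketch, not details taken from the thesis.

import numpy as np

def fuzzy_entropy(memberships, eps=1e-12):
    # De Luca-Termini fuzzy entropy of a vector of membership degrees in [0, 1].
    mu = np.clip(memberships, eps, 1.0 - eps)
    return float(-np.mean(mu * np.log(mu) + (1.0 - mu) * np.log(1.0 - mu)))

def aco_feature_selection(X, y, score_fn, n_ants=20, n_iter=30,
                          subset_size=5, rho=0.2, seed=0):
    # Minimal sketch of ACO feature selection with fuzzy entropy as the pheromone.
    # X: (n_samples, n_features) array, y: labels,
    # score_fn(X_subset, y) -> higher-is-better (assumed non-negative, e.g. accuracy).
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Membership degrees: min-max normalise each feature to [0, 1] (an assumption,
    # not the thesis's construction), then seed the pheromone with fuzzy entropy.
    mins, maxs = X.min(axis=0), X.max(axis=0)
    memberships = (X - mins) / np.where(maxs > mins, maxs - mins, 1.0)
    tau = np.array([fuzzy_entropy(memberships[:, j]) for j in range(n_features)])
    tau = np.maximum(tau, 1e-6)

    best_subset, best_score = None, -np.inf
    for _ in range(n_iter):
        subsets, scores = [], []
        for _ in range(n_ants):
            # Each ant picks a feature subset with probability proportional to pheromone.
            probs = tau / tau.sum()
            subset = rng.choice(n_features, size=subset_size, replace=False, p=probs)
            s = score_fn(X[:, subset], y)
            subsets.append(subset)
            scores.append(s)
            if s > best_score:
                best_subset, best_score = subset.copy(), s
        # Evaporate all pheromone, then reinforce the features used by the iteration's best ant.
        tau *= (1.0 - rho)
        top = subsets[int(np.argmax(scores))]
        tau[top] += max(max(scores), 0.0)
    return np.sort(best_subset), best_score

For example, score_fn could be the cross-validated accuracy of a simple nearest-neighbour classifier on the candidate columns; the indices returned are the target feature subset. The actual system described in the thesis will differ in how the ants move and in how fuzzy entropy enters the pheromone update.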
