改进的遗传机器学习系统及其应用

英文题名：Improved Genetic Algorithm Based on Machine Learning and Application
作者：刘明姬
论文级别：硕士
学科专业名称：运筹学与控制论
中文关键词：遗传算法 ; 机器学习 ; 模式识别 ; 多目标优化
学位年度：2004
导师：吕显瑞
学科代码：070105
学位授予单位：吉林大学
论文提交日期：2004-05-01

摘要

遗传算法是一种借鉴生物界自然选择和进化机制发展起来的高度并行的随机自适应搜索算法，是由美国的Holland教授首次提出的。近年来众多研究者不断的对其进行改进和发展，并将其广泛应用于那些难以用传统方法进行求解的复杂问题，如组合优化、模式识别、图像处理、数值优化等。
     遗传算法采用简单的编码来表示各种不同问题的复杂结构，对解群体的选择、交叉、变异等遗传操作不依赖于所解的问题，而是简单的按照优胜劣汰的自然选择规律确定搜索方向，是一种有向的随机搜索。从而特别适用于大规模并行处理，具有不受搜索空间条件(如可微、单峰、连续等)的约束及不需要其它辅助信息的特点。这些特点使得遗传算法不仅能获得较高的效率，而且具有简单性，易操作性，全局最优性，隐并行性，鲁棒性及通用性。但是它也存在着收敛速度慢，收敛过程中稳定性差，可控性差和早熟收敛等缺陷。
     基于遗传算法的机器学习是将遗传算法与机器学习系统相结合的产物，是当前遗传算法研究的一个重要方面。其中最引人注目的是对分类器系统的研究。竞争的信度分配和以遗传算法为核心的规则发现构成了基于分类器的遗传机器学习系统。1986年Holland等实现了第一个基于遗传算法和桶队列算法反馈机制的分类器系统。
     本文将遗传算法与机器学习基本思想相结合，在分类器学习系统的基础上，对遗传机器学习系统进行了一些重要的局部改进，提出改进的遗传机器学习系统。
     (1) 增强因子的引入。在信度分配中，对获胜分类器进行奖励，保证了

    最优个体的存在性，增强了算法的局部搜索能力，使种群向着最优解不断
    进化.
     (2)排挤因子的引入.在规则与消息系统和遗传算法过程中均引入了排
    挤因子.每次机器学习后用最优环境消息替换规则集中最差个体;每次遗
    传算法后，用交叉操作产生的较优子代替换原种群中与其最相似的最差个
    体.
     排挤因子的引入解决了选择压力与种群多样性的矛盾，不但保证了最
    优个体的存在性，还没有破坏种群的多样性.
     (3)合并因子的引入.每次遗传机器学习后对相似分类器进行合并，最
    终权值取所有相似分类器的平均值.这样防止超级个体的产生，避免了搜
    索带逐渐变窄而产生的过早收敛，并维持了原来的算法搜索空间.
     (4)改进系统中对于信度分配的具体计算:
     假定一个分类器c在t时刻的权值为S(c，约，投标系数为几记，
    有效投标中随机噪声为N(a。、)，投标税系数为几idta二，存活税系数为q价。二，
    进行投标
    未进行投标
     1上n︶
    了!l，、esesL
    投标控制参数b’二
    旧优胜者为?n，新胜者为m+1，
    对优胜者的奖励为侧，收入为州约，且州约二及《二，t)
     那么我们就能够得到
     分类器C的投标值为
    B乞d(C，t)=e。:以·S(C，亡)
    有效投标值为
    EB:己=B:d+N(a。、。)
    税值为
    Tax=Cl:了。乙a二·S+几:己亡a二·b‘·S

    候选分类器C参加投标一条消息后，它的权值为
    S(C，t+l)=S(C，亡)一B乞d(C，t)一T(C，t)+R(亡)
    有效投标最大者为当前优胜者，其权值为
    S(。+l，亡+1)=S(。+l，亡)一B:d(m+1，亡)一T(m+1，t)+R(艺)+R‘
     定理1.1当分类器的回报趋于稳定时，投标值接近于回报值.
     定理表明在分类器系统中，规则的权值是否处于稳定状态，对遗传算
    法的学习过程很大影响.
     经实践我们发现如此将遗传算法与机器学习相结合是非常有效的.机
    器学习对一些函数关系很明确的数据收敛速度很快，而对于一些函数关系
    不是很确定的例子来说其表现就不是很理想了，机器学习会产出摆动，不
    够精确，甚至陷入局部极小;而此时遗传算法就会表现出其优势，遗传算法
    根据要求建立一个规则重组机制，并且根据这个机制来对规则进行重组，
    产生新的，可能性能更好的规则，并淘汰不好的规则，跳出局部极小的圈
    子，扩大搜索范围，加速向最优解逼近.这样两种保证收敛的算法相结合，
    更加保证了整个算法的收敛性，加速算法收敛速度，是很有效的组合.
     对于本改进的遗传机器学习系统，将遗传算法与机器学习有效的结合
    起来，并辅以改进因子，令二者交替进行，在程序运行的前期，由于要求的
    相似度较低，分类器投标活跃，机器学习占主导地位;而在后期，机器学习
    到了一定程度，遗传算法就相应的占了主导.这样更加保证了算法的稳定
    性，收敛性，全局搜索性，克服了非成熟收敛等弊病.改进算法不要求所
    要解决问题目标函数的连续性，凸性，光滑性等，特别适用于维数高，总体
    大，环境复杂，问题结构不十分清楚的情况.
     最后我仃J将改进的遗传机器学习系统应用于模式识别和多目标优化问
    题，分别针对疾病的诊断模型和投资的收益与风险模型，给出了具体的算
    例.

     (一)改进的遗传机器学习系统在模式识别中的应用.
     改进的遗传机器学习系统具有强大的学习功能，是解决模式识别问题
    的有效工具.用它来解决医学诊断中的数据优化问题一一用最少的诊断数
    据得出较为正确的结论，使医学诊断能够更加科学、经济和便捷.
     这里以乳腺癌病例诊断为例，由病人的表征输入，产生最可能的疾病
    状态，实现自动医学诊断.
     我们依据已确诊病例信息的编?
Genetic Algotithm is a kind of highly parallel adaptive random search method which is advanced by professor. Hollad originally. It is based on the biological nature selection and evolutionism together. In recent years people are continuing to improve and develope it, applying it into such complex problems which couldn't be solved in the traditional methods, e.g. advantageous making-up combinatorial optimization, image processing and numerical value optimization. Now it has become a heated topic.
    Genetic Algorithm uses the simple codes to express different complex structures of questions, the selection, crossover and mutation operators to the population are not dependent on the question, while denning the searhing direction according to the natural selecting rule of survical of the fittest simply. This is a sort of directed random selection. Therefore this way is suitable to large par-ellel dealing. It is unbound of the condition of searching space differentiability, single-peak, continuity and there is no need of other assistant instrument. These features not only make the genetic algorithm become higher efficient and easy to operate, but aslo make it possess global optimality., implicit parallelism, robustness and general. This method also has some defects, e.g. the slow speed of convergence, the badness of stability and operation, premature convergence.
    Genetics Based Machine Learning is one important aspect of current ge-


    netic algorithm research. The most outstanding study is the research of classifier system. The genetic machine learning system based on classifier system is composed of credit assignment algorithm, the rule and message system and genetic algorithm. Holland presented the first classifier system which is centered on the genetic and the bucket brigade algorithm in 1986.
    This article combines the ideas between genetic algorithm and machine learning, gives some important improvements on the classifier system and puts forward the improved genetic algorithm based on machine learning.
    (1) The application of strengthening factor.
    In the credit assignment, the reward to the winning classifier ensures the existence of the best individual and strengthens local searching ability of this algorithm, making the population approach the most excellent solution continuously.
    (2) The application of crowding factor.
    We use the crowding factor both in the rule and message system and the process of the genetic algorithm. We replace the worst classifiers with the best conditional messages after each machine learning; in addition, we take the place of the worst individual in the original group with the most similar better one of the filial generation produced by the crossover operator after each genetic algorithm.
    The introduction of crowding factor solves the contradiction between the selection pressure and the population diversity. It not only ensures the existence of the best individual, but also keeps the population diversity.
    (3) The application of combine factor.
    We combine similiar classifiers after each algorithm perfoment and make the average value of all similiar classifiers as the final strength value. This way prevents the appearance of super individual and avoids premature convergence be-

    cause of the gradual narrowing of searching strap, maintaining the former searching space.
    (4) The calculation of the credit assignment in improved genetic algorithm based on machine learning.
    Let the strengh of classifier C in time t be S(C, t) , the bid coefficient be Cbid, the random noise in the valid bid be N(bid), the bid tax coefficient be Cbidtax, the life tax coefficient be Clifetax, the bid control parameter be
    1, when it bid
    the old winner be m, the new one be m + 1,
    0, when it dosen't bid , the reward to the winner be R', the repay be R(t), and R(t) = Bid(m,t).
    When C becomeing the candidate, the bid value is The valid bid is
    EBid = Bid + N( bid)
    The tax is
    When the candidate classifier C takes part in biding a message, its strengh is
    Strength(C, t + 1) = Strength(C, t) - Bid(C,

引文

[1] Ayten Turkcan and M. Selim Akturk, A problem space genetic algorithm in multiobjective optimization, Journal of Intelligent Manufacturing, 14(2003), 363-378.
    [2] Bagley, J. D., The behavior of adaptive systems which employ genetic and corrdlation algorithms (Doctoral dissertation, University of Michigan), Dissertation Abstracts International, 12(1967).
    [3] Booker, L. B., Goldberg, D. E. and Holland, J. H., Classfier systems and genetic algorithms, Artifical Intelligence, 40(1989).
    [4] Brad L. Miller, Michael J. Shaw, GAs with Dynamic Niche Sharing for Multimodal Function Optimization, IlliGAL Report, 95010(1995).
    [5] 陈小平，实数遗传算法交叉策略的改进，电子学报，1(2003)．
    [6] 陈小平，李云飞，遗传算法中适应度评估的改进，数据采集与处理，1(2003)．
    [7] De Jong, K. A., An analysis of the behavior of a class of genetic adaptive systems (Doctoral dissertation, University of Michigan), Dissertation Abstracts International, 36(1975).
    [8] De Jong, Genetic-algorithms-based Learning, Machine Learning: An Artificial Intelligence Approach, Morgan Kaufmann, 1990, 611-638.
    [9] Fitzpatrick, J. M., Grefenstette, J. J. and Van Gucht, D., Image registration by genetic search, Proceedings of IEEE Southeast Conference, 1984.
    [10] 傅京孙，模式识别应用，北京大学出版社，1990．
    [11] Grefenstette J. J., Multilevel Credit Assignment in A Genetic Learning System, Proceedings of the Second International Conference on Genetic Algorithms, Lawrence Erlbaum Associates,Publishers, 1987, 202-209.
    [12] Goldberg D. E., Genetic Algorithms in Search, Optimization, and Machine Learning, Reading, MA, Addison-Wisely, 1989.
    [13] Hollstien, R. B., Artifical genetic adaptation in computer control systems (Doctoral dissertation, University of Michigan), Dissertation Abstracts International, 32(1971).


    [14] Holland, J. H., Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, 1975.
    [15] Holland, J. H., Properties of the Bucket Brigade,1985,1-7.
    [16] Holland, J. H., Genetic algorithms, Scientific American, 4(1992).
    [17] 韩斌，王士同，个体自适应变异遗传算法，计算机工程与应用，11(2002)．
    [18] 赫孝良，戴永红，周义仓，数学建模竞赛赛题简析与论文点评，西安交通大学出版社，2002．
    [19] Jeffrey Horn and Nicholas Nafpliotis, Optimization using the Niched Pareto GA, IlliGAL Report, 93005(1993).
    [20] 季文埘，周傲英，张亮，金文，一种基于遗传算法的优化分类器的方法，软件学报，13(2002)，245-249．
    [21] 金聪，启发式遗传算法及其应用，数值计算与计算机应用，1(2003)．
    [22] Kuhn, H. W., Tucker, A. W., Nonlinear Programming, Proccedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951.
    [23] 林锉云，董加礼，多目标优化的方法与理论，吉林教育出版社，1992．
    [24] 李莹甄，王海涛，龙海英，王琼，基于遗传算法的地震短期综合预报分类系统研究，西北地震学报，4(2002)，295-302．
    [25] 李南，祝明光，用遗传算法优化双目标Job-shop作业计划问题，工业工程，5(2002)，55-64．
    [26] 李陶深，人工智能，重庆大学出版社，2002．
    [27] 李鹏，董聪，基于实数编码的广义遗传算法及其再优化问题中的应用，控制与决策，17(2002)，487-490．
    [28] 刘洪杰，王秀峰，王治宝，改进的多模态遗传算法及其在投资组合中的应用，控制与决策，18(2003)，173-176．
    [29] 刘宝碇，赵瑞清，王纲，不确定规划及应用，清华大学出版社，2003．
    [30] 全国大学生数学建模竞赛组委会，大学数学建模的理论与实践-2001中国大学生数学建模夏令营，湖南教育出版社，2004．
    [31] Robin Allenson, GA with Gender for Multi-Function Optimization, EPCCSS, 1992.


    [32]孙艳丰，郑加齐，王德兴，武华，基于遗传算法的约束优化方法评述，北方交通大学学报，24(2000)，14-19．
    [33]孙波，赵怡，投资的收益和风险的数学模型及算法研究，数学的实践与认识，31(2001)，385-390．
    [34]Whitley, D., The GENITOR Algorithm and Selection Pressure:Why Rank-Based Allocation Reproduction Trials is Best, 1989.
    [35]王遵亮，吴新根，罗立民，基于遗传算法的肝病诊断学习系统，东南大学学报，29(1999)，106-109．
    [36]王小平，曹立明，遗传算法-理论、应用与软件实现，西安交通大学出版社，2002．
    [37]王正志，薄涛，进化计算，国防科技大学出版社，2000．
    [38]徐宗本，张讲社，郑亚林，计算智能中的仿生学：理论与算法，科学出版社，2003．
    [39]叶正华，谢勇，郑金华，一种改进的基于实数编码的遗传算法，湘潭大学自然科学学报，24(2002)，33-36．
    [40]张文修，梁怡，遗传算法的数学基础，西安交通大学出版社，2000．
    [41]周明，孙树栋，遗传算法原理及应用，国防工业出版社，2002．
    [42]朱福喜，汤怡群，傅建明，人工智能原理，武汉大学出版社，2002．
    [43]庄健，王孙安，自调节遗传算法的研究，系统仿真学报，15(2003)，281-282．

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700