基于遗传算法的粗糙集属性约简研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
粗糙集理论是继概率论、模糊集理论、证据理论之后的又一个处理含糊性和不确定性的数学工具。属性约简算法是粗糙集理论的核心内容。粗糙集属性约简的研究在知识获取、机器学习、模式识别、决策分析、模型建立等实际应用中有重要的意义;但是,由于属性约简被证明是一个NP-hard问题,因此,研究更为有效的属性约简算法,有效地获取较优的属性约简,降低算法的时间复杂度,寻求快速的约简算法仍是粗糙集理论的主要研究课题之一。本文对基于遗传算法的粗糙集属性约简算法进行了研究。
     本文首先介绍了粗糙集理论的基本概念和遗传算法的相关知识。对粗糙集理论中基于区分矩阵、属性重要度、属性依赖度的属性约简算法以及启发式遗传约简算法进行了系统综述,并且对各种算法进行了比较分析。
     在对粗糙集理论和遗传算法的研究基础上,通过分析比较现有的遗传约简算法,吸收算法的优点,并且加以改进,提出了一种基于属性依赖度的遗传约简算法的改进算法。本算法的主要特点在于:一是在适应度函数中引入了决策属性对条件属性的依赖度,使算法在加强局部搜索能力的同时保持了该算法全局寻优的特性,也保证了所求约简既含较少的属性又保证分类质量,能够获得最佳的搜索效果。二是对传统遗传算法中随机产生的二进制初始种群加以改进,用属性核加以限制,以增强遗传算法的局部搜索能力,缩短算法的计算时间,并提高决策表属性约简结果的准确性。三是在遗传算法的遗传算子的基础上,新算法进行了局部优化策略,增加了修正校验算子,采用了基于属性依赖度的重要度修正策略,使得算法在局部空间能够得到一个较优解,保证遗传算法的全局搜索在有效的可行解空间进行。该算法通过数据实验分析,证明是求解知识约简问题的快速有效方法。
Rough set theory has been proved to be an excellent mathematical tool dealing with uncertain and vague description of objects after vague theory and evidence theory. Finding the minimal reduction is one of the most important works in the research of rough set theory, as an important part of soft computing, attribute reduction plays applications have played an important role, especially in the areas of knowledge acquisition, machine leaning, pattern recognition, decision analysis and modeling etc. However, it has been proved that finding the minimal reductions is a NP-hard problem. The thesis studies on the rough set attribute reduction algorithm based on genetic algorithm.
     Firstly, the thesis reviews the theories and methods of rough set and genetic algorithm systematically, and analyzes the algorithms of attribute reduction based on discernibility matrix, attribute significance, dependability and GA.
     Then, based on the characteristic of rough set theory, genetic algorithm as well as decision table attribute reduction, through analysis of the existing primary algorithm of attribute reduction based on the traditional genetic algorithm, an improved algorithm of rough set attribute reduction based on dependability and genetic algorithm is presented in this thesis. There are three characters of the algorithm. The first one is that define the fitness function in the algorithm by the dependability of attribute, in order to decrease the time and space complexity respectively. The second one is that uses attribute core of decision table as a restriction to improve the binary code initial population which is produced stochastically in the primary genetic algorithm. This improvement can shorten the calculation time of the algorithm and raise the accuracy of the results of the attribute reduction. The third one is that adds a correction operator. In this operator, the algorithm uses attribute significance as heuristic information which can guide the algorithm carrying out in the space of feasible results, and the attributes which can make more important influence are prevented to be lost. Last, an experiment is shown to prove that the improved algorithm is more excellence than some other attribute reduction algorithm based on genetic algorithm, and it is more accurate and efficient to solve the problem of attribute reduction in decision table.
引文
[1] Wong S. K. M, Ziarko W. On optional decision rules in decision tables[J]. Bulletin of Polish Academy of Sciences, 1985, 33(11/12): 693-696.
    [2] Pawlak Z. Rough set[J]. International Journal of Computer and Information Sciences, 1982, 11: 341-356.
    [3] Pawlak Z. Rough set-theoretical aspects of reasoning about data[M]. Dordrecht: Kluwer Academic Publishers, 1991.
    [4] Slowinski R. Intelligent decision support-handbook of applications and advances of the rough sets theory[C]. Dordrecht: Kluwer Academic Publishers, 1992.
    [5] Walczak W, Massart D. L. Tutorial, Rough sets theory[J]. Chemometrics and Intelligent Laboratory Systems, 1999, 47: 1-16.
    [6] Komorowski J, Pawlak. Z, Polkowski L, Skowron A. Rough sets: A tutorial[Z]. Singapore: Springer-Verlag. 1999.
    [7] 刘清.Rough集及Rough推理[M].北京:科学出版社,2001.
    [8] 王国胤.Rough集理论与知识获取[M].西安:西安交通大学出版社,2001.
    [9] 张文修,吴伟志,梁吉业等.粗糙集理论与方法[M].北京:科学出版社,2001.
    [10] Skowron A, Stepaiuk J. Decision rules based on discernibility matrices and decision matrices[Z]. Lin T Y(Ed.). Conference Proceeding of the Third International Workshop on Rough Sets and Soft Computing(RSSC'94). San Jose. California USA, 1994: 602-609.
    [11] Hu X. Knowledge discovery in databases: An attribute-oriented rough set approach[D]. University of Regina, Canada, 1995.
    [12] 周勇,杨兴江,徐扬.属性约简的依赖度算法研究[J].计算机工程与应用,2004,4:78-79.
    [13] 苗夺谦,胡桂荣.知识约简的一种启发式算法[J].计算机研究与发展,1999,36(6):681-684.
    [14] 刘少辉,盛秋戬,吴斌等.Rough集高效算法的研究[J].计算机学报,2003,2(5):524-529.
    [15] 闫德勤,王杨.基于关联矩阵的属性约简算法[J].计算机工程与应用,2005,20:181-182.
    [16] Wroblewski J. Finding minimal reducts using genetic algorithms[R]. Warsaw University of Technology: ICS Research Report, 16/95, 1995.
    [17] 何明,冯博琴,马兆丰等.一种改进的Rough集属性约简启发式遗传算法[J]. 西安石油大学学报(自然科学版),2004,19(3):80-86.
    [18] Dai J. H, Li Y. X. Heuristic genetic algorithm for minimal reduct in decision system based rough set theory[C]. Proceedings of the First International Conference on Machine Learning and Cybernetics[A]. Beijing, 2002: 833-836.
    [19] Z.米凯利维茨.演化程序——遗传算法和数据编码的组合[M].北京:科学出版社,2002.
    [20] 雷英杰,张善文,李续武等.MATLAB遗传算法工具箱及应用[M].西安:西安电子科技大学出版社,2005.
    [21] 张文修,梁怡.遗传算法的数学基础[M].西安:西安交通大学出版社,2000.
    [22] 王小平,曹立明.遗传算法——理论、应用与软件实现[M].西安:西安交通大学出版社,2003.
    [23] Vasconcelos J. A, Ramirez J. A, Takahashi R. H. C. Improvements in genetic algorithms[J]. IEEE Transactions on Magnetics, 2001, 37(5): 3414-3417.
    [24] Skowron A, Rauszer C. The discernibility matrices and functions in information systems[A]. In: Slowinski R. Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory[C]. Dordrecht: Kluwer Academic Publishers, 1992: 331-362.
    [25] 常犁云,王国胤,吴渝.一种基于Rough Set理论的属性约简及规则提取方法[J].软件学报,1999,10(10):415-417.
    [26] 胡可云.基于概念格和粗糙集的数据挖掘方法研究[D].北京:清华大学,2001.
    [27] 何国建,陶宏才.一种基于粗集理论的属性约简改进算法[J].计算机应用,2004,24(11):75-76,80.
    [28] 苗夺谦,王珏.粗糙集理论中概念与运算的信息表示[J].软件学报,1999,10(2):113-116.
    [29] 张小峰,李明,赵永升.粗糙集中的知识熵[J].计算机工程与设计,2005(26):2930-2934.
    [30] 王国胤,于洪,杨大春.基于条件信息熵的决策表约简[J].计算机学报,2002(7):759-766.
    [31] 贾要勤,徐光华.粗糙集属性约简的基因算法[J].中国机械工程,2000,11(7):796-800.
    [32] Bazan J G, Skowron A, Synak P. Dynamic reducts as a tool for extracting laws from decisions tables[A]. In: Ras Z. W. Methodologies for Intelligent Systems[C]. Berlin: Springer Verlag, 1994: 346-355.
    [33] 李雄飞,谢忠时等.基于粗集理论的约简算法[J].吉林大学学报,2003,33(1):82-87.
    [34] 赵卫东,戴伟辉.基于特征矩阵的决策表约简研究[J].系统工程理论与实践, 2003,3(3):65-79.
    [35] 陶志,许宝栋,汪定伟等.基于遗传算法的粗糙集知识约简方法[J].系统工程,2003,21(4):116-122.
    [36] 熊赟晖,肖人岳.用遗传算法求解粗糙集约简的改进方法[J].计算机科学,2002,29(9):123-125.
    [37] 任永功,王杨,闫德勤.基于遗传算法的粗糙集属性约简算法[J].小型微型计算机系统,2006,27(5):862-865.
    [38] Nguyen S. H. Some efficient algorithms for rough set methods[A]. In: Proc. of the Conf. of Information Processing and Management of Uncertainty in Knowledge Based Systems(IPMU'96)[C], Granada, Spain, 1996: 1451-1456.
    [39] Ziarko W. Variable precision rough set[J]. Computer and System Sciences, 1993, 46(1): 39-59.
    [40] http://www.ics.uci.edu/~mlearn/MLRepository.html.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700