Research on Attribute Reduction Algorithms Based on Rough Sets
Abstract
Rough set theory, developed in the 1970s, is a mathematical tool for handling vagueness and uncertainty. It is grounded in the notion of indiscernibility and in methods of knowledge reduction, and it is an important approach to intelligent information processing. Finding efficient, fast attribute reduction algorithms is currently one of the main research problems in the theory, and such algorithms are of considerable practical value in information system analysis and data mining.
     Building on a study of rough set theory, this thesis observes that the decision rules of an information system may be consistent or inconsistent, so a decision table is itself either consistent or inconsistent, and most attribute reduction algorithms must determine which case holds before reducing. Representative algorithms are studied for each case. For consistent decision tables, these are an algorithm based on attribute dependency and significance and an algorithm based on a generalized information table; for inconsistent decision tables, an algorithm based on inclusion degree is presented. Analysis shows that each of these algorithms can reduce only one kind of table, which wastes time and limits generality. To address this, the thesis proposes an attribute reduction algorithm for decision tables based on decision entropy. In the experiments, its results agree with those of the three algorithms above. The decision-entropy algorithm does not require checking in advance whether a decision table is consistent, and it resolves the problem that the two definitions of attribute reduction in classical rough set theory can give different results on inconsistent decision tables. It improves efficiency, is broadly applicable, and meets the intended goals.
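Most reduction algorithms must first decide whether a decision table is consistent. A minimal Python sketch of that check follows. It assumes the table is stored as a list of tuples, with the condition attributes and a single decision attribute addressed by column index; the function names and toy data are hypothetical and are not taken from the thesis. Objects are grouped into indiscernibility classes. The positive region POS_C(D) collects the classes that map to a single decision value. The table is consistent exactly when the positive region covers every object, that is, when the dependency degree γ_C(D) = 1.

```python
from collections import defaultdict


def partition(table, attrs):
    """Group object indices into indiscernibility classes U/IND(attrs)."""
    classes = defaultdict(list)
    for i, row in enumerate(table):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def positive_region(table, cond, dec):
    """POS_cond(D): objects whose class under `cond` has one decision value."""
    pos = set()
    for block in partition(table, cond):
        if len({table[i][dec] for i in block}) == 1:
            pos.update(block)
    return pos


def dependency(table, cond, dec):
    """Dependency degree gamma_cond(D) = |POS_cond(D)| / |U|."""
    return len(positive_region(table, cond, dec)) / len(table)


def is_consistent(table, cond, dec):
    """A decision table is consistent iff POS_C(D) = U, i.e. gamma_C(D) = 1."""
    return len(positive_region(table, cond, dec)) == len(table)


# Hypothetical toy data: columns 0-2 are condition attributes, column 3 the decision.
consistent = [
    (0, 0, 1, "no"),
    (0, 1, 1, "yes"),
    (1, 0, 0, "no"),
    (1, 1, 0, "yes"),
]
# Add an object with the same conditions as object 0 but a different decision.
inconsistent = consistent + [(0, 0, 1, "yes")]

print(is_consistent(consistent, [0, 1, 2], 3))    # True
print(is_consistent(inconsistent, [0, 1, 2], 3))  # False
print(dependency(inconsistent, [0, 1, 2], 3))     # 0.6
```

In the inconsistent table, objects 0 and 4 share a condition class but disagree on the decision, so only 3 of the 5 objects lie in the positive region.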
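Reduction based on attribute dependency and significance is usually described as a greedy search. The abstract does not give the thesis's exact steps, so the sketch below follows one common textbook variant as an assumption. It starts from the core, which contains the attributes whose removal lowers γ_C(D). It then repeatedly adds the attribute with the largest significance sig(a) = γ_{R∪{a}}(D) − γ_R(D) and stops once γ_R(D) = γ_C(D). Finally, it drops any attribute that has become redundant. To keep the comparisons exact, it compares positive-region sizes rather than dividing by |U|. All names and data are hypothetical.

```python
from collections import defaultdict


def partition(table, attrs):
    """Group object indices into indiscernibility classes U/IND(attrs)."""
    classes = defaultdict(list)
    for i, row in enumerate(table):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def pos_size(table, attrs, dec):
    """|POS_attrs(D)|; the dependency degree is pos_size / len(table)."""
    return sum(len(block) for block in partition(table, attrs)
               if len({table[i][dec] for i in block}) == 1)


def reduce_by_dependency(table, cond, dec):
    """Greedy reduct that preserves the dependency degree gamma_C(D)."""
    full = pos_size(table, cond, dec)
    # Core: attributes whose removal from C lowers the dependency degree.
    reduct = [a for a in cond
              if pos_size(table, [b for b in cond if b != a], dec) < full]
    # Forward step: gamma(R) is fixed, so maximising gamma(R + {a}) is the
    # same as maximising sig(a) = gamma(R + {a}) - gamma(R).
    while pos_size(table, reduct, dec) < full:
        rest = [a for a in cond if a not in reduct]
        best = max(rest, key=lambda a: pos_size(table, reduct + [a], dec))
        reduct.append(best)
    # Backward step: drop attributes that are no longer needed.
    for a in list(reduct):
        trial = [b for b in reduct if b != a]
        if pos_size(table, trial, dec) == full:
            reduct = trial
    return sorted(reduct)


# Hypothetical toy data: the decision (column 3) is "yes" iff a0 = 1 and a1 = 0,
# so condition attribute 2 is redundant.
table = [
    (1, 0, 1, "yes"),
    (1, 1, 1, "no"),
    (0, 0, 0, "no"),
    (0, 1, 0, "no"),
    (1, 0, 0, "yes"),
]
print(reduce_by_dependency(table, [0, 1, 2], 3))  # [0, 1]
```

The greedy search returns a reduct, meaning no remaining attribute can be removed. It is not guaranteed to be a reduct of minimum size.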
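The abstract does not define decision entropy. As an assumption, the sketch below measures it with the Shannon conditional entropy H(D|B) = −Σ_{X∈U/B} p(X) Σ_{Y∈U/D} p(Y|X) log₂ p(Y|X). It then searches greedily for a subset B ⊆ C with H(D|B) = H(D|C). H(D|C) = 0 exactly when the table is consistent, so the same loop runs on consistent and inconsistent tables without a prior consistency check. That is the property the abstract claims for the proposed algorithm. This illustrates entropy-guided reduction in general and is not the thesis's algorithm; the names, the tolerance `tol` and the data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def partition(table, attrs):
    """Group object indices into indiscernibility classes U/IND(attrs)."""
    classes = defaultdict(list)
    for i, row in enumerate(table):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def cond_entropy(table, attrs, dec):
    """Shannon conditional entropy H(D | attrs) in bits."""
    n = len(table)
    h = 0.0
    for block in partition(table, attrs):
        for count in Counter(table[i][dec] for i in block).values():
            p = count / len(block)                    # p(Y | X)
            h -= (len(block) / n) * p * math.log2(p)  # p(X) * p(Y|X) * log2 p(Y|X)
    return h


def reduce_by_entropy(table, cond, dec, tol=1e-12):
    """Greedy subset B of cond with H(D|B) = H(D|C); no consistency check needed."""
    target = cond_entropy(table, cond, dec)  # 0 iff the table is consistent

    def preserves(attrs):
        # H(D|B) >= H(D|C) whenever B is a subset of C, so "<=" means equality.
        return cond_entropy(table, attrs, dec) <= target + tol

    reduct = []
    while not preserves(reduct):
        rest = [a for a in cond if a not in reduct]
        best = min(rest, key=lambda a: cond_entropy(table, reduct + [a], dec))
        reduct.append(best)
    for a in list(reduct):  # backward step: remove redundant attributes
        trial = [b for b in reduct if b != a]
        if preserves(trial):
            reduct = trial
    return sorted(reduct)


# Hypothetical inconsistent toy table: objects 0 and 1 agree on every
# condition attribute (columns 0-2) but differ on the decision (column 3).
table = [
    (1, 0, 1, "yes"),
    (1, 0, 1, "no"),
    (1, 1, 1, "no"),
    (0, 0, 0, "no"),
    (0, 1, 0, "no"),
]
print(cond_entropy(table, [0, 1, 2], 3))       # 0.4
print(reduce_by_entropy(table, [0, 1, 2], 3))  # [0, 1]
```

Here H(D|C) is 0.4 bits rather than 0 because the table is inconsistent. The search still runs unchanged and keeps attributes 0 and 1. On a consistent table the target is 0 and the same code applies.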