用户名: 密码: 验证码:
基于粗糙集的相对属性约简算法及决策方法研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
粗糙集理论和方法是一种能有效的分析和处理不一致、不精确、不完备等各种信息的数据分析工具。该理论和方法已经在模式识别、机器学习、决策支持、知识发现、预测建模等领域得到成功的应用。相对属性约简算法和决策方法是粗糙集理论和应用的关键技术之一,也是知识发现和决策的重要研究课题,并已成为一个备受关注的研究热点。
     围绕粗糙集相对属性约简和决策方法中的相对属性约简、决策规则获取、基于粗糙集的决策方法以及其原型系统等四个重要问题,从六个方面开展研究工作。它们分别是基于贪心策略的相对属性约简、基于核的自顶向下剪枝的相对属性约简、基于贪心策略的分类规则获取、基于粗糙集的支持向量决策方法、基于粗糙集的最优支持向量决策方法和基于粗糙集的支持向量集成决策模型。
     针对寻求单个相对属性约简的问题,基于贪心策略的相对属性约简算法以条件属性的分类能力作为启发信息,是解决相对属性约简的一种有效算法,该算法的实现比较直观。当决策信息系统中包含有大量对象时,该算法有效节约了存储空间,适合大规模数据集上的计算,而且在该相对属性约简方法中,仅仅考虑了属性分类能力大小,倾向于选择分类能力强的条件属性加入到相对属性约简中,根据分类目标,这种倾向是合理的。
     当一个决策信息系统包含相当多的属性和大量的纪录时,如何从决策信息系统中获取包含最少条件属性的相对属性约简和获取所有相对属性约简的集合是一个值得研究的课题。基于核的自顶向下剪枝的相对属性约简算法是解决该问题的一种可行算法,实验结果表明基于核的自顶向下剪枝的相对属性约简算法的可行性和有效性。
     在决策规则的获取方面,根据决策规则的不同度量,从不同的角度获取决策规则,可获得基于贪心策略的分类一致性规则获取算法和不一致性分类规则获取算法。这些算法根据属性的决策能力的大小作为启发式知识来指导这一属性值约简过程的进行,不但获取的规则通常较短,而且有较强的分类预测性能,既提高了运行速度,又节约了存储空间。
     随着决策信息系统的数据量的增加,粗糙理论分类的容错能力与泛化能力较弱等缺点也突现出来,因此,如何提高决策信息系统的容错能力与泛化能力是一个值得研究的课题。从不同的角度,可获得基于粗糙集和支持向量优点的三种决策方法。通过对比实验,结果表明相关算法有较高的容错能力与泛化能力。
     基于前面的研究结果以及有关技术,设计并实现了一个基于粗糙集的相对属性约简算法和决策方法的原型系统。与同类系统相比,该系统在设计实现上具有一定的独到之处,具有较高质量的知识发现和决策结果,并具有较好的容错能力与泛化能力,还具有较强的鲁棒性。
Rough set is an excellent data analysis tool to process inconsistent, inaccurate, incomplete information and so on. Its theories and methods have been applied to many areas successfully including pattern recognition, machine learning, decision support, knowledge discovery, and forecast modeling and so on. The relative-attribute reduction algorithm and decision-making method are key technologies of rough set theory and application and becoming a research focus of concern.
     Surrounding the four key problems of relative-attribute reduction and decision-making method, i.e. relative-attribute reduction, rules of acquisition, decision method based on rough set and its prototype system, the following six works have been done: greedy relative-attribute reduction algorithm, top-down pruning relative-attribute reduction algorithm based on core, greedy decision rules of acquisition, support vector decision-making methods based on rough set, optimal support vector decision-making methods based on rough set, and decision-making model of support vector ensembles based on rough set.
     For the problem of questing single relative-attribute reduct, the greedy relative-attribute reduction algorithm, which takes classification ability of condition attributes for heuristic information, can be an effective algorithm to achieve more intuitive to solve the above. When the decision-making information system contains a large number of objects, the algorithm can save a lot of storage space and is suitable for large-scale data sets on the calculation. It tends to choose the condition attributes of higher classification ability added to the relative attribute reduct. According reduction goal, this tendency is reasonable.
     When a decision-making information system consists of a considerable number of attributes and a large number of records, how to get the relative-attribute reduct including least condition attributes or the whole set of all relative-attribute reducts from the decision-making information systems, is a worthy subject of research. Top-down pruning relative-attribute reduction algorithm based on core is a viable solution to the problem. Experimental results show that top-down pruning relative-attribute reduction algorithm based on core is feasible and effective.
     According to different measure of decision-making rules from different aspects, the greedy methods obtaining decision-making rules of consistency or inconsistency are proposed. The methods, which take classification ability of condition attributes for heuristic knowledge to guide the attribute value reduction process, not only get the shorter rules of the strong performance in the classification of forecasts, but also improve the speed and save the storage space.
     By following the increasing of the amount of data, the weakness of fault tolerance and generalization capability is to the fore in rough set theory. Therefore, it is a worthy subject of study how to improve fault-tolerant ability and generalization ability of the decision-making. Three kinds of support vector decision-making methods based on rough sets are proposed from different aspects. By comparing the other methods, experimental results show that our proposed algorithms have higher fault-tolerant ability and generalization ability.
     Based on the above research findings, as well as the related technology, designed and implemented a prototype system based on relative-attribute reduction algorithms and decision-making methods. Compared with similar systems, this prototype system is designed to achieve that has certain uniqueness, with higher quality results of knowledge discovery and decision-making, and has good fault-tolerant ability and generalization ability, but also has strong robustness.
引文
[1]Shafer G A Mathematical Theory of Evidence. Princeton: Princeton University Press,1976,133~185
    [2]梅长林,周家良.实用统计分析方法.北京:科学出版社,2002
    [3]Radzikowska A N, Kerre E E. A comparative study of fuzzy rough sets. Fuzzy Sets and Systems,2002,126:137~155
    [4]Green J, Home N, Orlowska E, Siemens P. A Rough Set of Information Retrieval. Fundamental Informatcae,1996,28:273~296
    [5]刘清.Rough集及Rough推理.北京:科学出版社,2001
    [6]张文修等.粗糙集理论与方法.北京:科学出版社,2001
    [7]Wang J, Wang J. Reduction algorithm based on discernibility matrix the ordered attributes method. Journal of Computer Science and Technology,2001,16(6): 489~504
    [8]Zhang T f,Xiao J,Wang X h. Algorithms of Attribute Relative Reduction in Rough Set Theory. Acta Electronica Sinica,2005,11:2080~2083
    [9]Pawlak Z. Rough Set. International Journal of Computer and Information Sciences, 1982,11:341-356
    [10]Ziarko W. Introduction to the special issue on rough sets and knowledge discovery. International Journal of Computational Intelligence,1995,11(2):223~226
    [11]Komorowski J, Pawlak Z, Polkowski L et al. Rough sets:A tutorial .in:S.K.pal and A.Skwron(eds.), Rough fuzzy hybridization:A new trend in decision-making, Springer-Verlag, Singapore,1999:3~98
    [12]Pawlak Z. Rough Sets—Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht,1991
    [13]Yao H X.Introduction to and Survey for the Studies of Fuzzy Rough Sets Theory, Journal of Chongqing Institute of Technology,2006,08:132~135
    [14]Wang Y,Mo Z W,Tang X,Tang L.Rough Set Models of Multivalued Information System and Reduction Based on Set Pair Analysis.Journal of Sichuan Normal University,2007,05:318~320
    [15]王志海.粗糙集合扩展模型研究.合肥:合肥工业大学,1998,4
    [16]Skowron A and Stepaniuk J. Tolerance approximation spaces. Fundamental Informaticae,1996,27(2-3):245~253
    [17]Jarvinen J. Approximations and rough sets based on tolerance. In:Proceedings of 2nd international Conference, RSCTC2000, LNAI2005, Banff, Canada, 2001.182~189
    [18]Yao Y Y and Lin T Y. Generalization of rough sets using modal logic, Intelligent Automation and Soft Computing. An International Journal,1996,2(2):103~120
    [19]Yang Y, LI L. Matrix redefinition of rough sets on double universes. Computer Engineering and Applications.2007,43(25):1~2
    [20]Guan X,Yi X,Sun Y F,He Y. Variable precision rough set model with applications to emitter recognition. Journal of Tsinghua University.2007,47(1):28~31
    [21]Howie J M. Fundamentals of Semi groups Theory. Oxford University Press Inc.1995
    [22]Zhang L H, Zhang G H, Zhang J, Bai Y C. An Intrusion Detection Mechanism Using Rough Set Classification (RSC). Journal of Shanghai Jiaotong university. 2004,10:194~199
    [23]Zheng S F, Guany Y Y,Shi K Q. Discernable matrix and its application in inconsistent decision. Journal of Shandong University.2005,04:86~89
    [24]Liu H J, Feng B, Li W J, Lu H T. Discrimination Method of Rough Set Attribute Reduction and Its Applications. Journal of Xi'an Jiaotong university.2007,08: 939~943
    [25]Zhu H. Research of Representation Formula or the Dependence Degree Among Attributes. Computer Engineering.2005,01:174~176
    [26]Hu X, Cercone N. Learning in Relational Database: A Rough Set Approach.
    International Journal of Computational Intelligence.1995,11(2):323~338
    [27]Miao D Q, Wang J. An information-based Algorithm for Reduction of Knowledge. In:IEEE ICIP'1997.1997(1),1155~1158
    [28]Yan Y,Yang H Z. Knowledge reduction algorithm based on mutual information. Journal of Tsinghua University.2007(47):1903~1906
    [29]苗夺谦,王珏.粗糙集理论中知识粗糙性与信息熵关系的讨论,模式识别与人工智能.1998,11(3):34~40
    [30]Wroblew ski J. Finding minimal reducts using genetic algorithm. ICS Research Report 16/95, Institute of computer science, Warsaw University of Technology, 1995
    [31]Skowron A, Rauszer C. The discernibility matrices and function in information system. In: Slowinski R. ed. Intelligent Decision Support-Handbook of Applications and Advances of the Rough Sets Theory. Dordrecht:Kluwer Academic Publishers,1992.331~362
    [32]Nguyen S H, Skowron A. Boolean reasoning for feature extraction Problems. In: Tenth international symposium on Methodologies for intelligent Systems, Foundations of intelligent Systems (ISMIS'97).USA:Springer-verlag, Berlin, 1997.117~126
    [33]Chen W Y, Shu z, Wang Z J et al, Study of key techniques concerning Virtual assembly system for equipment maintenance training, In:Third International Conference on virtual Reality and Its Application in Industry.Hangzhou, China: 2002.254~25
    [34]Slezak D. Approximate reducts in decision tables. In:Proc. Of the sixth International Conference, Information Processing and Management of Uncertainty in Knowledge-based Systems (IPMU'96), Granada, Spain:1996.1159~1164
    [35]Bazan J,Skowron A,Synak P. Dynamic reducts as a tool for extracting laws from decision tables in:methodologies for intelligent system. In:Proc.8th International Symposium ISMIS'94, Charlotte, NG, October 1994, LNAI vol.869,Springer
    Verlag 1994,346~355
    [36]Jan G, Bazan J. Dynamic reducts and statistical inference. In: Proceedings of the Sixth International Conference, Information Processing and Management of Uncertainty in Knowledge-Based Systems. Granada, Spain,1996,(2):1147~1152
    [37]Shao J. Finding Reducts with User Specified Criteria. In:Wagner R(eds.), Proc. Of 8th international workshop on Database and Expert systems Applications (DEXA'97), Toulouse, France, IEEE Computer Society Press,1997,352~357
    [38]Nguyen S H., Skowron A. Boolean reasoning for feature extraction problems. In: Z.W.Ras, A. Skowron (Eds.), tenth international Symposium on Methodologies for Intelligent Systems, Foundations of Intelligent Systems (ISMIS'97), Charlotte, NC, USA, Lecture Notes in Artificial Intelligence 1325, Springer-verlag, Berlin, 1997,117~126
    [39]Skowron A and Stepanjuk J. Information Reduction Based on Constructive Neighborhood Systems. In:Proceedings of the Fifth international Workshop on Rough Sets Soft Computing (RSSC'97) at Third Annual Joint Conference on information Sciences(JCIS'97). Duke University, Durham, NC, USA,1997,158~ 160
    [40]Qu B B, Lu Y S. Matrix Computation for Rule Extraction in Incomplete Information Systems. Journal of Computer Science.2007(34):193~195
    [41]Wang X H, Zhang T F, Xiao J M. Algorithm for decision rules reduction based on binary discernibility matrix. Computer Engineering and Applications,2007,43 (27):178~180
    [42]Pan. Y,Jian L R,Da Q L. Variable Precision Rough Sets Methodology for Probabilistic Rules Discovery from a Multi-Criteria Decision Table. Chinese Journal of Management Science,2005,2:95~100
    [43]Zhang Y, Li C. Approach to rule extraction and generation using rough Multilayer Perception networks. Journal of Huazhong University of Science & Technology.2007,02:64~66
    [44]Guo S,Wang Z Y,Wu Z C,Yang H P.Incremental rules extraction based on rough set theory. Computer Applications.2005,11:2621~2623
    [45]Wang Y,Yan D Q,Zhang F M.Rough set and decision tree based incremental rule reduction algorithm. Computer Engineering and Applications,2007,43(1):170~172
    [46]Wang J Y,Liao C. Research on the Attribute Reduction and Rule Acquisition of Time-Series Data Based on Rough Entropy. Journal of Hunan University.2005,08:112~116
    [47]Bazan J. Dynamic reducts and statistical inference. In: Proceedings oft the Sixth International Conference, Information Processing and Management of Uncertainty in Knowledge~Based Systems (IPMIU'96), Granada Spain,1996,1147~1152
    [48]Skowron A,Polkowski L. Synthesis of decision systems from data tables. In: T. Y Lin, N. Cereone (eds.), Rough sets and data mining:Analysis of imprecise data, Kluwer Academic Publisher Dordrecht,1997,259~299
    [49]Han J, Fu.Y. Discovery of multiple~level association rules from large database. In: Proc. of the 21th international conference on very large database, Zurich, Switzerland,1995,402~431
    [50]Yao Y Y. A comparative study of fuzzy sets and rough sets, Information Sciences, 1998,109(1-4):227~242
    [51]Skowron A,Grzymala B J W. From rough set theory to evidence theory. In:Yager R R, Fedrizzi M and Kacprzyk J et al(eds.), Advances in the Dempster~Shafter Theory of Evidence, New York 1994,193~236
    [52]Zhang N, Xu J Z, Yu X D, et al. Fault diagnosis of the power transformer based on RS theory and neural network. High Voltage Engineering,2003,29(11):9~10
    [53]Zhang D B,Wang Y N,Huang H X. Rough Neural Network Based on Variable Precision Rough Set. Journal of Electronics & Information Technology, 2008,08:1013~1017
    [54]WANG Y Q, Li F C, LI H M. Synthetic fault diagnosis method of power transformer based on rough set theory and Bayesian network.in:Proceedings of the
    CSEE,2006,26(8):137~141
    [55]Yao Y Y, Lingras P J. Interpretations of belief functions in the theory of rough sets, information Sciences,1998,104(1-2):81~106
    [56]Lingras P. Unsupervised rough sets classification using GAs. Journal of Intelligent Information Systems,2001,16(3):215~228
    [57]Lin T Y, Liu Q. Rough approximate operators:Axiomatic rough set theory. In: Ziarko W P ed. Rough Sets, Fuzzy Sets and Knowledge Discovery. Springer-Verlag,1994,256~260
    [58]代建华,潘云鹤.相似关系粗糙集理论的一个极小公理组.复旦学报.2004(43):856~859
    [59]Dai J H, Chen W D, Pan Y H. A Minimal Axiom Group of Rough Set Based on Quasi-ordering. Journal of Zhejiang University Science.2004,05:810~815
    [60]Szczerba L W. Rough quantifiers have no Tarski property. Bulletin of the Polish Academic of Sciences:Mathematics,1987,35(9-10):663-665
    [61]Guan T, Feng B Q. Knowledge Reduction Methods in Fuzzy Objective Information Systems. Journal of Software.2004,15(10):1744~1747
    [62]Nguyen S H, Nguyen H S. An Application of discretization methods in control. In: Proceedings of the Workshop on Robotics, Intelligent Control and Decision Support Systems.Polish-Japanese institute of Information Technology, Warsaw, 1999.47~52
    [63]Xia K W, Shen J Y, Li C B. Study on attribute reduction method in sample information processing. Journal of Xi'an Jiaotong University.2005,39(6):558~ 561
    [64]Nguyen S H, Skowron A. Quantization of real value attributes: Rough set and Boolean reasoning approach. Bulletin of intention Rough Set Society,1997, 1(1):5~16
    [65]Ohrn A, Komorowski J, Skowron A, Synak P. The ROSETTA software system. In: L. Polkowski and A. Skowron(eds.), Rough Sets in Knowledge Discovery 2:
    Applications, Case Studies and Software Systems, Physica-Verlag, Heidelberg,1998.572~576
    [66]Slowinski R.Intelligent Decision Support, handbook of Applications and Advances of Rough Sets Theory. Kluwer Academic Publishers, Dordrecht.1992
    [67]Yao Y Y.Relational Interpretations of Neighborhood Operators and Rough Set Approximation Operators. Information Seiences,1998,111:239~259
    [68]Grzymala Busse J W. a New Version of the Rule Induction System LERS. Fundamenta Informaticae.1997,31:27~39
    [69]Slezak D. Various approaches reasoning with frequency-based decision reduets a survey. In:Polowski L, Lin T Y and Tsumoto S(eds.) Rough Sets in Soft Computing and Knowledge Discovery:New Developments, Physica-Verlag, Heidelberg.2000
    [70]Amaldi,Kann V. On the Approximation of Minimizing Nonzero variables or Unsatisfied Relations in Linear Systems. Theoretical Computer Science,1998,209: 237~260
    [71]Jelonek J,Krawiec K,Slowinski R. Rough set reduction of attributes and their domains for neural networks. International Journal of Computational Intelligence, 1995,11(2):339-347
    [72]Chang J,Zhu J, Zhang F. An Updated Algorithm for Attribute Reduction Based on Discernibility Matrix. Journal of Hunan University,2009,04:85~88
    [73]Yang M. An incremental updating algorithm of the computation of a core based on the improved discernibility Matrix. Chinese Journal of Computers,2006,29(3): 407~413
    [74]Ye D Y, Cheng Z T. A new difference matrix and its methods of core finding. Electronic Journal,2002,30(7):1086~1088
    [75]Liang J Y,Chin K S,Dang C Y,et al.A new method for measuring uncertainty and fuzziness in rough set theory. International Journal of General Systems, 2002,31(4):331~342
    [76]Miao D Q, Hu G R. A heuristic algorithm for reduction of knowledge. Journal of
    Computer Research and Development,1999,36(5):681~684
    [77]Wang G Y, Yu H, Yang D C. Decision table reduction based on conditional information entropy. Chinese Journal of Computers,2002,25(7):759~766
    [78]Miao D Q,Hou L S. A Heuristic Algorithm for Reduction of Knowledge Based on DiscernibilityMatrix.In:International Conference on Intelligence Information Technology (ICIIT-02),2002.276~279
    [79]Li H. An absolute information quantity based algorithm for reduction of knowledge in information systems. Computer Engineering and Applications,2004, 40(28):52~53,217
    [80]Shi F, Lou Z L,Zhang Y Q. A Modified Heuristic Algorithm of Attribute Reduction in Rough Set. Journal of Shanghai Jiaotong University,2002,36 (4):478~481
    [81]Zheng X X, Qian F. Diagnostic Knowledge Extraction Based on Variable Precision Rough-Fuzzy Sets Integration Model.Journal of East China University of Science and Technology.2006,12:1458~1462
    [82]Jakub W. Finding minimal reducts using genetic algorithm.Institute of Computer Science Reports, Warsaw University of Technology, Warsaw,1995
    [83]Ma G Z, Research on Optimizaation of Multi-cost Sensitive Back Propagation Neural Network, Ph.D.Thesis.Huazhong university of technology and technology, 2009,05:46~52
    [84]Murphy P M, Aha D W. UCI repository ofmachine learning database [EB/OL]. http: //www.ics.uci.edu/-mlearn/MLRepository.html
    [85]Liu L Y, Wang H Y, Zheng L Y. Research and Application on Decision Rule Reduction Algorithm Based on Rough Sets. Journal of Lanzhou Jiaotong University. 2004,23(6):78~80
    [86]Jia P, Dai J H, Pan Y h, et al. Nove algorithm for attribute reduction based on mutual-information gain ratio. Journal of Zhejiang University Engineering Science. 2006,40(6):1041 1044
    [87]Su Z L. An improved Multi-population Genetic Algorithm for Job-shop Scheduling Problem. Ludong University Journal.2007,23(2):136-141
    [88]Mitchell T M. Machine Learning and Data Mining, Communications of the ACM. 1999,11(42):30~36
    [89]Catlett J. Megainduction:Machine Learning on Very Large Databases. Ph.D.Thesis. University of Sydney.1991
    [90]Ion A, John K, et al. An Evaluation of Bayesian Anti-Spam Filtering. In:Proceedings of the workshop on Machine learning in the New Information Age,G Potamias,V Moustakis and M van Someren(eds),llth European conference on Machine Learning Barceloma,Spain,2000.9~17
    [91]Bennett K, Bredensteiner E. Duality and geometry in SVM classifiers .In: Proceedings of the 17th International Conference on Machine Learning. San Francisco, USA:MorganKaufmann Publisher,2000.57~64
    [92]Gama J. A linear-bayes classifier. Articial Intelligence and Computer Science Laboratory, University of Porto,2000
    [93]Mehta M, Agrawal R, Rissanen J. SLIQ:A Fast Scalable Classifier for Data Mining. In:Proceedings of the 5th International Conference on Extending Database Teehoology (EDBT96),Avignon,Franee,1996.18~32
    [94]Andrews R, Diederich J, Tickle A B. Survey and Critique of Techniques for Extracting Rules from Trained Artificial Neural Networks. Knowledge Based Systems,1995,8(6):373~384
    [95]Lisboa P J G Neural Networks in Medical Journals: Current Trends and Implications for BioPatten. In:Proceedings of the 1st European Workshop on Assessment of Diagnostic Performance (EWADP 2004).Milan,2004.99~112
    [96]Pawlak Z.Rough sets, decision algorithms and Bayes'theorem. European Journal of Operational Research,2002,136(1):181~189
    [97]Chou K C, Cai Y D. Using functional-domain composition and support vector machines for prediction of protein subcellular location. J. Biol. Chem. 2002(29):45765~45769
    [98]Garg A, Bhasin M, and Raghava G P S. Support vector machine based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search, J. Biol. Chem.2005(280):14427-14432
    [99]Cristianini N,Shawe T J. An Introduction to Support Vector Machines and other Kernel-Based Learning Methods. Cambridge University Press,2000
    [100]Zhang X G. Introduce to Statistical Learning Theory and Support Vector Machines. Acta Automatica Sinica.2000,01:32~41
    [101]Li B, Yuan S M, Wang L M. The Simplified Expression of Bayesian Network. Journal of apparatus and instrument.2005,10:1070~1073
    [102]Chang C C, Lin C J. LIBSVM—a library for support vector machines.< http://www.csie.ntu.edu. tw/-cjlin/libsvm/>
    [103]邓乃扬,田英杰.数据挖掘中的新方法——支持向量机.北京:科学出版社,2004
    [104]Jiang Y, Zhou Z H, Chen Z Q. Rule learning based on neural network ensemble.In: Proceedings of the International Joint Conference on Neural Networks,2002.1416~1420
    [105]郑建军,刘炜,刘琼昕.基于选择性的贝叶斯分类器集成方法.北京理工大学学报,2003,23(6):724~727.
    [106]Dou P Y, Wei Q R. Intelligent Dada Analysis Model Based on Rough Set Theory. Mathematics in Practice and Theory.2007,06:109-114
    [107]Cortes C, Vapnik V N. Support vector networks. Machine Learning,1995,20: 273~297

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700