Research and Application of Knowledge Reduction Algorithms Based on Rough Sets
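Abstract
How to intelligently extract valuable knowledge and information from large volumes of data has become a very active research area in artificial intelligence. At present, knowledge discovery struggles to handle incomplete and uncertain data effectively, and the knowledge it produces is often hard to interpret. Rough set theory, an extension of set theory, is a relatively new soft computing method that can handle vague and uncertain knowledge effectively. It requires no prior knowledge or external parameters, and in recent years it has been applied successfully in artificial intelligence, data mining, pattern recognition, and many other fields. Research on rough-set-based knowledge discovery methods is therefore of considerable importance.
     This thesis analyzes and studies the basic theory and concepts of rough sets and, within that framework, carries out research in the following areas:
     (1) Attribute discretization for rough sets
     Attribute discretization for rough sets has two requirements: the discretization result must preserve the indiscernibility relation of the decision system, so that the system's classification ability is unchanged; and the set of breakpoints (cut points) should be as small as possible. With these two requirements in mind, the thesis first reviews and analyzes existing algorithms for discretizing continuous attributes and identifies their shortcomings with respect to these two requirements and in other respects. To address these shortcomings, it then proposes a data discretization algorithm based on an improved genetic algorithm. Finally, worked examples show that the algorithm gives good discretization results.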
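     The two discretization requirements can be made concrete with a small check. The Python sketch below is an illustration only, not the thesis's genetic algorithm: it assumes a consistent decision table with purely numeric condition attributes, and the names `discretize`, `is_consistent`, `cut_count`, and the toy data are hypothetical. It tests whether a candidate set of cut points still separates every pair of objects with different decisions, and counts the cut points used.

```python
from bisect import bisect_right

def discretize(value, cuts):
    """Map a continuous value to the index of its interval, given sorted cut points."""
    return bisect_right(cuts, value)

def is_consistent(rows, decisions, cuts_per_attr):
    """Return True if no two objects with different decisions share a discretized row.

    Assumption: the original (continuous) table is itself consistent.
    """
    seen = {}
    for row, d in zip(rows, decisions):
        key = tuple(discretize(v, cuts) for v, cuts in zip(row, cuts_per_attr))
        if seen.setdefault(key, d) != d:
            return False
    return True

def cut_count(cuts_per_attr):
    """Total number of cut points across all attributes (the quantity to minimize)."""
    return sum(len(cuts) for cuts in cuts_per_attr)

# Hypothetical toy decision table: two continuous attributes, binary decision.
rows = [(0.8, 2.0), (1.4, 1.0), (1.4, 3.0), (2.2, 2.0)]
decisions = [0, 0, 1, 1]

candidates = {
    "three cuts": [[1.0, 2.0], [2.5]],
    "two cuts": [[2.0], [2.5]],
    "one cut": [[2.0], []],
}
for name, cuts in candidates.items():
    print(name, is_consistent(rows, decisions, cuts), cut_count(cuts))
```

     A search procedure such as a genetic algorithm could use these two quantities as its fitness criteria, preferring consistent candidates with fewer cut points; how the thesis actually encodes and scores candidates is not stated in the abstract.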
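     (2) Attribute reduction for rough sets
     Traditional rough set attribute reduction algorithms are inefficient and slow. To address this, the thesis proposes an attribute reduction algorithm based on conditional information entropy and correlation coefficients. It converts the reduction of the non-core attributes of the decision table into a computation of correlation coefficients, which reduces the number of scans of the decision table, lowers the algorithm's time complexity and redundancy, and improves the efficiency of attribute reduction. The correlation coefficients are computed with a k-fold rotation comparison method, which substantially reduces the amount of computation while still yielding a suboptimal attribute reduction. The algorithm's performance is verified experimentally.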
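     As a rough illustration of the entropy criterion only, the Python sketch below computes the conditional entropy of the decision attribute given a subset of condition attributes and drops attributes whose removal leaves that entropy unchanged. It assumes the standard Shannon conditional entropy and a simple backward elimination; the correlation-coefficient step and the k-fold rotation comparison described above are not reproduced, and the function names and toy table are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def conditional_entropy(rows, decisions, attrs):
    """Shannon conditional entropy H(D | attrs) of a decision table."""
    blocks = defaultdict(Counter)  # equivalence class -> decision value counts
    for row, d in zip(rows, decisions):
        blocks[tuple(row[a] for a in attrs)][d] += 1
    n = len(rows)
    h = 0.0
    for counts in blocks.values():
        size = sum(counts.values())
        for c in counts.values():
            h -= (size / n) * (c / size) * log2(c / size)
    return h

def greedy_reduct(rows, decisions, attrs, tol=1e-12):
    """Backward elimination: drop an attribute if H(D | remaining) is unchanged."""
    target = conditional_entropy(rows, decisions, attrs)
    kept = list(attrs)
    for a in list(attrs):
        trial = [x for x in kept if x != a]
        if abs(conditional_entropy(rows, decisions, trial) - target) <= tol:
            kept = trial
    return kept

# Hypothetical toy table: attribute 2 mirrors attribute 0, attribute 1 is irrelevant.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 0, 1, 1]

print(conditional_entropy(rows, decisions, [0, 1, 2]))
print(greedy_reduct(rows, decisions, [0, 1, 2]))
```

     The result of backward elimination depends on the order in which attributes are tried, so it yields a reduct with respect to this entropy measure but not necessarily the smallest one.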
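     (3) A rough-set-based intelligent fault diagnosis system for diesel engine fuel injection
     Finally, the thesis applies this rough set research to fault diagnosis. After introducing the diesel engine and its fuel injection system, it builds an intelligent fault diagnosis system for diesel engine fuel injection based on the proposed algorithms, to help staff carry out fault diagnosis more effectively.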
How to automatically extract valuable information and knowledge from large-scale data has become a very active research area in artificial intelligence. At present, knowledge discovery faces two problems: incomplete and uncertain data are not processed effectively, and the interpretability of the extracted knowledge is weak. Rough set theory, an extension of set theory, is a new soft computing method that processes incomplete and uncertain data efficiently without requiring prior knowledge or external parameters. It has been used successfully in artificial intelligence, data mining, pattern recognition, and other areas. Research on knowledge discovery technology based on rough set theory is therefore of great practical significance.
     This dissertation analyzes and studies the basic theories and concepts of rough sets. Within that framework, the following research is carried out:
     (1) Discretization of continuous attributes in rough sets
     Discretizing continuous attributes in rough sets has two requirements: the discretization must not change the indiscernibility relation of the decision system, so that the system's classification capacity is unchanged; and the number of breakpoints in the breakpoint set should be as small as possible. With these two requirements in mind, several existing discretization algorithms for continuous attributes are first introduced, studied, and analyzed to expose their deficiencies in these and other respects. To address these deficiencies, a new data discretization algorithm based on an improved genetic algorithm is then proposed. Finally, experiments are carried out to verify its performance.
     (2) Attribute reduction in rough sets
     To address the inefficiency and low speed of traditional attribute reduction algorithms, an attribute reduction algorithm based on conditional information entropy and correlation coefficients is proposed. It turns the reduction of non-core attributes in the decision table into a calculation of correlation coefficients, which reduces the number of scans of the decision table, the time complexity, and the redundancy of the algorithm, and improves the efficiency of attribute reduction. A k-fold rotation comparison method is then used to calculate the correlation coefficients, which greatly reduces the amount of calculation while still attaining a suboptimal attribute reduction result. The details of the algorithm are given, and an experiment is carried out whose results verify its efficiency.
     (3) A rough-set-based intelligent fault diagnosis system for the fuel injection system of diesel engines
     Finally, the rough set research in this dissertation is applied to fault diagnosis. After an introduction to the diesel engine and its fuel injection system, the theoretical basis of fault diagnosis based on rough set theory is analyzed, and an intelligent fault diagnosis system for the fuel injection system of diesel engines is built on rough set theory to help staff carry out fault diagnosis more effectively.
