The Application Research of Rough Set in Data Mining of Incomplete Information System

Original title: 粗糙集在不完备信息系统数据挖掘中的应用研究
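Author: Shen Aihua (申爱华)
Degree level: Master's
Discipline: Management Science and Engineering
Keywords: Rough Set; Incomplete Information; Data Mining; Electric Power Network; Fault Diagnosis
Degree year: 2004
Advisor: Chen Yan (陈燕)
Discipline code: 1201
Degree-granting institution: Dalian Maritime University (大连海事大学)
Submission date: 2004-03-01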

Abstract

In 1982 the Polish scholar Z. Pawlak proposed rough set theory, a mathematical tool for handling imprecise and incomplete information that requires no additional information beyond the data set itself. After nearly 20 years of development, it has produced substantial results in both theory and application.
    Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data. Traditional data mining techniques cannot cope with some data containing incomplete information, whereas rough sets can process it. As an extension of set theory, rough set theory has data mining under incomplete information as one of its main research areas.
    This thesis studies the application of rough set theory to incomplete information systems. It proposes two data reduction models, one based on the discernibility matrix and one based on data analysis, improves the discernibility matrix algorithm, and compares the two models.
    Based on rough set theory, the thesis implements a small, practical data mining model in Java. The model uses a B/S (browser/server) architecture and targets Internet applications. It is tested on a real case of fault diagnosis in an electric power network: even when protection action signals are missing or erroneous, the system still reaches the correct diagnosis, which indicates that rough set theory is highly fault tolerant for incomplete and imprecise information systems.
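
The abstract names discernibility-matrix reduction over incomplete data but gives no algorithmic detail. The sketch below is one common simplified reading of that idea, not the thesis's own (improved) algorithm: a missing value is treated as compatible with anything, so two objects are discerned by an attribute only when both values are known and differ. It is written in Python for brevity, although the thesis's implementation is in Java. The signal names (`relay_A`, `relay_B`, `breaker_1`), the fault labels, and the table values are hypothetical toy data invented for illustration.

```python
from itertools import combinations

MISSING = None  # marks an unknown attribute value (assumption: shown as None)

def discernible(x, y, attr):
    """True only if both values are known and they differ."""
    return x[attr] is not MISSING and y[attr] is not MISSING and x[attr] != y[attr]

def discernibility_matrix(rows, decisions, attrs):
    """For each pair of objects with different decisions, collect the
    attributes that tell the two objects apart."""
    matrix = {}
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            matrix[(i, j)] = {a for a in attrs if discernible(rows[i], rows[j], a)}
    return matrix

def minimal_reducts(matrix, attrs):
    """Brute-force search for the smallest attribute subsets that intersect
    every non-empty matrix entry (exponential; fine for toy tables only)."""
    entries = [e for e in matrix.values() if e]
    for size in range(len(attrs) + 1):
        found = [set(c) for c in combinations(attrs, size)
                 if all(e & set(c) for e in entries)]
        if found:
            return found
    return []

# Hypothetical toy decision table: protection signals -> diagnosed fault.
attrs = ["relay_A", "relay_B", "breaker_1"]
rows = [
    {"relay_A": 1, "relay_B": 0,       "breaker_1": 1},
    {"relay_A": 1, "relay_B": MISSING, "breaker_1": 1},
    {"relay_A": 0, "relay_B": 1,       "breaker_1": 1},
    {"relay_A": 0, "relay_B": 1,       "breaker_1": MISSING},
    {"relay_A": 0, "relay_B": 0,       "breaker_1": 0},
]
decisions = ["fault_L1", "fault_L1", "fault_L2", "fault_L2", "no_fault"]

matrix = discernibility_matrix(rows, decisions, attrs)
reducts = minimal_reducts(matrix, attrs)
print(reducts)
```

On this toy table the only minimal attribute subset is `relay_A` together with `relay_B`: rows 1 and 3 can be separated by `relay_A` alone, and rows 3 and 4 by `relay_B` alone, so both are required, and together they cover every other pair. A practical implementation would replace the brute-force search with a heuristic, since the search is exponential in the number of attributes.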
