Study on Entire-Granulation Rough Sets and Concept Drifting in a Knowledge System
  • Original title (Chinese): 知识系统中全粒度粗糙集及概念漂移的研究
  • Authors: DENG Da-Yong; LU Ke-Wen; MIAO Duo-Qian; HUANG Hou-Kuan
  • Keywords: entire-granulation rough sets; concept drifting; partial ordering relation; concept coupling; upper and lower approximations
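  • Journal code: JSJX
  • Journal: Chinese Journal of Computers
  • Affiliations: College of Mathematics, Physics and Information Engineering, Zhejiang Normal University; Xingzhi College, Zhejiang Normal University; School of Electronics and Information Engineering, Tongji University; School of Computer and Information Technology, Beijing Jiaotong University
  • Publication date: 2016-11-29 12:46
  • Publisher: Chinese Journal of Computers (计算机学报)
  • Year: 2019
  • Issue: v.42; No.433
  • Funding: Supported by the National Natural Science Foundation of China (61473030, 61572442, 61203247, 61273304, 61573259, 61472166) and the Natural Science Foundation of Zhejiang Province (LY15F020012)
  • Language: Chinese
  • Page: JSJX201901007
  • Page count: 13
  • CN: 01
  • ISSN: 11-1826/TP
  • Classification number: 87-99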
Abstract
Concept drifting detection is a research focus in data stream mining, and uncertainty analysis is at the core of rough set theory. Big data and data streams exhibit both changes of uncertainty and concept drifting. However, apart from F-rough sets, almost all rough set models are static or semi-dynamic: they study various kinds of uncertainty, but they can hardly handle changes of uncertainty or detect concept drifting. Combining basic ideas from quantum computing, data streams, concept drifting, rough sets and F-rough sets, and using upper and lower approximations as tools, this paper defines entire-granulation rough sets in a knowledge system, together with notions such as concept drifting of the upper and lower approximations and concept coupling of the upper and lower approximations. The properties of entire-granulation rough sets are investigated, and the global change of a concept within a knowledge system is analyzed. Entire-granulation rough sets inherit the basic ideas of Pawlak rough sets and F-rough sets, and use a family of upper and lower approximations to represent all possible changes of a concept within a knowledge system. A nested Hasse diagram expresses the identity and diversity of a concept in different cases: expressions on the same level exhibit no concept drifting, whereas expressions on different levels do. Using the positive region as a tool, the entire-granulation positive region is defined for a decision table, along with the corresponding notions of concept drifting and concept coupling. The properties of the entire-granulation positive region are explored, and the global change of the overall concept within a decision table is analyzed. The entire-granulation positive region represents the positive regions of a decision table in all possible cases, and a nested Hasse diagram again expresses the identity and diversity of this family of positive regions: within the same level there is no concept drifting relative to the positive region, whereas across different levels there is. In the sense of entire-granulation rough sets, several kinds of attribute reducts are defined, namely entire-granulation absolute reducts, entire-granulation value reducts and entire-granulation Pawlak reducts, and their properties are investigated. Unlike most attribute reducts (and similar only to parallel reducts and multi-granulation reducts), entire-granulation attribute reducts require that no concept drifting occur in any possible expression of a concept. The advantages and drawbacks of attribute reduction are discussed further: reduction makes the expression of a concept uniform, while redundant attributes make concept expressions richer and more diverse. From the viewpoint of epistemology, the locality and globality of human understanding of the world are analyzed with rough sets and granular computing, and the ways in which humans come to know the world are discussed further. To some extent, entire-granulation rough sets can express the complexity, uncertainty, diversity, hierarchy and dynamics of human cognition, and with the help of quantum computing they can move from one granulation to another without difficulty. The study of entire-granulation rough sets, and of concept drifting detection within them, offers useful insights for concept drifting detection under various conditions and for the simulation of human intelligence.
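The abstract does not reproduce the paper's formal definitions, so the following is only a minimal Python sketch of the underlying idea, not the authors' method. It computes the standard Pawlak lower and upper approximations of a concept for every non-empty attribute subset (used here as a stand-in for "every granulation"), then groups the subsets that yield identical approximation pairs. Treating "identical pair" as "no concept drifting between these granulations" is an assumption made for illustration. The toy table `TABLE`, the concept `CONCEPT`, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy knowledge system: object id -> attribute values.
# Attribute "c" duplicates "a" on purpose, so it is redundant.
TABLE = {
    1: {"a": 0, "b": 0, "c": 0},
    2: {"a": 0, "b": 1, "c": 0},
    3: {"a": 1, "b": 0, "c": 1},
    4: {"a": 1, "b": 1, "c": 1},
}
CONCEPT = {1, 2, 3}  # hypothetical target concept X


def partition(table, attrs):
    """Split the objects into equivalence classes by their values on attrs."""
    blocks = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, set()).add(obj)
    return list(blocks.values())


def approximations(table, attrs, concept):
    """Return the Pawlak (lower, upper) approximations of concept under attrs."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= concept:   # block lies entirely inside the concept
            lower |= block
        if block & concept:    # block overlaps the concept
            upper |= block
    return frozenset(lower), frozenset(upper)


def approximation_family(table, all_attrs, concept):
    """Map every non-empty attribute subset to its (lower, upper) pair."""
    family = {}
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            family[attrs] = approximations(table, attrs, concept)
    return family


def group_by_approximation(family):
    """Group attribute subsets whose approximation pairs coincide.

    Assumption for illustration: subsets in one group show no drift
    relative to each other; subsets in different groups do.
    """
    groups = {}
    for attrs, pair in family.items():
        groups.setdefault(pair, []).append(attrs)
    return groups


family = approximation_family(TABLE, ("a", "b", "c"), CONCEPT)
groups = group_by_approximation(family)
for (lower, upper), subsets in groups.items():
    print(sorted(lower), sorted(upper), subsets)
```

Enumerating all attribute subsets is exponential in the number of attributes, so this sketch is only practical for very small tables. It also shows the point the abstract makes about redundancy: the duplicate attribute `c` adds further attribute subsets that express the concept in the same way as `a`.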
