Outlier detection based on neighborhood rough membership functions
  • Title: Outlier detection based on neighborhood rough membership functions (基于邻域粗糙隶属函数的离群点检测)
  • Authors: YANG Xiao-ling (杨晓玲); ZHANG Xian-yong (张贤勇)
  • Affiliations: College of Mathematics and Software Science, Sichuan Normal University; Institute of Intelligent Information and Quantum Information, Sichuan Normal University
  • Keywords: outlier detection; neighborhood rough set; rough membership function; hybrid attribute data; data mining
  • Chinese journal title: SJSJ
  • English journal title: Computer Engineering and Design
  • Publication date: 2019-02-16
  • Publisher: Computer Engineering and Design (计算机工程与设计)
  • Year: 2019
  • Issue: v.40; No.386
  • Funding: National Natural Science Foundation of China (61673285, 61203285); Sichuan Youth Science and Technology Foundation (2017JQ0046); Scientific Research Fund of the Sichuan Provincial Education Department (15ZB0028)
  • Language: Chinese
  • Page: SJSJ201902041
  • Page count: 7
  • CN: 02
  • ISSN: 11-1775/TP
  • Classification number: 240-246
Abstract
Outlier detection methods based on classical rough sets have difficulty handling numerical attribute data. To address this problem, an outlier detection method based on neighborhood rough membership functions is proposed; it applies to numerical, symbolic, and hybrid attribute data. Based on a hybrid distance and an adaptive radius, a neighborhood rough membership function is defined to characterize the degree to which an object is an outlier, a neighborhood rough outlier factor is constructed to carry out outlier detection, and a corresponding outlier detection algorithm, NRMFOD, is designed. Comparative experiments on UCI data show that NRMFOD is effective and outperforms three commonly used detection algorithms (RMF, RBD, and DIS).
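The abstract names the building blocks (a hybrid distance, a neighborhood, and a rough membership function) without giving their formulas, so the following Python sketch only illustrates the general idea and is not the NRMFOD algorithm. Everything in it is an assumption made for illustration: the function names, the distance (0/1 mismatch for symbolic attributes, range-normalized difference for numerical ones, combined by maximum), the fixed `radius` used in place of the paper's adaptive radius, and the `sparsity_score` stand-in used in place of the paper's neighborhood rough outlier factor. The membership function follows the usual rough-set form, the fraction of an object's neighborhood that lies inside a target set.

```python
from typing import List, Sequence, Set, Union

Value = Union[float, str]
Row = Sequence[Value]


def attribute_ranges(data: Sequence[Row]) -> List[float]:
    """Value range of each numerical attribute (1.0 for symbolic ones)."""
    ranges = []
    for column in zip(*data):
        if isinstance(column[0], str):
            ranges.append(1.0)
        else:
            ranges.append(float(max(column) - min(column)))
    return ranges


def mixed_distance(x: Row, y: Row, ranges: Sequence[float]) -> float:
    """Assumed hybrid distance (not taken from the paper): the largest
    per-attribute difference, where a symbolic attribute contributes 0 or 1
    and a numerical one contributes its range-normalized absolute difference."""
    diffs = []
    for a, b, r in zip(x, y, ranges):
        if isinstance(a, str):
            diffs.append(0.0 if a == b else 1.0)
        else:
            diffs.append(abs(a - b) / r if r > 0 else 0.0)
    return max(diffs)


def neighborhood(i: int, data: Sequence[Row], ranges: Sequence[float],
                 radius: float) -> Set[int]:
    """Indices of all objects within `radius` of object i (always includes i)."""
    return {j for j, row in enumerate(data)
            if mixed_distance(data[i], row, ranges) <= radius}


def membership(i: int, target: Set[int], data: Sequence[Row],
               ranges: Sequence[float], radius: float) -> float:
    """Neighborhood rough membership of object i in `target`:
    the fraction of i's neighborhood that falls inside `target`."""
    n = neighborhood(i, data, ranges, radius)
    return len(n & target) / len(n)


def sparsity_score(i: int, data: Sequence[Row], ranges: Sequence[float],
                   radius: float) -> float:
    """Hypothetical stand-in score (NOT the paper's outlier factor):
    objects with small neighborhoods score close to 1."""
    return 1.0 - len(neighborhood(i, data, ranges, radius)) / len(data)


# Toy hybrid data: one numerical and one symbolic attribute per object.
data = [(1.0, "a"), (1.1, "a"), (0.9, "a"), (1.2, "a"), (5.0, "b")]
ranges = attribute_ranges(data)
radius = 0.2  # fixed here; the paper uses an adaptive radius
scores = [sparsity_score(i, data, ranges, radius) for i in range(len(data))]
```

In this toy data the last object differs from the others on both attributes, so its neighborhood contains only itself and it receives the highest stand-in score, while the first four objects share one neighborhood.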