Abstract
Outlier detection methods based on classical rough sets have difficulty handling numerical attribute data. To address this problem, an outlier detection method based on neighborhood rough membership functions is proposed; it applies to numerical, symbolic, and mixed attribute data. Based on a mixed distance and an adaptive radius, a neighborhood rough membership function is defined to characterize the outlier degree of an object, a neighborhood rough outlier factor is constructed to perform outlier detection, and the corresponding outlier detection algorithm, NRMFOD, is designed. Comparative experiments on UCI data sets show that NRMFOD is effective and outperforms three commonly used detection algorithms (RMF, RBD, and DIS).
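The abstract does not give the formal definitions, so the following is only a minimal sketch of the general idea, not the NRMFOD algorithm itself. It assumes a HEOM-style mixed distance (absolute difference on numerical attributes already scaled to [0, 1], 0/1 mismatch on symbolic attributes), a fixed radius in place of the paper's adaptive radius, and the standard ratio form of a rough membership function, |n(x) ∩ X| / |n(x)|. All function and parameter names are hypothetical.

```python
import math

def mixed_distance(x, y, numeric_idx):
    """Assumed HEOM-style distance: |a - b| on numerical attributes
    (taken to be pre-scaled to [0, 1]), 0/1 mismatch on symbolic ones."""
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in numeric_idx:
            d = abs(a - b)
        else:
            d = 0.0 if a == b else 1.0
        total += d * d
    return math.sqrt(total)

def neighborhood(data, i, radius, numeric_idx):
    """Indices of all objects within `radius` of object i (i included)."""
    return {j for j, y in enumerate(data)
            if mixed_distance(data[i], y, numeric_idx) <= radius}

def rough_membership(data, i, target, radius, numeric_idx):
    """Share of object i's neighborhood that lies inside the index set `target`."""
    nb = neighborhood(data, i, radius, numeric_idx)
    return len(nb & target) / len(nb)  # nb always contains i, so it is non-empty

# Toy mixed-type data: one numerical and one symbolic attribute.
data = [(0.10, "a"), (0.12, "a"), (0.11, "a"), (0.90, "b")]
numeric_idx = {0}   # attribute 0 is numerical
target = {0, 1, 2}  # hypothetical target set X (indices of the dense group)

inlier_score = rough_membership(data, 0, target, 0.05, numeric_idx)
isolated_score = rough_membership(data, 3, target, 0.05, numeric_idx)
```

In this toy data, object 0 has its whole neighborhood inside the target set, while object 3 has only itself as a neighbor and so gets membership 0 in that set. How such memberships are combined into the neighborhood rough outlier factor, and how the radius adapts to the data, are defined in the paper body and are not reproduced here.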