Outlier Detection Method Based on Square Neighborhood and Pruning Factor
  • English title: Square Neighborhood and Pruning Factor Based Outlier Detection Algorithm
  • Authors: TU Xiao-min (涂晓敏); SHI Hong-yan (石鸿雁)
  • Keywords: data mining; outliers; square neighborhood; pruning factor; local sparse index
  • Chinese journal title: XXWX
  • English journal title: Journal of Chinese Computer Systems
  • Affiliation: Shenyang University of Technology (沈阳工业大学)
  • Publication date: 2019-01-15
  • Publisher: 小型微型计算机系统 (Journal of Chinese Computer Systems)
  • Year: 2019
  • Issue: v.40
  • Funding: Supported by the National Natural Science Foundation of China (61074005)
  • Language: Chinese
  • Page: XXWX201901036
  • Page count: 4
  • CN: 01
  • ISSN: 21-1106/TP
  • Classification number: 188-191
Abstract
To address the shortcomings of the Enhanced Local Sparsity Coefficient (ELSC) algorithm in neighborhood queries, and to improve its precision, this paper proposes an outlier detection algorithm based on a square neighborhood and a pruning factor. First, borrowing the idea of grid-based algorithms, the method replaces grid partitioning with an expanding square neighborhood, which quickly excludes cluster points and avoids the "curse of dimensionality" that grid-based algorithms suffer from. Second, to improve accuracy, a pruning factor is introduced to refine the set of candidate outliers. Finally, outliers are determined by a newly defined local sparse index. Experimental results show that the algorithm outperforms ELSC in both execution efficiency and detection accuracy.
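The first step described in the abstract, excluding points that sit in a dense square neighborhood, can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses a fixed rather than expanding neighborhood, and the names `chebyshev`, `candidate_outliers`, `r`, and `min_pts` are hypothetical. The pruning factor and the local sparse index are not defined in this record, so they are not reproduced here.

```python
from typing import List, Sequence


def chebyshev(p: Sequence[float], q: Sequence[float]) -> float:
    """L-infinity distance: q lies in the square (hypercube) of
    half-width r centered on p exactly when this value is <= r."""
    return max(abs(a - b) for a, b in zip(p, q))


def candidate_outliers(points: List[Sequence[float]], r: float, min_pts: int) -> List[int]:
    """Return indices of points whose square neighborhood of half-width r
    contains fewer than min_pts other points (assumed density threshold)."""
    candidates = []
    for i, p in enumerate(points):
        count = 0
        for j, q in enumerate(points):
            if i != j and chebyshev(p, q) <= r:
                count += 1
                if count >= min_pts:
                    break  # dense enough: treat as a cluster point and stop early
        if count < min_pts:
            candidates.append(i)
    return candidates


# Four points packed together and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(candidate_outliers(points, r=0.5, min_pts=2))  # prints [4]
```

Testing membership in a square needs only per-coordinate comparisons, so no cells have to be materialized the way a grid partition would require. The surviving candidates would then be passed to the refinement and scoring stages the abstract mentions.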
References
[1] Han Jia-wei, Micheline Kamber. Data mining: concepts and techniques[M]. Beijing: China Machine Press, 2014.
[2] Huang Hong-yu, Lin Jia-xiang, Chen Chong-cheng, et al. Review of outlier detection[J]. Application Research of Computers, 2006, 23(8): 8-13.
[3] Xue An-rong, Ju Shi-guang, He Wei-hua, et al. Research on local outlier mining algorithm[J]. Chinese Journal of Computers, 2007, 30(8): 1455-1463.
[4] Guan Hao-wen. Anomaly detection of medical insurance based on outlier detection[D]. Jinan: Shandong University, 2016.
[5] Li Xiao-hui. Research on video anomaly detection algorithm based on constrained sparse representation[D]. Changchun: Northeast Normal University, 2016.
[6] Qin Hao. Density based local outlier mining and its application in intrusion detection[D]. Dalian: Dalian Maritime University, 2016.
[7] Breunig M M, Kriegel H P, Ng R T, et al. LOF: identifying density-based local outliers[C]. Proc of SIGMOD'00, Dallas, 2000: 427-438.
[8] Agyemang M, Ezeife C I. LSC-Mine: algorithm for mining local outliers[C]. New Orleans, 2004: 5-8.
[9] Agyemang Malik. Local sparsity coefficient based mining of outliers[D]. Windsor, Ontario: University of Windsor, 2003.
[10] Jia Chen-ke, Qiu Bao-zhi. Outlier mining based on local isolation coefficient[J]. Microcomputer Information, 2005, 12(36): 107-109.
[11] Zhou Yun-feng. Research and application of density based local outlier detection algorithm[D]. Wuhan: Central China Normal University, 2016.
[12] Zhao Xin-xiang. Research and improvement of density based local outlier detection algorithm[D]. Wuhan: Central China Normal University, 2014.
[13] Li Zong-lin, Luo Ke. Adaptive determination of parameters in DBSCAN algorithm[J]. Computer Engineering and Applications, 2016, 52(3): 70-73.
[14] Zhang Y, Wang X, Li B, et al. Dboost: a fast algorithm for DBSCAN-based clustering on high dimensional data[M]. Advances in Knowledge Discovery and Data Mining, Springer International Publishing, 2016.
[15] Huang Tian-qiang, Qin Xiao-lin, Ye Fei-yue. A new method for finding outliers based on square neighborhood[J]. Control and Decision, 2006, 21(5): 541-545.
[16] Zhang Tian-you. Research on outlier detection algorithm of high dimensional big data set based on grid partitioning[D]. Changsha: Central South University, 2011.
[17] Tao Yun-xin, Pi De-chang. Outlier detection algorithm based on neighborhood and density[J]. Journal of Jilin University: Information Science Edition, 2008, 26(4): 398-403.
[18] Jie Cai-ming. Analysis and research on density based local outlier detection algorithm[D]. Chongqing: Chongqing University, 2012.
