Abstract
To address the shortcomings of the Enhanced Local Sparsity Coefficient (ELSC) algorithm in neighborhood queries and to improve its precision, this paper proposes an outlier detection algorithm based on square neighborhoods and a pruning factor. First, borrowing the idea of grid-based algorithms, the method replaces grid partitioning with expanding square neighborhoods, which quickly excludes cluster points and avoids the "curse of dimensionality" that affects grid-based algorithms. Second, to improve accuracy, a pruning factor is introduced to refine the set of candidate outliers. Finally, outliers are determined by a newly defined local sparsity index. Experimental tests show that the proposed algorithm outperforms ELSC in both execution efficiency and detection accuracy.
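The first step, excluding points whose square neighborhood is dense, might look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's implementation: the names `half_side` and `min_pts` and the fixed density threshold are hypothetical, and the abstract does not define the neighborhood expansion rule, the pruning factor, or the local sparsity index, so none of those are reproduced here.

```python
from typing import List, Sequence

Point = Sequence[float]


def square_neighbors(points: List[Point], i: int, half_side: float) -> List[int]:
    """Return indices of the other points that fall inside the axis-aligned
    square (hypercube) of half-width `half_side` centred on points[i]."""
    center = points[i]
    return [
        j
        for j, p in enumerate(points)
        if j != i and all(abs(a - b) <= half_side for a, b in zip(p, center))
    ]


def candidate_outliers(points: List[Point], half_side: float, min_pts: int) -> List[int]:
    """Keep only the points whose square neighborhood holds fewer than
    `min_pts` other points; denser points are treated as cluster points.
    `min_pts` is an assumed threshold, not a parameter named by the paper."""
    return [
        i
        for i in range(len(points))
        if len(square_neighbors(points, i, half_side)) < min_pts
    ]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    print(candidate_outliers(data, half_side=0.5, min_pts=3))
```

Membership in a square neighborhood needs only per-coordinate comparisons, with no Euclidean distance computation and no global grid whose cell count grows exponentially with the number of dimensions. This brute-force version is quadratic in the number of points; it shows only the membership test, not an efficient query.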