An Outlier Detection Algorithm Based on Spectral Embedding and Local Density
  • English title: Outlier Detection Algorithm Based on Spectral Embedding and Local Density
  • Authors: LI Chang-jing (李长镜); ZHAO Shu-liang (赵书良); CHI Yun-xian (池云仙)
  • Affiliations: College of Mathematics and Information Science, Hebei Normal University; College of Resources and Environmental Science, Hebei Normal University
  • Keywords: outlier detection; spectral embedding; local density; iterative strategy; similarity graph; detection accuracy
  • Journal code: JSJA
  • Journal: Computer Science (计算机科学)
  • Publication date: 2019-03-15
  • Publisher: Computer Science (计算机科学)
  • Year: 2019
  • Volume: 46
  • Funding: National Natural Science Foundation of China (71271067); Major Program of the National Social Science Fund of China (13&ZD091); Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2014196)
  • Language: Chinese
  • Article ID: JSJA201903039
  • Page count: 7
  • Issue: 03
  • CN: 50-1075/TP
  • Pages: 266-272
Abstract
Outlier detection is an active research topic in data mining. Existing detection algorithms mainly handle cases in which outliers lie in the original attribute subspace or in linear combinations of the underlying subspaces; when outliers are embedded in local nonlinear subspaces, detecting them effectively is very difficult. To address this problem, the paper first analyzes the shortcomings of typical spectral embedding algorithms for outlier detection and then, building on local density, proposes an outlier detection algorithm based on spectral embedding and local density. The algorithm uses an iterative strategy to efficiently screen out unimportant eigenvectors and to find the eigenvectors that help reveal outliers hidden in local nonlinear subspaces. The local-density-based spectral embedding obtained in one iteration is used to improve the similarity graph for the next iteration, so that outliers are gradually separated from normal points over the iterations. Simulation results show that the proposed algorithm achieves higher detection accuracy than other typical algorithms and that it is not sensitive to parameter settings.
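The loop described in the abstract (embed, score by local density, rebuild the similarity graph from the embedding, repeat) can be sketched roughly as follows. This is not the authors' algorithm, since the record gives no formulas. The dense Gaussian graph, the median-based bandwidth, the unnormalized Laplacian, the LOF-style density ratio, and every function and parameter name (`gaussian_graph`, `laplacian_embedding`, `local_density_scores`, `iterative_spectral_scores`, `k`, `m`, `n_iter`) are assumptions made for illustration. The eigenvector screening step is not implemented; the sketch simply keeps the `m` eigenvectors with the smallest eigenvalues.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def gaussian_graph(X, k=4):
    """Dense Gaussian similarity graph (assumed construction).

    The bandwidth is the median distance to the k-th nearest neighbour.
    """
    d = pairwise_distances(X)
    kth = np.sort(d, axis=1)[:, k]          # column 0 is the point itself
    sigma = max(float(np.median(kth)), 1e-12)
    W = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                # no self-loops
    return W


def laplacian_embedding(W, m=3):
    """Rows of the m eigenvectors of L = D - W with the smallest eigenvalues.

    The constant eigenvector is kept on purpose: it adds nothing to
    pairwise distances when the graph is connected.
    """
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    return vecs[:, :m]


def local_density_scores(Y, k=4):
    """LOF-style ratio: a point's mean k-NN distance divided by the
    average of the same quantity over its k nearest neighbours."""
    d = pairwise_distances(Y)
    np.fill_diagonal(d, np.inf)             # never pick a point as its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]
    knn_dist = np.take_along_axis(d, idx, axis=1).mean(axis=1) + 1e-12
    return knn_dist / knn_dist[idx].mean(axis=1)


def iterative_spectral_scores(X, n_iter=2, k=4, m=3):
    """Each round rebuilds the graph from the previous embedding;
    the scores are computed on the final embedding."""
    Z = np.asarray(X, dtype=float)
    for _ in range(n_iter):
        Z = laplacian_embedding(gaussian_graph(Z, k), m)
    return local_density_scores(Z, k)


# Toy data (made up for this sketch): 30 points on a half circle,
# plus one extra point at the circle's centre, stored at index 30.
theta = np.linspace(0.0, np.pi, 30)
X = np.vstack([np.column_stack([np.cos(theta), np.sin(theta)]), [[0.0, 0.0]]])

scores = iterative_spectral_scores(X, n_iter=2)
print("index with the highest score:", int(np.argmax(scores)))
```

Larger scores indicate lower local density relative to the neighbours. Whether additional rounds sharpen or blur the separation depends on the data and on `k` and `m`, so `n_iter` should be treated as a parameter to explore rather than a recommended setting.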