一种空间交叉异常显著性判别的非参数检验方法 (A Nonparametric Test Method for Determining the Significance of Spatial Cross-Outliers)
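  • English title: A Nonparametric Test Method for Identifying Significant Cross-outliers in Spatial Point Dataset
  • Authors (Chinese): 杨学习; 邓敏; 石岩; 唐建波; 刘启亮
  • Authors (English): YANG Xuexi; DENG Min; SHI Yan; TANG Jianbo; LIU Qiliang; School of Geosciences and Info-Physics, Central South University
  • Keywords (Chinese): 空间数据挖掘; 空间异常探测; 交叉异常; 非参数检验; 显著性
  • Keywords (English): spatial data mining; spatial outlier detection; cross-outlier; nonparametric test; significance
  • Chinese journal title: CHXB
  • English journal title: Acta Geodaetica et Cartographica Sinica
  • Institution: School of Geosciences and Info-Physics, Central South University
  • Publication date: 2018-09-15
  • Publisher: Acta Geodaetica et Cartographica Sinica (测绘学报)
  • Year: 2018
  • Issue: v.47
  • Funding: National Natural Science Foundation of China (41471385; 41730105); National Key Research and Development Program of China (2016YFB0502303); Fundamental Research Funds for the Central Universities of Central South University (2016zzts085)
  • Language: Chinese
  • Page: CHXB201809011
  • Page count: 11
  • CN: 09
  • ISSN: 11-2089/P
  • Classification number: 94-104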
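Abstract
Spatial outlier detection aims to mine, from massive spatial data, sets of spatial entities that do not conform to general patterns and appear "different from the rest"; it is valuable for revealing the special development patterns of geographic phenomena. Existing research has made important progress in measuring spatial outlierness, but most methods lack a statistical test of the significance of spatial outlier patterns, and they target single-category data without accounting for the interaction among multiple categories of data. To address this, based on the idea of spatial random processes, this paper proposes a nonparametric test method for determining the significance of spatial cross-outliers in spatial point data of two categories. First, for entities of the primary dataset, a constrained Delaunay triangulation is used to construct a reasonable and stable spatial neighborhood. Then, the number of reference-dataset entities falling within the spatial reference neighborhood radius of each primary-dataset entity is counted to measure the initial outlier degree. Next, the α-Shape method is used to construct the support domain, a null model is built on the basis of a spatial random process, and Monte Carlo simulation is used to test the significance of the spatial outliers. Finally, the living distance is used to evaluate the stability of the outlier patterns. Experimental analysis and comparison show that the method can effectively identify statistically significant spatial cross-outliers.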
        In geography, a spatial outlier is an object whose non-spatial attribute value differs significantly from the values of its spatial neighbors. Detecting spatial outliers helps to uncover special geographical phenomena, so it has become an important branch of spatial data mining. Although existing methods can measure a spatial outlier factor, they cannot evaluate the significance of the detected outliers objectively. Furthermore, existing methods are mainly designed for single-class datasets and do not take into account the interaction between different categories of data. In this study, a nonparametric test was developed to identify significant cross-outliers in spatial point datasets. First, a reasonable and stable spatial neighborhood is constructed for the primary dataset entities using the constrained Delaunay triangulation. Then, the number of reference dataset entities falling within the spatial reference neighborhood radius is used to measure the initial outlier factor. Next, the support domain is constructed by the α-Shape method, the null model is constructed based on a spatial random process, and the significant spatial cross-outliers are identified by a statistical test. Finally, the stability of the spatial cross-outliers is evaluated by the living distance. Experiments on both simulated and real-world datasets show that the proposed permutation test is effective for determining significant spatial cross-outliers in spatial point datasets.
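The counting and significance-testing steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several simplifying assumptions: a fixed, user-supplied radius stands in for the neighborhood derived from the constrained Delaunay triangulation, an axis-aligned bounding box stands in for the α-Shape support domain, and the null model is complete spatial randomness of the reference points. The names `count_within` and `monte_carlo_p_values` are hypothetical.

```python
import math
import random


def count_within(point, refs, radius):
    """Count reference points within `radius` of `point` (boundary included)."""
    return sum(1 for q in refs if math.dist(point, q) <= radius)


def monte_carlo_p_values(point, refs, radius, bbox, n_sim=199, seed=0):
    """Return (observed, p_low, p_high) for the reference count around `point`.

    Assumed null model: the same number of reference points, placed uniformly
    at random inside bbox = (xmin, ymin, xmax, ymax). The bounding box is a
    stand-in for the alpha-shape support domain named in the abstract.
    p_low is small when there are unusually few reference points nearby,
    p_high is small when there are unusually many.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    observed = count_within(point, refs, radius)
    n_low = n_high = 0
    for _ in range(n_sim):
        # One realization of the null model: re-scatter all reference points.
        sim = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in refs]
        c = count_within(point, sim, radius)
        n_low += c <= observed
        n_high += c >= observed
    # The +1 terms count the observed pattern itself as one realization.
    return observed, (1 + n_low) / (n_sim + 1), (1 + n_high) / (n_sim + 1)


# Usage: 30 reference points packed on a circle of radius 0.5 around (5, 5).
refs = [(5 + 0.5 * math.cos(k), 5 + 0.5 * math.sin(k)) for k in range(30)]
observed, p_low, p_high = monte_carlo_p_values((5, 5), refs, 1.0, (0, 0, 10, 10))
```

In this toy case the primary point has far more reference neighbors than a random scattering would give, so `p_high` falls to the smallest value attainable with 199 simulations. A fuller treatment would sample the null model inside the actual support domain and derive the radius from the triangulation.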
