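Abstract
Traditional scan statistic methods are constrained by the shape of the scan window when mining spatio-temporal abnormal cluster patterns, so they cannot accurately capture the shape of the cluster region. This paper proposes stAntScan, an improved method for mining irregularly shaped spatio-temporal abnormal cluster patterns. The method first builds a spatio-temporal adjacency matrix from spatio-temporal neighbor cells in 26 directions. It then improves the ant colony optimization scan statistic method so that it can scan spatio-temporal regions in large three-dimensional data sets. Experiments on simulated data and on real Weibo check-in data show that stAntScan effectively identifies irregularly shaped abnormal clusters in space and time, and that it is more accurate than the classical SaTScan method.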
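A minimal sketch of the 26-direction spatio-temporal neighborhood, assuming the study area and period are discretized into a regular grid of nx × ny spatial cells and nt time slices. The grid layout, the cell-numbering scheme and the function names `cell_id` and `build_st_adjacency` are illustrative assumptions, not taken from the paper. Two cells are treated as neighbors when they differ by at most one step along each of the x, y and t axes, which is the 3 × 3 × 3 cube around a cell minus the cell itself, i.e. 26 offsets. A sparse adjacency list is used instead of a dense matrix so that memory grows with the number of cells rather than with its square.

```python
from itertools import product


def cell_id(x, y, t, nx, ny):
    """Flatten (x, y, t) grid coordinates into a single integer id (hypothetical scheme)."""
    return (t * ny + y) * nx + x


def build_st_adjacency(nx, ny, nt):
    """Return a sparse adjacency list: cell id -> sorted list of neighbor ids.

    Two cells are neighbors if they differ by at most one step along each of
    x, y and t: the 3x3x3 cube minus its center, i.e. 26 offsets.
    """
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    adj = {}
    for t, y, x in product(range(nt), range(ny), range(nx)):
        nbrs = []
        for dx, dy, dt in offsets:
            xx, yy, tt = x + dx, y + dy, t + dt
            # Keep only offsets that stay inside the grid.
            if 0 <= xx < nx and 0 <= yy < ny and 0 <= tt < nt:
                nbrs.append(cell_id(xx, yy, tt, nx, ny))
        adj[cell_id(x, y, t, nx, ny)] = sorted(nbrs)
    return adj


adj = build_st_adjacency(3, 3, 3)
print(len(adj[cell_id(1, 1, 1, 3, 3)]))  # 26 (interior cell)
print(len(adj[cell_id(0, 0, 0, 3, 3)]))  # 7 (corner cell)
```

Interior cells receive all 26 neighbors, while cells on the faces, edges and corners of the grid receive fewer. The relation is symmetric, so the list can be converted into a symmetric adjacency matrix if a matrix form is required.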
The spatio-temporal abnormal cluster pattern is an important spatial point pattern; it can reflect the distribution and evolution of spatio-temporal events in a timely and accurate way. Earlier research has verified that scan-statistic-based clustering methods are very effective in detecting spatial and spatio-temporal abnormal cluster patterns. However, because the shape of the scan window is fixed, traditional scan-statistic-based clustering methods are limited in obtaining the exact shape and size of a cluster. This paper proposes stAntScan, an improved algorithm for mining irregularly shaped spatio-temporal abnormal cluster patterns. The algorithm first constructs a spatio-temporal neighborhood matrix from a newly defined set of spatio-temporal neighbor cells in 26 directions. It then adapts the ant colony optimization based method to spatio-temporal scanning of large three-dimensional data sets. Finally, Monte Carlo simulation is used to test the significance of the clusters. Experimental results on both simulated data and real Weibo check-in data demonstrate the efficiency and accuracy of stAntScan in mining irregularly shaped spatio-temporal abnormal cluster patterns. Compared with the classical SaTScan, it obtains much better results in finding the exact shape and size of clusters.
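The Monte Carlo significance test can be sketched as follows, assuming Kulldorff's Poisson likelihood ratio as the scan statistic (the abstract does not state which probability model stAntScan uses). To keep the example short, the ant colony search over the adjacency matrix is replaced by an explicit list of candidate zones; in stAntScan the zones would instead be grown by the ants. The function names `poisson_llr`, `max_llr` and `monte_carlo_p`, the uniform baseline and the toy counts are all hypothetical. Each replicate redistributes the observed total of cases over the cells in proportion to their expected counts, reruns the same zone search, and the p-value is the rank of the observed maximum among the replicates.

```python
import math
import random


def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for a zone with c observed and
    e expected cases out of `total` (expected counts are assumed to sum to total).
    Only zones with an excess of cases (c > e) are scored; others get 0."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:  # the outside term vanishes when every case is inside the zone
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def max_llr(counts, expected, zones):
    """Best LLR over the candidate zones (each zone is a list of cell indices)."""
    total = sum(counts)
    best, best_zone = 0.0, None
    for zone in zones:
        c = sum(counts[i] for i in zone)
        e = sum(expected[i] for i in zone)
        llr = poisson_llr(c, e, total)
        if llr > best:
            best, best_zone = llr, zone
    return best, best_zone


def monte_carlo_p(counts, expected, zones, replicates=999, seed=0):
    """Return (most likely zone, its LLR, Monte Carlo p-value)."""
    rng = random.Random(seed)
    total = sum(counts)
    observed, zone = max_llr(counts, expected, zones)
    cells = range(len(counts))
    exceed = 0
    for _ in range(replicates):
        # Null hypothesis: cases fall into cells proportionally to expected counts.
        sim = [0] * len(counts)
        for i in rng.choices(cells, weights=expected, k=total):
            sim[i] += 1
        # The replicate must repeat exactly the same search as the observed data.
        if max_llr(sim, expected, zones)[0] >= observed:
            exceed += 1
    return zone, observed, (exceed + 1) / (replicates + 1)


# Toy example on six cells in a row; candidate zones are all contiguous runs.
counts = [2, 3, 2, 15, 14, 2]
total = sum(counts)
expected = [total / len(counts)] * len(counts)  # uniform baseline (assumption)
zones = [list(range(i, j)) for i in range(6) for j in range(i + 1, 7)]
zone, llr, p = monte_carlo_p(counts, expected, zones)
print(zone, round(llr, 2), p)
```

Because every replicate repeats the full search, the test costs roughly (replicates + 1) scans, which is one reason an efficient search strategy matters on large three-dimensional grids.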