Research on Spatial Outlier Mining Techniques
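Abstract
A spatial outlier is a spatial object whose non-spatial attribute values differ markedly from those of the other spatial objects in its spatial neighborhood. Spatial outlier mining is an important branch of spatial data mining and can reveal significant phenomena in applications such as traffic control, remote sensing image analysis, weather forecasting, and demographic data analysis.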
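     As sensor technology has advanced, data acquisition devices have become more numerous and more precise and record more items, so data volumes keep growing and dimensionality keeps rising. Existing spatial outlier mining algorithms, however, mainly target small and medium-sized datasets with one dimension or a low-to-moderate number of dimensions, and they adapt poorly to high-dimensional, large-volume data. They also do not fully account for the characteristics of spatial data, so what they find are global outliers rather than spatial outliers in the true sense. These algorithms suffer from heavy user dependence, low detection accuracy, and low mining efficiency. In addition, with the development of network, sensor, and wireless communication technologies, the acquisition, collection, storage, and processing of data have all become distributed. Data mining in distributed environments has therefore attracted attention, but no spatial outlier mining algorithm for distributed environments has yet been reported.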
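     Based on the characteristics of spatial data, this thesis studies methods for partitioning attributes and setting attribute weights, as well as measures of the degree of spatial outlierness, in order to build efficient spatial outlier mining algorithms with high accuracy and low user dependence. To address the restriction of existing algorithms to numerical attributes, non-numerical data are converted to numerical data, which yields a unified algorithm for mixed attributes. For high-dimensional, large-volume data, mining is achieved through pruning strategies, subspace-based outlier mining, and ensemble learning. For spatial outlier mining in distributed environments, a privacy-preserving spatial outlier mining algorithm is proposed. The main contributions of the thesis are as follows: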
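     (1) An attribute-partition approach is proposed for local outlier mining. Local outlier mining generally operates on the full set of attributes, as in the LOF (Local Outlier Factor) method. As a result, determining the local neighborhood is very time-consuming, and because all attributes are treated alike, the accuracy of the outlier measure suffers, which reduces both mining accuracy and speed. The thesis proposes dividing the attributes of a data object into identifier attributes, context attributes, and intrinsic attributes. Identifier attributes identify the object, for example its name. Context attributes determine the environment in which the object is situated, such as geographic location, time, or sequence position, and can be used to determine the neighborhood. Intrinsic attributes are specific to the data object. They comprise behavioral attributes and state attributes, determine the object's behavioral and state characteristics, and can be used to determine its degree of outlierness.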
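     (2) A new measure of the degree of outlierness of spatial data objects is proposed: the Spatial Local Outlier Factor (SLOF), which is based on the characteristics of spatial data. On this basis the spatial outlier mining algorithm ASLOF (Algorithm based on SLOF) is proposed. The attributes of a data object are divided into identifier attributes, spatial attributes, and non-spatial attributes. Spatial attributes are used to determine the spatial neighborhood and build the spatial index, non-spatial attributes are used to determine the object's degree of outlierness, and attribute weights are introduced into the outlier measure to improve its accuracy. Theoretical proof and experimental results show that ASLOF outperforms existing algorithms in mining accuracy, user dependence, and performance.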
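     (3) A unified spatial outlier measure and mining algorithm for mixed attributes is proposed. Starting from the nature of outliers, categorical attributes are converted to numerical ones by counting the frequencies of their values. After attribute weights are set and the attributes are standardized, a unified mining algorithm for spatial outliers over mixed attributes is obtained. Experimental results show that the algorithm computes a unified spatial outlier measure over mixed attributes and mines outliers effectively.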
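     (4) S2OEAHL (Subspace Spatial Outlier Ensemble Algorithm based High-dimensional Large data sets) is proposed, a fast spatial outlier mining algorithm for high-dimensional, large-volume data that integrates subspace outliers through ensemble learning. Because the identifier attributes of many spatial data objects contain a regional identifier for the object's location, a hierarchical coding tree of the objects is built from these regional identifiers. The tree supports partitioning of the data and fast retrieval of objects. By computing upper and lower bounds for each partition and applying a bounding-box test, partitions that clearly contain no outliers are pruned, and partitions that may contain outliers are kept as candidates. This fast pruning of partitions reduces the amount of data to be processed. Candidate partitions are mined with a subspace method. To avoid a search whose size grows exponentially with the number of attribute dimensions, the outlier mining problem for high-dimensional data is handled by mining designated subspaces and fusing the results in an ensemble based on subspace weights. The implementation mines outlier factors in one-dimensional subspaces and uses an optimization method to obtain the weight of each attribute for the object under test. An ensemble fusion function then yields the object's degree of outlierness, and ranking by this degree gives the outliers sought. Theoretical proof and experimental results both demonstrate the effectiveness and computational efficiency of the algorithm.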
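     (5) DPPASLOF (Distributed Privacy Preserving Algorithm based on SLOF), a privacy-preserving spatial outlier mining algorithm for distributed environments, is proposed. The algorithm exploits the locality of spatial data and the ability of each data party to participate actively, and it uses spatial indexing and privacy-preserving protocols to improve search capability and privacy protection. The security, computational efficiency, and low communication cost of the algorithm are proved theoretically.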
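Contributions (1) and (2) separate the attributes that define an object's neighborhood from the attributes that define its degree of outlierness. The Python sketch below illustrates only that separation. It is not the thesis's SLOF definition, which the abstract does not give. It assumes a k-nearest-neighbor spatial neighborhood, a brute-force search in place of a spatial index, and a score equal to the weighted absolute difference between an object's non-spatial values and the mean of its neighbors' values. The function names, the parameter `k`, and the sample data are all hypothetical.

```python
import math


def spatial_knn(coords, i, k):
    """Return indices of the k objects nearest to object i (spatial attributes only)."""
    dists = [(math.dist(coords[i], coords[j]), j)
             for j in range(len(coords)) if j != i]
    dists.sort()  # ties are broken by index
    return [j for _, j in dists[:k]]


def weighted_outlier_scores(coords, values, weights, k):
    """Score each object by the weighted absolute difference between its
    non-spatial values and the mean of its spatial neighbors' values.
    This scoring rule is an assumption made for illustration."""
    scores = []
    for i in range(len(coords)):
        nbrs = spatial_knn(coords, i, k)
        score = 0.0
        for d, w in enumerate(weights):
            nbr_mean = sum(values[j][d] for j in nbrs) / len(nbrs)
            score += w * abs(values[i][d] - nbr_mean)
        scores.append(score)
    return scores


# Hypothetical data: five stations on a line, one non-spatial attribute each.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
values = [[20.0], [21.0], [35.0], [21.0], [20.0]]

scores = weighted_outlier_scores(coords, values, weights=[1.0], k=2)
top = max(range(len(scores)), key=scores.__getitem__)
print(top, scores)
```

In this toy data the middle station has the largest score, because its value differs most from the values of its two spatial neighbors. Its spatial coordinates are used only to pick those neighbors and never enter the score.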
A spatial outlier is a spatially referenced object whose non-spatial attribute values differ significantly from those of its neighborhood. Spatial outlier mining is an important branch of spatial data mining. It can reveal important phenomena in applications such as traffic control, remote sensing image analysis, weather forecasting, and demographic data analysis.
     With the development of sensor technology, data acquisition equipment is growing in number and precision and collects more and more items, so the amount of data and its dimensionality keep increasing. However, existing spatial outlier mining algorithms mainly target small and medium-sized datasets that are one-dimensional or low-dimensional, and they are difficult to adapt to large, high-dimensional data. They also do not fully consider the characteristics of spatial data, so the objects they mine are global outliers rather than true spatial outliers. Their disadvantages are high user dependency, low detection accuracy, and low mining efficiency. In addition, with the development of network, sensor, and wireless communication technology, the acquisition, collection, storage, and processing of data have become decentralized, so data mining in distributed environments has also drawn attention. However, no spatial outlier mining algorithm for distributed environments has been reported.
     According to the characteristics of spatial data, this thesis studies methods for attribute partition and weight setting and the measurement of the spatial outlier score, with the aim of achieving high-performance spatial outlier mining algorithms that have high mining precision and low user dependency.
     Because existing algorithms are mainly limited to numerical data, non-numerical data are transformed into numerical data to obtain a unified algorithm for mixed attributes. For large, high-dimensional data sets, a pruning strategy, subspace-based outlier mining, and ensemble learning are used. For spatial outlier mining in distributed environments, a privacy-preserving spatial outlier mining algorithm is proposed. The main contributions of the thesis are as follows:
     (1) A method based on attribute division is proposed to solve the problem of local outlier mining. Local outlier mining generally uses all attribute dimensions, as in the LOF (Local Outlier Factor) method. As a result, determining the local neighborhood is very time-consuming, and because all attribute dimensions are treated as equivalent, the accuracy of the outlier score is affected, as are the accuracy and speed of mining. The attributes of a data object are categorized as ID attributes, context attributes, and inherent attributes. ID attributes mark the data object, for example by its name. Context attributes determine the environment of the object, such as location, time, and sequence, and can be used to identify the neighborhood. Inherent attributes are the attributes unique to the data object. They include behavior attributes and status attributes, determine the behavior and status characteristics of the object, and can be used to determine its outlier score.
     (2) A new method for measuring the spatial outlier score of data objects is proposed, namely SLOF (Spatial Local Outlier Factor), which is based on the characteristics of spatial data. The spatial outlier mining algorithm ASLOF (Algorithm based on SLOF) is also proposed. The attributes of a data object are categorized as ID attributes, spatial attributes, and non-spatial attributes. Spatial attributes are used to determine the spatial neighborhood and to establish the spatial index, non-spatial attributes are used to determine the spatial outlier score, and attribute weights are introduced into the outlier score to improve measurement accuracy. Theoretical analysis and experimental results show that the proposed ASLOF algorithm outperforms existing algorithms in mining accuracy, user dependency, and efficiency.
     (3) A unified measurement of the spatial outlier score and a mining algorithm for mixed attributes are proposed. Starting from the nature of outliers, categorical attributes are transformed into numeric attributes by counting their frequencies. After weight setting and standardization of the attributes, a unified spatial outlier mining algorithm based on mixed attributes is obtained. The experimental results show that it can effectively compute a unified spatial outlier score over mixed attributes and mine outliers.
     (4) The subspace spatial outlier ensemble algorithm based high-dimensional large data sets (S2OEAHL) is proposed. Because the ID attributes of many spatial data objects contain a geographical identity, a hierarchy coding tree of the objects is constructed from the geographical identities. Based on the tree, the data are divided into partitions and objects can be searched rapidly. By calculating the upper and lower bounds of each partition and applying the minimum bounding rectangle (MBR) method, partitions that obviously contain no outliers are cut, and partitions that may contain outliers are kept as candidate partitions. This rapid pruning reduces the amount of data to be processed. The candidate partitions are mined with a subspace method. To avoid a search whose size grows exponentially with the number of attribute dimensions, mining of specified subspaces and ensemble learning based on subspace weights are used to address outlier mining in high-dimensional data. The algorithm mines outlier factors in one-dimensional subspaces and uses an optimization method to obtain the weights of the attributes of the detected object. On this basis, the outlying-ness of each data object is measured by fusing the outlier factors from different subspaces with a combination function, and the outliers are obtained by ranking objects on this measure. Theoretical analysis and experimental results show the effectiveness of the algorithm and its high computational efficiency.
     (5) DPPASLOF (Distributed Privacy Preserving Algorithm based on SLOF), a privacy-preserving spatial outlier mining algorithm for distributed environments, is proposed. The algorithm uses the locality of spatial data and the active participation of every data holder, together with spatial index technology and privacy-preserving protocols, to improve search ability and privacy preservation. Theoretical analysis shows the security of the algorithm, its computational efficiency, and its low communication cost.
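Contribution (3) turns categorical attributes into numeric ones by counting frequencies and then standardizes the attributes. The abstract does not give the exact transformation, so the following Python sketch rests on stated assumptions: each category is replaced by its relative frequency in the column, and min-max scaling stands in for the standardization step. The function names and the sample records are hypothetical.

```python
from collections import Counter


def frequency_encode(column):
    """Replace each categorical value with its relative frequency in the column."""
    counts = Counter(column)
    n = len(column)
    return [counts[v] / n for v in column]


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical mixed-attribute records: a categorical and a numeric attribute.
land_use = ["farm", "farm", "farm", "urban", "farm"]
rainfall = [100.0, 110.0, 105.0, 300.0, 95.0]

encoded = frequency_encode(land_use)  # rare categories receive small values
table = list(zip(min_max_scale(encoded), min_max_scale(rainfall)))
print(table)
```

After this step both attributes are numeric and on the same scale, so a single numeric outlier score can be applied to them. The rare category `urban` ends up at the low end of its column, which is what lets a frequency-based encoding expose unusual categorical values.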
