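Research on Outlier Detection and Search Algorithms for Outlying Paraphrase Spaces

English title: Research on Outlier Detection and Searching Algorithm for Outlying Paraphrase Space
Author: Lin Hai (林海)
Degree level: Doctoral
Discipline: Computer Science and Technology
Keywords (Chinese original): outlier mining; spectral clustering; outlying behavior subspace; outlying paraphrase space
English keywords: Outlier mining; Spectral Clustering; Outlying behavior subspace; Outlying Paraphrase Space
Degree year: 2012
Supervisor: Zhu Qingsheng (朱庆生)
Discipline code: 0812
Degree-granting institution: Chongqing University
Submission date: 2012-10-01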

Abstract

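An outlier in a dataset is a data point that deviates from the regular data objects and behaves as if it had been produced by a completely different generation mechanism. Outliers may carry important information: in domains such as credit card fraud, telecommunications misuse, and network intrusion they are the main object of data analysis, while in research fields such as disease diagnosis and astronomical observation anomalous objects may offer new perspectives and lead to new theories or applications. Outlier mining applies statistics, machine learning, intelligent computing, visualization, and related techniques to discover the outliers in a dataset so that users can analyze and study them.
     Outlier mining has significant academic value and broad application prospects. Faced with increasingly complex large-scale, high-dimensional datasets, discovering and handling abnormal behavior quickly and effectively is a challenging problem.
     Cluster structure is a common form that data take on as a result of the process that generates them, and different clusters show clearly distinct characteristics. Compared with traditional clustering algorithms, spectral clustering can cluster sample spaces of arbitrary shape and obtains the globally optimal solution of its relaxed graph-partitioning problem, which is why it has been widely applied in recent years.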
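To make the spectral clustering idea concrete, the following is a minimal sketch of the standard normalized spectral clustering algorithm of Ng, Jordan and Weiss, using one global Gaussian scale `sigma`. It is not the improved algorithm of this dissertation, which replaces the global scale with a density-dependent adaptive neighbor size; the function names, the fixed `sigma`, and the small k-means helper are illustrative assumptions.

```python
import numpy as np


def kmeans(Y, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialisation (illustrative helper)."""
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(1, k):
        # next center: the row farthest from every center chosen so far
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, k, sigma=1.0):
    """Normalized spectral clustering (Ng-Jordan-Weiss) with a global Gaussian scale."""
    X = np.asarray(X, dtype=float)
    # 1. Gaussian affinity matrix with a zero diagonal
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # 2. Symmetric normalisation L = D^{-1/2} A D^{-1/2}
    #    (assumes no isolated point, i.e. every row sum of A is > 0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # 3. Eigenvectors of the k largest eigenvalues (eigh sorts them ascending)
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, -k:]
    # 4. Normalise each row of the embedding to unit length
    Y = U / np.linalg.norm(U, axis=1, keepdims=True)
    # 5. Cluster the rows of Y; row i's label is object i's cluster
    return kmeans(Y, k)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
labels = spectral_clustering(X, 2, sigma=1.0)
```

The fixed `sigma` is exactly the weak point that adaptive, density-aware scaling is meant to address: a single scale cannot suit dense and sparse regions at the same time.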
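     The cloud model, built on both probability theory and fuzzy mathematics, is a model for converting between a qualitative concept and its quantitative representation. Its normal cloud model brings distributions encountered in practice that do not strictly satisfy the definition of a normal distribution into the broader category of the pan-normal distribution.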
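The conversion between a concept and its quantitative "cloud drops" can be sketched with the forward and backward normal cloud generators commonly used in the cloud model literature, where a concept is described by its expectation Ex, entropy En and hyper-entropy He. This minimal sketch assumes the usual formulation: forward, En' ~ N(En, He²), x ~ N(Ex, En'²), certainty μ = exp(−(x − Ex)² / (2En'²)); backward (without certainty degrees), Ex = mean, En = √(π/2)·mean|x − Ex|, He = √(S² − En²). The function names are hypothetical.

```python
import numpy as np


def forward_normal_cloud(Ex, En, He, n, seed=0):
    """Forward normal cloud generator: n drops and their certainty degrees."""
    rng = np.random.default_rng(seed)
    En_prime = rng.normal(En, He, size=n)                   # En' ~ N(En, He^2)
    x = rng.normal(Ex, np.abs(En_prime))                    # x ~ N(Ex, En'^2)
    mu = np.exp(-(x - Ex) ** 2 / (2.0 * En_prime ** 2))     # certainty degree of each drop
    return x, mu


def backward_normal_cloud(x):
    """Backward generator without certainty degrees: estimate (Ex, En, He)."""
    x = np.asarray(x, dtype=float)
    Ex = x.mean()
    En = np.sqrt(np.pi / 2.0) * np.mean(np.abs(x - Ex))
    S2 = x.var(ddof=1)                                      # sample variance
    He = np.sqrt(max(S2 - En ** 2, 0.0))                    # clipped: noise can make S2 < En^2
    return Ex, En, He


drops, mu = forward_normal_cloud(0.0, 1.0, 0.1, 100_000, seed=42)
Ex_hat, En_hat, He_hat = backward_normal_cloud(drops)
```

Running the backward generator on drops from the forward generator recovers Ex and En closely; the He estimate is much noisier because it is a small difference of two larger quantities.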
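     An outlier exhibits outlying behavior because the values it takes on certain attributes, or on certain combinations of attributes, differ from those of the regular data. Once outliers have been detected, analyzing and explaining their outlying behavior not only deepens our understanding of the dataset but can also improve the effectiveness and efficiency of outlier detection on newly generated data.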
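One simple way to see which attributes drive an outlier's behavior is to score each attribute separately. The sketch below assumes one normal cloud per attribute, estimates its (Ex, En) with the backward-generator formulas, and takes 1 minus the certainty degree as the per-attribute outlierness. Averaging over attributes to obtain an overall score is an assumption made for illustration, not necessarily the aggregation used in the dissertation; the function names are hypothetical.

```python
import numpy as np


def attribute_outlierness(X):
    """Per-attribute outlierness: 1 - certainty degree under a normal cloud fitted to each attribute."""
    X = np.asarray(X, dtype=float)
    Ex = X.mean(axis=0)                                        # expectation of each attribute
    En = np.sqrt(np.pi / 2.0) * np.abs(X - Ex).mean(axis=0)    # entropy (backward-generator estimate)
    En = np.where(En > 0, En, 1e-12)                           # guard against constant attributes
    certainty = np.exp(-(X - Ex) ** 2 / (2.0 * En ** 2))       # certainty degree of each value
    return 1.0 - certainty                                     # shape (n_objects, n_attributes)


def overall_outlierness(O):
    """Assumed aggregation: unweighted mean of the per-attribute scores."""
    return O.mean(axis=1)


rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (200, 3))
X[0] = [8.0, 8.0, 8.0]   # outlying on every attribute
X[1] = [0.0, 8.0, 0.0]   # outlying on attribute 1 only
O = attribute_outlierness(X)
scores = overall_outlierness(O)
```

Here row 0 receives the highest overall score, while the row of `O` for object 1 points at attribute 1 as the source of its outlying behavior.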
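     This dissertation applies spectral clustering and cloud model theory to discovering outliers in a dataset, and analyzes and explains the outlying behavior of those outliers. The main work and results are as follows:
     (1) For clustering complex datasets, an improved spectral clustering algorithm is proposed. It introduces a density-dependent adaptive neighborhood-size parameter to compute the similarity between objects more accurately, which yields more accurate clusters. The stable clusters obtained with this algorithm allow outliers to be detected effectively.
     (2) A spectral-clustering-based algorithm for cluster analysis and outlier detection is proposed. It determines the optimal number of clusters automatically by computing a dynamic validity index for different numbers of clusters, then computes the local outlier factor of each member of a "small cluster" and uses that factor to decide whether the member is an outlier.
     (3) By combining the cloud-model notion of a cloud drop's membership degree in a cloud with the notion of a data point's outlierness in a dataset, a cloud-model-based outlier detection algorithm is proposed. It first computes the outlierness of each data object on each attribute, then its outlierness over the full attribute set, and finally identifies the outliers in the dataset by the magnitude of that outlierness.
     (4) Current outlier mining research focuses mainly on how to detect outliers and neglects analysis of why outliers arise and how they behave. The dissertation therefore proposes an algorithm for finding an outlier's outlying behavior subspaces and key outlying behavior subspaces. It also introduces the concepts of "strong outlying paraphrase space" and "weak outlying paraphrase space" and proposes an algorithm for detecting the outlying paraphrase spaces of outliers.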
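Step (2) flags members of "small clusters" by their local outlier factor. A minimal sketch, assuming cluster labels are already available (from spectral clustering or any other method), uses the LOF definition of Breunig et al. with exactly k nearest neighbors (ties broken arbitrarily). The size ratio of 0.1 that defines a "small" cluster and the LOF threshold of 1.5 are hypothetical parameters, not values from the dissertation, and the dynamic validity index used to choose the number of clusters is not shown.

```python
import numpy as np


def lof_scores(X, k=5):
    """Local outlier factor of every object, using its k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)                    # an object is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]              # indices of the k nearest neighbours
    k_dist = D[np.arange(n), nn[:, -1]]            # k-distance of each object
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
    reach = np.maximum(k_dist[nn], D[np.arange(n)[:, None], nn])
    lrd = 1.0 / np.maximum(reach.mean(axis=1), 1e-12)   # local reachability density
    return lrd[nn].mean(axis=1) / lrd                    # LOF(p) = mean lrd(o) / lrd(p)


def small_cluster_outliers(X, labels, k=5, size_ratio=0.1, lof_threshold=1.5):
    """Members of clusters smaller than size_ratio * n whose LOF exceeds lof_threshold."""
    labels = np.asarray(labels)
    lof = lof_scores(X, k)
    ids, counts = np.unique(labels, return_counts=True)
    small = ids[counts < size_ratio * len(labels)]
    candidates = np.flatnonzero(np.isin(labels, small))
    return [int(i) for i in candidates if lof[i] > lof_threshold]


rng = np.random.default_rng(0)
blob = rng.normal(0.0, 0.5, (50, 2))
tiny = np.array([[10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
X = np.vstack([blob, tiny])
labels = np.array([0] * 50 + [1] * 3)
found = small_cluster_outliers(X, labels)
```

Restricting the LOF test to members of small clusters keeps the expensive density comparison focused on the objects most likely to be outliers.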
Outliers in a dataset are data points that deviate from the regular data points and appear to have been produced by a generation mechanism completely different from that of the conventional data. Some outliers are caused by errors during data formation; such outliers contribute nothing to describing the dataset as a whole and should therefore be removed during data preprocessing. Other outliers, however, may contain very important information: they are the main target of data analysis in fields such as credit card fraud, communications misappropriation, and network intrusion, and anomalous objects can offer new perspectives in scientific research such as disease diagnosis and astronomical observation, leading to new theories or new applications. Outlier mining discovers the outliers in a dataset for analysis and processing by applying statistics, machine learning, intelligent computing, visualization, and other techniques.
     Because outliers may contain important knowledge, outlier mining has a wide range of applications, and research on it has significant academic and practical value. However, faced with increasingly complex, large-scale, high-dimensional datasets, identifying and handling abnormal behavior quickly and effectively remains a challenging problem.
     Cluster structure is a common form in datasets produced by a given generating mechanism, and distinct characteristic differences exist between clusters. We focus on finding anomalies in datasets with clustering methods and on analyzing and explaining the outlying behavior of the outliers found. The main contributions are as follows.
     (1) The basic theory and classic algorithms of spectral clustering are analyzed and studied thoroughly; spectral methods make it possible to cluster complex datasets. A modified algorithm is proposed that introduces a density-related adaptive neighbor-size parameter to compute the similarity between objects more accurately. The algorithm also selects the optimal number of clusters automatically by computing dynamic validity indices for different numbers of clusters. The stable clusters it produces are the precondition for effective outlier detection.
     (2) Spectral clustering is applied to outlier mining, and a spectral-clustering-based procedure for analyzing the structure of unknown datasets and detecting outliers is proposed. It first clusters the dataset with the spectral clustering algorithm, then computes the outlying factor of objects in "small" clusters, and finally decides from these values whether the objects are outliers.
     (3) The basic theory and related algorithms of the cloud model are analyzed and studied, and cloud model theory is applied to outlier mining. By combining the cloud-model concept of a cloud drop's membership in a cloud with the outlier-mining concept of an outlier's outlierness in a dataset, a cloud-model-based algorithm for detecting outliers and their outlying behavior subspaces is proposed.
     (4) Current research in outlier mining concentrates mainly on finding outliers, while research into the causes of outliers and the analysis and explanation of their outlying behavior is limited. An algorithm for searching outlying behavior subspaces and key outlying behavior subspaces is therefore proposed. The notion of the outlying paraphrase space of a single outlier is discussed, and on that basis the concepts of "strong outlying paraphrase space" and "weak outlying paraphrase space" for a class of outliers are proposed. Finally, a simple algorithm for generating the strong and weak outlying paraphrase spaces of outliers is proposed.
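The idea of an outlying behavior subspace in contribution (4) can be illustrated by a brute-force search: for a given outlier, enumerate attribute subsets in order of increasing size and keep the minimal ones in which the point stands out. This sketch assumes a k-nearest-neighbor distance score on z-scored attributes and a "mean plus three standard deviations" rule for standing out; both are illustrative choices, the function names are hypothetical, and it does not implement the dissertation's strong or weak outlying paraphrase spaces.

```python
import itertools

import numpy as np


def knn_score(Z, k):
    """Distance from each object to its k-th nearest neighbour in the subspace Z."""
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)
    return np.sort(D, axis=1)[:, k - 1]


def minimal_outlying_subspaces(X, p, k=5, z_thresh=3.0, max_dim=2):
    """Minimal attribute subsets (up to max_dim) in which object p stands out."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    X = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)   # make attributes comparable
    found = []
    for dim in range(1, max_dim + 1):
        for S in itertools.combinations(range(X.shape[1]), dim):
            if any(set(f) <= set(S) for f in found):
                continue                       # a subset is already outlying: S is not minimal
            s = knn_score(X[:, list(S)], k)
            others = np.delete(s, p)
            if s[p] > others.mean() + z_thresh * others.std():
                found.append(S)
    return found


rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (100, 3))
X[0] = [0.0, 0.0, 9.0]                         # object 0 is outlying on attribute 2 only
subspaces = minimal_outlying_subspaces(X, 0)
```

The search is exponential in the number of attributes, which is why `max_dim` caps the subspace size; a practical algorithm would need pruning or heuristics beyond this sketch.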

