Research on Anomaly Detection Methods and Their Key Technologies
Abstract
Anomaly detection is the task of detecting and discovering anomalous data patterns in observed data, i.e., patterns that do not conform to normal (expected) behavior. Depending on the application domain, such patterns are also called wild points, discordant points, novelties, outliers, or contaminants. In recent years anomaly detection has been widely applied to fault diagnosis, disease detection, intrusion detection, credit card (or insurance) fraud detection, identity recognition, and related areas. In these domains an anomalous pattern often carries significant behavioral information, frequently harmful or even fatal: abnormal network traffic (behavior) on the Internet may signal the leakage of sensitive information from an attacked host, and credit card fraud causes large economic losses. Research on anomaly detection therefore has great theoretical and practical value; it has attracted wide attention and become a very active and popular research direction in pattern recognition.
     What makes the anomaly detection task special is that data patterns conforming to expected (normal-class) behavior are available, while patterns violating expected (anomaly-class) behavior are rare or unknown. The extreme imbalance between the two classes of observations (far fewer anomalous samples than normal ones) makes anomaly detection very difficult, so current research concentrates on unsupervised learning frameworks and on supervised methods that exploit a very small number of labeled anomalous samples. This thesis investigates the principles of various anomaly detection methods, their robustness, and the embedding of prior information. The main contributions are as follows:
     1. We propose One-cluster Clustering based Data Description (OCCDD). It uses the single-cluster possibilistic clustering algorithm P1M, i.e., Possibilistic C-Means (PCM) with C=1, to compute sample weights, and then solves for the enclosing hypersphere by weighted averaging. This overcomes the sensitivity of the hypersphere center to wild points in Support Vector Data Description (SVDD), whose minimax estimation of the sphere enclosing most normal samples is not robust, and avoids SVDD's high training cost of solving a quadratic program. We also prove theoretically that P1M possesses a global optimality property that PCM with C>1 generally lacks. Finally, motivated by the multi-view structure that observed data naturally exhibit in applications such as text classification, we extend OCCDD to a multi-view anomaly detection method: unlike separate training on each view, it learns from all views simultaneously so that the views reinforce one another.
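     To make the OCCDD construction concrete, here is a minimal sketch, assuming a standard PCM typicality update with fuzzifier m and a heuristic scale parameter eta; the function names, the eta heuristic, and the quantile radius rule are illustrative assumptions rather than the thesis's exact formulation.

    import numpy as np

    def p1m_weights(X, m=2.0, n_iter=50, tol=1e-6):
        # P1M = PCM with C=1: alternate a typicality update and a
        # weighted-mean center update until the center stabilizes.
        v = X.mean(axis=0)                            # initial center
        for _ in range(n_iter):
            d2 = ((X - v) ** 2).sum(axis=1)           # squared distances
            eta = d2.mean() + 1e-12                   # scale (heuristic choice)
            u = 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))  # PCM typicality
            v_new = (u[:, None] ** m * X).sum(axis=0) / (u ** m).sum()
            if np.linalg.norm(v_new - v) < tol:
                v = v_new
                break
            v = v_new
        return u, v

    def occdd_fit(X, target_fraction=0.95):
        # OCCDD-style description: weighted-average center plus a radius
        # enclosing a fraction of the training data (hypothetical rule).
        u, center = p1m_weights(X)
        d = np.sqrt(((X - center) ** 2).sum(axis=1))
        return center, np.quantile(d, target_fraction)

    # Usage: points outside the ball are flagged as anomalies.
    X = np.random.randn(200, 2)
    center, radius = occdd_fit(X)
    is_anomaly = np.sqrt(((X - center) ** 2).sum(axis=1)) > radius

Because the center is a typicality-weighted average rather than a minimax solution, wild points receive small weights and barely shift the center, which is the robustness argument made above.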
     2. We propose an AUC (Area Under the ROC Curve) regularized SVDD for the situation where anomalous samples lie around the normal samples. Exploiting the insensitivity of the AUC measure to class distribution and misclassification costs, it embeds the AUC measure into the SVDD objective as a regularization term, so that the volume of the minimum enclosing ball and the AUC performance are optimized simultaneously; this addresses the extremely imbalanced setting with very few anomalous samples, which ordinary anomaly detectors cannot handle. To counter the high training complexity introduced by AUC regularization, we further propose two acceleration schemes.
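     One plausible way to write such a combined objective, as a sketch only (the pairwise hinge surrogate of the Wilcoxon-Mann-Whitney statistic and the trade-off parameter \lambda are assumptions; the thesis's exact formulation may differ): with target set T, labeled outlier set O, center a, radius R, and hinge loss \ell(t) = \max(0, 1 - t),

    \min_{R,\,a,\,\xi} \; R^2 \;+\; C \sum_{i \in T} \xi_i
        \;+\; \lambda \sum_{i \in T} \sum_{j \in O}
              \ell\!\left( \|x_j - a\|^2 - \|x_i - a\|^2 \right)
    \quad \text{s.t.} \quad \|x_i - a\|^2 \le R^2 + \xi_i,\;\; \xi_i \ge 0,\;\; i \in T.

The double sum is a convex surrogate of 1 - AUC, since the WMW estimate of AUC counts the pairs with \|x_j - a\| > \|x_i - a\|; its O(|T||O|) pairwise terms are also what makes training expensive and motivates acceleration schemes.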
     3. We propose a design framework for manifold learning algorithms, mXXX ≈ ISOMAP + XXX, where XXX is any learning algorithm based on Euclidean distance. The framework only requires approximating geodesic distances in the original space by Euclidean distances in the ISOMAP-reduced space, without performing the ISOMAP reduction explicitly; the original algorithm XXX then runs in the implicitly reduced space, embedding the manifold structure. Since Euclidean distance cannot faithfully capture the geometric structure of data lying on or near a low-dimensional nonlinear manifold, we instantiate the framework with SVDD to obtain a manifold-embedded SVDD (mSVDD), with the following advantages: (1) by approximating the Euclidean distances of the ISOMAP-reduced space, it solves the problem that a geodesic-distance-based SVDD cannot be optimized directly; (2) it requires neither ISOMAP's Multidimensional Scaling (MDS) step nor a choice of the embedded manifold dimension; (3) unlike SVDD in the original space (based on Euclidean distance), mSVDD is based on geodesic distance and implicitly performs ISOMAP, thereby achieving manifold embedding.
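     A minimal sketch of the framework's shared first step, assuming the standard ISOMAP graph construction (the function name and the Gaussian-kernel conversion are illustrative; a kernel built from geodesic distances is not guaranteed positive semidefinite without a correction such as robust kernel Isomap):

    import numpy as np
    from sklearn.neighbors import kneighbors_graph
    from scipy.sparse.csgraph import shortest_path

    def geodesic_distances(X, n_neighbors=10):
        # Approximate geodesic distances as shortest paths on a kNN
        # graph (the same graph ISOMAP builds; no MDS step is run).
        G = kneighbors_graph(X, n_neighbors, mode='distance')
        return shortest_path(G, method='D', directed=False)  # Dijkstra

    # mXXX: hand the geodesic distances to any Euclidean-distance-based
    # learner XXX, e.g. through a Gaussian kernel for a kernelized SVDD.
    D = geodesic_distances(np.random.randn(100, 3))
    K = np.exp(-D ** 2 / (2.0 * np.median(D) ** 2))

Because the downstream learner consumes the distances themselves rather than an explicit low-dimensional embedding, the MDS step and the choice of embedding dimension are avoided, matching advantage (2) above.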
     4. We reveal the relationship between support-domain-based anomaly detectors and density estimation. Building on a survey of existing anomaly detection methods, we focus on two support-domain-based one-class classifiers, the One-Class Support Vector Machine (OCSVM) and SVDD, and expose their essential connection to density estimation after Gaussian-kernel kernelization: first, support-domain-based one-class classifiers are unified under the density estimation framework; second, we prove that the density estimator they induce is consistent with the true density, so optimizing these classifiers also reduces the integrated squared error.
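     Schematically, with a Gaussian kernel k_\sigma(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2), the decision functions of OCSVM and SVDD take the form f(x) = \sum_i \alpha_i k_\sigma(x, x_i) with \alpha_i \ge 0, which after normalization is a weighted Parzen-window density estimate, and the consistency claim is measured by the integrated squared error:

    \hat{p}(x) \;=\; \frac{\sum_i \alpha_i \, k_\sigma(x, x_i)}
                          {\sum_i \alpha_i \int k_\sigma(t, x_i) \, dt},
    \qquad
    \mathrm{ISE}(\hat{p}) \;=\; \int \big( \hat{p}(x) - p(x) \big)^2 \, dx.

The precise normalization and the conditions under which optimizing the one-class objective drives the ISE down are as stated in the thesis; the display above is only a schematic of that relation.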