Research on Kernel Functions for Structured Data
Abstract
The kernel function is an important research direction within support vector machines. Although the concept and techniques of kernel functions existed long before statistical learning theory, their first truly successful application in machine learning came with the support vector machine (SVM). It is the combination of SVMs with kernel techniques that has enabled the rapid development and wide application of kernel-based machine learning, of which the SVM is the representative method. All of the work in this dissertation builds on this combination and covers three aspects: the construction of kernel functions, their implementation, and their application.
The input data of an SVM are generally defined in a vector space, where common kernel functions such as the polynomial kernel and the radial basis function (RBF) kernel perform well. However, many machine learning problems involve data that carry structural information (which we call structured data), such as strings and images. Applying the common kernels to such data is often unsatisfactory, because the structural information is lost when the data are converted into vectors. Many new kernel functions, together with algorithms for computing them, have therefore been proposed for structured data.
This dissertation takes kernel functions for structured data as its research object. It proposes several new kernel functions together with algorithms for computing them, improves the implementation of existing kernels to reduce their computational complexity, and applies several new string kernels to intrusion detection.
(1) Construction of kernel functions. After surveying existing string kernels, this dissertation divides them into two classes: sequence-based and probability-based. Sequence-based string kernels are the more common class and include the gap-weighted kernel and the spectrum kernel. The spectrum kernel ignores the contribution of non-contiguous subsequences, while the gap-weighted kernel penalizes longer subsequences; in some applications, however, longer subsequences should be rewarded rather than penalized. Based on this analysis, the dissertation proposes a sequence-based string kernel called the length-weighted kernel, in which longer subsequences receive larger weights. A variant, the length-weighted once kernel, is also proposed, in which each repeated subsequence is counted only once. Sequence-based string kernels only accumulate the contributions of subsequences that match in both strings and ignore the dependencies between successive characters. To capture these dependencies, we propose a probability-based string kernel derived from Markov models, the mixed-order Markov kernel.
(2) Implementation of kernel functions. Many algorithms exist for computing string kernels, based on dynamic programming, suffix trees, or suffix kernels. After analyzing the idea of the suffix kernel, this dissertation presents a series of suffix-kernel-based algorithms that can compute the existing gapped kernels as well as the length-weighted kernel proposed here. In addition, we apply bit-parallel techniques to the kernel computation; analysis shows that, under certain conditions, this speeds up the computation of fixed-length length-weighted kernels. To compute the mixed-order Markov kernel quickly, the dissertation stores the strings in a suffix tree and uses its matching statistics, so that the kernel value can be obtained in linear time.
(3) Intrusion detection is an important part of information security. As a classification algorithm, the SVM has been applied to network-based intrusion detection; in host-based intrusion detection, however, the input data are mostly command sequences or system call sequences, for which SVMs with the common RBF or polynomial kernels are ill-suited. For host-based intrusion detection systems, we build a one-class SVM on the training data using string kernels, both existing ones and those proposed in this dissertation, and use it to classify the test data. The experiments show that some of the proposed string kernels are better suited to host-based intrusion detection than the existing string kernels.
Kernel functions are an important idea in machine learning, especially in the support vector machine (SVM). Although kernel functions had been introduced into machine learning before the SVM was proposed, their first truly successful application was the SVM. The development of kernel machine learning rests on the combination of kernel functions and SVMs. This dissertation is built on that combination and covers the construction, implementation, and application of kernel functions.
     In general, an SVM takes a feature-vector representation as its input, for which the usual kernel functions such as the polynomial kernel and the RBF kernel work well. In the real world, however, many machine learning problems involve special data types, such as strings and images. These data carry structural information that may be useful in solving the problem; we call them structured data. The usual kernel functions may not be suitable for problems involving structured data, because some of this structural information is lost in the mapping to a feature vector. Many efforts have therefore been made to construct kernel functions for structured data and to compute them efficiently.
     This dissertation takes kernel functions for strings (string kernels) as its research object and presents some novel string kernels along with algorithms for computing them. Moreover, a bit-parallel technique is introduced to speed up the computation of fixed-length string kernels. Finally, these string kernels are applied to intrusion detection.
     (1) The construction of new string kernels. String kernels can be divided into two kinds: subsequence-based and probability-based. The existing gap-weighted kernels and spectrum kernels belong to the former. Spectrum kernels do not consider the contribution of non-contiguous subsequences, and gap-weighted kernels penalize longer subsequences rather than rewarding them. Observing these disadvantages in certain contexts, we present a subsequence-based string kernel, the length-weighted kernel, in which longer subsequences contribute more to the kernel value. A variant, the length-weighted once kernel, is also proposed, in which each subsequence is counted only once regardless of how many times it occurs in a string. Unlike subsequence-based string kernels, probability-based string kernels take into account the dependencies between the characters of a string. To capture these dependencies, we propose a probability-based string kernel built on a Markov model, called the Markov kernel.
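As a concrete point of reference, the spectrum kernel described above can be sketched in a few lines, together with an illustrative length-weighted variant. The exponential weight `lam ** len(u)` over common contiguous substrings is our own stand-in for the dissertation's weighting scheme, which is not reproduced here; it only illustrates the idea that longer shared fragments contribute more.

```python
from collections import Counter

def spectrum_kernel(s, t, k):
    """p-spectrum kernel: inner product of the k-mer count vectors of s and t."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[u] * ct[u] for u in cs if u in ct)

def length_weighted_kernel(s, t, lam=2.0):
    """Illustrative length-weighted kernel over common contiguous substrings.
    With lam > 1 the weight lam**k grows with substring length k, so longer
    matches are rewarded (the opposite of the gap-weighted kernel's penalty).
    The dissertation's exact weighting may differ.
    """
    total = 0.0
    for k in range(1, min(len(s), len(t)) + 1):
        cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
        total += (lam ** k) * sum(cs[u] * ct[u] for u in cs if u in ct)
    return total
```

With `lam = 1.0` the length-weighted kernel degenerates into a plain sum of all spectrum kernels, which makes the effect of the weight easy to isolate.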
     (2) The implementation of string kernels. Dynamic programming, suffix kernels, and suffix trees are three general approaches to computing string kernels. After analyzing the idea of suffix kernels, we present a series of suffix-kernel-based algorithms that compute the existing gapped kernels as well as the length-weighted kernel and its variant. Moreover, a bit-parallel technique is introduced to accelerate the kernel computation. To compute the Markov kernel, a suffix tree is used to store the strings, and its matching statistics are used in the computation.
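The bit-parallel idea can be illustrated with the classic Shift-And matcher: one machine word tracks every partial match of a pattern at once, so each text character costs a constant number of word operations when the pattern fits in a word. This is a generic sketch of the technique, not the dissertation's algorithm for fixed-length kernels.

```python
def shift_and_count(pattern, text):
    """Count occurrences of `pattern` in `text` with the Shift-And technique.
    Bit i of `state` is set iff pattern[0..i] matches the text ending at the
    current position, so all prefix matches advance in parallel per character.
    """
    m = len(pattern)
    # One bitmask per character: bit i is set iff pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit that marks a complete match
    state, count = 0, 0
    for c in text:
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            count += 1
    return count
```

Counting overlapping occurrences of every fixed-length k-mer this way is the flavor of word-level parallelism that, under the conditions stated above, accelerates fixed-length kernel evaluation.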
     (3) Intrusion detection is an important facet of information security. As a classification algorithm, the SVM has been used widely in network-based intrusion detection. In host-based intrusion detection, by contrast, most of the input data are command sequences or system call sequences, so the usual polynomial and RBF kernels are not suitable. For the host-based setting, we construct a one-class SVM (1-SVM) classifier on the training data by embedding string kernels, and detect intrusions in the test data with this classifier. The experimental results show that the new string kernels outperform the existing string kernels for host-based intrusion detection.
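To sketch how a string kernel plugs into host-based detection, the toy detector below scores a test sequence by its mean normalized spectrum-kernel similarity to attack-free training sequences and thresholds the score. It is a minimal stand-in, not a 1-SVM: a real system would instead pass the same Gram matrix to a one-class SVM solver (e.g. a library 1-SVM accepting a precomputed kernel). The threshold, kernel choice, and labels here are illustrative assumptions.

```python
from collections import Counter

def spectrum_kernel(s, t, k):
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[u] * ct[u] for u in cs if u in ct)

def normalized_kernel(s, t, k):
    # Cosine-style normalization so self-similarity is always 1.
    denom = (spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k)) ** 0.5
    return spectrum_kernel(s, t, k) / denom if denom else 0.0

class KernelAnomalyDetector:
    """Toy one-class detector: a test sequence is flagged anomalous when its
    mean normalized kernel similarity to the (attack-free) training
    sequences falls below a threshold."""
    def __init__(self, threshold=0.2, k=2):
        self.threshold, self.k = threshold, k
        self.train = []

    def fit(self, sequences):
        self.train = list(sequences)
        return self

    def predict(self, seq):
        score = sum(normalized_kernel(seq, t, self.k)
                    for t in self.train) / len(self.train)
        return "normal" if score >= self.threshold else "anomaly"
```

Here each character stands for one system call or shell command; in practice the sequences would be the audit traces of a monitored host.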
引文
[1] Allwein, E.L., Robert E. & Schapire,Y.S. Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers. Journal of Machine Learning Research, 1, 2000, pp.113-141.
    
    [2] Asa, B., David H., Hava T. S.& Vapnik V. Support Vector Clustering. Journal of Machine Learning Research, Volume 2, Volume 2, December 2001, pp.125-137.
    [3] Bach,F.R.& Jordan,M.I.. Kernel Independent Component Analysis. Journal of Machine Learning Research 3,2002, pp.1-48.
    [4] Bach, F. R., Lanckriet, G. R. G., Jordan, M. I. Multiple kernel learning, conic duality, and the SMO algorithm. In 21th International conference on Machine Learning (ICML04), 2004.
    [5] Barzilay O., Brailovsky V.L. On domain knowledge and feature selection using a support vector machine, Pattern Recognition Letters 1999,20:475 - 484.
    [6] Ben-Hur, A., Noble, W.S. Kernel methods for predicting protein-protein interactions. Bioinformatics, 2005,21(1), i38-i46.
    [7] Blumer,A., Ehrenfeucht, A., Haussler, D. & Warmuth, M. K. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4), 1989, pp. 929-965.
    [8] Boser,B.E., Guyan I.M. and Vapnik, V. A training algorithm for optimal margin classifiers. In Housler, D. editors, Proceeding of 5th Annual ACM Workshop on Computational learning Theory, Pittsburg, PA, 1992, pp. 144-152.
    [9] Bredensteiner,E.J.. Bennett,K. P. Multicategory Classification by support vector machines. Computational Optimization and Applications. 12(1/3), 1999, pp.53-79.
    [10] Brown, M., Grundy, W., Lin, D., Cristianini, N., Sugnet, C., Furey, T., Ares, M., & Haussler, D. Knowledge-based analysis of microarray gene expression data by using support vector machines, Proc. Natl. Acad. Sci. 97 (2000) 262-267.
    
    [11] Burges, C.J.C. Simplified Support Vector Decision Rules. 13th International Conference on Machine Learning, 1996, pp. 71-78
    
    [12] Burges,C.J.C. & Scholkopf B. Improving the accuracy and speed of support vector learning machines. In Mozer, M., Jordan & Petsche, T., editors, Advances in Neural Information Processing Systems, 9, Cambridge, MA: MIT Press, 1997, pp.375-381.
    [13] Burges,C.J.C. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 1998, pp.121-167.
    [14] Burges,C.J.C. & David,C.J. Uniqueness theorems for kernel methods. Neurocomputing. 55(1-2), 2003, pp. 187-220.
    [15] Burges,C.J.C. Geometry and invariance in kernel based methods. In Scholkopf, B., Burges, C. J. C, & Smola, A. J. editors, Advances in Kernel Methods - Support Vector Learning, MIT Press, 1999, pp.89-116.
    [16] Campbell,C., Kernel methods: a survey of current techniques. Neurocomputing. 48(1-4), 2002, pp.63-84.
    [17] Cancedda, N., Gaussier, E., Goutte, C., Renders, J-M. Word-Sequence Kernels. Journal of Machine Learning Research, 3 (2003) 1059-1082.
    [18] Cao,L.J. & Tay, F.E.H. Support vector machine with adaptive parameters in financial time series forecasting. Neural Networks, IEEE Transactions on, 14(6), 2003, pp.1506 - 1518.
    [19] Carbonell J. Introduction: Paradigms for machine learning. Artificial intelligence. 40(1), 1989, pp.1-9.
    [20] Carl,G.&Peter,S. Model selection for support vector machine classification. Neurocomputing. 55(1-2), 2003, pp.221-249.
    [21] Cauwenberghs,G. & Poggio,T. Incremental and Decremental Support Vector Machine Learning. In Advances in Neural Information Processing Systems (NIPS'2000), MIT Press, Vol. 13,2001, pp.409-415.
    [22] CeciIio,A., Xavier,P. & Andreu, C. K-SVCR. A support vector machine for multi-class classification. Neurocomputing. 55(1-2), 2003, pp.57-77.
    [23] Chang, C.C., Lin, C.J. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
    [24] Chang, C.C., Lin, C.J. Training v-Support Vector Classifiers: Theory and Algorithms. Neural Computation, 2001,13,2119-2147.
    [25] Chang, W.I., Lawler, E.L. Sublinear approximate string matching and biological applications. Algorithmica, 12(4/5):327-344,1994.
    [26] Chapelle, O., Haffner, P. & Vapnik, V. SVMs for histogram-based image classification. IEEE Transaction on Neural Networks, 1999.
    [27] Chapelle,O., Vapnik,V., Bousquet,O. & Mukherjee,S. Choosing Multiple Parameters for support vector machines. Machine Learning. 46(1/3), 2002, pp.131-159.
    [28] Collins M., Duffy N. Convolution kernels for natural language, In Neural Information processing systems(NIPS01) 2001.
    
    [29] Cortes C, Vapnik V. Support-vector Networks. Machine Learning. 20(3), 1995, pp.273-297.
    [30] Cortes, C, Haffner, P., Mohri, M. Rational Kernels: Theory and Algorithms. Journal of Machine Learning Research, 2004,5(12), 1035 - 1062.
    [31] Cristianini, N., Shawe-Taylor, J. & Campbell, C. Dynamically adapting kernels in support vector machines. In Kearns, M. S., Solla, S. A., and Cohn, D. A., editors, Advances in Neural Information Processing Systems, 11. MIT Press, 1998.
    [32] Cristianini, N. & Shawe-Taylor, J. An Introduction to Support vector machines and other kernel methods. Cambridge University Press, UK, 2000.
    [33] Cuturi, M., Vert, J.P. The context-tree kernel for strings. Neural Networks, 18(2005), 1111-1123.
    [34] Debar, H., Dacier, M., Wepspi, A. A Revised Taxonomy for Intrusion Detection Systems.Technical Report,Computer Science Mathematics, IBM Research, Zurich Research Laboratory, Switzerland, 1999.
    [35] Denning, D. E. An Intrusion Detection Model. IEEE Transactions on Software Engineering, 1987, 13(2): 222-232.
    [36] Domeniconi,C.& Gunopulos,D. Incremental Support Vector Machine Construction. In Proceedings of the First IEEE International Conference on Data Mining (ICDM), 2001.
    [37] Duda,R.O. & Hart.P.E. Pattern Classification . Wiley & Sons. 2001.
    [38] Eisen, M. Spellman,P. Brown,P. Botstein,D. Clustering analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. USA 95 (1998) 14863-14868.
    [39] Feng, L., Guan, X., Guo, S., Gao, Y., Liu, P. Predicting the intrusion intentions by observing system call sequences. Computers & Security, 23:241-252,2004.
    [40] Forrest, S., Hofmeyr, S.A., Somayaji, A. Intrusion detection using sequences of system calls. Journal of Computer Security, 6:151-180, 1998.
    [41] Forrest, S., Hofmeyr, S.A., Somayaji, A., Longstaff. T.A. A Sense of Self for UNIX Processes. In Proceedings of the IEEE Symposium on Security and Privacy, 1996,120-128.
    [42] Forrest, S., Perelson, A.S., Allen, L., Cherukuri, R. Self-Nonself Discrimination in a Computer. In IEEE Symposium on Security and Privacy, 1994.
    [43] Francis,T.E.H. & Cao, L.J. Modified support vector machines in financial time series forecasting. Neurocomputing. 48(1-4), 2002, pp.847-861.
    [44] Frieβ, T. Cristianini,N. & Campbell,C. The kernel-Adatron: a fast and simple training procedure for support vector machines. In J. Shavlik, editor, Machine Learning: Proceedings of the Fifteenth International Conference. Morgan Kaufmann, 1998.
    [45] Fung,G. & Mangasarian,O.L. Proximal support vector machine classifiers. In F. Provost and R. Srika nt, editors, Proceedings KDD-2001: Knowledge Discovery and Data Mining, August 26-29,2001, San Francisco,CA, pages 77-86, New York, 2001.
    [46] Fung,G. & Mangasarian,O.L. Incremental Support Vector Machine Classification.In R. Grossman, H. Mannila, and R. Motwani, editors, Proceedings of the Second SIAM International Conference on Data Mining, SIAM, April 2002, pp.247-260.
    [47] Gartner, T. A survey of kernels for structured data. SIGKDD Explorations, 2003.
    [48] Gartner, T., Lloyd, J.W., Flach, PA. Kernels and Distances for Structured Data. Machine Learning, 57,205-232,2004.
    [49] Giegerich, R., Kurtz, S. From Ukkonen to McCreight and Weiner: A unifying view of linear-time suffix tree construction. Algorithmica, 19(3):331-353, 1997.
    [50] Gordon, L., Chervonenkis, A.Y., Gammerman, A.J, Shahmuradov, I.A., Solovyev, V.V. Sequence alignment kernel for recognition of promoter regions. Bioinformatics, 2003, 19(15), 1964-1971.
    [51] Grundy, W.N. Family-based homology detection via pairwise sequence comparison, in: International Conference on Computational Molecular Biology (RECOMB-98), ACM Press, New York, 1998.
    [52] Guo G., Li S. Z., Chan K. L. Support vector machines for face recognition. Image and Vision Computing. 19(9-10), 2001, pp.631 -638.
    
    [53] Guyon,I., Weston,J., BarnhiIl,S. & Vapnik,V. Gene Selection for Cancer Classification using support vector machines. Machine Learning. 46(1/3), 2002, pp.389-422.
    [54] Han,J. & Kamber,M., Data Mining Concept and Techniques. Morgan Kaufman Publishers, 2000.
    [55] Haussler, D. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, Department of Computer Science, University of California at Santa Cruz, Santa Cruz, CA, 1999.
    [56] He, J., Tan, A.H., Tan, C.L. On machine Learning Methods for Chinese Document Categorization. Applied Intelligence. 18(3), 2003, pp. 311-322.
    [57] Hong, D. H., Hwang, C. Support vector fuzzy regression machines. Fuzzy Sets and Systems. 138(2), 2003, pp.271-281.
    [58] Hsu, C.W., Lin, C.J. A comparison of methods for multiclass support vector machines. Neural Networks, IEEE Transactions on, 13(2), 2002, pp.415 - 425.
    [59] Hsu, C.W., Lin, C.J. A Simple Decomposition Method for support vector machines. Machine Learning. 46(1/3), 2002, pp.291-314.
    [60] Hyyro, H., Navarro, G. Bit-Parallel Witnesses and Their Applications to Approximate String Matching. Algorithmic 41 (2004) 203 - 231.
    [61] Jaakkola T., Haussler D. Exploiting generative models in discriminative classifiers, In Advances in Neural Information Processing Systems 11, 487 - 493, Kearns M.S., Solla S.A., Cohn D.A.Eds. MIT Press, 1999.
    [62] Jaakkola T., Diekhans, D., Haussler D. Using the Fisher kernel method to detect remote protein homologies. In Proceedings of the 7th international conference on intelligent systems for molecular biology, 1999.
    [63] Jain, A.K., Duin, R.P.W. & Mao, J. Statistical pattern recognition: a review. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(1), 2000, pp.4 - 37.
    [64] James Tin-Yau Kwok. The evidence framework applied to support vector machines. Neural Networks, IEEE Transactions on, 11(5), 2000, pp.1162 -1173.
    [65] Je, H., Kim,D. & Yang, B.S. Human Face Detection in Digital Video Using SVM Ensemble. Neural Processing Letters. 17(3), 2003, pp.239-252.
    [66] Jebara, T., Kondor, R., Howard, A. Probability product kernels. Journal of Machine Learning Research, 5 (2004) 819-844.
    [67] Joachims, T. Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Kluwer Academic Publishers, 2002.
    [68] Joachims, T. Text categorization with support vector machines. In Proceedings of European Conference on Machine Learning (ECML), 1998.
    [69] Joachims. T. Making large-scale SVM learning practical. In Scholkopf, B., Burges, C. J. C, & Smola, A. J. editors, Advances in Kernel Methods - Support Vector Learning, pp. 169-184. MIT Press, 1999.
    [70] Jonsson, K., Kittler, J., Li, Y.P., Matas, J. Support vector machines for face authentication. Image and Vision Computing. 20(5-6), 2002, pp.369-375.
    [71] Ju, W. and Vardi, Y. A hybrid high-order Markov chain model for computer intrusion detection. Journal of Computational and Graphical Statistics, 10:277-295, 2001.
    [72] Kashima, H., Tsuda, K., Inokuchi, A. Marginalized kernels between labeled graphs. In 20nd International Conference on Machine Learning(ICML03), 2004.
    [73] Kaufmann, L. Solving the quadratic programming problem arising in support vector classification. In Scholkopf, B., Burges, C. J. C, & Smola, A. J. editors, Advances in Kernel Methods -- Support Vector Learning, 1999, MIT Press, pp. 147-168.
    [74] Keerthi,S.S. & Gilbert,E.G. Convergence of a Generalized SMO Algorithm for SVM Classifier Design. Machine Learning. 46(1/3), 2002, pp.351-360.
    [75] Keerthi,S.S., Shevade, S. K., Bhattacharyya, C. & Murthy, K. R. K. A fast iterative nearest point algorithm for support vector machine classifier design. Technical report, Department of CSA, II Sc, Bangalore, India, 1999. Technical Report No. TR-ISL-99-03.
    [76] Keerthi, S.S., Shevade, S. K., Bhattacharyya, C. & Murthy, K. R. K. Improvements to Platt's SMO algorithm for SVM classifier design. Technical report, Control Division, Department of Mechanical and Production Engineering, National University of Singapore, 1999. Technical Report No. CD-99-14.
    [77] Knuth, D. E. The Art of Computer Programming: Fundamental Algorithms, volume 1. Addison-Wesley, Massachusetts, second edition, 1998
    [78] Kim,K. Financial time series forecasting using support vector machines. Neurocomputing. 55(1-2), 2003, pp.307-319.
    [79] Kondor R.I., Lafferty J. Diffusion kernels on graphs and other discrete structures, In Sammut C. and Hoffmann A., editors, Proceedings of the 19th International Conference on Machine Learning, Morgan Kaufmann, 315- 322,2002.
    [80] Kre(?)el, U. Pairwise Classification and Support Vector Machines. In Scholkopf, B., Burges, C.J.C., and Smola, A.J. editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.
    [81] Krishnapuram B., Carin L. Support vector machines of improved multiaspect target recognition using the Fisher kernel scores of hidden Markov models, In: IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002,3:2989 - 2992.
    [82] Lee,Y. Mangasarian,O.L. SSVM: A Smooth support vector machine for Classification. Computational Optimization and Applications. 20(1), 2001, pp.5-22.
    [83] Lee Y.J. & Mangasarian. O. L. RSVM: Reduced support vector machines. Technical Report 00-07, Data Mining Institute, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, July 2000. Proceedings of the First SIAM International Conference on Data Mining, Chicago, April 5-7,2001
    [84] Leslie, C., Eskin, E., Cohen, A., Weston, J., Noble, W.S. Mismatch string kernels for discriminative protein classification. Bioinformatics, 2004,20(4), 467-476.
    [85] Leslie, C., Eskin, E., Noble, W.S. The spectrum kernel: a string kernel for SVM protein classification. In: Proceedings of the pacific biocomputing Symposium, 2002.
    [86] Leslie, C., Eskin, E., Weston, J., Noble, W.S. Mismatch string kernels for SVM protein classification. In: Proceedings of Neural Information Processing Systems, 2002.
    [87] Leslie, C., Kuang, R. Fast String Kernels using Inexact Matching for Protein Sequences. Journal of Machine Learning Research 5 (2004) 1435-1455.
    [88] Lin, C.J. Asymptotic convergence of an SMO algorithm without any assumptions. Neural Networks, IEEE Transactions on, 13(1), 2002, pp.248 - 250.
    [89] Lin, C.J. On the convergence of the decomposition method for support vector machines. Neural Networks, IEEE Transactions on, 12(6), 2001, pp.1288 -1298.
    
    [90] Lodhi,H., Saunders,C., Shawe-Taylor,J., Cristianini,N.& Watkins,C.J.C.H. Text Classification using String Kernels. Journal of Machine Learning Research, Vol. 2,2001, pp. 419-444.
    [91] Mahe, P., Ueda, N., Akutsu, T., Perret, J.L., Vert, J.P. Extensions of marginalized graph kernels. In 21nd International Conference on Machine Learning(ICML04), 2004.
    [92] Manber, U., & Myers, G. Suffix arrays: A new method for on-line string searches. SIAM journal on computing, 1993,22(5), 935-948.
    [93] Mangasarian,O.L., & Musicant,D.R. Lagrangian Support Vector Machines. Journal of Machine Learning Research, Vol. 1,2001, pp. 161-177.
    [94] Maxion, R.A., Townsend, T.N. Masquerade detection augmented with error analysis. IEEE Transactions on Reliability 53 (2004).
    [95] McCreight E M. A Space-economical Suffix Tree Construction Algorithm. J. ACM. 1976,23: 262-272.
    [96] Menchetti, S., Costa, F., Frasconi, P. Weighted decomposition kernels. In 22nd International Conference on Machine Learning(ICML05), 2005.
    [97] Mika S. Ratsch G., Scholkopf B. et al. Invariant feature extraction and classification in kernel spaces. In Advances in Neural Information Processing Systems 12, Cambridge, MA, MIT press, 2000, pp.526-532
    [98] Moreno P.J., Rifkin R. Using the Fisher kernel method for Web audio classification, In: IEEE International Conference on Acoustics, Speech, and Signal Processing, 2000,4: 2417 - 2420.
    [99] Mitchell,T. Machine Learning. McGraw-Hill, 1997.
    [100] Mukherjee B., Heberlein, L. Network Intrusion Detection, IEEE Network, 1994, 8 (3): 26-41.
    [101] Mullen K..-R., Mika, S., Ratsch, G., Tsuda, K., Scholkopf, B. An introduction to kernel-based learning algorithms. Neural Networks, IEEE Transactions on, Vol. 12(2), 2001, pp. 181-201.
    [102] Myers, G. A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM 3 (1999) 395-415.
    [103] Neuhaus, M., Bunke, H. Edit distance-based kernel functions for structural pattern classification. Pattern Recognition. 2006, 39(10), 1852-1863.
    [104] Oren,M., Papageorgiou, C., Sinha, P., Osuna, E. & Poggio, T. Pedestrian detection using wavelet templates. In Proceedings Computer Vision and Pattern Recognition, pp. 193-199, 1997.
    [105] Osuna, E. & Girosi G. Reducing run-time complexity in SVMs. In Proceedings of the 14th International Conf. On Pattern Recognition, Brisbane, Australia, 1998.
    [106] Osuna, E., Freund,R. & Girosi,F. Training support vector machines: An application to face detection. In Proceedings of Computer Vision and Pattern Recognition, pp. 130-136, 1997.
    [107] Osuna,E., Freund,R. & Girosi,G. Improving training algorithm for support vector machines. Proc. IEEE NNSP'97. Amelia Island, 1997, pp.24-26.
    [108] Parrado-Hernandez,E., Mora-Jimenez, I. & Navia-Vazquez, A. Growing support vector classifiers via architecture boosting, in: Proceedings of the Learning'00, Madrid, Spain, 2000.
    [109] Parrado-Hernandez,E., Mora-Jimenez, I., Arenas-Garcia, J., Figueiras-Vidal, A.R., et. al. Growing support vector classifiers with controlled complexity. Pattern Recognition. 36(7), 2003, pp. 1479-1488.
    [110] Platt,J.C., Cristianini, N. & Shawe-Taylor, J. Large margin DAGs for multiclass classification. In Neural Information Processing Systems (NIPS 99), 1999.
    [111] Platt.J.C. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14, Microsoft Research, 1998.
    [112] Platt.J.C. Probabilistic methods for SV machines. In A.J. Smola, P.L. Bartlett, B. Scholkopf, et al, editors, Advances in Large Margin Classifiers, Cambridge, MA, MIT Press, 2000, pp.61-74.
    [113] Platt.J.C. Fast training of support vector machines using sequential minimal optimization. In Scholkopf, B., Burges, C. J. C., & Smola, A. J. editors, Advances in Kernel Methods - Support Vector Learning, pp. 185-208 . MIT Press, 1999.
    [114] Pontil,M. & Verri, A. Object recognition with support vector machines. IEEE Trans. on PAMI, 20:637-646,1998.
    [115] Reilly M, Stillman M. Open infrastructure for scalable intrusion detection[A]. Information Technology Conference, IEEE[C]. 1998. 129-133.
    [116] Rousu, J., Shawe-Taylor, J. Efficient computation of gapped substring kernels on large alphabets. Journal of Machine Learning Research, 6 (2005) 1323-1344.
    [117] Rumelhart,D.E., Hinton GE. & Williams R.J. Learning internal representations by error propagation, Parallel Distributed processing: Explorations in macrostructure of cognition, Vol. 1. Badford Books, Cambridge, MA., 1986,318-362.
    [118] Saunders, C, Shawe-Taylor, J., Vinokourov, A. String kernels, Fisher kernels and finite state automata. In Advances in Neural Information Processing Systems(NIPS03), 2003.
    [119] Schonlau, M., DuMouchel, W., Ju, W.-H., Karr, A.F., Theus, M., Vardi, Y. Computer intrusion: Detecting masquerades. Statistical Science 16 (2001) 58-74.
    [120] Scholkopf,B, Smola, A. & Muller,K. Support vector methods in learning and feature extraction. In the Proceedings of 9th Australia Conference on Neural Networks, Brisbane, Australia, University of Queensland, 1998.
    
    [121] Scholkopf,B. Support Vector Learning. R. Oldenbourg Verlag, 1997.
    [122] Scholkopf,B., Mika,S. Burges,C.J.C. Input space vs. feature spacein kernel-based methods. Neural Networks, IEEE Transactions on, 1999,10(5), pp. 1000-1017.
    [123] Scholkopf,B., Smola, A. & Miiller, K.-R. Kernel principal component analysis. In Scholkopf, B., Burges, C. J. C, & Smola, A. J. Advances in Kernel Methods - Support Vector Learning. pp. 327-352. MIT Press, 1999.
    [124] Scholkopf,B., Smola,A., Williamson,R. & Bartlett,P. New support vector algorithms. Technical Report NC-TR-98-031, NeuroCOLT Working Group, 1998.
    [125] Scholkopf,B., Bartlett,P., Smola,A. & Williamson,R. Generalization bounds via the eigenvalues of the gram matrix. Technical Report NC-TR-1999-035, NeuroCOLT Working Group, 1999.
    [126] Scholkopf,B.,Bartlett,P., Smola,A. & Williamson,R. Shrinking the tube: a new support vector regression algorithm. In Kearns, M. S., Solla, S.A. and Cohn, D. A. editors, Advances in Neural Information Processing Systems, 11. MIT Press, 1998.
    [127] Scholkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J. & Williamson, R.C. Estimating the support of a high-dimensional distribution. Technical report, Microsoft Research, MSR-TR-99-87, 1999.
    [128] Scholkopf, B. The kernel trick for distances. Advances in Neural Information Processing Systems, 2000, pp.301-307.
    [129] Sekar, R., Bendre, M., Dhurjati, D., Bollineni, P. A fast automation-based method for detecting anomalous program behaviors. In: Proceedings of the IEEE Symposium on Security and Privacy, 144-155,2001.
    [130] Shawe-Taylor, C, Cristianini, N. Kernel methods for pattern analysis. Cambridge University Press, 2004.
    [131] Shevade,S.K., Keerthi,S.S., Bhattacharyya, C, Murthy, K.R.K. Improvements to the SMO algorithm for SVM regression. Neural Networks, IEEE Transactions on, 11(5), 2000, pp.1188-1193.
    [132] Shpigelman, L., Singer, Y., Paz, R., Vaadia, E. Spikernels: Embedding spiking neurons in inner product spaces. In Neural Information Processing Systems (NIPS 03), 2003.
    [133] Smola, A., Kondor, R. Kernels and regularization on graphs. In Conference on Learning Theory (COLT03), 2003.
    [134] Smola, A. & Scholkopf, B. A tutorial on support vector regression. Statistics and Computing, 1998.
    [135] Smola, A.& Scholkopf, B. On a kernel-based method for pattern recognition, regression, approximation and operator inversion. Algorithmica, 22:211 -231, 1998.
    [136] Smola, A.J. Learning with Kernels. PhD thesis, Technische University at Berlin, 1998.
    [137] Sollich,P. Probability interpretation and bayesian methods for support vector machines. In Proceedings of ICANN'99,1999, pp.91-96
    [138] Sollich,P. Probability methods for support vector machines. In Advance in Neural Information Processing Systems 12, MIT Press, 1999, pp.349-355.
    [139] Sonnenburg, S., Ratsch, G, Schafer, C. A general and efficient multiple kernel learning algorithm. In Advances in Neural Information Processing Systems(NIPS06), 2006.
    [140] Sonnenburg, S., Ratsch, G, Schafer, C, Scholkopf, B. Large Scale Multiple Kernel Learning. Journal of Machine Learning Research 7 (2006) 1531-1565.
    [141] Suykens,J.A.K.. Vandewalle,J. Least Squares support vector machine Classifiers. Neural Processing Letters. 9(3), 1999, pp.293-300.
    [142] Suykens,J.A.K., De Brabanter, J., Lukas, L., Vandewalle, J. Weighted least squares support vector machines: robustness and sparse approximation. Neurocomputing. 48(1-4), 2002, pp. 85-105.
    [143] Syed,N.A., Liu,H. & Sung,K.K. Incremental Learning with Support Vector Machines. International Joint Conference on Artificial Intelligence (IJCAI),1999.
    [144] Takimoto, E., Warmuth, M.K. Path Kernels and Multiplicative Updates. Journal of Machine Learning Research, 4 (2003) 773-818.
    [145] Tian, S.F., Mu, S.M., Yin, C.H. Sequence-Similarity Kernels for SVMs to Detect Anomalies in System Calls, Neurocomputing, Vol.70, Nos.4-6, 859-866, Jan. 2007.
    [146] Tian, S.F., Yin, C.H., Mu, S.M. High-order Markov kernels for network intrusion detection, Lecture Notes in Computer Science, Volume 4234, 184-191, Springer, 2006.
    [147] Tian, S.F., Mu, S.M., Yin, C.H. Cooperative Clustering for Training SVMs, Lecture Notes in Computer Science, Volume 3971,962 - 967, Springer, 2006.
    [148] Tian, S.F., Yu, J., Yin, C.H. Anomaly detection using support vector machines, Lecture Notes in Computer Science, Volume 3173, Part 1, 592-597, Springer, 2004.
    [149] Tong,S. & Koller, D. Support Vector Machine Active Learning with Applications to Text Classification. Journal of Machine Learning Research. 2, pp. 45-66.
    [150] Tony,V.G., Suykens,J.A.K., Bart,B. & Stijn,V. Vanthienen,Jan. Dedene,Guido. de Moor,Bart. Vandewalle,Joos. Benchmarking Least Squares support vector machine Classifiers. Machine Learning. 54(1), 2004, pp5-32.
    [151] Tresp,V. Scaling Kernel-Based Systems to Large Data Sets. Data Mining and Knowledge Discovery. 5(3), 2001, pp.197-211.
    [152] Tsuda, K., Kin, T., Asai, K. Marginalized kernels for biological sequences. Bioinformatics, 18,S268-S275,2002.
    
    [153] UkKonen E. On-line Construction of Suffix Trees. Algorithmica. 1995,14:249-260.
    [154] Vapnik,V. & Chapelle, O. Bounds on error expectation for SVM. In Smola, A. J., Bartlett, P., B. Scholkopf & Schuurmans, C. editors, Advances in Large Margin Classifiers. MIT Press, 1999.
    [155] Vapnik,V. & Chervonenkis A.J. On the uniform convergence of relative frequencies of events to their probabilities. Doklady Akademii Nauk USSR, 1968, 181(4). (English translation: Sov. Math. Dokl.)
    [156] Vapnik,V. & Chervonenkis A.J. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probability Application. 1971,16, pp264-280.
    [157] Vapnik,V. & Chervonenkis, A. The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis, 1(3):283-305, 1991.
    [158] Vapnik,V. & Mukherjee, S. Support vector method for multivariant density estimation. In Neural Information Processing Systems (NIPS 99), 1999.
    [159] Vapnik,V. Statistical Learning Theory. Wiley, 1998.
    [160] Vapnik,V. The Nature of Statistical Learning Theory. Springer Verlag, 1995.
    [161] Vapnik,V. The Nature of Statistical Learning Theory. Springer Verlag, 2000.
    [162] Vapnik, V.N. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5), 1999, pp. 988-999.
    [163] Vishwanathan, S.V.N., Borgwardt, K.M., Schraudolph, N.N. Fast computation of graph kernels. In Advances in Neural Information Processing Systems 19 (NIPS), 2007.
    [164] Vishwanathan, S., Smola, A. Fast Kernels for String and Tree Matching. Advances in Neural Information Processing Systems, 15:569-576, 2003.
    [165] Wahba, G. Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV. In Scholkopf, B., Burges, C.J.C. & Smola, A.J., editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999, pp. 69-88.
    [166] Wang, C., Scott, S.D. New Kernels for Protein Structural Motif Discovery and Function Classification. In Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), 2005.
    [167] Wang, K., Stolfo, S.J. One-Class Training for Masquerade Detection. In 3rd IEEE Conf Data Mining Workshop on Data Mining for Computer Security, 2003.
    [168] Warrender, S., Forrest, S., Pearlmutter, B. Detecting intrusions using system calls: Alternative data models. In: Proceedings of the IEEE Symposium on Security and Privacy, 133-145, 1999.
    [169] Watkins, C. Dynamic alignment kernels. In Smola, A.J., Bartlett, P., Scholkopf, B. & Schuurmans, C., editors, Advances in Large Margin Classifiers. MIT Press, 1999.
    [170] Watkins,C. Kernels from matching operations. Technical Report CSD-TR-98-07, Royal Holloway, University of London, Computer Science Department, July 1999.
    [171] Weiner, P. Linear pattern matching algorithms. In Proceedings of IEEE Symposium on Switching and Automata Theory, 1973, 1-11.
    [172] Weston,J. & Watkins,C. Support vector machines for multi-class pattern recognition. In Proceedings of the 6th European Symposium on Artificial Neural Networks (ESANN), 1999.
    [173] Weston, J. & Watkins, C. Multi-class Support Vector Machines. Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway University of London, England, May 1998.
    [174] Ye, N., Li, X., Chen, Q., Emran, S.M., Xu, M. Probabilistic techniques for intrusion detection based on computer audit data. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 31:266-274, 2001.
    [175] Ye, N., Zhang, Y., Borror, C.M. Robustness of the Markov-chain model for cyber-attack detection. IEEE Transactions on Reliability, 2004, 53(1), 116-123.
    [176] Yeung, D., Ding, Y. Host-based intrusion detection using dynamic and static behavioral models. Pattern Recognition, 36:229-243, 2003.
    [177] Yin, C.H., Tian, S.F., Mu, S.M. Using Gap-insensitive String Kernel to Detect Masquerading. In Proceedings of Advanced Data Mining and Applications, Lecture Notes in Artificial Intelligence, 3584:323-330, 2005.
    [178] Yin, C.H., Tian, S.F., Mu, S.M. Detecting Anomalous Process Using Gapped String Kernels. Journal of Computational Information Systems, Vol. 2, pp. 1227-1233, 2006.
    [179] Yin, C.H., Tian, S.F., Mu, S.M. A Fast Bit-parallel Algorithm for Gapped String Kernels. In Proceedings of International Conference on Neural Information Processing, Lecture Notes in Computer Science, 4232:634-641, 2006.
    [180] Zhang, A., Wu, Z.L., Li, C.H. and Fang, K.T. On Hadamard-Type Output Coding in Multiclass Learning. In J. Liu et al. (Eds.): IDEAL 2003, LNCS 2690, pp. 397-404, 2003.
    [181] Bian, Z.Q., Zhang, X.G. Pattern Recognition (2nd edition). Tsinghua University Press, 2000. (in Chinese)
    [182] Chen, G.Y., Zhang, Q.L., Li, X. Intrusion detection system based on SVM classifier. Journal on Communications, 23(5), 2002, pp. 51-56. (in Chinese)
    [183] Chen, Y.S., Wang, G.P., Dong, S.H. A progressive transductive classification learning algorithm based on support vector machines. Journal of Software, 14(3), 2003, pp. 451-460. (in Chinese)
    [184] Cristianini, N., Shawe-Taylor, J. An Introduction to Support Vector Machines. Translated by Li, G.Z., Wang, M., Zeng, H.J. Beijing: Publishing House of Electronics Industry, 2004. (Chinese translation)
    [185] Deng, N.Y., Tian, Y.J. A New Method in Data Mining: Support Vector Machines. Beijing: Science Press, 2004. (in Chinese)
    [186] He, Q., Shi, Z.Z., Ren, L.A. A multi-class classification method based on hyper-surfaces. Systems Engineering - Theory & Practice, 2003, 3, pp. 92-99. (in Chinese)
    [187] Li, H. Beyond Binary Classification. Technical report, The 4th Workshop on Machine Learning and Applications, 2006. (in Chinese)
    [188] Li, H., Guan, X.H., Zan, X., Han, C.Z. Network intrusion detection based on support vector machines. Journal of Computer Research and Development, 40(6), 2003, pp. 799-808. (in Chinese)
    [189] Liu, J.F. Applied Stochastic Processes. China Railway Publishing House, 2000. (in Chinese)
    [190] Li, K.L. Research on Extensions of Support Vector Machine Learning and Their Applications. Ph.D. dissertation, Beijing Jiaotong University, 2004. (in Chinese)
    [191] Rao, X., Dong, C.X., Yang, S.Q. An intrusion detection system based on support vector machines. Journal of Software, 14(4), 2003, pp. 798-803. (in Chinese)
    [192] Shawe-Taylor, J., Cristianini, N. Kernel Methods for Pattern Analysis. Translated by Zhao, L.L., Weng, S.M., Zeng, H.J. Beijing: China Machine Press, 2006. (Chinese translation)
    [193] Shi, Z.Z. Knowledge Discovery. Tsinghua University Press, 2002. (in Chinese)
    [194] Tan, X.B., Xi, H.S., Wang, W.P., Yin, B.Q. Anomaly detection based on support vector machines. Journal of University of Science and Technology of China, 33(5), 2003, pp. 599-605. (in Chinese)
    [195] Tao, Q., Wang, J., Xue, M.S. A geometric interpretation of support vectors and its application to associative memory. Chinese Journal of Computers, 10(10), pp. 1111-1115. (in Chinese)
    [196] Tian, S.F., Huang, H.K., Li, H.B. Recognition of similar handwritten Chinese characters based on support vector machines. Journal of Chinese Information Processing, 14(3), 2000, pp. 37-42. (in Chinese)
    [197] Tang, Z.J., Li, J.H. Intrusion Detection Technology. Beijing: Tsinghua University Press, 2004. (in Chinese)
    [198] Wang, J.F., Cao, Y.D. Application of support vector machines to classification with a large number of classes. Journal of Beijing Institute of Technology, 21(2), 2001, pp. 225-228. (in Chinese)
    [199] Vapnik, V. Statistical Learning Theory. Translated by Xu, J.H., Zhang, X.G. Beijing: Publishing House of Electronics Industry, 2004. (Chinese translation)
    [200] Vapnik, V. The Nature of Statistical Learning Theory. Translated by Zhang, X.G. Beijing: Tsinghua University Press, 2000. (Chinese translation)
    [201] Wang, J., Shi, C.Y. Research on machine learning. Journal of Guangxi Normal University (Natural Science Edition), 21(2), 2003, pp. 1-15. (in Chinese)
    [202] Wang, J., Zhou, Z.H., Zhou, A.Y. Machine Learning and Its Applications. Beijing: Tsinghua University Press, 2006. (in Chinese)
    [203] Xiao, J.H. Intelligent Pattern Recognition Methods. Guangzhou: South China University of Technology Press, 2005. (in Chinese)
    [204] Yin, C.H., Tian, S.F., Mu, S.M. A fast algorithm for gapped kernel functions. Acta Electronica Sinica, 5, 2007. (in Chinese)
    [205] Yin, Q.B., Zhang, R.B., Li, X.Y., Wang, H.Q. Research on intrusion detection techniques based on linear prediction and Markov models. Chinese Journal of Computers, 2005, 28(5):900-907. (in Chinese)
    [206] Zhang, L. Research on Support Vector Machines and Kernel Methods. Ph.D. dissertation, Xidian University, 2002. (in Chinese)
    [207] Zhang, X.G. Introduction to statistical learning theory and support vector machines. Acta Automatica Sinica, 26(1), 2000, pp. 31-42. (in Chinese)
    [208] Zhou, W.D. Research on Kernel Machine Learning Methods. Ph.D. dissertation, Xidian University, 2003. (in Chinese)