Research and Application of Robust Least Squares Support Vector Machines
Abstract
Binary classification is an important and widely studied problem in statistical learning theory, machine learning, and artificial intelligence. The support vector machine (SVM) constructs its classifier using the structural risk minimization principle and the kernel method; the resulting model is comparatively simple and its solution is unique. The least squares support vector machine (LS-SVM) uses a sum-of-squared-errors (SSE) term as its objective function, which turns the quadratic programming problem into a system of linear equations and thus avoids the heavy computational cost of solving the quadratic program in the standard SVM. However, the equality constraints and the SSE objective cause the LS-SVM solution to lose sparseness and reduce its robustness.
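To make the linear-system claim concrete, here is a minimal sketch of the classical LS-SVM training step in the formulation of Suykens and Vandewalle, assuming an RBF kernel; the hyperparameters gamma and sigma are illustrative values, not settings used in the thesis.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Pairwise RBF kernel: K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    # Solve the LS-SVM KKT conditions, which form one linear system
    #   [ 0    y^T             ] [ b     ]   [ 0 ]
    #   [ y    Omega + I/gamma ] [ alpha ] = [ 1 ]
    # with Omega[i, j] = y_i * y_j * K(x_i, x_j), instead of a QP.
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[1:], solution[0]          # alpha, b

def lssvm_predict(X_train, y_train, alpha, b, X_test, sigma=1.0):
    # Decision function: sign(sum_i alpha_i * y_i * K(x, x_i) + b).
    K = rbf_kernel(X_test, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)
```

Because every training point enters through an equality constraint, nearly every alpha_i in the solution is nonzero, which is exactly the loss of sparseness discussed above.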
Real-world data often carry noise and uncertainty arising from random or non-random processes. This noise and uncertainty degrade the performance of statistical learning classifiers, lowering both the classification accuracy and the generalization ability of the classification model. Both SVM and LS-SVM adopt an objective function with a fixed norm; this way of building models from prior knowledge cannot adapt well to the great variety of data structures encountered in practice, so the adaptability of the models is weak. To strengthen the robustness and sparseness of LS-SVM, improve its generalization ability, and allow the model to adjust itself automatically to the data structure, this thesis carries out the following work:
1. We systematically survey the methods in the literature for improving the robustness of SVM and LS-SVM, and point out the problems and shortcomings of these improved models. From this survey we derive the main research topics of the thesis: with the goal of strengthening the sparseness, robustness, and interpretability of LS-SVM, we substantially improve the original model and present effective binary classification algorithms based on it.
2. Addressing the causes of LS-SVM's loss of sparseness and robustness, we propose using kernel principal component analysis (KPCA) to remove noisy features from the sample data and, drawing on earlier methods for enhancing the sparseness of LS-SVM, to compress the features, which yields a two-level L1-norm LS-SVM model, KPCA-L1-LS-SVM. KPCA performs feature extraction efficiently, while the L1-norm objective suppresses the influence of noisy points on the model's generalization ability and makes the solution sparser, lowering the computational complexity. Tests on simulated and benchmark data sets show that the method is effective; a sketch of the KPCA preprocessing stage follows.
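The sketch below illustrates the KPCA feature-extraction stage using scikit-learn's KernelPCA, keeping only the leading components so that noise-dominated directions are discarded; the number of components and the kernel width are illustrative assumptions, and the thesis's exact component-selection rule is not reproduced here.

```python
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler

def kpca_extract(X_train, X_test, n_components=10, gamma=0.1):
    # Standardize, then keep only the leading kernel principal components;
    # minor, noise-dominated directions are discarded in the process.
    scaler = StandardScaler().fit(X_train)
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma)
    Z_train = kpca.fit_transform(scaler.transform(X_train))
    Z_test = kpca.transform(scaler.transform(X_test))  # same projection for test data
    return Z_train, Z_test
```

The extracted components Z_train would then feed the L1-norm LS-SVM stage in place of the raw features.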
3. In practical binary classification problems, noisy points or noisy features make the labels of some samples uncertain. A classification model should be able to determine automatically which samples are relatively important and which are strongly affected by noise, and to exclude the latter when constructing the classification function. The concept of fuzzy membership can describe this label uncertainty. By combining an L1-norm objective function with fuzzy memberships, we construct a sparse and robust classification model based on LS-SVM, fuzzy-L1-LS-SVM. Tests on benchmark data sets show that this model likewise eliminates the influence of noisy points and has good interpretability; one common membership scheme is sketched below.
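As one concrete possibility, the following sketch assigns memberships that decay with a sample's distance from its own class mean, in the spirit of Lin and Wang's fuzzy SVM; the thesis's actual membership function may differ.

```python
import numpy as np

def fuzzy_memberships(X, y, delta=1e-6):
    # Membership in (0, 1] per sample: points far from their own class mean
    # (likely noise or outliers) get small memberships, so they contribute
    # little to the weighted training objective.
    s = np.empty(len(y))
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        center = X[idx].mean(axis=0)
        dist = np.linalg.norm(X[idx] - center, axis=1)
        radius = dist.max() + delta
        s[idx] = 1.0 - dist / radius      # linear decay toward the class boundary
    return s
```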
4. Different samples play different roles in the construction of the decision function: the more important the discriminative information a sample carries, the larger its contribution to the classifier should be. To distinguish these roles, samples containing important information can be assigned large weights while samples containing minor information receive small ones; this weighting also removes the influence of noisy points on the classification model and makes the model robust. Furthermore, both SVM and LS-SVM fix the Lp norm in the objective function in advance, a prior-knowledge-based modelling choice that cannot adapt to the wide variety of complex data structures. To let the model fit the data better, the thesis proposes a re-weighted robust LS-SVM model, RW-Lp-LS-SVM. Tests on simulated data sets and UCI benchmark data sets show that the model is robust and sparse and has good interpretability; a residual-based weighting sketch follows.
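Below is a minimal sketch of residual-based robust re-weighting in the spirit of the weighted LS-SVM of Suykens et al.; the cutoffs c1 and c2 and the tiny floor weight are illustrative assumptions.

```python
import numpy as np

def robust_weights(errors, c1=2.5, c2=3.0):
    # Down-weight samples whose standardized residuals are large (likely noise):
    # full weight inside c1 robust standard deviations, linear decay up to c2,
    # and a tiny floor weight beyond that.
    mad = np.median(np.abs(errors - np.median(errors)))
    s_hat = 1.483 * mad + 1e-12           # robust scale estimate (guarded)
    z = np.abs(errors) / s_hat
    w = np.ones_like(z)
    taper = (z > c1) & (z <= c2)
    w[taper] = (c2 - z[taper]) / (c2 - c1)
    w[z > c2] = 1e-4
    return w
```

Retraining with the regularization block I/gamma replaced by diag(1 / (gamma * w_i)) then suppresses the suspected outliers.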
5. Credit-evaluation data sets have a rather special structure: their class proportions are highly imbalanced. To examine the classification performance of the three proposed models, we test them on three credit data sets (two UCI credit data sets and one from an anonymous American bank). The results show that the models adapt well to the class imbalance of credit data and can therefore serve as candidate models for credit risk evaluation; a simple class-weighting device for such imbalance is sketched below.
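One simple way to fold class imbalance into the weighted formulations above is to scale each sample's weight by the inverse frequency of its class; this is an illustrative device, not the procedure used in the thesis.

```python
import numpy as np

def class_balanced_weights(y):
    # Scale each sample by the inverse frequency of its class so the minority
    # class (e.g. defaulting borrowers) is not swamped by the majority class.
    classes, counts = np.unique(y, return_counts=True)
    inv_freq = {c: len(y) / k for c, k in zip(classes, counts)}
    return np.array([inv_freq[label] for label in y])
```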