Application of Support Vector Machines to Personal Credit Scoring
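Abstract
Personal credit scoring is an important part of risk management in commercial banks. Banks outside China have studied credit scoring for more than 50 years and have developed many methods, which fall into two broad classes: statistical and non-statistical. The support vector machine (SVM) is a machine learning method developed in recent years on the basis of statistical learning theory, and it has strong generalization ability. This thesis focuses on applying SVMs to personal credit scoring. A genetic algorithm is introduced as the optimizer for both attribute selection and parameter tuning, giving a personal credit scoring model based on a genetic algorithm and an SVM. Finally, the SVM is used as the base learner of the AdaBoost algorithm to build an AdaBoost-SVM model, which is applied to personal credit scoring. Empirical analysis shows that this model is more effective than a single SVM.
     The main contributions of this thesis are:
     1. Because the model's input variables and its parameters depend on each other, a genetic algorithm is used to carry out attribute selection and parameter tuning at the same time, optimizing both together so that the performance of the support vector classifier is optimized.
     2. A dynamic AdaBoost-SVM model is proposed. Conventional AdaBoost uses the same learner throughout the boosting process, so some of the resulting SVMs are too strong and others too weak, and the final boosting result suffers. Therefore, in each boosting round the parameters are adjusted so that the SVM's accuracy is only slightly better than random guessing, which yields a dynamic AdaBoost-SVM model. Empirical analysis shows that this model outperforms an ordinary SVM.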
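The joint search described in item 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: each chromosome is assumed to hold one bit per candidate attribute plus two real-valued genes for log2(C) and log2(gamma) of an RBF-kernel SVM. The search ranges, the genetic operators, and `toy_fitness` are all hypothetical. In a real run the fitness would be the cross-validated accuracy of an SVM trained on the selected attributes with the decoded parameters.

```python
import random

N_FEATURES = 6                   # hypothetical number of candidate attributes
LOG2_C_RANGE = (-5.0, 15.0)      # assumed search range for log2(C)
LOG2_GAMMA_RANGE = (-15.0, 3.0)  # assumed search range for log2(gamma)


def random_chromosome(rng):
    """One bit per attribute, followed by the two real-valued parameter genes."""
    mask = [rng.randint(0, 1) for _ in range(N_FEATURES)]
    return mask + [rng.uniform(*LOG2_C_RANGE), rng.uniform(*LOG2_GAMMA_RANGE)]


def decode(chrom):
    """Turn a chromosome into (selected attribute indices, C, gamma)."""
    features = [i for i, bit in enumerate(chrom[:N_FEATURES]) if bit == 1]
    return features, 2.0 ** chrom[N_FEATURES], 2.0 ** chrom[N_FEATURES + 1]


def crossover(a, b, rng):
    """Single-point crossover; gene positions are preserved."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def clip(value, low, high):
    return max(low, min(high, value))


def mutate(chrom, rng, rate=0.1):
    """Flip attribute bits and perturb parameter genes with probability `rate`."""
    child = list(chrom)
    for i in range(N_FEATURES):
        if rng.random() < rate:
            child[i] = 1 - child[i]
    if rng.random() < rate:
        child[N_FEATURES] = clip(child[N_FEATURES] + rng.gauss(0, 1), *LOG2_C_RANGE)
    if rng.random() < rate:
        child[N_FEATURES + 1] = clip(
            child[N_FEATURES + 1] + rng.gauss(0, 1), *LOG2_GAMMA_RANGE
        )
    return child


def evolve(fitness, pop_size=20, generations=30, seed=0):
    """Elitist GA: keep the better half, refill with mutated offspring."""
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=fitness)


def toy_fitness(chrom):
    """Hypothetical stand-in for cross-validated SVM accuracy.

    It rewards selecting exactly attributes 0 and 2 with log2(C) near 3 and
    log2(gamma) near -5, so attributes and parameters are scored together.
    """
    features, _, _ = decode(chrom)
    feature_score = -len({0, 2}.symmetric_difference(features))
    param_score = -abs(chrom[N_FEATURES] - 3.0) - abs(chrom[N_FEATURES + 1] + 5.0)
    return feature_score + 0.1 * param_score


best = evolve(toy_fitness)
print(decode(best))
```

Because one fitness value scores the attribute mask and the kernel parameters together, the search can find combinations that tuning each part separately might miss.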
Personal credit scoring is an important part of commercial banks' risk management. Over the last 50 years, foreign banks have developed many credit scoring methods. The support vector machine (SVM) is a machine learning method developed in recent years on the foundation of statistical learning theory. The focus of this thesis is the application of SVMs to credit scoring. A genetic algorithm is used to choose the optimal input feature subset and set the best kernel parameters simultaneously, establishing a credit scoring model named GA-SVM. In addition, the SVM is used as the base learner of the AdaBoost algorithm, establishing another credit scoring model named AdaBoost-SVM. Experimental results show that AdaBoost-SVM performs better than GA-SVM, which in turn performs better than the usual SVM.
     The main contributions of this thesis are as follows:
     1. Traditional credit scoring methods tend to perform feature selection and parameter optimization independently. The dependence between the two is not considered, which prevents a globally optimal result. This thesis combines feature selection with parameter optimization in a single genetic algorithm during SVM modeling.
     2. Dynamic boosting is coupled with the SVM to establish an AdaBoost-SVM model. Traditional AdaBoost uses an identical learner throughout the boosting process. In this thesis, a parameter-adjusting strategy is designed to obtain different, moderately accurate SVM component classifiers for boosting. Good results have been obtained on benchmark data sets.
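The parameter-adjusting strategy in item 2 can be sketched roughly as below. The abstract does not give the exact rule, so this assumes one plausible scheme: in every boosting round the RBF width `sigma` starts large (a weak classifier) and shrinks only until the weighted training error drops below 0.5, so each component is just better than chance. To keep the sketch free of external dependencies, `train_kernel_vote` is a hypothetical stand-in (a weighted kernel vote), not an SVM. A trainer for a weighted RBF-kernel SVM would be passed as `train` instead. All names and default values here are assumptions.

```python
import math


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two points."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def train_kernel_vote(X, y, w, sigma):
    """Stand-in for a weighted RBF-SVM (NOT an SVM): a weighted kernel vote.

    A large sigma gives a smooth, weak rule; a small sigma gives a strong one.
    """
    def classify(x):
        s = sum(wi * yi * rbf(x, xi, sigma) for xi, yi, wi in zip(X, y, w))
        return 1 if s >= 0 else -1
    return classify


def dynamic_adaboost(X, y, rounds=5, sigma_max=10.0, sigma_min=0.1,
                     shrink=0.7, train=train_kernel_vote):
    """AdaBoost in which every round re-tunes sigma to get a weak component."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, sigma, classifier)
    for _ in range(rounds):
        sigma = sigma_max
        while sigma >= sigma_min:
            h = train(X, y, list(w), sigma)
            preds = [h(x) for x in X]
            err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
            if err < 0.5:       # weakest setting that beats random guessing
                break
            sigma *= shrink     # otherwise make the component stronger
        else:
            break               # no setting beat chance: stop boosting
        err = max(err, 1e-10)   # avoid division by zero for a perfect component
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, sigma, h))
        # Standard AdaBoost re-weighting: misclassified samples gain weight.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all component classifiers."""
    s = sum(alpha * h(x) for alpha, _, h in ensemble)
    return 1 if s >= 0 else -1


# Tiny hypothetical data set: two separated groups labeled -1 and +1.
X = [(-2, -2), (-2, -1), (-1, -2), (2, 2), (2, 1), (1, 2)]
y = [-1, -1, -1, 1, 1, 1]
model = dynamic_adaboost(X, y)
print([predict(model, x) for x in X])
```

Restarting from `sigma_max` in each round is the design choice that keeps the components deliberately weak. A fixed, well-tuned `sigma` would instead reuse one strong classifier, which is the situation the thesis argues against.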