Application of an improved oversampling algorithm in class-imbalanced credit scoring
  • English title: Application of improved oversampling algorithm in class-imbalance credit scoring
  • Authors: Shao Liangshan (邵良杉); Zhou Yu (周玉)
  • Affiliation: System Engineering Institute, Liaoning Technical University (辽宁工程技术大学系统工程研究所)
  • Keywords: credit scoring; class imbalance; SDSMOTE algorithm; Fisher criterion; support vector machine; ensemble learning
  • Journal code: JSYJ
  • Journal: Application Research of Computers
  • Publication date: 2018-04-08 10:51
  • Publisher: Application Research of Computers (计算机应用研究)
  • Year: 2019
  • Issue: v.36; No.332
  • Funding: National Natural Science Foundation of China (71371091); Liaoning Provincial Social Planning Project (L14BTJ004)
  • Language: Chinese
  • Page: JSYJ201906019
  • Page count: 5
  • CN: 06
  • ISSN: 51-1196/TP
  • Classification number: 89-93
Abstract
To address the class imbalance found in credit scoring in the lending industry, the paper first analyzes the Fisher ratio of each factor that influences the credit score and uses it to select the main evaluation indicators. It then builds a class-imbalance credit scoring prediction model, Fisher-SDSMOTE-ESBoostSVM, which uses the support-degree-based oversampling algorithm (SDSMOTE) to synthesize samples, a support vector machine (SVM) as the base predictor, and Boosting as the framework. After a base classifier has been trained, an elimination strategy deletes the synthetic samples that were not classified correctly, regenerates positive-class samples, and corrects the sample weights. Finally, using the German credit dataset from the UCI repository as the experimental sample and the F-measure and G-mean as evaluation metrics, the paper compares the predictions of Fisher-SDSMOTE-ESBoostSVM with those of other ensemble learning algorithms. The experimental results indicate that applying Fisher-SDSMOTE-ESBoostSVM to customer credit score prediction in the lending industry is feasible and adaptable, achieves relatively high prediction accuracy, and has some practical application value.
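The abstract relies on three quantities that this record does not define: the Fisher ratio used to rank features, and the F-measure and G-mean used for evaluation. The following is a minimal sketch under stated assumptions, not the paper's implementation: it assumes the common two-class Fisher ratio (squared difference of class means over the sum of class variances) and the F-measure with beta = 1. The function names `fisher_ratio` and `f_measure_and_g_mean` and the toy data are hypothetical.

```python
from math import sqrt


def _mean(xs):
    return sum(xs) / len(xs)


def _variance(xs):
    # Population variance of one class for a single feature.
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def fisher_ratio(values, labels, positive=1):
    """Two-class Fisher ratio of one feature (assumed definition):
    (mean_pos - mean_neg)^2 / (var_pos + var_neg)."""
    pos = [v for v, y in zip(values, labels) if y == positive]
    neg = [v for v, y in zip(values, labels) if y != positive]
    gap = (_mean(pos) - _mean(neg)) ** 2
    spread = _variance(pos) + _variance(neg)
    if spread == 0:
        # Both classes are constant: perfectly separable or identical.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def f_measure_and_g_mean(y_true, y_pred, positive=1):
    """F-measure (beta = 1, an assumption) and G-mean from hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    g_mean = sqrt(recall * specificity)
    return f_measure, g_mean


if __name__ == "__main__":
    # Toy feature: the two classes are well separated, so the ratio is large.
    print(fisher_ratio([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1]))
    # Toy imbalanced labels: 3 positive (minority) and 7 negative samples.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(f_measure_and_g_mean(y_true, y_pred))
```

Both metrics depend on recall of the minority (positive) class, which is why they are preferred over plain accuracy on imbalanced data: a classifier that labels every sample as negative scores zero on each of them.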
