Over sampling ensemble algorithm based on margin theory (基于间隔理论的过采样集成算法)
  • Authors: ZHANG Zongtang; CHEN Zhe; DAI Weiguo (张宗堂; 陈喆; 戴卫国)
  • Affiliation: Navigation and Observation Department, Navy Submarine Academy (海军潜艇学院航海观通系)
  • Keywords: imbalanced data; margin theory; over sampling method; ensemble classifier; machine learning
  • Journal: Journal of Computer Applications (计算机应用); journal code: JSJY
  • Publication date: 2019-01-09 13:48
  • Year: 2019
  • Volume/Issue: v.39, No.345 (Issue 05)
  • Language: Chinese
  • Record ID: JSJY201905020
  • Pages: 124-127 (4 pages)
  • CN: 51-1307/TP
Abstract
To address the problem that traditional ensemble algorithms are not suitable for imbalanced data classification, a margin-theory-based over sampling AdaBoost algorithm (MOSBoost) was proposed. First, the margins of the original samples were obtained by pre-training. Then, the minority-class samples were heuristically duplicated according to their margin ranking, forming a new balanced sample set. Finally, the balanced sample set was used to train AdaBoost, yielding the final ensemble classifier. In experiments on UCI datasets, the F-measure and G-mean criteria were used to evaluate four algorithms: MOSBoost, AdaBoost, random over sampling AdaBoost (ROSBoost), and random under sampling AdaBoost (RDSBoost). The experimental results show that MOSBoost outperforms the other three algorithms in classification performance; in particular, compared with AdaBoost, MOSBoost improves the F-measure and G-mean by 8.4% and 6.2%, respectively.
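The procedure described in the abstract can be summarized in a short sketch. The following Python fragment, built on scikit-learn's AdaBoostClassifier, is a minimal illustration of the three steps (pre-training for margins, margin-guided duplication of minority samples, training the final ensemble on the balanced set); the 0/1 label encoding, the ensemble size of 50, and the rule of copying the lowest-margin minority samples first are illustrative assumptions, not the paper's exact settings.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.metrics import confusion_matrix

    def mosboost_sketch(X, y, n_estimators=50, random_state=0):
        """Sketch of the three steps described in the abstract
        (binary labels; the minority class is assumed to be labeled 1)."""
        X, y = np.asarray(X), np.asarray(y)

        # Step 1: pre-training to obtain each sample's ensemble margin.
        pre = AdaBoostClassifier(n_estimators=n_estimators,
                                 random_state=random_state).fit(X, y)
        score = pre.decision_function(X)            # signed score in favour of class 1
        margin = np.where(y == 1, score, -score)    # margin of the true class

        # Step 2: heuristic duplication of minority samples guided by margin ranking
        # (assumption: lowest-margin samples are copied first, cycling if necessary).
        minority = np.flatnonzero(y == 1)
        majority = np.flatnonzero(y == 0)
        order = minority[np.argsort(margin[minority])]
        extra = np.resize(order, len(majority) - len(minority))
        X_bal = np.vstack([X, X[extra]])
        y_bal = np.concatenate([y, y[extra]])

        # Step 3: train the final AdaBoost ensemble on the balanced sample set.
        return AdaBoostClassifier(n_estimators=n_estimators,
                                  random_state=random_state).fit(X_bal, y_bal)

    def g_mean(y_true, y_pred):
        """G-mean = sqrt(sensitivity * specificity) for binary labels {0, 1}."""
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        return np.sqrt(tp / (tp + fn) * tn / (tn + fp))

The minority-class F-measure can then be computed with sklearn.metrics.f1_score, and G-mean with the helper above, matching the two evaluation criteria used in the experiments.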
References
[1] DAI H L.Class imbalance learning via a fuzzy total margin based support vector machine[J].Applied Soft Computing,2015,31(C):172-184.
    [2] TAN J F,ZHU Y,CHEN T X,et al.Imbalanced image classification approach based on convolutional neural network and cost-sensitivity[J].Journal of Computer Applications,2018,38(7):1862-1865,1871.
    [3] WANG S,YAO X.Using class imbalance learning for software defect prediction[J].IEEE Transactions on Reliability,2013,62(2):434-443.
    [4] OZCIFT A,GULTEN A.Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms[J].Computer Methods and Programs in Biomedicine,2011,104(3):443-451.
    [5] YU H,NI J,ZHAO J.ACOSampling:an ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data[J].Neurocomputing,2013,101:309-318.
    [6] TOMEK I.Two modifications of CNN[J].IEEE Transactions on Systems,Man and Cybernetics,1976,SMC-6(11):769-772.
    [7] KUBAT M,MATWIN S.Addressing the curse of imbalanced training sets:one-sided selection[C]// Proceedings of the 14th International Conference on Machine Learning.San Francisco:Morgan Kaufmann,1997:179-186.
    [8] LAURIKKALA J.Improving identification of difficult small classes by balancing class distribution[C]// Proceedings of the 8th Conference on Artificial Intelligence in Medicine in Europe.Berlin:Springer,2001:63-66.
    [9] CHAWLA N,BOWYER K,HALL L,et al.SMOTE:synthetic minority over-sampling technique[J].Journal of Artificial Intelligence Research,2002,16(1):321-357.
    [10] RIVERA W A.Noise reduction a priori synthetic over-sampling for class imbalanced data sets[J].Information Sciences,2017,408(C):146-161.
    [11] MA L,FAN S.CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests [J].BMC Bioinformatics,2017,18(1):169.
    [12] BOROWSKA K,STEPANIUK J.Imbalanced data classification:a novel re-sampling approach combining versatile improved SMOTE and rough sets[C]// CISIM 2016:IFIP International Conference on Computer Information Systems and Industrial Management.Berlin:Springer,2016:31-42.
    [13] BAIG M M,AWAIS M M,EL-ALFY E S M.AdaBoost-based artificial neural network learning[J].Neurocomputing,2017,248(C):120-126.
    [14] MINZ A,MAHOBIYA C.MR image classification using Adaboost for brain tumor type[C]// Proceedings of the 2017 IEEE 7th International Advance Computing Conference.Washington,DC:IEEE Computer Society,2017:701-705.
    [15] WANG J,FEI K,CHENG Y.Prediction of rainfall based on improved Adaboost-BP model[J].Journal of Computer Applications,2017,37(9):2689-2693.
    [16] SCHAPIRE R E,FREUND Y,BARTLETT P,et al.Boosting the margin:a new explanation for the effectiveness of voting methods[J].Annals of Statistics,1998,26(5):1651-1686.
    [17] GAO W,ZHOU Z H.On the doubt about margin explanation of boosting[J].Artificial Intelligence,2013,203:1-18.
    [18] BACHE K,LICHMAN M.UCI repository of machine learning databases[DB/OL].[2018- 06- 20].http://www.ics.uci.edu/~mlearn/MLRepository.html.
    [19] van HULSE J,KHOSHGOFTAAR T M,NAPOLITANO A.Experimental perspectives on learning from imbalanced data[C]// Proceedings of the 24th International Conference on Machine Learning.New York:ACM,2007:935-942.
    [20] LIU N,WEI L W,AUNG Z.Handling class imbalance in customer behavior prediction[C]// Proceedings of the 2014 International Conference on Collaboration Technologies and Systems.Piscataway,NJ:IEEE,2014:100-103.
