An improved naive Bayesian algorithm for unbalanced data classes
  • English title: An improved naive Bayesian algorithm for unbalanced data classes
  • Authors: TAN Zhi (谭志); HOU Taowen (侯涛文)
  • Keywords: naive Bayesian; supervised learning; receiver operating characteristic curve; unbalanced sample; deep feature weighting; data mining
  • Journal code: XDDJ
  • Journal: Modern Electronics Technique (现代电子技术)
  • Institution: Beijing University of Civil Engineering and Architecture
  • Publication date: 2019-04-29 14:05
  • Publisher: Modern Electronics Technique
  • Year: 2019
  • Volume/issue: v.42; No.536
  • Funding: Open fund project of a provincial/ministerial key laboratory: Research on aerodynamic noise identification of small wind turbines based on beamforming (201605); institute-level natural science fund project: Vibration and noise testing of automobiles (NJDZJ1622)
  • Language: Chinese
  • Article ID: XDDJ201909029
  • Page count: 5
  • Issue: 09
  • CN: 61-1224/TN
  • Pages: 126-130
Abstract
Naive Bayesian classifiers tend to assign minority-class samples to the majority class when classifying unbalanced samples. To address this problem, a deep AUC (area under the curve) weighted naive Bayesian (DA-WNB) algorithm for unbalanced data classes is proposed, based on the properties of the receiver operating characteristic (ROC) curve and the idea of deep feature weighting. To verify the effectiveness of the algorithm on unbalanced data classification, AUC, true positive rate (TPR), and overall accuracy are used as the evaluation metrics. Simulation results show that the algorithm improves minority-class classification accuracy (up to 60%) while maintaining high overall accuracy.
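The three indicators named in the abstract (AUC, true positive rate, and overall accuracy) can be illustrated with a minimal Python sketch. This is not the DA-WNB algorithm, whose details are not given in this record. The function names, the toy labels, and the scores below are all assumptions made for illustration. The example shows why overall accuracy alone can hide poor minority-class performance on unbalanced data.

```python
def true_positive_rate(y_true, y_pred, positive=1):
    """Fraction of positive (here: minority-class) samples predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        raise ValueError("y_true contains no positive samples")
    hits = sum(1 for p in preds_for_positives if p == positive)
    return hits / len(preds_for_positives)


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive sample receives a higher
    score than a random negative sample; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 majority-class samples (0) and 2 minority-class samples (1).
y_true = [0] * 8 + [1] * 2

# A degenerate classifier that always predicts the majority class.
all_majority = [0] * 10

# Hypothetical scores for the positive class from some other classifier.
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.25, 0.6, 0.7, 0.35]

print("accuracy (always majority):", overall_accuracy(y_true, all_majority))
print("TPR (always majority):", true_positive_rate(y_true, all_majority))
print("AUC (scored classifier):", auc(y_true, scores))
```

On this toy data the always-majority classifier reaches a high overall accuracy while its true positive rate is zero, which is the failure mode the abstract attributes to naive Bayes on unbalanced samples. AUC is computed from scores rather than hard labels, so it does not depend on a decision threshold.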