Research on ACO-WNB Classification Algorithm Based on Improved Information Gain (基于改进信息增益的ACO-WNB分类算法研究)
  • Authors: QIU Ning-jia (邱宁佳); GAO Peng (高鹏); WANG Peng (王鹏); TAO Yue (陶跃)
  • Author affiliation: College of Computer Science and Technology, Changchun University of Science and Technology
  • Keywords: Naive Bayes (NB); information gain (IG); feature subset; ant colony optimization (ACO)
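  • Journal (Chinese title, abbreviated): JSJZ
  • Journal (English title): Computer Simulation
  • Institution: College of Computer Science and Technology, Changchun University of Science and Technology
  • Publication date: 2019-01-15
  • Publisher: Computer Simulation (计算机仿真)
  • Year: 2019
  • Issue: v.36
  • Funding: Key Science and Technology Research Project of the Jilin Province Science and Technology Development Plan (20150204036GX); Jilin Provincial Industrial Innovation Special Fund Project (2017C051)
  • Language: Chinese
  • Page: JSJZ201901061
  • Page count: 5
  • CN: 01
  • ISSN: 11-3724/TP
  • Classification number: 302-306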
Abstract
To address the low text classification performance of the naive Bayes classification algorithm, this paper proposes an ACO-WNB classification algorithm based on improved information gain (IG). First, an adjustment factor is introduced according to the word-frequency distribution of each feature word in the data set. It strengthens the contribution of helpful feature words and suppresses the interference of harmful ones, so that strongly discriminative features are selected to form the feature subset and the accuracy of IG on imbalanced data sets improves. Then, ant colony optimization (ACO) is combined with the weighted naive Bayes (WNB) model: ACO iterates over the weights and searches for a global optimum, producing the ACO-WNB classifier and improving classification efficiency on text data. The algorithms before and after the improvement are compared on a typical news data set. The experiments show that the improved IG effectively removes redundant high-frequency features and selects features better on imbalanced data sets, and that the ACO-WNB classifier achieves higher accuracy, giving better classification efficiency on real text data.
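The abstract names two building blocks, IG-based feature selection and a weighted naive Bayes classifier, without giving their formulas. The following is a minimal Python sketch of the textbook versions of both, not the authors' method: it uses standard (unadjusted) information gain, so the paper's adjustment factor is not reproduced, and it uses each term's IG value as a placeholder weight where the paper would use weights found by ACO. The function names, the toy documents, and the Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Standard IG of the binary feature 'term present / term absent'.

    Assumption: this is plain IG; the paper's adjustment factor is not modelled.
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional


def train_counts(docs, labels):
    """Collect per-class term counts and per-class document counts."""
    term_counts = {}
    for d, y in zip(docs, labels):
        term_counts.setdefault(y, Counter()).update(d)
    return term_counts, Counter(labels)


def wnb_predict(doc, term_counts, class_counts, weights, vocab):
    """Weighted naive Bayes: argmax_c  log P(c) + sum_i w_i * log P(t_i | c)."""
    total_docs = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        # Laplace-smoothed denominator over the selected vocabulary.
        denom = sum(term_counts[c][t] for t in vocab) + len(vocab)
        score = math.log(n_c / total_docs)
        for t in doc:
            if t in vocab:
                likelihood = (term_counts[c][t] + 1) / denom
                score += weights.get(t, 1.0) * math.log(likelihood)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy corpus: each document is a list of tokens.
docs = [["goal", "match", "team"], ["team", "coach", "goal"],
        ["stock", "market", "bank"], ["bank", "market", "rate"]]
labels = ["sport", "sport", "finance", "finance"]

# Feature selection: keep the 4 terms with the highest IG (ties broken by name).
all_terms = sorted({t for d in docs for t in d})
ig = {t: information_gain(docs, labels, t) for t in all_terms}
vocab = set(sorted(all_terms, key=lambda t: (-ig[t], t))[:4])

# Placeholder weights: the IG values themselves. In the paper these weights
# would come from the ACO search, which is not implemented here.
weights = {t: ig[t] for t in vocab}

term_counts, class_counts = train_counts(docs, labels)
prediction = wnb_predict(["goal", "team"], term_counts, class_counts, weights, vocab)
```

In this sketch the weights enter as exponents on the per-term likelihoods (multipliers in log space), which is one common way to define a weighted naive Bayes model. An ACO-style optimizer would treat the `weights` dictionary as the search space and score each candidate by classification accuracy on held-out data.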
