An Improved Text Classification Method Based on Hyper-Sphere Support Vector Machines
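  • English title: Study on Improvement of Text Classification Using HS-SVM
  • Authors: Hu Jiming (胡吉明); Chen Guo (陈果)
  • Authors (English): Hu Jiming; Chen Guo; Center for Studies of Information Resources, Wuhan University
  • Keywords: LDA topic model; hyper-sphere support vector machine; incremental learning; density decision function
  • English keywords: LDA topic model; Hyper-Sphere Support Vector Machine (HS-SVM); Incremental learning; Intensive degree decision function
  • Journal title (Chinese): XDTQ
  • Journal title (English): New Technology of Library and Information Service
  • Institution: Center for Studies of Information Resources, Wuhan University
  • Publication date: 2014-09-25
  • Publisher: 现代图书情报技术 (New Technology of Library and Information Service)
  • Year: 2014
  • Issue: No.250
  • Funding: One of the research outcomes of the Ministry of Education Humanities and Social Sciences Youth Fund project "Research on Topic Mining and Semantic Classification of Information Content in Social Network Environments" (No. 13YJC870008) and the National Natural Science Foundation of China Youth Fund project "Research on Information Recommendation Based on User-Resource Association in Social Network Environments" (No. 71303178)
  • Language: Chinese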
  • File name: XDTQ201409012
  • Page count: 7
  • Issue (within year): 09
  • CN: 11-2856/G2
  • Pages: 79-85
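Abstract
[Objective] To address problems in text classification such as changing and overlapping class feature vectors, this paper improves the hyper-sphere support vector machine (HS-SVM) classification algorithm. [Methods] The original HS-SVM is improved using incremental learning and a density decision function, so that the support vectors of each hyper-sphere class can change dynamically and the decision function used to construct the HS-SVM can be computed accurately, thereby improving text classification. [Results] Text classification experiments comparing against the original HS-SVM show that the proposed method outperforms the other schemes in precision and recall, and that modeling time is reduced with little effect on prediction accuracy. [Limitations] Experiments on more types of data sets are needed to establish how widely the improvement applies; in addition, the improvement does not reach the underlying classification algorithm, which requires further exploration. [Conclusions] This study helps improve the accuracy of large-scale text classification and reduce training time, thereby improving text classification performance.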
[Objective] To address changing and overlapping class feature vectors, this paper improves the hyper-sphere support vector machine (HS-SVM) classification algorithm. [Methods] After reviewing the operating mechanisms of LDA and HS-SVM and the related studies, this paper constructs a text classification model based on LDA and HS-SVM. The traditional HS-SVM is improved with incremental learning and an intensive degree (density) decision function, so that the support vectors of each hyper-sphere class can change dynamically and the decision function for constructing the HS-SVM can be calculated accurately. [Results] Comparative experiments show that the method is feasible and effective and improves text classification in terms of precision and recall. It also reduces modeling time and has little influence on prediction accuracy. [Limitations] The proposed method is more complex than the original algorithm and needs continued improvement, and the results need to be validated on more data sets. The improvement to the core of the algorithm is also not optimal and needs further study. [Conclusions] This study helps improve accuracy and reduce training time in large-scale text categorization, and also improves the efficiency and performance of text classification.
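
The abstract does not give the formulas, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes a simplified sphere per class (centre = sample mean, radius = distance to the farthest sample) instead of the minimum enclosing ball obtained from the HS-SVM optimization, a neighbour count within a hypothetical radius `eps` as a stand-in for the density decision function in overlapping regions, and a refit triggered only by new samples that fall outside the current sphere as a stand-in for incremental learning. The names `Sphere`, `classify`, and `eps` are illustrative.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Sphere:
    """Hypothetical class region: centre = sample mean, radius = farthest sample."""

    def __init__(self, points):
        self.points = [list(p) for p in points]
        self._refit()

    def _refit(self):
        n, dim = len(self.points), len(self.points[0])
        self.center = [sum(p[i] for p in self.points) / n for i in range(dim)]
        self.radius = max(dist(p, self.center) for p in self.points)

    def contains(self, x):
        return dist(x, self.center) <= self.radius

    def add(self, x):
        """Assumed incremental step: keep the sample, refit only if it lies outside."""
        outside = not self.contains(x)
        self.points.append(list(x))
        if outside:
            self._refit()

def classify(x, spheres, eps=1.0):
    """Assign x to a class label given a dict of label -> Sphere."""
    inside = [label for label, s in spheres.items() if s.contains(x)]
    if len(inside) == 1:
        return inside[0]
    if inside:
        # Overlap: prefer the class with more training samples within eps of x.
        return max(inside, key=lambda lb: sum(dist(x, p) <= eps for p in spheres[lb].points))
    # Outside every sphere: smallest distance relative to the sphere radius.
    return min(spheres, key=lambda lb: dist(x, spheres[lb].center) / max(spheres[lb].radius, 1e-12))

spheres = {
    "A": Sphere([(0, 0), (2, 0), (0, 2), (2, 2)]),
    "B": Sphere([(2, 1), (4, 1), (3, 0), (3, 2)]),
}
print(classify((0.5, 0.5), spheres))   # inside sphere A only
print(classify((2.1, 1.0), spheres))   # inside both spheres, decided by local density
print(classify((6.0, 1.0), spheres))   # outside both spheres, decided by relative distance
spheres["B"].add((5, 1))               # outside B, so B is refitted
print(spheres["B"].center, spheres["B"].radius)
```

In this toy setup the three decision paths (single sphere, overlap, outside all spheres) are kept separate so that the density rule is consulted only where spheres overlap, which is the situation the abstract says the density decision function is meant to resolve.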
