Abstract
The support vector machine (SVM) is a machine learning method based on statistical learning theory. It offers distinct advantages for nonlinear and high-dimensional pattern recognition and has been widely used in text classification. However, the choice of kernel function and its parameters strongly affects classification performance, and a single kernel function rarely handles text classification well. This paper therefore selects three commonly used kernel functions and combines them pairwise, using a weighted combination of kernels to offset the weaknesses that each single kernel may introduce. An artificial immune algorithm (IA) is then used to optimize the parameters of the combined kernel and so improve text classification performance. Experimental analysis shows that the method is effective.
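The weighted combination of two kernels described above can be illustrated with a minimal sketch. The abstract does not name the three kernels or give the weighting formula, so everything here is an assumption made for illustration: the RBF and polynomial kernels as the pair, a single convex weight `weight` in [0, 1], and the hypothetical function names and default parameter values. In the paper's setting, `weight`, `gamma`, and the other parameters would be the quantities searched by the immune algorithm, which is not shown here.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def combined_kernel(x, y, weight=0.5, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination of the two kernels (assumed form, not the paper's).

    weight = 1 gives the pure RBF kernel, weight = 0 the pure polynomial one.
    A convex combination of valid kernels is itself a valid kernel.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    k_rbf = rbf_kernel(x, y, gamma)
    k_poly = poly_kernel(x, y, degree, coef0)
    return weight * k_rbf + (1.0 - weight) * k_poly


def gram_matrix(samples, **params):
    """Kernel (Gram) matrix of the combined kernel over a list of vectors."""
    return [[combined_kernel(a, b, **params) for b in samples] for a in samples]
```

A matrix built by `gram_matrix` could be passed to any SVM implementation that accepts a precomputed kernel. Restricting the weight to [0, 1] is a design choice of this sketch that keeps the combination a valid kernel and leaves one extra parameter for the optimizer to tune.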