Research on Text Classification Algorithm Based on Manifold Regularization Extreme Learning Machine
  • Authors: PANG Haoming (庞皓明); JI Junzhong (冀俊忠); LIU Jinduo (刘金铎); YAO Yao (姚垚)
  • Keywords: text classification; supervised learning; Regularization Extreme Learning Machine (RELM); manifold regularization; feature mapping
  • Journal code: JSJC
  • English journal name: Computer Engineering
  • Institution: Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology
  • Publication date: 2018-06-22 16:21
  • Publisher: Computer Engineering
  • Year: 2019
  • Volume: v.45; No. 501
  • Funding: National Natural Science Foundation of China (61375059, 61672065)
  • Language: Chinese
  • Issue: 06
  • Pages: 248-254
  • Page count: 7
  • CN: 31-1289/TP
  • CNKI file name: JSJC201906039
Abstract
When an Extreme Learning Machine (ELM) is used for text classification, the random mapping of the input text features into the hidden layer produces a nonlinear geometric structure. The least-squares solution used by ELM cannot account for this structure, which degrades classification performance. To address this problem, this paper introduces a new manifold regularization scheme and proposes an improved ELM-based algorithm. Laplacian eigenmaps are used to preserve the geometric structure of the input text features. The distances between sample points are then corrected using the samples' class labels, so that sample points of the same class are selected first, which improves classification performance. Experimental results on the Reuters and 20 Newsgroups datasets show that, compared with the Regularization Extreme Learning Machine (RELM), AdaBELM, and other algorithms, the proposed algorithm achieves better classification performance, with an F1-measure of up to 91.42%.
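The abstract combines three pieces: a random ELM hidden layer, a neighborhood graph built from label-corrected distances, and output weights solved with a manifold (graph Laplacian) penalty. The pure-Python sketch below shows one way these pieces can fit together; it is not the authors' code. It assumes the common manifold-regularized ELM objective (1/2)‖β‖² + (C/2)‖T − Hβ‖² + (λ/2)·tr(βᵀHᵀLHβ), whose closed-form minimizer is β = (I/C + HᵀH + (λ/C)HᵀLH)⁻¹HᵀT. It also substitutes a supervised-LLE-style correction d′ᵢⱼ = dᵢⱼ + α·max(d)·[yᵢ ≠ yⱼ] for the paper's own distance rule, which the abstract does not spell out. All function names and parameters (`k`, `alpha`, `sigma`, `C`, `lam`) are hypothetical, and in practice `X` would hold text feature vectors (for example TF-IDF) rather than the 2-D toy points used for testing.

```python
import math
import random


def matmul(A, B):
    """Dense matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def hidden_layer(X, W, b):
    """ELM random feature map h(x) = sigmoid(W x + b); W and b are never trained."""
    return [[1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(w_j, x_i)) + b_j)))
             for w_j, b_j in zip(W, b)] for x_i in X]


def label_aware_laplacian(X, y, k=3, alpha=1.0, sigma=1.0):
    """Laplacian L = D - W of a k-NN graph whose neighbors use label-corrected distances."""
    n = len(X)
    d = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in d)
    # Assumed correction (supervised-LLE style, not the paper's formula):
    # add alpha * max(d) to every pair with different labels.
    dc = [[d[i][j] + alpha * d_max * (y[i] != y[j]) for j in range(n)] for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = sorted((j for j in range(n) if j != i), key=lambda j: dc[i][j])[:k]
        for j in neighbors:
            # Heat-kernel weight on the original distance; edges are symmetrized.
            W[i][j] = W[j][i] = math.exp(-d[i][j] ** 2 / (2 * sigma ** 2))
    deg = [sum(row) for row in W]
    return [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def train_mr_elm(X, y, n_classes, n_hidden=20, C=10.0, lam=0.1, k=3, seed=0):
    rng = random.Random(seed)
    W = [[rng.uniform(-1.0, 1.0) for _ in X[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = hidden_layer(X, W, b)
    T = [[1.0 if c == y_i else 0.0 for c in range(n_classes)] for y_i in y]  # one-hot targets
    L = label_aware_laplacian(X, y, k=k)
    Ht = transpose(H)
    HtH = matmul(Ht, H)
    HtLH = matmul(matmul(Ht, L), H)
    # beta = (I/C + H^T H + (lam/C) H^T L H)^(-1) H^T T; lam = 0 gives plain RELM.
    A = [[(1.0 / C if i == j else 0.0) + HtH[i][j] + (lam / C) * HtLH[i][j]
          for j in range(n_hidden)] for i in range(n_hidden)]
    beta = solve(A, matmul(Ht, T))
    return W, b, beta


def predict(model, X):
    W, b, beta = model
    scores = matmul(hidden_layer(X, W, b), beta)
    return [max(range(len(s)), key=s.__getitem__) for s in scores]
```

With `lam=0` the graph term vanishes and the solve reduces to the plain RELM solution β = (I/C + HᵀH)⁻¹HᵀT, so both models can be compared on the same random hidden layer. Because the distance correction only changes which neighbors are chosen, the edge weights still reflect the original geometry of the inputs.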
