Chinese short text classification model based on hybrid neural network
  • Authors: CHEN Qiaohong; WANG Lei; SUN Qi; JIA Yubo (School of Information Science and Technology, Zhejiang Sci-Tech University)
  • Keywords: convolutional neural network; recurrent neural network; short text classification; feature representation; attention mechanism
  • English keywords: CNN; RNN; short text classification; feature representation; attention mechanism
  • Journal: Journal of Zhejiang Sci-Tech University (Natural Sciences Edition) (浙江理工大学学报(自然科学版))
  • CNKI journal code: ZJSG
  • Affiliation: School of Information Science and Technology, Zhejiang Sci-Tech University
  • Publication date: 2019-03-31
  • Year: 2019
  • Volume/Issue: v.41, no.04
  • Pages: 101-108 (8 pages)
  • CNKI article ID: ZJSG201904011
  • CN: 33-1338/TS
  • Funding: National Natural Science Foundation of China (Grant No. 51775513)
  • Language: Chinese
Abstract
To address the sparsity of feature representations and the poor extraction of high-level text features in existing algorithms, a Chinese short text classification model based on a hybrid neural network is proposed. The model first screens feature words at the phrase level and the character level through a self-defined filtering mechanism; it then combines a convolutional neural network (CNN) and a recurrent neural network (RNN) to extract high-order text features, introducing an attention mechanism to optimize the high-order feature vectors; finally, the resulting high-order feature vectors are fed into a fully connected layer to obtain the classification result. Experimental results show that the method effectively extracts phrase-level and character-level document features: compared with the traditional CNN, traditional LSTM, and CLSTM models, classification accuracy improves by 10.36%, 5.01%, and 2.39% respectively on a binary dataset, and by 12.33%, 4.16%, and 2.33% respectively on a multi-class dataset.
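The pipeline the abstract describes (filtered feature words → CNN → RNN → attention → fully connected layer) can be sketched as a single forward pass in NumPy. All dimensions, random weights, and the plain tanh RNN below are illustrative assumptions for clarity, not the paper's actual configuration (the paper uses an LSTM-style recurrent layer and trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy hyperparameters (hypothetical, not from the paper)
seq_len, emb_dim, n_filters, hidden, n_classes = 20, 32, 16, 24, 2
kernel = 3

# 1) Embeddings of the filtered phrase/character feature words
x = rng.normal(size=(seq_len, emb_dim))

# 2) 1-D convolution + ReLU extracts local n-gram features
W_conv = rng.normal(size=(kernel * emb_dim, n_filters)) * 0.1
conv_out = np.stack([
    np.maximum(x[t:t + kernel].reshape(-1) @ W_conv, 0.0)
    for t in range(seq_len - kernel + 1)
])                                      # shape: (seq_len - kernel + 1, n_filters)

# 3) A simple recurrent layer models sequence order over the conv features
W_in = rng.normal(size=(n_filters, hidden)) * 0.1
W_h = rng.normal(size=(hidden, hidden)) * 0.1
h = np.zeros(hidden)
states = []
for t in range(conv_out.shape[0]):
    h = np.tanh(conv_out[t] @ W_in + h @ W_h)
    states.append(h)
H = np.stack(states)                    # shape: (T, hidden)

# 4) Attention scores each time step, then pools to one context vector
w_att = rng.normal(size=(hidden,)) * 0.1
alpha = softmax(H @ w_att)              # attention weights, sum to 1
context = alpha @ H                     # weighted sum, shape: (hidden,)

# 5) Fully connected layer + softmax yields the class distribution
W_fc = rng.normal(size=(hidden, n_classes)) * 0.1
probs = softmax(context @ W_fc)         # shape: (n_classes,), sums to 1
```

With trained weights, `probs` would be the predicted class distribution for the input text; here it only demonstrates the data flow and tensor shapes of the hybrid architecture.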
