Topic-Fused CLSTM for Short-Text Sentiment Classification (融合主题的CLSTM短文本情感分类)
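  • Title (English): Topic-based Contextual LSTM for Short-text Sentiment Classification
  • Authors: QIN Feng (秦锋); HUANG Chao (黄超); ZHENG Xiao (郑啸); SHAO Guangmei (邵光梅)
  • Affiliation: College of Computer Science and Technology, Anhui University of Technology
  • Keywords: topic; sliding window; context; long short-term memory (LSTM) model; sentiment classification
  • Journal name (Chinese): HDYX
  • Journal name (English): Journal of Anhui University of Technology (Natural Science)
  • Publication date: 2017-07-15
  • Publisher: Journal of Anhui University of Technology (Natural Science)
  • Year: 2017
  • Issue: v.34; No.135
  • Funding: National Natural Science Foundation of China (61402008, 61402009); Anhui Province Major Science and Technology Project (16030901060); Major Natural Science Research Project of Anhui Higher Education Institutions (KJ2014ZD05); Anhui Province Support Program for Outstanding Young Talents in Universities
  • Language: Chinese
  • Pages: HDYX201703014
  • Page count: 7
  • CN: 03
  • ISSN: 34-1254/N
  • Classification number: 84-90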
Abstract
Short texts carry little information, which limits the accuracy of sentiment classification. To address this, the paper proposes the T-CLSTM (Topic-based Context CLSTM) model. The model generates word-topic vectors with an LDA model and builds two kinds of context from them, a sliding-window word-topic context and a hierarchical word-topic context, to extend the information available in a short text. The paper discusses how word topics and word-topic contexts are composed and how the sliding-window size affects the word-topic context. Word vectors and word-topic context vectors are then used as input features to train the classification model. In experiments on the COAE2014 corpus, the proposed model reaches a classification accuracy of 92.3%, which is 2% and 4% higher than the baseline algorithms SVM and LSTM, respectively.
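The abstract does not say how the sliding-window word-topic context is aggregated, so the following is only a minimal sketch under stated assumptions: each word already has a topic-distribution vector (for example, taken from a trained LDA model), the context of a word is the plain average of the topic vectors inside a symmetric window, and words without a topic vector fall back to a uniform distribution. The function names `sliding_window_topic_context` and `build_inputs` are hypothetical, and the LSTM classifier itself is left out.

```python
from typing import Dict, List, Sequence


def sliding_window_topic_context(
    tokens: Sequence[str],
    word_topic: Dict[str, List[float]],
    window: int,
    num_topics: int,
) -> List[List[float]]:
    """Average the topic vectors of all tokens within `window` positions of each token.

    Assumption: averaging is an illustrative choice, not the paper's stated method.
    """
    # Fallback for words that have no topic vector (assumption: uniform distribution).
    uniform = [1.0 / num_topics] * num_topics
    vectors = [word_topic.get(tok, uniform) for tok in tokens]

    contexts = []
    for i in range(len(vectors)):
        lo = max(0, i - window)
        hi = min(len(vectors), i + window + 1)
        span = vectors[lo:hi]
        contexts.append(
            [sum(v[k] for v in span) / len(span) for k in range(num_topics)]
        )
    return contexts


def build_inputs(
    tokens: Sequence[str],
    word_vecs: Dict[str, List[float]],
    word_topic: Dict[str, List[float]],
    window: int,
    num_topics: int,
) -> List[List[float]]:
    """Concatenate each word vector with its word-topic context vector."""
    contexts = sliding_window_topic_context(tokens, word_topic, window, num_topics)
    return [list(word_vecs[tok]) + ctx for tok, ctx in zip(tokens, contexts)]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    tokens = ["screen", "is", "great"]
    word_topic = {"screen": [0.8, 0.2], "great": [0.4, 0.6]}
    word_vecs = {
        "screen": [1.0, 0.0, 0.0],
        "is": [0.0, 1.0, 0.0],
        "great": [0.0, 0.0, 1.0],
    }
    for row in build_inputs(tokens, word_vecs, word_topic, window=1, num_topics=2):
        print(row)
```

Each resulting row is one time step of the sequence that would be fed to the LSTM. Widening `window` pulls in topic information from more distant words, which is the parameter whose effect the abstract says the paper studies.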
References
[1] LU T J. Semi-supervised microblog sentiment analysis using social relation and text similarity[C]//2015 International Conference on Big Data and Smart Computing (BigComp). IEEE, 2015: 194-201.
[2] SU Yan, JU Shengfeng, WANG Zhongqing, et al. Research on semi-supervised sentiment classification based on random feature subspaces[J]. Journal of Chinese Information Processing, 2012, 26(4): 85-91. (in Chinese)
[3] GAO Wei, WANG Zhongqing, LI Shoushan. Research on semi-supervised sentiment classification based on ensemble learning[J]. Journal of Chinese Information Processing, 2013(3): 120-126. (in Chinese)
[4] LI C, WU H, JIN Q. Emotion classification of Chinese microblog text via fusion of BoW and eVector feature representations[C]//Natural Language Processing and Chinese Computing. Communications in Computer and Information Science. Berlin, Heidelberg: Springer, 2014(496): 217-228.
[5] LI Tianchen, YIN Jianping. Sentiment polarity discrimination method based on topic clustering[J]. Journal of Frontiers of Computer Science and Technology, 2016, 10(7): 989-994. (in Chinese)
[6] WANG Y, LI Z, LIU J, et al. Word vector modeling for sentiment analysis of product reviews[C]//Natural Language Processing and Chinese Computing. Communications in Computer and Information Science. Berlin, Heidelberg: Springer, 2014(496): 168-180.
[7] LIANG Jun, CHAI Yumei, YUAN Huibin, et al. Microblog sentiment analysis based on deep learning[J]. Journal of Chinese Information Processing, 2014, 28(5): 155-161. (in Chinese)
[8] XUE B, FU C, ZHAN S. A study on sentiment computing and classification of Sina Weibo with word2vec[C]//2014 IEEE International Congress on Big Data (BigData Congress). IEEE, 2014: 358-363.
[9] BLEI D M. Probabilistic topic models[J]. Communications of the ACM, 2012, 55(4): 77-84.
[10] BARBIERI N, MANCO G, RITACCO E, et al. Probabilistic topic models for sequence data[J]. Machine Learning, 2013, 93(1): 5-29.
[11] WANG Kai, HONG Yu, QIU Yingying, et al. Event clue detection fusing context dependency and sentence semantics[J/OL]. Journal of Frontiers of Computer Science and Technology, http://kns.cnki.net/kcms/detail/11.5602.TP.20170307.1359.004.html. (in Chinese)
[12] MIKOLOV T, ZWEIG G. Context dependent recurrent neural network language model[R]. Microsoft Research Technical Report. 2012, 12: 234-239.
[13] GHOSH S, VINYALS O, STROPE B, et al. Contextual LSTM (CLSTM) models for large scale NLP tasks[J]. arXiv preprint arXiv:1602.06291, 2016.
