Research on Text Similarity Calculation Methods for Sentiment Clustering
  • English title: Text Similarity Calculation for Text Sentiment Clustering
  • Authors: LI Xin (李欣); LI Yang (李旸); WANG Suge (王素格)
  • Affiliations: Information Center, Shanxi Medical College for Continuing Education; School of Computer and Information Technology, Shanxi University; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University
  • Keywords: sentiment-based text clustering; text similarity calculation; text semantic subspace
  • Journal code: MESS
  • Journal: Journal of Chinese Information Processing (中文信息学报)
  • Publication date: 2018-05-15
  • Publisher: Journal of Chinese Information Processing (中文信息学报)
  • Year: 2018
  • Issue: v.32
  • Funding: National Natural Science Foundation of China (61573231, 61632011, 61672331, 61432011); Shanxi Province Science and Technology Basic Condition Platform Program (2015091001-0102)
  • Language: Chinese
  • Page: MESS201805012
  • Number of pages: 8
  • CN: 05
  • ISSN: 11-2325/N
  • Classification number: 102-109
Abstract
In text sentiment analysis, unsupervised clustering saves labor and data resources, but it suffers from low clustering precision. Similarity is the main basis for text clustering. From the perspective of text similarity calculation, this paper addresses the high dimensionality and sparseness of text-feature vectors in sentiment clustering, as well as the representation of latent sentiment factors in review texts, and proposes a subspace-based text semantic similarity calculation method (RESS). Experimental results show that RESS-based text similarity calculation effectively addresses the high dimensionality of text vectors, better expresses the sentiment similarity between texts, and yields better clustering results.