Abstract
This paper addresses the problem of grading the sentiment orientation of online reviews. It proposes a multi-level sentiment classification method based on the bag-of-opinions model and a set of linguistic rules. The method analyzes part-of-speech collocations within sentences and uses them to define 12 patterns for extracting feature-opinion pairs. It also gives strategies for the problems these patterns run into. Each review is then represented as a series of four-tuples of the form "feature, degree word, opinion word, negation word". Drawing on the characteristics of Chinese word usage and on how certain words are used specifically in the automobile domain, the paper proposes a method for computing the sentiment polarity value of each four-tuple. The extracted four-tuples and their polarities are used to build a vector representation of the text, together with a weighting formula. Finally, cosine similarity between texts is used to assign each review to one of five sentiment polarity levels. Experiments on the car dataset of COAE2012 Task 3 gave good classification results compared with the other runs submitted to COAE2012.
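The abstract only outlines the pipeline. The paper's actual polarity formula, weighting scheme and similarity targets are not given here. The Python sketch below illustrates the general idea under explicit assumptions. The toy lexicons and degree weights are hypothetical. So are the multiplicative polarity rule (prior polarity times degree weight, with the sign flipped by a negation word), the constant `__bias__` dimension, and the nearest-centroid use of cosine similarity. The 12 extraction patterns are skipped, and four-tuples are supplied directly.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt
from typing import Optional

# Hypothetical toy lexicons; the paper's real sentiment and degree-word
# resources are not reproduced here.
OPINION_POLARITY = {"good": 1.0, "bad": -1.0, "spacious": 1.0, "cramped": -1.0}
DEGREE_WEIGHT = {"very": 2.0, "slightly": 0.5}
# Assumed constant dimension so that intensity, not only direction, affects cosine.
BIAS = "__bias__"


@dataclass(frozen=True)
class FourTuple:
    """(feature, degree word, opinion word, negation word); absent slots are None."""
    feature: str
    degree: Optional[str]
    opinion: str
    negation: Optional[str]


def tuple_polarity(t: FourTuple) -> float:
    """Assumed rule: prior polarity x degree weight, sign flipped by a negation word."""
    base = OPINION_POLARITY.get(t.opinion, 0.0)
    weight = DEGREE_WEIGHT.get(t.degree, 1.0)  # missing or unknown degree word -> 1.0
    sign = -1.0 if t.negation else 1.0
    return sign * weight * base


def text_vector(tuples: list[FourTuple]) -> dict[str, float]:
    """One dimension per feature holding the summed polarity of its tuples, plus a bias."""
    vec: dict[str, float] = defaultdict(float)
    for t in tuples:
        vec[t.feature] += tuple_polarity(t)
    vec[BIAS] = 1.0
    return dict(vec)


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroids(labelled: list[tuple[int, dict[str, float]]]) -> dict[int, dict[str, float]]:
    """Average the training vectors of each level (1..5) into one prototype vector."""
    sums: dict[int, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: dict[int, int] = defaultdict(int)
    for label, vec in labelled:
        counts[label] += 1
        for k, x in vec.items():
            sums[label][k] += x
    return {label: {k: x / counts[label] for k, x in s.items()} for label, s in sums.items()}


def classify(vec: dict[str, float], protos: dict[int, dict[str, float]]) -> int:
    """Return the level whose prototype is most cosine-similar to vec."""
    return max(protos, key=lambda label: cosine(vec, protos[label]))


# Toy training data: one hypothetical review per level.
train = [
    (5, text_vector([FourTuple("appearance", "very", "good", None),
                     FourTuple("space", "very", "spacious", None)])),
    (4, text_vector([FourTuple("appearance", None, "good", None)])),
    (3, text_vector([FourTuple("appearance", None, "good", None),
                     FourTuple("space", None, "cramped", None)])),
    (2, text_vector([FourTuple("appearance", None, "bad", None)])),
    (1, text_vector([FourTuple("appearance", "very", "bad", None),
                     FourTuple("space", "very", "cramped", None)])),
]
protos = centroids(train)
review = text_vector([FourTuple("appearance", None, "good", "not")])
print(classify(review, protos))  # → 2
```

Because cosine similarity ignores vector length, the sketch adds a constant bias dimension. This lets "very good" and "good" point in different directions and so land in different levels. Whether the paper handles intensity this way, through its weighting formula, or otherwise is not stated in the abstract.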