Research on Extraction Patterns for Product Description Words and Sentiment Words
Abstract
With the rapid development of the World Wide Web, the Internet has become an ideal platform for exchanging opinions and expressing views. More and more users post product reviews on forums and similar platforms. These reviews are valuable to both consumers and product manufacturers. However, the rapidly growing volume of reviews makes it difficult to collect and analyze them manually. Product review analysis based on natural language processing is therefore of considerable research value. Extracting product feature words (product description words) and sentiment words is an important step in product review analysis.
     This thesis addresses Chinese review text. Drawing on theories and methods from computational linguistics and statistics, it works at two linguistic granularities, part of speech (POS) and syntactic parsing, to mine templates that link product feature words to their corresponding sentiment words, and it explores new algorithms for extracting both kinds of words from Chinese review sentences.
     The thesis proposes an extraction algorithm that, for corpora from different domains and given domain-related seed words, uses the templates to extract product feature words and their corresponding sentiment words from the corpus through mutual, iterative inference. Experimental results show that the algorithm requires little manual intervention, outperforms existing methods, and performs well and stably across domains. On this basis, both the templates and the extraction algorithm are domain-independent.
     The contributions of this thesis are: 1) the relationship between product feature words and their corresponding sentiment words is used to extract the two kinds of words mutually and iteratively; 2) the extraction algorithm is domain-independent and outperforms existing methods. It needs no domain-specific training corpus: given only a few domain-related seed words, the POS templates, syntax-tree paths, and extraction algorithm can be applied directly to other domains.
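     The mutual, iterative extraction described in the abstract might be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the thesis's implementation: the two POS templates, the tag names ("n", "d", "a"), the toy corpus of English glosses, and the function names `match_pairs` and `bootstrap` are all hypothetical, and the syntax-tree paths mentioned in the abstract are left out.

```python
from typing import Iterable, List, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag)

# Hypothetical POS templates: the first tag is the feature slot and the
# last tag is the sentiment slot ("n" noun, "d" adverb, "a" adjective).
TEMPLATES = [("n", "a"), ("n", "d", "a")]

# Toy, already segmented and tagged corpus (English glosses, illustration only).
CORPUS: List[List[Token]] = [
    [("screen", "n"), ("very", "d"), ("clear", "a")],
    [("picture", "n"), ("clear", "a")],
    [("picture", "n"), ("very", "d"), ("sharp", "a")],
    [("battery", "n"), ("durable", "a")],
]


def match_pairs(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Return (feature candidate, sentiment candidate) pairs matched by a template."""
    tags = [tag for _, tag in sentence]
    pairs = []
    for template in TEMPLATES:
        width = len(template)
        for start in range(len(sentence) - width + 1):
            if tuple(tags[start:start + width]) == template:
                pairs.append((sentence[start][0], sentence[start + width - 1][0]))
    return pairs


def bootstrap(
    corpus: List[List[Token]],
    seed_features: Iterable[str],
    max_rounds: int = 10,
) -> Tuple[Set[str], Set[str]]:
    """Grow feature and sentiment word sets from seed feature words."""
    features = set(seed_features)
    sentiments: Set[str] = set()
    pairs = [pair for sentence in corpus for pair in match_pairs(sentence)]
    for _ in range(max_rounds):
        before = (len(features), len(sentiments))
        # Known feature words propose sentiment words.
        for feature, sentiment in pairs:
            if feature in features:
                sentiments.add(sentiment)
        # Known sentiment words propose feature words.
        for feature, sentiment in pairs:
            if sentiment in sentiments:
                features.add(feature)
        if (len(features), len(sentiments)) == before:
            break  # neither set grew, so the iteration has converged
    return features, sentiments


features, sentiments = bootstrap(CORPUS, {"screen"})
```

     Starting from the single seed "screen", the sketch reaches "picture" through the shared sentiment word "clear" and then "sharp" through "picture"; "battery" and "durable" are never reached because no template match connects them to the seed. A practical system would presumably also score or filter candidates rather than accept every match.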
