Research on Opinion Extraction Methods for Chinese Product Reviews
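Abstract
With the growth of e-commerce and Web 2.0 applications, more and more consumers post their opinions on e-commerce sites, forums, and blogs after buying and using a product. These reviews capture users' views on a product's features, functions, and performance. Consumers often consult other people's opinions before buying in order to make an informed decision, and manufacturers can use the reviews to improve their products. Reading this mass of product reviews manually is time-consuming and inefficient, and the results are delayed and one-sided. How to automatically extract opinions from large volumes of unstructured online product reviews has therefore recently become an active research topic.
     This thesis studies in depth the construction of resources for sentiment and opinion extraction, the extraction of product attribute features, and the recognition and polarity determination of collocations between feature words and sentiment words. The main work is as follows:
     (1) Using the open-source tool Larbin together with XPath, a focused crawler was run against the mobile phone channels of shopping websites; metadata was then extracted with XPath according to the page layout, yielding a corpus of mobile phone reviews.
     (2) For sentiment and opinion extraction resources, we propose a method for building a basic sentiment lexicon from Baidu Baike, a method for building a domain sentiment lexicon that combines a conjunction lexicon with dependency syntactic relations, and methods for building a web sentiment lexicon and a sentiment modifier lexicon.
     (3) For product attribute feature extraction, we propose a recognition algorithm based on rules and statistics and an improved CRF-based recognition algorithm. The former reaches a precision of 0.56 and a recall of 0.73; the latter has higher precision, 0.78, but a recall of only 0.46. For comparison with other researchers, the method of Hu and Liu was applied in the same experimental setting, and the experiments show that both proposed methods outperform it.
     (4) For feature-sentiment collocation recognition and polarity determination, we propose an SVM-based collocation recognition algorithm and compare it against nearest-neighbor matching and dependency-syntax-based collocation recognition. The SVM-based algorithm reaches a precision of 0.83, a recall of 0.62, and an F-measure of 0.71, far above the other two methods and the best performance overall.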
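The F-measure of 0.71 reported for the SVM method is consistent with the usual harmonic mean of precision and recall. The following minimal Python sketch assumes the balanced F1 definition (the abstract does not say which F-measure variant was used) and checks that figure. Under the same assumption it also derives F1 for the two feature extraction methods; those two values are computed here for illustration and are not reported in the abstract.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract; the labels are our own.
reported = {
    "rules + statistics (feature extraction)": (0.56, 0.73),
    "CRF (feature extraction)": (0.78, 0.46),
    "SVM (collocation recognition)": (0.83, 0.62),
}

# F1 rounded to two decimals, matching the precision used in the abstract.
f_scores = {name: round(f_measure(p, r), 2) for name, (p, r) in reported.items()}

for name, score in f_scores.items():
    print(f"{name}: F1 = {score}")
```

Read this way, the rule-and-statistics extractor trades precision for recall and the CRF extractor does the opposite, which is why a single combined score is useful when comparing them.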
With the development of e-commerce and Web 2.0, more and more consumers publish their attitudes and views on e-commerce websites, forums, and blogs after purchasing and using products. These reviews include opinions about the features, functions, and performance of the product. On the one hand, consumers usually consult other people's suggestions before purchasing a product in order to make a more sensible decision; on the other hand, manufacturers can improve their products according to these reviews. Reading massive numbers of reviews manually is time-consuming and inefficient, and it also suffers from lag and one-sidedness. Recently, automatic opinion extraction from unstructured web product reviews has become a research hotspot.
     This paper studies in depth the construction of sentiment analysis resources and a product feature dictionary, approaches for matching feature words with sentiment words, and criteria for judging the polarity of product feature words. The main research work is as follows.
     (1) Using the open-source tool Larbin together with XPath, we crawl the mobile phone channels of shopping websites, extract metadata according to the webpage format, and finally construct a corpus of mobile phone product reviews.
     (2) To construct sentiment analysis resources, we propose a method for building a basic sentiment word dictionary based on Baidu Baike, and then construct a domain sentiment dictionary based on a conjunction dictionary and dependency relations, as well as a network sentiment dictionary and a sentiment modifier dictionary.
     (3) To extract feature words, we propose two methods: one based on rules and statistics, the other based on CRF machine learning. The precision and recall of the former reach 0.56 and 0.73, respectively; the latter has higher precision but lower recall, at 0.78 and 0.46, respectively. For comparison with other researchers, we apply the method of Hu and Liu in our experimental environment, and the experiments show that both of our methods perform better than Hu and Liu's method.
     (4) For matching feature words with sentiment words and judging the polarity of feature words, we propose a collocation recognition algorithm based on SVM machine learning and compare it with the nearest matching algorithm and the dependency relations algorithm. The experiments show that the SVM method performs best, with precision, recall, and F-measure reaching 0.83, 0.62, and 0.71, respectively.
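To make the collocation task concrete, here is a minimal Python sketch of a nearest matching baseline of the kind the SVM method is compared against. It is an illustration under stated assumptions, not the thesis's implementation: the function name `nearest_pairs`, the toy token list, and the two word sets are hypothetical, and distance is taken simply as the difference in token positions within one sentence.

```python
def nearest_pairs(tokens, feature_words, sentiment_words):
    """Pair each sentiment word with the closest feature word by token distance.

    Assumption: the sentence is already segmented into tokens, and feature
    and sentiment words are identified by simple set membership.
    """
    feature_positions = [i for i, tok in enumerate(tokens) if tok in feature_words]
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in sentiment_words and feature_positions:
            # min() keeps the first (leftmost) feature word when distances tie.
            j = min(feature_positions, key=lambda pos: abs(pos - i))
            pairs.append((tokens[j], tok))
    return pairs


# Hypothetical segmented review sentence, written in English for readability.
tokens = ["screen", "very", "sharp", ",", "but", "battery", "weak"]
feature_words = {"screen", "battery"}
sentiment_words = {"sharp", "weak"}

pairs = nearest_pairs(tokens, feature_words, sentiment_words)
print(pairs)
```

A baseline like this uses only surface distance, so it can mispair words when a sentiment word sits closer to an unrelated feature word; the dependency-based and SVM-based approaches named in the abstract bring in additional evidence for that decision.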
References
    [1] China Internet Network Information Center (CNNIC). The 27th Statistical Report on Internet Development in China. http://www.cnnic.cn/research/bgxz/tjbg/201101/t20110120_20302.html
    [2] comScore/The Kelsey Group. Online consumer-generated reviews have significant impact on offline purchase behavior. Press Release, November 2007. http://www.comscore.com/press/release.asp?press=1928.
    [3] John A. Horrigan. Online shopping. Pew Internet & American Life Project Report, 2008.
    [4] Nan Hu, Ling Liu, Jennifer Zhang. Analyst Forecast Revision and Market Sales Discovery of Online Word of Mouth[C]. Proceedings of the 40th Hawaii International Conference on System Sciences. Waikoloa, Big Island, HI, USA, 2007.
    [5] Satoshi Morinaga, Kenji Yamanishi, Kenji Tateishi, Toshikazu Fukushima. Mining Product Reputations on the Web[C]. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. Edmonton, Alberta, Canada, 2002.
    [6] Riloff E, Wiebe J. Learning extraction patterns for subjective expressions[C]. Proceedings of EMNLP, 2003.
    [7] Wiebe J, Riloff E. Creating Subjective and Objective Sentence Classifiers from Unannotated Texts[C]. Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing, 2005.
    [8] McDonald R, Hannan K, Neylon T, et al. Structured Models for Fine-to-Coarse Sentiment Analysis[C]. Proceedings of ACL, 2007.
    [9] Ku L W, Lo Y S, Chen H H. Using Polarity Scores of Words for Sentence-level Opinion Extraction. Proceedings of the 6th NTCIR Workshop Meeting, 2007.
    [10] Wang Gen, Zhao Jun. Sentence sentiment analysis based on CRFs with multiple redundant labels. National Conference on Computational Linguistics. Beijing: Tsinghua University Press, 2007.
    [11] Turney P D. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews[C]. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 2002: 417-424.
    [12] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2002.
    [13] Blitzer J, Dredze M, Pereira F. Biographies, Bollywood, Boomboxes and Blenders: Domain adaptation for sentiment classification[C]. Proceedings of ACL, 2007.
    [14] Liu Yongdan, Zeng Haiquan, Li Ronglu, Hu Yunfa. Orientation-based text filtering using semantic analysis[J]. Journal on Communications, 2004, 25(7): 78-84.
    [15] Hu Yi, Lu Ruzhan, Li Xuening, et al. Research on text sentiment classification based on language modeling. Journal of Computer Research and Development, 2007, 44(9): 163-165.
    [16] P.D.Turney, M.L.Littman. Measuring Praise and Criticism: Inference of Semantic Orientation from Association[J]. ACM Transactions on Information Systems, 2003, 21(4):315–346.
    [17] M. Sahami, T. Heilman. A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets[C]//Proceedings of 15th International World Wide Web Conference. W3C, 2006.
    [18] Hatzivassiloglou, V., McKeown, K.R. Predicting the semantic orientation of adjectives[C]. Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL. Stroudsburg, PA, USA: Association for Computational Linguistics, 1997: 174-181.
    [19] Kaji N, Kitsuregawa M. Building Lexicon for Sentiment Analysis from Massive Collection of HTML Documents. Proceedings of EMNLP, 2007: 1075-1083.
    [20] Zhao Yu, Cai Wandong, Fan Na, Li Huixian. Computing the semantic orientation of Chinese words using lexical distribution similarity[J]. Journal of Xi'an Jiaotong University, 2009, 43(6): 33-37.
    [21] H. Chen, M. Lin, Y. Wei. Novel Association Measures Using Web Search with Double Checking[C]. Proceedings of COLING/ACL 2006. ACL, 2006: 1009-1016.
    [22] A. Andreevskaia, S. Bergler. Mining Wordnet for Fuzzy Sentiment: Sentiment Tag Extraction from Wordnet Glosses[C]. Proceedings of EACL-06, 11th Conference of the European Chapter of the Association for Computational Linguistics. Trento: Morgan Kaufmann Publishers Inc., 2006.
    [23] J. Kamps, M. Marx, R. J. Mokken, et al. Using Wordnet to Measure Semantic Orientation of Adjectives[C]. Proceedings of LREC-04, 4th International Conference on Language Resources and Evaluation. Lisbon: ELRA, 2004: 1115-1118.
    [24] Liu Qun, Li Sujian. Computing word semantic similarity based on HowNet[C]. The 3rd Chinese Lexical Semantics Workshop. Taipei, China, 2002.
    [25] Zhu Yanlan, Min Jin, Zhou Yaqian. Computing the semantic orientation of words based on HowNet[J]. Journal of Chinese Information Processing, 2006, 20(1): 14-20.
    [26] Lu Bin, Wan Xiaojun, Yang Jianwu, et al. Computing word polarity based on Tongyici Cilin[J]. Proceedings of the 7th International Conference on Chinese Computing, 2007: 17-23.
    [27] Nozomi Kobayashi, Kentaro Inui, Yuji Matsumoto. Collecting Evaluative Expressions for Opinion Extraction[C]. IJCNLP 2004: 596-605.
    [28] Yao Tianfang, Nie Qingyang, Li Jianchao, et al. An opinion mining system for Chinese automobile reviews[C]. Proceedings of the 25th Anniversary Conference of the Chinese Information Processing Society of China. Tsinghua University Press, 2006.
    [29] Yao Tianfang, Cheng Xiwen, Xu Feiyu, et al. A survey of opinion mining for texts[J]. Journal of Chinese Information Processing, 2008, 3: 71-80.
    [30] Yi Jeonhee, Nasukawa T, Bunescu R C, et al. Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques[C]. The 3rd IEEE International Conference on Data Mining, 2003.
    [31] Kim S M, Hovy E. Determining the sentiment of opinions[C]. The 20th International Conference on Computational Linguistics, Geneva, Switzerland, 2004.
    [32] M. Hu, and B. Liu, Mining opinion features in customer reviews, In the Proceedings of the AAAI Conference, 2004.
    [33] M. Hu and B. Liu, Mining and summarizing customer reviews, Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2004:168-177.
    [34] Q. Su, X. Xu, H. Guo, X. Wu, X. Zhang, B. Swen, and Z. Su. Hidden Sentiment Association in Chinese Web Opinion. Proceedings of WWW'08, 2008: 959-968.
    [35] Murthy Ganapathibhotla, Bing Liu. Mining opinions in comparative sentences. Proceedings of the 22nd International Conference on Computational Linguistics, Manchester, United Kingdom, August 18-22, 2008: 241-248.
    [36] S. Raju, P. Pingali, V. Varma. An Unsupervised Approach to Product Attribute Extraction. Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, 2009: 796-800.
    [37] Ana-Maria Popescu, Oren Etzioni. Extracting Product Features and Opinions from Reviews[C]. Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing. Vancouver, Canada, 2005.
    [38] Zhang Meng, Peng Yifan, et al. Research on Chinese opinion analysis[C]. Proceedings of the First Chinese Opinion Analysis Evaluation. Beijing, 2008: 38-45.
    [39] He Hui, Li Si, et al. PRIS technical report on Chinese sentiment orientation analysis[C]. Proceedings of the First Chinese Opinion Analysis Evaluation. Beijing, 2008: 46-55.
    [40] Zhang Shu, Jia Wenjie, et al. CRF-based extraction of opinion targets[C]. Proceedings of the First Chinese Opinion Analysis Evaluation. Beijing, 2008: 70-75.
    [41] Zhao Yanyan, Liu Hongfei, et al. HIT_IR_OMS: a sentiment analysis system[C]. Proceedings of the First Chinese Opinion Analysis Evaluation. Beijing, 2008: 81-88.
    [42] Liu Jun, Liu Quansheng, et al. A brief analysis of the results of the First Chinese Opinion Analysis Evaluation[C]. Proceedings of the First Chinese Opinion Analysis Evaluation. Beijing, 2008: 125-141.
    [43] Lou Decheng, Yao Tianfang. Research on semantic polarity analysis and opinion extraction for Chinese sentences[J]. Journal of Computer Applications, 2006, 11: 2622-2625.
    [44] Dave K, Lawrence S, Pennock D. Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews[C]. WWW 2003: 519-528.
    [45] Yi J, Niblack W. Sentiment Mining in WebFountain[C]. Proceedings of ICDE-05, the 21st International Conference on Data Engineering, IEEE Computer Society, Tokyo, 2005: 1073-1083.
    [46] Gamon M, Aue A, Corston-Oliver S, Ringger E. Pulse: Mining Customer Opinions from Free Text[C]. The 6th International Symposium on Intelligent Data Analysis. Lecture Notes in Computer Science, Springer-Verlag, Madrid, 2005: 121-132.
    [47] Goo5 website[R]. http://www.goo5.cn.
    [48] K. Aas and L. Eikvil. Text categorization: a survey. Technical Report[M], Norwegian Computing Center, 1999: 90-100.
    [49] Galavotti L, Sebastiani F, Simi M. Feature Selection and Negative Evidence in Automated Text Categorization[J]. KDD-2000 Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 2000, Computing Surveys, 2002, 34(1): 1-47.
    [50] Yang H W, Meng H M, Wu Z Y. Modeling the Global Acoustic Correlates of expressivity for Chinese text-to-speech Synthesis. Workshop on Spoken Language Technology. 2006: 10-13.
    [51] Chinese Information Processing Society of China. http://www.cipsc.org.cn.
    [52] Larbin. http://larbin.sourceforge.net/index-eng.html
    [53] HowNet[R]. http://www.keenage.com/
    [54] Baidu Baike. http://baike.baidu.com/
    [55] ICTCLAS. http://ictclas.org/
    [56] CRF++. http://crfpp.sourceforge.net/
