Automatic Sentiment Analysis of Web User Reviews
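Abstract
As the volume of information online grows and network applications expand, more and more users turn to the Internet for the information they need. Before buying a product or doing something, users often want relevant reviews and suggestions for reference, and the Internet has become an important channel for finding them. The Internet holds a great deal of user review information about products and services, but sorting through it manually is a very arduous task, so this thesis proposes an automatic sentiment analysis method.
     The thesis first studies the automatic acquisition of sentiment words. Building on the algorithm "Using Tongyici Cilin to Compute Word Semantic Polarity" (基于同义词词林的词汇褒贬计算) proposed by the Institute of Computational Linguistics at Peking University, it improves the method by extracting some of the incorrectly labeled words, raising the accuracy of word sentiment labeling from 89.58% to 91.52%. It also proposes a rule-based dynamic expansion method that determines the sentiment orientation of ambiguous words from their context.
     It then studies one application of sentiment text classification: sentiment analysis of review information, that is, analyzing the sentiment orientation of user reviews. The thesis uses a text vector model and judges the sentiment of a text by considering the effect of the different parts of speech in Chinese, as well as negation words, adversative words, and degree adverbs. It also proposes an iterative algorithm that expands the initial sentiment lexicon to improve classification accuracy. The method is simple in concept and easy to understand, and it reaches an accuracy of 86.43%; its drawback is high time complexity, so it is relatively slow.
     The thesis uses Web mining techniques to extract these user reviews and classify them by user sentiment. The input is the topic to query. The output is the percentage of reviews on that topic in each of three categories (positive, negative, neutral), together with the ten reviews whose weights have the largest absolute values within the category that has the largest share.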
With the growth of information on the Internet and the expansion of network applications, more and more people obtain the information they need from the Internet. Before buying a product or doing something, users often look for reviews and recommendations as a reference, so the Internet has become a very important channel. The Internet carries many reviews of products and services, but telling them apart manually is a daunting task. This paper therefore presents an approach to automatic sentiment analysis.
     First, the paper introduces the automatic acquisition of a sentiment dictionary. Starting from the algorithm in the paper "Using Tongyici Cilin to Compute Word Semantic Polarity", proposed by the Institute of Computer Science & Technology at Peking University, we improve it by extracting some of the wrongly tagged words, which raises the tagging accuracy from 89.58% to 91.52%. In addition, we present a rule-based dynamic expansion method that determines the sentiment orientation of ambiguous words from their contexts.
     Next, the paper studies one application of sentiment classification: sentiment analysis of user reviews. We use a text vector model and determine the sentiment orientation of a review from the influence of the various parts of speech in Chinese, together with negation words, adversatives, and degree adverbs. We also expand the initial set of sentiment words through an iterative process. The method is simple and easy to understand, and its overall accuracy reaches 86.43%, but it has a high time complexity.
     Finally, we mine the relevant reviews and recommendations from the Web and classify them according to the user's sentiment. The input is a subject, and the output is the percentage of reviews in each of three categories (positive, negative, and neutral). For the category with the highest percentage, the system also returns the ten reviews whose weights have the highest absolute values.
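The idea of resolving an ambiguous word's orientation from its context can be illustrated with a small sketch. The abstract does not give the actual rules, so the lexicon, the rule table, the window size, and the function name `word_polarity` below are all hypothetical; English tokens stand in for segmented Chinese words.

```python
# Hypothetical fixed-polarity lexicon (not taken from the thesis).
FIXED_POLARITY = {"excellent": 1, "terrible": -1}

# Hypothetical context rules: (ambiguous word, nearby context word) -> polarity.
CONTEXT_RULES = {
    ("high", "quality"): 1,
    ("high", "price"): -1,
    ("low", "price"): 1,
    ("low", "quality"): -1,
}


def word_polarity(tokens, index, window=2):
    """Return +1, -1, or 0 for the word at tokens[index].

    Words with a fixed polarity are looked up directly. Ambiguous words
    are resolved by scanning up to `window` tokens on either side for a
    context word that has a matching rule. Unknown words get 0.
    """
    word = tokens[index]
    if word in FIXED_POLARITY:
        return FIXED_POLARITY[word]
    start = max(0, index - window)
    stop = min(len(tokens), index + window + 1)
    for j in range(start, stop):
        if j != index and (word, tokens[j]) in CONTEXT_RULES:
            return CONTEXT_RULES[(word, tokens[j])]
    return 0


if __name__ == "__main__":
    print(word_polarity(["the", "price", "is", "high"], 3))
    print(word_polarity(["high", "quality", "screen"], 0))
```

In this sketch the same word "high" is negative next to "price" and positive next to "quality", which is the kind of ambiguity a context rule is meant to settle.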
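A minimal sketch of how negation words, degree adverbs, and adversatives might modify a lexicon-based score is given below. It is not the thesis's text vector model: the word lists, the weights, the 0.5 discount applied to text before an adversative, and the classification threshold are assumptions chosen only for illustration, and English tokens stand in for segmented Chinese words.

```python
# All lexicons and weights below are hypothetical.
POLARITY = {"good": 1.0, "great": 1.0, "bad": -1.0, "slow": -1.0}
NEGATIONS = {"not", "never"}
DEGREE = {"very": 1.5, "slightly": 0.5}
ADVERSATIVES = {"but", "however"}
PRE_ADVERSATIVE_WEIGHT = 0.5  # text before "but" counts for less


def score_review(tokens):
    """Sum signed, weighted polarities over a tokenized review."""
    total = 0.0   # discounted score of everything before the last adversative
    clause = 0.0  # score of the current clause
    sign, weight = 1.0, 1.0  # pending modifiers for the next sentiment word
    for tok in tokens:
        if tok in ADVERSATIVES:
            # Discount what came before and start a new clause.
            total = (total + clause) * PRE_ADVERSATIVE_WEIGHT
            clause = 0.0
            sign, weight = 1.0, 1.0
        elif tok in NEGATIONS:
            sign = -sign
        elif tok in DEGREE:
            weight *= DEGREE[tok]
        elif tok in POLARITY:
            clause += sign * weight * POLARITY[tok]
            sign, weight = 1.0, 1.0  # modifiers are consumed by this word
    return total + clause


def classify(score, threshold=0.1):
    """Map a score to one of the three categories used in the abstract."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    review = ["good", "but", "very", "slow"]
    print(score_review(review), classify(score_review(review)))
```

A single pass over the tokens keeps this sketch linear in review length; pending modifiers simply carry forward until the next sentiment word, which a real system would likely restrict to a short window.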
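The output described above (category percentages plus the ten highest-weight reviews in the dominant category) can be sketched as follows. The function names `label` and `summarize`, the neutral threshold, and the input format of (text, score) pairs are assumptions; the scores are taken as given, however the classifier produced them.

```python
from collections import Counter

CATEGORIES = ("positive", "negative", "neutral")


def label(score, threshold=0.1):
    """Assign a category from a signed score (threshold is hypothetical)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def summarize(scored_reviews, top_n=10, threshold=0.1):
    """Summarize a list of (text, score) pairs for one topic.

    Returns the percentage of reviews per category, the category with
    the largest share, and the top_n reviews of that category ordered
    by the absolute value of their score.
    """
    if not scored_reviews:
        raise ValueError("no reviews to summarize")
    labels = [label(score, threshold) for _, score in scored_reviews]
    counts = Counter(labels)
    n = len(scored_reviews)
    percentages = {c: 100.0 * counts[c] / n for c in CATEGORIES}
    dominant = max(CATEGORIES, key=lambda c: percentages[c])
    in_dominant = [r for r, lab in zip(scored_reviews, labels) if lab == dominant]
    top = sorted(in_dominant, key=lambda r: abs(r[1]), reverse=True)[:top_n]
    return percentages, dominant, top


if __name__ == "__main__":
    reviews = [("a", 2.0), ("b", 0.5), ("c", -1.0), ("d", 0.0)]
    print(summarize(reviews))
```

If two categories tie for the largest share, this sketch picks the first in the order positive, negative, neutral; the abstract does not say how ties are handled.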
