Research on Problems Related to Text Sentiment Classification
Abstract
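With the rapid development of the Internet and the sharp increase in the number of Internet users, a large amount of information in text form has emerged. People publish information with subjective leanings on the Internet, expressing their views, attitudes, and approval or disapproval of products, current events, and other issues.
     Text sentiment orientation analysis draws on computational linguistics, artificial intelligence, machine learning, information retrieval, data mining, and other research areas, and has broad application value. It has become a research hotspot both in China and abroad.
     Sentiment orientation analysis examines the speaker's attitude (also called opinion or sentiment), that is, the subjective information in a text. Research on sentiment orientation analysis can be roughly divided into four levels: word-level, phrase-level, sentence-level, and document-level sentiment orientation analysis.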
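     Word-level sentiment orientation analysis examines nouns, verbs, adjectives, and other words that carry sentiment, identifying their orientation and judging their intensity; it is the prerequisite and foundation of text sentiment orientation analysis. Sentence-level analysis targets sentences in a specific context and covers distinguishing subjective from objective sentences, determining the orientation of subjective sentences, and extracting the elements related to a sentence's orientation. Document-level analysis determines the sentiment orientation of a text as a whole.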
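As a rough illustration of how the word, sentence, and document levels build on one another, here is a minimal lexicon-based sketch in Python. It is not the classification method studied in this paper. It assumes already-segmented tokens (English stand-ins here; Chinese text would first pass through a word segmenter) and a hypothetical toy lexicon, `WORD_POLARITY`, in which the sign of a score encodes orientation and its magnitude encodes intensity. Treating a sentence with no sentiment words as objective is a deliberately crude stand-in for real subjectivity identification.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon: sign = orientation, magnitude = intensity.
WORD_POLARITY: Dict[str, float] = {
    "good": 1.0, "great": 2.0, "satisfied": 1.0,
    "bad": -1.0, "terrible": -2.0, "disappointing": -1.0,
}


def word_polarity(word: str) -> float:
    """Word level: signed orientation and intensity (0.0 if not in the lexicon)."""
    return WORD_POLARITY.get(word, 0.0)


def sentence_polarity(tokens: List[str]) -> Optional[float]:
    """Sentence level: None marks an objective sentence (no sentiment words);
    otherwise return the sum of its word polarities."""
    scores = [word_polarity(t) for t in tokens if t in WORD_POLARITY]
    return sum(scores) if scores else None


def document_polarity(sentences: List[List[str]]) -> str:
    """Document level: sum the scores of the subjective sentences only."""
    total = sum(s for s in map(sentence_polarity, sentences) if s is not None)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


review = [
    ["the", "screen", "is", "great"],       # subjective, +2.0
    ["delivery", "took", "three", "days"],  # objective, ignored
    ["the", "battery", "is", "bad"],        # subjective, -1.0
]
print(document_polarity(review))  # positive
```

This sketch ignores negation and degree adverbs, so "not good" would still score as positive. That limitation is exactly why those word classes matter as features.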
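     For the feature extraction problem in sentiment classification, this paper compares, at both the document and sentence levels, how sentiment collocations, sentiment words, degree adverbs, negation adverbs, and word sequences affect sentiment classification when used as features. Experimental results show that sentiment words, degree adverbs, and negation adverbs are the main features for text sentiment classification, and that adding sentiment collocations and word sequences clearly improves classification performance.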
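The feature types compared in this study can be sketched as a single extraction function. This is an illustrative assumption, not the paper's feature extractor. The lexicons `SENTIMENT_WORDS`, `DEGREE_ADVERBS` and `NEGATION_ADVERBS` are hypothetical toy sets. A sentiment collocation is taken to mean a sentiment word together with the run of degree or negation adverbs directly before it. Word sequences are approximated by adjacent word bigrams.

```python
from collections import Counter
from typing import List

# Hypothetical toy lexicons; the paper's actual lexicons are not given in the abstract.
SENTIMENT_WORDS = {"good", "bad", "satisfied", "disappointing"}
DEGREE_ADVERBS = {"very", "quite", "extremely", "slightly"}
NEGATION_ADVERBS = {"not", "never", "no"}


def extract_features(tokens: List[str]) -> Counter:
    """Count sentiment-word, degree-adverb, negation-adverb, collocation and bigram features."""
    feats: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok in SENTIMENT_WORDS:
            feats["SW=" + tok] += 1
            # Assumed collocation: walk back over adjacent degree/negation adverbs.
            j = i
            while j > 0 and (tokens[j - 1] in DEGREE_ADVERBS or tokens[j - 1] in NEGATION_ADVERBS):
                j -= 1
            if j < i:
                feats["COL=" + "_".join(tokens[j:i + 1])] += 1
        elif tok in DEGREE_ADVERBS:
            feats["DEG=" + tok] += 1
        elif tok in NEGATION_ADVERBS:
            feats["NEG=" + tok] += 1
    # Word sequences, approximated here by adjacent word bigrams.
    for a, b in zip(tokens, tokens[1:]):
        feats["BI=" + a + "_" + b] += 1
    return feats


feats = extract_features(["the", "service", "is", "not", "very", "good"])
print(feats["COL=not_very_good"])  # 1
```

Keeping the negation and degree adverbs inside the collocation feature lets a classifier tell "not very good" apart from "good". Single-word features alone cannot make that distinction.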
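     For the feature selection problem in sentiment classification, this paper applies the following feature selection methods at both the document and sentence levels and compares their effect on text sentiment classification: document frequency (DF), the chi-square statistic (CHI), DF combined with CHI, information gain (IG), and IG combined with a genetic algorithm (GA). Experimental results show that feature selection combining information gain with a genetic algorithm clearly improves text sentiment classification.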
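Document frequency and the chi-square statistic can both be computed from term/class co-occurrence counts. The sketch below uses the standard 2×2 contingency form of CHI:

- A: documents of class c that contain t
- B: documents of other classes that contain t
- C: documents of class c without t
- D: all remaining documents

The abstract does not say how this paper combines DF and CHI. Here the combination is assumed to be a DF threshold that removes rare terms, followed by ranking the remaining terms by their maximum CHI over classes. The six-document corpus is invented for illustration.

```python
from typing import Dict, List, Set, Tuple

Doc = Tuple[Set[str], str]  # (terms present in the document, class label)


def document_frequency(docs: List[Doc]) -> Dict[str, int]:
    """DF(t): number of documents that contain term t."""
    df: Dict[str, int] = {}
    for terms, _ in docs:
        for t in terms:
            df[t] = df.get(t, 0) + 1
    return df


def chi_square(docs: List[Doc], term: str, label: str) -> float:
    """CHI(t, c) = N (AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    n = len(docs)
    a = sum(1 for terms, y in docs if term in terms and y == label)      # t present, class c
    b = sum(1 for terms, y in docs if term in terms and y != label)      # t present, other class
    c = sum(1 for terms, y in docs if term not in terms and y == label)  # t absent, class c
    d = n - a - b - c                                                    # t absent, other class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_df_chi(docs: List[Doc], min_df: int, k: int) -> List[str]:
    """Assumed DF+CHI combination: drop terms with DF < min_df, then keep the
    top k terms by their maximum CHI over all classes (ties broken by name)."""
    df = document_frequency(docs)
    labels = {y for _, y in docs}
    scores = {t: max(chi_square(docs, t, lab) for lab in labels)
              for t, f in df.items() if f >= min_df}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [
    ({"good", "screen"}, "pos"), ({"good", "price"}, "pos"), ({"great", "screen"}, "pos"),
    ({"bad", "screen"}, "neg"), ({"bad", "price"}, "neg"), ({"bad", "battery"}, "neg"),
]
print(select_df_chi(docs, min_df=2, k=2))  # ['bad', 'good']
```

In this toy corpus, "great" and "battery" are removed by the DF threshold before CHI is consulted. "price", which occurs equally often in both classes, gets a CHI of 0.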
With the rapid development of the Internet and the growing number of Internet users, more and more information is published in textual form. People use the Internet to release subjective information expressing their opinions, attitudes, and judgments about merchandise and current affairs.
     Text sentiment polarity analysis draws on computational linguistics, artificial intelligence, machine learning, information retrieval, data mining, and other fields. Because it has broad applications, text sentiment analysis has become a research hotspot in natural language processing in recent years.
     Text sentiment polarity analysis examines the writer's attitude (also called point of view or emotion), that is, the subjective information in a text. Sentiment analysis can generally be divided into four levels: word, phrase, sentence, and text (document) sentiment analysis. Word sentiment analysis deals with words that carry subjective information, such as nouns, verbs, adjectives, and adverbs; it is the precondition and foundation of text sentiment analysis. Sentence sentiment analysis deals with sentences in their context. Text sentiment analysis analyzes the subjective information of a text as a whole.
     For feature extraction in sentiment classification, this paper extracts several kinds of features (sentiment patterns, i.e. collocations, sentiment words, degree adverbs, negation adverbs, and word sequences) and analyzes their influence on sentiment classification at the text and sentence levels. The experimental results show that sentiment words, degree adverbs, and negation adverbs are the most helpful features. Adding sentiment patterns and word sequences clearly improves performance.
     For feature selection in sentiment analysis, this paper compares several feature selection methods: document frequency (DF), the chi-square statistic (CHI), a combination of DF and CHI, information gain (IG), and a combination of IG and a genetic algorithm (GA). It analyzes their influence on sentiment classification at the text and sentence levels. The experimental results show that the combination of IG and GA performs best.
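A combined IG and GA selector can be sketched as a filter step followed by a wrapper search. First, information gain shortlists candidate terms. Then a genetic algorithm searches over bit masks on that shortlist. Everything beyond that outline is an assumption made for illustration, not the configuration used in the paper:

- The fitness is the leave-one-out accuracy of a Bernoulli naive Bayes classifier, minus a small per-feature penalty.
- Selection is truncation with elitism, and crossover is one-point.
- `top_m`, `pop_size` (which must be at least 4), `generations`, and `p_mut` are arbitrary toy values.

```python
import random
from math import log, log2
from typing import List, Set, Tuple

Doc = Tuple[Set[str], str]  # (terms present in the document, class label)


def entropy(counts: List[int]) -> float:
    """Shannon entropy (bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c) if total else 0.0


def info_gain(docs: List[Doc], term: str) -> float:
    """IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t)."""
    labels = sorted({y for _, y in docs})
    with_t = [y for terms, y in docs if term in terms]
    without_t = [y for terms, y in docs if term not in terms]
    n = len(docs)
    h_c = entropy([sum(1 for _, y in docs if y == lab) for lab in labels])
    h_with = entropy([with_t.count(lab) for lab in labels])
    h_without = entropy([without_t.count(lab) for lab in labels])
    return h_c - len(with_t) / n * h_with - len(without_t) / n * h_without


def bernoulli_nb_loo(docs: List[Doc], feats: List[str]) -> float:
    """Leave-one-out accuracy of a Laplace-smoothed Bernoulli naive Bayes using only feats."""
    correct = 0
    for i, (terms, y) in enumerate(docs):
        train = docs[:i] + docs[i + 1:]
        best, best_score = None, float("-inf")
        for lab in sorted({lbl for _, lbl in train}):
            in_class = [t for t, lbl in train if lbl == lab]
            score = log(len(in_class) / len(train))  # log prior
            for f in feats:
                p = (sum(1 for t in in_class if f in t) + 1) / (len(in_class) + 2)
                score += log(p if f in terms else 1 - p)  # presence and absence both count
            if score > best_score:
                best, best_score = lab, score
        correct += int(best == y)
    return correct / len(docs)


def ig_ga_select(docs: List[Doc], top_m: int = 4, pop_size: int = 8,
                 generations: int = 15, p_mut: float = 0.1, seed: int = 0):
    """Return (IG shortlist, GA-selected subset of that shortlist)."""
    vocab = sorted({t for terms, _ in docs for t in terms})
    # Filter step: shortlist the top_m terms by information gain.
    candidates = sorted(vocab, key=lambda t: (-info_gain(docs, t), t))[:top_m]
    m = len(candidates)
    rng = random.Random(seed)

    def decode(mask: List[int]) -> List[str]:
        return [c for c, bit in zip(candidates, mask) if bit]

    def fitness(mask: List[int]) -> float:
        # Assumed trade-off: accuracy minus 0.01 per selected feature.
        return bernoulli_nb_loo(docs, decode(mask)) - 0.01 * sum(mask)

    # Wrapper step: GA over bit masks (1 = keep the candidate term).
    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, m) if m > 1 else 0  # one-point crossover
            # Bit-flip mutation on the crossed-over chromosome.
            child = [1 - b if rng.random() < p_mut else b for b in p1[:cut] + p2[cut:]]
            children.append(child)
        pop = parents + children
    return candidates, decode(max(pop, key=fitness))


docs = [
    ({"good", "screen"}, "pos"), ({"good", "price"}, "pos"), ({"great", "screen"}, "pos"),
    ({"bad", "screen"}, "neg"), ({"bad", "price"}, "neg"), ({"bad", "battery"}, "neg"),
]
candidates, selected = ig_ga_select(docs)
print(candidates, selected)
```

Because the GA is stochastic, the seeded `random.Random` instance makes runs repeatable. On a real corpus, fitness would more sensibly be measured on held-out data than by leave-one-out on a handful of documents.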
