Research on Sentence-Level Sentiment Classification Based on Tree Kernel Functions
Abstract
With the rapid growth of the Internet, the volume of online information has exploded, and subjective text makes up a much larger share of it. How to mine authors' opinions from this subjective text is a pressing problem. Sentiment classification is a natural language processing technique that addresses it: it analyzes the subjective information in a text to determine the sentiment orientation of the opinion holder.
     This thesis studies sentence-level sentiment classification. After a detailed analysis of the importance and difficulties of the problem, it proposes a tree kernel-based method for sentence-level sentiment classification. The method uses an SVM (Support Vector Machine)-based convolution tree kernel to capture syntactic structure automatically, takes the syntax tree and the dependency tree in turn as structured features, and combines them with other flat features to classify the sentiment of a sentence.
     First, the thesis explores how structured features from the syntax tree can be applied to sentence-level sentiment classification and proposes using a tree kernel and a composite kernel in an SVM classifier. The experimental results show that the tree kernel and the composite kernel achieve a higher F1 measure than the linear kernel.
     Second, the thesis proposes two syntax tree pruning strategies: one based on adjectives and one based on sentiment words. For the former, a dynamic window algorithm handles sentences that contain more than one adjective; for the latter, the thesis studies how adding domain-specific sentiment words affects classification performance. The experiments show that pruning based on sentiment words outperforms pruning based on adjectives. They also show that the proposed method outperforms flat-feature methods on implicit sentiment classification.
     Finally, the thesis studies a dependency tree pruning strategy grounded in dependency grammar, which relies on sentiment words and dependency relation filtering. Combining it with the tree kernel yields a dependency tree-based method for sentence-level sentiment classification. The experimental results show that the proposed dependency tree pruning strategy is effective.
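The abstract does not give the kernel definitions, so the following is only a minimal sketch of the general idea, under stated assumptions: a subset-tree convolution kernel in the style of Collins and Duffy, combined with a linear kernel over flat features as a weighted sum. The tuple encoding of trees, the decay value, the normalization, the weight `alpha`, and all function names are illustrative assumptions, not the thesis's actual settings.

```python
import math
from functools import lru_cache

# Assumed encoding: a tree is a nested tuple (label, child, child, ...);
# a leaf word is a plain string, e.g. ("NP", ("DT", "the"), ("NN", "screen")).

def nodes(tree):
    """Yield every internal node of a tree (leaf words are skipped)."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)

def production(node):
    """Grammar production at a node: its label plus its children's labels or words."""
    return (node[0],) + tuple(
        c[0] if isinstance(c, tuple) else ("word", c) for c in node[1:]
    )

def tree_kernel(t1, t2, decay=0.5):
    """Weighted count of tree fragments shared by t1 and t2 (subset-tree kernel)."""
    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Fragments rooted at n1 and n2 can only match if the productions agree.
        if production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            if isinstance(c1, tuple):  # word children add no further fragments
                score *= 1.0 + common(c1, c2)
        return score

    return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))

def linear_kernel(f1, f2):
    """Dot product of two sparse feature dictionaries (the flat features)."""
    return float(sum(v * f2.get(k, 0) for k, v in f1.items()))

def normalized(kernel, a, b):
    """Cosine-style normalization so that kernel(x, x) becomes 1."""
    denom = math.sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom else 0.0

def composite_kernel(x1, x2, alpha=0.5):
    """Weighted sum of the tree kernel and the flat-feature kernel (alpha is assumed)."""
    (tree1, flat1), (tree2, flat2) = x1, x2
    return (alpha * normalized(tree_kernel, tree1, tree2)
            + (1.0 - alpha) * normalized(linear_kernel, flat1, flat2))
```

Each sentence is represented here as a pair of a parse tree and a bag-of-words dictionary. A Gram matrix filled with `composite_kernel` values could then be handed to any SVM implementation that accepts precomputed kernels. Pruning, as described in the abstract, would shrink the trees before they reach `tree_kernel`.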
