Research on Opinion Holder Extraction
Abstract
Sentiment analysis aims to automatically extract useful opinion information and related knowledge from subjective text. With the rapid growth of the Internet and the information industry, large numbers of users post their opinions and views on forums, blogs, and other platforms, covering almost every conceivable topic. In sentiment analysis, it is necessary to identify who puts forward or initiates an opinion (the opinion holder), so as to understand more fully how people view social and public issues and, in turn, to devise better measures or make better-informed statements. Opinion holder extraction based on natural language processing therefore has significant research value.
     In this work, opinion holders are extracted from corpora in different domains using a statistical method and a rule-based method separately, and the two methods are then combined to obtain the final result. The main contributions are as follows:
     First, by analyzing the definition of an opinion holder, six features are extracted and proposed: words, opinionated trigger words (trigger words of subjective expressions), part-of-speech (POS) tags, named entities, dependency relations, and sentence structure. A feature observation window is defined for the features so that their context is captured as precisely as possible. Second, by analyzing the layered structure of parse trees on a large scale, two syntactic rules built on opinionated trigger words are defined for identifying opinion holders directly from parse trees, and an opinion holder extraction algorithm based on these two rules is designed.
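     To make the idea of a feature observation window concrete, the following is a minimal Python sketch rather than the implementation used in this work. It assumes a symmetric window of configurable size, covers only three of the six features (word, POS tag, trigger word), and uses a hypothetical trigger lexicon `TRIGGER_WORDS`; the function name `window_features` and the feature key format are also assumptions made for illustration.

```python
from typing import Dict, Sequence

# Hypothetical trigger lexicon; the source does not list its trigger words.
TRIGGER_WORDS = {"said", "believes", "thinks", "claimed"}


def window_features(
    words: Sequence[str],
    pos_tags: Sequence[str],
    i: int,
    size: int = 2,
) -> Dict[str, str]:
    """Collect features of the tokens at offsets -size..+size around token i.

    Positions that fall outside the sentence are filled with a padding value
    so that every token gets the same set of feature keys.
    """
    feats: Dict[str, str] = {}
    for offset in range(-size, size + 1):
        j = i + offset
        if 0 <= j < len(words):
            word = words[j].lower()
            feats[f"word[{offset}]"] = word
            feats[f"pos[{offset}]"] = pos_tags[j]
            feats[f"trigger[{offset}]"] = str(word in TRIGGER_WORDS)
        else:
            feats[f"word[{offset}]"] = "<PAD>"
            feats[f"pos[{offset}]"] = "<PAD>"
            feats[f"trigger[{offset}]"] = "False"
    return feats


if __name__ == "__main__":
    words = ["The", "minister", "said", "prices", "will", "fall"]
    pos_tags = ["DT", "NN", "VBD", "NNS", "MD", "VB"]
    # Features for "minister" with one token of context on each side.
    for key, value in window_features(words, pos_tags, 1, size=1).items():
        print(key, value)
```

     A wider window exposes more context to the sequence model at the cost of more sparse features; the window size that best fits each feature is not stated in the source and would have to be tuned.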
     Finally, the extraction methods based on conditional random fields (CRF) and on syntactic rules are combined: the results produced by the syntactic rules undergo syntactic path mining and confidence analysis, and the features selected in this way are used as three additional training features for the CRF through a feature extraction algorithm designed for this purpose. The combined method achieves high precision and recall. One limitation is that anaphora resolution was not performed; future work will add anaphora resolution together with semantic disambiguation to further improve the accuracy of opinion holder identification.
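     The combination step can be illustrated with a small Python sketch, given under explicit assumptions and not as the method actually implemented here. It assumes the rule-based extractor returns candidate holder spans as `(start, end, confidence)` tuples with an exclusive end index, and it adds only two illustrative per-token features, `rule_holder` and `rule_conf`; the source mentions three additional features but does not name them, so these names, the confidence thresholds, and the function `add_rule_features` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (start, end, confidence): token span [start, end) proposed by a rule.
RuleSpan = Tuple[int, int, float]


def add_rule_features(
    token_feats: Sequence[Dict[str, str]],
    rule_spans: Sequence[RuleSpan],
    min_conf: float = 0.5,   # assumed threshold for keeping a rule result
    high_conf: float = 0.8,  # assumed threshold for the "high" bucket
) -> List[Dict[str, str]]:
    """Return copies of the token feature dicts with rule-derived features.

    Tokens covered by a sufficiently confident rule span are marked B/I
    (begin/inside); all other tokens are marked O (outside).
    """
    merged = [dict(f) for f in token_feats]  # do not mutate the input
    for feats in merged:
        feats["rule_holder"] = "O"
        feats["rule_conf"] = "none"
    for start, end, conf in rule_spans:
        if conf < min_conf:
            continue  # discard low-confidence rule output
        for k in range(start, end):
            merged[k]["rule_holder"] = "B" if k == start else "I"
            merged[k]["rule_conf"] = "high" if conf >= high_conf else "mid"
    return merged


if __name__ == "__main__":
    feats = [{"word": "the"}, {"word": "minister"}, {"word": "said"}]
    spans = [(0, 2, 0.9), (2, 3, 0.2)]
    for row in add_rule_features(feats, spans):
        print(row)
```

     The resulting list of per-token dictionaries would then be passed, together with gold labels, to whichever CRF toolkit is in use; bucketing the confidence into a few discrete values keeps the feature usable by a CRF that expects categorical features.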
