Research on Ontology-Based Topic Sentiment Analysis
Abstract
For public-opinion information that is still in an incipient ("not yet erupted") state, the important tasks of public-opinion analysis include mining hot, focal, and sensitive topics on the network, tracking opinion trends, and improving the ability to handle and supervise online emergencies. Topic sentiment analysis is an important part of public-opinion analysis. This paper analyzes the sentiment orientation of network resources on a specific topic, with the aim of solving problems such as judging the orientation of sentiment words and classifying text sentiment, that is, judging the overall orientation of a topic.
     The paper performs sentiment analysis on network resources about a specific topic. First, a domain knowledge base is built semi-automatically through ontology learning, and this domain knowledge is used to filter the corpus of the specific topic out of a multi-topic base corpus. Then, guided by the fine-grained classification of sentiment knowledge and the orientation-intensity values in the sentiment vocabulary ontology, the filtered topic corpus is analyzed to obtain the overall sentiment orientation of the specified topic. Finally, taking evaluation of the Spring Festival Gala (春晚) as an application example, the main functional modules of a topic sentiment analysis system are preliminarily implemented. The main work of the paper is as follows.
     (1) Ontology construction. This covers two ontologies: a domain ontology for the specific topic and a sentiment vocabulary ontology. The former is used to filter the corpus of the specific topic out of the multi-topic base corpus. The latter provides a basis for paragraph-level and document-level sentiment classification and for the fine-grained categorization of sentiment feature words in the topic corpus.
     (2) Topic corpus construction. Texts are first collected automatically from the Web. After preprocessing, they are clustered to build a multi-topic base corpus, which is then filtered using the domain ontology to form a single-topic corpus.
     (3) Topic sentiment analysis. The key task is extracting and computing subjective information: extracting subjective sentences from the corpus texts, extracting sentiment words from the subjective sentences, and computing the sentiment orientation of those words based on the sentiment vocabulary ontology. Experiments show that the topic sentiment analysis algorithm is effective.
     (4) Application study. The paper presents an example that applies topic sentiment analysis to the evaluation of the Spring Festival Gala. The main functional modules of the system are described in detail, and a preliminary implementation of those modules shows that applying topic sentiment analysis in a public-opinion monitoring system can improve the accuracy of the analysis.
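The filtering and scoring steps in items (2) and (3) can be illustrated with a minimal Python sketch. It is not the author's algorithm: the abstract gives no formulas, so a flat keyword set stands in for the domain ontology, a small word-to-(polarity, intensity) dictionary stands in for the sentiment vocabulary ontology, and a sentence counts as subjective if it contains any lexicon word. All names (`DOMAIN_TERMS`, `SENTIMENT_LEXICON`, `topic_sentiment`) and the English toy data are assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the domain ontology: a flat set of topic terms.
DOMAIN_TERMS = {"gala", "sketch", "host", "song"}

# Hypothetical stand-in for the sentiment vocabulary ontology:
# word -> (polarity in {+1, -1}, intensity in (0, 1]).
SENTIMENT_LEXICON = {
    "wonderful": (1, 0.9),
    "funny": (1, 0.6),
    "boring": (-1, 0.7),
    "awful": (-1, 0.9),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_on_topic(text, min_hits=1):
    """Keep a document if it mentions at least min_hits distinct domain terms."""
    return len(set(tokenize(text)) & DOMAIN_TERMS) >= min_hits


def split_sentences(text):
    """Split on sentence-final punctuation and drop empty fragments."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def sentence_score(sentence):
    """Sum polarity * intensity over the sentiment words in one sentence."""
    return sum(
        polarity * intensity
        for polarity, intensity in (
            SENTIMENT_LEXICON[t] for t in tokenize(sentence) if t in SENTIMENT_LEXICON
        )
    )


def topic_sentiment(documents):
    """Filter documents by topic, score subjective sentences, and aggregate."""
    total = 0.0
    for doc in documents:
        if not is_on_topic(doc):
            continue  # discard off-topic text from the multi-topic corpus
        for sentence in split_sentences(doc):
            # Assumption: a sentence is subjective if it has any lexicon word.
            if any(t in SENTIMENT_LEXICON for t in tokenize(sentence)):
                total += sentence_score(sentence)
    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return total, label
```

Calling `topic_sentiment` on a list of raw document strings returns the aggregated score and a coarse label. A real system for Chinese text would also need word segmentation, and an ontology would supply term relations and sentiment categories that a flat set and dictionary cannot express.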
