Subjective Tendency Analysis of BBS
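Abstract
In the development of socialist democratic politics and a harmonious society, BBS forums have become an important platform for people to exchange opinions and post comments. BBS subjective tendency analysis is one of the important means of supervising BBS public opinion. It serves to collect BBS public opinion information promptly, to grasp the viewpoints, attitudes, and sentiment orientation of comments on hot BBS topics, to supervise and clean up the BBS environment, and to show party and government bodies and related departments where public opinion leans, so that they can make fast, well-founded decisions. Abroad, the "Emotion Analysis" software from the British company Coppola Software can judge the evaluative attitude and sentiment orientation that media articles take toward party policies, or that reviews take toward online products. In China, the Founder Zhisi (方正智思) public opinion monitoring and analysis system helps supervisory departments assess, analyze, and plan public opinion content and produce early-warning information.
     BBS text tendency research based on machine learning or on semantic patterns treats a document as a set of words or patterns: it computes or looks up the tendency value of each phrase or pattern and sums the results to obtain the tendency value of the comment document being judged. These approaches do not refine the opinion targets and match them to their corresponding sentiment polarity, and they ignore the linkage between subject-predicate and verb-object structures in sentence syntax. As a result, the polarity of the sentiment words attached to the topic words of a hot BBS topic is misjudged, and the text tendency analysis is inaccurate. Data acquisition for BBS subjective tendency analysis is complex and diverse: the data are usually tied to the hot topics under discussion and are informal, wide-ranging, domain-specific, and time-sensitive. This thesis therefore first refines the opinion targets of BBS topics and matches them to their corresponding polarity. It then performs BBS subjective tendency analysis by combining a polarity sentiment dictionary, syntax-based dependency parsing, and a topic polarity identification algorithm, using an improved context-based tendency analysis method to compute topic polarity tendency values. Finally, it analyzes and discovers polarity topics, focus topics, and sensitive topics, uses the change of tendency dispersion over time to discover topic trends, and runs comparative experiments to verify that, in the accuracy of topic identification and of the corresponding polarity judgment, the proposed method is more effective and feasible.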
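The pairing idea described above can be illustrated with a minimal sketch: each sentiment word is attached to a topic word through a subject-verb (SBV) or verb-object (VOB) dependency arc, and its polarity is flipped when a negation word modifies it. The arcs below are hand-written rather than produced by a parser, and the toy lexicons, the `VOB` and `ADV` relation labels, and the function name `topic_polarities` are assumptions made for illustration. This is not the thesis's actual algorithm.

```python
from collections import defaultdict

# Hypothetical toy lexicons; a real system would load full dictionaries.
POSITIVE = {"good", "support", "praise"}
NEGATIVE = {"bad", "oppose", "criticize"}
NEGATIONS = {"not", "never"}


def word_polarity(word):
    """Return +1, -1, or 0 for a single word from the toy lexicons."""
    if word in POSITIVE:
        return 1
    if word in NEGATIVE:
        return -1
    return 0


def topic_polarities(arcs, topics):
    """Pair topic words with sentiment words through dependency arcs.

    arcs   -- list of (head, relation, dependent) triples
    topics -- set of words treated as topic words (assumed to be given)
    """
    # Heads modified by a negation word have their polarity flipped.
    negated = {head for head, rel, dep in arcs if rel == "ADV" and dep in NEGATIONS}
    scores = defaultdict(int)
    for head, rel, dep in arcs:
        # SBV: the topic is the subject of a sentiment predicate.
        # VOB: the topic is the object of a sentiment verb.
        if rel in ("SBV", "VOB") and dep in topics:
            polarity = word_polarity(head)
            if head in negated:
                polarity = -polarity
            scores[dep] += polarity
    return dict(scores)


# "The policy is not good" and "Users support the forum", as hand-made arcs.
arcs = [
    ("good", "SBV", "policy"),
    ("good", "ADV", "not"),
    ("support", "SBV", "users"),
    ("support", "VOB", "forum"),
]
result = topic_polarities(arcs, topics={"policy", "forum"})
print(result)
```

Unlike summing polarity over the whole document, this keeps a separate score per topic word, so a negative remark about one topic does not cancel a positive remark about another.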
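     Main work:
     (1) Extract unstructured BBS text using HTML and DOM, filter stop words, perform Chinese word segmentation as preprocessing, and store the result as XML.
     (2) Propose a method based on a polarity sentiment dictionary, dependency parsing, and a topic polarity tendency identification algorithm that analyzes the polarity of topic words and their corresponding sentiment words in order to perform BBS subjective tendency analysis. Positive and negative sentiment dictionaries and a negation dictionary are built and merged; sentence tendency values are computed to extract, from BBS comments, the topic tendency sentences that contain sentiment descriptions; and the topic polarity tendency identification algorithm computes context-based word polarity values.
     (3) Propose an improved method for computing context polarity. By adding topic identification marks and the linkage between subject-predicate and verb-object structures, it remedies the topic-word misidentification and sentiment-polarity misjudgment of the SBV (Subject-Verb, the subject-predicate relation in syntax) polarity transfer algorithm.
     (4) Analyze the key points of BBS subjective tendency to discover polarity topics, focus topics, and sensitive topics; define tendency dispersion, focus degree, and sensitivity, and use the change of tendency dispersion over time to analyze and discover topic trends.
     (5) Verify through comparative experiments that, in the accuracy of topic identification and of the corresponding polarity judgment, the proposed BBS subjective tendency analysis method is more effective and feasible.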
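Item (4) tracks how tendency dispersion changes over time. The abstract does not give the definition, so the sketch below assumes one: dispersion is taken as the population standard deviation of the polarity values posted on a topic within each period. The function name `dispersion_by_period` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def dispersion_by_period(posts):
    """Group (period, polarity) pairs by period and return the spread per period.

    Dispersion is taken here as the population standard deviation of the
    polarity values in a period; this definition is an assumption.
    """
    grouped = defaultdict(list)
    for period, polarity in posts:
        grouped[period].append(polarity)
    return {period: pstdev(values) for period, values in sorted(grouped.items())}


# Hypothetical polarity values for one topic, keyed by day.
posts = [
    ("2010-05-01", 1.0), ("2010-05-01", 1.0),
    ("2010-05-02", 1.0), ("2010-05-02", -1.0),
]
trend = dispersion_by_period(posts)
print(trend)
```

Under this assumed definition, a rising value means opinions on the topic are diverging, while a value near zero means posters largely agree.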
To promote socialist democracy and the harmonious development of society, the Internet has become an important venue for exchanging views and comments. BBS is becoming a platform on which people speak freely and express their personal views and attitudes, and it contains a large amount of public opinion information. Collecting this information quickly, grasping in time the public's views, attitudes, and subjective comments on the topics of greatest concern in each period, and monitoring BBS public opinion enable the government to make sound, well-founded decisions. Text tendency analysis has advanced considerably in identifying the overall attitude and subjective tendency of comment texts and product reviews. Abroad, the Emotion Analysis software made by the British company Coppola Software can determine whether a newspaper article on party policy, or a commentary on an online product, takes a positive or a negative attitude. In China, the Founder (Fangzheng) public opinion monitoring and analysis system helps monitoring departments assess, analyze, and plan public opinion information so as to provide early warnings.
     Whether based on semantic patterns or on machine learning, BBS text tendency analysis treats a document as a set of words or patterns, computes or looks up the tendency value of each phrase or pattern, and sums the results to determine the polarity tendency value of the BBS text. In these approaches the opinion target of a BBS topic is not refined and matched to its corresponding polarity, and the relationship between Subject-Predicate and Verb-Object structures is ignored. This thesis therefore combines a polarity dictionary, dependency parsing, and a corresponding algorithm to analyze the tendency of subjective BBS comments, and then establishes mathematical models for detecting and analyzing topic trends. Main tasks:
     (1) Information extraction based on HTML and the DOM tree is used to extract unstructured BBS text; stop words are then filtered, the text is segmented into Chinese words, and the result is stored in XML.
     (2) A sentiment dictionary and dependency parsing are combined with a polarity tendency identification algorithm to identify topic words and the polarity of their corresponding sentiment words, and thereby to compute and analyze the polarity tendency of topic tendency sentences in subjective BBS comments. Positive and negative sentiment dictionaries and a negation dictionary are integrated to compute the tendency value of each topic tendency sentence in BBS comments; the tendency identification algorithm then computes the context polarity tendency value and, from it, the topic polarity tendency value.
     (3) An improved method for computing and analyzing context polarity is proposed. It adds topic identification marks and the relationship between Subject-Predicate and Verb-Object structures to the SBV (Subject-Verb, the subject-predicate structure in syntax) algorithm, in order to correct its errors in topic-word identification and in polarity tendency judgment.
     (4) Based on the BBS tendency analysis, polarity topic, focus topic, and sensitive topic models are established to find topic trends.
     (5) Comparative experiments validate that the BBS tendency analysis approach proposed in this thesis is more effective and feasible.
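The preprocessing in task (1) can be sketched with the Python standard library alone. The example below is only an illustration under stated assumptions: the `div class="post"` markup, the stop-word list, the XML element names `posts`, `post`, and `w`, and the names `PostExtractor` and `to_xml` are all hypothetical, and whitespace splitting stands in for a real Chinese word segmenter.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

STOP_WORDS = {"the", "is", "a"}  # hypothetical stop-word list


class PostExtractor(HTMLParser):
    """Collect the text of every <div class="post"> element (class name assumed)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside the current post div
        self.posts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1
        elif ("class", "post") in attrs:
            self.depth = 1
            self.posts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.posts[-1] += data


def to_xml(html):
    """Extract posts, drop stop words, and serialize the tokens as XML."""
    parser = PostExtractor()
    parser.feed(html)
    root = ET.Element("posts")
    for text in parser.posts:
        # Whitespace splitting stands in for a real Chinese word segmenter.
        tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
        post = ET.SubElement(root, "post")
        for token in tokens:
            ET.SubElement(post, "w").text = token
    return ET.tostring(root, encoding="unicode")


page = '<html><body><div class="post">The forum is great</div><div>ad</div></body></html>'
xml_text = to_xml(page)
print(xml_text)
```

Storing one element per token keeps the segmentation result explicit, so later stages such as dictionary lookup can read the words back without segmenting again.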
