Abstract
【Purpose/Significance】Static sentiment classification has become an important tool for analyzing public opinion, but it is limited to the final classification result: it cannot trace how sentiment evolves over time or identify the influencing factors at each stage, and therefore cannot support more detailed, targeted measures.【Method/Process】This paper proposes a public opinion analysis method based on a dynamic topic-sentiment evolution model. Semantic role labeling is first applied to the comment text to build a word list of sentiment units. An improved TF-IDF is then combined with K-Means clustering to extract topic words and form a topic-sentiment matching word list; compared with the traditional TF-IDF method, both accuracy and F-score improve markedly. Finally, time nodes are introduced, and pointwise mutual information (PMI) together with a sentiment lexicon is used to analyze the dynamic evolution of sentiment.【Result/Conclusion】Experiments show that the sentiment evolution trend obtained by this method is consistent with the actual situation, providing an effective basis for formulating measures to manage online public opinion crises.
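The last step of the method (time nodes plus PMI and a sentiment lexicon) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the seed lexicon, the toy comments, the comment-level co-occurrence window, the neutral treatment of unseen word pairs, and the function names `build_pmi`, `polarity`, and `sentiment_trend` are all assumptions made for illustration. Semantic role labeling and topic word extraction are omitted, and comments are assumed to be tokenized already.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical seed lexicon; a real study would load a full sentiment dictionary.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful"}


def build_pmi(docs):
    """Return pmi(x, y) = log2(p(x, y) / (p(x) * p(y))), counted per comment."""
    n = len(docs)
    word_df = Counter()  # number of comments containing each word
    pair_df = Counter()  # number of comments containing both words of a pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    def pmi(x, y):
        joint = pair_df[tuple(sorted((x, y)))]
        if joint == 0:
            return 0.0  # assumption: unseen pairs count as neutral, not -infinity
        return math.log2(joint * n / (word_df[x] * word_df[y]))

    return pmi


def polarity(word, pmi):
    """+1, -1 or 0: lexicon lookup first, then the sign of the SO-PMI score."""
    if word in POSITIVE:
        return 1
    if word in NEGATIVE:
        return -1
    so = sum(pmi(word, p) for p in POSITIVE) - sum(pmi(word, q) for q in NEGATIVE)
    return (so > 0) - (so < 0)


def sentiment_trend(comments):
    """comments: list of (period, tokens). Returns {period: mean comment score}."""
    pmi = build_pmi([tokens for _, tokens in comments])
    scores = defaultdict(list)
    for period, tokens in comments:
        scores[period].append(sum(polarity(w, pmi) for w in set(tokens)))
    return {period: sum(vals) / len(vals) for period, vals in sorted(scores.items())}


# Toy, invented data: two time nodes with two tokenized comments each.
comments = [
    ("day1", ["service", "slow", "bad"]),
    ("day1", ["slow", "awful", "queue"]),
    ("day2", ["refund", "fast", "good"]),
    ("day2", ["fast", "great", "service"]),
]

for period, score in sentiment_trend(comments).items():
    print(period, score)
```

On this toy data the mean score is negative for day1 and positive for day2, because words such as "slow" and "fast" inherit their polarity from the seed words they co-occur with. A real pipeline would compute the trend per topic word rather than per comment and would need a Chinese word segmenter upstream.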