Topic detection based on graph analytical method and cosine similarity
  • Original title: 基于图分析方法和余弦相似性的主题检测研究
  • Authors: MA Chang-lin (马长林); CHENG Meng-li (程梦丽); WANG Tao (王涛)
  • Affiliation: School of Computer, Central China Normal University (华中师范大学计算机学院)
  • Keywords: topic detection; graph analytical method; cosine similarity
  • Journal code: JSJK
  • Journal: Computer Engineering & Science (计算机工程与科学)
  • Publication date: 2019-04-15
  • Publisher: Computer Engineering & Science (计算机工程与科学)
  • Year: 2019
  • Volume: v.41; No.292
  • Issue: 04
  • Funding: National Natural Science Foundation of China (61003192)
  • Language: Chinese
  • Record ID: JSJK201904019
  • Pages: 138-142
  • Page count: 5
  • CN: 43-1258/TP
Abstract
Automatically extracting valuable topic information from massive text collections has become an important technical challenge. Most current methods assume that topics are mutually independent, but in practice topics have complicated inherent relationships with one another. To address this problem, we combine correlation theory with an improved graph analytical approach and model topic detection on the basis of topic correlation and term co-occurrence. High-precision semantic information and latent co-occurrence relationships are used together to discover important and meaningful topics and trends. Simulation experiments verify the effectiveness of the proposed model.
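The record gives only the abstract, so the model itself is not specified here. As a rough illustration of how term co-occurrence and cosine similarity can feed a graph-based grouping of terms, the following minimal Python sketch may help. It is not the authors' method: the function names (`cooccurrence_vectors`, `cosine`, `term_clusters`), the document-level co-occurrence window, the `threshold` value of 0.5, and the use of connected components as "topics" are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_vectors(docs):
    """Map each term to a Counter of document-level co-occurrence counts.

    The diagonal entry (a term with itself) holds its document frequency.
    """
    vectors = defaultdict(Counter)
    for doc in docs:
        terms = set(doc)
        for a in terms:
            for b in terms:
                vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def term_clusters(docs, threshold=0.5):
    """Link terms whose cosine similarity meets the (assumed) threshold,
    then return the connected components of that graph as term groups."""
    vectors = cooccurrence_vectors(docs)
    terms = sorted(vectors)
    neighbours = {t: set() for t in terms}
    for a, b in combinations(terms, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, clusters = set(), []
    for start in terms:
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Toy corpus invented for this example; each document is a list of terms.
docs = [
    ["graph", "node", "edge"],
    ["graph", "edge", "community"],
    ["topic", "model", "lda"],
    ["topic", "lda", "corpus"],
]
print(term_clusters(docs))
```

On this toy corpus the two vocabularies never share a document, so every cross-group similarity is zero and the terms fall into two groups. A real system would need tokenization, term weighting, and a more careful community detection step than plain connected components.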
