Study of Financial News Topic Detection Based on Multi-feature Fusion
  • English title: Study of Financial News Topic Detection Based on Multi-feature Fusion
  • Authors: TAN Mengjie; LÜ Xin; TAO Feifei
  • Keywords: financial news; topic detection; multi-feature fusion; Hierarchical Agglomerative Clustering (HAC); K-Nearest Neighbor (KNN)
  • Journal code: JSJC
  • Journal: Computer Engineering
  • Affiliation: College of Computer and Information, Hohai University
  • Publication date: 2018-08-30 10:07
  • Publisher: Computer Engineering
  • Year: 2019
  • Volume and issue: v.45; No.498
  • Funding: National Key R&D Program of China (2016YFC0400910); National Natural Science Foundation of China, General Program (61272543); NSFC-Guangdong Joint Fund, Key Project (U1301252)
  • Language: Chinese
  • Article ID: JSJC201903049
  • Page count: 8
  • Issue: 03
  • CN: 31-1289/TP
  • Pages: 299-305+314
Abstract
To help investors spot investment hot spots promptly in the short term, this paper proposes a topic detection model tailored to the characteristics of financial news. The model builds a time window based on financial news to segment the news stream, extracts text features from the topic events, feature words, news semantics, and financial named entities in the news text, and applies the Nearest Neighbor-Hierarchical Agglomerative Clustering (NNHAC) algorithm to obtain topic clusters. Experimental results show that, compared with traditional multi-feature topic detection models, the model effectively reduces the running time of the clustering algorithm, improves topic detection accuracy, and to some extent helps investors make decisions and judgments.
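The pipeline the abstract outlines (segment the news stream by time window, fuse several text features into one similarity score, then cluster agglomeratively) can be illustrated with a minimal Python sketch. This is not the paper's NNHAC algorithm: it uses plain average-linkage agglomerative clustering, omits the nearest-neighbor step as well as the topic-event and semantic features, and every name, weight, window size, and threshold below is an assumption chosen for illustration.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bags of words (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Jaccard overlap between two sets (used here for named entities)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def fused_similarity(x, y, w_words=0.6, w_entities=0.4):
    """Weighted sum of two feature similarities; the weights are assumed."""
    return (w_words * cosine(x["words"], y["words"])
            + w_entities * jaccard(x["entities"], y["entities"]))

def split_by_window(docs, window):
    """Group documents into consecutive time windows of `window` time units."""
    buckets = {}
    for d in docs:
        buckets.setdefault(d["time"] // window, []).append(d)
    return [buckets[k] for k in sorted(buckets)]

def hac(docs, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the most similar pair of clusters and stops once the
    best average similarity falls below `threshold`.
    """
    clusters = [[d] for d in docs]

    def link(c1, c2):
        return sum(fused_similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, i, j = max((link(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if best < threshold:
            break
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

def make_doc(doc_id, time, words, entities):
    """Hypothetical document record: id, timestamp (hours), words, entities."""
    return {"id": doc_id, "time": time, "words": Counter(words), "entities": set(entities)}

def detect_topics(docs, window=24, threshold=0.3):
    """Cluster each time window separately; return topics as lists of doc ids."""
    topics = []
    for batch in split_by_window(docs, window):
        topics.extend(sorted(d["id"] for d in c) for c in hac(batch, threshold))
    return topics

docs = [
    make_doc("a", 1, ["rate", "cut", "bank"], ["PBOC"]),
    make_doc("b", 5, ["bank", "rate", "cut", "loan"], ["PBOC"]),
    make_doc("c", 9, ["oil", "price", "surge"], ["OPEC"]),
    make_doc("d", 30, ["rate", "cut"], ["PBOC"]),
]
topics = detect_topics(docs)
print(topics)  # [['a', 'b'], ['c'], ['d']]
```

In this toy run, documents "a" and "b" share words and an entity and fall in the same 24-hour window, so they merge into one topic. Document "d" is similar in content but lands in the next window, so it stays separate. That separation is the effect of segmenting the stream before clustering, and it also keeps each clustering run small.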