A Review of Methods and Applications for Author-Topic Model and Its Improved Models
Details
  • English title: A Review of Methods and Applications for Author-Topic Model and Its Improved Models
  • Authors: Xu Han (徐涵); Liu Xiaoping (刘小平)
  • Affiliations: National Science Library, Chinese Academy of Sciences; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences
  • Keywords: Author-Topic model; topic evolution; community detection; model evaluation
  • Journal: Library and Information Service (图书情报工作); CNKI journal code TSQB
  • Publication date: 2019-04-05
  • Year: 2019
  • Volume/issue: v.63, No.620; issue 07
  • Language: Chinese
  • Pages: 136-146 (11 pages)
  • CNKI file ID: TSQB201907021
  • CN: 11-1541/G2
Abstract
[Purpose/significance] The Author-Topic (AT) model, a new type of probabilistic model that has attracted considerable attention in computer science in recent years, has been widely applied in text mining, natural language processing and related areas. This paper analyzes the ideas behind, and the applications of, the AT model and its improved variants in China and abroad, in order to clarify the current state of research and to provide a reference for researchers in computer science, library and information science, and other related fields. [Method/process] Using the Web of Science Core Collection, DBLP and China National Knowledge Infrastructure (CNKI) as literature sources, a collection of papers on the AT model and its improved variants was compiled by defining retrieval rules, removing duplicates and screening records manually. The existing research is then summarized, using literature analysis, from the perspective of how the model is applied. [Result/conclusion] The analysis shows that existing research has formed a fairly complete analysis workflow, and that the directions in which the model is improved and the domains to which it is applied are increasingly diverse. However, performance optimization, the standardization and refinement of model evaluation metrics, and further application in library and information science still need to be explored in depth.
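The abstract refers to the Author-Topic model only as a probabilistic model. As a minimal sketch, assuming the standard generative process of the original 2004 AT formulation by Rosen-Zvi, Steyvers and colleagues (not a method taken from this review): each author a has a topic distribution θ_a ~ Dirichlet(α), each topic t has a word distribution φ_t ~ Dirichlet(β), and every word position in a document is produced by picking one of the document's co-authors uniformly at random, then a topic from that author's θ, then a word from that topic's φ. All names here (`generate_corpus`, `sample_dirichlet`, `Token`) and the default `alpha`/`beta` values are hypothetical and chosen only for illustration; the code simulates data and does not fit the model.

```python
import random
from dataclasses import dataclass


def sample_dirichlet(alpha, k, rng):
    """Symmetric Dirichlet(alpha) draw of length k, built from normalized Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow when alpha is very small
        return [1.0 / k] * k
    return [d / total for d in draws]


@dataclass
class Token:
    author: int  # author x, chosen uniformly from the document's co-authors
    topic: int   # topic z, drawn from theta[x]
    word: int    # word id w, drawn from phi[z]


def generate_corpus(doc_authors, doc_lengths, n_authors, n_topics, vocab_size,
                    alpha=0.5, beta=0.1, seed=0):
    """Simulate an AT-model corpus; alpha and beta are arbitrary illustrative priors."""
    rng = random.Random(seed)
    # One topic distribution per author, one word distribution per topic.
    theta = [sample_dirichlet(alpha, n_topics, rng) for _ in range(n_authors)]
    phi = [sample_dirichlet(beta, vocab_size, rng) for _ in range(n_topics)]

    corpus = []
    for authors, length in zip(doc_authors, doc_lengths):
        doc = []
        for _ in range(length):
            x = rng.choice(authors)                                    # pick a co-author
            z = rng.choices(range(n_topics), weights=theta[x])[0]     # topic from that author
            w = rng.choices(range(vocab_size), weights=phi[z])[0]     # word from that topic
            doc.append(Token(x, z, w))
        corpus.append(doc)
    return theta, phi, corpus


# Usage: two documents, one co-authored by authors 0 and 1, one by author 2 alone.
theta, phi, corpus = generate_corpus(doc_authors=[[0, 1], [2]], doc_lengths=[20, 15],
                                     n_authors=3, n_topics=4, vocab_size=30, seed=42)
```

Inference runs in the opposite direction: given only the words and each document's author list, the task is to estimate θ (author interests) and φ (topic vocabularies), which is what makes the model useful for profiling authors' research topics.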