Abstract
[Purpose/significance] Drawing on two data sources, planning texts and research papers, this paper proposes a method for identifying fine-grained research front topics based on multidimensional partitioning. [Method/process] Planning texts and papers related to carbon nanotubes are collected, and each data source is divided into three dimensions: theoretical innovation, practical application, and risk management. Within each dimension, a topic model combined with tracing back to the original text is used to extract fine-grained topics. The fine-grained topics are then compared across data sources and dimensions to identify the corresponding research front topics. [Result/conclusion] The experimental results show that the method can effectively identify research front topics in subdivided fields.
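The partition-then-compare workflow summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The cue-word lists in `DIMENSION_CUES`, the helper names, and the set-based comparison rule are all hypothetical, and plain term frequency stands in for the topic model and the step of tracing back to the original text.

```python
from collections import Counter, defaultdict

# Hypothetical cue words per dimension; the abstract does not state the partition rule.
DIMENSION_CUES = {
    "theoretical innovation": {"mechanism", "theory", "model"},
    "practical application": {"device", "sensor", "battery"},
    "risk management": {"toxicity", "safety", "exposure"},
}


def assign_dimension(text):
    """Return the dimension whose cue words overlap most with the text, or None."""
    tokens = set(text.lower().split())
    scores = {dim: len(tokens & cues) for dim, cues in DIMENSION_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def terms_by_dimension(texts, k=5):
    """Group texts by dimension and keep the k most frequent tokens in each group.

    Term frequency is only a stand-in here for topic-model output.
    """
    counts = defaultdict(Counter)
    for text in texts:
        dim = assign_dimension(text)
        if dim is not None:
            counts[dim].update(text.lower().split())
    return {dim: {t for t, _ in c.most_common(k)} for dim, c in counts.items()}


def compare_sources(planning_texts, paper_texts, k=5):
    """Per dimension, split terms into shared ones and ones unique to a single source."""
    plan = terms_by_dimension(planning_texts, k)
    paper = terms_by_dimension(paper_texts, k)
    result = {}
    for dim in DIMENSION_CUES:
        p, q = plan.get(dim, set()), paper.get(dim, set())
        result[dim] = {"shared": p & q, "planning_only": p - q, "paper_only": q - p}
    return result


if __name__ == "__main__":
    planning = ["nanotube toxicity exposure limits"]
    papers = ["nanotube toxicity assay"]
    for dim, groups in compare_sources(planning, papers).items():
        print(dim, groups)
```

How the shared and source-specific terms map onto research front topics is left open here, since the abstract does not specify the decision rule.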