Construction and Analysis of a Semantic Function Data Set for Keywords in Academic Texts: The Case of Journal of Informetrics
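  • English title: Construction and Analysis of Semantic Data Sets for Academic Text Keywords
  • Authors: 刘智锋; 李信; 程齐凯; 陆伟
  • Authors (English): LIU Zhifeng; LI Xin; CHENG Qikai; LU Wei
  • Keywords: semantic function; keyword; academic text; informetrics; standardized data set
  • English keywords: semantic functions; keywords; academic texts; informetrics; standardized data sets
  • Chinese journal title: TSGL
  • English journal title: Library Tribune
  • Affiliation: School of Information Management, Wuhan University
  • Publication date: 2019-01-30 09:41
  • Publisher: 图书馆论坛 (Library Tribune)
  • Year: 2019
  • Issue: v.39; No.243
  • Funding: National Natural Science Foundation of China Youth Project "Research on Diversified Citation Recommendation Based on Deep Semantic Mining" (Grant No. 71704137); National Natural Science Foundation of China project "Semantic Recognition of Academic Texts and Knowledge Graph Construction Oriented to Lexical Functions" (Grant No. 71473183); Major Project of the National Social Science Fund of China "Research on Theories and Methods of Academic Paper Evaluation Based on Cognitive Computing" (Grant No. 17ZDA292)
  • Language: Chinese
  • Pages: TSGL201907008
  • Page count: 11
  • CN: 07
  • ISSN: 44-1306/G2
  • Classification number: 68-78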
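Abstract
This article develops a classification framework for the semantic functions of keywords in the field of informetrics and, based on that framework, constructs an annotated data set of keyword semantic functions for the field. The aim is to provide a theoretical basis and data support for research on the semantic analysis and understanding of academic texts; the article also analyzes the data set as a preliminary exploration of its applications. It first reviews the semantic functions of keywords in academic texts and related research progress, and on that basis builds the classification framework. It then selects Journal of Informetrics (JOI) as the annotation source and constructs the annotated data set. Finally, it performs a descriptive analysis of the data set and examines the current state of informetrics research from the perspective of different semantic functions. The resulting data set contains 693 papers and 3,312 keywords. Research topic is the most frequent semantic function, followed by research method. Analysis by semantic function also makes it possible to examine the state of informetrics research at a fine granularity.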
        Based on a classification framework for the semantic functions of keywords in the field of informetrics, a semantically annotated data set of keywords is constructed. The data set provides a theoretical basis and data support for research on the semantic analysis and understanding of academic texts, and it is analyzed as a preliminary exploration of its applications. The semantic functions of keywords in academic texts and the progress of related research are reviewed. On this basis, the Journal of Informetrics is selected as the data source, and an annotated data set containing 693 papers and 3,312 keywords is constructed. Analyzing the semantic functions in the data set identifies the state of research in informetrics: research topics account for the largest proportion of semantic functions, followed by research methods. From the perspective of semantic functions, the state of informetrics research can be analyzed at a fine granularity.
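The descriptive analysis mentioned in the abstract (which semantic function accounts for the largest share of keywords) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record layout, the toy keywords, the "other" label, and the `function_shares` helper are hypothetical and are not taken from the article's data set or code. Only the labels "research topic" and "research method" come from the abstract.

```python
from collections import Counter

# Hypothetical toy records: (paper_id, keyword, semantic_function).
# These are NOT entries from the article's data set.
annotations = [
    ("p1", "h-index", "research topic"),
    ("p1", "stochastic model", "research method"),
    ("p2", "citation analysis", "research method"),
    ("p2", "journal ranking", "research topic"),
    ("p2", "Web of Science", "other"),  # "other" is a placeholder label
    ("p3", "collaboration", "research topic"),
]


def function_shares(records):
    """Return (semantic_function, share) pairs, most frequent first."""
    counts = Counter(function for _, _, function in records)
    total = sum(counts.values())
    return [(function, count / total) for function, count in counts.most_common()]


shares = function_shares(annotations)
for function, share in shares:
    print(f"{function}: {share:.1%}")
```

Counting at the keyword level, rather than per paper, mirrors how the abstract reports proportions over keywords; a per-paper breakdown would group the records by `paper_id` first.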
