Construction and Visual Analysis of Academic Paper-Linked Data Based on In-depth Mining

Original Chinese title: 基于深度挖掘的学术论文关联数据构建与可视化分析
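Authors: Qu Jiabin (曲佳彬); Ou Shiyan (欧石燕); Ling Hongfei (凌洪飞)
Keywords: linked data; visual analysis; academic papers
Journal code: QBXB
Journal: Journal of the China Society for Scientific and Technical Information (情报学报)
Affiliations: School of Information Management, Nanjing University; Yantai University Library
Publication date: 2019-06-24
Publisher: Journal of the China Society for Scientific and Technical Information
Year: 2019
Issue: v.38
Funding: Key Project of the National Social Science Fund of China, "Research on the Semantic Publishing of Academic Literature Content Based on Linked Data and Its Applications" (17ATQ001); Youth Project of Humanities and Social Sciences Research of the Ministry of Education, "Research on Semantic Associations of Knowledge Units in Scientific Literature Content" (18YJC870016)
Language: Chinese
Page: QBXB201906005
Page count: 17
CN: 06
ISSN: 11-2257/G3
Classification number: 43-59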

Abstract

Since linked data was proposed, it has become the mainstream way to publish structured data on the Web. With the rapid growth in the number of linked datasets, how to consume and use linked data effectively has become a focus for researchers. This study explores the in-depth mining and visual analysis of linked data. First, text mining techniques were used to mine implicit information from the metadata of academic papers in the geological field. Next, based on a newly designed "academic paper-scholar" ontology, the metadata and the mined information were given a semantic representation to construct RDF linked data. On this basis, five visual analysis modules were designed to visualize, from multiple perspectives, the macro- and micro-level knowledge contained in the academic paper linked data. The results show that (1) linked data constructed through in-depth mining can present the knowledge contained in academic paper metadata more deeply and comprehensively, and (2) visual analysis of linked data can present macro- and micro-level knowledge intuitively in graphical form, helping users consume and use linked data quickly.
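To make the "metadata to RDF linked data" step concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes the paper and its authors are described with terms from BIBO, FOAF, and Dublin Core; the "academic paper-scholar" ontology itself is not reproduced in this record, so the choice of properties is an assumption. The `http://example.org/` URIs and the helper names `paper_to_triples` and `to_ntriples` are hypothetical.

```python
# Namespaces of existing vocabularies (the property selection is an assumption).
BIBO = "http://purl.org/ontology/bibo/"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base URI for minted resources


def uri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def paper_to_triples(paper_id, title, authors):
    """Turn one metadata record into (subject, predicate, object) triples."""
    paper = uri(f"{BASE}paper/{paper_id}")
    triples = [
        (paper, uri(RDF_TYPE), uri(BIBO + "AcademicArticle")),
        (paper, uri(DCTERMS + "title"), literal(title)),
    ]
    for position, name in enumerate(authors, start=1):
        person = uri(f"{BASE}scholar/{paper_id}-{position}")
        triples.append((person, uri(RDF_TYPE), uri(FOAF + "Person")))
        triples.append((person, uri(FOAF + "name"), literal(name)))
        triples.append((paper, uri(DCTERMS + "creator"), person))
    return triples


def to_ntriples(triples):
    """Serialize triples, one N-Triples statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    record = paper_to_triples(
        "QBXB201906005",
        "Construction and Visual Analysis of Academic Paper-Linked Data "
        "Based on In-depth Mining",
        ["Qu Jiabin", "Ou Shiyan", "Ling Hongfei"],
    )
    print(to_ntriples(record))
```

Minting one scholar URI per paper keeps the sketch short. A real dataset would reuse a single URI for each scholar across papers so that the author-paper links can be traversed.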

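Notes

(1) https://www.oclc.org/en/worldcat/data-strategy.html
(2) https://logd.tw.rpi.edu/demo/international_dataset_catalog_search
(3) http://www.linkedgeodata.org/About
(4) https://www2.dcc.uchile.cl/cold2016
(5) http://voila2019.visualdataweb.org
(6) https://duraspace.org/vivo
(7) http://semantic-web-journal.com/SWJPortal
(8) http://linkeduniversities.org/index.html

(1) RelFinder is an interactive exploration tool for RDF datasets. It queries remote or local datasets through SPARQL and reveals the relationships between data nodes.
(2) LISTA is a scientific abstract database from the publisher EBSCO that covers library science and information science.

(1) LDA (Latent Dirichlet Allocation) is a text feature representation model based on the bag-of-words approach. It performs well at identifying latent topics in large document collections.
(2) Perplexity is a metric for evaluating the quality of an LDA model. It can be understood as how uncertain the trained model is about which topic a document D belongs to; the lower the perplexity, the better the model generalizes.
(3) JS divergence (sometimes loosely called JS distance) is a good measure of the distance between two probability distributions, so it is often used to compute the similarity between topics in a topic model.

(1) BIBO is an ontology for describing bibliographic information; see http://www.bibliontology.com/
(2) FOAF is an ontology for describing persons and organizations; see http://www.foaf-project.org/
(3) VCARD is an ontology for describing people and institutions; see https://www.w3.org/TR/vcard-rdf/
(4) The GEO ontology is dedicated to describing the latitude and longitude of geographic locations; see https://www.w3.org/2003/01/geo/

(1) The GeoNames ontology is one of the most widely used geographic linked datasets. It covers every country in the world and contains 11 million place names; in it, gn:Feature denotes a geographic feature. The GeoNames site is http://www.geonames.org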
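The notes on perplexity and JS divergence can be made concrete with a small sketch. It uses the standard textbook definitions and is not taken from the paper: JS divergence is computed as the average of two KL divergences against the midpoint distribution, using base-2 logarithms so that the value lies between 0 and 1, and perplexity is computed as the exponential of the negative mean log-likelihood per token. The function names and the example topic-word distributions are illustrative assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    # Midpoint distribution; m_i > 0 whenever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def perplexity(token_log_probs):
    """exp(-mean natural-log likelihood per token); lower is better."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Hypothetical word distributions of two topics over a 4-word vocabulary.
    topic_a = [0.7, 0.1, 0.1, 0.1]
    topic_b = [0.1, 0.1, 0.1, 0.7]
    print(js_divergence(topic_a, topic_b))
    print(js_divergence(topic_a, topic_a))  # identical topics give 0
```

A small JS divergence between two topics indicates that their word distributions are similar. Taking the square root of the divergence gives the quantity that is formally called the Jensen-Shannon distance.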
