Research on the Construction of SinoPedia, a Chinese Encyclopedic Concept Terminology Service Platform
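  • English title: SinoPedia: An Unified Chinese Terminology Service Platform Based on Linked Data
  • Authors: 陈涛; 刘炜; 朱庆华
  • Authors (English): CHEN Tao; LIU Wei; ZHU Qinghua
  • Keywords: SinoPedia; Linked Data; knowledge graph; digital humanities; knowledge discovery
  • Keywords (English): SinoPedia; Linked Data; Knowledge graph; Digital Humanities; Knowledge discovery
  • Journal code (Chinese): ZGTS
  • Journal (English): Journal of Library Science in China
  • Affiliations: Shanghai Library / Institute of Scientific and Technical Information of Shanghai; School of Information Management, Nanjing University
  • Publication date: 2018-07-17 11:09
  • Publisher: Journal of Library Science in China (中国图书馆学报)
  • Year: 2018
  • Issue: v.44; No.236
  • Funding: One of the research outcomes of the National Social Science Fund of China major project "Research on the Mechanism and Application of Mobile Visual Search in Digital Libraries for Big Data" (No. 15ZDB126)
  • Language: Chinese
  • Page: ZGTS201804001
  • Page count: 15
  • CN: 04
  • ISSN: 11-2746/G2
  • Classification number: 6-20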
Abstract
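With the rise of the "Web of Data", the content of the World Wide Web is no longer pure text but a collection of entities that express and model the relationships among many kinds of things and events, which makes the standardization of entity names, attributes, and value vocabularies very important. Outside China, wide-coverage Linked Open Data (LOD) services have already taken shape. The lack of Chinese concept terminology has seriously hindered the standardization, promotion, and application of Chinese knowledge graphs and Chinese domain ontologies. The SinoPedia platform proposed in this paper uses RDF triples to assign a unique URI to each encyclopedic concept term currently in the public domain, persists these resources, and provides retrieval services through the SOOOPA module. In addition, the self-built resource entries have been linked at the entity level to several open resources, including DBpedia, Wikidata, and the Shanghai Library Name Authority File. Beyond retrieval, SinoPedia also provides a Linked Data publishing service and can act as a Linked Data publishing hub. By extending the LODVIEW system, it provides unified Linked Data publishing and content negotiation services for different Linked Data sites (SPARQL Endpoints). SinoPedia also integrates the LODLIVE system, which enables the discovery and fusion of Linked Data across different datasets. SinoPedia currently contains 5.54 million triples and offers two ways of accessing the data, an API and a SPARQL Endpoint; the next step is to apply for inclusion in the LOD cloud diagram. In the future, SinoPedia can serve as a data linking center for the digital humanities and promote the rapid development of digital humanities research.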
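The idea of giving each term a unique URI and describing it with RDF triples can be illustrated with a minimal sketch. It is not SinoPedia's implementation: the namespace `http://example.org/sinopedia/resource/`, the helper names, and the choice of `rdfs:label` and `owl:sameAs` as predicates are all assumptions made for illustration, since the abstract does not state the platform's URI pattern or vocabulary.

```python
from urllib.parse import quote

# Hypothetical namespace; the abstract does not give SinoPedia's real URI pattern.
BASE = "http://example.org/sinopedia/resource/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# owl:sameAs is a common linking predicate; its use here is an assumption.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def mint_uri(term: str) -> str:
    """Build a URI for a term by percent-encoding it under BASE."""
    return BASE + quote(term, safe="")


def literal(text: str, lang: str) -> str:
    """Format a language-tagged literal in N-Triples syntax."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple; obj is a preformatted literal or a bare URI."""
    obj_term = obj if obj.startswith('"') else f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


def describe(term: str, external_uri: str) -> list[str]:
    """Return N-Triples lines that label a term and link it to an external entity."""
    uri = mint_uri(term)
    return [
        triple(uri, RDFS_LABEL, literal(term, "zh")),
        triple(uri, OWL_SAME_AS, external_uri),
    ]
```

Calling `describe("上海", "http://dbpedia.org/resource/Shanghai")` yields two N-Triples lines: one carrying the Chinese label and one linking the minted URI to the DBpedia resource.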
With the development of the "Web of Data", the content of the World Wide Web is no longer purely text but a collection of entities that can express and model events and their interrelationships. It is therefore very important to standardize entity names, attributes, and vocabularies on the Web. Europe and the United States have built extensive Linked Open Data (LOD) services. However, the lack of Chinese concept terminology has severely hindered the standardization and promotion of Chinese knowledge graphs and Chinese domain ontologies. The SinoPedia platform proposed in this paper uses RDF triples to assign unique URIs to the encyclopedic terms currently in the public domain and to persist them as resources. The resources are published according to the four Linked Data principles of the W3C. Moreover, SinoPedia acts as a publishing center for resources and can provide Linked Data services over external Linked Data sets (SPARQL Endpoints). SinoPedia is composed of the SOOOPA retrieval module, the LODVIEW publishing module, and the LODLIVE discovery module. The SOOOPA module provides search services, and the self-built resource entries have been linked to DBpedia, Wikidata, and the Shanghai Library Name Authority File. SinoPedia stores its RDF data in the OpenLink Virtuoso database. The SOOOPA search module can retrieve by single term, multiple terms, simplified and traditional Chinese characters, and resource URIs, and it ranks the search results intelligently. The retrieval results also link to other open resources, so that the information those data sources hold about an entry can be viewed from the results. In addition to these search services, SinoPedia provides Linked Data publishing services and can act as a Linked Data distribution center (Hub). SinoPedia provides a unified RDF data publication and content negotiation service for different Linked Data sites accessed through SPARQL Endpoints.
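Content negotiation, as mentioned above, means that one resource URI can be served in different representations depending on the HTTP `Accept` header. The following is a minimal sketch of that selection step only, not the LODVIEW implementation. The supported media types, the function names, and the fallback to HTML are assumptions, and partial wildcards such as `text/*` are ignored for brevity.

```python
# Media types this hypothetical hub can serve, mapped to a file extension.
SUPPORTED = {
    "text/html": "html",
    "text/turtle": "ttl",
    "application/rdf+xml": "rdf",
    "application/ld+json": "jsonld",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media_type, q) pairs sorted by descending q; ties keep header order."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal q values stay in header order.
    return sorted(entries, key=lambda entry: -entry[1])


def negotiate(header: str, default: str = "text/html") -> str:
    """Pick the best supported media type for an Accept header.

    Falls back to `default` when nothing matches; a stricter server
    could answer 406 Not Acceptable instead.
    """
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
    return default
```

With this sketch, a browser sending `text/html` receives the HTML page, while a Linked Data client sending `Accept: text/turtle` receives the Turtle serialization of the same resource.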
Our platform extends the LODVIEW system to support SPARQL Endpoint configurations for multiple external data sources. Resources from different sources are assigned new, uniform URIs in SinoPedia, and these URIs can be redirected to the original resources. The raw data of each resource are published under its new SinoPedia URI. SinoPedia integrates the LODLIVE system to realize the discovery and integration of Linked Data across different resources. Publishing different data sets in a unified way achieves unity at the data syntax layer (RDF structuring). Linking different data sets achieves unity at the data semantic layer; that is, multi-source data are integrated through their links. LODLIVE's Discovery Module displays the Linked Data from different sources in the form of a knowledge graph. The Discovery Module also implements semantic extension and knowledge discovery services for resources through these links. SinoPedia currently contains 5.54 million triples describing people, places, and institutions, covering 730,000 instances. SinoPedia also provides access through an API and a SPARQL Endpoint. Finally, the SinoPedia endpoint will be registered in the Linked Open Data (LOD) cloud to fill the gap in Chinese encyclopedic knowledge bases within LOD. In the future, SinoPedia can serve as a data link center for the digital humanities: researchers can obtain more resource information by connecting to SinoPedia, which will promote the development of digital humanities research.
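The URI re-assignment described above can be sketched as a small registry that maps an external resource URI to a hub URI and back, and that builds the SPARQL request used to fetch the raw data from the source endpoint. This is an illustrative sketch under stated assumptions: the class `HubRegistry`, the base `http://example.org/hub/`, and the `key/local-name` URI scheme are hypothetical, and the abstract does not describe how the extended LODVIEW actually performs the mapping.

```python
from urllib.parse import urlencode

# Hypothetical base for hub URIs; not SinoPedia's real address.
HUB_BASE = "http://example.org/hub/"


class HubRegistry:
    """Maps external resource URIs to uniform hub URIs and back."""

    def __init__(self) -> None:
        # source key -> (URI prefix of the source, its SPARQL endpoint)
        self._sources: dict[str, tuple[str, str]] = {}

    def register(self, key: str, uri_prefix: str, endpoint: str) -> None:
        """Configure one external data source."""
        self._sources[key] = (uri_prefix, endpoint)

    def to_hub_uri(self, original_uri: str) -> str:
        """Assign the uniform hub URI for a resource of a registered source."""
        for key, (prefix, _endpoint) in self._sources.items():
            if original_uri.startswith(prefix):
                return f"{HUB_BASE}{key}/{original_uri[len(prefix):]}"
        raise KeyError(f"no registered source for {original_uri}")

    def to_origin(self, hub_uri: str) -> tuple[str, str]:
        """Return (original URI, endpoint), e.g. to issue a redirect."""
        if not hub_uri.startswith(HUB_BASE):
            raise ValueError(f"not a hub URI: {hub_uri}")
        key, _, local = hub_uri[len(HUB_BASE):].partition("/")
        prefix, endpoint = self._sources[key]
        return prefix + local, endpoint

    def describe_url(self, hub_uri: str) -> str:
        """Build a SPARQL protocol GET URL that fetches the resource's raw data."""
        origin, endpoint = self.to_origin(hub_uri)
        return endpoint + "?" + urlencode({"query": f"DESCRIBE <{origin}>"})
```

For example, after `register("dbpedia", "http://dbpedia.org/resource/", "https://dbpedia.org/sparql")`, the resource `http://dbpedia.org/resource/Shanghai` is exposed as `http://example.org/hub/dbpedia/Shanghai`, and `to_origin` recovers the original URI for redirection.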
