Semantic-Based Traditional Chinese Medicine (TCM) Data Acquisition Engineering and Application Platform


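English title: Semantic Based TCM Data Acquisition Engineering and Application Platform
Author: Tao Jinhuo (陶金火)
Degree level: Master's
Discipline: Computer Application Technology
Chinese keywords: 语义配置 (semantic configuration); 语义本体 (semantic ontology); 语义关系图标注 (semantic relation graph annotation); 数据采集平台 (data acquisition platform)
English keywords: semantic configuration; ontology; graph of semantic relation; data acquisition platform
Degree year: 2011
Supervisors: Chen Huajun (陈华钧); Jiang Xiaohong (姜晓红)
Discipline code: 081203
Degree-granting institution: Zhejiang University
Thesis submitted: 2010-12-01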

Abstract

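The traditional Chinese medicine (TCM) literature accumulated over more than two thousand years is an invaluable store of knowledge. Recording TCM data in information systems in structured form is essential for analyzing, processing and using it. For more than ten years, the grid group of the CCNT laboratory has worked with the China Academy of Chinese Medical Sciences on structured modeling of TCM literature and on data acquisition, building a TCM semantic ontology and several topic-specific TCM data acquisition systems. Even so, TCM literature acquisition still has shortcomings that urgently need attention: there are too many acquisition systems, they are isolated from one another and cannot be interconnected or accessed across systems, component reuse is low, maintainability is poor, and acquisition is only weakly automated.
     To address these problems, this thesis proposes a method that uses semantic-ontology configuration metadata to configure TCM data models and their storage logic. The method gives a semantic description of each data model and specifies how its data are written according to the storage logic. The thesis also proposes a semantic relation graph annotation algorithm to assist data acquisition. Building on the semantic ontology knowledge base, the algorithm extracts keywords from TCM literature, computes high-frequency keywords, and identifies and predicts semantic relations between keywords; the result is a semantic relation graph that makes data acquisition semi-automatic. Finally, the thesis designs and implements an integrated TCM data acquisition platform that takes the semantic configuration as its system configuration metadata and brings the data of different topics into one unified platform for acquisition.
     Because the integrated platform builds on the semantic-ontology description of TCM data models, data acquisition is highly configurable, which resolves the problem of the many distinct TCM literature data models. The platform supports semi-automatic literature processing based on semantic relation graph annotation, and it is a semantic data grid system consistently oriented toward practical application. It is now in production use, where it has improved acquisition efficiency and substantially reduced operation and maintenance costs.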
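The abstract does not say how keywords are extracted or what threshold makes a keyword "high-frequency". A minimal sketch, assuming the ontology supplies a flat list of term labels and that extraction uses greedy forward maximum matching (a common dictionary-based technique for Chinese text, swapped in here and not confirmed as the thesis's method). The names `forward_max_match`, `high_frequency_keywords` and the `min_count` threshold are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def forward_max_match(text: str, vocabulary: set[str], max_len: int) -> list[str]:
    """Greedy forward maximum matching: at each position take the longest
    vocabulary term that starts there; characters matching no term are skipped."""
    keywords: list[str] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocabulary:
                match = candidate
                break
        if match is None:
            i += 1  # no ontology term starts at this character
        else:
            keywords.append(match)
            i += len(match)
    return keywords


def high_frequency_keywords(text: str, vocabulary: set[str],
                            min_count: int = 2) -> list[tuple[str, int]]:
    """Count extracted keywords and keep those seen at least min_count times,
    most frequent first (ties keep first-occurrence order)."""
    max_len = max((len(term) for term in vocabulary), default=1)
    counts = Counter(forward_max_match(text, vocabulary, max_len))
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Hypothetical term list standing in for labels from the TCM ontology.
    vocab = {"人参", "人参汤", "气虚", "补气"}
    text = "人参汤主治气虚，人参补气，气虚者宜服人参汤。"
    print(high_frequency_keywords(text, vocab))  # [('人参汤', 2), ('气虚', 2)]
```

Longest-match-first is why "人参汤" is counted as a formula name instead of being split into the herb "人参" plus a stray character.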
English abstract:
With more than 2,000 years of research and application behind it, the body of traditional Chinese medicine (TCM) data has grown enormous. Structuring TCM data in information systems plays a vital role in analyzing, processing and using it. Working with the China Academy of Chinese Medical Sciences for over a decade, the CCNT laboratory has contributed substantially to TCM informatization, establishing a TCM semantic ontology and several TCM information systems. However, much remains to be improved: there are many separate TCM information systems, interconnection between them is poor, maintenance is difficult, and automated acquisition is limited.
     To solve these problems, improve TCM data acquisition efficiency and reduce costs, this thesis puts forward a method based on semantic configuration metadata and the storage medium, and on this basis integrates different data models into a unified platform for data management. Following this method, we designed and developed a TCM co-construction platform for managing data of different models.
     By using a semantic ontology to describe TCM data models, the platform makes data acquisition highly configurable and solves the problem of the wide variety of TCM data models. In addition, the platform supports semi-automatic literature processing based on semantic relation annotation. The platform is a semantic grid system committed to practical application. It has been put into practical use and has achieved good results.
     This thesis also puts forward a semantic relation graph annotation method to facilitate data acquisition. Based on a semantic ontology knowledge base, the method extracts keywords, computes high-frequency keywords, and identifies and predicts semantic relations between keywords, making data acquisition semi-automatic.
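The abstract says the configuration metadata both describes a data model semantically and states how its data are written to storage, but it does not give the metadata format. The sketch below assumes a hypothetical format in which each ontology class maps to one relational table and each ontology property to one column. `ClassConfig`, `PropertyMapping`, the `tcm:` property names and the use of SQLite are all illustrative assumptions.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyMapping:
    ontology_property: str  # hypothetical property name in the TCM ontology
    column: str             # column that stores the property's value


@dataclass(frozen=True)
class ClassConfig:
    ontology_class: str     # hypothetical ontology class described by this config
    table: str              # table that stores instances of the class
    properties: tuple[PropertyMapping, ...]


# Hypothetical configuration for one topic-specific data model.
FORMULA = ClassConfig(
    ontology_class="tcm:Formula",
    table="formula",
    properties=(
        PropertyMapping("tcm:name", "name"),
        PropertyMapping("tcm:indication", "indication"),
        PropertyMapping("tcm:source", "source_text"),
    ),
)


def create_table(conn: sqlite3.Connection, config: ClassConfig) -> None:
    # Table and column names come from trusted configuration, not user input.
    cols = ", ".join(f"{p.column} TEXT" for p in config.properties)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {config.table} ({cols})")


def store_record(conn: sqlite3.Connection, config: ClassConfig,
                 record: dict[str, str]) -> None:
    """Insert a record keyed by ontology property names into the configured columns."""
    allowed = {p.ontology_property for p in config.properties}
    unknown = set(record) - allowed
    if unknown:
        raise ValueError(f"properties not in configuration: {sorted(unknown)}")
    mappings = [p for p in config.properties if p.ontology_property in record]
    if not mappings:
        raise ValueError("record has no configured properties")
    columns = ", ".join(p.column for p in mappings)
    placeholders = ", ".join("?" for _ in mappings)
    # Values are bound as parameters; properties absent from the record stay NULL.
    conn.execute(
        f"INSERT INTO {config.table} ({columns}) VALUES ({placeholders})",
        [record[p.ontology_property] for p in mappings],
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn, FORMULA)
    store_record(conn, FORMULA, {"tcm:name": "四君子汤", "tcm:indication": "脾胃气虚"})
    print(conn.execute("SELECT name, indication, source_text FROM formula").fetchall())
    # [('四君子汤', '脾胃气虚', None)]
```

Under this assumed format, supporting a new topic means adding another `ClassConfig` rather than writing another acquisition system. That is the kind of reuse the abstract attributes to configuration-driven acquisition.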
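The abstract names relation identification and prediction between keywords as the step that yields the semantic relation graph, but it defines neither. One possible reading is sketched below under explicit assumptions. A co-occurring concept pair is "asserted" when the knowledge base already stores a triple linking the two concepts. It is "predicted" when only the ontology schema permits a relation between their concept types. The ontology fragment, the relation names `contains` and `treats`, and `relation_graph` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical ontology fragment: concept -> concept type.
CONCEPT_TYPES = {"人参": "Herb", "黄芪": "Herb", "气虚": "Syndrome", "四君子汤": "Formula"}
# Hypothetical triples already stored in the knowledge base.
ASSERTED = {("四君子汤", "contains", "人参"), ("人参", "treats", "气虚")}
# Hypothetical schema: (subject type, object type) -> permitted relation.
SCHEMA = {
    ("Formula", "Herb"): "contains",
    ("Herb", "Syndrome"): "treats",
    ("Formula", "Syndrome"): "treats",
}


def relation_graph(sentences, concept_types, asserted, schema):
    """Return {(subject, relation, object): 'asserted' | 'predicted'} for
    ontology concepts that co-occur in at least one sentence."""
    known_relations = defaultdict(set)  # (subject, object) -> stored relations
    for subj, rel, obj in asserted:
        known_relations[(subj, obj)].add(rel)

    graph = {}
    for keywords in sentences:
        concepts = sorted({k for k in keywords if k in concept_types})
        for a, b in combinations(concepts, 2):
            for subj, obj in ((a, b), (b, a)):  # relations are directed
                stored = known_relations.get((subj, obj), set())
                for rel in stored:
                    graph[(subj, rel, obj)] = "asserted"
                # Schema-level guess for pairs the knowledge base does not link yet.
                candidate = schema.get((concept_types[subj], concept_types[obj]))
                if candidate is not None and candidate not in stored:
                    graph.setdefault((subj, candidate, obj), "predicted")
    return graph


if __name__ == "__main__":
    # Keywords already extracted per sentence (e.g. by an ontology dictionary lookup).
    sentences = [["四君子汤", "人参", "气虚"], ["黄芪", "气虚"]]
    graph = relation_graph(sentences, CONCEPT_TYPES, ASSERTED, SCHEMA)
    for triple, status in sorted(graph.items()):
        print(status, triple)
```

In this reading, predicted edges are candidates for an annotator to confirm or reject. That fits the abstract's description of acquisition as semi-automatic rather than fully automatic.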

